贪心科技 | Personalized education services for everyone
Review: Paper Reading
GloVe: Global Vectors for Word Representation
Instructor Fan (范老师)
2020/05/16

Content
• Background
• Related Work
• Problem Definition
• System Design
• Data
• Experiment
• Conclusion
• Reference

Background
• Information Retrieval
• Document Classification
• Question Answering
• Named Entity Recognition
• Parsing

Related Work
• Global matrix factorization methods (LSA): perform poorly on word analogy tasks
• Local context window methods (word2vec): do not make use of global corpus statistics

Problem Definition
• Learn word embeddings
• Global log-bilinear regression + weighted least squares model

System
How to choose F?
1. Take the vector difference
2. Take the dot product
3. Make F symmetric
4. Use the exponential function
5. Add bias terms
6. Cost function
7. Weighting function f(x)

System – Relationship to Other Models
• Attempt to maximize the log probability as a context window scans over the corpus
• Use an approximate expression to make the problem easier to solve
• Substitute P_ij and transform
• Avoid the drawbacks of cross-entropy by switching to least squares
• Reduce the magnitude of large values
• Replace X_i with a more general function

System – Complexity of the Model
• Running time is determined mainly by the number of nonzero entries in X, so a tighter bound is needed
• Assumption:

Data
• Corpus: Wikipedia, Common Crawl
• Preprocessing: tokenize, lowercase, and construct X
• X: use a decreasing weighting function, so that word pairs that are d words apart contribute 1/d to the total count
• AdaGrad, learning rate = 0.05, etc.
• Truncate X when training the multiple SVD baselines

Experiment – Word Similarity and NER
1. Compute similarities with GloVe, SVD, etc.
2. Compute ranks based on the similarity results
3. Compute the Spearman rank correlation between the similarity results and human judgments

Analysis – Vector Length and Context Size
1. Panel (a) shows that 200 dimensions may be enough for high accuracy
2. A small, asymmetric window is good for the syntactic subtask
3. A large window is good for the semantic subtask
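
The 1/d construction of X, the weighting function f(x), and the weighted least-squares cost from the System and Data slides can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the reference implementation: the function names (`build_cooccurrence`, `weight`, `glove_cost`) and the dictionary-based layout are hypothetical, the window is assumed to be symmetric, and the defaults `x_max = 100` and `alpha = 0.75` are taken from the GloVe paper rather than from these slides.

```python
import math
from collections import defaultdict


def build_cooccurrence(tokens, window=2):
    """Symmetric co-occurrence counts X: a pair d words apart adds 1/d."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        # Look only to the right and record both directions,
        # so every position pair inside the window is visited once.
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            counts[(word, tokens[j])] += 1.0 / d
            counts[(tokens[j], word)] += 1.0 / d
    return dict(counts)


def weight(x, x_max=100.0, alpha=0.75):
    """f(x): grows as (x / x_max)^alpha below x_max, capped at 1 above it.

    The defaults are an assumption taken from the GloVe paper.
    """
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_cost(counts, w, w_ctx, b, b_ctx, x_max=100.0, alpha=0.75):
    """Weighted least-squares cost, summed over the nonzero entries of X.

    w and w_ctx map each word to a list of floats (word and context vectors);
    b and b_ctx map each word to a float bias.
    """
    total = 0.0
    for (wi, wj), x in counts.items():
        dot = sum(p * q for p, q in zip(w[wi], w_ctx[wj]))
        err = dot + b[wi] + b_ctx[wj] - math.log(x)
        total += weight(x, x_max, alpha) * err * err
    return total
```

For example, `build_cooccurrence("the cat sat on the mat".split(), window=2)` gives the adjacent pair ("the", "cat") a count of 1.0 and the pair ("on", "mat"), which is two words apart, a count of 0.5. Because the cost loops only over stored pairs, its running time scales with the number of nonzero entries in X, which is the point made on the complexity slide.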