杨植麟, Recurrent AI
Latest Advances of Neural Language Models

Part I: About Me

About Me
• Co-founder of Recurrent AI; previously worked at Google Brain and Facebook AI Research
• Has co-authored papers with several Turing Award winners
• Achieved state-of-the-art results on more than 30 datasets in natural language understanding, semi-supervised learning, and other areas
• B.S. from Tsinghua University (2015); Ph.D. from Carnegie Mellon University (2019), advised by Ruslan Salakhutdinov, head of AI research at Apple

Main Research Results
• XLNet (NeurIPS 2019)
  • Outperforms Google's BERT on 20 datasets
  • The best pretrained model to date under a standard FLOPs budget
  • NeurIPS Oral (top 0.5% of submissions)
  • Covered by dozens of AI media outlets
• Transformer-XL (ACL 2019)
  • Set new state-of-the-art results on all mainstream language modeling benchmarks
  • The first attention model to surpass LSTMs at both the word level and the character level
  • Can coherently generate text that is thousands of words long
• HotpotQA (EMNLP 2018)
  • A multi-hop reasoning dataset
  • Used for model evaluation by Stanford, the University of Washington, UT Austin, Tsinghua University, ByteDance, JD.com, Microsoft, and other institutions
• Semi-supervised graph learning (ICML 2016)
  • 400+ citations
  • Popularized the standard benchmark datasets of the graph-learning field
  • Adopted as a standard baseline by hundreds of follow-up works

Part II: XLNet — Latest Advances of Neural Language Models

Learning from Unlabeled Data
• Unlabeled data: abundant (1000x more), accessible
• Labeled data: scarce, expensive

Unsupervised Pretraining
• [Diagram: algorithms/models are first pretrained on unlabeled data, then finetuned on labeled data]
• Goal: improve over purely supervised learning

Related Work
• RBMs (Salakhutdinov et al., 2007), autoencoders (Vincent et al., 2008), Jigsaw (Noroozi and Favaro, 2016), GANs (Donahue and Simonyan, 2019), ...
• word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014)
• Semi-supervised sequence learning (Dai and Le, 2015), ELMo (Peters et al., 2018), CoVe (McCann et al., 2017), GPT (Radford et al., 2018), BERT (Devlin et al., 2018)

Two Objectives for Pretraining
• Auto-regressive (AR) language modeling: a unidirectional Transformer predicts each token from its left context, e.g. generating "New York is a city" one token at a time, left to right. Drawback: not able to model bidirectional context.
• (Denoising) auto-encoding (AE): a bidirectional Transformer reconstructs corrupted tokens, e.g. predicting "New York" from "[mask] [mask] is a city". Drawbacks: the predicted tokens are independent of each other, and [mask] is not used during finetuning.
• (A side-by-side formulation of these two objectives, and of the new one below, is sketched after this section.)

New Objective: Permutation Language Modeling
• Sample a factorization order
• Determine the attention masks based on the order (see the mask-construction sketch below)
• Optimize a standard language modeling objective
• Benefits:
  • Autoregressive, avoiding the disadvantages of AE
  • Able to model bidirectional context
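
For concreteness, the three objectives can be written side by side. The sketch below follows the formulation in the XLNet paper (Yang et al., 2019); the notation is introduced here, not taken from the slides: x is a length-T sequence, x̂ the corrupted (masked) input, and Z_T the set of permutations of {1, ..., T}.

```latex
% Notation: \mathbf{x} = (x_1, \dots, x_T) is the input sequence;
% \hat{\mathbf{x}} is the corrupted (masked) input, with m_t = 1 iff
% x_t was masked; \mathcal{Z}_T is the set of permutations of
% \{1, \dots, T\}; z_t and \mathbf{z}_{<t} are the t-th element and
% the first t-1 elements of a permutation \mathbf{z}.

% (1) AR language modeling: a fixed left-to-right factorization.
\max_{\theta} \; \log p_{\theta}(\mathbf{x})
  = \sum_{t=1}^{T} \log p_{\theta}(x_t \mid \mathbf{x}_{<t})

% (2) Denoising AE (BERT): reconstruct the masked tokens from
% \hat{\mathbf{x}}; the predictions are conditionally independent.
\max_{\theta} \; \sum_{t=1}^{T} m_t \, \log p_{\theta}(x_t \mid \hat{\mathbf{x}})

% (3) Permutation language modeling (XLNet): autoregressive along a
% sampled factorization order, in expectation over all orders.
\max_{\theta} \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
  \Bigl[ \sum_{t=1}^{T} \log p_{\theta}\bigl(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\bigr) \Bigr]
```

Objective (3) is autoregressive for every sampled order z, avoiding the independence assumption and the [mask] artifact of (2), yet in expectation each position conditions on every other position, which is the "bidirectional context" benefit listed above.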
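As a minimal illustration of the first two steps (sample a factorization order, derive the attention masks from it), here is a NumPy sketch. It is a toy example, not the XLNet implementation: the function name `permutation_attention_mask` is hypothetical, and the paper's two-stream attention and partial-prediction details are deliberately omitted.

```python
import numpy as np

def permutation_attention_mask(seq_len: int, rng: np.random.Generator):
    """Sample a factorization order and build the matching attention mask.

    mask[i, j] == 1 means the prediction at position i may attend to
    position j, i.e. j appears before i in the sampled order.
    (Toy sketch; XLNet's two-stream attention is omitted.)
    """
    # Step 1: sample a factorization order, e.g. [2, 0, 3, 1, 4].
    order = rng.permutation(seq_len)

    # rank[pos] = the step at which `pos` is predicted under this order.
    rank = np.empty(seq_len, dtype=np.int64)
    rank[order] = np.arange(seq_len)

    # Step 2: position i attends to position j iff j precedes i in the
    # order. Tokens keep their original positions in the sequence; only
    # the mask changes, so bidirectional context is covered in expectation
    # over sampled orders.
    mask = (rank[None, :] < rank[:, None]).astype(np.float32)
    return order, mask

rng = np.random.default_rng(0)
order, mask = permutation_attention_mask(5, rng)
print("factorization order:", order)
print(mask)  # row i has 1s exactly at the positions predicted before i
```

With such a mask plugged into a Transformer's self-attention, optimizing the ordinary next-token cross-entropy along the sampled order realizes the third step: a standard language modeling objective.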