Software Guide, Vol. 22, No. 2, Feb. 2023

Chinese Entity Relation Extraction Method Based on Schema Enhancement

RAO Dong-ning, LI Ran
(Computer College, Guangdong University of Technology, Guangzhou 510000, China)

Abstract: Chinese entity relation extraction faces three problems. Entity boundaries are often segmented incorrectly, entity relations overlap, and the relation types of different datasets do not transfer well to one another. This paper proposes an entity relation extraction method based on Schema enhancement to address them. First, a character-word hybrid embedding fuses the semantic information of characters and words, which avoids the ambiguity caused by boundary errors in Chinese word segmentation. Second, pointer labeling is used to handle overlapping relations. Finally, the Schema of each dataset is extracted, and the merged Schemas are fed into the model as prior features to reduce entity redundancy and allow relation types to transfer across datasets. Experiments were carried out on three major Chinese entity relation extraction datasets: DuIE, FinRE, and SanWen. Compared with previous models, the method achieves F1 improvements of 10%, 18%, and 11% on these datasets, respectively, and shows higher stability.

Key words: named entity recognition; relation extraction; Schema enhancement; character-word embedding; pointer labeling

DOI: 10.11907/rjdk.221225
CLC number: TP181    Document code: A    Article ID: 1672-7800(2023)002-0047-06

0 Introduction

Entity relation extraction...