Power System Technology (电网技术), Vol. 47 No. 2, Feb. 2023
Article ID: 1000-3673(2023)02-0562-09    CLC number: TM721    Document code: A    Subject code: 470·40

Reactive Power Optimization Based on Proximal Policy Optimization of Deep Reinforcement Learning

ZHANG Pei (张沛), ZHU Zhujun (朱驻军), XIE Hua (谢桦)
(School of Electrical Engineering, Beijing Jiaotong University, Haidian District, Beijing 100044, China)

ABSTRACT: Fluctuations in renewable energy output and loads pose a great challenge to reactive power optimization. Considering the time-varying characteristics of renewable energy and loads, the reactive power optimization problem is formulated as a reinforcement learning problem. A method of constraint-target division and target presupposition is proposed to design the reward function, and the proximal policy optimization algorithm is used to solve the reinforcement learning problem and obtain the reactive power optimization policy. A case study is carried out on the modified IEEE 39 system. The results show that the proposed reward function improves the convergence speed of the agent, and that the reactive power optimization strategy obtained by reinforcement learning is superior to the traditional deterministic optimization algorithm in both decision quality and decision time.

KEY WORDS: reactive power optimization; new power system; deep reinforcement learning; proximal policy optimization; data-driven

Project Supported by Science and Technology Project of China Southern Power Grid (Research on Coordinated Source-Grid-Load-Storage Regulation Technology for System Balance and Grid Security, No. YNKJXM20222463).

DOI: 10.13335/j.1000...