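· Special Topic on Radar Intelligent Signal Processing ·
DOI: 10.16592/j.cnki.1004-7859.2022.12.004

Radar Intelligent Decision Generation Algorithm Based on Deep Reinforcement Learning

ZHAO Jiachen*, ZHANG Jindong, LI Ziyu
(School of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China)

Abstract: Radar systems face complex and changeable jamming scenarios, and manually designed anti-jamming strategies offer neither guaranteed performance nor good real-time behavior. To address these problems, an intelligent decision generation model based on deep reinforcement learning is constructed, with a purpose-designed action set, state set and reward function. A decision network training algorithm based on the double deep Q-network (DDQN) is also proposed to overcome the Q-value overestimation caused by the coupling of the target network and the evaluation network in the deep Q-network (DQN) algorithm. Simulation results show that, compared with DQN, Q-learning, manually formulated strategies and traversal of the strategy library, the proposed intelligent decision model and training method suppress jamming more effectively, generalize better and respond faster, effectively improving the radar's autonomous decision-making capability.

Keywords: radar intelligent decision-making; deep reinforcement learning; deep Q-network; double deep Q-network

CLC number: TN972    Document code: A    Article ID: 1004-7859(2022)12-0025-09

Citation format: ZHAO Jiachen, ZHANG Jindong, LI Ziyu. Radar intelligent decision generation algorithm based on deep reinforcement learning[J]. Modern Radar, 2022, 44(12): 25-33.

Radar Intelligent Decision Generation Algorithm Based on Deep Reinforcement Learning

ZHAO Jiachen*, ZHANG Jindong, LI Ziyu
(School of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China)

Abstract: In order to solve the problems faced by radar systems, such as complex jamming scenes, low reliability and poor real-time performance, an intelligent decision generation model is constructed based on deep reinforcement learning, in which a targeted action set, state set and reward function are designed. A decision network training algorithm based on the double deep Q-network (DDQN) is then proposed to overcome the Q-value overestimation caused by the coupling of the target network and the evaluation network in the deep Q-network (DQN). The simulation results show that, compared with DQN, Q-learning and the traversal algorithm, the intelligent decision model and training method designed in this paper have a better interference suppression effect, stronger generalization ability and faster response time, and effectively improve the radar indep...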