Heuristic Function Negotiation for Markov Decision Process and Its Application in UAV Simulation

Fengfei ZHAO, Zheng QIN, Zhuo SHAO

IEICE TRANSACTIONS on Information and Systems, Vol.E97-D, No.1, pp.89-97
Publication Date: 2014/01/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E97.D.89
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Artificial Intelligence, Data Mining
Keywords: Markov decision processes, heuristic function, reinforcement learning, UAV

Traditional reinforcement learning (RL) methods can solve Markov Decision Processes (MDPs) online, but they cannot effectively use a priori knowledge to guide the learning process. Exploring for the optimal policy is time-consuming and makes no use of problem-specific information. To tackle this problem, this paper proposes heuristic function negotiation (HFN), an online learning framework that extends MDPs with heuristic functions. HFN changes the state-action dual-layer structure of traditional RL into a triple-layer structure, in which multiple heuristic functions can be defined to meet the requirements of the problem at hand. Within the HFN framework, different algorithms can be used to let these functions negotiate the appropriate action, and the influence of each function is adjusted according to the rewards received. By encoding domain knowledge in the heuristic functions, HFN speeds up the solving of MDPs. Furthermore, user preferences can be reflected in the learning process, which makes RL more flexible. The experiments show that, with reasonable heuristic functions, the HFN framework learns more efficiently than traditional RL. We also apply HFN to air combat simulation of unmanned aerial vehicles (UAVs), where different function settings lead to different combat behaviors.
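The abstract does not specify a negotiation algorithm, so the sketch below is only one hypothetical way to realize the state, heuristic-function and action layers it describes. It is not the authors' method. Under our assumptions, each heuristic function scores the candidate actions, and those scores are turned into votes with a softmax. The action with the largest weighted vote is chosen, and each function's weight is then updated multiplicatively according to the reward, in a Hedge-style scheme. The class name `WeightedHeuristicNegotiator` and the parameters `lr` and `epsilon` are illustrative inventions. So are the softmax voting and the exponential weight update.

```python
import math
import random
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical interface (not from the paper): a heuristic function maps
# (state, action) to a preference score; higher means "more preferred".
Heuristic = Callable[[Any, Any], float]


class WeightedHeuristicNegotiator:
    """Hypothetical weighted-vote negotiation among heuristic functions."""

    def __init__(self, heuristics: Sequence[Heuristic], lr: float = 0.1,
                 epsilon: float = 0.0, seed: int = 0):
        self.heuristics = list(heuristics)
        # Every heuristic function starts with equal influence.
        self.weights = [1.0 / len(self.heuristics)] * len(self.heuristics)
        self.lr = lr            # step size of the reward-driven weight update
        self.epsilon = epsilon  # probability of a random exploratory action
        self.rng = random.Random(seed)

    def _votes(self, state: Any, actions: Sequence[Any]) -> List[List[float]]:
        # Each heuristic spreads one unit of vote over the actions (softmax).
        votes = []
        for h in self.heuristics:
            scores = [h(state, a) for a in actions]
            m = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            votes.append([e / z for e in exps])
        return votes

    def choose(self, state: Any, actions: Sequence[Any]) -> Tuple[int, List[List[float]]]:
        """Return the chosen action index and the per-heuristic votes."""
        votes = self._votes(state, actions)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(actions)), votes
        totals = [sum(w * v[j] for w, v in zip(self.weights, votes))
                  for j in range(len(actions))]
        return max(range(len(actions)), key=totals.__getitem__), votes

    def update(self, votes: List[List[float]], chosen: int, reward: float) -> None:
        # Heuristics that backed the chosen action gain influence after a
        # positive reward and lose influence after a negative one.
        for i, v in enumerate(votes):
            self.weights[i] *= math.exp(self.lr * reward * v[chosen])
        z = sum(self.weights)
        self.weights = [w / z for w in self.weights]


# Toy usage: two conflicting heuristics on a 1-D task where only moving right pays.
def prefer_right(state: Any, action: int) -> float:
    return float(action)


def prefer_left(state: Any, action: int) -> float:
    return -float(action)


actions = [-1, +1]
neg = WeightedHeuristicNegotiator([prefer_right, prefer_left], lr=0.1)
for _ in range(50):
    idx, votes = neg.choose(0, actions)
    reward = 1.0 if actions[idx] == +1 else -1.0
    neg.update(votes, idx, reward)
print(neg.weights)  # prefer_right should now hold the larger weight
```

In this toy run the reward signal shifts influence toward the heuristic that agrees with it. This mirrors, in simplified form, the abstract's claim that the impact of each function is adjusted according to the rewards. A full HFN-style learner would also depend on the MDP state and would likely combine such votes with learned value estimates. Those aspects are not modeled here.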