EconPapers    
Economics at your fingertips  
 

A Sarsa( λ ) Algorithm Based on Double-Layer Fuzzy Reasoning

Quan Liu, Xiang Mu, Wei Huang, Qiming Fu and Yonggang Zhang

Mathematical Problems in Engineering, 2013, vol. 2013, 1-9

Abstract:

Solving reinforcement learning problems in continuous space with function approximation is currently a research hotspot of machine learning. When dealing with the continuous space problems, the classic Q -iteration algorithms based on lookup table or function approximation converge slowly and are difficult to derive a continuous policy. To overcome the above weaknesses, we propose an algorithm named DFR-Sarsa( λ ) based on double-layer fuzzy reasoning and prove its convergence. In this algorithm, the first reasoning layer uses fuzzy sets of state to compute continuous actions; the second reasoning layer uses fuzzy sets of action to compute the components of Q -value. Then, these two fuzzy layers are combined to compute the Q -value function of continuous action space. Besides, this algorithm utilizes the membership degrees of activation rules in the two fuzzy reasoning layers to update the eligibility traces. Applying DFR-Sarsa( λ ) to the Mountain Car and Cart-pole Balancing problems, experimental results show that the algorithm not only can be used to get a continuous action policy, but also has a better convergence performance.

Date: 2013
References: Add references at CitEc
Citations:

Downloads: (external link)
http://downloads.hindawi.com/journals/MPE/2013/561026.pdf (application/pdf)
http://downloads.hindawi.com/journals/MPE/2013/561026.xml (text/xml)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:561026

DOI: 10.1155/2013/561026

Access Statistics for this article

More articles in Mathematical Problems in Engineering from Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem ().

 
Page updated 2025-03-19
Handle: RePEc:hin:jnlmpe:561026