Date of Award

10-1-2021

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

First Advisor

J. Chen

Second Advisor

D. Wu

Third Advisor

M. Hlynka

Keywords

Machine learning, Q-learning, Reinforcement learning, SARSA

Rights

info:eu-repo/semantics/openAccess

Abstract

To perform a large variety of tasks and achieve human-level performance in complex real-world environments, an intelligent agent must be able to learn from a dynamically changing environment. Generally speaking, agents are limited in how accurately they can describe the environment from what they perceive, because they may not have complete information about it. The present research focuses on reinforcement learning algorithms, which form a distinct category within machine learning because of their trial-and-error approach. Reinforcement learning solves control problems based on received rewards: the core of the learning task is defined by a reward function, under which an unsuitable choice of action yields more negative rewards. The reinforcement learning framework incorporates the notion of cumulative reward over time, enabling an agent to select actions that promote long-term results. Q-learning and SARSA are two popular methods of this kind; they are similar except that Q-learning follows an off-policy strategy while SARSA is an on-policy algorithm. In this thesis, we compare the Q-learning and SARSA algorithms for the global path planning of an agent in a grid-world game environment in order to verify their efficiency in different scenarios. Simulation was performed in a grid-world environment containing static obstacles with a density of 30%. The results demonstrate that both approaches reach the optimal policy with a complete success rate over the learning episodes in the test cases. The comparison shows that the Q-learning algorithm outperforms the SARSA algorithm by 34% in terms of computation time, as both approaches tend toward negative rewards while arriving at the optimal path. However, with a 12% higher convergence ratio, the SARSA approach better avoids large penalties from exploratory moves and thus yields a safer route as the optimal path.
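The off-policy/on-policy distinction the abstract draws can be seen directly in the two tabular update rules. The sketch below is illustrative only (it is not the thesis's implementation); the state/action names and the learning-rate and discount values are hypothetical. Q-learning bootstraps from the greedy action in the next state, while SARSA bootstraps from the action the agent actually takes, which is why SARSA's value estimates absorb the penalties of exploratory moves.

```python
# Hypothetical hyperparameters for illustration only.
ALPHA, GAMMA = 0.1, 0.9  # learning rate, discount factor

def q_learning_update(Q, s, a, r, s_next, actions):
    """Off-policy: target uses the maximum Q-value in the next state."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: target uses the action actually chosen in the next state."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

If the agent's exploratory next action (`a_next`) has a low value, the SARSA update pushes the current state-action value down further than the Q-learning update does, steering the learned path away from risky cells, consistent with the "safer route" result reported above.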
