Date of Award

10-1-2021

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Machine learning, Q-learning, Reinforcement learning, SARSA

Supervisor

J. Chen

Supervisor

D. Wu

Rights

info:eu-repo/semantics/openAccess

Abstract

In order to perform a large variety of tasks and achieve human-level performance in complex real-world environments, an intelligent agent must be able to learn from a dynamically changing environment. Generally speaking, agents are limited in how accurately they can describe the environment from what they perceive, because they may not have complete information about it. The present research focuses on reinforcement learning algorithms, which form a distinct category of machine learning owing to their trial-and-error approach to learning. Reinforcement learning is used to solve control problems on the basis of received rewards: the core of the learning task is defined by a reward function in which an unsuitable choice of action results in more negative rewards. The reinforcement learning framework is built on the notion of cumulative reward over time, which enables an agent to select actions that promote long-term results. Q-learning and SARSA are two popular methods of this kind; they are similar except that Q-learning follows an off-policy strategy while SARSA is an on-policy algorithm. In this thesis, we compare the Q-learning and SARSA algorithms for the global path planning of an agent in a grid-world game environment in order to evaluate their efficiency in different scenarios. Simulations were performed in a grid-world environment containing static obstacles with a density of 30%. The results demonstrate that both approaches reach the optimal policy with a complete success rate over the learning episodes in the test cases. The comparison shows that the Q-learning algorithm outperforms the SARSA algorithm by 34% in terms of computation time, as both approaches tend toward negative rewards while arriving at the optimal path. However, with a 12% higher convergence ratio, the SARSA approach better avoids large penalties from exploratory moves and therefore yields a safer route as the optimal path.
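The off-policy versus on-policy distinction mentioned above can be summarized by the two temporal-difference update rules. The following minimal Python sketch illustrates that difference in general; it is not the thesis's implementation, and the learning rate, discount factor, and toy Q-table sizes are illustrative assumptions only.

```python
import numpy as np

# Illustrative hyperparameters (assumptions, not values from the thesis).
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor


def q_learning_update(Q, s, a, r, s_next):
    """Off-policy: bootstrap from the greedy (max-valued) action in the next state."""
    td_target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (td_target - Q[s, a])


def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action actually taken next (e.g. chosen epsilon-greedily)."""
    td_target = r + GAMMA * Q[s_next, a_next]
    Q[s, a] += ALPHA * (td_target - Q[s, a])


# Toy example: a Q-table with 5 states and 4 actions (up/down/left/right).
Q = np.zeros((5, 4))
q_learning_update(Q, s=0, a=1, r=-1.0, s_next=2)
sarsa_update(Q, s=0, a=1, r=-1.0, s_next=2, a_next=3)
print(Q[0])
```

Because SARSA's target uses the action the behavior policy actually selects, exploratory moves that incur large penalties are reflected in its value estimates, which is consistent with the safer routes reported in the abstract.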
