In this project, a taxi agent in a grid world has to deliver a passenger from one part of the city to another. The Q-learning algorithm (a reinforcement learning method) was implemented so that the agent learns a good strategy for dealing with traffic conditions on its own, through experimentation and feedback from the environment in the form of rewards and penalties. To minimize penalties, the agent had to learn the US traffic rules and how to avoid collisions with other cars at intersections. To maximize rewards, it had to learn a good strategy for reaching its destination within a designated amount of time.
The agent learned to follow the traffic rules without violations and to reach its destination within the set time limit at least 90% of the time.
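The core of the Q-learning approach described above can be illustrated with a minimal tabular sketch. This is not the project's actual code: the action names, the state tuple, the reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions chosen for illustration. The sketch shows the standard update, in which the value of a state-action pair is nudged toward the observed reward plus the discounted best value of the next state.

```python
import random
from collections import defaultdict

# Hypothetical action set; the real environment may name these differently.
ACTIONS = ["idle", "forward", "left", "right"]


class QLearner:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # Maps (state, action) to an estimated value; unseen pairs start at 0.0.
        self.q = defaultdict(float)

    def best_value(self, state):
        """Highest estimated value over all actions available in `state`."""
        return max(self.q[(state, a)] for a in self.actions)

    def choose_action(self, state):
        """Explore with probability epsilon, otherwise act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        best = self.best_value(state)
        # Break ties between equally valued actions at random.
        candidates = [a for a in self.actions if self.q[(state, a)] == best]
        return self.rng.choice(candidates)

    def update(self, state, action, reward, next_state):
        """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
        target = reward + self.gamma * self.best_value(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Usage: a hypothetical state of (light colour, oncoming traffic, next waypoint).
agent = QLearner(ACTIONS, epsilon=0.0, seed=0)
state = ("red", "no_oncoming", "forward")
agent.update(state, "forward", -1.0, state)  # assumed penalty for running a red light
agent.update(state, "idle", 0.5, state)      # assumed small reward for waiting
print(agent.choose_action(state))            # prints "idle"
```

With exploration switched off (`epsilon=0.0`), the agent picks the action with the highest learned value, so after these two updates it waits at the red light. In practice, a nonzero `epsilon` that decays over training is a common way to balance exploration against exploiting what has already been learned.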
Note: This was submitted as a project for my Machine Learning Nanodegree at Udacity.
The full writeup and source code can be accessed using the following links.