Q-LEARNING ALGORITHM

Overview

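Q-Learning is a model-free reinforcement learning algorithm that learns an action-value function Q(s,a): an estimate of the expected discounted return from taking action a in state s and acting greedily afterward. Once the estimates are good, the agent acts by choosing the highest-valued action in each state. Because it learns from sampled transitions, it needs no model of the environment and works with stochastic transitions.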

Key Concepts

›Q-Table (stores one value per state-action pair)
›Exploration vs Exploitation
›Temporal Difference Learning
›Epsilon-Greedy Strategy

Q-Learning Update

Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)]

where:
s  = current state
a  = action taken
s' = next state
a' = any action available in s' (the max is taken over these)
r  = reward received
α  = learning rate (0 < α ≤ 1)
γ  = discount factor (0 ≤ γ ≤ 1)
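
The update rule above can be sketched as a small Python function. This is a minimal illustration, not a reference implementation: the name q_update, the list-of-lists Q-table layout, and the done flag (which drops the bootstrap term for terminal states) are assumptions made for the example.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-learning update to Q[s][a] in place and return the new value.

    Q is assumed to be a list of lists indexed as Q[state][action].
    """
    # Bootstrap from the best action in the next state; a terminal
    # next state (done=True) contributes no future value.
    best_next = 0.0 if done else max(Q[s_next])
    td_target = r + gamma * best_next      # r + γ·max_a' Q(s',a')
    td_error = td_target - Q[s][a]         # bracketed term in the update rule
    Q[s][a] += alpha * td_error
    return Q[s][a]


# Two states, two actions; only Q[1][1] starts non-zero.
Q = [[0.0, 0.0], [0.0, 2.0]]
new_value = q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(new_value)
```

Worked by hand: the target is 1.0 + 0.9·2.0 = 2.8, the error is 2.8 - 0 = 2.8, and with α = 0.5 the new value of Q(0,1) is approximately 1.4.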

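# Epsilon-greedy action selection (Python; assumes Q[state] is a list of action values):
import random

if random.random() < epsilon:
    # Explore: pick any action uniformly at random.
    action = random.randrange(len(Q[state]))
else:
    # Exploit: pick the action with the highest estimated value.
    action = max(range(len(Q[state])), key=lambda a: Q[state][a])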
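
Putting the update rule and epsilon-greedy selection together, a complete tabular training loop might look like the sketch below. The environment is hypothetical and invented for illustration: a five-state corridor where the agent starts at state 0 and receives a reward of 1 on reaching state 4. The helper names (step, choose_action, train) and the hyperparameter values are likewise assumptions.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right


def step(state, action):
    """Hypothetical deterministic environment: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking among greedy actions."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))          # explore
    best = max(Q[state])
    return rng.choice([a for a, q in enumerate(Q[state]) if q == best])  # exploit


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Terminal transitions do not bootstrap from the next state.
            best_next = 0.0 if done else max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
            state = next_state
    return Q


Q = train()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(range(len(ACTIONS)), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # → [1, 1, 1, 1]
```

The learned greedy policy moves right in every non-terminal state. Random tie-breaking matters early on: with an all-zero table, always picking the first action would keep the agent walking into the left wall until an exploratory step happened to occur.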

Applications

Game Playing

Robot Navigation

Trading Strategies

Resource Allocation