Q-LEARNING ALGORITHM

Overview

Q-Learning is a model-free, off-policy reinforcement learning algorithm. It learns the value (quality) of each action in each state, which tells an agent what action to take under what circumstances. It needs no model of the environment and can handle stochastic transitions and rewards.

Key Concepts

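  • Q-Table: stores an estimated value for every state-action pair
  • Exploration vs Exploitation: trying untested actions vs choosing the best-known action
  • Temporal Difference Learning: moving an estimate toward the reward plus the discounted estimate of the next state
  • Epsilon-Greedy Strategy: taking a random action with probability ε, otherwise the highest-valued action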

Q-Learning Update

Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)]

where:
s  = current state
a  = action taken
s' = next state
a' = any action available in s' (the max ranges over these)
r  = reward received
α  = learning rate (0 < α ≤ 1)
γ  = discount factor (0 ≤ γ ≤ 1)
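
The update rule can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the function name q_update, the list-of-lists table layout, the default values for alpha and gamma, and the terminal flag (which drops the bootstrap term at the end of an episode) are all assumptions made for the example.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Bootstrapped target: reward plus the discounted best value of the next state.
    # Assumption: at a terminal state there is no future value to bootstrap from.
    best_next = 0.0 if terminal else max(Q[s_next])
    td_error = r + gamma * best_next - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


# Hypothetical table with 2 states and 2 actions, filled in by hand.
Q = [[0.0, 0.0],
     [1.0, 2.0]]
td = q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
print(td, Q[0][1])  # → 2.0 1.0
```

With these numbers the target is 1.0 + 0.5·2.0 = 2.0, the old estimate is 0.0, so the TD error is 2.0 and the entry moves halfway to the target (α = 0.5), ending at 1.0.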

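# Epsilon-greedy action selection (Python; Q[state] is a list of action values):
import random

n_actions = len(Q[state])
if random.random() < epsilon:
    # Explore: pick an action uniformly at random.
    action = random.randrange(n_actions)
else:
    # Exploit: pick the action with the highest estimated value.
    action = max(range(n_actions), key=lambda a: Q[state][a])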
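
To show how action selection and the update fit together, here is a small end-to-end sketch. Everything about the environment is hypothetical: a five-state corridor where the agent starts at the left end and earns a reward of 1 for reaching the right end. The names step, choose_action and train, the hyperparameter values, the random tie-breaking, and the zero bootstrap at the terminal state are assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, goal at the right end
ACTIONS = (-1, +1)      # action 0 moves left, action 1 moves right


def step(state, action):
    """Toy deterministic environment: reward 1 on reaching the goal, else 0."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(Q[state])
    # Break ties at random so an all-zero table does not always pick action 0.
    return rng.choice([a for a, q in enumerate(Q[state]) if q == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # No bootstrapping from the terminal state.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q


Q = train()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(range(len(ACTIONS)), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(greedy_policy)  # → [1, 1, 1, 1]
```

In this toy setting the learned greedy policy moves right in every state, and the value of moving right from state s approaches γ^(3-s), since the reward of 1 is discounted once per remaining step.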

Applications

  • Game Playing
  • Robot Navigation
  • Trading Strategies
  • Resource Allocation