Q-LEARNING ALGORITHM
Overview
Q-Learning is a model-free reinforcement learning algorithm that learns the value ("quality") of each action in each state, which tells an agent what action to take under what circumstances. It does not require a model of the environment and can handle stochastic transitions and rewards.
Key Concepts
- Q-Table stores state-action values
- Exploration vs Exploitation
- Temporal Difference Learning
- Epsilon-Greedy Strategy
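To make the Q-table idea concrete, here is a minimal sketch assuming a small discrete problem. The sizes, the name `q_table`, and the helper `greedy_action` are illustrative choices, not part of the source. Rows index states and columns index actions.

```python
n_states, n_actions = 4, 2  # hypothetical problem size

# One row per state, one column per action; every value starts at zero.
q_table = [[0.0] * n_actions for _ in range(n_states)]

# Pretend the agent has already learned that action 1 pays off in state 2.
q_table[2][1] = 0.5

def greedy_action(q, state):
    """Return the index of the highest-valued action in `state` (first index on ties)."""
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)

best = greedy_action(q_table, 2)
```

Pure exploitation means always calling something like `greedy_action`; exploration means sometimes ignoring it, which is what the epsilon-greedy strategy does.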
Q-Learning Update
Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)]
where:
s = current state
a = action taken
s' = next state
a' = any action available in s' (the max is taken over all of them)
r = reward received
α = learning rate (0 < α ≤ 1)
γ = discount factor (0 ≤ γ ≤ 1)
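The update rule above can be sketched as a small function. This is one possible rendering, assuming the Q-table is a list of lists indexed as `q[state][action]`. The name `q_update` and the `done` flag (which drops the bootstrap term when s' is terminal) are assumptions added for illustration.

```python
def q_update(q, s, a, r, s_next, alpha, gamma, done=False):
    """Apply one Q-learning update to q[s][a] in place and return the TD error."""
    # Assumption: a terminal next state has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[s_next])
    td_error = r + gamma * best_next - q[s][a]
    q[s][a] += alpha * td_error
    return td_error

# Worked example: Q(s,a) = 0, r = 1, max Q(s',a') = 2, alpha = 0.5, gamma = 0.9
q = [[0.0, 0.0], [2.0, 1.0]]
q_update(q, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
# TD target = 1 + 0.9 * 2 = 2.8, so Q(0,0) moves halfway from 0 to 2.8, i.e. to 1.4.
```

With α = 1 the old estimate would be replaced by the target outright; smaller values of α blend the target in more slowly.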
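# Epsilon-greedy action selection (Q[state] is a sequence of action values):
import random

if random.random() < ε:
    # Explore: pick a uniformly random action index.
    action = random.randrange(len(Q[state]))
else:
    # Exploit: pick the index of the highest-valued action.
    action = max(range(len(Q[state])), key=lambda a: Q[state][a])
Applications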
- Game Playing
- Robot Navigation
- Trading Strategies
- Resource Allocation
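As a toy stand-in for the navigation-style applications listed above, the sketch below combines the update rule and epsilon-greedy selection into a complete training loop. The corridor environment, the function name `train_corridor`, the reward scheme (1 on reaching the rightmost cell, 0 otherwise), the random tie-breaking, and all hyperparameter values are assumptions made for illustration only.

```python
import random

def train_corridor(n_states=5, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor; actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy selection, breaking ties between equal values at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            # Deterministic toy dynamics: moving left from cell 0 stays in cell 0.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update; the goal is terminal, so it contributes no future value.
            best_next = 0.0 if s_next == goal else max(q[s_next])
            q[s][a] += alpha * (r + gamma * best_next - q[s][a])
            s = s_next
    return q

q = train_corridor()
# Greedy policy for the non-terminal cells (the goal cell is never updated).
policy = [0 if row[0] > row[1] else 1 for row in q[:-1]]
```

In this setup the learned values for "right" should approach 1, γ, γ², ... moving away from the goal, so the greedy policy heads right from every cell.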