DECISION TREES

Overview

Decision Trees are supervised learning algorithms used for both classification and regression. They create a model that predicts target values by learning simple decision rules inferred from data features in a tree-like structure.

Tree Components

  • Root Node - Top decision point
  • Internal Nodes - Decision points
  • Leaf Nodes - Final predictions
  • Branches - Outcome paths
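The components above map naturally onto a small recursive data structure. A minimal sketch (the `Node` fields and `predict` helper are illustrative names, not from any particular library):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of a decision tree; field names here are illustrative."""
    feature: Optional[int] = None        # feature index tested at an internal node
    threshold: Optional[float] = None    # split point for that feature
    left: Optional["Node"] = None        # branch taken when value <= threshold
    right: Optional["Node"] = None       # branch taken when value > threshold
    prediction: Optional[object] = None  # set only on leaf nodes

def predict(node: Node, x):
    """Walk from the root down the branches until a leaf prediction is reached."""
    while node.prediction is None:  # still at an internal node
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

# Root node tests feature 0 against 2.5; the two leaves are final predictions.
tree = Node(feature=0, threshold=2.5,
            left=Node(prediction="A"),
            right=Node(prediction="B"))
```

Internal nodes carry a test, leaves carry a prediction, and branches are just the `left`/`right` links; `predict(tree, [1.0])` follows the left branch to `"A"`.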

Information Gain

Information Gain = H(S) - Σ(|S_v|/|S| * H(S_v))

Entropy: H(S) = -Σ(p_i * log2(p_i))

Gini Impurity: 
Gini(S) = 1 - Σ(p_i)²
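The three formulas above can be computed directly from class counts. A minimal sketch using only the standard library (function names are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(S) = 1 - sum(p_i^2)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """H(S) minus the size-weighted entropy of the subsets S_v after a split."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

labels = ["yes", "yes", "no", "no"]
entropy(labels)   # 1.0 for a balanced binary set
gini(labels)      # 0.5 for a balanced binary set
# A split that separates the classes perfectly recovers the full entropy:
information_gain(labels, [["yes", "yes"], ["no", "no"]])  # 1.0
```

Note that both impurity measures peak when the classes are evenly mixed and drop to 0 for a pure node, which is why a pure split yields the maximum information gain.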

Algorithm

1. Calculate the impurity of the dataset
2. For each feature, calculate the weighted impurity after the split
3. Choose the feature with the highest information gain
4. Recursively build subtrees on the resulting partitions
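The four steps above can be sketched as an ID3-style builder for categorical features. This is a simplified illustration (no pruning, no depth limit), not a production implementation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """ID3-style sketch: rows are dicts mapping feature name -> categorical value."""
    # Base case: node is pure or no features remain; predict the majority class.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    def gain(f):
        # Step 2: weighted impurity of the partition induced by feature f.
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[f], []).append(y)
        remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        return entropy(labels) - remainder  # step 1 minus step 2

    best = max(features, key=gain)  # step 3: feature with highest gain
    branches = {}
    for value in {row[best] for row in rows}:  # step 4: recurse per branch
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        srows, slabels = zip(*sub)
        branches[value] = build_tree(list(srows), list(slabels),
                                     [f for f in features if f != best])
    return ("split", best, branches)

rows = [{"wind": "weak"}, {"wind": "strong"}, {"wind": "weak"}]
labels = ["play", "stay", "play"]
tree = build_tree(rows, labels, ["wind"])
# ('split', 'wind', {'weak': 'play', 'strong': 'stay'})
```

Splitting on `wind` separates the classes perfectly here, so both branches terminate immediately in pure leaves.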

Advantages

  • Easy to interpret and visualize
  • No feature scaling or normalization needed
  • Some implementations handle missing values (e.g. C4.5)
  • Performs implicit feature selection