Decision Trees — Rapid Q&A Refresher (DSBA)

Date: December 21, 2025 · Author: P Baburaj Ambalam
Version 2.0 · Last updated: December 21, 2025

Technique Description

Decision Trees are non-parametric models for classification and regression that partition the feature space into axis-aligned regions via recursive binary splits. A non-parametric model does not assume a fixed functional form or fixed number of parameters for the relationship between inputs and outputs; instead, it adapts its structure entirely based on the training data. Decision Trees do not assume linearity, normality, or any specific distribution, making them flexible for modeling highly complex, nonlinear relationships with few assumptions about the data.

At each node, a split is chosen to maximize impurity reduction: for classification, common impurities are Gini G = 1 − Σ_k p_k^2 and entropy H = −Σ_k p_k log2(p_k); for regression, splits minimize within-node variance (MSE). Trees handle heterogeneous feature types well, capture non-linear interactions, and provide interpretable rules. However, they overfit easily without pruning or constraints and need more data to generalize well than parametrically compact models. Training proceeds greedily; regularization includes limiting depth and post-pruning via cost-complexity (ccp_alpha).
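The impurity formulas above can be checked numerically. The helpers below are a minimal sketch (the names `gini`, `entropy`, and `impurity_reduction` are illustrative, not scikit-learn API) showing that a 50/50 parent split into two pure children yields the maximum Gini gain of 0.5:

```python
import numpy as np

def gini(counts):
    """Gini impurity G = 1 - sum_k p_k^2 for class counts in a node."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    """Entropy H = -sum_k p_k log2(p_k), with 0*log(0) taken as 0."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def impurity_reduction(parent, left, right, impurity=gini):
    """Information gain: parent impurity minus weighted child impurities."""
    n = sum(parent)
    nl, nr = sum(left), sum(right)
    return impurity(parent) - (nl / n) * impurity(left) - (nr / n) * impurity(right)

print(gini([10, 10]))                                  # 0.5
print(impurity_reduction([10, 10], [10, 0], [0, 10]))  # 0.5
```

A greedy splitter evaluates this reduction for every candidate threshold on every feature and keeps the best one.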

Explain the Technique (Four Levels)

For a 10-year-old

A tree asks simple yes/no questions about your data until it reaches an answer; each question splits the possibilities.

For a beginner student

A decision tree makes rule-based splits on features; each path ends in a leaf with a prediction.

For an intermediate student

Greedy recursive partitioning maximizes impurity reduction; control overfitting with depth limits and cost-complexity pruning.

For an expert

Axis-aligned partitions optimized by information gain; `ccp_alpha` regularizes subtree growth, and ensembles mitigate variance.
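To see how `ccp_alpha` regularizes subtree growth in practice, scikit-learn's `cost_complexity_pruning_path` enumerates the candidate alphas for a training set; the dataset and seed below are illustrative, a sketch rather than a tuning recipe:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# Larger alpha prunes more aggressively, shrinking the tree toward its root.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
for alpha in path.ccp_alphas[::5]:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test acc={tree.score(X_test, y_test):.3f}")
```

Selecting the alpha with the best cross-validated score is the usual way to pick the pruned subtree.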

When to Use This Technique

Ideal Use Cases

Interpretable, rule-like decisions are required and stakeholders must audit the logic.
Features are heterogeneous (mixed numeric and categorical) or interact non-linearly.
Minimal preprocessing is desired: trees need no feature scaling.

Avoid When

The true relationship is smooth or linear; compact parametric models generalize better.
Data is scarce, since deep trees readily memorize noise.
Predictions must be stable under small data perturbations (single trees are high-variance).

Related Techniques

Random Forests (reduces variance via bagging)
Boosting (reduces bias via sequential correction)
Feature Engineering (preprocessing for better splits)
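The variance-reduction claim for Random Forests can be sketched with a quick cross-validated comparison; the dataset, seeds, and `n_estimators` value below are illustrative choices, not a benchmark:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging many randomized trees typically lowers variance vs. one deep tree.
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=42), X, y, cv=5)

print(f"tree:   {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"forest: {forest_scores.mean():.3f} +/- {forest_scores.std():.3f}")
```

On most tabular datasets the forest's mean score is higher and its fold-to-fold spread smaller, which is exactly the bagging effect described above.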

Q&A

Python Example

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Stratified split preserves the class ratio in train and test sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)

# A shallow tree (max_depth=5) limits overfitting; 'gini' is the default criterion.
clf = DecisionTreeClassifier(max_depth=5, criterion='gini', random_state=42)
clf.fit(X_train, y_train)

# Evaluate with stratified cross-validation plus a held-out test set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print("CV accuracy:  ", cross_val_score(clf, X_train, y_train, cv=cv).mean().round(3))
print("Test accuracy:", round(clf.score(X_test, y_test), 3))
print(export_text(clf, max_depth=2))  # learned rules as text

Quiz (15)

  1. Which impurity measure is G = 1 − Σ_k p_k^2?
  2. What does entropy measure in a node?
  3. When might you prefer entropy over Gini?
  4. What is information gain?
  5. Which criterion is typical for regression trees?
  6. What does max_depth control?
  7. How does min_samples_leaf affect variance?
  8. What does ccp_alpha control?
  9. How should you handle class imbalance during splitting?
  10. How do trees usually handle categorical variables?
  11. Why use stratified cross-validation?
  12. Which visual tool prints text rules for a tree?
  13. What does max_features do in a tree splitter?
  14. Why are trees prone to overfitting?
  15. How does a Random Forest differ from a single tree?

Practical Checklist

Common Implementation Errors (10)

Quiz Answers

  1. Gini.
  2. Node class uncertainty/impurity.
  3. When you want an information-theoretic measure (as in ID3/C4.5); results are usually similar to Gini's.
  4. Parent impurity minus weighted child impurities.
  5. Mean squared error (variance minimization).
  6. The maximum tree depth (longest root-to-leaf path); deeper trees allow more complex fits.
  7. Larger leaves reduce variance by requiring more samples per leaf.
  8. Cost-complexity pruning strength.
  9. Use stratified splits and optionally class_weight='balanced'.
  10. One-hot encode then split on encoded columns.
  11. To preserve class ratios across folds.
  12. export_text.
  13. Limits number of features considered at each split.
  14. They fit sharp boundaries and can memorize noise when deep.
  15. A forest averages many randomized trees to reduce variance.
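Answers 6–8 all concern complexity controls; the effect of one of them, `min_samples_leaf`, can be sketched with a quick sweep (dataset and seed below are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Larger min_samples_leaf forces bigger leaves, trading variance for bias.
for leaf in (1, 5, 20, 50):
    clf = DecisionTreeClassifier(min_samples_leaf=leaf, random_state=42)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"min_samples_leaf={leaf:>2}  acc={scores.mean():.3f} +/- {scores.std():.3f}")
```

Very small leaves grow many fine-grained regions (high variance); very large leaves over-smooth (high bias). The sweet spot sits in between and is dataset-dependent.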

References

  1. scikit-learn User Guide — Decision Trees
  2. scikit-learn API — DecisionTreeClassifier
  3. Breiman, Friedman, Olshen, Stone — Classification and Regression Trees (1984)