Decision Trees are non-parametric models for classification and regression that partition the feature space into axis-aligned regions via recursive binary splits. A non-parametric model does not assume a fixed functional form or a fixed number of parameters for the relationship between inputs and outputs; instead, it adapts its structure entirely to the training data. Decision Trees do not assume linearity, normality, or any specific distribution, which makes them flexible enough to model highly complex, nonlinear relationships.
At each node, a split is chosen to maximize impurity reduction: for classification, common impurities are Gini G = 1 − Σ pₖ² and entropy H = −Σ pₖ log₂ pₖ; for regression, splits minimize within-node variance (MSE). Trees are powerful for handling heterogeneous feature types, capturing non-linear interactions, and providing interpretable rules. However, they can overfit easily without pruning or constraints and require more data to generalize well than parametrically compact models. Training proceeds greedily; regularization includes limiting depth and post-pruning via cost-complexity (ccp_alpha).
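As a minimal sketch of this workflow (the dataset and hyperparameter values below are illustrative assumptions, not recommendations), a depth-limited tree with light cost-complexity pruning can be fit in scikit-learn:

```python
# Minimal sketch: fit a pre-pruned (max_depth) and post-pruned (ccp_alpha)
# decision tree on a toy dataset. Values chosen for illustration only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# max_depth limits growth up front; ccp_alpha prunes weak subtrees afterward.
clf = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01, random_state=42)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```

Both regularizers can be tuned jointly; in practice `ccp_alpha` is often selected by cross-validation.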
Explain the Technique (Four Levels)
For a 10-year-old
A tree asks simple yes/no questions about your data until it reaches an answer; each question splits the possibilities.
For a beginner student
A decision tree makes rule-based splits on features; each path ends in a leaf with a prediction.
For an intermediate student
Greedy recursive partitioning maximizes impurity reduction; control overfitting with depth limits and cost-complexity pruning.
For an expert
Axis-aligned partitions optimized by information gain; `ccp_alpha` regularizes subtree growth, and ensembles mitigate variance.
When to Use This Technique
Ideal Use Cases
Tabular data with mixed feature types (numeric, categorical)
Need for model interpretability and explainable rules
Non-linear relationships without manual feature engineering
Quick baseline models and exploratory analysis
Small to medium datasets where stability is less critical
What is a Decision Tree? A tree-structured model that makes sequential splits on features to predict labels or values.
What does "non-parametric model" mean? A model that does not assume a fixed functional form or fixed number of parameters; it adapts its structure based entirely on training data without assuming linearity, normality, or specific distributions.
When are Decision Trees most useful? On tabular data with mixed types, non-linear relationships, and a need for interpretability.
What is a node, split, and leaf? A node holds data; a split partitions data by feature/threshold; a leaf is a terminal node with a prediction.
Define impurity in classification. A measure of class mix at a node; lower impurity indicates purer class composition.
What is Gini impurity? G = 1 − Σ pₖ², where pₖ is the class proportion at the node.
What is entropy? H = −Σ pₖ log₂ pₖ; higher entropy means more class uncertainty.
What is information gain? Parent impurity minus the weighted child impurities achieved by a candidate split.
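These definitions can be checked by hand. The sketch below computes Gini, entropy (in bits, matching H = −Σ pₖ log₂ pₖ), and the information gain of one candidate split on a made-up label array:

```python
# Hand-computed impurities and information gain for a toy candidate split.
import numpy as np

def gini(labels):
    # G = 1 - sum(p_k^2) over class proportions p_k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # H = -sum(p_k * log2(p_k)) over class proportions p_k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 4 vs 4: maximally mixed
left, right = parent[:6], parent[6:]           # candidate split: 6 | 2 samples

# Information gain = parent impurity - weighted child impurities.
gain = entropy(parent) - (len(left) / len(parent) * entropy(left)
                          + len(right) / len(parent) * entropy(right))
```

Here the parent has entropy 1.0 bit and Gini 0.5; the split's pure right child drives a positive gain.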
What criterion is used for regression trees? Minimizing mean squared error (MSE) or variance within nodes.
How do trees handle numeric vs categorical features? Numeric: threshold splits; categorical: typically one-hot encode and split on the encoded columns.
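One common way to wire this up is a ColumnTransformer that one-hot encodes the categorical column before the tree; the column names and data below are made up for illustration:

```python
# Sketch: one-hot encode a categorical feature, pass numerics through,
# then fit a tree. Data and column names are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 40],
    "city": ["NY", "SF", "NY", "LA", "SF", "LA"],
})
y = [0, 1, 0, 1, 1, 0]

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",  # keep numeric columns unchanged
)
model = Pipeline([("pre", pre), ("tree", DecisionTreeClassifier(random_state=0))])
model.fit(df, y)
preds = model.predict(df)
```

`handle_unknown="ignore"` keeps prediction from failing on categories unseen during training.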
How to deal with missing values? Impute with SimpleImputer; optionally add missing indicators.
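A small sketch of that idea, on an illustrative array with NaNs: SimpleImputer fills each column with its median and, with `add_indicator=True`, appends binary columns flagging where values were originally missing.

```python
# Sketch: median imputation plus missing-indicator columns (toy data).
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 5.0]])

imp = SimpleImputer(strategy="median", add_indicator=True)
X_filled = imp.fit_transform(X)
# First two columns: imputed features; last two: 0/1 missingness flags.
```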
Key hyperparameters to control overfitting? max_depth, min_samples_split, min_samples_leaf, max_features, ccp_alpha.
What does ccp_alpha do? Controls cost-complexity pruning; larger values remove more weak subtrees to reduce overfitting.
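Rather than guessing a value, scikit-learn exposes the full pruning path via `cost_complexity_pruning_path`. A sketch (the dataset and the mid-range alpha choice are illustrative; in practice each candidate alpha would be cross-validated):

```python
# Sketch: enumerate effective ccp_alpha values, then refit with one of them
# and confirm the pruned tree is smaller than the fully grown tree.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # arbitrary mid-range pick
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
```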
What is pre-pruning vs post-pruning? Pre-pruning limits growth via hyperparameters; post-pruning removes branches after full growth using a complexity penalty.
What is the bias–variance trade-off in trees? Deeper trees reduce bias but increase variance; pruning mitigates variance.
Which metrics are suited for classification? Accuracy, precision, recall, F1, ROC-AUC.
Which metrics are suited for regression? RMSE, MAE, and R².
How to evaluate reliably? Use stratified cross-validation for classification; KFold or TimeSeriesSplit when appropriate.
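For the classification case, a stratified 5-fold evaluation can be sketched as follows (dataset and fold count are illustrative):

```python
# Sketch: stratified 5-fold cross-validation of a decision tree classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
mean_acc = scores.mean()  # average accuracy across the 5 folds
```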
How to visualize a tree? Use plot_tree or export_text.
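export_text gives a plain-text rule dump, handy when no plotting backend is available (plot_tree is the matplotlib counterpart). A small sketch on an illustrative dataset:

```python
# Sketch: print the learned if/then rules of a shallow tree as text.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(data.data, data.target)
rules = export_text(clf, feature_names=list(data.feature_names))
print(rules)  # nested "feature <= threshold" lines ending in "class: k" leaves
```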
How to interpret feature importance? Use impurity-based or permutation importance.
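The two notions differ: impurity-based importances come from training-time splits (and can favor high-cardinality features), while permutation importance measures the score drop on held-out data when a feature is shuffled. A sketch on an illustrative dataset:

```python
# Sketch: compare impurity-based and permutation feature importances.
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

impurity_imp = clf.feature_importances_          # from training-time splits
perm = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
perm_imp = perm.importances_mean                 # held-out score drop per feature
```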
Strategy for class imbalance? Use stratified splits, class_weight='balanced', and metrics beyond accuracy.
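Those three pieces fit together as below, on a synthetic imbalanced dataset (the 90/10 split and depth are illustrative): stratify the split, reweight classes, and score with F1 rather than accuracy.

```python
# Sketch: imbalance handling via stratified split, balanced class weights,
# and F1 as the evaluation metric. Synthetic data for illustration.
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight='balanced' reweights samples inversely to class frequency.
clf = DecisionTreeClassifier(class_weight="balanced", max_depth=4, random_state=0)
clf.fit(X_tr, y_tr)
f1 = f1_score(y_te, clf.predict(X_te))  # focuses on the minority class
```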