Parametric models assume a fixed functional form with a predetermined number of parameters that summarize the data, regardless of sample size. Common examples include linear regression (y = β₀ + β₁x₁ + ... + βₚxₚ), logistic regression, and naive Bayes. These models make strong assumptions about data distributions (e.g., linearity, normality) and are mathematically compact, requiring fewer data points to estimate parameters reliably.
Non-parametric models do not assume a fixed functional form or fixed number of parameters; their complexity grows with the data. Examples include decision trees, k-nearest neighbors (KNN), kernel density estimation, and support vector machines with certain kernels. They adapt entirely to the training data structure, making fewer distributional assumptions and better capturing complex, nonlinear relationships. However, they can overfit easily without regularization and typically require more data to generalize well.
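To make the contrast concrete, here is a minimal sketch on synthetic Gaussian data (the dataset, seed, and use of `scipy.stats` are illustrative assumptions): a parametric Gaussian fit summarizes 500 observations with just two numbers, while a kernel density estimate keeps every observation, so its effective size grows with the sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=1.5, size=500)

# Parametric: assume a Gaussian; the whole dataset is summarized by 2 numbers.
mu, sigma = stats.norm.fit(data)

# Non-parametric: kernel density estimation keeps every observation;
# its effective "parameter count" scales with the sample size.
kde = stats.gaussian_kde(data)

print(f'Parametric summary: mu={mu:.2f}, sigma={sigma:.2f} (2 parameters)')
print(f'KDE stores {kde.n} samples; estimated density at x=3: {kde(3.0)[0]:.3f}')
```

Note that `gaussian_kde` still has a tuning knob (the bandwidth), but no fixed functional form: double the data and the estimate is built from twice as many kernels.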
Explain the Technique (Four Levels)
For a 10-year-old
Parametric is like drawing a straight line through dots; non-parametric lets the line wiggle to follow every dot.
For a beginner student
Parametric models use simple equations with fixed parameters; non-parametric models adapt their shape based on the data.
For an intermediate student
Parametric models commit to a functional form (e.g., linear), estimating fixed coefficients; non-parametric models learn flexible structures that grow in complexity with data size.
For an expert
Parametric models impose strong inductive bias via predefined functional families (finite parameter space); non-parametric models inhabit infinite-dimensional function spaces, with model capacity scaling with sample size.
When to Use Each Approach
Parametric Models — Ideal Use Cases
Data follows known distributions (e.g., linear relationships, Gaussian errors)
Small to medium datasets where sample efficiency matters
Need for interpretability and coefficient-based insights
Non-Parametric Models — Ideal Use Cases
Nonlinear patterns without manual feature engineering
Robustness to distributional assumptions is needed
Flexibility is more important than interpretability
Within-sample prediction is the primary goal
Avoid Parametric When
True relationship is highly nonlinear and feature engineering is infeasible
Assumptions (linearity, normality, homoscedasticity) are violated
Avoid Non-Parametric When
Sample size is small relative to feature dimensionality (curse of dimensionality)
Interpretability and coefficient estimates are essential
Extrapolation is required (non-parametric models don't extrapolate well)
Related Techniques
→ Regularization (Ridge, Lasso): adds constraints to parametric models
→ Ensemble methods (Random Forests, Boosting): improves non-parametric stability
→ Feature Engineering: bridges gap by making parametric models more expressive
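The feature-engineering bridge can be sketched as follows, on synthetic sine-shaped data (the data, degree, and `alpha` are illustrative assumptions): a polynomial basis expansion makes a linear model expressive enough for a nonlinear target, while Ridge regularization keeps the extra coefficients under control. The model stays parametric throughout, since the number of coefficients is fixed in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Plain linear model: the fixed form cannot bend to follow the sine.
linear = LinearRegression()
# Basis expansion + regularization: still parametric (fixed coefficient count),
# but far more expressive, with Ridge shrinking the extra polynomial terms.
poly_ridge = make_pipeline(PolynomialFeatures(degree=7), Ridge(alpha=1.0))

lin_scores = cross_val_score(linear, X, y, cv=5, scoring='r2')
pr_scores = cross_val_score(poly_ridge, X, y, cv=5, scoring='r2')
print(f'Linear       CV R²: {lin_scores.mean():.3f}')
print(f'Poly + Ridge CV R²: {pr_scores.mean():.3f}')
```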
Comparison Table
| Aspect            | Parametric                      | Non-Parametric                 |
| ----------------- | ------------------------------- | ------------------------------ |
| Functional Form   | Fixed (e.g., linear)            | Flexible, data-dependent       |
| Parameters        | Fixed number                    | Grows with data                |
| Assumptions       | Strong (linearity, normality)   | Weaker (smoothness, locality)  |
| Sample Efficiency | High (good with small data)     | Lower (needs more data)        |
| Interpretability  | High (clear coefficients)       | Lower (black box)              |
| Flexibility       | Limited by form                 | High, adapts to complexity     |
| Extrapolation     | Possible (with caveats)         | Poor, stays in training range  |
| Bias-Variance     | Higher bias, lower variance     | Lower bias, higher variance    |
| Examples          | Linear/logistic regression, LDA | Trees, KNN, kernel methods     |
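The bias-variance row can be demonstrated directly (the data-generating process, number of resamples, and query point below are illustrative assumptions): refit a linear model and an unpruned decision tree on many freshly sampled datasets and measure how much each model's prediction at one fixed point fluctuates. The tree's prediction varies far more, which is exactly its higher variance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
x_query = np.array([[0.5]])
lin_preds, tree_preds = [], []

# Refit both models on 200 independent datasets and record the
# prediction at a fixed query point each time.
for _ in range(200):
    X = rng.uniform(0, 1, size=(80, 1))
    y = np.sin(4 * X[:, 0]) + 0.3 * rng.standard_normal(80)
    lin_preds.append(LinearRegression().fit(X, y).predict(x_query)[0])
    tree_preds.append(DecisionTreeRegressor().fit(X, y).predict(x_query)[0])

# Spread of predictions across refits = variance of each model.
print(f'Linear prediction std: {np.std(lin_preds):.3f}')
print(f'Tree prediction std:   {np.std(tree_preds):.3f}')
```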
Q&A
What is a parametric model?
A model that assumes a fixed functional form with a predetermined number of parameters, regardless of data size (e.g., linear regression, logistic regression).
What is a non-parametric model?
A model that does not assume a fixed functional form; its complexity grows with the data, adapting structure based entirely on training examples (e.g., decision trees, KNN).
What are key differences between parametric and non-parametric models?
Parametric: fixed form, finite parameters, strong assumptions, sample efficient, interpretable. Non-parametric: flexible form, parameters grow with data, fewer assumptions, needs more data, less interpretable.
Give examples of parametric models.
Linear regression, logistic regression, linear discriminant analysis (LDA), naive Bayes, polynomial regression with fixed degree.
Give examples of non-parametric models.
Decision trees, random forests, k-nearest neighbors (KNN), kernel density estimation, Gaussian processes, support vector machines (with RBF kernel).
When should I choose a parametric model?
When data follows known distributions, sample size is small, interpretability matters, or you need extrapolation and statistical inference.
When should I choose a non-parametric model?
When relationships are complex/nonlinear, you have large datasets, distributional assumptions are uncertain, or flexibility is prioritized over interpretability.
What does "non-parametric" really mean?
It doesn't mean zero parameters; it means the model structure and effective number of parameters are not fixed in advance and grow with data.
How do parametric models handle the bias-variance tradeoff?
Parametric models have higher bias (strong assumptions limit flexibility) but lower variance (fewer parameters reduce sensitivity to data fluctuations).
How do non-parametric models handle the bias-variance tradeoff?
Non-parametric models have lower bias (high flexibility) but higher variance (can overfit to training data noise without regularization).
Can parametric models be made more flexible?
Yes, via polynomial features, interaction terms, basis expansions, or regularization (Ridge/Lasso to control complexity).
Can non-parametric models be regularized?
Yes, via hyperparameters like max_depth (trees), n_neighbors (KNN), or bandwidth (kernel methods) to control model complexity.
Do non-parametric models make any assumptions?
They make fewer and weaker assumptions (e.g., smoothness, local similarity) compared to parametric models' strong distributional assumptions.
Which type is better for small datasets?
Parametric models are generally better; they're sample efficient and less prone to overfitting with limited data.
Which type is better for large, complex datasets?
Non-parametric models excel with large data, capturing complex patterns without restrictive assumptions.
Can parametric models extrapolate?
Yes, parametric models can extrapolate based on their functional form (though accuracy depends on whether the form holds outside the training range).
Can non-parametric models extrapolate?
No, non-parametric models typically don't extrapolate well; they predict based on training data neighborhoods and may give poor results outside the training range.
How does interpretability differ?
Parametric models offer clear coefficient interpretations (e.g., β₁ = effect of x₁); non-parametric models are often "black boxes" requiring post-hoc methods (SHAP, feature importance).
What is the curse of dimensionality?
In high dimensions, non-parametric models suffer because data becomes sparse; distances lose meaning, requiring exponentially more data to maintain density.
Are ensemble methods parametric or non-parametric?
Random forests and boosting are non-parametric (they aggregate flexible models); an ensemble of linear models remains parametric.
What about neural networks?
Deep neural networks are technically parametric (fixed architecture, finite weights), but behave non-parametrically in practice due to extreme flexibility and overparameterization.
How do you test distributional assumptions for parametric models?
Use residual plots, normality tests (Shapiro-Wilk, Q-Q plots), homoscedasticity tests (Breusch-Pagan), and linearity checks (partial residual plots).
What is model capacity?
The range of functions a model can represent; parametric models have limited capacity (fixed form), non-parametric models have higher capacity (grows with data).
Can you combine parametric and non-parametric approaches?
Yes, via semi-parametric models (e.g., generalized additive models), or using parametric preprocessing (feature engineering) with non-parametric estimators.
Common pitfall when choosing between them?
Using parametric models when assumptions are violated, or using non-parametric models with insufficient data (leads to overfitting and poor generalization).
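The curse-of-dimensionality answer above can be verified numerically (the uniform sampling, seed, and chosen dimensions are illustrative assumptions): as dimension grows, the nearest and farthest points from the origin become almost equally distant, so neighborhood-based non-parametric methods lose their signal.

```python
import numpy as np

rng = np.random.default_rng(7)

def distance_contrast(dim, n=1000):
    # Ratio of the nearest to the farthest point from the origin;
    # values near 1 mean distances no longer discriminate between points.
    points = rng.uniform(-1, 1, size=(n, dim))
    d = np.linalg.norm(points, axis=1)
    return d.min() / d.max()

for dim in (2, 10, 1000):
    print(f'dim={dim:5d}: min/max distance ratio = {distance_contrast(dim):.3f}')
```

In low dimensions the ratio is close to 0 (some points are genuinely "near"); in very high dimensions it approaches 1, which is why KNN-style methods need exponentially more data there.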
Python Example
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression # Parametric
from sklearn.tree import DecisionTreeRegressor # Non-parametric
from sklearn.neighbors import KNeighborsRegressor # Non-parametric
import numpy as np
# Generate nonlinear data
X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=42)
y += 0.5 * X[:, 0]**2 # Add nonlinearity
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Parametric model (Linear Regression)
lr = LinearRegression()
lr.fit(X_train, y_train)
lr_score = lr.score(X_test, y_test)
print(f'Linear Regression (Parametric) R²: {lr_score:.3f}')
# Non-parametric model (Decision Tree)
dt = DecisionTreeRegressor(max_depth=5, random_state=42)
dt.fit(X_train, y_train)
dt_score = dt.score(X_test, y_test)
print(f'Decision Tree (Non-parametric) R²: {dt_score:.3f}')
# Non-parametric model (KNN)
knn = KNeighborsRegressor(n_neighbors=10)
knn.fit(X_train, y_train)
knn_score = knn.score(X_test, y_test)
print(f'KNN (Non-parametric) R²: {knn_score:.3f}')
# Cross-validation comparison
models = [('Linear', lr), ('DecisionTree', dt), ('KNN', knn)]
for name, model in models:
    cv_scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    print(f'{name} CV R²: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}')
# Number of parameters
print(f'\nLinear Regression parameters: {lr.coef_.size + 1}') # coefficients + intercept
print(f'Decision Tree nodes: {dt.tree_.node_count}') # grows with data
print(f'KNN: stores all {len(X_train)} training samples') # lazy learner
Quiz (15)
What defines a parametric model?
What defines a non-parametric model?
Give two examples of parametric models.
Give two examples of non-parametric models.
Which type makes stronger distributional assumptions?
Which type is more sample efficient with small data?
Which type handles nonlinear relationships better without feature engineering?
Which type can extrapolate beyond training data?
Which type is more interpretable?
What is the curse of dimensionality?
How do parametric models handle bias-variance tradeoff?
How do non-parametric models handle bias-variance tradeoff?
Are decision trees parametric or non-parametric?
Is linear regression parametric or non-parametric?
What is model capacity?
Practical Checklist
Understand your data size: small datasets favor parametric models; large datasets make non-parametric models viable.
Check distributional assumptions: use residual plots and statistical tests.
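The assumption check in the item above can be sketched with a Shapiro-Wilk test on regression residuals (the synthetic data, seed, and plain least-squares fit via numpy are illustrative assumptions; a small p-value would flag non-normal residuals).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1, size=100)

# Ordinary least squares via numpy, then inspect the residuals.
A = np.column_stack([np.ones(100), X[:, 0]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ coef

# Shapiro-Wilk normality test on the residuals; here the noise really is
# Gaussian, so we expect no evidence against normality.
stat, p_value = stats.shapiro(residuals)
print(f'Estimated slope: {coef[1]:.2f}')
print(f'Shapiro-Wilk p-value: {p_value:.3f}')
```

In practice, pair this with a residuals-vs-fitted plot and a homoscedasticity test (e.g., Breusch-Pagan) before trusting a parametric fit.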