Cross-Validation
A technique to estimate how well an ML model will generalise to unseen data.
What is Cross-Validation?
A single train/test split gives you only one estimate of model performance. Cross-validation splits the data several different ways and scores the model on each split, so you get several estimates and a more reliable picture of how the model performs.
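The classic approach is **K-fold cross-validation**: split the data into K folds of roughly equal size, train on K-1 of them, and validate on the held-out fold. Repeat K times so that each fold serves as the validation set exactly once, then average the K scores. K=5 or K=10 are typical. For time-series data, use **time-series cross-validation** instead: every training set contains only observations that come before its validation set, so the model is never trained on the future.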
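To make the two splitting schemes concrete, here is a minimal sketch in plain Python. The function names `k_fold_indices` and `expanding_window_indices`, the `seed` parameter, and the expanding-window layout are assumptions made for this illustration, not part of any particular library. In practice a library such as scikit-learn provides equivalent splitters.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation.

    Hypothetical helper for illustration: shuffles the indices once,
    then rotates the held-out fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def expanding_window_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs that respect temporal order.

    Hypothetical helper for illustration: the training window grows with
    each split and always ends before the validation window begins.
    Assumes the samples are already sorted by time.
    """
    if not 1 <= n_splits < n_samples:
        raise ValueError("n_splits must be between 1 and n_samples - 1")
    val_size = n_samples // (n_splits + 1)
    first_val_start = n_samples - n_splits * val_size
    for val_start in range(first_val_start, n_samples, val_size):
        train_idx = list(range(val_start))
        val_idx = list(range(val_start, val_start + val_size))
        yield train_idx, val_idx


if __name__ == "__main__":
    for train_idx, val_idx in k_fold_indices(10, k=5):
        print("K-fold      train:", sorted(train_idx), "val:", sorted(val_idx))
    for train_idx, val_idx in expanding_window_indices(10, n_splits=3):
        print("time-series train:", train_idx, "val:", val_idx)
```

With 10 samples and 3 time-series splits, the first pair trains on indices 0 to 3 and validates on 4 and 5. Each later split extends the training window by two samples. The K-fold version shuffles, which is appropriate only when the rows are not ordered in time.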
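Why this matters: a single train/test split can give a misleadingly good or bad score by luck. Cross-validation reveals the variability across folds. A model with mean accuracy of 82% and a standard deviation of 2% is far more dependable than one with the same 82% mean and a standard deviation of 15%, because the second model's score depends heavily on which data it happened to see.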
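The mean-versus-spread comparison can be sketched with Python's standard library. The fold scores below are invented purely for illustration and do not come from any real model. The helper name `summarise` is likewise an assumption.

```python
from statistics import mean, stdev

# Hypothetical per-fold accuracies, invented for illustration only.
stable_scores = [0.80, 0.82, 0.84, 0.81, 0.83]
unstable_scores = [0.62, 0.97, 0.82, 0.70, 0.99]


def summarise(scores):
    """Return (mean, sample standard deviation) of the per-fold scores."""
    return mean(scores), stdev(scores)


stable_mean, stable_std = summarise(stable_scores)
unstable_mean, unstable_std = summarise(unstable_scores)

print(f"stable:   mean={stable_mean:.2f} std={stable_std:.3f}")
print(f"unstable: mean={unstable_mean:.2f} std={unstable_std:.3f}")
```

Both lists average to the same accuracy, yet the second varies far more from fold to fold. Reporting the standard deviation alongside the mean is what exposes that difference.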
Cross-validation is one of the steps junior data scientists most often skip, and one that senior interviewers routinely probe. Using it correctly signals real ML maturity.
A Hyderabad credit-risk team caught overfitting in their model only because cross-validation revealed accuracy varied from 71% to 86% across folds — a sign the model was unstable. They added regularisation; the variance dropped; deployment succeeded.
