2026 / Independent ML project

March Madness Prediction Pipeline

Top 1% Kaggle-style tournament model with leakage-safe features, calibrated ensembles, and Stage 2 submissions

ML / Sports Analytics / CatBoost / LightGBM / Forecasting / Kaggle

Overview

Built an end-to-end March Machine Learning Mania pipeline for men's and women's NCAA tournament prediction, combining point-in-time feature engineering, calibrated model sweeps, strict walk-forward validation, and final Stage 2 submission generation. The project reached a top 1% result while keeping the modeling workflow auditable and reproducible.

Problem

Tournament prediction is a small-sample, high-variance forecasting problem where leakage is easy to introduce: seeds, ratings, injuries, and late-season signals may be used only as they were known on the prediction date. The goal was to produce accurate, well-calibrated win probabilities without letting future tournament outcomes contaminate training.

Role

Individual project - data pipeline, feature engineering, model selection, validation, submission strategy, and reporting

Timeline

2026

Tools

Python / pandas / CatBoost / LightGBM / XGBoost / scikit-learn / pytest

Data

Kaggle men's and women's NCAA regular-season, tournament, seed, slot, conference, coach, city, and Massey ordinal files
Stage 2 sample submission with 132,133 matchup rows validated against exact sample IDs
Point-in-time feature snapshots cached by division, season, and cutoff day
Optional external 2026 injury, prospect, and bracket projection snapshots for inference experiments
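
A point-in-time snapshot keyed by division, season, and cutoff day could look roughly like the sketch below. It is illustrative only: the Game record, the toy GAMES rows, and the snapshot function are hypothetical names rather than the project's code, and an in-memory lru_cache stands in for whatever cache the pipeline actually uses.

```python
from functools import lru_cache
from typing import NamedTuple


class Game(NamedTuple):
    division: str  # "M" or "W"
    season: int
    day_num: int   # day offset within the season
    w_team: int
    l_team: int
    w_score: int
    l_score: int


# Toy rows standing in for the regular-season results files (made-up values).
GAMES = (
    Game("M", 2025, 10, 1101, 1102, 70, 60),
    Game("M", 2025, 50, 1102, 1103, 65, 64),
    Game("M", 2025, 120, 1101, 1103, 80, 75),
    Game("W", 2025, 40, 3101, 3102, 55, 50),
)


@lru_cache(maxsize=None)
def snapshot(division, season, cutoff_day):
    """Team aggregates built only from games played on or before cutoff_day."""
    totals = {}
    for g in GAMES:
        # Skip other divisions/seasons and anything after the cutoff (the leakage guard).
        if (g.division, g.season) != (division, season) or g.day_num > cutoff_day:
            continue
        margin = g.w_score - g.l_score
        for team, won, signed in ((g.w_team, 1, margin), (g.l_team, 0, -margin)):
            n, wins, margin_sum = totals.get(team, (0, 0, 0))
            totals[team] = (n + 1, wins + won, margin_sum + signed)
    return {
        team: {"games": n, "wins": wins, "mean_margin": margin_sum / n}
        for team, (n, wins, margin_sum) in totals.items()
    }
```

Calling snapshot("M", 2025, 60) sees only the first two men's games, while snapshot("M", 2025, 132) also includes the day-120 game. Repeated calls with the same key return the cached result.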

Approach

Created leakage-safe team-season and matchup features from regular-season results only, including Elo, Glicko-like ratings, rating uncertainty, ORtg, DRtg, NetRtg, pace, eFG, turnover, rebounding, free-throw, 3PA, opponent-adjusted margin, conference strength, seed priors, and Massey aggregates
Ran multi-family sweeps across logistic elastic net, histogram gradient boosting (HistGB), CatBoost, LightGBM, XGBoost, and out-of-fold (OOF) stacking
Compared Platt, isotonic, and beta calibration under walk-forward validation
Built distinct final strategies: balanced, chalk-leaning, upset-leaning, and uncertainty-robust, then generated Stage 2 A/B submissions
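
As one illustration of a rating feature built from regular-season results only, here is a minimal Elo sketch. The K-factor of 20, the 1500 starting rating, and the function names are assumptions made for this example; the project's actual rating code (margin handling, home-court adjustment, Glicko-like uncertainty) is not shown on this page.

```python
def expected_score(r_a, r_b):
    """Standard Elo expectation that a team rated r_a beats a team rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def run_elo(games, cutoff_day, k=20.0, base=1500.0):
    """Replay (day, winner, loser) tuples in day order, ignoring games after cutoff_day."""
    ratings = {}
    for day, winner, loser in sorted(games):
        if day > cutoff_day:
            break  # games are sorted by day, so everything later is also past the cutoff
        r_w = ratings.get(winner, base)
        r_l = ratings.get(loser, base)
        # The winner gains what the loser gives up, scaled by how surprising the win was.
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings


# Toy schedule: the day-140 game falls after the cutoff and is never seen.
games = [(10, 1, 2), (20, 2, 3), (140, 3, 1)]
ratings = run_elo(games, cutoff_day=132)
```

A matchup feature would then be a difference such as ratings[a] - ratings[b], computed from the snapshot taken at the cutoff day.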

Evaluation

Strict validation folds: 2022, 2023, 2024, and 2025, each trained only on seasons strictly earlier than the validation season
Primary metric: Brier score; secondary metrics: LogLoss and expected calibration error
Best stable Stage 2 candidate: CatBoost rating-focused model with mean Brier 0.164668, mean LogLoss 0.494439, and mean ECE 0.032421 across four folds
Final strategy audit reported robust strategy mean Brier 0.164916 and balanced strategy mean Brier 0.165139 under four-fold validation
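
The fold rule and the three metrics above can be sketched in plain Python as follows. This is a minimal illustration rather than the project's evaluation code: the function names are hypothetical, and the ECE variant shown (10 equal-width bins, weighted by bin size) is an assumption, since the page does not state the binning used.

```python
import math


def walk_forward_folds(seasons, val_seasons):
    """Yield (train_seasons, val_season) with every train season strictly earlier."""
    for val in val_seasons:
        yield [s for s in seasons if s < val], val


def brier(y, p):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def log_loss(y, p, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)


def ece(y, p, n_bins=10):
    """Expected calibration error over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        bins[min(int(pi * n_bins), n_bins - 1)].append((yi, pi))
    err = 0.0
    for b in bins:
        if b:
            observed = sum(yi for yi, _ in b) / len(b)
            predicted = sum(pi for _, pi in b) / len(b)
            err += len(b) / len(y) * abs(observed - predicted)
    return err


folds = list(walk_forward_folds(list(range(2015, 2026)), [2022, 2023, 2024, 2025]))
```

Each fold's metrics are computed on that validation season alone and then averaged, which is how a "mean Brier across four folds" figure would be formed under this setup.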

Results

Achieved top 1% performance with a fully reproducible prediction workflow
Produced final Stage 2 submissions A and B plus four strategy submissions for balanced, chalk, upset, and robust risk profiles
Built automated reports, figures, validation audits, submission checks, and tests for parsing, leakage guardrails, and output format

Deployment

One-command training and report pipeline via Python modules
Generated CSV submissions, model summaries, experiment leaderboards, calibration figures, and PDF reports
Validation checks ensure exact sample ID alignment and probability bounds before submission
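
A pre-submission check of this kind might look like the sketch below. It assumes the usual ID and Pred column names of the Kaggle sample file and takes CSV text directly so the example stays self-contained; validate_submission is a hypothetical name, not the project's function.

```python
import csv
import io
import math


def validate_submission(sample_csv, submission_csv):
    """Return a list of problems; an empty list means the submission passed."""
    sample_ids = [row.get("ID") for row in csv.DictReader(io.StringIO(sample_csv))]
    rows = list(csv.DictReader(io.StringIO(submission_csv)))
    errors = []

    # IDs must match the sample exactly: same values, same order, same count.
    if [row.get("ID") for row in rows] != sample_ids:
        errors.append("ID column does not match the sample file row for row")

    # Every prediction must parse as a finite probability in [0, 1].
    for row in rows:
        try:
            pred = float(row.get("Pred"))
        except (TypeError, ValueError):
            errors.append(f"{row.get('ID')}: Pred is missing or not numeric")
            continue
        if not (math.isfinite(pred) and 0.0 <= pred <= 1.0):
            errors.append(f"{row.get('ID')}: Pred {pred} is outside [0, 1]")
    return errors


sample = "ID,Pred\n2026_1101_1102,0.5\n2026_1101_1103,0.5\n"
good = "ID,Pred\n2026_1101_1102,0.61\n2026_1101_1103,0.40\n"
bad = "ID,Pred\n2026_1101_1102,1.2\n2026_1101_1104,0.40\n"
```

Here validate_submission(sample, good) returns an empty list, while the bad text is flagged both for the mismatched ID and for the out-of-range prediction.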

Limitations

Tournament sample size remains limited and season-to-season variance is high
External injury mapping is noisy, especially for women's coverage
Some strategy-level metrics include additional calibration-selection layers, so they are read as directional indicators rather than compared directly against single-run model metrics

Evidence

Final strategy metrics and comparison

Model-family results across experiment sweeps

Calibration curve for the balanced strategy

Sensitivity analysis for injury signal experiments

Repro Steps

Install project requirements and place Kaggle competition CSVs under data/raw
Run python -m src.experiments.stage2_finalize --asof 2026-02-21 --budget 35 --rebuild_features
Validate submissions against SampleSubmissionStage2.csv and inspect outputs/reports

Next Steps

Add minute-level player availability priors mapped to possession-level impact
Expand uncertainty modeling with bootstrap and fold variance
Increase sweep budget with early stopping and run-time pruning
Add explicit monotonic constraints for seed features in LightGBM variants
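
For the monotonic-constraint item, LightGBM accepts a monotone_constraints parameter: one integer per feature, in feature order, with 1 for increasing, -1 for decreasing, and 0 for unconstrained. The sketch below only builds such a parameter dictionary. The feature names and the sign convention are hypothetical: seed_diff is taken as team A's seed minus team B's seed, with the target being a win for team A.

```python
def monotone_constraints(feature_names, directions):
    """Map each feature to 1, -1, or 0 (unconstrained) in feature order."""
    return [directions.get(name, 0) for name in feature_names]


# Hypothetical matchup features; the real feature set is much larger.
FEATURES = ["seed_diff", "elo_diff", "pace_diff"]

# A larger seed number means a weaker team, so P(team A wins) should not rise
# as seed_diff grows; a larger Elo gap in A's favor should not lower it.
DIRECTIONS = {"seed_diff": -1, "elo_diff": 1}

params = {
    "objective": "binary",
    "monotone_constraints": monotone_constraints(FEATURES, DIRECTIONS),
}
# The dict could then be passed on, for example lightgbm.LGBMClassifier(**params),
# with training columns ordered exactly as in FEATURES.
```

The constraint list depends on column order, so building it from the feature-name list avoids a silent mismatch if features are added or reordered.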

View repository ->