
2026 / Independent ML project

March Madness Prediction Pipeline

Top 1% Kaggle-style tournament model with leakage-safe features, calibrated ensembles, and Stage 2 submissions

ML / Sports Analytics / CatBoost / LightGBM / Forecasting / Kaggle
Tournament probability calibration curve

Overview

Built an end-to-end March Machine Learning Mania pipeline for men's and women's NCAA tournament prediction, combining point-in-time feature engineering, calibrated model sweeps, strict walk-forward validation, and final Stage 2 submission generation. The project reached a top 1% result while keeping the modeling workflow auditable and reproducible.

Problem

Tournament prediction is a small-sample, high-variance forecasting problem in which leakage is easy to introduce: seeds, ratings, injuries, and late-season signals must be used only as they were known on the prediction date. The goal was to produce accurate, well-calibrated win probabilities without letting future tournament outcomes contaminate training.

Role

Individual project - data pipeline, feature engineering, model selection, validation, submission strategy, and reporting

Timeline

2026

Tools

Python / pandas / CatBoost / LightGBM / XGBoost / scikit-learn / pytest

Data

  • Kaggle men's and women's NCAA regular-season, tournament, seed, slot, conference, coach, city, and Massey ordinal files
  • Stage 2 sample submission with 132,133 matchup rows validated against exact sample IDs
  • Point-in-time feature snapshots cached by division, season, and cutoff day
  • Optional external 2026 injury, prospect, and bracket projection snapshots for inference experiments

Approach

  • Created leakage-safe team-season and matchup features from regular-season results only, including Elo, Glicko-like ratings, rating uncertainty, ORtg, DRtg, NetRtg, pace, eFG, turnover, rebounding, free-throw, 3PA, opponent-adjusted margin, conference strength, seed priors, and Massey aggregates
  • Ran multi-family sweeps across elastic-net logistic regression, histogram gradient boosting (HistGB), CatBoost, LightGBM, XGBoost, and out-of-fold (OOF) stacking
  • Compared Platt, isotonic, and beta calibration under walk-forward validation
  • Built distinct final strategies: balanced, chalk-leaning, upset-leaning, and uncertainty-robust, then generated Stage 2 A/B submissions
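The leakage-safe rating features above can be illustrated with a minimal sketch. It assumes a plain Elo update over regular-season games and a hypothetical `cutoff_day`; the function names, the K-factor of 20, and the toy game list are illustrative assumptions, not the project's actual code.

```python
from collections import defaultdict


def elo_as_of(games, cutoff_day, k=20.0, base=1500.0):
    """Return Elo ratings using only games played on or before cutoff_day.

    games: iterable of (day_num, winner_id, loser_id) tuples for one season.
    k and base are illustrative defaults, not tuned values.
    """
    ratings = defaultdict(lambda: base)
    # Process games chronologically and stop at the first game past the cutoff,
    # so nothing from after the prediction date can influence a rating.
    for day, winner, loser in sorted(games):
        if day > cutoff_day:
            break
        expected_win = 1.0 / (1.0 + 10.0 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def win_probability(ratings, team_a, team_b, base=1500.0):
    """Elo-implied probability that team_a beats team_b."""
    ra = ratings.get(team_a, base)
    rb = ratings.get(team_b, base)
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))


# Toy season: the last game falls after the hypothetical cutoff day.
games = [
    (10, 1101, 1102),
    (20, 1102, 1103),
    (30, 1101, 1103),
    (140, 1103, 1101),  # after the cutoff: must not affect the features
]
ratings = elo_as_of(games, cutoff_day=132)
p = win_probability(ratings, 1101, 1103)
```

Filtering inside the rating function, rather than trusting callers to pre-filter, makes it harder for a post-cutoff game to slip into a cached feature snapshot.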

Evaluation

  • Strict validation folds: 2022, 2023, 2024, and 2025 with train seasons strictly earlier than the validation season
  • Primary metric: Brier score; secondary metrics: LogLoss and expected calibration error
  • Best stable Stage 2 candidate: CatBoost rating-focused model with mean Brier 0.164668, mean LogLoss 0.494439, and mean ECE 0.032421 across four folds
  • The final strategy audit reported mean Brier scores of 0.164916 for the robust strategy and 0.165139 for the balanced strategy under four-fold validation
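The fold construction and the three metrics above can be sketched in plain Python. This is a minimal illustration, assuming equal-width bins for ECE and a hypothetical list of seasons; the function names are not taken from the project.

```python
import math


def walk_forward_folds(seasons, validation_seasons):
    """Yield (train_seasons, validation_season) with training strictly earlier."""
    for val in validation_seasons:
        train = [s for s in seasons if s < val]
        yield train, val


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |observed rate - mean prediction|."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - predicted)
    return ece


# Hypothetical season list; 2020 is omitted because no tournament was played.
seasons = [2018, 2019, 2021, 2022, 2023, 2024, 2025]
folds = list(walk_forward_folds(seasons, [2022, 2023, 2024, 2025]))

# Toy outcomes and predictions, only to exercise the metric functions.
y_true = [1, 0, 1, 0]
y_prob = [0.8, 0.2, 0.6, 0.4]
scores = (
    brier_score(y_true, y_prob),
    log_loss(y_true, y_prob),
    expected_calibration_error(y_true, y_prob),
)
```

Averaging each metric over the four folds gives per-model means of the kind reported above; with so few tournament games per fold, the ECE value is sensitive to the binning choice.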

Results

  • Achieved top 1% performance with a fully reproducible prediction workflow
  • Produced final Stage 2 submissions A and B plus four strategy submissions for balanced, chalk, upset, and robust risk profiles
  • Built automated reports, figures, validation audits, submission checks, and tests for parsing, leakage guardrails, and output format

Deployment

  • One-command training and report pipeline via Python modules
  • Generated CSV submissions, model summaries, experiment leaderboards, calibration figures, and PDF reports
  • Validation checks ensure exact sample ID alignment and probability bounds before submission
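A pre-submission check of this kind might look like the following sketch. It assumes IDs of the form `Season_TeamA_TeamB` and rows held as `(id, pred)` pairs; `validate_submission` and the example IDs are hypothetical, not the project's actual checker.

```python
def validate_submission(sample_ids, submission_rows):
    """Return a list of problems; an empty list means the submission passes.

    sample_ids: IDs taken from the sample submission file.
    submission_rows: list of (id, pred) pairs from the generated submission.
    """
    problems = []
    submitted_ids = [row_id for row_id, _ in submission_rows]

    # Row count and uniqueness.
    if len(submitted_ids) != len(sample_ids):
        problems.append(
            f"row count {len(submitted_ids)} != expected {len(sample_ids)}"
        )
    if len(set(submitted_ids)) != len(submitted_ids):
        problems.append("duplicate IDs in submission")

    # Exact ID alignment: nothing missing, nothing extra.
    missing = set(sample_ids) - set(submitted_ids)
    extra = set(submitted_ids) - set(sample_ids)
    if missing:
        problems.append(f"{len(missing)} sample IDs missing from submission")
    if extra:
        problems.append(f"{len(extra)} IDs not present in sample file")

    # Probability bounds; NaN fails the comparison and is flagged as well.
    for row_id, pred in submission_rows:
        if not (isinstance(pred, (int, float)) and 0.0 <= pred <= 1.0):
            problems.append(f"prediction out of bounds for {row_id}: {pred!r}")

    return problems


# Hypothetical IDs, only to show the call shape.
sample_ids = ["2026_1101_1102", "2026_1101_1103"]
good = [("2026_1101_1102", 0.61), ("2026_1101_1103", 0.5)]
bad = [("2026_1101_1102", 1.2), ("2026_1101_1104", 0.4)]

good_problems = validate_submission(sample_ids, good)
bad_problems = validate_submission(sample_ids, bad)
```

Returning every problem instead of raising on the first one lets a report list all failures from a single run.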

Limitations

  • Tournament sample size remains limited and season-to-season variance is high
  • External injury mapping is noisy, especially for women's coverage
  • Some strategy-level metrics include additional calibration-selection layers, so they are read as directional indicators rather than as direct single-run comparisons

Evidence

Calibration curve for the balanced strategy
Calibration curve for the chalk-leaning strategy
Calibration curve for the upset-leaning strategy
Calibration curve for the robust strategy

Repro Steps

  • Install project requirements and place Kaggle competition CSVs under data/raw
  • Run python -m src.experiments.stage2_finalize --asof 2026-02-21 --budget 35 --rebuild_features
  • Validate submissions against SampleSubmissionStage2.csv and inspect outputs/reports

Next Steps

  • Add minute-level player availability priors mapped to possession-level impact
  • Expand uncertainty modeling with bootstrap and fold variance
  • Increase sweep budget with early stopping and run-time pruning
  • Add explicit monotonic constraints for seed features in LightGBM variants
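As one possible shape for the bootstrap item above, the sketch below computes a percentile bootstrap interval for the Brier score. It assumes games can be resampled independently, which ignores within-season correlation; the function name, resample count, and toy data are illustrative assumptions.

```python
import random


def bootstrap_brier_interval(y_true, y_prob, n_resamples=2000, alpha=0.05, seed=0):
    """Return (point_estimate, lower, upper) for the Brier score.

    Uses a percentile bootstrap over per-game squared errors.
    """
    rng = random.Random(seed)
    n = len(y_true)
    sq_err = [(p - y) ** 2 for y, p in zip(y_true, y_prob)]

    stats = []
    for _ in range(n_resamples):
        # Resample games with replacement and recompute the mean squared error.
        sample = [sq_err[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()

    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return sum(sq_err) / n, lower, upper


# Toy outcomes and predictions, only to show the call shape.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.55, 0.3]
point, lower, upper = bootstrap_brier_interval(y_true, y_prob)
```

Comparing the interval width with the spread of per-fold Brier scores would show whether within-fold sampling noise or season-to-season variance dominates.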