From notebook to production: an MLOps checklist
A practical MLOps checklist for shipping machine learning models to production: reproducible training, deployment, monitoring, evaluation, retraining, and cost control.
A model that works in a notebook is maybe 30% of the work. The rest is making it run reliably, observably, and affordably in production — and keeping it healthy after launch. Here's the checklist I work through to get a model from prototype to dependable production.
Reproducible training
- Version data, code, and configuration so any model can be rebuilt exactly.
- Track experiments (parameters, metrics, artifacts) so results are comparable, not lost in notebooks.
- Establish a simple baseline first — you can't tell if a complex model is worth it without one.
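To make the versioning point concrete, here is a minimal sketch of a run manifest that fingerprints the training data and configuration and records the code version next to them. The names `fingerprint_bytes`, `build_run_manifest`, and `code_version` are hypothetical, and I'm assuming the caller supplies something like a git commit hash; a dedicated experiment tracker or data versioning tool would do this more thoroughly.

```python
import hashlib
import json


def fingerprint_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_run_manifest(data: bytes, config: dict, code_version: str) -> dict:
    """Tie one training run to its data, config, and code.

    code_version is assumed to be an identifier such as a git commit
    hash, passed in by the caller (hypothetical parameter).
    """
    # Sort keys so logically identical configs produce the same hash.
    config_blob = json.dumps(config, sort_keys=True).encode("utf-8")
    manifest = {
        "data_sha256": fingerprint_bytes(data),
        "config_sha256": fingerprint_bytes(config_blob),
        "code_version": code_version,
    }
    # Derive a short run ID from all three inputs together.
    manifest_blob = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["run_id"] = fingerprint_bytes(manifest_blob)[:12]
    return manifest


if __name__ == "__main__":
    manifest = build_run_manifest(
        data=b"feature,label\n1.0,0\n2.0,1\n",
        config={"learning_rate": 0.1, "max_depth": 3},
        code_version="abc123",  # placeholder, not a real commit
    )
    print(json.dumps(manifest, indent=2))
```

If any of the three inputs changes, the run ID changes, so two models with the same ID were built from the same ingredients.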
Deployment
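- Decide batch vs. real-time based on how predictions are consumed: a nightly report can be scored in batch, while a prediction shown during a user request needs a real-time endpoint.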
- Version the deployed model and keep rollback trivial.
- Separate the model artifact from the serving code so you can update either independently.
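As a rough illustration of versioned deployment with trivial rollback, the sketch below keeps an ordered history of promoted versions so that rolling back is a single call. `ModelRegistry` is a hypothetical in-memory stand-in, not the API of any real registry product, and the artifact paths are made up.

```python
from typing import Dict, List, Optional


class ModelRegistry:
    """Hypothetical in-memory registry, for illustration only."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, str] = {}  # version -> artifact location
        self._history: List[str] = []         # versions in the order they went live

    def register(self, version: str, artifact_uri: str) -> None:
        """Record where a model artifact lives, without deploying it."""
        self._artifacts[version] = artifact_uri

    def promote(self, version: str) -> None:
        """Make a registered version the live one."""
        if version not in self._artifacts:
            raise KeyError(f"unknown model version: {version}")
        self._history.append(version)

    def rollback(self) -> str:
        """Drop the live version and return the one that is live again."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def live_version(self) -> Optional[str]:
        return self._history[-1] if self._history else None

    def live_artifact(self) -> str:
        """Location the serving code should load the model from."""
        if self.live_version is None:
            raise RuntimeError("no model has been promoted yet")
        return self._artifacts[self.live_version]


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register("v1", "models/v1.bin")  # made-up paths
    registry.register("v2", "models/v2.bin")
    registry.promote("v1")
    registry.promote("v2")
    registry.rollback()
    print(registry.live_version, registry.live_artifact())
```

The serving code only ever asks for `live_artifact()`, which is what lets the model and the server change independently.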
Monitoring
Models fail silently: a degraded model keeps returning predictions without raising errors. Monitor input data drift, prediction distributions, and quality metrics where ground truth is available, plus operational signals like latency and error rates. Set alert thresholds that page a human before users notice.
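One common way to put a number on input drift is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with what the model sees live. The sketch below is a plain-Python version under simple assumptions: equal-width bins taken from the training sample, and an alert threshold of 0.2, which is a widely used rule of thumb rather than anything this checklist prescribes. `psi` and `DRIFT_ALERT_THRESHOLD` are illustrative names.

```python
import math
from typing import List, Sequence

# Assumed rule-of-thumb threshold; tune it for your own features.
DRIFT_ALERT_THRESHOLD = 0.2


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction so empty bins do not cause log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    training = [float(x) for x in range(100)]
    live = [float(x) for x in range(50, 150)]  # shifted distribution
    score = psi(training, live)
    if score > DRIFT_ALERT_THRESHOLD:
        print(f"drift alert: PSI={score:.2f}")
```

In practice the check would run on a schedule per feature, with the alert wired into whatever paging system you already use.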
Evaluation and retraining
Define an evaluation harness that runs on every candidate model, and decide the trigger for retraining — scheduled, drift-based, or performance-based. A model that can't be re-evaluated and refreshed isn't really in production; it's just deployed.
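The three retraining triggers can be expressed as one small policy check. This is a sketch under assumed thresholds: `ModelHealth`, `RetrainPolicy`, and the default values (30 days, a drift score of 0.2, a minimum metric of 0.80) are hypothetical and would come from your own evaluation harness and monitoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelHealth:
    days_since_training: int
    drift_score: float   # e.g. a drift statistic from monitoring
    live_metric: float   # quality metric where higher is better


@dataclass
class RetrainPolicy:
    # All defaults are illustrative assumptions, not recommendations.
    max_age_days: int = 30
    max_drift: float = 0.2
    min_metric: float = 0.80


def retrain_reasons(health: ModelHealth, policy: RetrainPolicy) -> List[str]:
    """Return every trigger that currently fires; empty means no retrain."""
    reasons = []
    if health.days_since_training >= policy.max_age_days:
        reasons.append("scheduled")
    if health.drift_score > policy.max_drift:
        reasons.append("drift")
    if health.live_metric < policy.min_metric:
        reasons.append("performance")
    return reasons


if __name__ == "__main__":
    health = ModelHealth(days_since_training=12, drift_score=0.31, live_metric=0.84)
    print(retrain_reasons(health, RetrainPolicy()))
```

Returning the reasons instead of a bare boolean keeps a record of why each retrain was kicked off.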
Cost control
Inference and training costs creep. Right-size compute, batch where possible, cache repeated work, and revisit whether a smaller model meets the bar. Cost is a first-class production metric, not an afterthought.
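Treating cost as a metric starts with being able to compute it. The sketch below shows two small pieces: a cost-per-1,000-predictions calculation, and caching of repeated inputs with the standard library's `functools.lru_cache`. The function names, the stand-in model, and the prices in the example are hypothetical.

```python
from functools import lru_cache
from typing import Tuple


def cost_per_1k_predictions(hourly_cost: float, predictions_per_second: float) -> float:
    """Serving cost per 1,000 predictions at a sustained throughput."""
    predictions_per_hour = predictions_per_second * 3600
    return hourly_cost / predictions_per_hour * 1000


def slow_model(features: Tuple[float, ...]) -> float:
    """Stand-in for an expensive model call (hypothetical)."""
    return sum(features) / len(features)


@lru_cache(maxsize=10_000)
def cached_predict(features: Tuple[float, ...]) -> float:
    """Serve repeated inputs from memory instead of re-running the model.

    Inputs must be hashable, hence a tuple instead of a list.
    """
    return slow_model(features)


if __name__ == "__main__":
    # Example figures are made up: a 3.60/hour instance at 100 predictions/s.
    print(cost_per_1k_predictions(hourly_cost=3.60, predictions_per_second=100))
    cached_predict((1.0, 2.0, 3.0))
    cached_predict((1.0, 2.0, 3.0))  # second call is a cache hit
    print(cached_predict.cache_info())
```

Caching like this only pays off when identical inputs recur and predictions for them may be reused; the hit rate from `cache_info()` shows whether that holds.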
None of this is glamorous, but it's the difference between a model that delivers value for years and one that quietly degrades until someone notices the numbers look wrong.