ML in Production¶

Taking a model from notebook to production. Covers model serialization, serving, monitoring, and the operational concerns that separate prototypes from products.

Model Serialization¶

# Pickle (sklearn, catboost)
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

# Joblib (better for large numpy arrays)
import joblib
joblib.dump(model, 'model.joblib')
model = joblib.load('model.joblib')

# PyTorch
torch.save(model.state_dict(), 'model.pth')
model.load_state_dict(torch.load('model.pth'))

# ONNX (framework-agnostic)
import torch.onnx
torch.onnx.export(model, dummy_input, 'model.onnx')

Serving Models¶

Flask/FastAPI¶

from fastapi import FastAPI
import pickle

app = FastAPI()
model = pickle.load(open('model.pkl', 'rb'))

@app.post("/predict")
def predict(features: dict):
    X = preprocess(features)
    prediction = model.predict(X)
    return {"prediction": prediction.tolist()}

Batch Prediction¶

For non-real-time use cases: run predictions on schedule, store results in database.

# Batch scoring pipeline
predictions = model.predict(batch_features)
df['prediction'] = predictions
df.to_parquet('predictions.parquet')

Monitoring¶

Data Drift¶

Features in production diverge from training distribution.

Compare feature distributions between training and production data
Monitor statistical tests (KS test, PSI) for drift detection
Alert when drift exceeds threshold

Model Degradation¶

Performance declines over time as world changes.

Monitor prediction distribution shifts
Track business metrics correlated with model output
Set up retraining triggers (scheduled or drift-based)

Logging¶

Log every prediction with features, timestamp, and model version for debugging and retraining.

A/B Testing in Production¶

Route fraction of traffic to new model
Compare business metrics between control (old) and test (new)
Use statistical tests to confirm improvement
Gradually increase traffic to winner
Monitor for regression after full rollout

Pipeline Automation¶

Feature pipelines: automated feature computation and storage
Training pipelines: scheduled retraining with latest data
Validation gates: automated checks before deployment
Rollback: ability to quickly revert to previous model version

Gotchas¶

Pickle files are not secure - don't load untrusted pickles
Model + preprocessing must be versioned together (scaler mismatch = wrong predictions)
Batch prediction is simpler and sufficient for most use cases - don't build real-time serving unless needed
Data drift doesn't always mean model degradation - investigate before retraining
Hardware requirements: CPU usually sufficient for inference; GPU only for large neural networks