From Jupyter Notebook to Production — What Actually Runs on the Server?


Hello World, People!

CTs are finally over! So naturally, I went back to doing what I actually enjoy — building stuff. I was working on a classification model, poking around my dataset in my sweet little Jupyter notebook, when a thought hit me out of nowhere: How does this prediction model actually exist on a server? Is the Jupyter notebook somehow running behind it? Or does someone manually rewrite every cell into a Python file, line by line?

I couldn't let it go. So I did what I always do when curiosity gets the better of me — I used Claude to explore, went down a rabbit hole, and came out the other side with some things I genuinely didn't know before. Here's what I found.

In Production, It's Neither a Notebook Nor a Raw Python Script

This was my first surprise. The thing actually running in production is not your Jupyter notebook, and it's not a plain Python script computing predictions on every request either. Here's why both fall apart:

Jupyter Notebooks are built for exploration, not deployment. They're full of graphs, experimental cells, half-baked ideas, and markdown notes to yourself. They're awkward to version (the .ipynb file mixes code with outputs and metadata, so diffs get noisy), hard to debug because cells can be run in any order, and painful to maintain when you decide to swap in a different algorithm a year later. Dumping everything into one notebook is great when you're learning, and terrible when something needs to reliably serve thousands of requests.

Raw Python scripts have a different problem. A script that trains and predicts in one go would have to refit the model, recomputing its weights, on every single request. It would also have to handle all the preprocessing logic inline and somehow guarantee that the data going in is transformed exactly the way the training data was. That last part, reproducibility, is where things quietly break in ways that are very hard to debug.
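To make that last point concrete, here is a small sketch of how preprocessing can silently drift. It assumes a single numeric feature and a made-up training set; the names `TRAIN_AGES`, `scale_with_training_stats`, and `scale_with_batch_stats` are hypothetical and only there for illustration.

```python
from statistics import mean, pstdev

# Hypothetical training data for one feature (say, age in years).
TRAIN_AGES = [20.0, 30.0, 40.0, 50.0, 60.0]

# Statistics computed once, at training time.
TRAIN_MEAN = mean(TRAIN_AGES)
TRAIN_STD = pstdev(TRAIN_AGES)


def scale_with_training_stats(value: float) -> float:
    """Scale an incoming value the same way the training data was scaled."""
    return (value - TRAIN_MEAN) / TRAIN_STD


def scale_with_batch_stats(batch: list[float]) -> list[float]:
    """Buggy variant: recomputes mean and std from whatever data arrives."""
    batch_mean, batch_std = mean(batch), pstdev(batch)
    return [(value - batch_mean) / batch_std for value in batch]


# Three requests that all sit at the high end of the training range.
request_batch = [58.0, 60.0, 62.0]

consistent = [scale_with_training_stats(v) for v in request_batch]
skewed = scale_with_batch_stats(request_batch)

print("scaled with training stats:", consistent)
print("scaled with batch stats:   ", skewed)
```

With the training statistics, the value 60 scales to roughly 1.41, which correctly says "well above average". With statistics recomputed from the incoming batch, the same 60 scales to 0.0, so the model would see an "average" input. Nothing crashes, which is exactly why this kind of bug is hard to spot.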

So What Do We Actually Use? — A Model Artifact

Once notebooks and raw scripts are off the table, the answer becomes cleaner: you train your model once, and then you save it. What you save is called a model artifact.

What Is a Model Artifact?

A model artifact is essentially a frozen snapshot of your entire model pipeline. It stores everything the model needs to make predictions later — the algorithm configuration, the learned weights, scaling parameters like means and standard deviations, and any other stateful information from training.
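As a rough picture of what "everything the model needs" can mean, here is a minimal sketch of an artifact as a plain Python object. The `ModelArtifact` class, its field names, and the weights are all hypothetical stand-ins; a real library such as scikit-learn keeps this state inside its own fitted objects.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ModelArtifact:
    """Hypothetical container for the state a trained pipeline needs later."""
    algorithm: str             # which kind of model this state belongs to
    feature_names: list[str]   # expected input columns, in order
    means: list[float]         # per-feature mean seen during training
    stds: list[float]          # per-feature standard deviation seen during training
    weights: list[float]       # learned coefficients
    bias: float                # learned intercept


def build_artifact(rows, feature_names, weights, bias) -> ModelArtifact:
    """Bundle scaling statistics and learned parameters into one object."""
    columns = list(zip(*rows))  # turn rows into per-feature columns
    return ModelArtifact(
        algorithm="logistic_regression",
        feature_names=list(feature_names),
        means=[mean(col) for col in columns],
        stds=[pstdev(col) for col in columns],
        weights=list(weights),
        bias=bias,
    )


# Made-up training rows; the weights stand in for the output of a real fit.
training_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
artifact = build_artifact(training_rows, ["x1", "x2"], weights=[0.8, -0.3], bias=0.1)
print(artifact)
```

The point of the sketch is that the scaling statistics travel together with the weights, so whoever loads the artifact later never has to recompute them.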

Common formats you'll see:

  • .pkl — Python's Pickle format, quick and easy for most sklearn models

  • .joblib — Similar to Pickle but more efficient for large numpy arrays

  • .onnx — Open Neural Network Exchange, framework-agnostic and great for cross-platform use

The key idea: train once, save the state, load it anywhere. No recomputing, no guessing — the artifact carries everything the model learned.
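Here is a minimal sketch of the train once, save, load cycle using only the standard library's `pickle` module. The artifact is a plain dict with made-up values standing in for a real training run, and `save_artifact`, `load_artifact`, and `predict_proba` are hypothetical helper names. With joblib the shape is the same, using `joblib.dump` and `joblib.load` instead.

```python
import math
import pickle
import tempfile
from pathlib import Path

# Pretend this dict is the result of a training run (all values are made up).
trained = {
    "means": [5.8, 3.0],
    "stds": [0.8, 0.4],
    "weights": [1.2, -0.7],
    "bias": 0.05,
}


def predict_proba(artifact: dict, features: list[float]) -> float:
    """Scale the inputs with the stored statistics, then apply a logistic model."""
    scaled = [
        (x - m) / s
        for x, m, s in zip(features, artifact["means"], artifact["stds"])
    ]
    z = sum(w * x for w, x in zip(artifact["weights"], scaled)) + artifact["bias"]
    return 1.0 / (1.0 + math.exp(-z))


def save_artifact(artifact: dict, path: Path) -> None:
    with path.open("wb") as f:
        pickle.dump(artifact, f)


def load_artifact(path: Path) -> dict:
    # Only unpickle files you trust: pickle can execute code while loading.
    with path.open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    artifact_path = Path(tmp_dir) / "model.pkl"
    save_artifact(trained, artifact_path)    # done once, after training
    restored = load_artifact(artifact_path)  # done by the serving process

sample = [6.1, 2.8]
print("same prediction after reload:",
      predict_proba(trained, sample) == predict_proba(restored, sample))
```

The serving side never sees the training data. It only needs the file and the code that knows how to use it.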

Wrapping It All Together with FastAPI

Here's where it clicks into a real system. The work is split across a set of dedicated Python modules, each with a single, clear responsibility: one for training (which produces the artifact), one for preprocessing, and one for prediction (which uses the artifact). The service loads the artifact once at startup, and FastAPI sits on top and exposes the modules as clean API endpoints. So the flow looks like this:

Request comes in → FastAPI receives it → calls the preprocessing module → passes clean data to the prediction module → the prediction module runs the already-loaded artifact → returns a prediction.

No giant notebook, no recomputing weights, no spaghetti logic. Just clean separation of concerns and a model artifact doing the heavy lifting.

This whole thing made me want to actually build it — not just read about it. So I did. I started putting together a proper model serving system using FastAPI, leaning heavily on AI to understand concepts I'd never touched before. I haven't figured everything out yet, but that's kind of the point of writing this — to share what I understand as I understand it.

What's Coming Next

Building this system threw two new concepts at me that I hadn't really thought about before, and both turned out to be more interesting than I expected:

SHAP — how do you actually explain what your model is doing? Not just "it predicted X" but why it predicted X.

Stateless vs. Stateful & Training-Serving Skew — how do you make sure the data your model sees in production looks exactly the same as the data it trained on? (Spoiler: this is sneakily one of the most important problems in ML engineering.)

Those are next. Stay tuned.

— to be continued