My Workflow with Jupytext

I will take the example of a typical day at work. Today, I have to answer a new question about our data and algorithms. It turns out that I already answered a similar question in the past. So, to begin, I look for the existing notebook that comes closest to answering today's question.

Since I use Jupytext, all my .ipynb notebooks have a paired .py representation. So I open PyCharm and use the Find in Path search window to identify, among my collection of .py notebooks, the one that can get me started on today's question.

Let me add two things. First, the search experience is much improved when you restrict the search to *.py files and add .ipynb_checkpoints to Ignore Files and Folders in PyCharm Settings/Editor/File Types. Second, if you want to pair all the notebooks in the current directory with percent scripts, you can simply run jupytext --set-formats ipynb,py:percent *.ipynb.
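Outside the IDE, the same restricted search can be approximated with a few lines of Python. The sketch below is only an illustration: the function name find_in_notebooks is made up for this example, and it simply scans *.py files under a folder while skipping anything stored inside .ipynb_checkpoints.

```python
from pathlib import Path


def find_in_notebooks(root, needle):
    """Yield (path, line number, line) for each line of a .py file containing `needle`.

    Hypothetical helper for illustration; not part of Jupytext or PyCharm.
    """
    for path in sorted(Path(root).rglob("*.py")):
        # Skip Jupyter's checkpoint copies, mirroring the PyCharm ignore setting
        if ".ipynb_checkpoints" in path.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()
```

Calling list(find_in_notebooks(".", "read_csv")), where read_csv is just an example search term, would list every matching line in the .py notebooks of the current directory.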

Now I open the existing notebook, which will serve as a template. I copy an extract of its content (the part that I want to start with) to a new .py file. I take care to include the YAML header, because that is where the Jupytext pairing information and the notebook kernel are defined. Then I adjust the new .py notebook to today's question. In a Markdown cell (delimited with # %% [markdown]), I write a few words about what I want to do today. Then I adjust the code to better address the current question. Doing this in an IDE is more comfortable than in the notebook. It is also safer and faster, as I benefit from the IDE's syntax checks and highlighting.
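For readers who have not seen the percent format, a new .py notebook might start like the sketch below. The header is the commented YAML block at the top. The kernel name python3, the title, and the toy data are placeholders chosen for this illustration, and a header written by Jupytext itself usually carries a few more fields.

```python
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# %% [markdown]
# # Today's question
#
# A few words about what I want to find out today.

# %%
# A code cell: toy data standing in for the real analysis
values = [3, 1, 4, 1, 5]
mean = sum(values) / len(values)

# %%
# A second code cell that displays the result
print(f"mean = {mean:.2f}")
```

Because every cell marker and the header are plain comments, the file remains an ordinary Python script that any editor or linter can handle.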

When my draft is good enough, I open the .py file in Jupyter as a notebook (single click in Jupyter Notebook; in JupyterLab, right-click on the file and choose Open With/Notebook). At this stage, it has no outputs, so I run it. Of course, the new notebook will probably not work well on the first run, so I continue editing it in Jupyter until it runs properly.

When I save the notebook in Jupyter, the .py file gets updated to match the latest contents. An .ipynb file is also created, with outputs included, because the Jupytext header lists ipynb in this line: formats: ipynb,py:percent. If I forgot to copy the header, I can use the Jupytext menu in Jupyter and select Pair notebook with .ipynb document to activate the pairing with an .ipynb notebook.
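Because the pairing depends on this header, a quick check on the .py file can tell whether it was copied along. The sketch below is a deliberately simple illustration and not how Jupytext itself reads metadata: the function name paired_formats is hypothetical, and it only looks for a formats line inside a commented header that starts on the first line of the script.

```python
def paired_formats(script_text):
    """Return the value of the `formats` entry in the commented YAML header, or None.

    Hypothetical helper for illustration; Jupytext has its own, more complete parser.
    """
    lines = script_text.splitlines()
    # The header must open with a commented YAML delimiter on the first line
    if not lines or lines[0].strip() != "# ---":
        return None
    for line in lines[1:]:
        content = line.strip()
        if content == "# ---":
            break  # end of the header, no formats entry found
        if not content.startswith("#"):
            break  # not a comment line, so the header is malformed or absent
        entry = content.lstrip("#").strip()
        if entry.startswith("formats:"):
            return entry[len("formats:"):].strip()
    return None
```

A result of None suggests that the header was not copied and that the pairing still has to be set up, for instance from the Jupytext menu.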

Now I am done. Usually, I will share the .ipynb file (using, for instance, Jupyter nbviewer), and version the .py file.

What do I most like about this workflow?

Searching among notebooks is super easy — they are just text files.

Drafting a new notebook is so comfortable. Never before was I able to copy-paste multiple cells from different notebooks so easily.

When I edit the .py notebook in PyCharm, I benefit from the advanced capabilities of the IDE: syntax checks, completion, reformatting, documentation tips... And I am also free to edit the notebook in Jupyter.

Jupytext solves the issue of version control. Usually, I don't want to keep the notebook outputs in git, so I only version the .py file, which has a clean diff history.

Note that I use PyCharm Professional here only because it is my favorite IDE. You can use any other editor, and the workflow will be the same.

What I am less fond of: