I'm a programmer (these days mostly not a data scientist), working mainly with Python. I have tried Jupyter Lab/Notebook on and off over the last 10 years, and I believe I have now firmly settled on my conclusion:

Everyone should aim to minimize the amount of work they do in Jupyter Lab / Notebook.

It shocks me a bit to find myself saying that, as it is such a beautiful piece of work. Furthermore, the people who wrote it are better software engineers than I'll ever be: the frontend, the ZeroMQ-mediated communication with the kernel, the fact that the architecture has generalized so successfully to other language kernels, its huge popularity and reach. Nevertheless, I believe I'm serious. It really comes down to just two related issues, but they're extremely important: debugging and version control.

If you're a software engineer, and not a data scientist, here's how you probably debug already, or if not then how you should debug:

- You identify a commit (or commits) on which the behavior is correct, and a commit (or commits) on which it is not.

- You experiment with fixes. Perhaps you stash them, perhaps you create experimental commits.

The critical point is that you use your version control system (probably Git) to navigate between alternative versions of the code. With a single command, you can switch the version of your code base, and the subsequent process you invoke to test your code is a fresh process, unpolluted by any state from the version of your code that you were on 30 seconds ago.

In contrast, the Jupyter notebook does not encourage this style of work at all. In practice, what you will do when trying to debug some code in Jupyter is comment out lines, temporarily delete code, add experimental lines, add experimental new cells, and so on. All of this creates a working tree, and a collection of in-memory Python objects, that is a baffling mixture of changes related to the original feature development and changes related to experimental debugging. Debugging will wear you out, as the state of your notebook gradually approaches complete incomprehensibility.

If you're a software engineer, you'll already know the benefits of being able to make precise adjustments to the state of your code with git commands. You want to learn statistics and data analysis skills from data scientists, but in doing so you should not regress to a worse style of development by starting to write much of your code in Jupyter notebooks.

And if you're a data scientist, you will want to acquire the debugging skills of software engineers. If you are not using git, you should start learning it now.

Crudely, we can imagine a 2-dimensional diagram with one axis for engineering skills and another for data science skills. Everyone wants to be in the top-right quadrant. In that quadrant, version control is used, and the version control system is used for debugging. Debugging is rather important in developing all software, whether scientific/numerical or not.

So both groups should minimize the amount of code written in the Jupyter notebook UI: instead, write code in a standard Python package, in a virtualenv, installed in editable mode with `pip install -e`. If you need a notebook for graphical display, HTML display of Pandas dataframes, an audio-playback widget, or any of the other amazing things it does so well, then fine: use `importlib.reload` in your notebook to load and reload the bulk of your code from your Python package. The notebook should contain little more than calls to plotting routines etc. that you have implemented in standard code files using your text editor/IDE. You could even aim for your notebook to contain so few lines of code that in some projects you might not bother committing it.
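As a minimal sketch of that edit-and-reload cycle (the module name `analysis` and the function `mean` are illustrative, not from any real project; in practice the module would live inside your editable-installed package rather than a temporary directory):

```python
import importlib
import pathlib
import sys
import tempfile

# Stand-in for a module in your package; in real use you would create
# this file in your editor, inside a package installed with `pip install -e`.
pkg_dir = pathlib.Path(tempfile.mkdtemp())
(pkg_dir / "analysis.py").write_text(
    "def mean(xs):\n"
    "    return sum(xs) / len(xs)\n"
)
sys.path.insert(0, str(pkg_dir))

import analysis  # in a notebook cell: import your real package instead
print(analysis.mean([1, 2, 3]))  # 2.0

# You then edit analysis.py in your editor (simulated here by rewriting
# the file), and re-run a single notebook cell containing the reload:
(pkg_dir / "analysis.py").write_text(
    "def mean(xs):\n"
    "    return sum(xs) / len(xs) if xs else 0.0\n"
)
importlib.reload(analysis)
print(analysis.mean([]))  # 0.0: the fix is picked up without restarting the kernel
```

The point of the pattern is that all substantive code lives in ordinary files under version control; the notebook cell holding `importlib.reload(analysis)` is the only bit of ceremony it adds.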

Are there projects to integrate notebooks with version control?

There are a few efforts on this front. Here are two that I know of for JupyterLab:

https://github.com/jupyterlab/jupyterlab-git

https://github.com/elyra-ai/elyra#notebook-versioning-based-...