What does HackerNews think of nbstripout?

strip output from Jupyter and IPython notebooks

Language: Python

#37 in Hacktoberfest

nbstripout[0] does that and installs a pre-commit hook.

[0] -- https://github.com/kynan/nbstripout

I used something as a pre-commit hook in the past that removed plots and other rendered content and kept only text and code in the git index. I'm almost sure it was https://github.com/kynan/nbstripout but it's been a while and I could be wrong.

Once the hook was in place git diff worked well enough to not need any other diffing tool.

You can use it with source control, I do it for about 18 notebooks I use on a daily basis:

https://github.com/kynan/nbstripout

For .ipynb notebooks, I highly recommend using nbstripout [0] to strip the Jupyter output before committing the notebooks to the repository (thus making the diffs sane).

You can also set it up as a 'filter', so it automatically runs before any git operations, whether it's add, commit, diff or an interactive rebase.

[0] https://github.com/kynan/nbstripout

I recommend nbstripout https://github.com/kynan/nbstripout

It eases most of the pain regarding version control. You can use it as a 'git filter', so only inputs would be shown in diffs and committed (and also works with interactive adding!), while keeping outputs in your working tree.
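
To make the filter idea concrete, here is a minimal sketch of what an output-stripping "clean" step could look like. It is not nbstripout's actual implementation. The function names strip_outputs and clean_filter are made up for illustration, and the only assumption about the file format is the standard notebook JSON layout (a "cells" list whose code cells carry "outputs" and "execution_count").

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts on code cells (hypothetical helper)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop plots, tables, printed text
            cell["execution_count"] = None  # drop the run counter
    return notebook


def clean_filter(raw_text: str) -> str:
    """Notebook JSON text in, stripped notebook JSON text out.

    A git clean filter receives the working-tree file on stdin and writes
    the version to be staged on stdout; this function is the pure core of
    such a filter, without the stdin/stdout plumbing.
    """
    notebook = json.loads(raw_text)
    return json.dumps(strip_outputs(notebook), indent=1, sort_keys=True) + "\n"
```

Since git stages only what the filter returns, the file in the working tree keeps its outputs while the index and diffs see inputs only.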

You bet. I built ReviewNB[1] specifically for Jupyter Notebook code reviews.

There's also,

- nbstripout[2] for stripping outputs automatically before every commit

- nbdime[3] for diff'ing notebooks locally

- jupytext[4] for converting notebooks to markdown and vice versa

[1] https://www.reviewnb.com/

[2] https://github.com/kynan/nbstripout

[3] https://github.com/jupyter/nbdime

[4] https://github.com/mwouts/jupytext

nbstripout [1] is my favorite tool for this. Installing it in your Git repo is 2 lines:

$ pip install --upgrade nbstripout # install nbstripout bin

$ nbstripout --install # install Git hook in current repo

Then, any .ipynb files that you check in will have their output stripped in the index (without affecting your working copy).

(Surprised it's not mentioned in the article.)

[1] https://github.com/kynan/nbstripout

Not OP, but I can recommend the handy https://github.com/kynan/nbstripout, which acts as a git filter that makes version control ignore cell outputs.

With that approach, though, the notebooks are clean but still fairly poor for evaluating diffs between versions. If code review / diffs are more important than preserving the notebook, then you could use a post-save hook to convert notebook input to a .py file and output to .html:

https://towardsdatascience.com/version-control-for-jupyter-n...
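
A rough sketch of the .py half of that idea, not taken from the linked article: notebook_to_script is a hypothetical helper that concatenates code-cell sources, and post_save uses the (model, os_path, contents_manager) signature that Jupyter's file save hooks are documented to receive. Registering it via c.FileContentsManager.post_save_hook in the Jupyter config file is an assumption about your setup, and the .html export is left out.

```python
import json
from pathlib import Path


def notebook_to_script(notebook: dict) -> str:
    """Join the source of every code cell into one script (hypothetical helper)."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def post_save(model, os_path, contents_manager):
    """Write a .py file next to the notebook each time it is saved."""
    if model.get("type") != "notebook":
        return  # only act on notebooks, not plain files or directories
    notebook = json.loads(Path(os_path).read_text(encoding="utf-8"))
    script_path = Path(os_path).with_suffix(".py")
    script_path.write_text(notebook_to_script(notebook), encoding="utf-8")


# Assumed registration, placed in the Jupyter config file:
# c.FileContentsManager.post_save_hook = post_save
```

The generated .py file then diffs like any other source file, at the cost of keeping two files in sync.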

Well, good question. The file format for Jupyter is not ideal for 'code craftsmanship', as pointed out by another comment. There are utilities to strip out some of the metadata from the Jupyter files, such as rendered output and run counters, but that is a trade-off to be decided by your team:

https://github.com/kynan/nbstripout

Here is a pip package called "nbstripout" which tells git to ignore notebook output: https://github.com/kynan/nbstripout. It can really help establish good practice in a project with little effort:

    pip install --upgrade nbstripout
    nbstripout --install

Use nbstripout[0] as a git filter. Then you have seamless git control of notebooks. There's talk of automatically saving separate code-only versions in future releases.

[0] https://github.com/kynan/nbstripout