I work with a bunch of 'data scientists' / 'strategists' and the like who love their notebooks but it's a pain to convert their code into an application!
In particular:
* Notebooks store code and data together, which is very messy if you want to look at [only] code history in git.
* It's hard to turn a notebook into an assertive test.
* Converting a notebook function into a python module basically involves cutting and pasting from the notebook into a .py file.
These must be common issues for anyone working in this area. Are there any guides on best practices for bridging from notebooks to applications?
Ideally I'd want to build a python application that's managed via git, but some modules / functions are lifted exactly from notebooks.
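For what it's worth, here is a rough sketch of what "lifting" code out of a notebook can look like, using only the standard library. It assumes the usual nbformat 4 JSON layout (a top-level `cells` list whose entries have `cell_type`, `source` and, for code cells, `outputs`). The function names `strip_outputs` and `code_cells_to_source` are made up for illustration, and the magic-line filter is a crude heuristic. Existing tools such as nbstripout, jupytext, or `jupyter nbconvert --to script` handle this more robustly.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of the notebook with outputs and execution counts removed.

    Committing the stripped copy keeps git diffs focused on code, not data.
    """
    cleaned = json.loads(json.dumps(notebook))  # deep copy via JSON round-trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def code_cells_to_source(notebook: dict) -> str:
    """Join the code cells into one string of Python source."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        # Heuristic: drop IPython magics (%...) and shell escapes (!...),
        # which are not valid Python outside a notebook.
        lines = [
            line for line in text.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        if any(line.strip() for line in lines):
            chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook standing in for a real .ipynb file
# (for a real file: notebook = json.load(open(path, encoding="utf-8"))).
demo_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Scratch work"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["%matplotlib inline\n", "def add(a, b):\n", "    return a + b\n"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        },
    ],
}

if __name__ == "__main__":
    print(code_cells_to_source(demo_notebook))
```

Once the functions live in a plain `.py` file, an ordinary test with assertions can import them, which is the part that is awkward to do against the notebook itself.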
- We mostly use notebooks as scratchpads or alpha prototypes.
- Papermill is a great tool when setting up a scheduled notebook and then shipping the output to S3: https://papermill.readthedocs.io/en/latest/
- When turning notebooks into more user-facing prototypes, I've found Streamlit is excellent and easy to use. Some of these prototypes have stuck around as Streamlit apps when there are 1-3 users who need to use them regularly.
- Moving to full-blown apps is much tougher and more time-consuming.