What does HackerNews think of hydrogen?
:atom: Run code interactively, inspect data, and plot. All the power of Jupyter kernels, inside your favorite text editor.
What kept me coming back to notebooks was how good they are for iterating on a problem, but I was continually frustrated by how hard it was to extract my solution, track it in git, or collaborate with colleagues. I also didn't enjoy the editor experience from a UX point of view.
I've since started using hydrogen [1], a plugin for Atom which (via a Jupyter kernel) seems to give me what I wanted from both worlds: it's just a Python file, but I get most of the notebook fun!
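To make "just a Python file" concrete, here is a minimal sketch of what such a file might look like. It assumes the `# %%` comment convention for marking cell boundaries, which Hydrogen recognizes as far as I know; the variable names and data are made up for illustration. Because the markers are ordinary comments, the file also runs top to bottom as a normal script and diffs cleanly in git.

```python
# %% Load some data (the "# %%" comments only mark cell boundaries)
values = [3, 1, 4, 1, 5, 9, 2, 6]

# %% Iterate on one step at a time, re-running just this cell while experimenting
squares = [v * v for v in values]

# %% Inspect the result
total = sum(squares)
print(total)
```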
As others have mentioned in the comments, similar workflows have existed across a number of languages and IDEs for many years, but they don't seem to have caught mainstream attention (in the sense of there being common conventions that multiple languages and tools follow).
For my own workflow, I have a debugger-like tool for IPython (https://github.com/nikitakit/xdbg) that lets me set a breakpoint anywhere in my code and resume the current Jupyter session once the breakpoint is reached, making sure that scope is properly adjusted to provide access to all local variables. When combined with text editor integration (such as https://github.com/nteract/hydrogen), this is the best I've managed to come up with in terms of minimizing the "penalty for abstraction" while maintaining interactivity.
> Pandas, one of the biggest "offenders", is trying to be an in-memory database with only one table but ends up having far fewer features and a far clunkier interface (want to do a simple map/reduce? Welcome to chaining a strange combination of '.loc', '&', and ':,' "operators").
What makes Pandas so great is that you can apply arbitrary functions to rows and columns, with the full expressivity of Python. In some cases it might be clunkier (though you should almost never need `.loc` and other indexing methods) but mostly it's just `df.groupby(...).apply(...)` or vectorized methods like `df.column + df.other_column`. This is a huge improvement over having half of your analysis in database queries and half in a programming language.
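A minimal sketch of the two patterns mentioned above, using a made-up DataFrame (the column names `group`, `x`, `y` and the helper `value_range` are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, 2.0, 3.0, 5.0],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# Vectorized arithmetic: adds the two columns element by element, no loop.
df["total"] = df["x"] + df["y"]


def value_range(values):
    """Any plain Python function works here; it receives one group's values."""
    return values.max() - values.min()


# groupby(...).apply(...): run the function once per group.
ranges = df.groupby("group")["total"].apply(value_range)
print(ranges)
```

Neither step needs `.loc`, and the per-group function is ordinary Python, so it can be as elaborate as the analysis requires.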
> Matplotlib is unintuitive and poorly documented
Try https://seaborn.pydata.org/ for statistical graphics.
> Pandas also implements its own versions of standard python objects! You need to know, and go back and forth between, two ways of doing things.
This sucks but is unavoidable: Python does not have fast data types with built-in support for missing values. Without them, all your columns would have to be of mixed type (the actual type + None), everything would slow down, and simple things like computing the mean of a column with missing values would not work.
Note that you don't actually "need to go back and forth" because Pandas will happily convert plain Python objects to their NumPy equivalents for you.
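A small sketch of the missing-value point, with invented data: a plain Python list containing `None` cannot be averaged directly, while Pandas converts the same list to a NumPy-backed float column (with `None` becoming NaN) and skips the gap by default.

```python
import pandas as pd

# A column with one missing value, as a plain Python list.
raw = [1.0, None, 3.0]

# Plain Python: adding a float and None raises TypeError.
try:
    plain_mean = sum(raw) / len(raw)
except TypeError:
    plain_mean = None

# Pandas converts the list for you: None becomes NaN, dtype becomes float64.
column = pd.Series(raw)
print(column.dtype)

# mean() ignores missing values by default (skipna=True).
pandas_mean = column.mean()
print(pandas_mean)
```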
> 3. All these libraries separate logically grouped concepts.
It's not functional, you're just going to have to deal with that. But split-apply-combine and similar patterns are quite elegant in Pandas: http://pandas.pydata.org/pandas-docs/stable/groupby.html
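For anyone unfamiliar with the pattern, here is a rough sketch that spells out the three steps by hand and then does the same thing with the one-line `groupby` form. The data and the column names `species` and `weight` are invented for the example.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog"],
    "weight": [4.0, 20.0, 5.0, 30.0],
})

# Split: one sub-DataFrame per distinct key.
pieces = {key: part for key, part in df.groupby("species")}

# Apply: compute something for each piece independently.
means = {key: part["weight"].mean() for key, part in pieces.items()}

# Combine: gather the per-group results into a single object.
manual = pd.Series(means, name="weight")

# The same three steps, written the usual way.
shortcut = df.groupby("species")["weight"].mean()
print(shortcut)
```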
> 4. Because everything is meaningless lists of numbers there are no ways to reuse code.
A lot of data analysis is throw-away code. Some of it can be abstracted into reusable code, some of it can't.
Lastly, don't forget that Python does have a lot of things going for it when it comes to data analysis, from geospatial tools (http://toblerity.org/shapely/) to Bayesian modeling (http://pymc-devs.github.io/pymc3/index.html), as well as interactive coding with Jupyter and Hydrogen for the Atom editor (https://github.com/nteract/hydrogen).
The fact that plugins are written in JavaScript (larger developer base) and that Atom allows more customization (more open plugin API) also yields more niche plugins.