What does HackerNews think of ploomber?
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
And you can run Jupyter notebooks from the CLI with Ploomber: https://github.com/ploomber/ploomber
We have another project to cover the orchestration/pipelines aspect: https://github.com/ploomber/ploomber and we have plans to work on the rest of the features. For now, we're focusing on those two.
We just released debuglater (https://github.com/ploomber/debuglater), an open-source library that serializes a Python traceback object for later debugging.
You can see a quick video demo here: https://github.com/ploomber/debuglater/blob/master/README.md
Countless times, we've scheduled overnight jobs to find out the following day that they failed. While logs are helpful, they are often insufficient for debugging. debuglater allows you to store the traceback object so you can start a debugging session at any moment.
We built this to support our open-source framework for data scientists (https://github.com/ploomber/ploomber), who often execute long-running code in remote environments. However, we realized this could be useful for the Python community, so we created a separate package. This project is a fork of Eli Finer's pydump, so kudos to him for laying the foundations!
The implementation is quite interesting. You can see it here (https://github.com/ploomber/debuglater/blob/master/src/debug...). The serialization step has two parts: first, it takes the traceback object and wraps it in a new object so it can be serialized; second, it stores the source code so you can debug even on a machine that doesn't have the original source files!
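To make the two-part idea concrete, here is a minimal standard-library sketch. It is not debuglater's actual code or API: the names `snapshot_traceback` and `_safe` are hypothetical, and the choice to fall back to `repr` for values that cannot be pickled is an assumption made only for this illustration.

```python
import linecache
import pickle
import sys


def _safe(value):
    """Return the value if it pickles cleanly, otherwise its repr (hypothetical helper)."""
    try:
        pickle.dumps(value)
        return value
    except Exception:
        return repr(value)


def snapshot_traceback(tb):
    """Turn a traceback into a plain structure that pickle can handle (hypothetical helper)."""
    frames = []
    sources = {}
    while tb is not None:
        code = tb.tb_frame.f_code
        # Part 1: copy what a debugger needs out of the raw frame,
        # because traceback and frame objects cannot be pickled directly.
        frames.append({
            "filename": code.co_filename,
            "function": code.co_name,
            "lineno": tb.tb_lineno,
            "locals": {k: _safe(v) for k, v in tb.tb_frame.f_locals.items()},
        })
        # Part 2: store the source text next to the frames so it travels with the dump.
        if code.co_filename not in sources:
            sources[code.co_filename] = "".join(linecache.getlines(code.co_filename))
        tb = tb.tb_next
    return {"frames": frames, "sources": sources}


def divide(a, b):
    return a / b


def capture():
    """Trigger a failure and return the serialized snapshot as bytes."""
    try:
        divide(1, 0)
    except ZeroDivisionError:
        tb = sys.exc_info()[2]
        return pickle.dumps(snapshot_traceback(tb))


# Later (possibly elsewhere), load the snapshot and inspect each frame.
restored = pickle.loads(capture())
for frame in restored["frames"]:
    print(frame["function"], frame["lineno"], frame["locals"])
```

A real tool has to go further than this, for example by rebuilding frame-like objects that `pdb` can step through, but the sketch shows why both the wrapped frames and the saved source are needed.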
Please take it for a spin and let us know what you think!
We're building tools to help data scientists develop and ship faster. We recently closed our seed round and are looking to assemble a small team of amazing individuals to take our tooling to the next level.
Job board: https://www.ycombinator.com/companies/ploomber/jobs
GitHub: https://github.com/ploomber/ploomber
Website: https://ploomber.io/
As an example: one of the reasons I don't use Kubeflow is that it requires having a Kubernetes cluster up and running, which is overkill in many cases.
Check out the project I'm working on: https://github.com/ploomber/ploomber
We're working on notebook tooling as well (https://github.com/ploomber/ploomber), but our focus is at the macro level so to speak (how to develop projects that are made up of several notebooks). In recent conversations with data teams, the question "how do you ensure the code quality of each notebook?" has come up a lot, and it's great to see you are tackling that problem. It'll be exciting to see people using both MutableAI and Ploomber! Very cool stuff! I'll give it a try!
A few years ago, I started working as a data scientist at a big financial firm and reviewed all the available workflow orchestration tools (including Kedro). I didn't like that all of them forced me to rewrite my Jupyter code into their frameworks (they're supposed to make me more productive, not less).
True, notebooks have their issues, but they can be fixed (I don't buy the "Jupyter is only for prototyping" argument). So, long story short, I started a project with a friend that makes us more productive by fixing notebooks' problems. https://github.com/ploomber/ploomber