What does HackerNews think of ploomber?

The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

Language: Python

#201 in Hacktoberfest
For those who don't know, Jupyter has a bash kernel: https://github.com/takluyver/bash_kernel

And you can run Jupyter notebooks from the CLI with Ploomber: https://github.com/ploomber/ploomber
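As a rough sketch of what that looks like: Ploomber reads a `pipeline.yaml` spec listing notebook tasks, and `ploomber build` executes them from the CLI. The file names below are hypothetical; check the Ploomber docs for the full spec.

```yaml
# pipeline.yaml (hypothetical minimal example)
tasks:
  - source: clean.ipynb        # notebook executed as a pipeline task
    product: output/clean.ipynb
  - source: plot.ipynb         # runs after clean.ipynb if declared upstream
    product: output/plot.ipynb
```

With this in place, `ploomber build` runs the notebooks in order from the command line, no Jupyter UI needed.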

Fair point. MLflow has a lot of features to cover the end-to-end dev cycle. This SQLite tracker only covers the experiment tracking part.

We have another project to cover the orchestration/pipelines aspect: https://github.com/ploomber/ploomber and we have plans to work on the rest of the features. For now, we're focusing on those two.


Hi HN!

We just released debuglater (https://github.com/ploomber/debuglater), an open-source library that serializes a Python traceback object for later debugging.

You can see a quick video demo here: https://github.com/ploomber/debuglater/blob/master/README.md

Countless times, we've scheduled overnight jobs only to find out the following day that they failed. While logs are helpful, they are often insufficient for debugging. debuglater allows you to store the traceback object so you can start a debugging session at any moment.

We built this to support our open-source framework for data scientists (https://github.com/ploomber/ploomber), who often execute long-running code in remote environments. However, we realized this could be useful for the Python community, so we created a separate package. This project is a fork of Eli Finer's pydump, so kudos to him for laying the foundations!

The implementation is quite interesting. You can see it here (https://github.com/ploomber/debuglater/blob/master/src/debug...). The serialization step has two parts: first, it wraps the traceback object in a new object so it can be serialized; second, it stores the source code so you can debug even if you don't have access to the source code!

Please take it for a spin and let us know what you think!

Ploomber (W22) | Developer Advocate & Software Engineers | Full-time | Remote and NYC

We're building tools to help data scientists develop and ship faster. We recently closed our seed round and are looking to assemble a small team of amazing individuals to take our tooling to the next level.

Job board: https://www.ycombinator.com/companies/ploomber/jobs

GitHub: https://github.com/ploomber/ploomber

Website: https://ploomber.io/

When it comes to scale and DS work, I'd use the open-source Ploomber (https://github.com/ploomber/ploomber). It allows an easy transition between development and production, building the DAG incrementally so you avoid expensive compute time and costs. It's easier to maintain and integrates seamlessly with Airflow, generating the DAGs for you.
One of my deal breakers when choosing tooling is how easy it is to move from a local environment to a distributed one. Ideally, you want to start locally and move to a distributed environment only if you need to. So choose a tool that allows you to get started quickly and move from there.

As an example: one of the reasons why I don't use Kubeflow is that it requires having a Kubernetes cluster up and running, which is overkill in many cases.

Check out the project I'm working on: https://github.com/ploomber/ploomber

Congrats on the launch! As a former data scientist, it pumps me up to see more notebook-centric tooling, as I believe notebooks are the best environment for data exploration and rapid iteration.

We're working on notebook tooling as well (https://github.com/ploomber/ploomber), but our focus is at the macro level so to speak (how to develop projects that are made up of several notebooks). In recent conversations with data teams, the question "how do you ensure the code quality of each notebook?" has come up a lot, and it's great to see you are tackling that problem. It'll be exciting to see people using both MutableAI and Ploomber! Very cool stuff! I'll give it a try!

Well written. I think Airflow is being enforced in organizations as the main orchestrator even though it's not always the right tool for the job. In addition, organizations have to adopt a microservices approach to get modular components, and on top of that, managing those frameworks is a nightmare. We built Ploomber (https://github.com/ploomber/ploomber) specifically for this reason: modular components and easy deployments. It standardizes your pipelines and allows you to deploy seamlessly on Airflow, Argo (Kubernetes), Kubeflow, and cloud providers.
I completely agree with the cons you outlined, especially your point about "productivity drops when data scientists leave notebooks."

A few years ago, I started working as a data scientist at a big financial firm and reviewed all the available workflow orchestration tools (including Kedro). I didn't like that all of them forced me to rewrite my Jupyter code into their frameworks (they're supposed to make me more productive, not less).

True, notebooks have their issues, but they can be fixed (I don't buy the "Jupyter is only for prototyping" argument). So, long story short, I started a project with a friend that makes us more productive by fixing notebooks' problems: https://github.com/ploomber/ploomber

I believe essentially all of the tools mentioned here focus on the engineering persona, not the data scientist. Writing classes and functions isn't a data scientist's day-to-day language. At Ploomber, we've tried to put data scientists at the center of everything, helping them work together with ops. Check it out! https://github.com/ploomber/ploomber
We built Ploomber to work with R as well, unlike others who believe Python is the only language for ML. Ploomber is an open-source tool that integrates with R and RStudio out of the box. Check it out! https://github.com/ploomber/ploomber
We built the Ploomber open-source tool for that exact reason: true open source! We've been focusing on data scientists, not taking them out of Jupyter, and making sure they can execute what they want without a dedicated infra/ops person. Check it out! https://github.com/ploomber/ploomber
This is a great insight! I think parameterizing notebooks is part of the solution: moving to production shouldn't be time-consuming, and there's definitely no need to refactor the code like I've seen some people do. I'd love to get your feedback. We're building a framework to help people develop maintainable work from Jupyter! https://github.com/ploomber/ploomber
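To make the parameterization idea concrete, here's a minimal sketch of the papermill-style mechanism (a simplification for illustration, not Ploomber's actual code): a notebook is just JSON, so you find the cell tagged "parameters" and inject a new cell with the override values right after it.

```python
import json

# Hypothetical sketch: inject parameter overrides into a notebook dict,
# so the same notebook can run with many different configurations.

def inject_parameters(nb, params):
    """Return a copy of notebook dict `nb` with `params` injected
    after the cell tagged 'parameters'."""
    nb = json.loads(json.dumps(nb))  # deep copy via JSON round-trip
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": "\n".join(f"{k} = {v!r}" for k, v in params.items()),
        "outputs": [],
        "execution_count": None,
    }
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            nb["cells"].insert(i + 1, injected)
            break
    return nb


# A toy single-cell notebook with default parameters
notebook = {
    "cells": [
        {"cell_type": "code",
         "metadata": {"tags": ["parameters"]},
         "source": "n_rows = 100",
         "outputs": [],
         "execution_count": None},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

result = inject_parameters(notebook, {"n_rows": 10000})
```

Because the injected cell runs after the defaults, the overrides win, and the original notebook stays untouched, which is what makes it usable both interactively and in production.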
We'd love to get your feedback. We're building a framework to help people develop maintainable work from Jupyter! https://github.com/ploomber/ploomber
This is a daily pain we've experienced while working in the industry! Our projects would usually allocate a few weeks to refactor notebooks before deployment! So we started working on an open-source framework to help us produce maintainable work from Jupyter. It allows easy git collaboration and eases deployment. https://github.com/ploomber/ploomber