What does HackerNews think of luigi?

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Language: Python

#102 in Python
I agree there are many options in this space. Two others to consider:

- https://airflow.apache.org/

- https://github.com/spotify/luigi

There are also many Kubernetes-based options out there. For the specific use case you described, you might even consider a plain old Makefile and incrond, if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
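To make the Makefile idea concrete, here is a rough sketch of the timestamp rule Make relies on: a target is rebuilt when it is missing or older than any of its inputs. The function name `needs_rebuild` and the file paths are made up for illustration; this is not how Make or incrond are implemented, just the same idea in Python.

```python
from pathlib import Path


def needs_rebuild(target, sources):
    """Make-style staleness check (illustrative helper, not a real Make API).

    Returns True when the target file is missing or when any source file
    has a newer modification time than the target.
    """
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(Path(src).stat().st_mtime > target_mtime for src in sources)


if __name__ == "__main__":
    # Hypothetical paths: an incoming file and the output derived from it.
    if Path("incoming/data.csv").exists():
        print(needs_rebuild("out/report.txt", ["incoming/data.csv"]))
```

Something like incrond would only be responsible for noticing the new file and invoking the build; the check above is what keeps up-to-date outputs from being regenerated.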

I like Airflow because you can give access to the web UI to operators and they can kick/run/stop tasks or graphs of tasks. Both Airflow and Luigi expect you to express your workflow as a DAG in Python code.
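For anyone unsure what "a DAG in Python code" means in practice, here is a minimal sketch using only the standard library (`graphlib`, Python 3.9+). It is not the Airflow or Luigi API, and the task names are invented; it only shows the underlying idea of declaring what each task depends on and letting the tool work out a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def run_order(deps):
    """Return an execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Real orchestrators add the parts that matter operationally on top of this: skipping tasks whose output already exists, retries, and a UI. The dependency declaration itself is about this simple.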

What are you trying to do? Distributed scheduler with a single instance? No database? Are you sure you don't just mean "a scheduler" à la Luigi? https://github.com/spotify/luigi

And what kind of scheduler? Again, for "a single instance" it doesn't need to be distributed. For distributed operation, Nomad is as simple and generic as you can get. If you need to define a DAG, that's never going to be simple.

Take a look at Luigi [1], which is a lightweight task orchestrator with minimal dependencies.

[1] https://github.com/spotify/luigi

I used Luigi [1] to automate data processing at a previous job. It's a simple job queue with a UI. You request jobs from it, and then run them for minutes or hours, so it shouldn't normally be a bottleneck and it makes sense to use a language that's quick and easy to write.

It's written in Python and works fine to process thousands of jobs per day. Once you start having tens of thousands of jobs in the queue, it gets slow enough that it can back things up. This compounds the problem, eventually resulting in the whole thing crashing.

By switching the interpreter to PyPy, I was able to keep the data pipeline running at that scale without having to rewrite anything.

[1] https://github.com/spotify/luigi

Luigi from Spotify:

https://github.com/spotify/luigi

We’ve been using it for complex update workflows for about 5 years now, and it just works.

It doesn’t do scheduling or have a fancy UI, but it’s a solid workhorse.