What does HackerNews think of siuba?

For further inspiration, this is a pretty good-looking "dplyr for Python": https://github.com/machow/siuba

There's precedent in Ecto for a "magic" operator (`^` if I'm not mistaken), so it wouldn't be a stretch to implement it here as well.

More intuitive partial function application in Python | Feb 2022

This is not a new idea. siuba (a dplyr-like Pandas alternative) is one project that does it, and it also has a "pipeline operator": https://github.com/machow/siuba

(mtcars >> group_by(_.cyl) >> summarize(avg_hp=_.hp.mean()))

(Edit: ok, it seems siuba doesn't have the "parameter fixing" thing.)
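For anyone wondering how a `>>` pipeline like the one above can work in plain Python: one common approach is to define `__rrshift__` on the right-hand object. What follows is a minimal sketch on a list of dicts, not siuba's actual implementation; `Pipeable`, `keep`, and `mean_of` are hypothetical names made up for illustration.

```python
class Pipeable:
    """Wraps a function so that `data >> Pipeable(f)` evaluates f(data)."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, left):
        # Python falls back to this for `left >> self` when the left
        # operand's type does not implement `>>` itself (lists do not).
        return self.func(left)


def keep(predicate):
    # Hypothetical verb: keep only the rows matching the predicate.
    return Pipeable(lambda rows: [r for r in rows if predicate(r)])


def mean_of(key):
    # Hypothetical verb: average one field over all rows.
    return Pipeable(lambda rows: sum(r[key] for r in rows) / len(rows))


cars = [
    {"cyl": 4, "hp": 90},
    {"cyl": 6, "hp": 110},
    {"cyl": 4, "hp": 110},
]

# Reads left to right, like the siuba pipeline above.
avg_hp_4cyl = cars >> keep(lambda r: r["cyl"] == 4) >> mean_of("hp")
print(avg_hp_4cyl)
```

Because each verb returns a `Pipeable`, steps can be chained without nesting function calls.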

Related: Coconut - a functional superset of Python - https://github.com/evhub/coconut

range(10) |> map$(pow$(?, 2)) |> list

There were others but I can't remember which now.
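For comparison, the Coconut one-liner above could be written in standard Python roughly as follows. This is only a sketch: `functools.partial` fixes leading positional arguments, so fixing the second argument of `pow` (what `pow$(?, 2)` does) is done here with a lambda instead. The names `square` and `two_to_the` are illustrative.

```python
from functools import partial

# partial fixes arguments from the left: this computes 2 ** x, not x ** 2.
two_to_the = partial(pow, 2)

# Fixing the second argument instead, as pow$(?, 2) does in Coconut.
square = lambda x: pow(x, 2)

# Equivalent of: range(10) |> map$(pow$(?, 2)) |> list
squares = list(map(square, range(10)))
powers = list(map(two_to_the, range(4)))

print(squares)
print(powers)
```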

A trick to have arbitrary infix operators in Python | Jan 2022

See 'https://github.com/machow/siuba'.

What would it take to recreate dplyr in Python? (2020) | Jan 2022

Author here, happy to answer questions :)

Siuba has come a long way since I wrote this, and can now optimize for fast grouped operations:

* https://github.com/machow/siuba

* https://siuba.readthedocs.io/en/latest/developer/pandas-grou...

Practical SQL for Data Analysis | May 2021

For what it's worth, I maintain a library called siuba that lets you generate SQL code from pandas methods.

It's crazy to me how people pull everything into pandas with SELECT *, but also how people writing SQL type the same code over and over.

https://github.com/machow/siuba

Show HN: Csql – Python lib for composeable SQL queries | Oct 2020

Interesting approach! I've been working on it from the other angle: having pandas code generate SQL. If you're interested in checking it out, happy to try and show how it would generate the query in your readme!

https://github.com/machow/siuba

One thing I was wondering about the gnarly AST stuff you mention: what about operator overloading? E.g. Q("Select a from" + subquery + "where a < 1")
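A minimal sketch of what that operator-overloading idea might look like, assuming the subquery is itself a `Q` object. This `Q` class is hypothetical and is not Csql's API; it only shows how `__add__` and `__radd__` let plain strings and query objects be concatenated.

```python
class Q:
    """Hypothetical SQL fragment that composes with `+`."""

    def __init__(self, sql):
        self.sql = sql.strip()

    def __add__(self, other):
        # Q + str or Q + Q: a Q on the right is embedded as a subquery.
        right = f"({other.sql})" if isinstance(other, Q) else str(other).strip()
        return Q(f"{self.sql} {right}")

    def __radd__(self, other):
        # str + Q: the Q on the right becomes a parenthesized subquery.
        return Q(f"{str(other).strip()} ({self.sql})")


subquery = Q("SELECT a FROM t")
query = "SELECT a FROM" + subquery + "AS sub WHERE a < 1"
print(query.sql)
```

Plain string concatenation like this does nothing about parameter escaping, so a real library would still need to handle user-supplied values separately.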

Ask HN: What weird or hard problems are you trying to solve? | Jul 2020

Working on siuba, a data analysis tool for python. It's a port of the R library dplyr, and can produce SQL queries!

I've programmed in python for much longer than R, and really want to be able to move at the same speed when using python for data analysis :o.

It's a weird problem though because the two languages have basically opposite approaches to DataFrames. pandas has a very fat DataFrame implementation, R an extremely minimal one. (Pros and cons to both approaches).

https://github.com/machow/siuba

Scaling Pandas: Comparing Dask, Ray, Modin, Vaex, and Rapids | Jul 2020

Articles like these are interesting, but what surprises me is that they rarely set up a holistic use case, so most debates imagine how long it would take an expert user to use each tool. But time constraints (e.g. time spent coding) separate expert from novice performance in many domains.

FWIW I have 2020 set aside to implement siuba, a python port of the popular R library dplyr (siuba runs on top of pandas, but can also generate SQL). A huge source of inspiration has been screencasts by Dave Robinson using R to analyze data he's never seen before, lightning fast.

Has anyone seen similar screencasts with pandas? I suspect it's not possible (given some constraints on its interface), but would love to be wrong here, because I'd like to keep all my work in python :o.

Expert R screencasts: https://youtu.be/NY0-IFet5AM

Siuba: https://github.com/machow/siuba

Ask HN: What's your quarantine side project? | May 2020

Porting the R library dplyr to python. :)

(I set aside 2020 to work on this, but quarantine definitely created much more time for it)

https://github.com/machow/siuba

Five methods for Filtering data with multiple conditions in Python | Jan 2020

I've been working on a library over the past year that does exactly that, including generating dbplyr-style SQL queries!

Would love your feedback :)

https://github.com/machow/siuba