What does HackerNews think of siuba?

Python library for using dplyr-like syntax with pandas and SQL

Language: Python

#164 in Python
#44 in SQL
For further inspiration, this is a pretty good-looking "dplyr for Python": https://github.com/machow/siuba

There's precedent in Ecto for a "magic" operator (`^` if I'm not mistaken), so it wouldn't be a stretch to implement it here as well.

This is not a new idea. siuba (a dplyr-like Pandas alternative) is one project that does it, and it also has a "pipeline operator": https://github.com/machow/siuba

(mtcars >> group_by(_.cyl) >> summarize(avg_hp = _.hp.mean()) )

(Edit: ok, it seems siuba doesn't have the "parameter fixing" thing.)
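For anyone wondering how a `>>` pipeline like that can work in Python at all, here is a minimal sketch using only the standard library. It is not siuba's implementation: `Pipeable`, and these toy versions of `group_by` and `summarize`, are hypothetical names written for illustration, and they operate on a list of dicts rather than a DataFrame.

```python
from collections import defaultdict
from statistics import mean


class Pipeable:
    """Wraps a function so that `data >> Pipeable(f)` evaluates `f(data)`."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Python falls back to the right operand's __rrshift__ when the
        # left operand (a list or dict here) does not implement >> itself.
        return self.func(data)


def group_by(key):
    """Hypothetical verb: bucket rows (dicts) by the value stored under `key`."""
    def step(rows):
        groups = defaultdict(list)
        for row in rows:
            groups[row[key]].append(row)
        return dict(groups)
    return Pipeable(step)


def summarize(**aggregations):
    """Hypothetical verb: apply each named aggregation to every group."""
    def step(groups):
        return {
            group: {name: agg(rows) for name, agg in aggregations.items()}
            for group, rows in groups.items()
        }
    return Pipeable(step)


# Made-up sample rows, not the real mtcars data.
cars = [
    {"cyl": 4, "hp": 90},
    {"cyl": 4, "hp": 110},
    {"cyl": 6, "hp": 120},
]

result = (
    cars
    >> group_by("cyl")
    >> summarize(avg_hp=lambda rows: mean(r["hp"] for r in rows))
)
```

The whole trick is the reflected operator: each verb returns an object whose `__rrshift__` receives whatever sits to its left.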

Related: Coconut - a functional superset of Python - https://github.com/evhub/coconut

range(10) |> map$(pow$(?, 2)) |> list
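For comparison, a rough plain-Python equivalent of that Coconut line, assuming `pow$(?, 2)` means "pow with the second argument fixed to 2". The helper names `square` and `two_to_the` are made up for this sketch.

```python
from functools import partial

# functools.partial fixes leading positional arguments, so fixing the
# second argument of pow is easiest with a small wrapper function.
def square(x):
    return pow(x, 2)

squares = list(map(square, range(10)))

# Fixing the first argument instead works directly with partial:
two_to_the = partial(pow, 2)          # two_to_the(n) == pow(2, n)
powers_of_two = list(map(two_to_the, range(5)))
```

Plain Python has no placeholder syntax like `?`, which is the "parameter fixing" convenience the pipeline libraries try to provide.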

There were others but I can't remember which now.

Author here--happy to answer questions :)

Siuba has come a long way since I wrote this, and can now optimize for fast grouped operations:

* https://github.com/machow/siuba

* https://siuba.readthedocs.io/en/latest/developer/pandas-grou...

For what it's worth, I maintain a library called siuba that lets you generate SQL code from pandas methods.

It's crazy to me how often people run SELECT * and pull everything into pandas, but also how people writing SQL type the same code over and over.

https://github.com/machow/siuba

Interesting approach! I've been working on it from the other angle: having pandas code generate SQL. If you're interested in checking it out, happy to try and show how it would generate the query in your readme!

https://github.com/machow/siuba

One thing I was wondering about the gnarly ast stuff you mention, what about operator overloading? E.g. Q("Select a from" + subquery + "where a < 1")
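To make the operator-overloading idea concrete, here is a small sketch of what such a `Q` could look like. It is an assumption about the shape of the API, not code from either library: the `Q` class, its `as_subquery` method, and the table name `t` are all hypothetical.

```python
class Q:
    """Hypothetical SQL fragment that composes with strings and other Qs via +."""

    def __init__(self, sql):
        # Accept either a plain string or another Q.
        self.sql = str(sql).strip()

    def __add__(self, other):
        # Q + str or Q + Q: join the fragments with a single space.
        return Q(f"{self.sql} {str(other).strip()}")

    def __radd__(self, other):
        # str + Q: str.__add__ returns NotImplemented for a Q, so Python
        # calls this reflected method instead.
        return Q(other) + self

    def as_subquery(self):
        # Wrap the fragment in parentheses so it can be nested.
        return Q(f"({self.sql})")

    def __str__(self):
        return self.sql


subquery = Q("SELECT a FROM t").as_subquery()
query = Q("SELECT a FROM" + subquery + "WHERE a < 1")
print(query)
```

This only concatenates text, so it does nothing about quoting or parameter binding; it just shows that `+` alone is enough to nest a subquery without touching the AST.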

Working on siuba, a data analysis tool for Python. It's a port of the R library dplyr, and can produce SQL queries!

I've programmed in python for much longer than R, and really want to be able to move at the same speed when using python for data analysis :o.

It's a weird problem though because the two languages have basically opposite approaches to DataFrames. pandas has a very fat DataFrame implementation, R an extremely minimal one. (Pros and cons to both approaches).

https://github.com/machow/siuba

Articles like these are interesting, but what surprises me is that they rarely set up a holistic use case, so most debates imagine how long it would take an expert user to use each tool. But time constraints (e.g. on time spent coding) separate expert from novice performance in many domains.

FWIW I have 2020 set aside to implement siuba, a python port of the popular R library dplyr (siuba runs on top of pandas, but also can generate SQL). A huge source of inspiration has been screencasts by Dave Robinson using R to analyze data he's never seen before lightning fast.

Has anyone seen similar screencasts with pandas? I suspect it's not possible (given some constraints on its interface), but would love to be wrong here, because I'd like to keep all my work in python :o.

Expert R screencasts: https://youtu.be/NY0-IFet5AM

Siuba: https://github.com/machow/siuba

Porting the R library dplyr to Python. :)

(I set aside 2020 to work on this, but quarantine definitely created much more time for it)

https://github.com/machow/siuba

I've been working on a library over the past year that does exactly that, including generating dbplyr-style SQL queries!

Would love your feedback :)

https://github.com/machow/siuba