What does HackerNews think of siuba?
Python library for using dplyr like syntax with pandas and SQL
There's precedent in Ecto for a "magic" operator (`^` if I'm not mistaken), so it wouldn't be a stretch to implement it here as well.
(mtcars >> group_by(_.cyl) >> summarize(avg_hp = _.hp.mean()) )
(Edit: ok, it seems siuba doesn't have the "parameter fixing" thing.)
Related: Coconut - a functional superset of Python - https://github.com/evhub/coconut
range(10) |> map$(pow$(?, 2)) |> list
There were others but I can't remember which now.
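For comparison, a rough plain-Python equivalent of the Coconut line above, as a sketch only. The `pipe` helper and `square` function are hypothetical names; `square` stands in for `pow$(?, 2)`, and `functools.partial` plays the role of `map$`.

```python
from functools import partial, reduce


def square(x):
    # Stand-in for Coconut's pow$(?, 2): fixes the second argument of pow.
    return pow(x, 2)


def pipe(value, *funcs):
    """Hypothetical helper: feed `value` through `funcs` left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)


# range(10) |> map$(pow$(?, 2)) |> list
squares = pipe(range(10), partial(map, square), list)
```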
siuba has come a long way since I wrote this, and it can now optimize for fast grouped operations:
* https://github.com/machow/siuba
* https://siuba.readthedocs.io/en/latest/developer/pandas-grou...
It's crazy to me how many people run SELECT * and do everything else in pandas, but also how people writing SQL type the same code over and over.
https://github.com/machow/siuba
One thing I was wondering about the gnarly AST stuff you mention: what about operator overloading? E.g. Q("Select a from" + subquery + "where a < 1")
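A minimal sketch of what that operator-overloading idea might look like, assuming a hypothetical `Q` class (it does not come from siuba or any real library). A `Q` on the right-hand side of `+` is treated as a subquery and parenthesized. This is plain string composition, so values would still need bound parameters rather than concatenation.

```python
class Q:
    """Hypothetical SQL fragment wrapper, for illustration only."""

    def __init__(self, text):
        # Accept either a string or another Q.
        self.text = str(text).strip()

    def __add__(self, other):
        # Q + str, or Q + Q (right-hand Q becomes a subquery).
        return Q(f"{self.text} {_as_sql(other)}")

    def __radd__(self, other):
        # str + Q: here self is the right-hand operand, so it is the subquery.
        return Q(f"{str(other).strip()} ({self.text})")

    def __str__(self):
        return self.text


def _as_sql(part):
    # Nested Q objects become parenthesized subqueries; strings pass through.
    if isinstance(part, Q):
        return f"({part.text})"
    return str(part).strip()


subquery = Q("SELECT a FROM t")
query = Q("SELECT a FROM" + subquery + "WHERE a < 1")
sql = str(query)
```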
I've programmed in python for much longer than R, and really want to be able to move at the same speed when using python for data analysis :o.
It's a weird problem though because the two languages have basically opposite approaches to DataFrames. pandas has a very fat DataFrame implementation, R an extremely minimal one. (Pros and cons to both approaches).
FWIW I have 2020 set aside to implement siuba, a python port of the popular R library dplyr (siuba runs on top of pandas, but also can generate SQL). A huge source of inspiration has been screencasts by Dave Robinson using R to analyze data he's never seen before lightning fast.
Has anyone seen similar screencasts with pandas? I suspect it's not possible (given some constraints on its interface), but would love to be wrong here, because I'd like to keep all my work in python :o.
Expert R screencasts: https://youtu.be/NY0-IFet5AM
(I set aside 2020 to work on this, but quarantine definitely created much more time for it)
Would love your feedback :)