It's a bit sad that Pandas has become the default API for data manipulation in Python. I find it less intuitive than any of the other APIs I've worked with: R's tidyverse, Julia's DataTable, and even Mathematica's approach all make more sense to me than Pandas.
I'm hoping that PRQL [0] will one day become the universal API for tabular/relational data. We're targeting SQL as the backend in the first iteration given its universality, but there are early plans to support other backends.
You can already use PRQL with Pandas, the tidyverse, the shell and pretty much any database. See my presentation [1]. PRQL reads very similarly to dplyr and, in my (biased) opinion, actually a bit better, because as its own language it can do away with some of the punctuation.
For questions, see our Discord [2], and if you would like to see PRQL in more places, file an issue on GitHub [3].
Some examples below:
## Pandas
```python
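#!pip install pyprql
import pandas as pd
import pyprql.pandas_accessor  # imported for its side effect: registers the `.prql` accessor

df = pd.read_csv("data/customers.csv")
# Run a PRQL query against the DataFrame
df.prql.query('filter country=="Germany"')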
```
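For comparison, here is a minimal sketch of the same filter in plain pandas. The inline rows and the `name` column are made up for illustration and stand in for `data/customers.csv`, whose real columns aren't shown here.

```python
import pandas as pd

# Hypothetical sample rows standing in for data/customers.csv
df = pd.DataFrame(
    {
        "name": ["Anna", "Bob", "Chloé"],
        "country": ["Germany", "USA", "Germany"],
    }
)

# Boolean-mask equivalent of the PRQL `filter country=="Germany"`
germans = df[df["country"] == "Germany"]

# The same filter via DataFrame.query, which takes an expression string
germans_q = df.query('country == "Germany"')

print(germans)
```

Both forms return the same rows; which one reads better is largely a matter of taste.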
## tidyverse
```sh
mkdir -p ~/.local/R_libs
R -q -e 'install.packages("prqlr", repos = "https://eitsupi.r-universe.dev", lib="~/.local/R_libs/")'
```
```R
library(prqlr, lib.loc="~/.local/R_libs/")
library("tidyquery")
"
from mtcars
filter cyl > 6
sort [-mpg]
select [cyl, mpg]
" |> prql_to_sql() |> query()
```
## PRQL
```prql
from employees
filter start_date > @2021-01-01
derive [
  gross_salary = salary + (tax ?? 0),
  gross_cost = gross_salary + benefits_cost,
]
filter gross_cost > 0
group [title, country] (
  aggregate [
    average gross_salary,
    sum_gross_cost = sum gross_cost,
  ]
)
filter sum_gross_cost > 100_000
derive id = f"{title}_{country}"
derive country_code = s"LEFT(country, 2)"
sort [sum_gross_cost, -country]
take 1..20
```
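To make the comparison concrete, here is a rough pandas sketch of the same pipeline. It is my own approximation, not output generated by PRQL: the sample rows are invented, `avg_gross_salary` is a name I picked for the unnamed average, and `str[:2]` stands in for the `LEFT(country, 2)` s-string.

```python
import pandas as pd

# Hypothetical rows; the real `employees` table is not shown in the post
employees = pd.DataFrame(
    {
        "title": ["Engineer", "Engineer", "Analyst", "Analyst"],
        "country": ["Germany", "Germany", "France", "France"],
        "start_date": pd.to_datetime(
            ["2021-03-01", "2022-06-15", "2020-01-10", "2021-09-01"]
        ),
        "salary": [90_000.0, 80_000.0, 60_000.0, 70_000.0],
        "tax": [10_000.0, None, 5_000.0, 8_000.0],
        "benefits_cost": [5_000.0, 4_000.0, 3_000.0, 2_000.0],
    }
)

result = (
    employees
    # filter start_date > @2021-01-01
    .loc[lambda d: d["start_date"] > pd.Timestamp("2021-01-01")]
    # derive gross_salary, treating a missing tax as 0 (the `??` operator)
    .assign(gross_salary=lambda d: d["salary"] + d["tax"].fillna(0))
    .assign(gross_cost=lambda d: d["gross_salary"] + d["benefits_cost"])
    # filter gross_cost > 0
    .loc[lambda d: d["gross_cost"] > 0]
    # group [title, country] (aggregate [...])
    .groupby(["title", "country"], as_index=False)
    .agg(
        avg_gross_salary=("gross_salary", "mean"),
        sum_gross_cost=("gross_cost", "sum"),
    )
    # filter sum_gross_cost > 100_000
    .loc[lambda d: d["sum_gross_cost"] > 100_000]
    # derive id and country_code
    .assign(
        id=lambda d: d["title"] + "_" + d["country"],
        country_code=lambda d: d["country"].str[:2],
    )
    # sort [sum_gross_cost, -country]
    .sort_values(["sum_gross_cost", "country"], ascending=[True, False])
    # take 1..20
    .head(20)
)

print(result)
```

The method chain keeps the same top-to-bottom order as the PRQL version, but each step needs a lambda or a tuple to say what PRQL expresses with a bare column name.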
[0]: https://prql-lang.org/
[1]: https://github.com/snth/normconf2022/blob/main/notebooks/nor...