Does anyone else find the Polars syntax kind of clunky and ambiguous?
For example, from the link, here's how Polars and Pandas handle manipulating data in a subset of a dataframe:
import pandas as pd
import polars as pl

data = {'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]}

# Polars
f = pl.DataFrame(data)
f.with_columns(
    pl.when(pl.col("a") <= 3)
    .then(pl.col("b") // 10)
    .otherwise(pl.col("b"))
)

# Pandas
df = pd.DataFrame(data)
df.loc[df['a'] <= 3, "b"] = df['b'] // 10
It's not clear in the Polars approach that the column "b" is being modified. An additional minor nitpick here is the use of when/then/otherwise for conditional logic. Aren't these just if/else-if/else conditions? It seems more in line with mathematical and Python convention to use if/else... am I missing something?
The Pandas equivalent, on the other hand, is much more concise and more explicit. It also seems more mathematical to me: Polars appears to mutate the dataframe as a whole, whereas in Pandas a function is applied to a dataframe indexed like a matrix. Pandas also benefits from its reliance on symbolic notation, which makes everything visually clearer, whereas in Polars the use of pl.col("b") and similar calls leads to multiple nested brackets and repeated column names, which hurts readability.
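For what it's worth, this is how I read the when/then/otherwise chain, written as a plain-Python sketch with no Polars involved. The helper name new_b is made up for illustration, and the real library works on whole columns rather than one row at a time, so treat this as a picture of the semantics only:

```python
# Plain-Python sketch of the per-row logic; not how Polars executes it.
a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]

def new_b(a_val, b_val):
    # Hypothetical helper mirroring:
    #   pl.when(pl.col("a") <= 3).then(pl.col("b") // 10).otherwise(pl.col("b"))
    if a_val <= 3:          # when(...)
        return b_val // 10  # then(...)
    else:
        return b_val        # otherwise(...)

result = [new_b(x, y) for x, y in zip(a, b)]
print(result)  # [1, 2, 3, 40, 50]
```

Read this way, when/then/otherwise looks like an ordinary if/else spelled as method calls.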
I know there's a lot of thought that's been put into Polars, so I assume I'm missing some of the advantages of the Polars approach, and would appreciate anyone who can shed some light on it.
I do understand, and partially agree with, the idea that indexing in Pandas leads to a lot of bugs. But in the example above, Pandas isn't really using indexing; it's using a boolean mask to "index" the values from the same dataframe, so it should be fairly robust. Is there a reason why Polars is trying to avoid this kind of filtering in the row/column indices?
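To make the boolean-mask point concrete, here is a small sketch, assuming pandas is installed. The names df and mask are mine, chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# The "index" handed to .loc is just a boolean Series, one flag per row.
mask = df["a"] <= 3
print(mask.tolist())  # [True, True, True, False, False]

# Only rows where the mask is True are assigned; the others keep their value.
df.loc[mask, "b"] = df["b"] // 10
print(df["b"].tolist())  # [1, 2, 3, 40, 50]
```

One caveat, as far as I understand it: the right-hand side is a full-length Series, and Pandas matches it to the masked rows by row label, so the index is still involved behind the scenes.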
Modin (https://github.com/modin-project/modin) seems more promising at this point, particularly since a migration path for existing Pandas code is highly desirable.