AI-enhanced data analysis is nice, but this is not how I would have implemented it. Passing the data frame as part of the query limits the quantity of data you can handle, and you're at risk of the query adding undesired operations on your data.

I expected this to be a Copilot-like approach tuned specifically for pandas: you pass it your data frame schema, column names, or something similar, and it passes back some code to evaluate locally.
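Roughly what I had in mind, as a minimal sketch: only the column names and dtypes go into the prompt, and whatever code comes back gets reviewed and run locally. The `schema_prompt` helper, the prompt wording, and the sample data are all made up for illustration; no actual LLM call is shown.

```python
import pandas as pd


def schema_prompt(df: pd.DataFrame, question: str) -> str:
    """Build a prompt from column names and dtypes only; no row values are included."""
    columns = "\n".join(f"- {name}: {dtype}" for name, dtype in df.dtypes.items())
    return (
        "You are given a pandas DataFrame named `df` with these columns:\n"
        f"{columns}\n"
        f"Write pandas code that answers: {question}\n"
    )


# Hypothetical sample data; the row values never reach the prompt.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "sales": [120.5, 98.0]})
prompt = schema_prompt(df, "total sales per city")
print(prompt)
```

The prompt size then depends on the number of columns rather than the number of rows, and the returned code can be read, versioned, and rerun like any other script.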

I'm definitely not excited about passing the data frame itself and letting the LLM handle it. Maybe with time I'll get there, but I think this is a step too far for anyone who needs reproducible, verifiable results.

There is a package for querying dataframes with SQL: https://github.com/yhat/pandasql/. I assumed this was going to be LLM -> SQL -> Pandas.
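A minimal sketch of that LLM -> SQL -> pandas flow, with a hard-coded string standing in for the model's output. It uses the standard-library `sqlite3` module rather than pandasql so that it is self-contained; pandasql's `sqldf` would play the same role. The `run_sql_on_frame` helper, the table name `df`, and the sample data are assumptions for illustration.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def run_sql_on_frame(df: pd.DataFrame, sql: str, table: str = "df") -> pd.DataFrame:
    """Load the frame into an in-memory SQLite table and run the query locally."""
    with closing(sqlite3.connect(":memory:")) as conn:
        df.to_sql(table, conn, index=False)
        return pd.read_sql_query(sql, conn)


# Hypothetical sample data.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "sales": [100.0, 50.0, 25.0]})

# Stand-in for SQL an LLM might return after seeing only the schema.
generated_sql = "SELECT city, SUM(sales) AS total FROM df GROUP BY city ORDER BY city"

result = run_sql_on_frame(df, generated_sql)
print(result)
```

The SQL string is the only thing the model produces, so it can be inspected and logged before it touches the data, and the same query gives the same result every time.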