I would seriously consider sqlite-utils here.

https://sqlite-utils.datasette.io/en/stable/cli.html
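Roughly like this via the Python API (the CLI equivalent is `sqlite-utils insert data.db records records.ndjson --nl`). This is only a sketch: it assumes the data is newline-delimited JSON, and the file/table/column names are made up.

```python
import json
import sqlite_utils

db = sqlite_utils.Database("data.db")

def rows(path):
    # Stream one JSON object per line so the whole file never sits in memory.
    with open(path) as f:
        for line in f:
            yield json.loads(line)

# Insert in batches; sqlite-utils infers the table schema from the records.
db["records"].insert_all(rows("records.ndjson"), batch_size=10_000)

# Index whatever field you plan to filter on, then query it with plain SQL.
db["records"].create_index(["some_field"])
```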

Was going to post the same thing. I suspect converting the dataset to a SQLite db would be infinitely faster and more productive than pecking away at it with pandas and such.

At this size, I doubt it. SQLite can read JSON if it's compiled with support for it, but it stores the JSON as plain TEXT. The only native indexing I'm aware of for that is full-text search, and I suspect the low cardinality of JSON tokens (the same keys, braces, and quotes over and over) would make that inefficient. Not to mention that the author said they don't have enough memory to hold the entire file, so with a DB you'd be reading from disk.
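To make that concrete, here's a minimal sqlite3 sketch (assuming a SQLite build with the JSON functions; the table and keys are made up). The document sits in an ordinary TEXT column, and json_extract() re-parses that text for every row it touches, so without an index a query like this is a full scan:

```python
import json
import sqlite3

conn = sqlite3.connect("data.db")
conn.execute("CREATE TABLE IF NOT EXISTS raw (doc TEXT)")  # JSON lands as TEXT

conn.execute(
    "INSERT INTO raw VALUES (?)",
    (json.dumps({"name": "example", "n": 1}),),
)
conn.commit()

# json_extract() parses the TEXT column row by row, so with no index
# this WHERE clause walks the entire table.
rows = conn.execute(
    "SELECT json_extract(doc, '$.name') FROM raw WHERE json_extract(doc, '$.n') > 0"
).fetchall()
print(rows)  # [('example',)]
```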

MySQL [0] or Postgres [1] with their native JSON datatypes _might_ be faster, but you still have to load the data in, and storing/indexing JSON in either of them is its own special nightmare full of footguns.
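For the Postgres side, a tiny psycopg2 sketch of the kind of footgun involved (this assumes a running server; the database and table names are invented). The default jsonb GIN index only accelerates a handful of operators such as containment (@>), not the ->> extraction most people reach for first:

```python
import psycopg2

conn = psycopg2.connect("dbname=scratch")  # placeholder connection string
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS docs (doc jsonb)")
# A jsonb_ops GIN index helps containment and key-existence queries...
cur.execute("CREATE INDEX IF NOT EXISTS docs_gin ON docs USING gin (doc)")

cur.execute("""INSERT INTO docs VALUES ('{"name": "example", "n": 1}')""")

# ...so this containment query can use the index:
cur.execute("SELECT doc->>'name' FROM docs WHERE doc @> %s::jsonb", ('{"n": 1}',))
print(cur.fetchall())

# ...but WHERE doc->>'name' = 'example' would not; that needs a separate
# expression index on (doc->>'name').
conn.commit()
```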

Having done similar text manipulation and searches on giant CSV files, I can say parallel and xsv [2] are the way to go.
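For readers who want to see the underlying idea, here is a rough pure-Python analogue of that approach (not the parallel/xsv pipeline itself; the filename, column index, and search pattern are placeholders). It streams the CSV in row batches and filters them across worker processes; GNU parallel plus xsv gets you the same effect with a couple of one-liners.

```python
import csv
from itertools import islice
from multiprocessing import Pool

PATTERN = "needle"   # placeholder search string
COLUMN = 3           # placeholder column index

def scan_batch(batch):
    # Runs in a worker process: keep rows whose target column matches.
    return [row for row in batch if len(row) > COLUMN and PATTERN in row[COLUMN]]

def batches(reader, size=100_000):
    # Pull the CSV in fixed-size row batches instead of loading it whole.
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk

if __name__ == "__main__":
    with open("huge.csv", newline="") as src, open("hits.csv", "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # copy the header row
        with Pool() as pool:
            for hits in pool.imap(scan_batch, batches(reader)):
                writer.writerows(hits)
```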

[0]: https://dev.mysql.com/doc/refman/8.0/en/json.html

[1]: https://www.postgresql.org/docs/current/datatype-json.html

[2]: https://github.com/BurntSushi/xsv