What does HackerNews think of xsv?

A fast CSV command line toolkit written in Rust.

I've written books on GNU grep, sed, and awk, as well as one on coreutils. They're free to read online. See https://github.com/learnbyexample/scripting_course#ebooks for links.

Have you looked at https://github.com/BurntSushi/xsv for csv processing?

I have done some similar, simpler data wrangling with xsv (https://github.com/BurntSushi/xsv) and jq. It could process my 800M rows in a couple of minutes (plus the time to read them out of the database =)
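
A sketch of that kind of jq + xsv pipeline, with invented field names (not the commenter's actual commands): flatten a JSON-lines dump to CSV with jq, then aggregate with xsv:

    $ jq -r '[.id, .region, .amount] | @csv' dump.jsonl > rows.csv
    $ xsv frequency -n -s 2 rows.csv

jq's @csv emits no header row, hence -n/--no-headers and selecting column 2 (the region) by index.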
If quoted fields are the only extra thing you need to handle (i.e. no escaped quotes, embedded newlines, etc.) and you have GNU awk:

    $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $1}'
    "foo"
    $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
    "bar,baz"
For a more robust solution, see https://stackoverflow.com/q/45420535 or use other tools like https://github.com/BurntSushi/xsv
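
For comparison, xsv handles the same input natively (note it also drops quoting that isn't needed on output):

    $ echo '"foo","bar,baz"' | xsv select -n 1
    foo
    $ echo '"foo","bar,baz"' | xsv select -n 2
    "bar,baz"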
Personally, I use xsv and it’s been tremendously helpful, especially when working with larger files. https://github.com/BurntSushi/xsv
I suggest trying xsv as a first step: https://github.com/BurntSushi/xsv
If the data could be tabular in nature, maybe convert it to sqlite3 so you can make use of indexing, or to CSV to make use of high-performance tools like xsv or zsv (I'm an author of the latter); a sketch of the sqlite3 route follows the links below.

https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql...

https://github.com/BurntSushi/xsv
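
A minimal sketch of the sqlite3 route (file, table, and column names are placeholders):

    $ sqlite3 data.db
    sqlite> .mode csv
    sqlite> .import data.csv records
    sqlite> CREATE INDEX idx_records_id ON records(id);
    sqlite> SELECT count(*) FROM records WHERE id = '12345';

.import takes column names from the first row when the table doesn't already exist, and every column comes in as TEXT, so quote values in comparisons.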

At this size, I doubt it. While SQLite can read JSON if compiled with support for it, it stores it as TEXT. The only native indexing for it that I'm aware of is full-text search, and I suspect the cardinality of JSON characters would make that inefficient. Not to mention that the author stated they didn't have enough memory to store the entire file, so with a DB you'd be reading from disk.

MySQL or Postgres with their native JSON datatypes _might_ be faster, but you still have to load it in, and storing/indexing it in either of those is [0] its own [1] special nightmare full of footguns.

Having done similar text manipulation and searches on giant CSV files, I've found parallel and xsv [2] to be the way to go; a rough sketch follows the links below.

[0]: https://dev.mysql.com/doc/refman/8.0/en/json.html

[1]: https://www.postgresql.org/docs/current/datatype-json.html

[2]: https://github.com/BurntSushi/xsv
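
Roughly, with a made-up column name and pattern: split the file into chunks, search them in parallel, then stitch the matches back together:

    $ xsv split -s 1000000 parts giant.csv
    $ parallel 'xsv search -s city Boston {} > {}.hits' ::: parts/*.csv
    $ xsv cat rows parts/*.hits > matches.csv

xsv cat rows takes care of the header row that each chunk's output repeats.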

csvkit and miller are both extraordinarily slow

try xsv (https://github.com/BurntSushi/xsv) or zsv (https://github.com/liquidaty/zsv) instead (I'm an author of the latter); a quick side-by-side follows
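
For example, a typical csvkit column selection and its xsv equivalent (file name made up); same output, very different runtimes on large files:

    $ csvcut -c city,state data.csv
    $ xsv select city,state data.csv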

Looks very cool! I don't care so much about YAML, but I do a ton of processing of JSON and csv/tsv. Any word on the performance relative to jq and xsv [1]?

[1] https://github.com/BurntSushi/xsv

While mentioning alternatives, xsv [1] can do joins on CSV files instead of doing naive comma splitting. Also, unlike GNU join, xsv does not require the input to be sorted (afaik); a quick example follows the link.

[1] https://github.com/BurntSushi/xsv
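
A quick example with made-up file and column names, showing an inner join and a left join on an id column:

    $ xsv join id people.csv id orders.csv > joined.csv
    $ xsv join --left id people.csv id orders.csv > with_unmatched.csv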