What does HackerNews think of xsv?

A fast CSV command line toolkit written in Rust.

I've written books on GNU grep, sed, and awk, as well as one on coreutils. They're free to read online. See https://github.com/learnbyexample/scripting_course#ebooks for links.

Have you looked at https://github.com/BurntSushi/xsv for csv processing?

I have done some similar, simpler data wrangling with xsv (https://github.com/BurntSushi/xsv) and jq. It could process my 800M rows in a couple of minutes (plus the time to read them out of the database =)
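
A sketch of that kind of jq + xsv pipeline, with invented field names (not the commenter's actual commands): flatten a JSON-lines dump to CSV with jq, then aggregate with xsv:

    $ jq -r '[.id, .region, .amount] | @csv' dump.jsonl > rows.csv
    $ xsv frequency -n -s 2 rows.csv

jq's @csv emits no header row, hence -n/--no-headers and selecting column 2 (the region) by index.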
If quoted fields are the only extra thing you need to handle (i.e. no escaped quotes, embedded newlines, etc.) and you have GNU awk:

    $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $1}'
    "foo"
    $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
    "bar,baz"
For a more robust solution, see https://stackoverflow.com/q/45420535 or use other tools like https://github.com/BurntSushi/xsv
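
For comparison, xsv handles the same input natively (note it also drops quoting that isn't needed on output):

    $ echo '"foo","bar,baz"' | xsv select -n 1
    foo
    $ echo '"foo","bar,baz"' | xsv select -n 2
    "bar,baz"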
Personally, I use xsv and it’s been tremendously helpful, especially when working with larger files. https://github.com/BurntSushi/xsv
I suggest trying xsv as a first step: https://github.com/BurntSushi/xsv
If the data could be tabular in nature, maybe convert it to sqlite3 so you can make use of indexing, or to CSV to make use of high-performance tools like xsv or zsv (I'm an author of the latter); a sketch of the sqlite3 route follows the links below.

https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql...

https://github.com/BurntSushi/xsv
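
A minimal sketch of the sqlite3 route (file, table, and column names are placeholders):

    $ sqlite3 data.db
    sqlite> .mode csv
    sqlite> .import data.csv records
    sqlite> CREATE INDEX idx_records_id ON records(id);
    sqlite> SELECT count(*) FROM records WHERE id = '12345';

.import takes column names from the first row when the table doesn't already exist, and every column comes in as TEXT, so quote values in comparisons.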

At this size, I doubt it. While SQLite can read JSON if compiled with support for it, it stores it as TEXT. The only native indexing for it that I'm aware of is full-text search, and I suspect the cardinality of JSON characters would make that inefficient. Not to mention that the author stated they didn't have enough memory to store the entire file, so with a DB you'd be reading from disk.

MySQL or Postgres with their native JSON datatypes _might_ be faster, but you still have to load it in, and storing/indexing it in either of those is [0] its own [1] special nightmare full of footguns.

Having done similar text manipulation and searches on giant CSV files, I've found parallel and xsv [2] to be the way to go; a rough sketch follows the links below.

[0]: https://dev.mysql.com/doc/refman/8.0/en/json.html

[1]: https://www.postgresql.org/docs/current/datatype-json.html

[2]: https://github.com/BurntSushi/xsv
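
Roughly, with a made-up column name and pattern: split the file into chunks, search them in parallel, then stitch the matches back together:

    $ xsv split -s 1000000 parts giant.csv
    $ parallel 'xsv search -s city Boston {} > {}.hits' ::: parts/*.csv
    $ xsv cat rows parts/*.hits > matches.csv

xsv cat rows takes care of the header row that each chunk's output repeats.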

csvkit and miller are both extraordinarily slow

try xsv (https://github.com/BurntSushi/xsv) or zsv (https://github.com/liquidaty/zsv) instead (I'm an author of the latter); a quick side-by-side follows
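
For example, a typical csvkit column selection and its xsv equivalent (file name made up); same output, very different runtimes on large files:

    $ csvcut -c city,state data.csv
    $ xsv select city,state data.csv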

Looks very cool! I don't care so much about YAML, but I do a ton of processing of JSON and csv/tsv. Any word on the performance relative to jq and xsv [1]?

[1] https://github.com/BurntSushi/xsv

While mentioning alternatives, xsv [1] can do joins on CSV files instead of doing naive comma splitting. Also, unlike GNU join, xsv does not require the input to be sorted (afaik); a quick example follows the link.

[1] https://github.com/BurntSushi/xsv
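
A quick example with made-up file and column names, showing an inner join and a left join on an id column:

    $ xsv join id people.csv id orders.csv > joined.csv
    $ xsv join --left id people.csv id orders.csv > with_unmatched.csv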