Standard awk warning: it's tempting to try to use awk on csv files. You'll even get good results on simple csv files that leave you encouraged to go further. Don't.

Csv is not standardized and the quoting rules are weird (and not standardized).
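
To make the quoting problem concrete, here is a minimal sketch of the naive approach: splitting every line on commas with `-F,`. The helper name `count_fields` and the sample row are made up for illustration. The row has three logical fields, but the quoted one contains a comma.

```shell
# Count fields the way a naive awk one-liner would: split on every comma,
# with no awareness of CSV quoting.
count_fields() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# Three logical fields, but awk sees the comma inside the quotes as a separator.
count_fields 'id,"Smith, John",42'   # prints 4, not 3
```

The quote characters also stay attached to the split pieces, so the fields come out as `"Smith` and ` John"`.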

If you can live with a certain amount of loss of fidelity in your output, you can get away with using awk. If you want a coarse prototype, use awk.

If you need robust, production-grade handling of csv files, use (or write) something else.

Csv files are a little bit like dates: superficially simple, with lots of corner cases. Largely for the same reason: lack of standardization.

That said, awk is awesome. It's small enough to fit in your brain, unlike Perl (maybe yours is larger than mine?). It's also pretty universally available and, provided you avoid the gawk-specific features, has few massive incompatibilities between versions, unlike shell. I love it.

dkarl

I would love to have a command-line tool that reads CSV and has a ton of features to cover different quirks and errors, which can output cleaner formats that I can pipe into other command-line tools.

csvkit [0] might be that tool; I discovered it after my last painful encounter with CSV files and haven't used it in anger yet. Among other things, it translates CSV to JSON, so you can compose it with jq.

[0] https://csvkit.readthedocs.io/en/latest/index.html

MrDOS

At my last employer, I built a filter program, creatively called CSVTools[0], to do something like this. One piece of the project parses CSVs and replaces the commas/newlines (in an escaping- and multiline-aware manner, of course) with ASCII record/unit separator characters[1] (0x1E and 0x1F); the other piece converts that format back into well-formed CSV files. I usually used this with GNU awk, and reconfigured RS[2] and FS[3] appropriately. Or you can set just the input separators (RS/FS, leaving ORS/OFS at their defaults) and produce plaintext output from awk.
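
A rough sketch of the awk side of that setup, assuming the input has already been converted so that records end with 0x1E (octal 036) and fields are separated by 0x1F (octal 037). The helper name `second_field` and the sample data are hypothetical; this is not CSVTools' actual interface.

```shell
# Print the second field of each record. Only the input separators are
# changed; ORS/OFS keep their defaults, so the output is plain text lines.
second_field() {
    awk 'BEGIN { RS = "\036"; FS = "\037" } { print $2 }'
}

# Two records: "id", "name" and "42", "Smith, John".
# The comma inside the second record is ordinary data here.
printf 'id\037name\03642\037Smith, John' | second_field
```

Because the separators are control characters that rarely occur in real data, embedded commas no longer need any quoting on the awk side.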

[0]: https://bitbucket.org/rbr/csvtools

[1]: https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text

[2]: https://www.gnu.org/software/gawk/manual/html_node/awk-split...

[3]: https://www.gnu.org/software/gawk/manual/html_node/Field-Sep...

dbro

Good idea! Looks similar to something I wrote called csvquote (https://github.com/dbro/csvquote), which enables awk and other command-line text tools to work with CSV data that contains embedded commas and newlines.