It seems lots of people's knowledge of awk is limited to printing fields, and they'll happily chain awk with a bunch of grep and sed when a single awk invocation would do the job without fuss. For instance, TFA uses
    awk '{print $1","$2}' | sed '1i count,word'

when you can just add a BEGIN block:

    awk 'BEGIN { print "count,word" } { print $1","$2 }'
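Taken further, the whole counting pipeline can collapse into a single awk program. A minimal sketch, assuming one word per line in a hypothetical words.txt (note that awk's for-in loop yields keys in unspecified order, so pipe through sort if you want the output ranked):

    # hypothetical input: one word per line in words.txt
    awk '{ n[$1]++ } END { print "count,word"; for (w in n) print n[w] "," w }' words.txt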
I beg to differ. I've done a lot of CSV wrangling on Unix, and CSV is a beast. My tool of choice ultimately was Miller, an absolutely underrated tool: https://github.com/johnkerl/miller
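To give a flavor of why mlr beats hand-rolled awk for CSV (it parses headers and quoted fields for you), a sketch assuming a hypothetical data.csv with a numeric "count" column:

    # hypothetical file: data.csv with a header row containing a "count" column
    mlr --icsv --opprint sort -nr count then head -n 10 data.csv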
I used to be very comfortable using awk/sed/perl/sort/uniq/tr/tail/head from the CLI for the sort of data cleaning this article is talking about. However, over the past year I've found myself using VisiData https://github.com/saulpw/visidata for interactive work.
If I need to clean up the data first, I'll use mlr or jq as input to VisiData. If my data is too dirty for mlr, I'll run it through the Unix toolbox tools mentioned above before feeding it to mlr, jq, or VisiData.
VisiData provides some scripting ability, but when possible I prefer to have the shell do the scripting, with all the tools mentioned feeding their output into VisiData.
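As a concrete example of that division of labor, a sketch assuming a messy hypothetical dirty.csv (vd reads stdin when you give it a filetype with -f):

    # hypothetical input dirty.csv: normalize with mlr, then explore interactively in VisiData
    mlr --icsv --ojson cat dirty.csv | vd -f json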