The author has also written related tools: one to convert XML to JSON and back (https://github.com/ldn-softdev/jtm) and another to convert JSON to SQLite tables (https://github.com/ldn-softdev/jsl). Combining these with the hxnormalize tool (https://www.w3.org/Tools/HTML-XML-utils/man1/hxnormalize.htm...), one can do very sophisticated manipulation of HTML web pages.

HTML -> XML (via hxnormalize) -> JSON (via jtm) -> process using jtc (or even jq)

This suggests a very scalable, easy approach to extracting data from somewhat regular HTML...
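
To make the shape of that pipeline concrete, here is a rough Python sketch of the same idea using only the standard library. It is a stand-in, not the real tools: the sample markup is made up, it assumes the HTML has already been normalized to well-formed XML, and the dict layout produced by element_to_dict is an ad hoc one, not jtm's actual output format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: well-formed XML of the kind an HTML normalizer might emit.
XHTML = """<html><body>
<table>
<tr><td class="name">apple</td><td class="price">1.20</td></tr>
<tr><td class="name">pear</td><td class="price">0.80</td></tr>
</table>
</body></html>"""


def element_to_dict(elem):
    """Convert an element into a plain dict (ad hoc layout, not jtm's)."""
    return {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def walk(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node["children"]:
        yield from walk(child)


def extract_rows(doc):
    """Build one {cell class: cell text} record per <tr> node."""
    rows = []
    for node in walk(doc):
        if node["tag"] == "tr":
            rows.append(
                {cell["attrs"].get("class"): cell["text"] for cell in node["children"]}
            )
    return rows


# XML -> JSON text -> query, mirroring the stages listed above.
as_json = json.dumps(element_to_dict(ET.fromstring(XHTML)))
rows = extract_rows(json.loads(as_json))
print(rows)
```

Once the page is plain JSON, the extraction step is just a tree walk, which is the part a tool like jtc or jq would take over in the real pipeline.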

I generally use xidel [1] for that type of task. Feed it XPath, CSS selectors, or its own pattern-matching templates.

[1] https://github.com/benibela/xidel
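
For anyone who hasn't used XPath for this kind of extraction, here is a small Python illustration of the selector style. It does not use xidel; it relies on the limited XPath subset in the standard library's ElementTree, and the sample markup is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed page fragment.
PAGE = """<html><body>
<ul id="links">
<li><a href="/a" class="doc">Alpha</a></li>
<li><a href="/b">Beta</a></li>
</ul>
</body></html>"""

root = ET.fromstring(PAGE)

# Every <a> element anywhere in the tree, reduced to its href attribute.
all_hrefs = [a.get("href") for a in root.findall(".//a")]

# Only the <a> elements whose class attribute equals "doc".
doc_titles = [a.text for a in root.findall(".//a[@class='doc']")]

print(all_hrefs, doc_titles)
```

A dedicated tool handles real-world, non-well-formed HTML and a much fuller XPath dialect, but the query-by-path idea is the same.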