So the tool extracts the text, and then a regex pulls out the relevant fields and values. pdftotext[1][2] with awk can do the same job on your local machine, without uploading your docs.

1. brew install pkg-config poppler (on macOS)

2. sudo apt-get install poppler-utils (on Debian/Ubuntu)
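
For anyone who prefers a script to an awk one-liner, here is a rough sketch of the same extract-then-regex idea in Python. It assumes pdftotext is on your PATH. The field names and patterns (invoice_no, total) are made up for illustration, and real ones would have to be tailored to your PDF's layout.

```python
import re
import subprocess

# Hypothetical field patterns; real ones depend on the layout of your PDF.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def pdf_to_text(path):
    """Run pdftotext locally; -layout keeps the column layout, '-' writes to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return the first match of each pattern, or None when a field is absent."""
    fields = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields
```

Usage would be something like extract_fields(pdf_to_text("invoice.pdf")), where invoice.pdf is a placeholder file name. Nothing leaves your machine, but the regexes are still specific to one document layout.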

Analyzing the text is the problem, not extracting it. Are there any good open source libs out there for that?

Sure, but the tool posted here doesn't do that. It merely extracts text, and the "analysis" is a couple of regexes tailor-made for that particular PDF. Awk can do that much and a lot more.

If you want to extract tables from a PDF, there's Tabula[1], but it doesn't run automatically over the whole PDF: you have to manually draw a rectangular selection around each table you want to extract.

1. https://github.com/tabulapdf/tabula