Yes, PDFMiner in Python https://github.com/euske/pdfminer

Apache PDFBox in Java https://pdfbox.apache.org

Previous discussion https://news.ycombinator.com/item?id=11327493

For a list of others, see http://okfnlabs.org/blog/2016/04/19/pdf-tools-extract-text-a...

According to the PDFMiner site, pdf2txt.py cannot recognize text drawn as images, since that would require optical character recognition (OCR). I'm interested in software that combines OCR with some sort of math notation rendering engine.
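
To make that limitation concrete, here is a rough sketch of how one might flag PDFs that probably need OCR: run pdf2txt.py and check whether it returned almost no visible text. It assumes pdf2txt.py is installed and on the PATH; the helper names and the 20-character threshold are my own illustrative choices, not anything PDFMiner provides.

```python
import subprocess


def looks_scanned(extracted_text, min_chars=20):
    """Guess whether a PDF is image-only, based on how little text came out.

    min_chars is an arbitrary, hypothetical threshold; tune it for your data.
    """
    # Count only visible characters; newlines and form feeds do not count.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible < min_chars


def extract_text(pdf_path):
    """Run PDFMiner's pdf2txt.py (assumed to be on PATH) and return its stdout."""
    result = subprocess.run(
        ["pdf2txt.py", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Something like looks_scanned(extract_text("paper.pdf")) would then tell you whether to hand the file to an OCR tool instead. It is only a heuristic: a PDF that mixes a real text layer with scanned pages would need a per-page check.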

espeed

For handwritten character recognition, see:

https://www.tensorflow.org/tutorials/mnist/beginners/ (also search for "tensorflow ocr")

http://yann.lecun.com/exdb/mnist/

CROHME: Competition on Recognition of Online Handwritten Mathematical Expressions http://www.isical.ac.in/~crohme/

Closed-source APIs: http://mathpix.com and https://photomath.net/en/

Best off-the-shelf OCR (originally developed by HP, now sponsored by Google):

https://github.com/tesseract-ocr/tesseract

https://github.com/tesseract-ocr/tesseract/wiki
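
If you want to try Tesseract from a script, a minimal sketch is to shell out to its command-line tool. This assumes the tesseract binary is installed and on the PATH and that the "eng" language data is available; the function names are hypothetical. Note that Tesseract takes images as input, so PDF pages would have to be rendered to images first.

```python
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the argument list for the tesseract CLI.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract (assumed to be on PATH) and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The output is plain text, so math layout (fractions, superscripts and so on) would still need separate handling.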

Two Clojure talks...

Machine Learning Live - Mike Anderson https://www.youtube.com/watch?v=QJ1qgCr09j8

Adventures in Understanding Documents - Scott Tuddenham https://www.youtube.com/watch?v=94NjRg8zoCA