I'll play with the Wolf filter, thanks.
Math typesetting is too messy for current OCR tools. It would be nice to reverse-engineer the LaTeX source for a math paper, but that's not likely soon. OCR for the language would help in mind-mapping a web connecting my saved papers, but I wouldn't use it for reading.
I want everything to look like a 600 dpi scan mixed down, the way I would make it, rather than what the libraries thought would be acceptable. For the pure joy of reading.
The easiest approach that might work would be language-agnostic, understanding only what clean scans of characters look like. Can we back-solve a clean scan from a lower-resolution mess, matching up similar characters in the text without identifying the characters?
Somehow I imagine this is a giant singular value problem. I'm OK if it takes a day to run per paper; I have spare machines.
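To make the "match similar characters without identifying them" idea concrete, here is a rough sketch in plain Python. It is not the singular value formulation; it swaps in something much simpler: greedy clustering of glyph patches by pixel distance, then replacing each patch with its cluster average so the noise cancels out. Everything in it is an assumption for illustration: the names `cluster_patches` and `denoise`, the 5x5 synthetic glyphs, the Gaussian noise model, and the threshold of 0.3. A real scan would also need segmentation, sub-pixel alignment, and upsampling, none of which is attempted here.

```python
import random


def glyph(rows):
    """Turn rows of '0'/'1' characters into a flat list of pixel values."""
    return [float(ch) for row in rows for ch in row]


def distance(a, b):
    """Mean squared difference between two equal-sized patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cluster_patches(patches, threshold):
    """Greedy clustering: a patch joins the first cluster whose running
    mean is closer than `threshold`, otherwise it starts a new cluster.
    No character is ever identified; only pixel similarity is used."""
    clusters = []
    for i, patch in enumerate(patches):
        for c in clusters:
            n = len(c["members"])
            mean = [s / n for s in c["total"]]
            if distance(patch, mean) < threshold:
                c["total"] = [s + x for s, x in zip(c["total"], patch)]
                c["members"].append(i)
                break
        else:
            clusters.append({"total": list(patch), "members": [i]})
    return clusters


def denoise(patches, threshold):
    """Replace every patch with the average of its cluster."""
    clusters = cluster_patches(patches, threshold)
    cleaned = [None] * len(patches)
    for c in clusters:
        n = len(c["members"])
        mean = [s / n for s in c["total"]]
        for i in c["members"]:
            cleaned[i] = mean
    return cleaned, len(clusters)


# Synthetic stand-in for a scanned page: two made-up glyph shapes,
# 40 noisy copies of each, in shuffled order.
BAR = glyph(["00100", "00100", "00100", "00100", "00100"])
RING = glyph(["11111", "10001", "10001", "10001", "11111"])

rng = random.Random(0)
truth = [BAR] * 40 + [RING] * 40
rng.shuffle(truth)
noisy = [[p + rng.gauss(0, 0.15) for p in g] for g in truth]

cleaned, n_clusters = denoise(noisy, threshold=0.3)

err_before = sum(distance(a, b) for a, b in zip(noisy, truth)) / len(truth)
err_after = sum(distance(a, b) for a, b in zip(cleaned, truth)) / len(truth)
print(n_clusters, err_before, err_after)
```

Averaging N copies of the same glyph cuts the noise variance by roughly a factor of N, which is why a long paper with many repeated characters is a good fit. A low-rank (SVD) fit over each cluster's stack of patches would be the natural next step toward the singular value idea, but that is beyond this sketch.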
https://github.com/lukas-blecher/LaTeX-OCR
https://github.com/harvardnlp/im2markup
Also some LaTeX editors:
LyX https://en.wikipedia.org/wiki/LyX
TeXstudio https://en.wikipedia.org/wiki/TeXstudio
GNU TeXmacs https://en.wikipedia.org/wiki/GNU_TeXmacs