> Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. From 2006 until November 2018 it was developed by Google.
I have used Tesseract for OCRing scanned books and it was great. I had no idea it was so old, nor that it had been through so many maintainers. To all of them past and present, thank you.
I have permission to publish an ebook edition of an out-of-print history of Portland, Oregon, but I haven’t found the time to work on the project.
One point of friction has been selecting an OCR workflow. Any chance you would share what you’ve been successful with?
I use https://github.com/4lex4/scantailor-advanced to deskew the images and generate the PDF.
It isn't perfect, but my purposes are more about research than publication, so YMMV!
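For anyone who wants a concrete starting point, here is a rough sketch of what the OCR step after ScanTailor could look like. It is not necessarily the workflow described above. It assumes the cleaned pages are TIFF files in a single folder and that the `tesseract` command-line tool is installed. The function names, the folder names, and the `eng` language default are made up for illustration. Each page becomes its own searchable PDF through Tesseract's `pdf` output config.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image: Path, out_dir: Path, lang: str = "eng") -> list[str]:
    """Build the Tesseract CLI call that writes a searchable PDF for one page image."""
    # Tesseract takes an output *base* name and appends ".pdf" itself.
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def ocr_pages(in_dir: Path, out_dir: Path, lang: str = "eng") -> list[list[str]]:
    """OCR every *.tif in in_dir (assumed layout) and return the commands used."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cmds = [build_tesseract_cmd(p, out_dir, lang) for p in sorted(in_dir.glob("*.tif"))]
    # Only run the commands if the tesseract binary is actually on PATH.
    if shutil.which("tesseract"):
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # "out" and "ocr" are placeholder folder names; adjust to your project.
    for cmd in ocr_pages(Path("out"), Path("ocr")):
        print(" ".join(cmd))
```

The per-page PDFs still have to be merged into one file with a separate tool, and the recognized text will need proofreading before publication.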