What does HackerNews think of doctr?

Show HN: BetterOCR combines and corrects multiple OCR engines with an LLM | Oct 2023

Yup! But I'm still exploring options. (Any recommendations would be welcome!) Here are some candidates I'm considering:

- https://github.com/mindee/doctr

- https://github.com/open-mmlab/mmocr

- https://github.com/PaddlePaddle/PaddleOCR (honestly I don't know Mandarin so I'm a bit stuck)

- https://github.com/clovaai/donut -- While it's primarily an "OCR-free document understanding transformer," I think it's worth experimenting with. Think I can sort this out by letting the LLM reason through it multiple times (although this will impact performance)

- yesterday got a suggestion to consider https://github.com/kakaobrain/pororo -- don't think development is still active but the results are pretty great on Korean text

OCR at Edge on Cloudflare Constellation | Jul 2023

EasyOCR is a popular project if you are in an environment where you can run Python and PyTorch (https://github.com/JaidedAI/EasyOCR). Other open source projects of note are PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) and docTR (https://github.com/mindee/doctr).

DeepDoctection: Document extraction and analysis using deep learning models | Apr 2023

Last I checked, I saw a grocery bill example using https://github.com/mindee/doctr and it was fairly accurate. Bear in mind that was last year; hopefully it has gotten even better, or there are other libraries by now.

Frog: OCR Tool for Linux | Nov 2022

There's also DocTR which can do text detection and extraction out of the box.

It's command line driven but can display the detected text as an overlay of the document.

https://github.com/mindee/doctr

OCRmyPDF: Add an OCR text layer to scanned PDF file | Jul 2022

If you want to OCR a document image, modern versions of Tesseract can work well. If you last used it a few years ago, recognition has improved since then thanks to a new text recognition algorithm that uses modern (deep learning) techniques. Browser demo using a modern version: https://robertknight.github.io/tesseract-wasm/.

OCR processing typically consists of two major steps: detecting/locating words or lines of text on the page, and recognizing lines of text.

Tesseract's text recognition uses modern methods, but the text detection phase is still based on classical methods involving a lot of heuristics, and you may need to experiment with various configuration variables to get the best results. As a result it can fail to detect text if you present it with something other than a reasonably clean document image.
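To make the detect-then-recognize split concrete, here is a minimal sketch of a heuristic line detector feeding a pluggable recognizer. It is not Tesseract's actual algorithm. It uses a simple row projection profile as a stand-in for the kind of classical heuristic described above, and the names `detect_lines`, `run_ocr`, `min_ink`, and `fake_recognize` are all hypothetical.

```python
from typing import Callable, List, Tuple

# Assumption for this sketch: a page is a binary image, 1 = ink, 0 = background.
Image = List[List[int]]


def detect_lines(image: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Detection step: return (top, bottom) row ranges, bottom exclusive,
    for each run of consecutive rows holding at least `min_ink` ink pixels."""
    spans: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y  # a new text line begins
        elif not has_ink and start is not None:
            spans.append((start, y))  # the current text line ends
            start = None
    if start is not None:
        spans.append((start, len(image)))  # line runs to the bottom edge
    return spans


def run_ocr(image: Image, recognize: Callable[[Image], str]) -> List[str]:
    """Recognition step: crop each detected line and pass it to a recognizer."""
    return [recognize(image[top:bottom]) for top, bottom in detect_lines(image)]


def fake_recognize(crop: Image) -> str:
    # Placeholder recognizer; a real one would be a trained model.
    return f"<line of {len(crop)} rows>"


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],  # a single speck of noise
]

lines_default = detect_lines(page)            # the speck counts as a line
lines_strict = detect_lines(page, min_ink=2)  # the speck is filtered out
texts = run_ocr(page, fake_recognize)
```

The `min_ink` threshold plays the role of a configuration variable. At the default the noise speck is reported as a text line, while a stricter value drops it but could also drop faint real text. That trade-off is why heuristic detection tends to need tuning on anything other than clean scans.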

Doctr (https://github.com/mindee/doctr) is a new package that uses modern methods for both text detection and recognition. It is pretty new, however, and I expect it will take more time and effort to mature.

OCRmyPDF: Add an OCR text layer to scanned PDF file | Jul 2022

DocTR: https://github.com/mindee/doctr

It also has a TensorFlow.js version to run in-browser: https://blog.tensorflow.org/2022/06/ocr-in-browser-using-ten...