What does HackerNews think of PaddleOCR?

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

Language: Python

When I was evaluating options a few months ago I found https://github.com/PaddlePaddle/PaddleOCR to be a very strong contender for my use case (reading product labels), but you'll definitely want to put together some representative docs/images and test a bunch of solutions to see what works for you.
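One way to run that kind of side-by-side test is to score each engine by character error rate on a handful of labelled images. The following is only a rough sketch: `compare_engines`, the engine callables, and the sample file names are hypothetical, and each engine is assumed to be wrapped as a function that takes an image path and returns the recognized text.

```python
from typing import Callable, Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(predicted: str, truth: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not truth:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, truth) / len(truth)


def compare_engines(
    engines: Dict[str, Callable[[str], str]],
    samples: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Return the mean character error rate per engine over (image_path, truth) pairs."""
    scores = {}
    for name, run in engines.items():
        rates = [char_error_rate(run(path), truth) for path, truth in samples]
        scores[name] = sum(rates) / len(rates) if rates else 0.0
    return scores


if __name__ == "__main__":
    # Stand-in engines for illustration; real wrappers would call PaddleOCR, Tesseract, etc.
    fake_engines = {
        "engine_a": lambda path: "whole milk 2%",
        "engine_b": lambda path: "wh0le m1lk 2%",
    }
    labelled = [("label_01.jpg", "whole milk 2%")]  # hypothetical sample
    print(compare_engines(fake_engines, labelled))
```

Lower is better, and a few dozen images that look like your real inputs will usually tell you more than a public benchmark.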
Thanks for sharing! For context this is a demo of PaddleOCR V2 [0] which was released yesterday. You can find their original repo here [1]. We built this demo using Gradio [2] and deployed it on HuggingFace's Spaces [3].

[0]: https://arxiv.org/abs/2109.03144

[1]: https://github.com/PaddlePaddle/PaddleOCR

[2]: https://gradio.app/

[3]: https://huggingface.co/spaces

Demo is cool, but it tells us nothing about this particular OCR.

* GitHub: https://github.com/PaddlePaddle/PaddleOCR

* PyPI: https://pypi.org/project/paddleocr/
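For anyone who wants to poke at the package directly rather than the demo, here is a minimal sketch. It assumes the 2.x-era Python API (`PaddleOCR(use_angle_cls=True, lang=...)` and `ocr.ocr(path, cls=True)`); newer releases may use different arguments and a different result layout, so check the docs for the version you install. `extract_lines`, `read_image`, and the `min_score` threshold are my own hypothetical helpers, not part of the library.

```python
from typing import Any, List, Tuple


def extract_lines(page: Any, min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Flatten one page of 2.x-style results into (text, score) pairs.

    Assumes each entry looks like [box, (text, score)]; a page with no
    detections may come back as None, which yields an empty list here.
    """
    lines = []
    for entry in page or []:
        text, score = entry[1]
        if score >= min_score:
            lines.append((text, float(score)))
    return lines


def read_image(path: str, lang: str = "en") -> List[Tuple[str, float]]:
    """Run detection + recognition on one image (assumes paddleocr 2.x is installed)."""
    # Imported lazily so extract_lines works without paddleocr installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang=lang)  # 2.x-era constructor arguments
    result = ocr.ocr(path, cls=True)  # one result list per input image
    return extract_lines(result[0])
```

Calling `read_image("label.jpg")` on a local file (the path is just a placeholder) should give back the recognized lines with their confidence scores, which is enough to judge this particular OCR on your own images.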

Two alternatives, which are designed for OCR from photos: https://github.com/PaddlePaddle/PaddleOCR/ https://github.com/JaidedAI/EasyOCR/ It's worth trying them if Tesseract isn't giving you good accuracy.
I recently came across CRAFT, which appears to have come out of the ICDAR 2017 Robust Reading Challenge.

It performed better than expected. I only tested a few images so please don't take my word for it.

That led me to PaddleOCR. There is still plenty of room for improvement, but I found it far more convenient for my purposes than messing with Tesseract.

https://github.com/clovaai/CRAFT-pytorch

https://github.com/PaddlePaddle/PaddleOCR

I would recommend https://github.com/PaddlePaddle/PaddleOCR over the default Tesseract. It seems to do a better job these days and uses more modern approaches.