The text detection is lacking in comparison to Google's Vision API. Here is a real-life comparison between Tesseract and Google's Vision API, based on a PDF a user of our website uploaded.
Original text [http://i.imgur.com/CZGhKhn.png]:
> I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well
Google detects [http://i.imgur.com/pSJym1x.png]:
> “ I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well ”
Tesseract detects [http://i.imgur.com/wwbLU6g.png]:
> \ am also a mp pmfesslonzl on Thummack wmcn Is a sue 1m peop‘e \ookmg (or professmna‘ semces We on glg salad P‘ezse see my rewews 1mm my cuems were as weH
Although Google's API is certainly better, Tesseract.js should work similarly if you increase the font size. Screenshots taken on 'retina' devices are around the smallest text it can handle well.
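One rough way to put a number on that gap is the character error rate (CER): the Levenshtein edit distance between an OCR output and the original text, divided by the length of the original (0 means identical). Below is a minimal Python sketch using only the standard library. The three strings are typed in from the quotes above rather than re-run against the images, so treat the result as an estimate. The names `levenshtein` and `cer` are just illustrative helpers, not part of Tesseract or the Vision API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    return levenshtein(reference, hypothesis) / len(reference)


# Transcribed from the quotes above; not re-run against the images.
ORIGINAL = ("I am also a top professional on Thumbtack which is a site for "
            "people looking for professional services like on gig salad. "
            "Please see my reviews from my clients there as well")
GOOGLE = "“ " + ORIGINAL + " ”"  # Google's output only adds the quote marks
TESSERACT = ("\\ am also a mp pmfesslonzl on Thummack wmcn Is a sue 1m "
             "peop‘e \\ookmg (or professmna‘ semces We on glg salad P‘ezse "
             "see my rewews 1mm my cuems were as weH")

if __name__ == "__main__":
    for name, text in [("Google", GOOGLE), ("Tesseract", TESSERACT)]:
        print(f"{name}: CER = {cer(ORIGINAL, text):.1%}")
```

The DP is O(len(a) × len(b)), which is fine for a paragraph. Google's only "errors" here are the four characters of added quotation marks.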
Edit:
A screenshot of the same text at a higher resolution: https://imgur.com/a/W7IGu
Tesseract.js output: https://imgur.com/a/niIfM
> I am also a top professional on thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well
Tesseract.js analysis:
Although Googie's API is certaihiy better,
Tesseract.js should work simiiarly if you
increase the font size.
Screenshots taken
on 'retiha’ devices are around the smailest
text it can handie well.
Edit:
A screenshot of the same text at a higher
resolution:
httgs:[[imgurxomZaN/UGu
Tesseract.js
output: httgs://imguricom[a[hiIfM
This is a neat toy, but not impressive compared to the results from tesseract-ocr/tesseract [0]:

  $ curl -s http://i.imgur.com/uuFhw90.png \
      | tesseract stdin stdout

Although Google's API is certainly better,
Tesseract.js should work similarly if you
increase the font size.
Screenshots taken on 'retina' devices are
around the smallest text it can handle well.
Edit:
A screenshot of the same text at a higher
resolution: https:[ZimguncomlalWHGu
Tesseract.js output:
https:[[imgur.com[a[nilfM
Notice how the Tesseract.js results can't tell n's from h's or l's from i's ("certaihiy", "retiha’", "smailest", "handie"), while tesseract gets those same words right.
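To tally those confusions rather than eyeball them, here is a minimal sketch that aligns the original comment text with the Tesseract.js output using Python's `difflib.SequenceMatcher` and counts each (expected, got) character substitution. Assumptions: the OCR line breaks are joined with single spaces, only the first two sentences are compared (the garbled URLs are left out), and `SequenceMatcher` is a heuristic alignment, not a guaranteed minimal edit script. The `confusions` helper and `PAIRS` data are illustrative names of my own.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusions(reference: str, ocr: str) -> Counter:
    """Count (expected, got) substitutions between a reference text and OCR output."""
    counts: Counter = Counter()
    matcher = SequenceMatcher(None, reference, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # skip equal runs, pure insertions and pure deletions
        expected, got = reference[i1:i2], ocr[j1:j2]
        if len(expected) == len(got):
            # Same-length replacement: pair the characters up one-to-one.
            counts.update(zip(expected, got))
        else:
            # Different lengths: record the two substrings as a single pair.
            counts[(expected, got)] += 1
    return counts


# (original comment text, Tesseract.js output), line breaks joined with spaces.
PAIRS = [
    ("Although Google's API is certainly better, Tesseract.js should work "
     "similarly if you increase the font size.",
     "Although Googie's API is certaihiy better, Tesseract.js should work "
     "simiiarly if you increase the font size."),
    ("Screenshots taken on 'retina' devices are around the smallest text it "
     "can handle well.",
     "Screenshots taken on 'retiha’ devices are around the smailest text it "
     "can handie well."),
]

if __name__ == "__main__":
    total: Counter = Counter()
    for reference, ocr in PAIRS:
        total += confusions(reference, ocr)
    for (expected, got), n in total.most_common():
        print(f"{expected!r} read as {got!r}: {n}")
```

On these two sentences the tally should be dominated by l read as i and n read as h, plus one straight quote read as a curly one.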