pyronite

The text detection is lacking in comparison to Google's Vision API. Here is a real-life comparison between Tesseract and Google's Vision API, based on a PDF a user of our website uploaded.

Original text [http://i.imgur.com/CZGhKhn.png]:

> I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well

Google detects [http://i.imgur.com/pSJym1x.png]:

> “ I am also a top professional on Thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well ”

Tesseract detects [http://i.imgur.com/wwbLU6g.png]:

> \ am also a mp pmfesslonzl on Thummack wmcn Is a sue 1m peop‘e \ookmg (or professmna‘ semces We on glg salad P‘ezse see my rewews 1mm my cuems were as weH

bijection

Although Google's API is certainly better, Tesseract.js should work similarly if you increase the font size. Screenshots taken on 'retina' devices are around the smallest text it can handle well.

Edit:

A screenshot of the same text at a higher resolution: https://imgur.com/a/W7IGu

Tesseract.js output: https://imgur.com/a/niIfM

"I am also a top professional on thumbtack which is a site for people looking for professional services like on gig salad. Please see my reviews from my clients there as well"

jaytaylor

Your comment (zoomed in Chrome on Win 10): http://i.imgur.com/uuFhw90.png

Tesseract.js analysis:

    Although Googie's API is certaihiy better,
    Tesseract.js should work simiiarly if you
    increase the font size.
    Screenshots taken
    on 'retiha’ devices are around the smailest
    text it can handie well.
    
    Edit:
    
    A screenshot of the same text at a higher
    resolution:
    httgs:[[imgurxomZaN/UGu
    
    Tesseract.js
    output: httgs://imguricom[a[hiIfM

This is a neat toy, but not impressive compared to the results from tesseract-ocr/tesseract [0]:

    $  curl -s http://i.imgur.com/uuFhw90.png \
        | tesseract stdin stdout

    Although Google's API is certainly better,
    Tesseract.js should work similarly if you
    increase the font size.
    Screenshots taken on 'retina' devices are
    around the smallest text it can handle well.
    
    Edit:
    A screenshot of the same text at a higher
    resolution: https:[ZimguncomlalWHGu
    Tesseract.js output:
    https:[[imgur.com[a[nilfM

Notice how Tesseract.js results suffer from being unable to differentiate between n's and h's, i's and l's.

[0] https://github.com/tesseract-ocr/tesseract