Maybe compare to some of the free, non-hosted options?

Like where are we with OCR? Last I checked it was CTC magic. Any progress?

Tesseract[0] is the classic example. There's a bunch of advice for improving your accuracy with it, like making your images larger (literally just scale them up 2x or 4x).
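For anyone unsure what "just scale it up" means mechanically, here's a rough pure-Python sketch of integer nearest-neighbor upscaling on a grid of grayscale values. The function name `upscale_nearest` is made up for illustration; in practice you'd presumably let an imaging library do the resize (with a smoother interpolation) before handing the image to the OCR engine.

```python
def upscale_nearest(pixels, factor):
    """Upscale a 2D grid of grayscale values by an integer factor.

    Each source pixel becomes a factor x factor block (nearest neighbor).
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    scaled = []
    for row in pixels:
        # Repeat every value horizontally.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row vertically, copying so rows stay independent.
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


# A 2x2 checkerboard doubled to 4x4.
tiny = [[0, 255],
        [255, 0]]
doubled = upscale_nearest(tiny, 2)
```

Nearest neighbor is only the simplest option; whether it helps or hurts compared to other interpolation methods is exactly the kind of thing you'd want to measure.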

It would be interesting to see the benchmark from the article repeated with different scaling options (or other preprocessing, depending on the platform).
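A rough sketch of what such a rerun could look like, assuming you score by character error rate (edit distance divided by ground-truth length). I don't know what metric the article used, so that part is my assumption. `benchmark_scales`, `ocr`, and `rescale` are hypothetical names: you'd plug in whatever engine and resize routine you're testing.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def char_error_rate(predicted, truth):
    """Edit distance normalized by the length of the ground truth."""
    if not truth:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, truth) / len(truth)


def benchmark_scales(samples, ocr, rescale, factors=(1, 2, 4)):
    """Mean character error rate per scale factor.

    samples: list of (image, ground_truth_text) pairs
    ocr:     callable image -> str (hypothetical; wrap your engine here)
    rescale: callable (image, factor) -> image (hypothetical)
    """
    if not samples:
        raise ValueError("need at least one sample")
    results = {}
    for factor in factors:
        rates = [char_error_rate(ocr(rescale(image, factor)), truth)
                 for image, truth in samples]
        results[factor] = sum(rates) / len(rates)
    return results
```

Passing the engine and the resize step in as callables means the same loop works for Tesseract or anything else, and swapping in other preprocessing is just a different `rescale`.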
