For me, Evernote’s killer feature is automatic OCR. (So that I can just upload a doc through my phone’s camera.)

It looks like this doesn’t have that, unfortunately.

Noted! I'll see if this feature can be added.

https://www.elastic.co/guide/en/elasticsearch/plugins/curren... :

> [The Elasticsearch Core Ingest Attachment Processor Plugin]: The ingest attachment plugin lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by using the Apache text extraction library Tika.

> The source field must be a base64 encoded binary. If you do not want to incur the overhead of converting back and forth between base64, you can use the CBOR format instead of JSON and specify the field as a bytes array instead of a string representation. The processor will skip the base64 decoding then.
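
As a rough illustration of the base64 requirement quoted above, here is a minimal Python sketch that only builds the JSON request bodies and sends nothing. The field name `data`, the helper names, and the sample bytes are assumptions for illustration. Whether OCR on images actually runs depends on how Tika and Tesseract are set up on the Elasticsearch side, which this does not cover.

```python
import base64
import json


def build_attachment_pipeline(field: str = "data") -> dict:
    """Body for an ingest pipeline with one attachment processor.

    `field` (assumed name) is the document field holding the base64 payload.
    """
    return {
        "description": "Extract attachment text",
        "processors": [{"attachment": {"field": field}}],
    }


def build_document(raw: bytes, field: str = "data") -> dict:
    """Body for one document: the raw file bytes, base64-encoded as a JSON string."""
    return {field: base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    # Hypothetical file contents; in practice these would be read from disk.
    sample = b"hypothetical file bytes"
    print(json.dumps(build_attachment_pipeline(), indent=2))
    print(json.dumps(build_document(sample)))
```

The pipeline body would be sent when creating the ingest pipeline, and the document body when indexing through that pipeline; the HTTP calls are left out on purpose.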

Apache Tika supported formats > Images > TesseractOCR: https://tika.apache.org/2.4.0/formats.html https://tika.apache.org/2.4.0/formats.html#Image_formats :

> When extracting from images, it is also possible to chain in Tesseract, via the TesseractOCRParser, to have OCR performed on the contents of the image.

/? Meilisearch "ocr" GitHub;

Looks like e.g. paperbase (AGPL) also implements OCR with Tesseract OCR: https://docs.paperbase.app/

tesseract-ocr/tesseract https://github.com/tesseract-ocr/tesseract
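
For the "upload a photo, get searchable text" workflow, a minimal sketch of calling the Tesseract command-line tool from Python might look like the following. It assumes the `tesseract` binary is installed and on `PATH`; the function names and the `eng` default are illustrative choices, not part of any of the projects linked here.

```python
import shutil
import subprocess
from typing import List, Optional


def tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build the CLI invocation; `stdout` as the output base prints the text
    instead of writing it to a file."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> Optional[str]:
    """Return the recognized text, or None if tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails on the image
    )
    return result.stdout
```

The returned text could then be stored in whatever field the search index uses for full-text queries.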

/? https://github.com/awesome-selfhosted/awesome-selfhosted#sea... ctrl-f "ocr"