I literally used an OCR tool to grab the text directly out of the first box. I think this is meant to be guarding against copy/pasting—not OCR.

So this is interesting. I guess I didn't realize that there are (common?) tools to OCR screenshots. And to that end, there probably isn't a whole lot I can do to stop it. But when you're looking at a huge tax return, or sworn testimony, or just a dump of 3000 emails, you're not gonna screenshot each one. You're going to want to automate the OCR, which most PDF readers (at least the commercial ones) will let you do. That is the type of OCR my app is resistant to. Those tools look for image data within the PDF and OCR that. They bypass my text because, to the PDF reader, it is already in a text format.
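To make that concrete, here is a rough Python sketch of the kind of page-selection rule I'm describing. It is only a toy model: `Page`, `has_text_layer`, `image_count`, and `pages_to_ocr` are names I made up for illustration, and real PDF readers may decide differently.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical, simplified view of a PDF page.
    has_text_layer: bool  # page already contains text objects
    image_count: int      # number of embedded raster images


def pages_to_ocr(pages, force=False):
    """Return the indices of pages a batch OCR pass would process.

    Assumed rule: a page is OCR'd only if it has image data and no
    existing text, unless the caller forces OCR on every page.
    """
    selected = []
    for index, page in enumerate(pages):
        if force or (not page.has_text_layer and page.image_count > 0):
            selected.append(index)
    return selected


doc = [
    Page(has_text_layer=True, image_count=0),   # protected page: already "text" to the reader
    Page(has_text_layer=False, image_count=1),  # ordinary scanned page
]

print(pages_to_ocr(doc))              # [1]
print(pages_to_ocr(doc, force=True))  # [0, 1]
```

Under that rule the protected page is skipped, and only a tool that forces OCR on everything picks it up.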

I'm 1000% sure there are gurus who could whip up a script to overcome this. But it's kind of one of those things where you don't have to outrun the bear, you just have to outrun the friend running next to you. It makes your sensitive documents just that much less likely to be scanned/found.

If it's on the dark web then they probably know how to use 'ocrmypdf' as well (which uses tesseract under the hood).

https://github.com/ocrmypdf/OCRmyPDF
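
For anyone curious, a minimal sketch of how that might be scripted. The file names and the `build_ocrmypdf_command` helper are made up for illustration. `--force-ocr` and `--skip-text` are documented OCRmyPDF options: the first rasterizes every page and OCRs it even if text is already present, the second skips pages that already contain text. Whether forcing OCR actually recovers the text from this particular app's output is untested here.

```python
import shlex


def build_ocrmypdf_command(input_pdf, output_pdf, force=True):
    """Build an ocrmypdf command line (hypothetical helper).

    force=True  -> --force-ocr: rasterize and OCR every page
    force=False -> --skip-text: leave pages that already contain text alone
    """
    mode = "--force-ocr" if force else "--skip-text"
    return ["ocrmypdf", mode, input_pdf, output_pdf]


cmd = build_ocrmypdf_command("protected.pdf", "ocr_output.pdf")
print(shlex.join(cmd))  # ocrmypdf --force-ocr protected.pdf ocr_output.pdf

# To actually run it (requires ocrmypdf and tesseract to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```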