Using Pdfsandwich in college was like having a superpower. We were often given PDFs that contained only image data, with no text layer. While my peers were still scrolling through and copying quotes by hand, I could find a passage with Ctrl-F and copy/paste it in seconds.

Once the PDF has a text layer, you can run any sort of text analysis tool on it. You can convert it to plain text and grep through it, or do anything else you want.

That said, it's not perfect, but it's still pretty awesome. Sometimes the spacing was off, or it would confuse similar-looking characters like 1, I, and l. But these issues were minor and usually only showed up on poorly scanned PDFs.
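One way to work around the 1/I/l mix-ups when searching is a tolerant pattern that treats those characters as interchangeable. This is a minimal sketch of that idea; the `tolerant_pattern` helper and the choice of confusable set are hypothetical, not part of Pdfsandwich or any other tool mentioned here.

```python
import re

# Hypothetical set of characters that OCR tends to confuse (1, I, l).
CONFUSABLES = "1Il"


def tolerant_pattern(query: str) -> "re.Pattern[str]":
    """Compile a regex where each confusable character matches any of the others."""
    parts = []
    for ch in query:
        if ch in CONFUSABLES:
            # Any confusable character in the query matches the whole set.
            parts.append("[" + re.escape(CONFUSABLES) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


# "Illinois" still matches an OCR result that read the first "I" as "1".
print(bool(tolerant_pattern("Illinois").search("1llinois was scanned badly")))
```

The match is case-sensitive on purpose, so only the listed characters are widened; anything else must match exactly.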

On an even more macro level I've had a great experience with ripgrep-all[0], which uses Tesseract internally.

For example, I have a directory with all the weekly lecture slides for one course, and I can find exactly where (both file and page) we learned something related to photosynthesis with `rga photosynthesis`.
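To illustrate what a file-and-page search amounts to once the text has been extracted, here is a small sketch. It is not how ripgrep-all works internally; it assumes each document's text is already available as a string with pages separated by form feeds (the convention `pdftotext` uses), and the file names are made up.

```python
from typing import Dict, Iterator, Tuple


def search_pages(docs: Dict[str, str], term: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file, 1-based page, line) for every line containing term, case-insensitively."""
    needle = term.lower()
    for name, text in docs.items():
        # Assumption: pages are separated by form feed characters ("\f").
        for page_no, page in enumerate(text.split("\f"), start=1):
            for line in page.splitlines():
                if needle in line.lower():
                    yield name, page_no, line.strip()


# Hypothetical extracted slide text for two weeks of a course.
docs = {
    "week03.pdf": "Cell structure\fPhotosynthesis converts light\fSummary",
    "week05.pdf": "Respiration\fLinks to photosynthesis",
}
for hit in search_pages(docs, "photosynthesis"):
    print(hit)
```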

[0]: https://github.com/phiresky/ripgrep-all