Are there any open source tools that would slurp in content like this and develop its own sense of relationships in the data, that I could then explore by hand?
Bedarra's Text Analyzer[1] kinda floored me and I'd like to use something similar for various tasks, if there was something good and free.
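One common way a tool can "develop its own sense of relationships" in a pile of text is to weight terms with TF-IDF and compare documents by cosine similarity. Below is a minimal pure-Python sketch of that technique. It is only an illustration: it says nothing about how Bedarra's Text Analyzer works internally, and the function names (`tfidf_vectors`, `most_related`) and the naive letters-only tokenizer are assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): lowercase, keep runs of letters.
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighted by tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Terms that occur in every document get weight 0 (log(1) == 0).
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_related(docs):
    """For each document, return the index of the most similar other document."""
    vecs = tfidf_vectors(docs)
    result = []
    for i, v in enumerate(vecs):
        scores = [(cosine(v, w), j) for j, w in enumerate(vecs) if j != i]
        result.append(max(scores)[1])
    return result

docs = [
    "tesseract performs optical character recognition on scanned images",
    "solr indexes documents for full text search",
    "scanned images can be converted to text with optical character recognition",
]
print(most_related(docs))
```

Here the first and third documents end up paired because they share several terms that the second document lacks. Real tools layer much more on top (stemming, phrase detection, topic models, clustering), but this similarity graph is the kind of structure you could then browse by hand.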
> Are there any open source tools that would slurp in content like this ...
Yes, tesseract[1] can do a pretty good job at OCR. Here[2] is a blog post that describes using it to OCR PDFs.
As for searching the PDF contents, Solr[3] might be what you are looking for instead.
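Solr is built on Lucene, which answers searches from an inverted index: a map from each term to the documents that contain it. The toy sketch below only illustrates that idea and is not Solr's API. It assumes the text has already been pulled out of the PDFs (e.g. via OCR), and the page ids like `report.pdf#1` are hypothetical.

```python
import re
from collections import defaultdict

def terms_of(text):
    # Hypothetical tokenizer for this sketch: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in terms_of(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = terms_of(query)
    if not terms:
        return set()
    # .get() does not insert missing keys into the defaultdict.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

pages = {
    "report.pdf#1": "Quarterly revenue grew in Europe",
    "report.pdf#2": "Revenue fell in Asia",
    "memo.pdf#1": "Europe office relocation",
}
idx = build_index(pages)
print(search(idx, "revenue europe"))
```

A real engine adds ranking, stemming, and on-disk storage, which is what Solr provides out of the box.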
1 - https://github.com/tesseract-ocr/tesseract
2 - http://fransdejonge.com/2012/04/ocr-text-in-pdf-with-tessera...
3 - http://stackoverflow.com/questions/6694327/indexing-pdf-with...