Are there any open source tools that would slurp in content like this and develop its own sense of relationships in the data, that I could then explore by hand?

Bedarra's Text Analyzer[1] kinda floored me, and I'd like to use something similar for various tasks if there's something good and free.


> Are there any open source tools that would slurp in content like this ...

Yes, Tesseract[1] can do a pretty good job. Here[2] is a blog post that describes using it to perform OCR on PDFs.
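
For a rough idea of what that looks like in practice, here is a minimal Python sketch (not taken from the linked post). It assumes the `pdftoppm` tool from Poppler and the `tesseract` CLI are both on your PATH, since Tesseract wants page images rather than a PDF. The function names, the 300 DPI default, and the working directory layout are my own choices for illustration.

```python
import subprocess
from pathlib import Path


def rasterize_command(pdf_path, work_dir, dpi=300):
    """Build the pdftoppm call that writes one PNG per page into work_dir."""
    pdf = Path(pdf_path)
    prefix = Path(work_dir) / pdf.stem  # pdftoppm appends "-<page>.png"
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def tesseract_command(image_path, lang="eng"):
    """Build the tesseract call for one page image."""
    image = Path(image_path)
    # tesseract takes an output *base* and adds ".txt" itself
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_pdf(pdf_path, work_dir, lang="eng"):
    """Rasterize a PDF, OCR each page, and return the joined text."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf_path, work), check=True)
    pages = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order
    for image in sorted(work.glob(Path(pdf_path).stem + "-*.png")):
        subprocess.run(tesseract_command(image, lang), check=True)
        pages.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Something like `text = ocr_pdf("scan.pdf", "ocr_out")` would then give you plain text to feed into whatever you use for search or analysis. OCR quality depends a lot on scan resolution, so the DPI is worth experimenting with.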

As for searching the PDF contents, Solr[3] might be what you are looking for instead.
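
If you go the Solr route, querying it is just an HTTP GET against the core's `/select` handler. A small sketch, assuming a local Solr on the default port 8983; the core name `documents` and the field name `text` are hypothetical and depend on how you set up your index.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(query, core="documents",
                    base="http://localhost:8983/solr", rows=10):
    """Build a /select URL; core and base are assumptions for illustration."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def solr_search(query, **kwargs):
    """Run the query and return the list of matching documents."""
    with urlopen(solr_select_url(query, **kwargs)) as response:
        payload = json.load(response)
    return payload["response"]["docs"]
```

For example, `solr_search("text:invoice")` would return the first ten documents whose (hypothetical) `text` field matches "invoice", provided the OCR output was indexed into that field.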

1 -

2 -
