ddumas

This looks nice. What I'd really like to see, along these lines, is a Python library for automated document metadata extraction with confidence assessment, like this:

./autometa.py --author --verbose academic-paper.pdf

Author: "Edward Witten"
Confidence: High (matches template "amslatex")
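For what it's worth, here's a rough sketch of what the core of such an autometa.py might look like. It's hypothetical throughout: it assumes the PDF has already been turned into plain text by some other tool, and the template names ("byline", "author-field") and their regexes are made up for illustration. A real tool would need many per-publisher templates like the "amslatex" one above. The idea is simply that each template carries its own confidence level, and a loose fallback heuristic gets "Low".

```python
#!/usr/bin/env python3
"""autometa.py: hypothetical sketch of author extraction with a confidence label.

Works on plain text already extracted from the PDF. The template names and
regular expressions are invented for illustration, not taken from any library.
"""
import argparse
import re
import sys

# Hypothetical templates, tried in order: (name, pattern, confidence on match).
TEMPLATES = [
    ("byline",
     re.compile(r"^[ \t]*by[ \t]+([A-Z][\w.\-]*(?:[ \t]+[A-Z][\w.\-]*)+)[ \t]*$", re.M),
     "High"),
    ("author-field",
     re.compile(r"^[ \t]*Authors?:[ \t]*(.+?)[ \t]*$", re.M | re.I),
     "Medium"),
]


def extract_author(text):
    """Return (author, confidence, template_name); all None if nothing is found."""
    for name, pattern, confidence in TEMPLATES:
        match = pattern.search(text)
        if match:
            return match.group(1), confidence, name
    # Fallback guess: first of the opening 10 lines made of 2-4 capitalized words.
    for line in text.splitlines()[:10]:
        words = line.split()
        if 2 <= len(words) <= 4 and all(w[0].isupper() for w in words):
            return line.strip(), "Low", None
    return None, None, None


def main(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a metadata extractor.")
    parser.add_argument("--author", action="store_true", help="extract the author")
    parser.add_argument("--verbose", action="store_true", help="also print confidence")
    parser.add_argument("textfile", help="plain text previously extracted from the PDF")
    args = parser.parse_args(argv)
    with open(args.textfile, encoding="utf-8") as fh:
        text = fh.read()
    if args.author:
        author, confidence, template = extract_author(text)
        if author is None:
            print("Author: (not found)")
        else:
            print(f'Author: "{author}"')
            if args.verbose:
                note = f' (matches template "{template}")' if template else ""
                print(f"Confidence: {confidence}{note}")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:  # run as a CLI only when given arguments
    sys.exit(main())
```

Usage would be something like ./autometa.py --author --verbose academic-paper.txt, after converting the PDF with e.g. pdftotext. The fallback will happily pick up a short title line as the author, which is exactly why it only ever reports "Low".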

kalkin

As far as I have been able to tell, the public state of the art in academic paper metadata parsing is Grobid: https://github.com/kermitt2/grobid

Not quite as simple a command-line interface as you suggest, but not too hard to set up, and pretty impressive. Now if only Google Scholar would open-source whatever they use...
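To get a bit closer to the one-liner ddumas asked for, a thin wrapper around a running Grobid server might look like the sketch below. Assumptions, so check the Grobid docs for your version: the server runs locally on its default port 8070, the /api/processHeaderDocument endpoint takes the PDF as a multipart form field named "input" and returns TEI XML, and authors appear under analytic/author/persName in that TEI. The script name grobid_authors.py is just a placeholder, and it makes no attempt at the confidence part.

```python
"""Sketch: print author names from a Grobid header parse.

Assumes a Grobid server is already running locally on its default port (8070)
and exposes /api/processHeaderDocument, which takes the PDF as the multipart
form field "input" and returns TEI XML. Check the Grobid docs for your version.
"""
import os
import sys
import urllib.request
import uuid
import xml.etree.ElementTree as ET

GROBID_URL = "http://localhost:8070/api/processHeaderDocument"  # assumed local server
TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
NAME_TAGS = {f"{{{TEI_NS}}}forename", f"{{{TEI_NS}}}surname"}


def process_header(pdf_path, url=GROBID_URL):
    """POST the PDF as a multipart upload and return Grobid's TEI XML as bytes."""
    boundary = uuid.uuid4().hex
    with open(pdf_path, "rb") as fh:
        pdf_bytes = fh.read()
    filename = os.path.basename(pdf_path)
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    request = urllib.request.Request(
        url,
        data=head + pdf_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def authors_from_tei(tei_xml):
    """Join the forename/surname elements of each author in the TEI analytic block."""
    root = ET.fromstring(tei_xml)
    names = []
    for pers in root.iterfind(".//tei:analytic/tei:author/tei:persName", NS):
        parts = [el.text.strip() for el in pers if el.tag in NAME_TAGS and el.text]
        if parts:
            names.append(" ".join(parts))
    return names


if __name__ == "__main__" and len(sys.argv) > 1:  # needs a PDF path argument
    for name in authors_from_tei(process_header(sys.argv[1])):
        print(f'Author: "{name}"')
```

Run it as python grobid_authors.py academic-paper.pdf. Splitting the HTTP call from the TEI parsing means you can test the parsing on saved Grobid output without a server.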