What does HackerNews think of grobid?
A machine learning software for extracting information from scholarly documents
Language: Java
#33 in Deep learning
#34 in Machine learning
Can you elaborate on how you parse the PDF? Are you simply converting it to text with a Python library, or using something more robust like GROBID[1]?
For academic papers: GROBID [0] is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured XML/TEI encoded documents with a particular focus on technical and scientific publications.
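To make the "structured XML/TEI" part concrete, here is a minimal Python sketch of reading a title and author list out of a TEI header. The sample document is hand-written for illustration and only mimics the general shape of TEI (the `http://www.tei-c.org/ns/1.0` namespace, `titleStmt`, `analytic/author/persName`); it is not actual GROBID output, and `parse_header` is a hypothetical helper name, not part of GROBID.

```python
import xml.etree.ElementTree as ET

# TEI documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample for illustration only (assumption: real output
# from a PDF would follow the same general layout, with more detail).
SAMPLE_TEI = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">An Example Paper Title</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName><forename type="first">Jane</forename><surname>Doe</surname></persName>
            </author>
            <author>
              <persName><forename type="first">John</forename><surname>Smith</surname></persName>
            </author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>
"""


def parse_header(tei_xml: str) -> dict:
    """Return the title and author names found in a TEI header string."""
    root = ET.fromstring(tei_xml.strip())

    # The main title sits under titleStmt in the file description.
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = title_el.text.strip() if title_el is not None and title_el.text else None

    # Each author is a persName with optional forename and surname children.
    authors = []
    for pers in root.findall(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        parts = [
            pers.findtext("tei:forename", default="", namespaces=TEI_NS),
            pers.findtext("tei:surname", default="", namespaces=TEI_NS),
        ]
        name = " ".join(p.strip() for p in parts if p and p.strip())
        if name:
            authors.append(name)

    return {"title": title, "authors": authors}


if __name__ == "__main__":
    print(parse_header(SAMPLE_TEI))
```

The practical difference from plain text extraction is that fields such as title and authors arrive as addressable elements rather than having to be guessed from layout.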
As far as I have been able to tell, the public state of the art in academic paper metadata parsing is Grobid: https://github.com/kermitt2/grobid
Not quite as simple a command-line interface as you suggest, but not too hard to set up, and pretty impressive. Now if only Google Scholar would open-source whatever they use...