This looks awesome, and really useful.

A few weeks ago I asked on Hacker News, "I'm in the middle of a graduate degree and am reading lots of papers. How could I get ChatGPT to use my whole library as context when answering questions?"

And I was told, basically, "It's really easy! First you just extract all of the text from the PDFs into arxiv, parse it to separate content from style, then store that in a DuckDB database with zstd compression, then just use some encoder model to process all of these texts into a Qdrant database. Then use Vicuna or Guanaco 30b GPTQ, with LangChain, and....."

I was like, ok... guess I won't be asking ChatGPT where I can find which paper talked about which thing after all.

https://github.com/whitead/paper-qa

>This is a minimal package for doing question and answering from PDFs or text files (which can be raw HTML). It strives to give very good answers, with no hallucinations, by grounding responses with in-text citations.