These days, my mind immediately jumps to "what if you plugged all those books into GPT-2!?"

mek here from Internet Archive's openlibrary.org project. We've been in broad talks w/ folks like OpenAI about how the contents of texts may be used to power better discovery and to increase the usefulness of books. Open Library is pretty far from GPT-2, but we do have full-text search across ~3.5M books: http://openlibrary.org/search/inside

We're also an open source project [https://github.com/internetarchive/openlibrary] and happy to collaborate w/ folks on such projects. I'm personally very inspired by the Booklamp project [https://techcrunch.com/2014/07/25/apple-booklamp/]: building a genome for every book and surfacing as much content as we can to help patrons discover citations, quotes, and other useful content which would inform their reading choices and would otherwise be completely inaccessible behind a borrow.

If anyone is interested in helping us move the needle on such an effort, please do get in touch and we'd be glad to invite you to Open Library's Slack channel.