What does HackerNews think of aitextgen?

A robust Python tool for text-based AI training and generation using GPT-2.

Language: Python

To train small GPT-like models, there's also aitextgen: https://github.com/minimaxir/aitextgen
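
A minimal sketch of that small-model workflow, following aitextgen's documented from-scratch API (the corpus file input.txt is a placeholder):

    from aitextgen import aitextgen
    from aitextgen.tokenizers import train_tokenizer
    from aitextgen.utils import GPT2ConfigCPU

    # Train a new tokenizer on the corpus; writes aitextgen.tokenizer.json.
    train_tokenizer("input.txt")

    # GPT2ConfigCPU() is a deliberately tiny GPT-2 architecture,
    # small enough to train from scratch without a GPU.
    ai = aitextgen(tokenizer_file="aitextgen.tokenizer.json", config=GPT2ConfigCPU())

    # Train on the corpus, printing sample generations periodically.
    ai.train("input.txt", num_steps=5000, generate_every=1000)
    ai.generate(prompt="ROMEO:")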
> I mean this as a person who is actually studying this technology

I literally publish open-source packages on how to use this technology.

https://github.com/minimaxir/gpt-2-simple

https://github.com/minimaxir/aitextgen
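
For example, fine-tuning the 124M GPT-2 with gpt-2-simple takes only a few lines (a sketch following the package's documented workflow; shakespeare.txt is a placeholder corpus):

    import gpt_2_simple as gpt2

    # Download the pretrained 124M GPT-2 checkpoint.
    gpt2.download_gpt2(model_name="124M")

    # Fine-tune on a plain-text corpus, then sample from the tuned model.
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset="shakespeare.txt", model_name="124M", steps=1000)
    gpt2.generate(sess)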

Hey HN! I've been lurking for a while now and I've finally created something that I feel is worth sharing.

I've called this project "Tensorpedia." At its core, Tensorpedia takes in a title and uses it as a prompt for GPT-2 to synthesize the introduction of a Wikipedia article. The machine learning side is built with a wonderful library called aitextgen [0], trained on Wikipedia's "Vital Articles" as the dataset [1]. The server is written in Node, and it uses Redis as an article cache. If you want to read my article about it (for some reason), you can check it out here [2].
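
Tensorpedia's own code isn't reproduced here, but the title-as-prompt step might look roughly like this with aitextgen (the model folder and sampling parameters are assumptions):

    from aitextgen import aitextgen

    # Load a GPT-2 model previously fine-tuned on Vital Articles intros;
    # "trained_model" is a placeholder folder name.
    ai = aitextgen(model_folder="trained_model")

    # Use the requested article title as the prompt and return one intro.
    intro = ai.generate_one(prompt="Photosynthesis", max_length=300, temperature=0.9)
    print(intro)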

I created this project to get more experience with server technologies. While I wouldn't say it's a complicated application, I learned quite a lot from it.

Additionally, this project was inspired by all of those this-x-doesn't-exist projects from a while back, so it's mostly for fun. I don't know how much practical use it has, but I've generated some pretty hilarious articles with it.

[0] https://github.com/minimaxir/aitextgen

[1] https://en.wikipedia.org/wiki/Wikipedia:Vital_articles/Level...

[2] https://jonahsussman.net/posts/2022-01-this-wiki-dne/

AI text content generation is indeed a legitimate industry that's still in its nascent stages. That's why I've spent a lot of time working with it and building tools for fully custom text generation models (https://github.com/minimaxir/aitextgen).

However, there are tradeoffs at the moment. In the case of GPT-3, they are cost and the risk of brushing against OpenAI's Content Guidelines.

There's also the surprisingly underdiscussed copyright risk around generated content. OpenAI won't enforce its own copyright, but it's possible for GPT-3 to output existing content verbatim, which is a massive legal liability. (It's half the reason I'm researching custom models trained entirely on copyright-safe content.)
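
One naive way to screen for that liability (an illustration only, not something from the linked projects) is an n-gram overlap check between generated text and a reference corpus:

    def ngrams(text, n=8):
        """All n-word shingles in the text."""
        words = text.split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def verbatim_overlap(generated, corpus, n=8):
        """n-grams of the generated text that appear verbatim in the corpus."""
        return ngrams(generated, n) & ngrams(corpus, n)

    # Flag any 8-word spans of model output that also appear in a source text;
    # generated.txt and corpus.txt are placeholder file names.
    flagged = verbatim_overlap(open("generated.txt").read(), open("corpus.txt").read())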