What does HackerNews think of nanoGPT?
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Is there a toy conversational LLM on GitHub or elsewhere?
Something like: https://github.com/karpathy/nanoGPT
In particular, you'll probably want to skip to nanoGPT (https://github.com/karpathy/nanoGPT); then, if you are interested in a bit more of the theory, Zero to Hero (https://karpathy.ai/zero-to-hero.html) and his comments in one of the linked threads: https://news.ycombinator.com/item?id=34414716
Fine-tuning may also be a faster and better place to start; this is a good guide to fine-tuning some publicly released LLMs: https://erichartford.com/uncensored-models
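For a rough sense of what fine-tuning looks like in code, here is a minimal sketch using a small public GPT-2 checkpoint via Hugging Face transformers. The model name, toy data, and hyperparameters are illustrative assumptions, not taken from the guide above.

```python
# Minimal fine-tuning sketch (assumptions: GPT-2 small from Hugging Face,
# a tiny in-memory text list standing in for a real dataset).
import torch
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

texts = ["an example fine-tuning sentence.", "another example sentence."]
batch = tokenizer(texts, return_tensors="pt", padding=True)

labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # ignore padding positions in the loss

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
for step in range(3):  # a handful of steps, just to show the loop shape
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=labels)  # causal LM loss against the shifted inputs
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {out.loss.item():.3f}")
```

A real run would swap the toy list for a tokenized dataset and add batching, but the loop shape stays the same.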
Also, the way he builds everything up is magnificent: starting from basic Python classes, to derivatives and gradient descent, to micrograd [2], and then from a bigram counting model [3] to makemore [4] and nanoGPT [5] (a rough sketch of the bigram counting idea follows after the references below).
[1]: https://www.foundersandcoders.com/ml
[2]: https://github.com/karpathy/micrograd
[3]: https://github.com/karpathy/randomfun/blob/master/lectures/m...
[4]: https://github.com/karpathy/makemore
[5]: https://github.com/karpathy/nanoGPT
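As a rough illustration of that bigram counting idea (an assumed sketch, not the code from the lectures): count how often each character follows each other character, normalize the counts into probabilities, and sample the next character from the resulting table.

```python
# Sketch of a character-level bigram counting model (illustrative only):
# count next-character frequencies, normalize into probabilities, then sample.
import torch

words = ["emma", "olivia", "ava", "isabella", "sophia"]  # placeholder data
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0  # '.' marks both the start and the end of a word
itos = {i: ch for ch, i in stoi.items()}

N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    seq = ["."] + list(w) + ["."]
    for a, b in zip(seq, seq[1:]):
        N[stoi[a], stoi[b]] += 1  # count: character b follows character a

P = (N + 1).float()               # add-one smoothing so no bigram has zero probability
P = P / P.sum(dim=1, keepdim=True)

g = torch.Generator().manual_seed(42)
ix = 0
out = []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:  # sampled the end-of-word marker
        break
    out.append(itos[ix])
print("".join(out))
```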
I think the barrier to entry here is low. OpenAI is ahead now, but I doubt that lead lasts forever.
Worth checking out: https://github.com/karpathy/nanoGPT. The associated videos go into this more; IIRC, at the end he said the real GPT-2 would take 5-20 seconds to generate a small amount of text using CPU only.
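To get a feel for that kind of CPU-only sampling, here is a small timing sketch that generates a short completion from the pretrained GPT-2 checkpoint via Hugging Face transformers. This is an assumed stand-in for illustration; the comment above refers to nanoGPT's own sampling script.

```python
# Rough sketch: time CPU-only generation of a short completion from GPT-2.
import time
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = torch.device("cpu")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

start = time.time()
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50,
                                do_sample=True, top_k=50,
                                pad_token_id=tokenizer.eos_token_id)
elapsed = time.time() - start

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
print(f"generated 50 tokens on CPU in {elapsed:.1f}s")
```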