What does HackerNews think of nanoGPT?

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python

It would be interesting to know: (1) why a Lamini account is required, and (2) how this compares to https://github.com/karpathy/nanoGPT.
Thanks!

Is there a toy conversational LLM on GitHub or elsewhere?

Something like: https://github.com/karpathy/nanoGPT

I compiled this list - https://gist.github.com/TikkunCreation/5de1df7b24800cc05b482...

In particular, you'll probably want to skip straight to nanoGPT (https://github.com/karpathy/nanoGPT); then, if you're interested in a bit more of the theory, Zero to Hero (https://karpathy.ai/zero-to-hero.html) and his comments in one of the linked threads: https://news.ycombinator.com/item?id=34414716

Fine-tuning may also be a faster and better place to start; this is a good guide to fine-tuning some publicly released LLMs: https://erichartford.com/uncensored-models
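
For the fine-tuning route, here is a minimal sketch assuming the Hugging Face transformers and datasets libraries (not necessarily the tooling that guide uses); the "gpt2" checkpoint and the "train.txt" corpus file are placeholders you would swap for your own choices:

    # Minimal causal-LM fine-tuning sketch (Hugging Face transformers/datasets assumed).
    from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)
    from datasets import load_dataset

    model_name = "gpt2"  # placeholder; use whichever released model you want to tune
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # "train.txt" is a hypothetical plain-text corpus you supply yourself.
    ds = load_dataset("text", data_files={"train": "train.txt"})["train"]
    ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=4),
        train_dataset=ds,
        # mlm=False means plain next-token (causal) language modeling labels.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()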

I'm doing an ML apprenticeship [1] at the moment and Karpathy's videos are part of it. We've gone deep into them, and I found them excellent. Every concept he illustrates is crystal clear in his mind (even when the concepts themselves are complicated), and that shows in his explanations.

Also, the way he builds everything up is magnificent: starting from basic Python classes, to derivatives and gradient descent, to micrograd [2], and then from a bigram counting model [3] to makemore [4] and nanoGPT [5] (a minimal bigram-counting sketch follows the links below).

[1]: https://www.foundersandcoders.com/ml

[2]: https://github.com/karpathy/micrograd

[3]: https://github.com/karpathy/randomfun/blob/master/lectures/m...

[4]: https://github.com/karpathy/makemore

[5]: https://github.com/karpathy/nanoGPT
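
As an illustration of the very first step in that progression, here is a minimal sketch of a bigram counting model in plain Python; the "names.txt" corpus (one word per line, as in makemore) is assumed, and this is not Karpathy's exact code:

    # Count bigram character transitions, then sample new words from the counts.
    import random
    from collections import defaultdict

    words = open("names.txt").read().splitlines()  # hypothetical corpus file

    # How often each character follows each other character,
    # with '.' as a start/end-of-word marker.
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1

    def sample_word():
        # Draw the next character in proportion to its count until we hit '.'.
        out, ch = [], "."
        while True:
            nxt = counts[ch]
            ch = random.choices(list(nxt.keys()), weights=list(nxt.values()))[0]
            if ch == ".":
                return "".join(out)
            out.append(ch)

    print([sample_word() for _ in range(5)])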

Just from playing casually with nanoGPT (https://github.com/karpathy/nanoGPT) on a desktop with a 2080 Ti, it's really clear to me that the path to a pre-trained (not yet fine-tuned) LLM is remarkably easy. RLHF is the layer above this, which also appears to be surprisingly easy (if Sam Altman is to be believed). The real value is in making these tools incredibly easy to use.

I think the barrier to entry here is low. OpenAI is ahead now, but I doubt that lead lasts forever.
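
To give a sense of how little code the core of such a model involves, here is a minimal sketch (PyTorch assumed) of the causal self-attention block at the heart of a GPT; it is not nanoGPT's exact implementation:

    # One multi-head causal self-attention block, the core building block of a GPT.
    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalSelfAttention(nn.Module):
        def __init__(self, n_embd=64, n_head=4, block_size=128):
            super().__init__()
            assert n_embd % n_head == 0
            self.n_head = n_head
            self.qkv = nn.Linear(n_embd, 3 * n_embd)   # query, key, value in one projection
            self.proj = nn.Linear(n_embd, n_embd)      # output projection
            # Lower-triangular mask forbids attending to future positions.
            mask = torch.tril(torch.ones(block_size, block_size))
            self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

        def forward(self, x):
            B, T, C = x.shape
            q, k, v = self.qkv(x).split(C, dim=2)
            q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
            att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
            att = F.softmax(att, dim=-1)
            y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
            return self.proj(y)

    x = torch.randn(2, 16, 64)               # (batch, time, channels)
    print(CausalSelfAttention()(x).shape)    # torch.Size([2, 16, 64])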

It's quite unlikely that you would be able to run it at all. The model is large and probably won't fit in memory, and even if you could get it to run, it would be extremely slow.

Worth checking out: https://github.com/karpathy/nanoGPT. The associated videos go into this more; IIRC, at the end he said the real GPT-2 takes 5-20 seconds to generate a small amount of text on CPU only.
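
If you want to check that kind of CPU latency yourself, here is a minimal sketch using the Hugging Face transformers pipeline (a recent transformers version assumed); "gpt2" is the smallest 124M-parameter GPT-2 checkpoint, and timings will vary with hardware and prompt length:

    # Time text generation with the real GPT-2 checkpoint on CPU.
    import time
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2", device=-1)  # device=-1 -> CPU

    start = time.time()
    out = generator("The meaning of life is", max_new_tokens=50, do_sample=True)
    print(out[0]["generated_text"])
    print(f"generated in {time.time() - start:.1f}s on CPU")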