What does HackerNews think of minGPT?

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Language: Python

Tried it once. Prophet's promise is to take the dataset's seasonal trend into account, which makes sense for Facebook's original use case.

We ran it on such a dataset and found that directly using https://github.com/karpathy/minGPT consistently gave better results. So we ended up using Prophet's output as an input feature to a neural network, but the results did not improve in any significant way.
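For readers curious what that hybrid looks like, here is a minimal sketch, assuming a standard Prophet workflow and a toy MLP on top; the file name, feature choice, and model size are illustrative assumptions, not the commenter's actual setup:

```python
# Hypothetical sketch: feed Prophet's forecast to a small neural network
# as one extra input feature. Column names and model size are assumptions.
import pandas as pd
import torch
import torch.nn as nn
from prophet import Prophet

df = pd.read_csv("series.csv")  # Prophet expects 'ds' (date) and 'y' (value) columns

# Fit Prophet and take its in-sample forecast as a feature.
m = Prophet()
m.fit(df)
forecast = m.predict(df[["ds"]])

# Features: [previous value, Prophet's forecast]; target: current value.
features = torch.tensor(
    pd.concat([df[["y"]].shift(1), forecast[["yhat"]]], axis=1).dropna().values,
    dtype=torch.float32,
)
targets = torch.tensor(df["y"].values[1:], dtype=torch.float32).unsqueeze(1)

# A small MLP regressor on top of the two features.
net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(features), targets)
    loss.backward()
    opt.step()
```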

Karpathy has a bunch of great resources on this front! His minGPT writeup (https://github.com/karpathy/minGPT) is excellent. His more recent project nanoGPT, which references this video, is a much more capable, but still learning-friendly, implementation.
A small, very clearly written, and well-commented implementation: https://github.com/karpathy/minGPT
He contributed some commits to his excellent https://github.com/karpathy/minGPT during that time.

BTW, Andrej, if you're reading this: it is not just excellent, it is beyond excellent. I do a lot of tinkering with transformers and other models lately, and I base them all on minGPT. My fork is now growing into a kind of monorepo for deep learning experimentation, though lately it has started looking like a repo of Theseus, and the boat is not as simple anymore :)

I have been enjoying Natural Language Processing with Transformers [1]. It's largely focused on the Hugging Face library, but Chapter 3 has a very nice walkthrough that builds up the encoder portion of an encoder-decoder Transformer from "scratch" (it still uses some primitives found in PyTorch, like nn.Embedding). The decoder portion is covered in less depth, and they instead refer folks to Karpathy's awesome minGPT [2], which implements a decoder-only (GPT-style) Transformer in ~300 lines of nicely commented Python+PyTorch code.
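To make the decoder-only idea concrete, here is a minimal single-head sketch of the causal self-attention such a model is built around; this is an illustrative simplification under assumed dimensions, not minGPT's actual code:

```python
# Minimal sketch of causal (masked) self-attention, the core of a
# decoder-only Transformer. Illustrative only; not minGPT's actual code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int = 64, block_size: int = 128):
        super().__init__()
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # project to queries, keys, values
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # Lower-triangular mask: position t may only attend to positions <= t.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        att = (q @ k.transpose(1, 2)) / math.sqrt(C)          # (B, T, T) scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)                          # causal attention weights
        return self.proj(att @ v)                             # weighted sum of values

x = torch.randn(2, 16, 64)             # (batch, sequence, embedding)
print(CausalSelfAttention()(x).shape)  # torch.Size([2, 16, 64])
```

minGPT's real implementation adds multiple attention heads, dropout, and the surrounding LayerNorm/MLP blocks, but the causal masking above is what makes the architecture "decoder-only".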

For a higher-level conceptual view of how Transformers work, you can check out the now-classic "Illustrated Transformer" series [3] and this programmer-oriented explanation (with code in Rust) from someone at Anthropic [4].

[1] https://www.oreilly.com/library/view/natural-language-proces...

[2] https://github.com/karpathy/minGPT

[3] https://jalammar.github.io/illustrated-transformer/

[4] https://blog.nelhage.com/post/transformers-for-software-engi...