What does HackerNews think of Megatron-LM?

Ongoing research training transformer models at scale

Language: Python

GPU cluster scaling has come a long way. Just check out the scaling plot here: https://github.com/NVIDIA/Megatron-LM
I'm very bullish on the entire sector. One incumbent-vs-startup story to watch in the AI accelerator space is NVIDIA vs. Lightmatter: if Lightmatter can realize the cost savings of photonic computing, it looks like a 5-7x improvement. NVIDIA's Megatron trillion-parameter language model requires astounding compute capabilities: 3000+ A100 GPUs. And while I don't see GPU dominance retreating through 2024 at least, as we get into universal translation and global parallel corpora by the end of the decade, the limits become apparent. The bottleneck probably won't be talent, design, or money, but the relative difficulty of working with photonic crystals compared to the low-hanging fruit of silicon that has provided such a bounteous harvest for the last 70 years.

https://github.com/NVIDIA/Megatron-LM

Hi, author here! Some details on the model:

* Trained on 17GB of code from the top 10,000 most popular Debian packages. The source files were deduplicated using a process similar to the OpenWebText preprocessing (basically a locality-sensitive hash to detect near-duplicates); there's a rough sketch of that idea below this list.

* I used the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) code for training. Training took about 1 month on 4x RTX8000 GPUs. (A rough idea of what such a launch looks like is also sketched below the list.)

* You can download the trained model here: https://moyix.net/~moyix/csrc_final.zip and the dataset/BPE vocab here: https://moyix.net/~moyix/csrc_dataset_large.json.gz https://moyix.net/~moyix/csrc_vocab_large.zip

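To make the dedup step concrete: near-duplicate detection with a locality-sensitive hash is commonly done with MinHash + LSH. The snippet below is only a minimal sketch using the third-party `datasketch` package; the shingling scheme, threshold, and file names are illustrative assumptions, not the exact preprocessing used for this dataset.

```python
# Minimal near-duplicate detection sketch: MinHash signatures + LSH lookup.
# Requires the third-party `datasketch` package (pip install datasketch).
from datasketch import MinHash, MinHashLSH

NUM_PERM = 128  # number of MinHash permutations (signature size)

def minhash_of(text: str) -> MinHash:
    """MinHash signature over whitespace-token 5-gram shingles."""
    m = MinHash(num_perm=NUM_PERM)
    tokens = text.split()
    for i in range(max(1, len(tokens) - 4)):
        m.update(" ".join(tokens[i:i + 5]).encode("utf-8"))
    return m

def deduplicate(files: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Keep one representative per cluster of near-duplicate files."""
    lsh = MinHashLSH(threshold=threshold, num_perm=NUM_PERM)
    kept = []
    for name, text in files.items():
        sig = minhash_of(text)
        if lsh.query(sig):        # some already-kept file looks near-identical
            continue
        lsh.insert(name, sig)
        kept.append(name)
    return kept

# Toy corpus: the second file is a near-copy of the first and should be dropped.
base = (
    "#include <stdio.h>\n"
    "int add(int a, int b) { return a + b; }\n"
    "int sub(int a, int b) { return a - b; }\n"
    "int mul(int a, int b) { return a * b; }\n"
    "int main(void) { printf(\"%d\\n\", add(2, sub(5, mul(1, 3)))); return 0; }\n"
)
corpus = {
    "math.c": base,
    "math_copy.c": base + "/* mirrored copy */\n",  # near-duplicate of math.c
    "hello.c": "#include <stdio.h>\nint main(void) { puts(\"hello\"); return 0; }\n",
}
print(deduplicate(corpus))  # expected: ['math.c', 'hello.c']
```
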
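And for the training step: Megatron-LM's GPT pretraining is driven by `pretrain_gpt.py` plus a long list of command-line arguments, launched through PyTorch's distributed launcher. The sketch below only gestures at what a single-node, 4-GPU run might look like; the flag names and values are illustrative, vary between Megatron-LM releases, and are not the actual configuration used for this model (see the `examples/` scripts in the repo for real ones). The vocab/merge/data paths here are hypothetical.

```python
# Rough sketch of a single-node, 4-GPU Megatron-LM GPT pretraining launch,
# wrapped in Python for readability. Flags are illustrative and may not be
# complete or match your Megatron-LM version -- consult the repo's examples/.
import subprocess

GPUS = 4  # e.g. 4x RTX8000, as in the comment above

args = [
    "python", "-m", "torch.distributed.launch", f"--nproc_per_node={GPUS}",
    "pretrain_gpt.py",
    "--tensor-model-parallel-size", "1",
    "--num-layers", "24",
    "--hidden-size", "1024",
    "--num-attention-heads", "16",
    "--seq-length", "1024",
    "--max-position-embeddings", "1024",
    "--micro-batch-size", "4",
    "--train-iters", "500000",
    "--lr", "1.5e-4",
    "--lr-decay-style", "cosine",
    "--vocab-file", "csrc-vocab.json",    # hypothetical BPE vocab path
    "--merge-file", "csrc-merges.txt",    # hypothetical BPE merges path
    "--data-path", "csrc_text_document",  # hypothetical preprocessed dataset prefix
    "--save", "checkpoints/csrc-gpt2",
    "--fp16",
]
subprocess.run(args, check=True)
```
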
Happy to answer any questions!