What does HackerNews think of Megatron-LM?

Ongoing research training transformer models at scale

Language: Python

GPU cluster scaling has come a long way. Just check out the scaling plot here: https://github.com/NVIDIA/Megatron-LM
I'm very bullish on the entire sector. One incumbent-vs-startup story to watch in the AI accelerator space is NVIDIA vs. Lightmatter: if Lightmatter can realize the cost savings of photonic computing, it looks like a 5-7x improvement. NVIDIA's Megatron trillion-parameter language model requires astounding compute capabilities: 3000+ A100 GPUs. And while I don't see GPU dominance retreating through 2024 at least, as we get into universal translation and global parallel corpora by the end of the decade, the limits become apparent. The bottleneck probably won't be talent, design, or money, but the relative difficulty of working with photonic crystals compared to the low-hanging fruit of silicon that has provided such a bounteous harvest for the last 70 years.

https://github.com/NVIDIA/Megatron-LM

Hi, author here! Some details on the model:

* Trained on 17GB of code from the top 10,000 most popular Debian packages. The source files were deduplicated using a process similar to the OpenWebText preprocessing (basically a locality-sensitive hash to detect near-duplicates); there's a rough sketch of that idea below this list.

* I used the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) code for training. Training took about 1 month on 4x RTX8000 GPUs. (A rough idea of what such a launch looks like is also sketched below the list.)

* You can download the trained model here: https://moyix.net/~moyix/csrc_final.zip and the dataset/BPE vocab here: https://moyix.net/~moyix/csrc_dataset_large.json.gz https://moyix.net/~moyix/csrc_vocab_large.zip

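To make the dedup step concrete: near-duplicate detection with a locality-sensitive hash is commonly done with MinHash + LSH. The snippet below is only a minimal sketch using the third-party `datasketch` package; the shingling scheme, threshold, and file names are illustrative assumptions, not the exact preprocessing used for this dataset.

```python
# Minimal near-duplicate detection sketch: MinHash signatures + LSH lookup.
# Requires the third-party `datasketch` package (pip install datasketch).
from datasketch import MinHash, MinHashLSH

NUM_PERM = 128  # number of MinHash permutations (signature size)

def minhash_of(text: str) -> MinHash:
    """MinHash signature over whitespace-token 5-gram shingles."""
    m = MinHash(num_perm=NUM_PERM)
    tokens = text.split()
    for i in range(max(1, len(tokens) - 4)):
        m.update(" ".join(tokens[i:i + 5]).encode("utf-8"))
    return m

def deduplicate(files: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Keep one representative per cluster of near-duplicate files."""
    lsh = MinHashLSH(threshold=threshold, num_perm=NUM_PERM)
    kept = []
    for name, text in files.items():
        sig = minhash_of(text)
        if lsh.query(sig):        # some already-kept file looks near-identical
            continue
        lsh.insert(name, sig)
        kept.append(name)
    return kept

# Toy corpus: the second file is a near-copy of the first and should be dropped.
base = (
    "#include <stdio.h>\n"
    "int add(int a, int b) { return a + b; }\n"
    "int sub(int a, int b) { return a - b; }\n"
    "int mul(int a, int b) { return a * b; }\n"
    "int main(void) { printf(\"%d\\n\", add(2, sub(5, mul(1, 3)))); return 0; }\n"
)
corpus = {
    "math.c": base,
    "math_copy.c": base + "/* mirrored copy */\n",  # near-duplicate of math.c
    "hello.c": "#include <stdio.h>\nint main(void) { puts(\"hello\"); return 0; }\n",
}
print(deduplicate(corpus))  # expected: ['math.c', 'hello.c']
```
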
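And for the training step: Megatron-LM's GPT pretraining is driven by `pretrain_gpt.py` plus a long list of command-line arguments, launched through PyTorch's distributed launcher. The sketch below only gestures at what a single-node, 4-GPU run might look like; the flag names and values are illustrative, vary between Megatron-LM releases, and are not the actual configuration used for this model (see the `examples/` scripts in the repo for real ones). The vocab/merge/data paths here are hypothetical.

```python
# Rough sketch of a single-node, 4-GPU Megatron-LM GPT pretraining launch,
# wrapped in Python for readability. Flags are illustrative and may not be
# complete or match your Megatron-LM version -- consult the repo's examples/.
import subprocess

GPUS = 4  # e.g. 4x RTX8000, as in the comment above

args = [
    "python", "-m", "torch.distributed.launch", f"--nproc_per_node={GPUS}",
    "pretrain_gpt.py",
    "--tensor-model-parallel-size", "1",
    "--num-layers", "24",
    "--hidden-size", "1024",
    "--num-attention-heads", "16",
    "--seq-length", "1024",
    "--max-position-embeddings", "1024",
    "--micro-batch-size", "4",
    "--train-iters", "500000",
    "--lr", "1.5e-4",
    "--lr-decay-style", "cosine",
    "--vocab-file", "csrc-vocab.json",    # hypothetical BPE vocab path
    "--merge-file", "csrc-merges.txt",    # hypothetical BPE merges path
    "--data-path", "csrc_text_document",  # hypothetical preprocessed dataset prefix
    "--save", "checkpoints/csrc-gpt2",
    "--fp16",
]
subprocess.run(args, check=True)
```
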
Happy to answer any questions!