Hey everyone, I just wanted to chime in and say that GPT-J is incredibly legit. Every aspect of GPT-J is production grade — no difference in process, quality, or results, compared to any other big name research lab.
I also want to apologize to Eleuther for giving them a hard time in the past. My earlier concerns were unjustified. To be completely honest, I was jealous they achieved everything I tried to achieve with my own open source research lab attempt. It took a long time to even recognize that jealousy in myself, let alone set it aside. Sorry.
The credit for this work goes almost entirely to kindiana, aka Ben Wang. Remember that name; you’ll be seeing a lot of it in the coming decade. It’s clear to me that whichever lab he ends up at (he’s an undergrad! Google let him slip away because he didn’t have a degree!), he’s gonna be changing the world. Don’t know what, don’t know how, know he will.
Every aspect of that codebase is immaculate. Most research code is not pretty; this looks carved out of marble and placed in a museum.
Without Eleuther's TPU resources, this work wouldn't have happened. Tensorfork (my lab) didn't get access to the TPU VM alpha. And TPU VMs were an absolute necessity here. (TPU VMs are a new thing; they've been in alpha since December, but only recently launched. If curious, see https://github.com/shawwn/website/blob/master/jaxtpu.md and https://github.com/shawwn/website/blob/master/mlmind.md for why they're the future of ML.)
Eleuther also helped test the model thoroughly. Leo Gao (go follow him: https://twitter.com/nabla_theta?s=21) ran GPT-J through the gauntlet. He was the primary person behind The Pile, the training data that makes any of this possible. I can say with absolute certainty and no hesitation that there are no “gotchas” here.
Eleuther’s https://6b.eleuther.ai page looks wonderful too. It’s like a free OpenAI API playground that everyone can try. Keeping it running for months is no small achievement. (Set top_p to 1.0 and temp to 0.8; the defaults are pretty bad.)
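(In case you're wondering what those two knobs actually do: temperature rescales the logits before sampling, and top_p trims the candidate pool to the smallest set of tokens whose cumulative probability reaches p. Here's a rough numpy sketch of that, my own illustration rather than the playground's actual code; with top_p at 1.0 the nucleus filter keeps everything, so the 0.8 temperature is doing the real work.)

    # Rough sketch of temperature + top-p (nucleus) sampling over next-token
    # logits. My own illustration; not the playground's actual implementation.
    import numpy as np

    def sample_next_token(logits, temperature=0.8, top_p=1.0):
        z = (logits - logits.max()) / temperature     # temperature-scaled logits
        probs = np.exp(z)
        probs /= probs.sum()                          # softmax
        order = np.argsort(probs)[::-1]               # most likely tokens first
        cumulative = np.cumsum(probs[order])
        # keep the smallest set of tokens whose cumulative probability
        # reaches top_p (with top_p=1.0 that's every token)
        keep = order[cumulative - probs[order] < top_p]
        kept = probs[keep] / probs[keep].sum()
        return np.random.choice(keep, p=kept)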
Congratulations, and thank you everyone for all your hard work. The world is so much better for having access to this level of quality.
It takes a lot to be able to find something in yourself like that and admit it to yourself and everyone. I always appreciate people like that.
I also tried the playground and was impressed that it was free! It must cost a sizable chunk of money to keep that running.
Believe it or not, it's completely free.
It's thanks to TFRC. It's the most world-changing program I know of. It's why I go door to door like the proverbial religious fanatic, singing TFRC's praises, whether people want to listen or not.
Because for the first time in history, any capable ML hacker now has the resources they need to do something like this.
Imagine it. This is a legit OpenAI-style model inference API. It's now survived two HN front page floods.
(I saw it go down about an hour ago, so I was like "Nooo! Prove you're production grade! I believe in you!" and I think my anime-style energy must've brought it back up, since the API works fine now. Yep, it was all me. Keyboard goes clackclackclack, world changes, what can I say? Just another day at the ML office oh god this joke has gone on for like centuries too long.)
And it's all thanks to TFRC. I'm intentionally not linking anything about TFRC, because in typical google fashion, every single thing you can find online is the most corporate, soulless-looking "We try to help you do research at scale" generic boilerplate imaginable.
So I decided to write something about TFRC that wasn't: https://blog.gpt4.org/jaxtpu
(It was pretty hard to write a medieval fantasy-style TPU fanfic, but someone had to. Well, maybe no one had to. But I just couldn't let such a wonderful project go unnoticed, so I had to try as much stupid shit as possible to get the entire world to notice how goddamn cool it is.)
To put things into perspective, a TPU v2-8 is the "worst possible TPU you could get access to."
They give you access to 100.
On day one.
This is what originally hooked me in. My face, that first day in 2019 when TFRC's email showed up saying "You can use 100 v2-8's in us-central1-f!": https://i.imgur.com/EznLvlb.png
The idea of using 100 theoretically high-performance nodes of anything, in creative ways, greatly appealed to my gamedev background.
It wasn't till later that I discovered, to my delight, that these weren't "nodes of anything."
These are 96-CPU, 330GB-RAM Ubuntu servers.
That blog post I just linked to is running off of a TPU right now. Because it's literally just an Ubuntu server.
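(Don't take my word for it; once you're SSH'd in, an ordinary sanity check like the one below tells you everything. The numbers in the comments are just what I'd expect from the specs above, so treat them as approximate.)

    # Quick sanity check after SSHing into a TPU VM: it's an ordinary Ubuntu
    # box, so ordinary tools work. Numbers in comments are approximate.
    import os

    print("CPUs:", os.cpu_count())             # should land near the 96 mentioned above
    with open("/proc/meminfo") as f:
        mem_kb = int(f.readline().split()[1])  # first line of /proc/meminfo is MemTotal
    print("RAM: %.0f GB" % (mem_kb / 1e6))     # should land near 330GB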
This is like the world's best kept secret. It's so fucking incredible that I have no idea why people aren't beating down the doors, using every TPU that they can get their hands on, for as many harebrained ideas as possible.
God, I can't even list how much cool shit there is to discover. You'll find out that you get 100Gbit/s between two separate TPUs. In fact, I'm pretty sure it's even higher than this. That means you don't even need a TPU pod anymore.
At least, theoretically. I tried getting TensorFlow to do this for over a year.
kindiana (Ben Wang), the guy who wrote this GPT-J codebase we're all talking about, casually proved that this was not merely theoretical: https://twitter.com/theshawwn/status/1406171487988498433
He tried to show me https://github.com/kingoflolz/swarm-jax/ once, long ago. I didn't understand at the time what I was looking at, or why it was such a big deal. But basically, when you put each GPT layer on a separate TPU, you can string together as many TPUs as you want to build as large a model as you want.
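(To make that concrete, here's a toy sketch of the layer-per-device idea. This is my own simplification, not code from swarm-jax; the real repo also handles the backward pass, optimizer state, and async pipelining. All it shows is activations hopping from one device's layer to the next, which is the part that lets you keep adding TPUs to keep growing the model.)

    # Toy sketch of the layer-per-device idea behind swarm-jax (my own
    # simplification, not that repo's code): pin each layer's weights to a
    # different device and ship the activations along the chain.
    import jax
    import jax.numpy as jnp

    devices = jax.devices()                      # e.g. the 8 cores of a v3-8
    dim = 512
    keys = jax.random.split(jax.random.PRNGKey(0), len(devices))

    # one weight matrix per device, stored on that device
    layers = [jax.device_put(jax.random.normal(k, (dim, dim)) / jnp.sqrt(dim), d)
              for k, d in zip(keys, devices)]

    def forward(x, layers, devices):
        for w, d in zip(layers, devices):
            x = jax.device_put(x, d)             # move activations to this layer's device
            x = jnp.tanh(x @ w)                  # run the "layer" where its weights live
        return x

    out = forward(jnp.ones((4, dim)), layers, devices)
    print(out.shape)                             # (4, 512)

Swap a real transformer block in for that jnp.tanh(x @ w), run gradients back along the same chain, and you've got the swarm-jax picture: model size bounded by how many devices you can string together, not by any single device's memory.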
You should be immediately skeptical of that claim. It shouldn't be obvious that the bandwidth is high enough to train a GPT-3 sized model in any reasonable time frame. It's still not obvious to me. But at this point, I've been amazed by so many things related to TPUs, JAX, and TFRC that I feel like I'm dancing around in Willy Wonka's factory while the door's wide open. The Oompa Loompas are singing about "that's just what the world will do, oompa-loompa they'll ignore you" while I keep trying to get everybody to stop what they're doing and step into the factory.
The more people use TPUs, the more TPUs Google is going to build. They could fill three small countries entirely with buildings devoted to TPUs. The more people want these things, the more we'll all have.
Because I think Google's gonna utterly annihilate Facebook in the ML mindshare wars: https://blog.gpt4.org/mlmind
TPU VMs just launched a month ago. No one realizes yet that JAX is the React of ML.
Facebook left themselves wide open by betting on GPUs. GPUs fucking suck at large-scale ML training. Why the hell would you pay $1M for a GPU cluster when you can get the same training capacity out of TPUs for orders of magnitude less?
And no one's noticed that TPUs don't suck anymore. Forget everything you've ever heard about them. JAX on TPU VMs changes everything. In five years, you'll all look like you've been writing websites in assembly.
But hey, I'm just a fanatic TPU zealot. It's better to just write me off and keep betting on that reliable GPU pipeline. After all, everyone has millions of VC dollars to pour into the cloud furnace, right?
TFRC changed my life. I tried to do some "research" (https://www.docdroid.net/faDq8Bu/swarm-training-v01a-pdf) back when TensorFlow, with all its horrible problems, was your only option on TPUs.
Nowadays, you can think of JAX as "approximately every single thing you could possibly hope for."
GPT-J is proof. What more can I say? No TFRC, no GPT-J.
The world is nuts for not noticing how impactful TFRC has been. Especially TFRC support. Jonathan from the support team is just ... such a wonderful person. I was blown away at how much he cares about taking care of new TFRC members. They all do.
(He was only ever late answering my emails one time. And it was because he was on vacation!)
If you happen to be an ambitious low-level hacker, I tried to make it easier for you to get your feet wet with JAX (there's a tiny example of the flavor right after these steps):
1. Head to https://github.com/shawwn/jaxnotes/blob/master/notebooks/001...
2. Click "Open in Colab"
3. Scroll to the first JAX section; start reading, linearly, all the way to the bottom.
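(And, as promised, a tiny taste of the flavor; this is my own minimal example, not a cell lifted from the notebook. JAX's whole pitch is that you write plain numpy-style functions and then ask for transformed versions of them.)

    # Minimal JAX flavor (my own example, not from the notebook): write an
    # ordinary numpy-style function, then derive its gradient and compile it
    # with jax.grad and jax.jit.
    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        pred = x @ w                        # plain linear model
        return jnp.mean((pred - y) ** 2)    # mean squared error

    grad_loss = jax.jit(jax.grad(loss))     # d(loss)/dw, XLA-compiled (TPU/GPU/CPU)

    w = jnp.zeros(3)
    x = jnp.array([[1., 2., 3.], [4., 5., 6.]])
    y = jnp.array([1., 2.])
    print(grad_loss(w, x, y))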
I'd like to think I'm a fairly capable hacker. And that notebook is how I learned JAX, from zero knowledge. Because I had zero knowledge, a week or two ago. Then I went from tutorial to tutorial, and copied down verbatim the things that I learned along the way.
(It's still somewhat amazing to me how effective it is to literally re-type what a tutorial is trying to teach you. I'd copy each sentence, then fix up the markdown, and in the process of fixing up the markdown, unconsciously osmose the idea that they were trying to get across.)
The best part was, I was connected remotely to a TPU VM the whole time I was writing that notebook, via a jupyter server running on the TPU. Because, like I said, you can run whatever the hell you want on TPUs now, so you can certainly run a jupyter server without breaking a sweat.
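(If you want to see that for yourself, the first cell worth running in a notebook hosted on the TPU VM is something like this, assuming jax is installed on the VM.)

    # Sanity check from inside a notebook running on the TPU VM itself:
    # the kernel should see the TPU cores directly.
    import jax
    print(jax.devices())        # expect 8 TpuDevice entries on a v2-8 / v3-8
    print(jax.device_count())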
It's so friggin' nice to have a TPU repl. I know I'm just wall-of-text'ing at this point, but I've literally waited two years for this to come true. (There's a fellow from the TPU team who DMs with me occasionally. I call him TPU Jesus now, because it's nothing short of a miracle that they were able to launch all of this infrastructure -- imagine how much effort, from so many teams, was involved in making all of this possible.)
Anyway. Go read https://github.com/shawwn/website/blob/master/mlmind.md to get hyped, then read https://github.com/shawwn/website/blob/master/jaxtpu.md to get started, and then read https://github.com/shawwn/jaxnotes/blob/master/notebooks/001... to get effective, and you'll have all my knowledge.
In exchange for this, I expect you to build an NES emulator powered by TPUs. Try as many crazy ideas as you can possibly think of. This point in history will never come again; it feels to me like watching the internet itself come alive back in the '80s, if only briefly.
It's like having a hundred Raspberry Pis to play with, except every Raspberry Pi is actually an Ubuntu server with 96 CPUs and 330GB of RAM, and it happens to have 8 TPU cores (think 8 GPUs' worth of accelerator), along with a 100Gbit/s link to every other Raspberry Pi.
As I scroll, and scroll some more, I begin to wonder if some of it is generated. That's a lot of text :P
Plus it's looking more and more like I'll be getting a job in finance with a fat salary. First interview's on Monday. Tonight I felt, "This is it -- if getting a few dozen people to sign up for TFRC is the only way I can make an impact, then at least I'll be ending my ML streak on a high note."
It's truly amazing to me that the world hasn't noticed how incredible TFRC is. It's literally the reason Eleuther exists at all. If that sounds ridiculous, remember that there was a time when Connor's TPU quota was the only reason everyone was able to band together and start building GPT-Neo. https://github.com/EleutherAI/gpt-neo
At least I was able to start a Discord server that happened to get the original Eleuther people together, in the right place at the right time, to decide to do any of that.
But the root of all of it is TFRC. Always has been. Without them, I would've given up ML long ago. Because trying to train anything on GPUs with Colab is just ... so frustrating. I would have fooled around a bit with ML, but I wouldn't have decided to pour two years of my life into mastering it. Why waste your time?
Five years from now, JAX + TPU VMs are going to wipe PyTorch off the map. So I'll be making bank at a finance company, eating popcorn like "told ya so" and looking back wistfully at days like today.
Everyone in ML is so cool. These were easily the best two years of my life as a developer. I know all this is kind of weird to pour out, but I don't care -- everyone here owes everything to the geniuses that bequeathed TFRC unto the world.
For now, I slink back into the shadows, training tentacle porn GANs in secret, emerging only once in a blue moon to shock the world with weird ML things. Muahaha.