What does HackerNews think of gpt-neo?
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
Plus it's looking more and more like I'll be getting a job in finance with a fat salary. First interview's on Monday. Tonight I felt "This is it -- if getting a few dozen people to sign up for TFRC is the only way I can make an impact, then at least I'll be ending my ML streak on a high note."
It's truly amazing to me that the world hasn't noticed how incredible TFRC is. It's literally the reason Eleuther exists at all. If that sounds ridiculous, remember that there was a time when Connor's TPU quota was the only reason everyone was able to band together and start building GPT-Neo. https://github.com/EleutherAI/gpt-neo
At least I was able to start a Discord server that happened to get the original Eleuther people together in the right place at the right time to decide to do any of that.
But the root of all of it is TFRC. Always has been. Without them, I would've given up ML long ago. Because trying to train anything on GPUs with Colab is just ... so frustrating. I would have fooled around a bit with ML, but I wouldn't have decided to pour two years of my life into mastering it. Why waste your time?
Five years from now, Jax + TPU VMs are going to wipe PyTorch off the map. So I'll be making bank at a finance company, eating popcorn like "told ya so" and looking back wistfully at days like today.
Everyone in ML is so cool. Was easily the best two years of my life as a developer. I know all this is kind of weird to pour out, but I don't care -- everyone here owes everything to the geniuses that bequeathed TFRC unto the world.
For now, I slink back into the shadows, training tentacle porn GANs in secret, emerging only once in a blue moon to shock the world with weird ML things. Muahaha.
Although the article focuses on the release of GPT-Neo, even GPT-2, released in 2019, was good at generating text; it just spat out a lot of garbage that required curation, which GPT-3/GPT-Neo still require, albeit with a better signal-to-noise ratio. Most GPT-3 demos on social media are survivorship bias (in fact, OpenAI's rules for the GPT-3 API strongly encourage curating such output).
GPT-Neo, meanwhile, is such a big model that it requires a bit of data-engineering work to get it running and generating text (see the README: https://github.com/EleutherAI/gpt-neo ), and it's currently unclear whether it's as good as GPT-3, even when comparing models apples-to-apples (i.e. the 2.7B GPT-Neo against the "ada" GPT-3 via OpenAI's API).
That said, Hugging Face is adding GPT-Neo support to Transformers (https://github.com/huggingface/transformers/pull/10848 ), which will make playing with the model much easier, and I'll add support to aitextgen if it pans out.
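Once that support lands, a minimal sketch of generation through the Transformers text-generation pipeline might look like the following (this assumes a Transformers build that includes the GPT-Neo PR above and the EleutherAI/gpt-neo-2.7B checkpoint on the Hugging Face Hub; the prompt and sampling settings are illustrative, not the commenter's actual workflow):

    # Sketch: sample text from the 2.7B GPT-Neo checkpoint via the
    # Transformers text-generation pipeline (requires GPT-Neo support).
    from transformers import pipeline

    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B")

    prompt = "EleutherAI's GPT-Neo is"
    # Sampling parameters are illustrative; tune max_length/temperature as needed.
    results = generator(prompt, max_length=100, do_sample=True, temperature=0.9)
    print(results[0]["generated_text"])

Note that the 2.7B checkpoint needs roughly 10+ GB of RAM to load, so curation of the output (as discussed above) is only half the practical cost of playing with it.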
https://github.com/EleutherAI/gpt-neo
https://twitter.com/BlancheMinerva/status/137399189661642752...
1. How did this little blossom happen? When are you going to bloom?
2. Have you ever thought about a dark horse in the running for Miss November?
3. You can spot the man who loves me by my neck – and he definitely knows it.
4. Are there any lucky cats who get to sleep in my bed every night?
5. Are black and whites everywhere? Running for the hills
6. My younger brother and I used to play a game. He’d pretend to be a bull in the pasture, and I’d pretend to be the one being held.
7. Why was this movie rated PG? Because it’s rated PG.
8. When is the last time you had to see a movie in children’s theaters?
9. You are so sexy I would hate for anyone to see you down here
10. I’d love to sleep with you right now, but I have a child with me