What does HackerNews think of gpt-neox?
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Btw I asked chat.langchain.dev and it said:
> LangChain uses pre-trained models from Hugging Face, such as BERT, GPT-2, and XLNet. For more information, please see the Getting Started Documentation[0].
That links to a 404, but I did find the correct link[1]. Oddly, that doc only mentions an OpenAI API wrapper; I couldn’t find anything about the other Hugging Face models.
Does LangChain have any tooling around fine-tuning pre-trained LLMs like GPT-NeoX[2]?
[0]https://langchain.readthedocs.io/en/latest/getting_started.h...
[1]https://langchain.readthedocs.io/en/latest/getting_started/g...
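For context, here is a minimal sketch of wiring a Hugging Face model into LangChain, assuming its HuggingFacePipeline wrapper around a local transformers pipeline (the wrapper name and import path have moved between LangChain versions, so treat this as illustrative rather than canonical):

    # Sketch only: LangChain's HuggingFacePipeline wrapper around a local
    # transformers text-generation pipeline. Import paths vary by version.
    from transformers import pipeline               # pip install transformers
    from langchain.llms import HuggingFacePipeline  # pip install langchain

    # Small EleutherAI checkpoint as a stand-in; the thread's GPT-NeoX-20B
    # ("EleutherAI/gpt-neox-20b") exposes the same API but needs ~40 GB of weights.
    hf_pipe = pipeline(
        "text-generation",
        model="EleutherAI/gpt-neo-125M",
        max_new_tokens=64,
    )

    llm = HuggingFacePipeline(pipeline=hf_pipe)
    print(llm("GPT-NeoX is"))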
Related system/hardware requirements:
- https://nlpcloud.com/deploying-gpt-neox-20-production-focus-...
Benchmarks of GPT-NeoX-20B vs GPT-3 DaVinci:
- https://the-decoder.com/gpt-3-alternative-eleutherai-release...
Project page, GitHub repo, and paper:
- https://www.eleuther.ai/projects/gpt-neox/
- https://github.com/EleutherAI/gpt-neox
- https://arxiv.org/abs/2204.06745
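As a rough sanity check on those hardware requirements, the weight memory alone is just parameter count times bytes per parameter; a back-of-the-envelope sketch (ignoring activations, KV cache, and optimizer state, which come on top):

    # Weight-only memory estimate for a 20B-parameter model at common precisions.
    PARAMS = 20e9  # GPT-NeoX-20B

    for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
        gib = PARAMS * bytes_per_param / 1024**3
        print(f"{name:>9}: ~{gib:.0f} GiB of weights")

    # fp16 lands around 37 GiB, which is why deployment guides talk about
    # 40 GB+ GPUs (e.g. an A100) or sharding the model across smaller cards.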
They have a 20B parameter model. I think the primary dataset for these open models is The Pile: https://arxiv.org/abs/2101.00027 (web scrape, pubmed, arxiv, github, wikipedia, etc. There is a nice diagram on page 2 that summarizes the contents.)
https://github.com/EleutherAI/gpt-neox
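If you want to poke at The Pile itself, a hypothetical sketch using the datasets library; the Hub id below is a placeholder assumption, since mirrors of the dataset have come and gone over time:

    # Hypothetical: stream a few records from a Pile mirror without downloading
    # the full dataset. The dataset id is an assumption; substitute your own copy.
    from datasets import load_dataset  # pip install datasets

    pile = load_dataset("EleutherAI/pile", split="train", streaming=True)
    for i, record in enumerate(pile):
        print(record["text"][:200])  # each record is raw text plus source metadata
        if i == 2:
            break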
To manage your expectations: it is nowhere near as good as ChatGPT.
If you are only interested in programming, https://github.com/salesforce/CodeGen is decent.
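For reference, the CodeGen checkpoints are published on the Hugging Face Hub and can be tried with plain transformers; a small sketch using the 350M Python-only variant (the 2B/6B/16B variants follow the same naming scheme):

    # Try Salesforce CodeGen via transformers; "mono" checkpoints are the
    # Python-only fine-tunes, "multi" cover several languages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "Salesforce/codegen-350M-mono"
    tok = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    prompt = "def fibonacci(n):"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=48)
    print(tok.decode(out[0]))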
Suggest looking into GPT-NeoX and GPT-J instead.
I suspect they have run into training issues, since they are moving to a new repo[2].
[1] https://twitter.com/arankomatsuzaki/status/13737326468119674...