What does HackerNews think of gpt-neox?

An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.

Language: Python

What LLMs does LangChain support?

Btw I asked chat.langchain.dev and it said:

> LangChain uses pre-trained models from Hugging Face, such as BERT, GPT-2, and XLNet. For more information, please see the Getting Started Documentation[0].

That links to a 404, but I did find the correct link[1]. Oddly, that doc only mentions an OpenAI API wrapper; I couldn't find anything about the other Hugging Face models.

Does LangChain have any tooling around fine-tuning pre-trained LLMs like GPT-NeoX[2]?

[0]https://langchain.readthedocs.io/en/latest/getting_started.h...

[1]https://langchain.readthedocs.io/en/latest/getting_started/g...

[2]https://github.com/EleutherAI/gpt-neox
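
For reference, here is roughly what the wrapper from that getting-started doc looks like. This is only a minimal sketch: the OpenAI class is the one the doc covers, while the HuggingFaceHub wrapper is my assumption based on LangChain's integration list (it needs a HUGGINGFACEHUB_API_TOKEN), so check the current docs before relying on it.

    from langchain.llms import OpenAI, HuggingFaceHub

    # The OpenAI wrapper described in the getting-started doc (needs OPENAI_API_KEY).
    openai_llm = OpenAI(temperature=0.7)

    # Assumption: LangChain also ships a Hugging Face Hub wrapper for hosted models
    # such as gpt2; this is not mentioned in the doc linked above.
    hf_llm = HuggingFaceHub(repo_id="gpt2", model_kwargs={"temperature": 0.7})

    print(openai_llm("What LLMs does LangChain support?"))
    print(hf_llm("What LLMs does LangChain support?"))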

Yes. EleutherAI is doing it, probably one of many:

https://www.eleuther.ai/projects/gpt-neox/

https://github.com/EleutherAI/gpt-neox

https://arxiv.org/abs/2204.06745

They have a 20B parameter model. I think the primary dataset for these open models is The Pile: https://arxiv.org/abs/2101.00027 (web scrape, pubmed, arxiv, github, wikipedia, etc. There is a nice diagram on page 2 that summarizes the contents.)

Here is an example of a general-purpose open-source LLM, probably the best you can get:

https://github.com/EleutherAI/gpt-neox

To manage your expectations: it is nowhere near as good as ChatGPT.
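
To actually try it without standing up the full training repo, here is a minimal sketch using the Hugging Face transformers port of the released 20B checkpoint. The EleutherAI/gpt-neox-20b model id is an assumption on my part (check the repo's README), and the full-precision weights are very large, so you need serious hardware.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumes the EleutherAI/gpt-neox-20b checkpoint on the Hugging Face Hub.
    # The full weights are tens of GB, so expect to need a lot of RAM/VRAM.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

    inputs = tokenizer("GPT-NeoX-20B is an open source language model that", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0]))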

If you are interested in programming only:

https://github.com/salesforce/CodeGen

is decent.

You could follow EleutherAI's official guide: https://github.com/EleutherAI/gpt-neox

You could also use a hosted service that offers GPT-NeoX, like https://nlpcloud.com or https://goose.ai
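
If you go the hosted route, the sketch below uses goose.ai's OpenAI-compatible endpoint through the openai Python client. The engine name gpt-neo-20b is an assumption on my part, so check their model list for the exact id.

    import openai

    # goose.ai exposes an OpenAI-compatible API; point the client at their base URL.
    openai.api_base = "https://api.goose.ai/v1"
    openai.api_key = "YOUR_GOOSEAI_API_KEY"

    # Engine name is an assumption -- check goose.ai's docs for the exact model id.
    completion = openai.Completion.create(
        engine="gpt-neo-20b",
        prompt="GPT-NeoX is",
        max_tokens=40,
    )
    print(completion.choices[0].text)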

Money. There are a lot of other reasons to hide behind, like safety and such, but really it is just money in the end. These models are expensive to train, so why not try to profit off them if they are truly useful?

Suggest looking into GPT-NeoX and GPT-J instead.

https://6b.eleuther.ai/

https://github.com/EleutherAI/gpt-neox/

GPT-NeoX, which is a model from the same group but using GPUs instead of TPUs, uses techniques from DeepSpeed:

https://github.com/EleutherAI/gpt-neox/

Yeah. They say they are doing a 10B release soon[1].

I suspect they have run into training issues, since they are moving to a new repo[2].

[1] https://twitter.com/arankomatsuzaki/status/13737326468119674...

[2] https://github.com/EleutherAI/gpt-neox/

GPT-NeoX is an example project that uses DeepSpeed and ZeRO-3 offloading. The wider project intends to train a GPT-3-sized model and release it freely to the world.

https://github.com/EleutherAI/gpt-neox
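
For context, here is a minimal sketch of what ZeRO stage-3 with CPU offloading looks like when driving DeepSpeed from Python. The values are illustrative placeholders, not the configs gpt-neox actually ships (those live in the repo).

    import torch
    import deepspeed

    # Illustrative ZeRO stage-3 settings with parameter and optimizer offload to CPU.
    # These numbers are made up for the sketch, not taken from gpt-neox.
    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "gradient_accumulation_steps": 1,
        "fp16": {"enabled": True},
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "zero_optimization": {
            "stage": 3,
            "offload_param": {"device": "cpu"},
            "offload_optimizer": {"device": "cpu"},
        },
    }

    model = torch.nn.Linear(1024, 1024)  # stand-in for the real transformer
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )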