What does HackerNews think of ollama?

Get up and running with Llama 2 and other large language models locally

Language: Go

#42 in Go
From https://github.com/jmorganca/ollama:

> Get up and running with large language models locally.

> To run and chat with Llama 2:

    ollama run llama2
Many people are. It's popular to build yourself since it's quite easy. There are also more extensive implementations like https://github.com/KillianLucas/open-interpreter, which is wildly popular (>27k stars).

You can also use local LLMs via the CLI, which is also quite popular: https://github.com/jmorganca/ollama

Take a look at how Databricks got a hand from its internal team to create the dolly-15k dataset ( https://www.databricks.com/blog/2023/04/12/dolly-first-open-... ).

For training AGI (artificial general intelligence), maybe only a select few mega-companies with massive datasets will be able to come up with the training data.

There are so many other use cases that OSS projects can enable. Individuals or smaller companies have unique data that can be used to augment existing open source models. Many use cases are domain-specific and don't need general intelligence.

Palantir just did a talk at AIPCon ( https://youtu.be/o2b0DwNg6Ko ) where they recommended using many LLMs, open and closed (the example had Llama 2 70B, GPT-4, PaLM for coding, Claude, plus fine-tuned models), all feeding into their synthesizer.

While I want open source to win, especially as an open source maintainer on Ollama (if you haven't seen it yet, it's one of the easiest ways to run LLMs locally: https://github.com/jmorganca/ollama ), I think the work so far in this space has been positive-sum, open or closed.

This is neat. Model weights are split into their layers and distributed across several machines, which then register themselves in a big hash table when they are ready to perform inference or fine-tuning "as a team" over their subset of the layers.

It's early, but I've been working on hosting model weights in a Docker registry for https://github.com/jmorganca/ollama, mainly for the content addressability (Ollama will verify the correct weights are downloaded every time), and ultimately so that weights can be fetched by their content instead of by their name or URL (which may change!). Perhaps a good next step might be to split the models by layer and store each layer independently for use cases like this (or even just for downloading and running larger models over several "local" machines).
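For what it's worth, the verification side of content addressability is simple to picture. Here's a rough Go sketch (not Ollama's actual code; the file name and digest below are made up for illustration) of checking a downloaded weights blob against the digest it was requested by:

    package main

    import (
    	"crypto/sha256"
    	"encoding/hex"
    	"fmt"
    	"io"
    	"log"
    	"os"
    )

    // verifyBlob re-hashes a downloaded layer and compares it to the digest
    // the registry advertised. With content addressing, a mismatch means the
    // bytes are the wrong content, no matter what the file happens to be named.
    func verifyBlob(path, wantDigest string) error {
    	f, err := os.Open(path)
    	if err != nil {
    		return err
    	}
    	defer f.Close()

    	h := sha256.New()
    	if _, err := io.Copy(h, f); err != nil {
    		return err
    	}
    	got := "sha256:" + hex.EncodeToString(h.Sum(nil))
    	if got != wantDigest {
    		return fmt.Errorf("digest mismatch: got %s, want %s", got, wantDigest)
    	}
    	return nil
    }

    func main() {
    	// Hypothetical layer file and digest, purely for illustration.
    	if err := verifyBlob("llama2-7b.layer.bin", "sha256:0123abcd..."); err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println("layer verified")
    }

Splitting by layer would just mean running the same check once per layer blob instead of once per model file.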

I'd be interested to see how models behave at different parameter sizes or quantization levels locally with the Ollama integration. For anyone trying promptfoo's Ollama provider for local models, Ollama can be found at https://github.com/jmorganca/ollama

From some early poking around with a basic coding question using Code Llama locally (`ollama:codellama:7b`, `ollama:codellama:13b`, etc. in promptfoo), it seems like quantization has little effect on the output, but changing the parameter count has pretty dramatic effects. This is quite interesting since the 8-bit quantized 7b model is about the same size as a 4-bit 13b model (roughly 7B parameters at one byte each is about 7 GB, versus 13B at half a byte each is about 6.5 GB). Perhaps this is just one test, though; I'll be trying this with more tests!

try `ollama pull llama2-uncensored`

https://github.com/jmorganca/ollama

It's not completely "uncensored", but you avoid at least some of the silliness of the standard model.

Love how simple an interface this has. Local LLM tooling can be super daunting, so reducing it to a simple ingest() and then prompt() is really neat.

By chance, have you checked out Ollama (https://github.com/jmorganca/ollama) as a way to run the models like Llama 2 under the hood?

One of the goals of the project is to make it easy to download and run GPU-accelerated models, ideally with everything pre-compiled so it's easy to get up and running. It has an API that can be used by tools like this; would love to know if it would be helpful (or not!)
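If it helps to see what driving that API looks like from code, here's a rough Go sketch. It assumes the local server is listening on the default localhost:11434 and that the /api/generate endpoint streams one JSON object per line; treat the exact field names as approximations that may differ between versions.

    package main

    import (
    	"bufio"
    	"bytes"
    	"encoding/json"
    	"fmt"
    	"log"
    	"net/http"
    )

    // Request and response shapes assumed from the local generate API;
    // field names may not match every version exactly.
    type generateRequest struct {
    	Model  string `json:"model"`
    	Prompt string `json:"prompt"`
    }

    type generateChunk struct {
    	Response string `json:"response"`
    	Done     bool   `json:"done"`
    }

    func main() {
    	body, _ := json.Marshal(generateRequest{
    		Model:  "llama2",
    		Prompt: "Why is the sky blue?",
    	})

    	// The server streams newline-delimited JSON until "done" is true.
    	resp, err := http.Post("http://localhost:11434/api/generate",
    		"application/json", bytes.NewReader(body))
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer resp.Body.Close()

    	scanner := bufio.NewScanner(resp.Body)
    	for scanner.Scan() {
    		var chunk generateChunk
    		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
    			log.Fatal(err)
    		}
    		fmt.Print(chunk.Response)
    		if chunk.Done {
    			break
    		}
    	}
    	fmt.Println()
    }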

There's a LangChain model integration for it and a PrivateGPT example as well that might be a good pointer on using the LangChain integration: https://github.com/jmorganca/ollama/tree/main/examples/priva.... There's also a LangChain PR open to add support for generating embeddings, although there's a bit more work to do to support the major embedding models.

Best of luck with the project!

If you want to try running Llama 2 locally, you can use https://github.com/jmorganca/ollama

To run Llama 2 with it:

    ollama run llama2