For my fellow Windows shills, here's how you actually build it on Windows:
Before you start:
1. (For Nvidia GPU users) Install the CUDA Toolkit: https://developer.nvidia.com/cuda-downloads
2. Download the model somewhere: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolv...
In Windows Terminal with PowerShell:
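If you'd rather script the download, something like this should work; note the full `resolve/main/...` URL here is my guess based on the usual Hugging Face layout (the link above is truncated), so double-check it against the repo page:
# Hypothetical full URL - verify it on the Hugging Face repo page linked above
$url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
Invoke-WebRequest -Uri $url -OutFile "$HOME\Downloads\llama-2-13b-chat.ggmlv3.q4_0.bin"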
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release
cd bin/Release
mkdir models
mv Folder\Where\You\Downloaded\The\Model .\models
.\main.exe -m .\models\llama-2-13b-chat.ggmlv3.q4_0.bin --color -p "Hello, how are you, llama?" 2> $null
`-DLLAMA_CUBLAS=ON` enables CUDA (cuBLAS) acceleration; omit it for a CPU-only build. `2> $null` redirects the debug messages printed to stderr to a null stream so they don't spam your terminal.
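One gotcha, if I remember the cuBLAS build right: it won't actually offload the model to your GPU unless you ask, so you probably also want `-ngl` (`--n-gpu-layers`) at run time. Something like this; the layer count is a guess (the 13B model has 40 layers, and how many fit depends on your VRAM):
.\main.exe -m .\models\llama-2-13b-chat.ggmlv3.q4_0.bin -ngl 40 --color -p "Hello, how are you, llama?" 2> $null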
Here's a PowerShell function you can put in your $PROFILE so that you can just run prompts with `llama "prompt goes here"`:
function llama {
    # "$args" joins everything typed after `llama` into a single prompt string
    .\main.exe -m .\models\llama-2-13b-chat.ggmlv3.q4_0.bin -p "$args" 2> $null
}
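Since those relative paths only work from the build directory, a variant with absolute paths is more practical in a profile function. The paths below are placeholders for wherever you cloned and built:
function llama {
    # Hypothetical locations - point these at your own build output and model file
    $exe   = "C:\src\llama.cpp\build\bin\Release\main.exe"
    $model = "C:\src\llama.cpp\build\bin\Release\models\llama-2-13b-chat.ggmlv3.q4_0.bin"
    & $exe -m $model -p "$args" 2> $null
}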
Adjust your paths as necessary. It has a tendency to talk to itself.

Your commands assume the model is a .bin file (so I guess there must be a way to convert the PyTorch .pth model to a .bin file). How can I do this, and what is the difference between the two formats?
The Facebook repo provides commands for running the models, but those commands don't work on my Windows machine: "NOTE: Redirects are currently not supported in Windows or MacOs. [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to ...."
The Facebook repo doesn't say which OS it's meant for, so I assumed it would work on Windows too. But if that can work, why would anyone need ggerganov's llama.cpp? I'm new to all of this and easily confused, so any help is appreciated.