I recently added a Python library API to my LLM CLI tool, which offers a very lightweight way to call models: https://llm.datasette.io/en/stable/python-api.html

    import llm
    model = llm.get_model("gpt-3.5-turbo")
    model.key = "YOUR_API_KEY_HERE"  # or set the OPENAI_API_KEY environment variable
    response = model.prompt(
        "Five surprising names for a pet pelican"
    )
    print(response.text())
Or you can stream the responses like this:

    response = model.prompt(
        "Five diabolical names for a pet goat"
    )
    for chunk in response:
        print(chunk, end="")
It works with other models too, installed via plugins - including models that can run directly on your machine:

    pip install llm-gpt4all
Then:

    model = llm.get_model("ggml-vicuna-7b-1")
    print(model.prompt(
        "What is the capital of France?"
    ).text())
It also handles conversations, where each prompt needs to include the context from the previous exchanges:

    c = model.conversation()
    print(c.prompt("Capital of France?").text())
    print(c.prompt("what language do they speak?").text())
I wrote more about the new plugin system for adding extra models here: https://simonwillison.net/2023/Jul/12/llm/

Nice. Now all we need is a vector database atop SQLite.

I had a go at one of those a few months ago: https://datasette.io/plugins/datasette-faiss

Alex Garcia built a better one as a SQLite extension, built on top of Faiss: https://github.com/asg017/sqlite-vss