I'd love to do this offline, so I could feed it all my mail. Am I right that it's still going to be a while before we can do that? Or could it work with a weaker model than GPT-3?

The things I've seen all use hosted language models. For example https://github.com/jerryjliu/gpt_index depends on LangChain, which wraps APIs from hosted LLMs: https://langchain.readthedocs.io/en/latest/reference/modules...
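Concretely, those wrappers are thin clients over a hosted HTTP API. A minimal sketch using the early-2023 LangChain interface (the model name is illustrative, and it assumes an OPENAI_API_KEY in your environment):

    # Minimal sketch of LangChain's hosted-LLM wrapper (early-2023 API).
    # Nothing runs locally: the prompt goes over the network to OpenAI's
    # servers, which is exactly why it can't be used offline.
    from langchain.llms import OpenAI

    llm = OpenAI(model_name="text-davinci-003", temperature=0)
    print(llm("Summarize this email thread: ..."))  # network round-trip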

AFAIK there's no GPT-3-class LLM that's easy to run at home, because the parameter counts are so large. Your gaming PC's GPU won't have enough VRAM to hold the model. For example, gpt-neox-20b needs about 40GB of RAM: https://huggingface.co/EleutherAI/gpt-neox-20b/discussions/1...
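The back-of-envelope math is just parameter count times bytes per parameter. A quick sketch (the fp16 assumption is mine; the linked discussion quotes similar figures):

    # Rough weight-memory estimate: parameters x bytes per parameter.
    # fp16/bf16 weights are 2 bytes each; this ignores activations and
    # the KV cache, so real usage during inference is higher.
    def weight_memory_gb(n_params, bytes_per_param=2):
        return n_params * bytes_per_param / 1024**3

    print(f"{weight_memory_gb(20e9):.1f} GB")  # gpt-neox-20b: ~37 GB

Even quantized to 8 bits, that's roughly 19GB for the weights alone, which is still beyond most consumer GPUs.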