I think it's poor form that they are taking the GPT-4 name for an unrelated project. After all, the underlying Vicuna is merely a fine-tuned LLaMA. Plus they use the smaller 13B version.

The results look interesting, however.

Here's hoping that they'll add GPTQ 4-bit quantization so the 65B version of the model can be run on 2x 3090.
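
For intuition, here's roughly what 4-bit weight quantization looks like - just plain round-to-nearest w/ a per-group scale, not GPTQ itself (GPTQ additionally adjusts the remaining weights to compensate for each rounding error), and the group size here is made up:

    import numpy as np

    def quantize_4bit(weights, group_size=128):
        # toy round-to-nearest 4-bit quantization, one fp32 scale per group
        w = weights.reshape(-1, group_size)
        scale = np.abs(w).max(axis=1, keepdims=True) / 7          # map each group into [-7, 7]
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)   # 16 representable levels
        return q, scale

    def dequantize_4bit(q, scale):
        return (q.astype(np.float32) * scale).reshape(-1)

    w = np.random.randn(4096).astype(np.float32)
    q, s = quantize_4bit(w)
    print("max abs error:", np.abs(dequantize_4bit(q, s) - w).max())

In a real implementation two 4-bit values get packed into each byte, which is where the ~4x saving over fp16 comes from.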

Someone needs to write a buyer's guide for GPUs and LLMs. For example, what's the best course of action if you don't need to train anything but do want to eventually run whatever model becomes the first locally runnable equivalent to ChatGPT? Do you go with Nvidia for the CUDA cores or with AMD for more VRAM? Do you do neither and wait another generation?

For a general guide, I recommend: https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...

There's a subreddit r/LocalLLaMA that seems like the most active community focused on self-hosting LLMs. Here's a recent discussion on hardware: https://www.reddit.com/r/LocalLLaMA/comments/12lynw8/is_anyo...

If you're looking just for local inference, your best bet is probably to buy a consumer GPU w/ 24GB of VRAM (a 3090 is fine, a 4090 has more performance potential), which can fit a 30B-parameter 4-bit quantized model that can probably be fine-tuned to ChatGPT (3.5) level quality. If that turns out not to be enough, you can always add a second card later on.
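
The back-of-envelope math for the weights (the KV cache and activations need a few more GB on top, so treat these as lower bounds):

    def weight_vram_gib(params_billion, bits):
        # weights only: params * bits/8 bytes, converted to GiB
        return params_billion * 1e9 * bits / 8 / 2**30

    for n in (13, 30, 65):
        print(f"{n}B @ 4-bit: ~{weight_vram_gib(n, 4):.1f} GiB")
    # 13B @ 4-bit: ~6.1 GiB  -> fits on a 12GB card
    # 30B @ 4-bit: ~14.0 GiB -> fits on a single 24GB 3090/4090
    # 65B @ 4-bit: ~30.3 GiB -> needs the weights split across 2x 24GB cards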

Alternatively, if you have an Apple Silicon Mac, llama.cpp performs surprisingly well and is easy to try for free: https://github.com/ggerganov/llama.cpp
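
If you'd rather drive it from Python than the command line, the llama-cpp-python bindings wrap the same code; a minimal sketch, assuming you've already converted a model to the 4-bit GGML format (the path below is just a placeholder):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")  # placeholder path
    output = llm(
        "Q: What is the capital of France? A:",
        max_tokens=32,
        stop=["Q:", "\n"],
    )
    print(output["choices"][0]["text"])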

Current AMD consumer cards have terrible software support and IMO aren't really an option. On Windows you might be able to use SHARK or DirectML ports, but nothing will run out of the box. ROCm still has no RDNA3 support (supposedly coming w/ 5.5, but no release date has been announced), and it's unclear how well it'll work. Basically, unless you'd rather fight w/ hardware than play around w/ ML, it's probably best to avoid AMD for now. The older RDNA cards also lack tensor cores, so perf would be hobbled even if you could get things running, and lots of software has been written w/ only CUDA in mind.