What is the cheapest way to run it? I'm looking to build a product on top of it.
Probably by quantizing the weights (or just using the base weights) and running them with this project: https://github.com/ggerganov/llama.cpp on a CPU machine that supports AVX-512 instructions.
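If it helps, here is a rough sketch for checking whether a machine exposes AVX-512 before you commit to it. It assumes an x86 Linux box where /proc/cpuinfo lists the CPU feature flags. `has_avx512` is just a helper name made up for this example, not something from llama.cpp.

```python
from pathlib import Path


def has_avx512(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line lists avx512f (AVX-512 Foundation)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only look at the feature-flag lines, not e.g. the model name.
        if key.strip() == "flags" and "avx512f" in value.split():
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux on x86
    if path.exists():
        print("AVX-512 available:", has_avx512(path.read_text()))
    else:
        print("No /proc/cpuinfo here; check the CPU spec sheet instead.")
```

This only tells you whether the instructions are there, not how fast the model will actually run, so it's still worth benchmarking on the instance type you plan to use.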