Uh... is there some way to use this without connecting to a server? Like, for a game that can be played offline?

Finding a way to make the machine learning piece a completely self-contained library that can be shipped at scale to run on individual computers is the big hurdle to making AI like this practical for games. If I have to rely on your service staying up for my game to work, that's an unacceptable supply chain risk.

7B-parameter models are more than enough for this and run faster than talking pace on even a low-end CPU.

Even a fine-tuned 3B model would be excellent for generative agents and would only use about 2GB of RAM to run at high speeds on even a single-core CPU.
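
For anyone wanting to sanity-check the RAM figure, here is a rough back-of-envelope sketch. It assumes the weights are quantized to about 4 bits each, which is my assumption and not something stated above. It counts weights only, so the context cache and runtime overhead come on top. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Ignores the KV cache, activations and runtime overhead, so real usage
    will be somewhat higher than this number.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumed precisions: 16-bit (unquantized), 8-bit and 4-bit quantization.
for name, n_params in [("3B", 3e9), ("7B", 7e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {weight_memory_gb(n_params, bits):.1f} GB")
```

Under that assumption a 3B model needs roughly 1.5 GB for weights, which is in the same ballpark as the "about 2GB" quoted above once overhead is added. A 7B model comes out at roughly 3.5 GB.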

Can you share some examples of what models you’re referring to?