So, this is an ELI5 kind of question I suppose. There must be something going on like "processing a kazillion images," and I'm trying to wrap my head around how (or what part of) that work is "offloaded" to your home computer/graphics card. I just can't make sense of how you can do it at home if you're not somehow in direct contact with "all the data." For example, must you be connected to the internet, or to "Stable Diffusion's servers," for this to work?

SD has about 860M weights in the main workhorse part (the U-Net). At 16-bit precision that is only about 1.7 GB (1.6 GiB) of data, which in some very real sense has condensed the world's total knowledge of art and photography and styles and objects.
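To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The 860M figure is the one quoted above; the helper name `weights_size_bytes` is just made up for illustration, and this counts only that one part of the model, not the text encoder or VAE.

```python
def weights_size_bytes(n_params: int, bits_per_weight: int) -> int:
    """Storage needed for n_params weights at the given precision."""
    return n_params * bits_per_weight // 8


N_PARAMS = 860_000_000  # approximate weight count quoted in the post

for bits in (32, 16):
    size = weights_size_bytes(N_PARAMS, bits)
    # Show both decimal gigabytes (1e9 bytes) and binary gibibytes (2**30 bytes).
    print(f"{bits}-bit: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")

# 32-bit: 3.44 GB (3.20 GiB)
# 16-bit: 1.72 GB (1.60 GiB)
```

So halving the precision halves the file, which is why the 16-bit weights fit comfortably on a consumer graphics card.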

It's not a search engine. It's self-contained, and the closest analogy is that it's a very, very knowledgeable and skilled artist.

Is there a smaller version of the model available (<4 GB) intended for use with 16-bit precision?

Diffusers shows how to use the fp16 variant.

https://github.com/huggingface/diffusers
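For what it's worth, a minimal sketch of loading the pipeline in half precision with Diffusers might look like the following. The model id, prompt, and output filename are assumptions for illustration and not something from this thread; it also assumes `torch` and `diffusers` are installed and that you have a CUDA-capable GPU. Passing `torch_dtype=torch.float16` asks for the weights to be loaded as 16-bit floats.

```python
def generate_image(prompt: str,
                   model_id: str = "CompVis/stable-diffusion-v1-4",  # assumed model id
                   out_path: str = "out.png"):                      # hypothetical filename
    # Imports are inside the function so the sketch can be read or imported
    # without the heavy dependencies installed.
    import torch
    from diffusers import StableDiffusionPipeline

    # Load the weights as 16-bit floats to roughly halve memory use.
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")  # fp16 inference is intended for the GPU

    # The pipeline runs entirely on the local machine once the weights are loaded.
    image = pipe(prompt).images[0]
    image.save(out_path)
    return image


# Example call (downloads the weights on first use, then reuses the local cache):
# generate_image("a watercolor painting of a lighthouse at dusk")
```

The first call needs an internet connection to fetch the weights. After that they are cached on disk and generation happens locally, which fits the "self-contained" point made earlier in the thread.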