I don't understand what this does. Examples mention "human prompt" but I don't see it anywhere.

It's a ControlNet model trained on SAM segmentation maps. The final model takes a text prompt and a segmentation map as input and generates an image conditioned on both.

And I thought I was moderately "with it" with regard to AI… Could I get that in ELI5?

https://github.com/lllyasviel/ControlNet

ControlNet is a neural network added to an already trained model so that it can be conditioned on new stuff like Canny edges, depth maps, or segmentation maps. ControlNet lets you train the model on the new condition "easily", without catastrophic forgetting and without a huge dataset. In the repo linked by OP, they trained a ControlNet model on segmentation maps generated by SAM: https://segment-anything.com/
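If it helps, here is a toy sketch of the "added network" idea in plain Python. It is not the repo's code. It assumes the structure described in the ControlNet paper: the pretrained block is frozen, a trainable copy of it receives the condition, and the copy is wired in through "zero convolutions", layers whose weights start at exactly 0. The names `Dense`, `zero_dense`, and `ControlledBlock` are made up for illustration, and a tiny dense layer stands in for the real U-Net blocks.

```python
import copy
import math

class Dense:
    """Hypothetical stand-in for a pretrained layer: y = W x + b."""
    def __init__(self, weights, bias):
        self.weights = weights  # one row of weights per output unit
        self.bias = bias

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]

def zero_dense(n):
    # "Zero convolution" stand-in: all weights and biases start at 0,
    # so it outputs zeros until training changes them.
    return Dense([[0.0] * n for _ in range(n)], [0.0] * n)

def block(layer, x):
    # A block = layer followed by a tanh nonlinearity.
    return [math.tanh(v) for v in layer(x)]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

class ControlledBlock:
    def __init__(self, pretrained):
        self.frozen = pretrained                    # original weights, never trained
        self.trainable = copy.deepcopy(pretrained)  # trainable copy, starts identical
        n = len(pretrained.bias)
        self.zero_in = zero_dense(n)   # injects the condition into the copy
        self.zero_out = zero_dense(n)  # injects the copy's output back

    def __call__(self, x, condition):
        base = block(self.frozen, x)
        control = block(self.trainable, add(x, self.zero_in(condition)))
        return add(base, self.zero_out(control))

# Usage: before any training, the condition has no effect at all.
pretrained = Dense([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1])
cb = ControlledBlock(pretrained)
x, seg_features = [1.0, 2.0], [3.0, -1.0]  # made-up toy values
assert cb(x, seg_features) == block(pretrained, x)
```

Because the zero layers output 0 at the start, the combined model initially behaves exactly like the original one. Training then only updates the copy and the zero layers, so the pretrained weights are never overwritten. That is roughly why it avoids catastrophic forgetting.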