I happen to have an RPi 4B running Home Assistant. Is this something I could set up on it and integrate with HA to control it with speech, or is it overkill?

I doubt it. I'm running 4-bit 30B and 65B models with 64GB of RAM, a 4080 and a 7900X. The 7B models are less demanding, but even so, you'll need more than an RPi. And even then, it would be a project to get these to control something. This is more 'first baby steps' toward the edge.
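For a rough sense of scale: a 4-bit weight is half a byte, so the weights alone take about (parameters in billions) / 2 GB. This back-of-envelope sketch ignores the context/KV cache and runtime overhead, so real usage is higher. For comparison, the Raspberry Pi 4B tops out at 8 GB of RAM, and fitting in RAM says nothing about how fast the Pi's CPU can actually generate tokens.

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the quantized weights alone, in decimal GB (1e9 bytes)."""
    # params * bits / 8 bits-per-byte; with params in billions the result is in GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for size in (7, 30, 65):
        print(f"{size}B @ 4-bit ~ {weight_gb(size, 4):.1f} GB of weights")
```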

The article shows an example running on an RPi that recognizes colour names. I could just come up with keywords that would invoke certain commands and feed them to HA, which would match them to an automation (e.g. "turn off kitchen", or just "kitchen"). I think a PoC is doable, but I'm aware I could run into limitations quickly. Idk, I might give it a try when I'm bored.
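A minimal sketch of that keyword idea, assuming the recognizer emits plain text and that each keyword maps to a Home Assistant automation you've already created. It uses Home Assistant's REST API (`POST /api/services/automation/trigger` with a long-lived access token as a Bearer token). The keyword table, the `automation.kitchen_*` entity IDs and the helper names are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical keyword -> automation entity table; replace with your own automations.
KEYWORDS = {
    "kitchen": "automation.kitchen_toggle",
    "turn off kitchen": "automation.kitchen_off",
    "turn on kitchen": "automation.kitchen_on",
}


def build_request(keyword, base_url, token):
    """Map a recognized keyword to an HA automation.trigger call, or None if unknown."""
    entity_id = KEYWORDS.get(keyword.strip().lower())
    if entity_id is None:
        return None
    url = f"{base_url.rstrip('/')}/api/services/automation/trigger"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send(keyword, base_url, token):
    """Send the call; returns False for unknown keywords, True on a 2xx response."""
    req = build_request(keyword, base_url, token)
    if req is None:
        return False
    with urllib.request.urlopen(req, timeout=5) as resp:
        return 200 <= resp.status < 300
```

After the recognizer fires, something like `send("turn off kitchen", "http://homeassistant.local:8123", TOKEN)` would do it. Keeping the matching in a plain dictionary means unrecognized keywords are simply ignored rather than guessed at.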

Would love a voice assistant running locally, but there are probably solutions out there already; I haven't gotten around to the research yet.

Shameless plug, I'm the founder of Willow[0].

In short, you can:

1) Run a local Willow Inference Server[1]. Supports CPU or CUDA, just about the fastest implementation of Whisper out there for "real time" speech.

2) Run local command detection on device. We pull your Home Assistant entities on setup and define a basic grammar for them, but any English commands (up to 400) are supported. They are recognized directly on the $50 ESP BOX device and sent to Home Assistant (or openHAB, or a REST endpoint, etc.) for processing.
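To illustrate the general idea of deriving a command list from Home Assistant entities (this is not Willow's actual grammar or code), here is a sketch that expands each entity's friendly name into "turn on"/"turn off" phrases and stops at the 400-command limit mentioned above. The verb table, the input format `(entity_id, friendly_name)` and the function name are hypothetical.

```python
MAX_COMMANDS = 400  # command limit quoted above for on-device recognition

# Hypothetical grammar: which verbs apply to which Home Assistant domains.
VERBS = {
    "light": ("turn on", "turn off"),
    "switch": ("turn on", "turn off"),
}


def build_commands(entities):
    """entities: iterable of (entity_id, friendly_name).

    Returns a dict mapping a spoken phrase to (service, entity_id).
    Entities in domains without a verb entry (e.g. sensors) are skipped.
    """
    commands = {}
    for entity_id, name in entities:
        domain = entity_id.split(".", 1)[0]
        for verb in VERBS.get(domain, ()):
            phrase = f"{verb} {name}".lower()
            service = verb.replace(" ", "_")  # "turn on" -> "turn_on"
            commands[phrase] = (f"{domain}.{service}", entity_id)
            if len(commands) >= MAX_COMMANDS:
                return commands  # stop once the device's limit is reached
    return commands
```

A recognized phrase can then be looked up in the returned dict and forwarded to whatever backend handles it.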

Whether using WIS or local command detection, our performance target is 500ms from end of speech to command executed.

[0] - https://github.com/toverainc/willow

[1] - https://github.com/toverainc/willow-inference-server