So they generated training data from one laptop and microphone, then generated test data with the exact same laptop and microphone in the same setup, possibly with one person pressing the keys too. For the Zoom model they trained a new model with data gathered from Zoom. They call it a practical side-channel attack, but they didn't do anything to see if this approach could generalize at all.

I believe that is the generalisable version of the attack. You're not looking to learn the sound of arbitrary keyboards with this attack, rather you're looking to learn the sound of specific targets.

For example, a Twitch streamer enters responses into their stream chat with a live mic. Later, the streamer enters their Twitch password. Someone employing this technique could plausibly learn the keystroke sounds from the first scenario and apply them in the second.

Finally, a real security weakness to cite when making fun of people for their mechanical keyboard. Time to start recording the audio of Zoom calls with some particularly loud typers...

Not according to the article. Microphones are sensitive enough to mount the attack on quieter keyboards.

What we clearly need are louder keyboards - which overload the mic so as to render keystrokes indistinguishable.

Adding a gain knob to my keyboard, be right back.

My mechanical keyboard already has a knob that I've configured to control the system audio volume. All that's left is configuring Linux to play an audio recording of a keypress every time I press a key...

You want https://github.com/zevv/bucklespring then.

Lagniappe: “To temporarily silence bucklespring, for example to enter secrets, press ScrollLock twice”