I've been playing a bit with stacking transformer adapters to add knowledge to models, and so far it has met my needs. It doesn't give the same illusion of intelligence, but it's about as good as a multitasking intern, so I'm still having fun with it. I wonder if this is basically doing the same thing.

Interesting. Do you know if this can be done with Sentence Transformers, too? Say, picking a well-performing one from HF, then training an adapter for the domain (unsupervised), then adding another one using actual training triplets (base, similar, dissimilar)?

I haven't done this with sentence transformers, but I imagine it's possible since they can be loaded as regular transformers.
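
Here's a rough, untested sketch of what I mean: the sentence-transformers checkpoints load fine with plain AutoModel, you just have to do the pooling yourself. The checkpoint name below is only an example.

    # Rough sketch, untested: load a Sentence Transformers checkpoint as a
    # plain Hugging Face transformer and mean-pool the token embeddings,
    # which is what the sentence-transformers wrapper does for this model.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = "sentence-transformers/all-MiniLM-L6-v2"  # example checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    def embed(sentences):
        batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            tokens = model(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return (tokens * mask).sum(1) / mask.sum(1)  # mean pooling

    print(embed(["adapters are fun", "LoRA is a low-rank adapter"]).shape)  # (2, 384)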

Check out https://github.com/huggingface/peft -- they've packaged it up nicely -- and read up on LoRA (https://arxiv.org/pdf/2106.09685.pdf). That should get you started.
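
For the triplet part, something like this is roughly how I'd expect it to look with PEFT -- a sketch only, not something I've run, and the target_modules names ("query"/"value") assume a BERT-style backbone:

    # Rough sketch, untested: wrap the backbone with a LoRA adapter via PEFT and
    # take one training step with a triplet loss on (anchor, similar, dissimilar).
    # target_modules assumes a BERT-style model; adjust for other architectures.
    import torch
    from transformers import AutoModel, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "sentence-transformers/all-MiniLM-L6-v2"  # example checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    base = AutoModel.from_pretrained(model_name)

    lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                          target_modules=["query", "value"])
    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()  # only the LoRA weights will train

    def embed(sentences):
        batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        tokens = model(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return (tokens * mask).sum(1) / mask.sum(1)  # mean pooling

    loss_fn = torch.nn.TripletMarginLoss(margin=0.5)
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4)

    # One step on a single made-up triplet; in practice you'd loop over batches.
    loss = loss_fn(embed(["how do I reset my password?"]),
                   embed(["steps to change your account password"]),
                   embed(["the office is closed on holidays"]))
    loss.backward()
    optimizer.step()

I believe PEFT also lets you attach more than one adapter to the same base model and switch between them (load_adapter / set_adapter), which might cover the two-stage domain-then-triplets idea.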