Worryingly, I am not sure the people working on this really understand what a Transformer is.

Quote from them:

“There is still active research in non-transformer based language models though, such as Amazon’s AlexaTM 20B which outperforms GPT-3”

Quote from said paper

“For AlexaTM 20B, we used the standard Transformer model architecture”

(It's just an encoder-decoder transformer.)
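
For what it's worth, here's a minimal PyTorch sketch of what "standard encoder-decoder Transformer" means; the hyperparameters are illustrative placeholders, not AlexaTM 20B's actual config:

    import torch
    import torch.nn as nn

    # nn.Transformer is the stock encoder-decoder architecture from
    # "Attention Is All You Need", the same family the AlexaTM paper names.
    model = nn.Transformer(d_model=512, nhead=8,
                           num_encoder_layers=6, num_decoder_layers=6)

    src = torch.rand(10, 32, 512)  # (source_len, batch, d_model)
    tgt = torch.rand(20, 32, 512)  # (target_len, batch, d_model)
    out = model(src, tgt)          # (target_len, batch, d_model)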

Thanks for pointing this out. That was my mistake – my brain must have swapped out "different transformer architectures" for "different model architectures".

I just updated the guide: https://github.com/brexhq/prompt-engineering/commit/3a3ac17a...

A better example would have been RWKV [1].

[1] https://github.com/BlinkDL/RWKV-LM
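
To make the contrast concrete, here's a toy sketch of the WKV recurrence at the heart of RWKV's time-mixing, written from the simplified (numerically naive) formulation; the function name wkv is mine, the names k, v, w, u follow RWKV's notation, and the real implementation does this in log-space CUDA kernels for stability:

    import numpy as np

    def wkv(k, v, w, u):
        """Naive per-channel WKV recurrence (RWKV time-mixing core).

        k, v: (T, C) keys and values; w: (C,) positive decay;
        u: (C,) bonus weight for the current token.
        """
        T, C = k.shape
        a = np.zeros(C)          # running weighted sum of values
        b = np.zeros(C)          # running sum of weights
        out = np.zeros((T, C))
        for t in range(T):
            e_cur = np.exp(u + k[t])                      # weight of token t
            out[t] = (a + e_cur * v[t]) / (b + e_cur)
            a = np.exp(-w) * a + np.exp(k[t]) * v[t]      # decay, then accumulate
            b = np.exp(-w) * b + np.exp(k[t])
        return out

Because the state (a, b) is fixed-size, inference is O(T) in sequence length rather than attention's O(T^2), and the model runs like an RNN, which is exactly why RWKV counts as a non-transformer example.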