Worryingly, I am not sure the people working on this really understand what a Transformer is.
Quote from them:
“There is still active research in non-transformer based language models though, such as Amazon’s AlexaTM 20B which outperforms GPT-3”
Quote from said paper:
“For AlexaTM 20B, we used the standard Transformer model architecture”
(It's just an encoder-decoder transformer.)
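To illustrate: the "standard Transformer model architecture" the paper mentions is the same encoder-decoder layout you can instantiate off the shelf. A minimal PyTorch sketch, with toy hyperparameters of my own choosing rather than AlexaTM 20B's actual configuration:

    import torch
    import torch.nn as nn

    # nn.Transformer is the stock encoder-decoder architecture from
    # "Attention Is All You Need" -- the same family AlexaTM 20B belongs to.
    # Sizes here are illustrative, not the 20B-parameter configuration.
    model = nn.Transformer(
        d_model=512,
        nhead=8,
        num_encoder_layers=6,
        num_decoder_layers=6,
    )

    src = torch.rand(10, 32, 512)  # (source_len, batch, d_model)
    tgt = torch.rand(20, 32, 512)  # (target_len, batch, d_model)
    out = model(src, tgt)          # (target_len, batch, d_model)
    print(out.shape)               # torch.Size([20, 32, 512])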
Thanks for pointing this out. That was my mistake – my brain must have swapped out "different transformer architectures" with "different model architectures".
I just updated the guide: https://github.com/brexhq/prompt-engineering/commit/3a3ac17a...
A better example would have been RWKV [1].
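RWKV genuinely is non-transformer: it replaces attention with a linear-time recurrence that carries a fixed-size state instead of attending over the whole history. A rough single-channel sketch of the WKV recurrence from the RWKV paper (names and simplifications are mine; real implementations vectorize over channels and use a numerically stable log-space update):

    import math

    def wkv(ks, vs, w, u):
        """Naive single-channel RWKV time-mixing (WKV) recurrence.

        ks, vs: per-step keys and values; w: per-channel decay;
        u: bonus applied to the current token only.
        """
        num, den, out = 0.0, 0.0, []
        for k, v in zip(ks, vs):
            # Output mixes the decayed running state with the current token.
            out.append((num + math.exp(u + k) * v) / (den + math.exp(u + k)))
            # State update: exponential decay of the past, plus the new token.
            num = math.exp(-w) * num + math.exp(k) * v
            den = math.exp(-w) * den + math.exp(k)
        return out

    print(wkv(ks=[0.1, -0.2, 0.3], vs=[1.0, 2.0, 3.0], w=0.5, u=0.2))

The point is that the running state (num, den) is constant-size, so generation costs O(1) per token rather than attention's O(t) over the history.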