Is there a popular, performant open-source model that uses a mixture-of-experts architecture?
Google have released the models and code for the Switch Transformer from Fedus et al. (2021) under the Apache 2.0 licence. [0]
There's also OpenMoE, an open-source effort to train a mixture-of-experts model. So far they've released a model with 8 billion parameters. [1]
[0] https://github.com/google-research/t5x/blob/main/docs/models...
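If it helps to see what "mixture of experts" means in the Switch Transformer's case, here's a minimal sketch of top-1 ("switch") routing. This is illustrative only, not the t5x implementation: the shapes, parameter names, and the dense "run every expert then select" trick are assumptions made to keep the example short (real implementations dispatch tokens so each expert only processes its own share, and add a load-balancing loss).

```python
# Minimal sketch of Switch-style top-1 MoE routing (assumed shapes/names,
# not the actual t5x code). Each token is sent to exactly one expert FFN.
import jax
import jax.numpy as jnp

def switch_layer(x, router_w, expert_w1, expert_w2):
    """x: [tokens, d_model]; router_w: [d_model, n_experts];
    expert_w1: [n_experts, d_model, d_ff]; expert_w2: [n_experts, d_ff, d_model]."""
    # Router: a probability distribution over experts per token; keep the top-1.
    logits = x @ router_w                          # [tokens, n_experts]
    probs = jax.nn.softmax(logits, axis=-1)
    expert_idx = jnp.argmax(probs, axis=-1)        # [tokens]
    gate = jnp.max(probs, axis=-1, keepdims=True)  # scale output by router prob

    # For clarity, compute all experts densely, then pick each token's expert.
    def run_expert(w1, w2):
        return jax.nn.relu(x @ w1) @ w2            # [tokens, d_model]
    all_out = jax.vmap(run_expert)(expert_w1, expert_w2)  # [n_experts, tokens, d_model]
    out = jnp.take_along_axis(all_out, expert_idx[None, :, None], axis=0)[0]
    return gate * out

# Toy usage
key = jax.random.PRNGKey(0)
tokens, d_model, d_ff, n_experts = 4, 8, 16, 2
ks = jax.random.split(key, 4)
x = jax.random.normal(ks[0], (tokens, d_model))
router_w = jax.random.normal(ks[1], (d_model, n_experts))
w1 = jax.random.normal(ks[2], (n_experts, d_model, d_ff))
w2 = jax.random.normal(ks[3], (n_experts, d_ff, d_model))
print(switch_layer(x, router_w, w1, w2).shape)  # (4, 8)
```

The point of the top-1 routing is that only one expert's FFN parameters are "active" per token, so parameter count grows with the number of experts while per-token compute stays roughly constant.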