Thanks a lot for this video, best LLM usage tutorial I've seen so far.

At https://youtu.be/jkrNMKz9pWU?si=Dvz-Hs4InJXNozhi&t=3278, when talking about valid use cases for a local model vs. GPT-4, you say: "You might want to create your own model that's particularly good at solving the kinds of problems that you need to solve using fine tuning, and these are all things that you absolutely can get better than GPT4 performance".

Related to this, there's an idea I've been thinking about for some time: imagine a chatbot backed by multiple "small" models (e.g. 7B parameters each), where each model is fine-tuned for a specific task. Could such a system outperform GPT-4?

Here's a high-level overview of how I imagine this working (rough sketch below the list):

- The context/prompt is sent to a "router model", which is trained to determine which expert model can best answer/complete the prompt.

- The system then passes the context/prompt to that expert model and returns its answer.

- If no suitable expert model is found, it falls back to a generic instruction-tuned, general-purpose LLM.
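
To make the dispatch flow concrete, here's a minimal Python sketch. Everything in it is illustrative: the model names, the `call_model` stub, and the keyword-based router are placeholders, not a real API. In the actual system the router would itself be a small fine-tuned classifier model and each expert would be a locally served fine-tuned ~7B model.

```python
from typing import Callable, Dict, Optional


def call_model(model_name: str, prompt: str) -> str:
    # Stub standing in for a real inference call (e.g. to a locally
    # served model). Returns a canned string so the sketch runs as-is.
    return f"[{model_name}] response to: {prompt!r}"


# Hypothetical registry mapping task labels to fine-tuned expert models.
EXPERTS: Dict[str, Callable[[str], str]] = {
    "code": lambda p: call_model("code-expert-7b", p),
    "math": lambda p: call_model("math-expert-7b", p),
}


def route(prompt: str) -> Optional[str]:
    # In the proposed system this would be a small model trained to
    # classify prompts into expert labels; a trivial keyword check
    # stands in for it here.
    lowered = prompt.lower()
    if "def " in lowered or "python" in lowered:
        return "code"
    if any(tok in lowered for tok in ("integral", "solve", "equation")):
        return "math"
    return None  # no expert matched


def answer(prompt: str) -> str:
    # Dispatch: use the expert the router picked, otherwise fall back
    # to a generic instruction-tuned model.
    label = route(prompt)
    if label in EXPERTS:
        return EXPERTS[label](prompt)
    return call_model("generic-instruct-7b", prompt)


if __name__ == "__main__":
    print(answer("Write a Python function that reverses a string."))  # routed to code expert
    print(answer("What's a good pasta recipe?"))  # falls back to the generic model
```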

If you can theoretically beat GPT-4 on a specific task with a small model fine-tuned for that task, maybe a cluster of such small models could collectively outperform GPT-4.

Does that make sense?

It makes a lot of sense! In fact, there are a number of open-source projects working on just such a model right now. Here's a great example: https://github.com/XueFuzhao/OpenMoE/