Well, sort of... I'm refining an algorithm that takes several (carefully calibrated) outputs from a given LLM and infers the most plausible set of parameters behind it. I was expecting to find clusters of parameters very similar to what they observe.
I informally call this problem "inverting an LLM", and, obviously, it turns out to be non-trivial to solve. Not completely impossible, though: so far I've found some good approximations.
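To give a flavor of what "inverting" means here, a heavily simplified toy (not my actual setup): assume the only unknown parameter is the sampling temperature and the underlying logits are known; then the inversion reduces to a maximum-likelihood fit against observed token frequencies.

```python
# Toy illustration only: recover a sampling temperature from observed
# token frequencies, given a known logit vector.
import numpy as np

def softmax(logits, temperature):
    z = logits / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=50)            # hypothetical next-token logits
true_temperature = 0.7
probs = softmax(logits, true_temperature)
samples = rng.choice(len(logits), size=20_000, p=probs)
counts = np.bincount(samples, minlength=len(logits))

# Maximum-likelihood fit over a temperature grid
grid = np.linspace(0.1, 2.0, 400)
log_liks = [np.sum(counts * np.log(softmax(logits, t) + 1e-12)) for t in grid]
estimate = grid[int(np.argmax(log_liks))]
print(f"true T = {true_temperature}, estimated T = {estimate:.3f}")
```

The real problem is much harder (many coupled unknowns, no access to the logits), but the shape of it is the same: pick the parameters that make the observed outputs most likely.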
Anyway, quite an interesting read; I'll definitely keep an eye on what they publish in the future.
Also, from the manuscript linked at the end:
>Another hypothesis is that some features are actually higher-dimensional feature manifolds which dictionary learning is approximating.
Well, you have something that behaves like a continuous, smooth space, so you can define as many manifolds as you need, so yes :^). Pedantry aside, I get the idea, and IMO that's exactly what's going on and the right framework to approach this problem from.
One striking question that falls out of this framing: what is the conceptual equivalent of the transition functions connecting the different manifolds in this LLM space? Once you see it, your mind will be blown, not because of its complexity but because of its exceptional simplicity.
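To make the quoted "feature manifold" hypothesis concrete, here's a toy numpy picture (purely illustrative, not from the paper): a feature that genuinely lives on a 1-D circular manifold embedded in activation space, approximated by a dictionary of discrete directions tiling that circle.

```python
# Toy picture: a circular feature manifold approximated by discrete
# dictionary directions, in the spirit of the quoted hypothesis.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64
# Embed a circle into activation space via two random orthonormal directions
basis = np.linalg.qr(rng.normal(size=(d_model, 2)))[0]
angles = rng.uniform(0, 2 * np.pi, size=5000)
activations = np.stack([np.cos(angles), np.sin(angles)], axis=1) @ basis.T

# A crude fixed "dictionary": k evenly spaced unit directions on that circle
k = 12
dict_angles = np.linspace(0, 2 * np.pi, k, endpoint=False)
dictionary = np.stack([np.cos(dict_angles), np.sin(dict_angles)], axis=1) @ basis.T

# Sparse encoding: keep only the single best-matching dictionary element
scores = activations @ dictionary.T
codes = np.zeros_like(scores)
codes[np.arange(len(scores)), scores.argmax(axis=1)] = scores.max(axis=1)
reconstruction = codes @ dictionary

err = np.mean(np.linalg.norm(activations - reconstruction, axis=1))
print(f"mean reconstruction error with {k} discrete features: {err:.3f}")
# More dictionary elements -> smaller error: discrete features tiling a manifold.
```

The point of the toy: a continuous feature gets chopped into many near-duplicate "features" by the dictionary, which is exactly what the manifold hypothesis predicts you'd see.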
I’m curious to learn more about how LLMs work too.
You may need some intermediate knowledge of linear algebra, plus this thing called "data science" nowadays, which is pretty much knowing how to wrangle data and visualize it.
Try creating a small model on your own; it doesn't have to be super fancy, just make sure it does something you want it to do. After that... you'll probably be able to carry on by yourself.
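As a concrete starting point, here's about the smallest "model" that still does something: a character-level bigram language model trained by counting (numpy only; the corpus is just a placeholder).

```python
# A tiny character-level bigram language model: count transitions, then sample.
import numpy as np

text = "the quick brown fox jumps over the lazy dog " * 50  # stand-in corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}

counts = np.ones((len(chars), len(chars)))  # add-one smoothing
for a, b in zip(text, text[1:]):
    counts[stoi[a], stoi[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# Generate text by repeatedly sampling the next character
rng = np.random.default_rng(0)
idx = stoi["t"]
out = ["t"]
for _ in range(80):
    idx = rng.choice(len(chars), p=probs[idx])
    out.append(chars[idx])
print("".join(out))
```

It's deliberately trivial, but it already forces you through tokenization, probability tables, and sampling, which is most of the vocabulary you need before tackling real LLMs.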