Gosh. Can somebody help me understand how an LLM has achieved this capability?

I had thought that an LLM was essentially only doing completions under the hood, using the statistical likelihood of next words, wrapped in some lexical sugar/clever prompt mods. But evidently far more is going on here, since in OP’s example some of its output (e.g. future time stamps) will not be present in its training data. Even with several billion parameters that seems impossible to me. (Clearly not!)

Could somebody join the dots for me: what kind of framework is the LLM embedded in that lets it emulate logic and generate unique content?

> Can somebody help me understand how an LLM has achieved this capability?

It's worth clarifying what is being accomplished here. iOS is handling speech recognition, and Shortcuts is handling task execution with an unspecified and presumably long user script. What GPT does here is convert text instructions into JSON-formatted slot-filling[1] responses.

It's somewhat amazing that GPT is emitting valid JSON, but I guess it's seen enough JSON in the training set to pick up the grammar, and we shouldn't be too surprised it can learn a small formal grammar (JSON's is context-free rather than regular, because of the nesting) if it can learn multiple human languages. Slot filling is a well-studied topic, and with such a limited vocabulary of slots, it has fewer ways to go wrong than a commercial voice assistant. I would be way more amazed if this were able to generate Shortcuts code directly, but I don't think that's allowed.
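To make the slot-filling idea concrete, here is a minimal sketch of what the receiving side might do with the model's reply. The slot names, the example reply, and the `parse_slots` helper are all made up for illustration; the actual Shortcuts script in the post isn't shown, so none of this is its real schema.

```python
import json

# Hypothetical slot vocabulary; the real one in the post is unknown.
ALLOWED_SLOTS = {"action", "device", "time"}

def parse_slots(reply: str) -> dict:
    """Parse a model reply as JSON and reject anything outside the slot vocabulary."""
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - ALLOWED_SLOTS
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    return data

# Made-up reply for an instruction like "turn on the porch light at 7pm".
reply = '{"action": "turn_on", "device": "porch light", "time": "19:00"}'
slots = parse_slots(reply)
print(slots["action"], slots["device"], slots["time"])
```

The point is only that the model's job reduces to filling a handful of named fields; everything after that is ordinary parsing and dispatch.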

> some of its output (e.g. future time stamps) will not be present in its training data. Even with several billion parameters that seems impossible

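Maybe this is a feature of attention, which lets the prediction for each token depend on every earlier token in the context, combined with subword token segmentation[2], which splits a date into smaller pieces? A time stamp that never appeared in the training data could then still be assembled from digit and separator pieces that did.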
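As a rough illustration of the segmentation half of that guess, here is a toy greedy longest-match splitter. It is not how SentencePiece or GPT's tokenizer actually works (those learn their vocabularies from data), and the vocabulary below is invented; it only shows that a string never seen whole can be covered by pieces that were.

```python
# Toy vocabulary of pieces assumed to have been seen in training (purely illustrative).
VOCAB = {"20", "19", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "-", ":", "T"}

def tokenize(text, vocab=VOCAB):
    """Split text into known pieces, always taking the longest match first."""
    max_len = max(len(piece) for piece in vocab)
    pieces, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                pieces.append(piece)
                i += size
                break
        else:
            # No piece of any length matched at this position.
            raise ValueError(f"no known piece at position {i}")
    return pieces

# A future time stamp that would not appear verbatim in any training set.
tokens = tokenize("2031-07-04T09:30")
print(tokens)
```

Every piece in the result comes from the small vocabulary, yet joined together they spell a date the "training data" never contained.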

[1]: http://nlpprogress.com/english/intent_detection_slot_filling...

[2]: https://github.com/google/sentencepiece