> Can somebody help me to understand how a LLM has achieved this capability.

It's worth clarifying what is being accomplished here. iOS is handling speech recognition, and Shortcuts is handling task execution with an unspecified and presumably long user script. What GPT does here is convert text instructions into JSON formatted slot filling[1] responses.

It's somewhat amazing that GPT is emitting valid JSON, but I guess it's seen enough JSON in the training set to understand the grammar, and we shouldn't be too surprised it can learn regular grammars if it can learn multiple human languages. Slot filling is a well studied topic, and with the very limited vocabulary of slots, it doesn't have as many options to go wrong as commercial voice assistants. I would be way more amazed if this were able to generate Shortcuts code directly, but I don't think that's allowed.

> some of its output (eg future time stamps) will not be present in its training data. Even with several billion parameters that seems impossible

Maybe this is a feature of attention, which lets each token look back to modify its own prediction, and special token segmentation[2] for dates?

[1]: http://nlpprogress.com/english/intent_detection_slot_filling... [2]: https://github.com/google/sentencepiece