This looks neat. Is this a Python project that does the same thing? https://github.com/HIPS/autograd

Autograd (and most current approaches) works by wrapping the data in a special object and overloading its methods so that, instead of immediately executing an operation, they record it in a graph of transformations. When you need the gradient, the chain rule is applied over that graph. Loops and control flow are supported because the graph is destroyed and recreated on every call, which is not optimal for performance but makes the system very dynamic (TensorFlow eager/PyTorch vs. the TensorFlow graph interface).
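To make that concrete, here is a minimal sketch of the tape/graph idea (written in Julia, since that's where the rest of this comment is headed; `Tracked`, `TAPE` and `grad` are illustrative names, not autograd's or any package's actual API):

```julia
# Minimal tape-based reverse mode: overloaded operators record their
# backward rule on a tape instead of only computing values.
const TAPE = Function[]          # backward closures, in forward execution order

mutable struct Tracked
    value::Float64
    grad::Float64
end
Tracked(v) = Tracked(v, 0.0)

function Base.:*(a::Tracked, b::Tracked)
    out = Tracked(a.value * b.value)
    push!(TAPE, () -> begin      # chain rule for multiplication
        a.grad += b.value * out.grad
        b.grad += a.value * out.grad
    end)
    return out
end

function Base.:+(a::Tracked, b::Tracked)
    out = Tracked(a.value + b.value)
    push!(TAPE, () -> begin      # chain rule for addition
        a.grad += out.grad
        b.grad += out.grad
    end)
    return out
end

function grad(f, x::Real)
    empty!(TAPE)                 # the graph/tape is rebuilt on every call
    tx = Tracked(x)
    y = f(tx)                    # forward pass records the operations
    y.grad = 1.0
    foreach(step -> step(), Iterators.reverse(TAPE))   # apply the chain rule backwards
    return tx.grad
end

grad(x -> x * x + x, 2.0)        # d/dx (x² + x) at x = 2 → 5.0
```

Because the tape only ever contains the operations that actually ran, any control flow inside `f` works for free, at the cost of re-recording the graph on every call.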

That's also an approach at which Julia excels, thanks to multiple dispatch; see [1] for an explanation.
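For instance, a forward-mode variant in the spirit of [1] is just a dual number plus a couple of extra methods added through multiple dispatch (again, illustrative names, not the diff-zoo or ForwardDiff API):

```julia
# Forward mode via dual numbers: each value carries its derivative along,
# and multiple dispatch lets the rules live as ordinary methods on Base operators.
struct Dual
    val::Float64   # the value
    der::Float64   # its derivative with respect to the input
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

derivative(f, x) = f(Dual(x, 1.0)).der   # seed the input's derivative with 1

derivative(x -> x * x + x, 2.0)          # 5.0
```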

In that case you effectively have two separate languages: the language used to generate the graph, and the graph itself. This approach instead applies the transformation directly to the Julia IR, generating the gradient code as if you had written it by hand in Julia, side by side with code that is completely unaware of the transformation (which is why it can differentiate libraries built before the approach even existed). So the end product is similar to a TensorFlow graph (all control flow is already embedded and it can be pre-optimized by the compiler), but it is even easier to write than TensorFlow eager (which is also the intent of Swift for TensorFlow).
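Assuming the approach being described is the one implemented by Zygote.jl (by the author of [1]), using it looks like calling `gradient` on a plain Julia function; the loop below ends up embedded in the generated gradient code rather than being re-traced on each call:

```julia
using Zygote   # assumes the source-to-source AD described above is Zygote.jl

# An ordinary Julia function with control flow, written with no awareness of AD.
function mypow(x, n)
    r = one(x)
    for _ in 1:n
        r *= x
    end
    return r
end

# The gradient is produced by transforming the function's IR, so the loop is
# part of the compiled gradient code rather than something rebuilt per call.
gradient(x -> mypow(x, 3), 2.0)   # (12.0,)
```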

[1] https://github.com/MikeInnes/diff-zoo