This looks neat. Is https://github.com/HIPS/autograd a project that does the same thing in Python?
That's also an approach at which Julia excels because of multiple dispatch, which you can see explained in [1].
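To make the dispatch point concrete, here is a minimal, hypothetical sketch (a toy type, not any real package): a forward-mode dual number defined in a handful of lines, which existing AD-unaware Julia code picks up automatically because operations dispatch on the argument type.

```julia
# Toy forward-mode "dual number": carries a value and a derivative.
struct Dual
    val::Float64   # value
    der::Float64   # derivative
end

# Overload a few primitives for Dual; any generic Julia function built
# from them now propagates derivatives, thanks to multiple dispatch.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# A function written with no knowledge of Dual or AD.
f(x) = x * sin(x) + x

# Seed der = 1 to get df/dx at x = 2.0.
println(f(Dual(2.0, 1.0)).der)   # ≈ sin(2) + 2cos(2) + 1
```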
With that kind of framework you effectively have two separate languages: the language used to generate the graph, and the graph itself. The Julia approach instead applies the transformation directly to the Julia IR, generating the gradient code as if you had written it by hand, side by side with code that is completely unaware of the transformation (which is why it can differentiate libraries built before this approach even existed). So the end product is something similar to a TensorFlow graph (all control flow is already embedded and it can be pre-optimized by a compiler), but it's even easier to write than TensorFlow eager (which is also the intent of Swift for TensorFlow).
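As a rough illustration of that source-to-source side, here is a minimal sketch assuming the Zygote.jl package (one implementation of this IR-level approach): the loop below knows nothing about AD, yet `gradient` differentiates it, control flow and all.

```julia
using Zygote

# Ordinary Julia code, written with no awareness of AD and no tracing tape.
function mysum(x)
    s = zero(x)
    for i in 1:5        # plain control flow, differentiated as written
        s += x^i
    end
    return s
end

# d/dx (x + x^2 + ... + x^5) at x = 2.0
println(gradient(mysum, 2.0)[1])   # 1 + 2*2 + 3*4 + 4*8 + 5*16 = 129
```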