What does HackerNews think of diff-zoo?

Differentiation for Hackers

Language: Julia

For an example in Julia, see Mike Innes' tutorial: https://github.com/MikeInnes/diff-zoo

I'm only a beginner in Julia and not an AD expert, but I went through the exercise of porting this to Python and found it very enlightening.

This is an obscure one, but Mike Innes' "[automatic] differentiation for hackers" tutorial. It's a code tutorial, not software, if that counts. What stands out is both the way it's constructed and the functionality of Julia that gets shown off here.

https://github.com/MikeInnes/diff-zoo

Some more related info on different algorithmic differentiation approaches in Julia: https://github.com/MikeInnes/diff-zoo
Definitely, since Julia has the same approach as Python of allowing quick and dirty solutions for data analysis/modelling, but with a much larger scope even when you don't have complete library support. In a few hours you can make a functional PyTorch clone (and just by using special GPU arrays you get it running on GPUs) with similar performance [1], and within a day (given a very good understanding of the language) a method that compiles the gradient directly from unmodified Julia code [2]. Plus native MATLAB-like goodness such as multi-dimensional arrays, so you don't need a separate library for fast operations and you can just use normal loops or whatever you want.

But while Julia targets Python's "fast and concise" niche (without compromising speed or power), it does not target the "slower but more correct" one (though there is a culture of testing, which is quite important for math-oriented problems, since the type system will not catch the most subtle and troublesome bugs). There is space for one language for exploratory/research work that can be quickly deployed in a fast iterative cycle, and another for the next Spark/Flink or for critical production areas that need the extra effort (like self-driving cars), which could be Rust (or Scala, or Haskell, or Swift, or staying with C++/Fortran).

[1] https://github.com/MikeInnes/diff-zoo

[2] http://blog.rogerluo.me/2019/07/27/yassad/

Autograd (and most current approaches) works by having a special object wrap the data, with overloaded methods that, instead of immediately executing an operation, record it in a graph of transformations. Then, when you need the gradient, it applies the chain rule over this graph. Support for loops/control flow is possible because the graph is destroyed and recreated at each call, which is not optimal for performance but makes it very dynamic (TensorFlow eager/PyTorch vs. the TensorFlow graph interface).

That's also an approach at which Julia excels because of multiple dispatch, as explained in [1].
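
To make that concrete, here is a minimal sketch of the tracing idea in Julia. The names (Tracked, Tape, gradient) are made up for illustration; this is not Autograd's or diff-zoo's actual code, just the general pattern: overloaded methods, selected by multiple dispatch, record each operation onto a tape, and the gradient is obtained by replaying the tape in reverse with the chain rule.

    # Illustrative sketch only: hypothetical names, not Autograd's or diff-zoo's code.
    # Each Tracked number carries a value, an adjoint slot, and the tape it belongs
    # to; the tape stores closures that propagate adjoints backwards.
    struct Tape
        ops::Vector{Any}
    end

    mutable struct Tracked
        value::Float64
        grad::Float64
        tape::Tape
    end

    track(x::Float64, tape::Tape) = Tracked(x, 0.0, tape)

    # Multiple dispatch picks these methods whenever a Tracked value is involved.
    function Base.:*(a::Tracked, b::Tracked)
        out = Tracked(a.value * b.value, 0.0, a.tape)
        push!(a.tape.ops, () -> begin
            a.grad += b.value * out.grad   # d(a*b)/da = b
            b.grad += a.value * out.grad   # d(a*b)/db = a
        end)
        return out
    end

    function Base.:+(a::Tracked, b::Tracked)
        out = Tracked(a.value + b.value, 0.0, a.tape)
        push!(a.tape.ops, () -> begin
            a.grad += out.grad
            b.grad += out.grad
        end)
        return out
    end

    function Base.sin(a::Tracked)
        out = Tracked(sin(a.value), 0.0, a.tape)
        push!(a.tape.ops, () -> a.grad += cos(a.value) * out.grad)
        return out
    end

    # A fresh tape is built on every call (the "destroy and recreate the graph"
    # part), then replayed in reverse to apply the chain rule.
    function gradient(f, x::Float64)
        tape = Tape(Any[])
        tx = track(x, tape)
        y = f(tx)
        y.grad = 1.0
        foreach(op -> op(), reverse(tape.ops))
        return tx.grad
    end

    f(x) = x * x + sin(x)
    @show gradient(f, 2.0)   # 2*2 + cos(2) ≈ 3.584

Because the overloads are ordinary Julia methods, any code that happens to call *, + or sin on these values gets traced, which is where multiple dispatch does the work.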

In that case you effectively have two separate languages: the language used to generate the graph, and the graph itself. This approach instead applies the transformation directly on the Julia IR to generate the gradient code, as if you had written it by hand side by side with code that is completely unaware of the transformation (hence the ability to differentiate libraries that were built before this approach even existed). So the end product is something similar to the TensorFlow graph (it has all the control flow already embedded and can be pre-optimized by a compiler), but even easier to write than TensorFlow eager (which is also the intent of Swift for TensorFlow).
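
As a hand-written illustration of what such a source-to-source transform conceptually emits (this is not actual compiler output, and g_and_pullback is a made-up name), the generated code is ordinary Julia that returns the value together with a pullback, with the original control flow baked in:

    # Hand-written illustration, not actual compiler output.
    # Original, AD-unaware code:
    g(x) = x <= 0 ? x * x : sin(x) * x

    # Roughly what a source-to-source transform would generate for it:
    function g_and_pullback(x)
        if x <= 0
            y = x * x
            back = dy -> dy * 2x                      # derivative of the branch that ran
        else
            y = sin(x) * x
            back = dy -> dy * (cos(x) * x + sin(x))   # product rule on this branch
        end
        return y, back
    end

    y, back = g_and_pullback(1.0)
    @show back(1.0)   # cos(1) * 1 + sin(1)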

[1] https://github.com/MikeInnes/diff-zoo

That's essentially what a source-to-source AD does, just with support for the extra features that show up in programming languages. For example, handling variable bindings gets you the typical Wengert list, and handling function calls gets you the Pearlmutter and Siskind style backpropagator (I wrote a bit about the relationships at [0]).
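
For instance, a Wengert list is just the program rewritten as a sequence of single-operation assignments; reverse-mode AD then walks that list backwards accumulating adjoints. A hand-written Julia example of what that looks like for x^2 + sin(x) (a source-to-source transform would produce this mechanically):

    # Hand-written illustration of a Wengert list and its reverse pass.
    function wengert_example(x)
        # forward pass: the Wengert list, one primitive operation per line
        w1 = x * x
        w2 = sin(x)
        w3 = w1 + w2

        # reverse pass: adjoints accumulated in the opposite order
        dw3 = 1.0
        dw1 = dw3                      # ∂w3/∂w1 = 1
        dw2 = dw3                      # ∂w3/∂w2 = 1
        dx  = dw1 * 2x + dw2 * cos(x)  # chain rule back to the input
        return w3, dx
    end

    @show wengert_example(2.0)   # value and derivative of x^2 + sin(x) at x = 2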

The short answer is that CAS systems work with a "programming language" that doesn't have these features and is therefore a bit too limited for the kinds of models we're interested in.
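
For contrast, a tiny CAS-style symbolic differentiator over Julia Expr trees (a hypothetical sketch, not any library's actual code) shows what that more limited "programming language" looks like: it only understands + and * on closed-form expressions, with no notion of variable bindings, control flow or function calls, which is exactly the limitation being described.

    # Hypothetical sketch, not any library's actual code.
    # Differentiate an expression tree with respect to a symbol.
    derive(ex::Symbol, wrt::Symbol) = ex == wrt ? 1 : 0
    derive(ex::Number, wrt::Symbol) = 0

    function derive(ex::Expr, wrt::Symbol)
        @assert ex.head == :call "only simple call expressions are handled"
        op, a, b = ex.args
        if op == :+
            return :($(derive(a, wrt)) + $(derive(b, wrt)))
        elseif op == :*
            # product rule
            return :($(derive(a, wrt)) * $b + $a * $(derive(b, wrt)))
        else
            error("unsupported operation: $op")
        end
    end

    @show derive(:(x * x + 3 * x), :x)
    # an unsimplified but correct expression for the derivative, 2x + 3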

[0] https://github.com/MikeInnes/diff-zoo