The title seems pretty cool!

Would anyone care to tell me whether there is any motivation for an ordinary ML engineer such as myself to learn this topic? The ideas presented in this paper seem really helpful for people who develop DL frameworks like pytorch, but what about those who only use the frameworks?

However, regardless of how useful these ideas are for me, I really respect researchers who publish excellent papers.

AD is important for ML practitioners to understand in the same way that compilers are important for programmers to understand. You can get away without knowing all the details, but it helps to understand where your gradients come from. However, this paper is probably not a good place to start if you're new to AD. If you want a better introduction, here are a few good resources:

Autodidact is a pedagogical implementation of AD: https://github.com/mattjj/autodidact

A nice literature review from JMLR: http://www.jmlr.org/papers/volume18/17-468/17-468.pdf
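To make the "where your gradients come from" point concrete, here's a toy reverse-mode sketch in the spirit of what Autodidact implements. This is my own illustration, not code from Autodidact or from the paper; the `Var` class and its methods are made up for the example:

```python
# Toy reverse-mode AD: each Var remembers its parents and the local derivative
# with respect to each parent, so gradients can be pushed back via the chain rule.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # sequence of (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value, [(self, other.value), (other, self.value)])

    def backward(self, upstream=1.0):
        # Accumulate d(output)/d(self), then propagate to parents.
        self.grad += upstream
        for parent, local_grad in self.parents:
            parent.backward(upstream * local_grad)

x = Var(2.0)
y = Var(3.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0  (dz/dx = y + 1 = 4, dz/dy = x = 2)
```

Calling backward on the output walks back through the graph, multiplying local derivatives along the way, which is essentially what loss.backward() does in pytorch, just with tensors and far more bookkeeping.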

This paper reinterprets AD through the lens of category theory, an abstraction for modeling a wide class of problems in math and CS. It provides a language to describe these problems in a simple and powerful way, and is the foundation for a lot of work in functional programming (if you're interested in that kind of stuff). There was a thread on HN recently that discusses why category theory is useful: https://news.ycombinator.com/item?id=18267536
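To give a rough flavor of that compositional view (again, just my own toy illustration, not the paper's construction, which is done properly with categories in Haskell): pair each function with its derivative and define composition so that the chain rule is applied automatically.

```python
# Sketch: a "function together with its derivative" object whose composition
# bakes in the chain rule. The class D and the use of @ for composition are
# invented for this example.

from dataclasses import dataclass
from typing import Callable
import math

@dataclass
class D:
    """A function R -> R paired with its derivative R -> R."""
    f: Callable[[float], float]
    df: Callable[[float], float]

    def __matmul__(self, other):
        # (self after other): derivative by the chain rule, no symbolic rules needed.
        return D(lambda x: self.f(other.f(x)),
                 lambda x: self.df(other.f(x)) * other.df(x))

sin = D(math.sin, math.cos)
square = D(lambda x: x * x, lambda x: 2 * x)

h = sin @ square             # h(x) = sin(x^2)
print(h.f(1.0), h.df(1.0))   # sin(1) and cos(1) * 2
```

The point of the categorical treatment is that this "compose functions and their derivatives together" structure satisfies the same laws as ordinary function composition, which is what lets the paper derive forward- and reverse-mode AD from one general recipe.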

"Category Theory for the Working Hacker" by Philip Wadler is a great talk if you're interested in learning more: https://www.youtube.com/watch?v=gui_SE8rJUM

I'd also recommend checking out Bartosz Milewski's "Category Theory for Programmers": https://github.com/hmemcpy/milewski-ctfp-pdf