What does HackerNews think of Distributions.jl?

A Julia package for probability distributions and associated functions.

Language: Julia

If you look at Julia open source projects you'll see that they tend to have a lot more contributors than their Python counterparts, even over shorter time spans. A package for defining statistical distributions has had 202 contributors (https://github.com/JuliaStats/Distributions.jl), and Julia Base has had over 1,300 contributors (https://github.com/JuliaLang/julia), which is quite a lot for a core language; that's mostly because the majority of the core is written in Julia itself.

This is one of the things that was noted quite a bit at this SIAM CSE conference: Julia development tends to have a lot more code reuse than other ecosystems like Python. For example, the machine learning libraries Flux.jl and Lux.jl share a lot of layer intrinsics in NNlib.jl (https://github.com/FluxML/NNlib.jl), the same GPU libraries (https://github.com/JuliaGPU/CUDA.jl), the same automatic differentiation library (https://github.com/FluxML/Zygote.jl), and of course the same JIT compiler (Julia itself). The two libraries are far enough apart that people say "Flux is to PyTorch as Lux is to JAX/Flax", but while in the Python world those frameworks share almost no code or implementation, in the Julia world they share >90% of the core internals while exposing different higher-level APIs.
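To make the reuse concrete, here is a minimal sketch of calling those shared primitives directly (assuming NNlib.jl's `conv` and `relu`, which both frameworks' convolution layers ultimately dispatch to; the array shapes are just illustrative):

```julia
using NNlib  # shared layer primitives behind both Flux.jl and Lux.jl

# A batch of one 8×8 3-channel "image" in WHCN (width, height, channels, batch) layout
x = randn(Float32, 8, 8, 3, 1)

# A 3×3 convolution kernel mapping 3 input channels to 16 output channels
w = randn(Float32, 3, 3, 3, 16)

y = NNlib.conv(x, w)    # the convolution kernel both frameworks reuse
z = NNlib.relu.(y)      # elementwise activation, also from NNlib
size(z)                 # (6, 6, 16, 1) with the default "valid" padding
```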

If one hasn't participated in this space it's a bit hard to fathom how much code reuse goes on and how that is influenced by the design of multiple dispatch. This is one of the reasons there is so much cohesion in the community: it doesn't matter if one person is an ecologist and the other is a financial engineer, you may both be contributing to the same library like Distances.jl, just adding a distance function which is then used in thousands of places (see the sketch below). In the Python ecosystem you tend to have a lot more "megapackages", PyTorch, SciPy, etc., where the barrier to entry is generally a lot higher (and sometimes requires handling the build systems, fun times). But in the Julia ecosystem a lot of core development happens in "small" but central libraries, like Distances.jl or Distributions.jl, which are simple enough for an undergrad to get productive in within a week but are then used everywhere (Distributions.jl, for example, is used in every statistics package and for defining prior distributions in Turing.jl's probabilistic programming language). I had never seen anything like that before in the R or Python space; by comparison, Python almost feels like it's all solo projects.
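As a rough sketch of what "just adding a distance function" can look like with multiple dispatch (the `WeightedManhattan` type below is hypothetical, and older Distances.jl versions define `Distances.evaluate(d, a, b)` instead of making the metric callable):

```julia
using Distances

# Hypothetical user-defined metric: a weighted L1 distance.
struct WeightedManhattan{W<:AbstractVector} <: Distances.Metric
    weights::W
end

# One method definition is all that's needed; any package that accepts a
# `Distances.Metric` (clustering, nearest-neighbour trees, ...) can now
# dispatch on this metric without ever having heard of it.
(d::WeightedManhattan)(a, b) = sum(d.weights .* abs.(a .- b))

d = WeightedManhattan([1.0, 2.0, 0.5])
d([1.0, 2.0, 3.0], [2.0, 0.0, 3.0])   # => 5.0
```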

Turing.jl is in an interesting spot because it is essentially a DSL-free probabilistic programming language. While it technically has a DSL of sorts given by the `@model` macro, anything that is AD-compatible can be used inside this macro, and since Julia's AD tools work on code written in plain Julia, you can throw code from other Julia packages into Turing and expect it to work with Hamiltonian Monte Carlo and all of that. So things like DifferentialEquations.jl ODEs/SDEs/DAEs/DDEs/etc. work quite well with this (see the sketch below), along with other "weird things for a probabilistic programming language to support" like nonlinear solving (via NLsolve.jl) or optimization (via Optim.jl, and I mean doing Bayesian inference where a value is defined as the result of an optimization). If you are using derivative-free inference methods, like particle sampling methods or variants of Metropolis-Hastings, then you can throw in pretty much any existing Julia code you had as a nonlinear function and do inference around it.
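For instance, here is roughly what embedding an ODE solve inside a Turing model looks like, in the style of the Turing.jl Bayesian differential equations tutorial (a minimal sketch; the model, priors, and parameter names are illustrative):

```julia
using Turing, DifferentialEquations, LinearAlgebra

# Lotka-Volterra dynamics: the simulator we want to do inference around.
function lotka_volterra!(du, u, p, t)
    α, β, γ, δ = p
    du[1] =  α * u[1] - β * u[1] * u[2]
    du[2] = -γ * u[2] + δ * u[1] * u[2]
end

prob = ODEProblem(lotka_volterra!, [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0])

@model function fit_lv(data, prob)
    # Priors come straight from Distributions.jl
    σ ~ InverseGamma(2, 3)
    α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
    β ~ truncated(Normal(1.2, 0.5), 0.0, 2.0)
    γ ~ truncated(Normal(3.0, 0.5), 1.0, 4.0)
    δ ~ truncated(Normal(1.0, 0.5), 0.0, 2.0)

    # AD propagates straight through the ODE solver call
    predicted = solve(prob, Tsit5(); p = [α, β, γ, δ], saveat = 0.1)

    # Likelihood of the observed trajectory
    for i in 1:length(predicted)
        data[:, i] ~ MvNormal(predicted[i], σ^2 * I)
    end
end

# chain = sample(fit_lv(data, prob), NUTS(0.65), 1000)  # HMC/NUTS over the ODE
```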

So while it's in some sense similar to PyMC3 or Stan, there's a huge difference in the effective functionality that you get by supporting language-wide infrastructure versus the more traditional method of adding and documenting features one by one. While PyMC3 ran a Google Summer of Code project to get some ODE support (https://docs.pymc.io/notebooks/ODE_API_introduction.html) and Stan has 2 built-in ODE solver methods you're allowed to use (https://mc-stan.org/docs/2_19/stan-users-guide/ode-solver-ch...), with Julia you get all of DifferentialEquations.jl just because it exists (https://docs.sciml.ai/latest/). This means that Turing.jl doesn't have to document most of its features; they exist purely through composability.

That's quite different from a "top-down" approach to library support. It also explains why Turing has been able to develop so fast: its developer community isn't just "the people who work on Turing", it's pretty much the whole Julia ecosystem. Its distributions are defined by Distributions.jl (https://github.com/JuliaStats/Distributions.jl), its parallelism comes from Julia's base parallelism work plus everything around it like CuArrays.jl and KernelAbstractions.jl (https://github.com/JuliaGPU/KernelAbstractions.jl), its derivatives come from four different AD libraries, its ODE solvers come from DifferentialEquations.jl, and the list keeps going.

So, bringing it back to deep learning: Turing currently has 4 modes for automatic differentiation (https://turing.ml/dev/docs/using-turing/autodiff), and thus supports any library that's compatible with those (see the sketch below). It turns out that Flux.jl is compatible with them, so Turing.jl can do Bayesian deep learning. In that sense it's like Edward or Pyro, but supporting "anything that AD's with Julia AD packages" (which will soon allow multi-AD overloads via ChainRules.jl) instead of "anything on TensorFlow graphs" or "anything compatible with PyTorch".
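As a sketch, switching between those AD backends is a one-liner (the backend symbols follow the autodiff docs linked above; exact names can vary between Turing versions):

```julia
using Turing

# Select the AD backend used by gradient-based samplers such as HMC/NUTS.
Turing.setadbackend(:forwarddiff)   # ForwardDiff.jl, forward mode (the default)
Turing.setadbackend(:tracker)       # Tracker.jl, reverse mode
Turing.setadbackend(:zygote)        # Zygote.jl, source-to-source reverse mode
Turing.setadbackend(:reversediff)   # ReverseDiff.jl, tape-based reverse mode
```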

As for performance and robustness, I mentioned in a SciML ecosystem release today that our benchmarks pretty clearly show Turing.jl as being more robust than Stan while achieving about a 3x-5x speedup in ODE parameter estimation (https://sciml.ai/2020/05/09/ModelDiscovery.html). However, that's utilizing the fact that Turing.jl's composability with packages gives it top-notch ODE support (I want to work with the Stan developers so we can use our differential equation library with their samplers to better isolate differences and hopefully improve both PPLs, but for now we have what we have). If you isolate it down to just "Turing.jl itself", it has wins and losses against Stan (https://github.com/TuringLang/Turing.jl/wiki). That said, there are some benchmarks indicating that the ReverseDiff AD backend gives about two orders of magnitude of performance increase in many situations (https://github.com/TuringLang/Turing.jl/issues/1140, note that ThArrays is benchmarking PyTorch AD here), which would probably tip the scales in Turing's favor. As for benchmarking against Pyro or Edward, it would probably just come down to benchmarking the AD implementations.

I feel like people are perhaps a bit overly negative about the state of Julia's packages. Compared to other early-stage languages, I'd say our package ecosystem is vibrant and full of fantastic packages. Naturally they are perhaps more math/science focused, so if you are looking for cutting-edge packages for web dev you won't find them yet (although there are packages!).

See http://pkg.julialang.org/pulse.html, for example. We have over 470 packages in total that are registered, and on Julia 0.3 we have over 300 packages with tests that pass - and we run the tests in all registered packages every night.

Some of my favorite packages (that I didn't make, of course :D) would include

https://github.com/JuliaStats/Distributions.jl

https://github.com/JuliaStats/StatsBase.jl

https://github.com/pluskid/Mocha.jl (deep learning)

https://github.com/stevengj/PyCall.jl

The JuliaOpt stack of optimization packages (http://juliaopt.org)

and then you get fun new ones like https://github.com/anthonyclays/RomanNumerals.jl