Can someone informed offer a comparison/contrast with other tools at the intersection of probabilistic programming and deep learning? What are the relative strengths and weaknesses vs. Edward or Pyro?
So while it's in some sense similar to PyMC3 or Stan, there's a huge difference in the effective functionality you get by supporting a language-wide infrastructure vs. the more traditional method of adding and documenting features one by one. So while PyMC3 ran a Google Summer of Code project to get some ODE support (https://docs.pymc.io/notebooks/ODE_API_introduction.html) and Stan has 2 built-in methods you're allowed to use (https://mc-stan.org/docs/2_19/stan-users-guide/ode-solver-ch...), with Julia you get all of DifferentialEquations.jl just because it exists (https://docs.sciml.ai/latest/). This means that Turing.jl doesn't have to document most of its features: they exist through composability.
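To make that concrete, here's a rough sketch of what that composability looks like, using the Turing.jl and DifferentialEquations.jl APIs roughly as of this writing (the Lotka-Volterra model, priors, and noise level are illustrative choices of mine, not taken from any of the linked docs):

```julia
using Turing, DifferentialEquations

# Ordinary Lotka-Volterra dynamics, written with DifferentialEquations.jl
# exactly as you would outside of any PPL.
function lotka_volterra!(du, u, p, t)
    α, β, γ, δ = p
    du[1] = α * u[1] - β * u[1] * u[2]
    du[2] = δ * u[1] * u[2] - γ * u[2]
end

prob = ODEProblem(lotka_volterra!, [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0])

# Noisy synthetic data from the "true" parameters.
sol = solve(prob, Tsit5(), saveat = 0.1)
data = Array(sol) .+ 0.5 .* randn(size(Array(sol)))

# The Turing model just calls `solve` with the sampled parameters --
# no Turing-specific ODE feature is involved anywhere.
@model function fit_lv(data)
    σ ~ InverseGamma(2, 3)
    α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
    β ~ truncated(Normal(1.2, 0.5), 0.0, 2.0)
    γ ~ truncated(Normal(3.0, 0.5), 1.0, 4.0)
    δ ~ truncated(Normal(1.0, 0.5), 0.0, 2.0)

    predicted = solve(prob, Tsit5(); p = [α, β, γ, δ], saveat = 0.1)
    for i in 1:length(predicted)
        data[:, i] ~ MvNormal(predicted[i], σ)
    end
end

chain = sample(fit_lv(data), NUTS(0.65), 1000)
```

The point is that `solve` is just a normal Julia function call inside the model, so anything DifferentialEquations.jl can solve (stiff solvers, SDEs, DDEs, ...) is automatically available to Turing without Turing adding or documenting it.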
That's quite different from a "top down" approach to library support. It also explains why Turing has been able to develop so fast: its developer community isn't just "the people who work on Turing" but pretty much the whole Julia ecosystem. Its distributions are defined by Distributions.jl (https://github.com/JuliaStats/Distributions.jl), its parallelism is given by Julia's base parallelism work plus everything around it like CuArrays.jl and KernelAbstractions.jl (https://github.com/JuliaGPU/KernelAbstractions.jl), derivatives come from 4 libraries, ODEs from DifferentialEquations.jl, and the list keeps going.
So, bringing it back to deep learning: Turing currently has 4 modes for automatic differentiation (https://turing.ml/dev/docs/using-turing/autodiff), and thus supports any library compatible with those. It turns out that Flux.jl is compatible with them, so Turing.jl can do Bayesian deep learning. In that sense it's like Edward or Pyro, but supporting "anything that ADs with Julia AD packages" (which will soon allow multi-AD overloads via ChainRules.jl) instead of "anything on TensorFlow graphs" or "anything compatible with PyTorch".
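For concreteness, switching AD modes is a global one-liner and models don't change at all (a minimal sketch; the backend symbols are the ones listed in the autodiff docs above, and the toy model is mine):

```julia
using Turing

# Select the global AD backend; every model then differentiates through it.
Turing.setadbackend(:forwarddiff)    # the default; good for low-dimensional models
# Turing.setadbackend(:reversediff)  # often faster when there are many parameters

@model function demo(x)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1), 0, Inf)
    for i in eachindex(x)
        x[i] ~ Normal(μ, σ)
    end
end

chain = sample(demo(randn(100)), NUTS(0.65), 500)
```

Because the backend choice is orthogonal to the model definition, any library whose code one of these AD packages can differentiate (Flux.jl included) works inside a Turing model for free.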
As for performance and robustness, I mentioned in a SciML ecosystem release today that our benchmarks pretty clearly show Turing.jl being more robust than Stan while achieving about a 3x-5x speedup in ODE parameter estimation (https://sciml.ai/2020/05/09/ModelDiscovery.html). However, that result leans on the fact that Turing.jl's composability with packages gives it top-notch ODE support (I want to work with the Stan developers so we can use our differential equation library with their samplers to better isolate differences and hopefully improve both PPLs, but for now we have what we have). If you isolate it down to just "Turing.jl itself", it has wins and losses against Stan (https://github.com/TuringLang/Turing.jl/wiki). That said, there are some benchmarks indicating that the ReverseDiff AD backend gives about 2 orders of magnitude of performance increase in many situations (https://github.com/TuringLang/Turing.jl/issues/1140; note that ThArrays is benchmarking PyTorch AD there), which would then probably tip the scales in Turing's favor. As for benchmarking against Pyro or Edward, it would probably just come down to benchmarking the AD implementations.