What does HackerNews think of veneur?

A distributed, fault-tolerant pipeline for observability data

Language: Go

This was the idea behind Stripe's Veneur project - spans, logs, and metrics all in the same format, "automatically" rolling up cardinality as needed. When I saw a talk about it a few years ago, I thought it was cool, but also that it would be very hard to get non-SRE developers on board with it.

https://github.com/stripe/veneur

One pain point with Prometheus is that it has relatively weak support for quantiles, histograms, and sets[1]:

- Histograms require manually specifying the distribution of your data, which is time-consuming, lossy, and can introduce significant error bands around your quantile estimates.

- Quantiles calculated via the Prometheus "summary" feature are specific to a given host, and not aggregatable, which is almost never what you want (you normally want to see e.g. the 95th percentile value of request latency for all servers of a given type, or all servers within a region). Quantiles can be calculated from histograms instead, but that requires a well-specified histogram and can be expensive at query time.

- As far as I know, Prometheus doesn't have any explicit support for unique sets. You can compute this at query time, but persisting and then querying high-cardinality data in this way is expensive.
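To make the aggregation problem in the second bullet concrete, here is a minimal Go sketch. The two hosts and their latencies are made-up numbers, and `percentile` and `repeat` are hypothetical helpers written for this illustration, not part of Prometheus or Veneur. It compares the average of two per-host 95th percentiles with the 95th percentile of the combined traffic.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the q-th quantile (0 < q <= 1) of values using the
// nearest-rank method. It sorts a copy so the caller's slice is untouched.
func percentile(values []float64, q float64) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// repeat builds a slice holding n copies of v (hypothetical test data).
func repeat(v float64, n int) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = v
	}
	return out
}

func main() {
	// Host A is busy and healthy: 1000 requests at 10 ms.
	hostA := repeat(10, 1000)
	// Host B is nearly idle and half of its 100 requests take 1000 ms.
	hostB := append(repeat(10, 50), repeat(1000, 50)...)

	p95A := percentile(hostA, 0.95)
	p95B := percentile(hostB, 0.95)
	averaged := (p95A + p95B) / 2

	// The global percentile needs every sample (or a mergeable summary of them).
	all := append(append([]float64(nil), hostA...), hostB...)
	global := percentile(all, 0.95)

	fmt.Printf("per-host p95: A=%.0f ms, B=%.0f ms\n", p95A, p95B)
	fmt.Printf("average of per-host p95s: %.0f ms\n", averaged)
	fmt.Printf("true global p95: %.0f ms\n", global)
}
```

With this data the averaged figure is 505 ms while the true global p95 is 10 ms, because host B contributes only a small share of the total requests.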

Understanding the distribution of your data (rather than just averages) is arguably the most important feature you want from a monitoring dashboard, so the weak support for quantiles is very limiting.

Veneur[2] addresses these use-cases for applications that use DogStatsD[3] by using clever data structures for approximate histograms[4] and approximate sets[5]. However, I believe its integration with Prometheus is limited and currently only one-way: there is a CLI app to poll Prometheus metrics and push them into Veneur[6], but there's no output sink for Veneur to write to Prometheus (or expose metrics for a Prometheus instance to poll). You also aren't able to use the approximate histogram or approximate set datatypes if you go that route, because they can't be expressed as Prometheus metrics.

It would be extremely useful to have something similar for Prometheus, either by integrating with Veneur or implementing those data structures as an extension to Prometheus.
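As a rough illustration of why an approximate set is cheap to persist and aggregate, here is a minimal HyperLogLog sketch in Go. This is not Veneur's implementation: the `Sketch` type, the `addRange` helper, the choice of 4096 registers, and the use of SHA-256 as the hash are all assumptions made for this example. The point is that each host keeps a few kilobytes of registers, and two hosts' sketches merge by taking the per-register maximum.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math"
	"math/bits"
)

const (
	precision    = 12                 // hash bits used to pick a register
	numRegisters = 1 << precision     // 4096 registers, about 1.6% standard error
	maxRank      = 64 - precision + 1 // largest rank the remaining bits can produce
)

// Sketch is a minimal HyperLogLog: one small counter per register.
type Sketch struct {
	registers [numRegisters]uint8
}

// Add records one element. SHA-256 is used only because it is in the
// standard library and well mixed; a real system would pick a faster hash.
func (s *Sketch) Add(item string) {
	sum := sha256.Sum256([]byte(item))
	h := binary.BigEndian.Uint64(sum[:8])
	idx := h >> (64 - precision) // top bits choose the register
	rest := h << precision       // remaining bits, left-aligned
	rank := bits.LeadingZeros64(rest) + 1
	if rank > maxRank {
		rank = maxRank
	}
	if uint8(rank) > s.registers[idx] {
		s.registers[idx] = uint8(rank)
	}
}

// Merge folds another sketch into s by keeping the larger value per register.
// The result is identical to a sketch that saw both input streams.
func (s *Sketch) Merge(other *Sketch) {
	for i, r := range other.registers {
		if r > s.registers[i] {
			s.registers[i] = r
		}
	}
}

// Estimate returns the approximate number of distinct elements added.
func (s *Sketch) Estimate() float64 {
	m := float64(numRegisters)
	sum, zeros := 0.0, 0
	for _, r := range s.registers {
		sum += math.Ldexp(1, -int(r)) // 2^-r
		if r == 0 {
			zeros++
		}
	}
	alpha := 0.7213 / (1 + 1.079/m)
	est := alpha * m * m / sum
	if est <= 2.5*m && zeros > 0 {
		// Small-range correction: fall back to linear counting.
		est = m * math.Log(m/float64(zeros))
	}
	return est
}

// addRange adds the hypothetical IDs "user-<from>" .. "user-<to-1>".
func addRange(s *Sketch, from, to int) {
	for i := from; i < to; i++ {
		s.Add(fmt.Sprintf("user-%d", i))
	}
}

func main() {
	var hostA, hostB Sketch
	addRange(&hostA, 0, 60000)      // users seen by host A
	addRange(&hostB, 40000, 100000) // users seen by host B, overlapping A

	hostA.Merge(&hostB)
	fmt.Printf("estimated unique users: %.0f (true value: 100000)\n", hostA.Estimate())
}
```

The merged estimate should land within a few percent of the true count, and the overlap between the two hosts is not double-counted.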

[1] https://prometheus.io/docs/practices/histograms/

[2] https://github.com/stripe/veneur

[3] https://docs.datadoghq.com/developers/dogstatsd/

[4] https://github.com/stripe/veneur#approximate-histograms

[5] https://github.com/stripe/veneur#approximate-sets

[6] https://github.com/stripe/veneur/tree/master/cmd/veneur-prom...

Veneur[1] can compute approximate global percentiles (among other things) during metric collection, and the percentiles can be stored and queried even in a datastore that doesn't know anything about distributions.
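A rough Go sketch of that idea follows. It is not Veneur's code: the log-bucketed `Histogram` is a simple stand-in for whatever mergeable structure a real pipeline uses, and the `Point` type, the `Flush` function, and the `api.latency` metric name are hypothetical. Local nodes keep a mergeable summary, a global node merges them, and what reaches the datastore is just a handful of named numbers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

const growth = 1.05 // each bucket's upper bound is 5% above its lower bound

// Histogram is a mergeable summary with logarithmic buckets: bucket i counts
// values in [growth^i, growth^(i+1)).
type Histogram struct {
	counts map[int]uint64
	total  uint64
}

func NewHistogram() *Histogram { return &Histogram{counts: map[int]uint64{}} }

// Observe records one value; non-positive values are ignored in this sketch.
func (h *Histogram) Observe(v float64) {
	if v <= 0 {
		return
	}
	h.counts[int(math.Floor(math.Log(v)/math.Log(growth)))]++
	h.total++
}

// Merge folds another histogram in; bucket counts simply add.
func (h *Histogram) Merge(o *Histogram) {
	for i, c := range o.counts {
		h.counts[i] += c
	}
	h.total += o.total
}

// Quantile returns the geometric midpoint of the bucket holding the q-th
// quantile, so the answer is approximate (within a few percent here).
func (h *Histogram) Quantile(q float64) float64 {
	if h.total == 0 {
		return math.NaN()
	}
	idxs := make([]int, 0, len(h.counts))
	for i := range h.counts {
		idxs = append(idxs, i)
	}
	sort.Ints(idxs)
	target := uint64(math.Ceil(q * float64(h.total)))
	if target < 1 {
		target = 1
	}
	var seen uint64
	for _, i := range idxs {
		seen += h.counts[i]
		if seen >= target {
			return math.Pow(growth, float64(i)+0.5)
		}
	}
	return math.Pow(growth, float64(idxs[len(idxs)-1])+0.5)
}

// Point is a plain name/value pair, the only thing the datastore sees.
type Point struct {
	Name  string
	Value float64
}

// Flush turns a merged histogram into one flat point per requested quantile.
func Flush(name string, h *Histogram, quantiles []float64) []Point {
	points := make([]Point, 0, len(quantiles))
	for _, q := range quantiles {
		pct := int(math.Round(q * 100))
		points = append(points, Point{fmt.Sprintf("%s.%dpercentile", name, pct), h.Quantile(q)})
	}
	return points
}

func main() {
	hostA, hostB := NewHistogram(), NewHistogram()
	for i := 0; i < 1000; i++ {
		hostA.Observe(10) // busy, healthy host
	}
	for i := 0; i < 50; i++ {
		hostB.Observe(10)
		hostB.Observe(1000) // nearly idle host with slow requests
	}

	global := NewHistogram()
	global.Merge(hostA)
	global.Merge(hostB)

	for _, p := range Flush("api.latency", global, []float64{0.5, 0.95}) {
		fmt.Printf("%s = %.2f\n", p.Name, p.Value)
	}
}
```

The trade-off in this design is that the set of percentiles is fixed at collection time: the datastore can store and graph the emitted values, but it cannot derive a percentile that was never flushed.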

[1] https://github.com/stripe/veneur

Seems similar to Veneur (like many other projects mentioned in comments here; didn't realize this space was so crowded!) - down to the first two letters of the name: https://github.com/stripe/veneur

Veneur is more metrics-focused, but might offer inspiration as you work on metrics support in Vector - in particular the SSF source, internal aggregation, and Datadog and SignalFx sinks.

This is awesome! We make heavy use of HyperLogLogs in our monitoring systems[0], which are also written in Go and currently use Clark Duvall's library, which this library is based on.

I'm excited to try this out on our systems and see what results we get.

[0] https://github.com/stripe/veneur