Grafana truly is best in class, but I have strong reservations about Prometheus.

I really want to like it, it’s just so _easy_, publish a little webpage with your metrics and Prometheus takes care of the rest. Lovely.

But I often find that the cardinality of the data is substantially lower than even the defaults of alternatives (influxdb has 1s and even Zabbix has 5s).

Not to mention the lost writes (missing data points) which have no logged explanation.

All of this, however, was in my homelab, which, while unconstrained in resources lacks a lot of the fit and finish of a prod system.

I also take pause with the architecture; it’s not meant to scale. It’s written on the tin so it’s not like I’m picking fault, but when you’re building a dashboard that sucks in data from 25 different Prometheus data sources, it becomes difficult to run functions like SUM(), because the keys may be out of sync causing some really ugly and inaccurate representations of data.

Everything about the design (polling, single database) tells me that it was designed primarily to sit alongside something small. It could never handle the tens of millions of data points per second that I ingest(ed) at my (now previous) job.

But it has a lot of hype, and maybe I’m holding it wrong.

Victoriametrics[0] is API-compatible with Prometheus but also a horizontally scalable, distributed and persisting timeseries database (cf influxdb). Together with vmagent it essentially becomes a HA drop-in replacement (almost) for Prometheus.

[0]: https://github.com/VictoriaMetrics/VictoriaMetrics