Percentiles become painful when you try to aggregate them. Suppose you record the p99 latency of some service, but you collect these metrics at the rack or data center level. Now you ask: what is the overall p99 latency of the service? That's not an easy question to answer, especially if you automatically subsample older time series data in order to store more of it. (I've seen people try to take a weighted average of subsampled percentile metrics, and it turns into a mess.)
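
To make the problem concrete, here's a minimal sketch with made-up numbers. The two racks, their request counts, and the nearest-rank `p99` helper are all assumptions for illustration, not real data:

```python
def p99(samples):
    """Nearest-rank 99th percentile, using integer math to avoid float rounding."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer form of ceil(0.99 * n)
    return ordered[rank - 1]

# Hypothetical latencies in ms: a busy healthy rack and a quiet struggling one.
rack_a = [10] * 990 + [500] * 10   # 1000 requests, 1% of them slow
rack_b = [10] * 50 + [500] * 50    # 100 requests, 50% of them slow

# What you typically have stored: one p99 per rack, plus a request count.
per_rack = [(p99(rack_a), len(rack_a)), (p99(rack_b), len(rack_b))]

simple_avg = sum(p for p, _ in per_rack) / len(per_rack)
weighted_avg = sum(p * n for p, n in per_rack) / sum(n for _, n in per_rack)

# What you actually want: the p99 over every request from both racks.
true_p99 = p99(rack_a + rack_b)

print("per-rack p99s:", per_rack)
print("simple average:", simple_avg)
print("count-weighted average:", round(weighted_avg, 1))
print("true overall p99:", true_p99)
```

With these numbers the per-rack p99s are 10 ms and 500 ms, and the true overall p99 is 500 ms. Both the simple average (255) and the count-weighted average (about 54.5) land well below it, because a percentile throws away the shape of the distribution that you would need to combine it correctly.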

We need an efficient way to compactly represent the entire distribution of a metric over time so arbitrary aggregations can be performed accurately. There is some research on this topic, but nothing really production-ready that I'm aware of.
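
As a rough sketch of what "compactly represent the entire distribution" could look like, here is a histogram with logarithmically spaced buckets. It is only an illustration of the mergeability property, not an implementation of any published algorithm; the `LogHistogram` class, its method names, and the 1.1 growth factor are all hypothetical choices:

```python
import math
from collections import Counter

class LogHistogram:
    """Toy mergeable sketch: counts of positive values per log-spaced bucket."""

    def __init__(self, gamma=1.1):
        self.gamma = gamma        # each bucket covers [gamma**i, gamma**(i + 1))
        self.counts = Counter()   # bucket index -> number of samples

    def add(self, value):
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        self.counts[math.floor(math.log(value) / math.log(self.gamma))] += 1

    def merge(self, other):
        """Combine two sketches by adding bucket counts; no raw samples needed."""
        if other.gamma != self.gamma:
            raise ValueError("bucket layouts must match")
        merged = LogHistogram(self.gamma)
        merged.counts = self.counts + other.counts
        return merged

    def percentile(self, p):
        """Estimate the p-th percentile (integer p in 1..100), nearest-rank style."""
        total = sum(self.counts.values())
        if total == 0:
            raise ValueError("empty histogram")
        rank = (p * total + 99) // 100  # integer form of ceil(p/100 * total)
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen >= rank:
                # Report the geometric midpoint of the bucket that holds the rank.
                return self.gamma ** (index + 0.5)

# Hypothetical per-rack latencies in ms.
rack_a = [10] * 990 + [500] * 10
rack_b = [10] * 50 + [500] * 50

hist_a, hist_b, hist_all = LogHistogram(), LogHistogram(), LogHistogram()
for v in rack_a:
    hist_a.add(v)
for v in rack_b:
    hist_b.add(v)
for v in rack_a + rack_b:
    hist_all.add(v)

merged = hist_a.merge(hist_b)
print(merged.counts == hist_all.counts)  # merging loses nothing vs. one big sketch
print(merged.percentile(99))             # estimate of the overall p99
```

The point of the design is that merging is just adding counts, so sketches from racks, data centers, or adjacent time windows combine into exactly what you would have gotten from a single sketch over all the data. The cost is accuracy: the estimate is only good to within the bucket width, which is a few percent with this growth factor.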

Sounds like you’re looking for t-digests [1]. Most production systems I’ve worked on that do this use them under the hood.

1: https://github.com/tdunning/t-digest