I love observability probably more than most. And my initial reaction to this article is the obvious: why not both?
In fact, I tend to think more in terms of "events" when writing both logs and tracing code. How that event is notified, stored, transmitted, etc. is in some ways divorced from the activity. I don't care if it is going to stdout, or over udp to an aggregator, or turning into trace statements, or ending up in Kafka, etc.
But inevitably I bump up against cost. For even medium sized systems, the amount of data I would like to track gets quite expensive. For example, many tracing services charge for the tags you add to traces. So doing `trace.String("key", value)` becomes something I think about from a cost perspective. I worked at a place that had a $250k/year New Relic bill and we were avoiding any kind of custom attributes. Just getting APM metrics for servers and databases was enough to get to that cost.
Logs are cheap, easy, reliable and don't lock me in to an expensive service to start. I mean, maybe you end up integrating splunk or perhaps self-hosting kibana, but you can get 90% of the benefits just by dumping the logs into Cloudwatch or even S3 for a much cheaper price.
Luckily, some of the larger incumbents are also moving away from this model, especially as OpenTelemetry is making tracing more widespread as a baseline of sorts for data. And you can definitely bet they're hearing about it from their customers right now, and they want to keep their customers.
Cost is still a concern but it's getting addressed as well. Right now every vendor has different approaches (e.g., the one I work for has a robust sampling proxy you can use), but that too is going the way of standardization. OTel is defining how to propagate sampling metadata in signals so that downstream tools can use the metadata about population representativeness to show accurate counts for things and so on.
What newer tools/companies are in this category? Any that you recommend?
Clickhouse gives users much greater flexibility in tradeoffs than either a time-series or inverted-index based store could offer (along with S3 support). There's nothing like a system that can balance high performance AND (usable) high cardinality.
[1] https://github.com/hyperdxio/hyperdx
disclaimer (in case anyone just skimmed): I'm one of the authors of HyperDX