I love OpenTelemetry, and we want to trace almost every span that happens. We’d be bankrupt if we went with any vendor. We wired up OpenTelemetry with Java magic (zero effort), pointed it at a self-hosted ClickHouse, and now store 700M+ spans per day on a $100 EC2 instance.
https://clickhouse.com/blog/how-we-used-clickhouse-to-store-...
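For a sense of scale, here is the 700M+ spans per day figure as a sustained ingest rate. This is only back-of-the-envelope arithmetic on the number quoted above. It assumes traffic is spread evenly over the day, which real workloads usually aren't, so peaks would be higher.

```python
# Back-of-the-envelope ingest rate for the figure quoted above.
SPANS_PER_DAY = 700_000_000      # "700M+ spans per day" from the comment
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Average rate, assuming perfectly even traffic (an assumption, not a measurement).
spans_per_second = SPANS_PER_DAY / SECONDS_PER_DAY

print(f"{spans_per_second:,.0f} spans/s sustained")
```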
I've got a small personal project submitting traces/logs/metrics to ClickHouse via SigNoz. Only about 400k-800k spans per day (https://i.imgur.com/s0J6Mzo.png), but it runs on a single t4g.small with CPU typically at 11% and IOPS at 4%. I also have everything beyond a certain number of GB getting pushed to an sc1 cold storage drive.
w/ 1 month retention for traces:
┌─parts.table─────────────────┬──────rows─┬─disk_size──┬─engine────┬─compressed_size─┬─uncompressed_size─┬────ratio─┐
│ signoz_index_v2             │  26902115 │ 17.06 GiB  │ MergeTree │ 6.21 GiB        │ 66.74 GiB         │   0.0930 │
│ durationSort                │  26901998 │ 5.44 GiB   │ MergeTree │ 5.40 GiB        │ 53.02 GiB         │  0.10190 │
│ trace_log                   │ 123185362 │ 2.64 GiB   │ MergeTree │ 2.64 GiB        │ 37.96 GiB         │   0.0695 │
│ trace_log_0                 │ 120052084 │ 2.46 GiB   │ MergeTree │ 2.45 GiB        │ 37.60 GiB         │  0.06528 │
│ signoz_spans                │  26902115 │ 2.21 GiB   │ MergeTree │ 2.21 GiB        │ 76.73 GiB         │ 0.028784 │
│ query_log                   │  16384865 │ 1.91 GiB   │ MergeTree │ 1.90 GiB        │ 18.31 GiB         │  0.10398 │
│ part_log                    │  17906105 │ 846.73 MiB │ MergeTree │ 845.39 MiB      │ 3.84 GiB          │  0.21521 │
│ metric_log                  │   4713151 │ 820.92 MiB │ MergeTree │ 806.13 MiB      │ 14.56 GiB         │  0.05405 │
│ part_log_0                  │  15632289 │ 702.82 MiB │ MergeTree │ 701.70 MiB      │ 3.34 GiB          │  0.20490 │
│ asynchronous_metric_log     │ 795170674 │ 576.24 MiB │ MergeTree │ 562.50 MiB      │ 11.11 GiB         │ 0.049429 │
│ query_views_log             │   6597156 │ 461.35 MiB │ MergeTree │ 459.75 MiB      │ 6.36 GiB          │  0.07060 │
│ logs                        │   6448259 │ 408.59 MiB │ MergeTree │ 406.65 MiB      │ 5.99 GiB          │  0.06627 │
│ samples_v2                  │ 949110122 │ 345.01 MiB │ MergeTree │ 325.31 MiB      │ 22.09 GiB         │ 0.014382 │
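The ratio column appears to be compressed_size divided by uncompressed_size (that reading is an assumption, but it matches the numbers shown). A minimal Python sketch that recomputes it for a few rows copied from the table, plus a rough compressed-bytes-per-row figure. The helper names `to_bytes` and `ratio` are made up for illustration, and because the inputs are the rounded human-readable sizes, results only match the table to about three decimal places.

```python
# Recompute the "ratio" column from the rounded sizes in the table above.
UNITS = {"MiB": 1024**2, "GiB": 1024**3}


def to_bytes(size: str) -> float:
    """Convert a string like '845.39 MiB' to a byte count."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


def ratio(compressed: str, uncompressed: str) -> float:
    """Assumed meaning of the ratio column: compressed / uncompressed."""
    return to_bytes(compressed) / to_bytes(uncompressed)


# (table, rows, compressed_size, uncompressed_size), copied from the table.
ROWS = [
    ("signoz_index_v2", 26_902_115, "6.21 GiB", "66.74 GiB"),
    ("signoz_spans", 26_902_115, "2.21 GiB", "76.73 GiB"),
    ("part_log", 17_906_105, "845.39 MiB", "3.84 GiB"),
]

for table, rows, comp, uncomp in ROWS:
    per_row = to_bytes(comp) / rows  # average compressed bytes per stored row
    print(f"{table:<16} ratio={ratio(comp, uncomp):.4f} bytes/row={per_row:.0f}")
```

Lower is better here: a ratio of about 0.03 for signoz_spans means the stored data is roughly 3% of its uncompressed size.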
If I was less stupid I'd get a machine with the recommended ClickHouse specs and save myself a few hours of tuning, but this works great.
Downsides:
- ClickHouse takes about 5 minutes to start up because my tiny sc1 drive has like 4 IOPS allowed
- signoz's UI isn't amazing. It's totally functional, and they've been improving very quickly, but don't expect datadog-level polish
If anyone wants to check our project, here’s our GitHub repo - https://github.com/SigNoz/signoz