Is data being inserted into CH as it's received, or is there an intermediary buffer? A general overview of the flight of telemetry data through the system would be very welcome.

Data is received via OTLP (the OpenTelemetry protocol) and almost immediately inserted into a ClickHouse buffer table in small batches. Simple and very efficient.
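To make "small batches" concrete, here is a minimal Go sketch of the general pattern: collect incoming spans and hand them off once a size limit is reached or a timer fires. This is not Uptrace's actual code. `Span`, `Batcher`, and the `flush` callback are hypothetical names, and the callback only stands in for whatever performs the real ClickHouse insert.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Span is a hypothetical, simplified span record (an assumption for this sketch).
type Span struct {
	TraceID string
	Name    string
}

// Batcher collects spans and hands them to flush in small batches.
type Batcher struct {
	mu      sync.Mutex
	buf     []Span
	maxSize int
	flush   func([]Span) // stand-in for a single batched INSERT
}

func NewBatcher(maxSize int, flush func([]Span)) *Batcher {
	return &Batcher{maxSize: maxSize, flush: flush}
}

// Add appends a span and flushes as soon as the batch is full.
func (b *Batcher) Add(s Span) {
	b.mu.Lock()
	b.buf = append(b.buf, s)
	var batch []Span
	if len(b.buf) >= b.maxSize {
		batch, b.buf = b.buf, nil
	}
	b.mu.Unlock()
	if batch != nil {
		b.flush(batch) // called outside the lock so slow inserts do not block Add
	}
}

// Flush sends whatever is buffered, even if the batch is not full.
func (b *Batcher) Flush() {
	b.mu.Lock()
	batch := b.buf
	b.buf = nil
	b.mu.Unlock()
	if len(batch) > 0 {
		b.flush(batch)
	}
}

// Run flushes on a timer so spans are not held for long when traffic is low.
func (b *Batcher) Run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			b.Flush()
		case <-stop:
			b.Flush()
			return
		}
	}
}

func main() {
	b := NewBatcher(3, func(batch []Span) {
		// A real implementation would insert the batch into ClickHouse here.
		fmt.Printf("inserting %d spans\n", len(batch))
	})
	for i := 0; i < 7; i++ {
		b.Add(Span{TraceID: "t1", Name: fmt.Sprintf("op-%d", i)})
	}
	b.Flush() // send the partial last batch
}
```

In a long-running service, `Run` would be started in its own goroutine next to the OTLP receiver. The batch size and flush interval are tuning knobs; the values Uptrace actually uses are not stated in this thread.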

Tail-based sampling will require buffering spans in memory for some time, but tail-based sampling is not implemented yet.

The cloud version also uses Kafka to survive surges in traffic, but I guess a "personal" / company deployment does not need that as much, so there is no need to introduce an additional dependency.

For tail-based sampling, does it mean that every process in a trace will keep its spans in memory until the initial process 'ends' the trace? How does the flushing happen (e.g. all processes 'commit' their buffer spans)? Many thanks for the explanations!

The Uptrace Go process will buffer spans in memory for a short period of time (5-15 seconds). That does not work for long traces, but most traces are short.
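Since tail-based sampling is not implemented yet, the following is only a rough Go sketch of how such a time-window buffer could look, not the planned design. Spans are grouped by trace ID in the collecting process, held for a fixed window, and then kept or dropped as a whole trace. `TraceBuffer`, `defaultWindow` (10 seconds, picked from the 5-15 second range above), the `at` helper, and the "keep traces with an error" rule are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical, simplified span record (an assumption for this sketch).
type Span struct {
	TraceID string
	Name    string
	IsError bool
}

// defaultWindow is an assumed value inside the 5-15 second range mentioned above.
const defaultWindow = 10 * time.Second

type pendingTrace struct {
	firstSeen time.Time
	spans     []Span
}

// TraceBuffer holds spans in memory, grouped by trace ID.
// It is not safe for concurrent use; a real service would add locking.
type TraceBuffer struct {
	window  time.Duration
	pending map[string]*pendingTrace
}

func NewTraceBuffer(window time.Duration) *TraceBuffer {
	return &TraceBuffer{window: window, pending: make(map[string]*pendingTrace)}
}

// Add stores a span under its trace ID and records when the trace was first seen.
func (b *TraceBuffer) Add(s Span, now time.Time) {
	p, ok := b.pending[s.TraceID]
	if !ok {
		p = &pendingTrace{firstSeen: now}
		b.pending[s.TraceID] = p
	}
	p.spans = append(p.spans, s)
}

// Expire removes every trace whose window has elapsed and returns the ones
// that keep selects. Spans arriving after this point start a new entry,
// which is why the approach breaks down for long traces.
func (b *TraceBuffer) Expire(now time.Time, keep func([]Span) bool) [][]Span {
	var kept [][]Span
	for id, p := range b.pending {
		if now.Sub(p.firstSeen) < b.window {
			continue
		}
		delete(b.pending, id) // deleting during range is safe in Go
		if keep(p.spans) {
			kept = append(kept, p.spans)
		}
	}
	return kept
}

// hasError is an example sampling rule: keep traces that contain an error.
func hasError(spans []Span) bool {
	for _, s := range spans {
		if s.IsError {
			return true
		}
	}
	return false
}

// at returns a fixed timestamp sec seconds after the Unix epoch (demo helper).
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

func main() {
	b := NewTraceBuffer(defaultWindow)
	b.Add(Span{TraceID: "a", Name: "GET /checkout"}, at(0))
	b.Add(Span{TraceID: "a", Name: "db.query", IsError: true}, at(1))
	b.Add(Span{TraceID: "b", Name: "GET /health"}, at(2))

	for _, trace := range b.Expire(at(12), hasError) {
		fmt.Printf("keeping trace %s with %d spans\n", trace[0].TraceID, len(trace))
	}
}
```

In this sketch there is no explicit "commit" from the instrumented processes: the decision is driven purely by the timer on the receiving side, which matches the short-window trade-off described above.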

There is some discussion at https://github.com/open-telemetry/opentelemetry-collector-co...