Haven't used any of these yet, but how does ClickHouse compare to Postgres extensions like TimescaleDB and Citus (which recently launched a columnar feature)? I remember reading in the ClickHouse docs some time ago that it does not have DELETE functionality. Does this pose any problems with GDPR and data deletion requests?
There are many independent comparisons of ClickHouse vs TimescaleDB:
By Splitbee: https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...
By GitLab: https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...
And others: https://github.com/ClickHouse/ClickHouse/issues/22398#issuec... https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...
If you find more, please post them there.
TimescaleDB works fine in time-series scenarios but does not shine on analytical queries. For most time-series queries it is slower than ClickHouse, though for small (point) queries it can be faster.
The main advantage of TimescaleDB is that it integrates better with Postgres (for obvious reasons).
There are also many comparisons of ClickHouse vs Citus. The most notable is here: https://blog.cloudflare.com/http-analytics-for-6m-requests-p...
ClickHouse does support batch DELETE operations for data cleanup: https://clickhouse.com/docs/en/sql-reference/statements/alte... It is not meant for frequent single-record deletions, but it covers data cleanup, retention, and GDPR requirements.
You can also set TTL rules in ClickHouse per table or per column (say, reset all IP addresses to zero after three months).
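A minimal sketch of the batching idea in Python. Rather than issuing one mutation per erasure request, it collects the pending user IDs and folds them into a single ALTER TABLE ... DELETE WHERE statement, which is the mutation syntax linked above. The table name "events", the "user_id" column, and the helper name are hypothetical, and the table name is assumed to come from trusted code rather than user input.

```python
from typing import Iterable


def build_gdpr_delete(table: str, user_ids: Iterable[int]) -> str:
    """Fold many pending erasure requests into one ClickHouse DELETE mutation.

    Hypothetical helper: `table` and the `user_id` column are assumed names.
    Batching matters because each ALTER TABLE ... DELETE mutation rewrites
    the affected data parts, so one large mutation is far cheaper than
    thousands of single-row ones.
    """
    # int() coerces each ID to an integer; non-numeric strings raise ValueError,
    # which keeps arbitrary text out of the generated SQL.
    ids = sorted({int(u) for u in user_ids})
    if not ids:
        raise ValueError("no user ids to delete")
    id_list = ", ".join(str(u) for u in ids)
    return f"ALTER TABLE {table} DELETE WHERE user_id IN ({id_list})"
```

For example, build_gdpr_delete("events", [42, 7, 42]) yields "ALTER TABLE events DELETE WHERE user_id IN (7, 42)". A periodic job could run this once per day or week over all requests received in that window.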
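To illustrate what a column TTL does, here is a small Python emulation of a rule like "ip IPv4 TTL event_date + INTERVAL 3 MONTH". Once a row's date plus three calendar months has passed, the column is reset to its type's default, which is 0 (0.0.0.0) for IPv4. The Event record and the function names are hypothetical. Real ClickHouse applies TTLs lazily during background merges rather than at an exact moment.

```python
import calendar
from dataclasses import dataclass, replace
from datetime import date
from typing import List


@dataclass(frozen=True)
class Event:
    event_date: date
    ip: int  # IPv4 stored as an unsigned 32-bit integer; 0 means 0.0.0.0


def add_months(d: date, n: int) -> date:
    """Calendar-month addition, clamping the day (e.g. Nov 30 + 3 months -> Feb 28/29)."""
    month_index = d.month - 1 + n
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def apply_ip_ttl(rows: List[Event], today: date, months: int = 3) -> List[Event]:
    """Hypothetical emulation of a column TTL: reset `ip` to its default (0)
    once event_date + `months` is on or before `today`. Other columns are kept."""
    out = []
    for row in rows:
        if add_months(row.event_date, months) <= today:
            out.append(replace(row, ip=0))  # expired: only this column is reset
        else:
            out.append(row)
    return out
```

Unlike a table-level TTL, which drops whole rows, a column TTL keeps the row and only clears the sensitive column. That is what makes it handy for anonymizing old data while preserving aggregates.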
@zX41ZdbW - Thanks for pointing out the various benchmarks that other companies have run between ClickHouse and TimescaleDB using TSBS[1]. As we mentioned, we'll dig into a similar benchmark, in much more detail than any of those examples, in an upcoming blog post.
One notable omission in all of the benchmarks we've seen is that none of them enable TimescaleDB compression (which also transforms row-oriented data into a columnar-type format). In our detailed benchmarking, queries on compressed columnar data in Timescale outperformed ClickHouse in most cases, particularly as cardinality increases, often by 5x or more. And with compression of 90% or more, storage is often comparable. (Again, blog post coming soon - we are just making sure our results are accurate before rushing to publish.)
The beauty of TimescaleDB's columnar compression model is that it lets the user decide when their workload can benefit from deep/narrow queries on data that doesn't change often (although it can still be modified just like regular row data), versus shallow/wide queries for things like inserting data and near-time queries.
It's a hybrid model that provides a lot of flexibility for users AND significantly improves the performance of historical queries. So yes, we do agree that columnar storage is a huge performance win for many types of queries.
And of course, with TimescaleDB, one also gets all of the benefits of PostgreSQL and its vibrant ecosystem.
Can't wait to share the details in the coming weeks!