I can only recommend TimescaleDB. It solves the right problem (storing time-series data) without creating new ones (deployment, backup, hot failover), since it relies on Postgres for the underlying infrastructure. I stored 100 million sensor samples in TimescaleDB and had no issues with scaling on medium-sized boxes, despite issuing complex time-series queries.
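For anyone curious what that looks like in practice, everything stays plain SQL on top of Postgres. Here's a minimal Python sketch (made-up table and column names, psycopg2 assumed as the driver), not a drop-in recipe:

    # Minimal sketch (hypothetical table/column names) of storing sensor
    # samples in TimescaleDB from Python via psycopg2.
    import psycopg2

    conn = psycopg2.connect("dbname=sensors user=postgres")
    cur = conn.cursor()

    # A regular Postgres table, promoted to a hypertable partitioned by time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS samples (
            time      TIMESTAMPTZ NOT NULL,
            sensor_id INTEGER     NOT NULL,
            value     DOUBLE PRECISION
        )
    """)
    cur.execute("SELECT create_hypertable('samples', 'time', if_not_exists => TRUE)")

    # Inserts and queries are plain SQL; chunking happens under the hood.
    cur.execute(
        "INSERT INTO samples (time, sensor_id, value) VALUES (now(), %s, %s)",
        (42, 23.5),
    )
    cur.execute("""
        SELECT time_bucket('5 minutes', time) AS bucket, avg(value)
        FROM samples
        WHERE sensor_id = %s AND time > now() - interval '1 day'
        GROUP BY bucket ORDER BY bucket
    """, (42,))
    print(cur.fetchall())

    conn.commit()
    conn.close()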
As for hosting options: sadly, AWS currently doesn't offer Timescale as part of RDS. There are two alternatives: Azure now offers Timescale as part of their hosted Postgres, or you can go with aiven.io, who will host Postgres with TimescaleDB on all the major cloud providers (AWS, GCP, Azure, DO, ?) as a service, including replicas and backups.
Overall, I’m very happy to see the Postgres ecosystem growing.
Interesting. My team currently uses (abuses?) Postgres for time-series data. Would you mind answering some general questions about your experience with Timescale? You said 100 million sensor samples. What was the upload/download frequency? Our application is pushing hundreds of millions of rows across many different data sources every day. On top of that, we are also querying the shit out of this data to run models, and we need VERY quick queries, like 10-100ms.
How do you think TimescaleDB would handle that size and velocity of data?
ClickHouse [1] would probably fit your needs better. It can write millions of rows per second [2], scan billions of rows per second on a single node, and scale out to multiple nodes.
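To give a feel for the write path, here's a rough ingestion sketch with the third-party clickhouse-driver package (the schema and names are just illustrative; large batches are the important part):

    # Rough sketch of batched ingestion with the third-party clickhouse-driver
    # package; table and column names are invented for illustration.
    from datetime import datetime, timedelta
    from clickhouse_driver import Client

    client = Client(host="localhost")

    client.execute("""
        CREATE TABLE IF NOT EXISTS samples (
            ts        DateTime,
            sensor_id UInt32,
            value     Float64
        ) ENGINE = MergeTree()
        ORDER BY (sensor_id, ts)
    """)

    # ClickHouse favors large batches: one insert with many rows is far
    # cheaper than many single-row inserts.
    now = datetime.utcnow()
    rows = [(now - timedelta(seconds=i), i % 1000, float(i)) for i in range(100_000)]
    client.execute("INSERT INTO samples (ts, sensor_id, value) VALUES", rows)

    # Wide aggregations are exactly the kind of scan ClickHouse is built for.
    print(client.execute(
        "SELECT sensor_id, avg(value) FROM samples GROUP BY sensor_id LIMIT 10"
    ))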
Also I'd recommend taking a look at other open-source TSDBs with cluster support:
- M3DB [3]
- Cortex [4]
- VictoriaMetrics [5]
These TSDBs speak PromQL instead of SQL. PromQL is a query language specially optimized for typical time series queries [6].
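If you haven't used PromQL before, here's a minimal sketch of what querying one of these systems can look like through the Prometheus-compatible HTTP API they expose (the endpoint and metric name are placeholders I made up):

    # Minimal sketch of a PromQL range query over a Prometheus-compatible
    # HTTP API; endpoint URL and metric name are placeholders.
    import time
    import requests

    PROM_URL = "http://localhost:9090"  # hypothetical endpoint

    now = time.time()
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={
            # Per-sensor 5-minute average rate, expressed in PromQL.
            "query": "avg(rate(sensor_value_total[5m])) by (sensor_id)",
            "start": now - 3600,
            "end": now,
            "step": "60s",
        },
    )
    resp.raise_for_status()
    for series in resp.json()["data"]["result"]:
        print(series["metric"], len(series["values"]), "points")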
[1] https://github.com/ClickHouse/ClickHouse
[2] https://blog.cloudflare.com/http-analytics-for-6m-requests-p...
[3] https://github.com/m3db/m3
[4] https://github.com/cortexproject/cortex
[5] https://github.com/VictoriaMetrics/VictoriaMetrics/
[6] https://medium.com/@valyala/promql-tutorial-for-beginners-9a...