I’m assuming anyone who has to make that decision already knows this, but while PostgreSQL is great for hosting a production database, it isn’t a great choice at scale for an analytical database, or for training on or storing your machine learning features. It works, just not well once the data and query volume grow.
Early on you can get away with a scheduled pg_dump and a few reports run against it while you figure out an ETL/messaging process, but picking something that handles concurrent large-scale queries will start to matter fast.
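To make the "scheduled pg_dump" stopgap concrete, here is a minimal sketch of a nightly dump script, assuming pg_dump is on the PATH and connection settings come from the usual PG* environment variables. The database name `appdb`, the directory `/var/backups`, and both function names are hypothetical, chosen only for illustration.

```python
import datetime
import subprocess


def build_pg_dump_command(dbname: str, out_dir: str, today: datetime.date) -> list[str]:
    """Build a pg_dump invocation that writes a dated, custom-format dump."""
    # Hypothetical naming scheme: <out_dir>/<dbname>-YYYY-MM-DD.dump
    outfile = f"{out_dir}/{dbname}-{today.isoformat()}.dump"
    return [
        "pg_dump",
        "--format=custom",  # compressed archive, restorable with pg_restore
        "--no-owner",       # skip ownership commands so it restores under any role
        f"--file={outfile}",
        dbname,
    ]


def run_nightly_dump(dbname: str, out_dir: str) -> None:
    """Run the dump; raises CalledProcessError if pg_dump exits non-zero."""
    cmd = build_pg_dump_command(dbname, out_dir, datetime.date.today())
    subprocess.run(cmd, check=True)
```

Calling `run_nightly_dump("appdb", "/var/backups")` from a cron job or similar scheduler, then restoring the dump into a separate reporting instance with pg_restore, keeps report queries off the production database. It does nothing for concurrent large-scale queries, which is the point of the comment above.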
Could you name some better alternatives for an analytical DB?
- For OLTP workloads and some ad-hoc queries, you can use TiDB + TiKV. One of our adopters has a production cluster with 300+ TB of data and can easily cope with the spike caused by COVID-19.
- For more complex queries, TiSpark + TiKV might work well; for heavier queries, we added a columnar store, TiFlash. See https://pingcap.com/blog/delivering-real-time-analytics-and-...