What does Hacker News think of TiDB?
TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try free: https://tidbcloud.com/free-trial
We work on a MySQL-compatible distributed database called TiDB (https://github.com/pingcap/tidb/) and a key-value store called TiKV.
TiDB is written in Go and TiKV is written in Rust.
More roles and locations are available on https://www.pingcap.com/careers/
As for HTAP systems, as mentioned in the blog above, there are quite a few industrial products: AlloyDB (https://cloud.google.com/alloydb), which Google just announced; Snowflake's UniStore (https://www.snowflake.com/workloads/unistore/); and TiDB (https://github.com/pingcap/tidb), one of the most popular open-source projects, which has been deployed in many business applications.
Hopefully these may help a little bit :-)
Seven years ago, we started building TiDB to solve the MySQL sharding problem. Yes, "I Don’t Want to Shard (MySQL)" (https://news.ycombinator.com/item?id=32041656) applies to us too. At that time, we only had a community version of TiDB.
After we had developed TiDB for more than a year, we needed to think about earning money, because we (PingCAP) are a company. So we built an enterprise version to sell to our customers. Most other companies separate their community and enterprise versions substantially, and some even have killer features that exist only in the enterprise version. But we believe that all our users, whether they pay us or not, should get the benefits, nearly the same value, from TiDB. So we decided to keep the community and enterprise versions the same. The only difference is that the enterprise version contains two additional small features: audit log and IP whitelist. As far as I know, no community user has asked us for these two features so far.
Things were going well, and a few years later, we faced the same dilemma between the TiDB community version and the TiDB cloud service (https://tidb.cloud). Of course, we still insisted that TiDB must be open source: any improvement for the cloud is contributed to the community version first, and only then deployed to our own service. This means anyone can easily deploy TiDB on the cloud too.
This is our open source decision. Thanks to this decision, our products are now increasingly well known; you can check the insights at https://ossinsight.io/analyze/pingcap/tidb/.
But as the article mentioned, NewSQL is already here. DBs in this category have actually been around for a while. I think Spanner's paper came out 10 years ago, and Spanner is ubiquitous across Google. So let's just accept that we should think about things differently in 2022. Distributed RDBMSs are already a thing, used in production at many companies. For example, TiDB (https://github.com/pingcap/tidb), mentioned in the article, is already used by Square to replace some of its MySQL use cases (https://www.youtube.com/watch?v=TjqL50qzy3A).
But there are also https://www.cockroachlabs.com/product/, https://github.com/pingcap/tidb and https://ydb.tech/, each of which shows promise.
There is definitely still a lot of progress to be made in this space. From what I have seen, they are all significantly slower on a single node and are not as battle-hardened. But I think with time we will have a few really nice options to pick from.
I tried the "Try Gitea" service and migrated our TiDB repo (https://github.com/pingcap/tidb) to it. When I clicked the Activity tab and selected the "1 year" period, the page took nearly 90 s to load. I also found that the Activity page doesn't have a cache: when I re-selected "1 year", loading took nearly the same time.
I guess Gitea uses the git command to traverse all the logs for the period every time. Maybe it could use a database to speed this up, or, like GitHub, offer at most a "1 month" period.
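A minimal sketch of the caching idea, assuming a hypothetical `compute_activity(repo, period)` that stands in for the expensive log traversal. This is not Gitea's code (Gitea itself is written in Go); it only shows how a time-to-live (TTL) cache would make a repeated "1 year" request cheap.

```python
import time
from typing import Callable, Dict, Tuple

class TTLCache:
    """Hypothetical helper: memoize results per (repo, period) for `ttl` seconds."""

    def __init__(self, compute: Callable[[str, str], dict], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._compute = compute
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: Dict[Tuple[str, str], Tuple[float, dict]] = {}

    def get(self, repo: str, period: str) -> dict:
        key = (repo, period)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: skip the expensive traversal
        value = self._compute(repo, period)  # miss or stale entry: recompute
        self._entries[key] = (now, value)
        return value

calls = []

def compute_activity(repo: str, period: str) -> dict:
    # Hypothetical stand-in for walking the git log over the whole period.
    calls.append((repo, period))
    return {"repo": repo, "period": period}

cache = TTLCache(compute_activity, ttl=300.0)
first = cache.get("pingcap/tidb", "1 year")   # slow path: computed
second = cache.get("pingcap/tidb", "1 year")  # fast path: served from cache
```

The TTL bounds staleness: new pushes show up after at most `ttl` seconds. Precomputing activity into a database, as suggested above, would instead move the cost to the write path.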
I have been thinking about this problem with my peers since I started to build TiDB (https://github.com/pingcap/tidb) seven years ago. At that time, nearly all of us were familiar with the Go language, so we decided to use Go to build the SQL layer of TiDB. Thanks to Go, we could develop TiDB very quickly and released the first MVP in half a year. I clearly remember how it felt when we ran TPC-C successfully; although the tpmC was just 1 at that time, it was a good start for us.
But Go had some problems: for example, the GC was not good back then, fair scheduling could cause some latency problems, and data races could happen sometimes. So when we decided to build a distributed storage layer (aha, TiKV, https://github.com/tikv/tikv), we wanted to use another language to guarantee safety. I really admire our courage: we chose Rust, which had just released 1.0 and lacked lots of libraries at that time. Now it seems that this was an awesome choice. TiKV has graduated from the CNCF and is used as a building block not only for TiDB but also for other distributed systems. Thanks, Rust.
When TiDB started being used in many companies, we found that our customers not only ran lots of online transactions in TiDB but also wanted to run some real-time analytical queries directly, because the data was already in TiDB. So we decided to build an HTAP database by introducing a column store beside TiKV: this is TiFlash (https://github.com/pingcap/tiflash). We built TiFlash based on ClickHouse, so of course we use C++.
As you can see, to build just one integrated database, TiDB, we use at least three languages, and each language was introduced for its own reason. We can treat the distributed database as a service system: each service can be built in your favorite language, and the services are linked by gRPC, as TiDB does now. You may object: "Hey, guys, you are building a database, performance is very important." Yes, this is true, but we are also building a complex distributed system, especially on the cloud. Scale-out, elasticity, and user experience must be important too. This is a trade-off for an engineer :-)
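The row-store/column-store split behind TiFlash can be illustrated with a toy sketch. This is not TiKV's or TiFlash's actual storage format, and the `orders` table and its fields are hypothetical: the same data is kept row-wise for point lookups (OLTP) and column-wise for a scan-heavy aggregate (OLAP).

```python
from typing import Dict, List

# Row-oriented layout (OLTP-friendly): each record is stored together,
# so reading or updating one order touches one entry.
rows: Dict[int, dict] = {
    1: {"id": 1, "customer": "alice", "amount": 30},
    2: {"id": 2, "customer": "bob", "amount": 45},
    3: {"id": 3, "customer": "alice", "amount": 25},
}

def to_columns(table: Dict[int, dict]) -> Dict[str, List]:
    """Build a columnar copy: one list per column, aligned by row position."""
    ids = sorted(table)
    return {col: [table[i][col] for i in ids] for col in ("id", "customer", "amount")}

columns = to_columns(rows)

def point_lookup(order_id: int) -> dict:
    # OLTP-style access: fetch one whole row by key.
    return rows[order_id]

def total_amount() -> int:
    # OLAP-style access: scan a single column and ignore the others.
    return sum(columns["amount"])
```

A real HTAP system must keep the columnar copy in sync as rows change; this toy builds it once and ignores updates.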
Actually, let me ask another thing. Your FAQ mentions you're considering hosting CockroachDB as a drop-in distributed replacement for PostgreSQL [0], and you currently offer a distributed, eventually consistent PostgreSQL replication solution [1].
Is either TiKV [2] (a distributed key-value store) or TiDB [3] (a distributed database with a MySQL interface, built on top of TiKV) on your radar?
You already offer Redis as a key-value store, but TiKV has an amazing property: it ensures strong consistency globally (not eventual consistency). TiDB, being built on top of TiKV, also has strong consistency.
[0] https://fly.io/blog/fly-answers-questions/#q-what-is-fly-doi...
[1] https://fly.io/blog/globally-distributed-postgres/
- For OLTP workloads and some ad hoc queries, you can use TiDB + TiKV. One of our adopters has a production cluster with 300+ TB of data and can easily cope with the spike brought by COVID-19.
- For more complex queries, TiSpark + TiKV might work well; for heavier queries, we added a columnar store, TiFlash; see https://pingcap.com/blog/delivering-real-time-analytics-and-...
First, it’s true that the current setup/deployment of TiDB is not easy. This is something we're making serious moves to improve. For example:
A. We provide Ansible playbooks to simplify deployment and rolling upgrades for on-prem users;
B. We built and open-sourced TiDB Operator (https://github.com/pingcap/tidb-operator) to enable TiDB on Kubernetes. We are working on a fully managed service in the public cloud (coming soon). Whether it is one binary or multiple binaries, it’ll all be transparent at the user level;
C. We are improving the default or self-adaptive parameters and are continuously refining the configuration process;
D. We are also trying to reduce the number of components. For example, the new version of CDC is implemented directly inside TiKV.
E. We are developing TiOps tools as a single binary to improve the experience of operating and maintaining the cluster. A fair number of customers around the world are using TiDB in production, and we make sure they get our help with setup when needed, so it should not be a deal-breaker.
Second, TiDB’s multi-component, highly layered architecture is challenging to deploy, but the benefits are also obvious:
A. The separation of the storage and computing layers makes it flexible and agile to scale/upgrade each layer as needed. Different layers need different types or amounts of hardware resources. If the computing resources become the bottleneck, users can scale the SQL layer by adding more TiDB instances in real time; if the bottleneck is the storage layer, they can easily add more TiKV instances to increase the storage capacity.
B. As many know, we donated TiKV to the CNCF last year. We are fully committed to the open-source community and would like to see TiKV become the building block and foundation of next-generation infrastructure. For example, we are happy to see some community users put Redis on top of TiKV, and we ourselves built the TiSpark (https://github.com/pingcap/tispark) connector to run Apache Spark on TiKV.
For more thoughts on this, please take a look at my blog post: https://pingcap.com/blog/9-whys-to-ask-when-evaluating-a-dis...
Feel free to give more feedback on https://github.com/pingcap/tidb and our community Slack channel https://pingcap.com/tidbslack. We'd be glad to discuss this issue with you further!
Some Chinese startups are adopting open-source databases instead of Oracle and IBM. One of them, PingCAP's TiDB [1], happens to be built by a Chinese startup. And that's it? I was expecting to hear that the cold dead hand of the Chinese Communist Party was forcing this, but no, it seems to be simple capitalism at play -- nobody wants to pay the Oracle tax if they can avoid it.
TiDB (pronunciation: /'taɪdiːbi:/ tai-D-B; etymology: titanium) is an open-source, distributed, scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features infinite horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads.
https://github.com/pingcap/tidb
So yes, I can believe it!
- hybrid transactional and analytical processing (HTAP), coined by Gartner
- hybrid operational and analytical workloads (HOAP), coined by 451 Research
- translytical, coined by Forrester
If that's the solution you want to explore, TiDB (https://github.com/pingcap/tidb), the open-source, distributed, scalable HTAP database, might be able to help you. ETL is no longer necessary with TiDB’s hybrid OLTP/OLAP architecture.
Here is a use case showing how it helps the largest B2C fresh produce online marketplace in China acquire real-time intelligence:
https://www.datanami.com/2018/02/22/hybrid-database-capturin...
Here is a tutorial on how you can try TiDB/TiSpark on your own laptop using Docker Compose: https://www.pingcap.com/blog/how_to_spin_up_an_htap_database...
Disclaimer: I work for TiDB.
If you are open to trying other options, TiDB (https://github.com/pingcap/tidb) is also a good choice. A use case was just published on Datanami today: https://www.datanami.com/2018/02/22/hybrid-database-capturin...
Maybe they should try TiDB (https://github.com/pingcap/tidb). It is a drop-in MySQL replacement that scales.
At least, TiDB, CRDB, RethinkDB are open source :)
https://github.com/pingcap/tidb https://github.com/cockroachdb/cockroach
They provide the features of an RDBMS, such as ACID transactions and SQL, together with the scalability of NoSQL.
1 - https://www.cockroachlabs.com 2 - https://github.com/pingcap/tidb
Have you examined emerging databases like Tarantool (https://tarantool.org/), GunDB (http://gundb.io), TiDB (https://github.com/pingcap/tidb), and ClickHouse (https://clickhouse.yandex/)?
It would be great to read some deep and independent analysis of them too.
This is actually affiliated with https://github.com/pingcap/tidb, which is another implementation of Google Spanner/F1, similar to CockroachDB. So think of TiKV as Spanner, and TiDB as F1.
They actually started out using Go. And the reason they are switching to Rust is not that they want to be Rust cool kids but real constraints: Go's GC introduces too much latency (and no, Go 1.6 is not enough, despite being better), and they are also not satisfied with cgo's costs. That's why they are switching to Rust. Note that, in their own words, they still prefer Go, and TiDB is still happily using Go. But for TiKV's case, Rust is perfect.
Disclaimer: I'm not working on TiKV or TiDB; I'm just a fan of their work (they are also the founders behind Codis: https://github.com/CodisLabs/codis, though they no longer work on it). I'm happy to list references for where I got all this information in case anyone is interested, but they are all in Chinese, which I think might not be that polite since not everyone here reads Chinese.
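The "TiKV as Spanner, TiDB as F1" layering can be illustrated with a simplified sketch of how a SQL layer maps rows and a secondary index onto an ordered key-value store. The key shapes follow the conceptual `t{tableID}_r{rowID}` and `t{tableID}_i{indexID}_{value}` scheme PingCAP has described; the real encoding is binary and memcomparable, and the table/index IDs and helper names below are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

class SortedKV:
    """Toy ordered key-value store standing in for TiKV (hypothetical)."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._data: Dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: str) -> Optional[object]:
        return self._data.get(key)

    def scan_prefix(self, prefix: str) -> List[Tuple[str, object]]:
        # Keys are kept sorted, so all keys sharing a prefix are contiguous.
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            out.append((key, self._data[key]))
        return out

TABLE_ID = 10      # hypothetical table ID for `users`
NAME_INDEX_ID = 1  # hypothetical index ID for users(name)

def row_key(table_id: int, row_id: int) -> str:
    return f"t{table_id}_r{row_id}"

def index_key(table_id: int, index_id: int, value: str, row_id: int) -> str:
    # A non-unique index entry also carries the row ID so duplicate names don't collide.
    # Toy assumption: indexed values contain no '_' (real keys avoid this ambiguity).
    return f"t{table_id}_i{index_id}_{value}_{row_id}"

def insert_user(kv: SortedKV, row_id: int, name: str, age: int) -> None:
    kv.put(row_key(TABLE_ID, row_id), {"name": name, "age": age})
    kv.put(index_key(TABLE_ID, NAME_INDEX_ID, name, row_id), row_id)

def find_by_name(kv: SortedKV, name: str) -> List[dict]:
    # Roughly SELECT * FROM users WHERE name = ?: scan the index, then fetch each row.
    prefix = f"t{TABLE_ID}_i{NAME_INDEX_ID}_{name}_"
    return [kv.get(row_key(TABLE_ID, rid)) for _, rid in kv.scan_prefix(prefix)]
```

Because an ordered store keeps each table's rows and each index's entries in contiguous key ranges, a SQL layer can turn lookups and index scans into range scans over the key-value layer.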