What does HackerNews think of tidb?

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try free: https://tidbcloud.com/free-trial

Language: Go

#2 in Database
#2 in Go
#15 in Hacktoberfest
#1 in MySQL
#1 in Kubernetes
#2 in Serverless
#1 in SQL
PingCAP | https://www.pingcap.com | Database Engineer, Product Manager, Developer Advocate and more | Remote in California | Full-time

We work on a MySQL-compatible distributed database called TiDB https://github.com/pingcap/tidb/ and a key-value store called TiKV.

TiDB is written in Go and TiKV is written in Rust.

More roles and locations are available on https://www.pingcap.com/careers/

OLTP usually comes with a high throughput of transactions, which usually means the ratio of writes (e.g., IUD: insert, update, delete) to reads (e.g., select) is above 4 or 5, or even higher. There are some good benchmarks for testing OLTP workloads, like TPC-C (https://www.tpc.org/tpcc/), and some for testing OLAP workloads, like TPC-H (https://www.tpc.org/tpch/). For mixed or hybrid OLTP and OLAP (called HTAP; see this blog for some background: https://en.pingcap.com/blog/the-beauty-of-htap-tidb-and-allo...), TPC-H was originally designed for this; however, it has several drawbacks and doesn't really reflect real-world workloads. Newer research from UC Berkeley proposed an HTAP benchmark called TAOBench (https://www.vldb.org/pvldb/vol15/p1965-cheng.pdf), which is pretty interesting and worth checking out.

As for HTAP systems, as mentioned in the blog above, there are quite a few industrial products, such as Google's just-announced AlloyDB (https://cloud.google.com/alloydb), Snowflake's UniStore (https://www.snowflake.com/workloads/unistore/), and one of the most popular open source projects, TiDB (https://github.com/pingcap/tidb), which has been deployed by many business applications.

Hopefully these may help a little bit :-)

I very much agree with some of the opinions in this blog. As maintainers of the open source distributed database TiDB https://github.com/pingcap/tidb, we also face the same problem of choice. We have a community version, an enterprise version (of course, we must sell it to our customers to earn money), and a cloud service named TiDB Cloud.

Seven years ago, we started to build TiDB to solve the MySQL sharding problem; yes, "I Don’t Want to Shard (MySQL)" https://news.ycombinator.com/item?id=32041656 too. At that time, we only had a community version of TiDB.

After we had developed TiDB for more than a year, we needed to consider earning money because we (PingCAP) are a company. So we built an enterprise version to sell to our customers. Most other companies separate their community and enterprise versions a lot, and there are even some killer features that exist only in the enterprise version. But we believe that all our users, whether they pay us or not, must get nearly the same value from TiDB. So we decided to keep the community and enterprise versions the same. The only difference is that the enterprise version contains two additional tiny features: audit log and IP whitelist. As far as I know, no community user has asked us for these two features until now.

Things were going well, and a few years later, we met the same dilemma with the TiDB community version and the TiDB Cloud service https://tidb.cloud. Of course, this time we still insist that TiDB must be open source: any improvement for the cloud is contributed to the community version first, and then we deploy it to our own service. This means anyone can deploy TiDB easily on the cloud too.

This is our open source decision. Thanks to this decision, our products are now increasingly well known; you can check the insights at https://ossinsight.io/analyze/pingcap/tidb/.

Yes, hardware has limits, and you just cannot have a single machine with an unlimited number of cores and disks.

But as the article mentioned, NewSQL is already there. DBs in this category have actually been around for a while. I think Spanner's paper was 10 years ago, and it is ubiquitous across Google. So let's just accept that we should think about things differently in 2022. Distributed RDBMS is already a thing, used in production at many companies. Take TiDB (https://github.com/pingcap/tidb), mentioned in the article: Square already uses it to replace MySQL in some use cases (https://www.youtube.com/watch?v=TjqL50qzy3A).

Disclaimer: I now work at https://www.yugabyte.com/yugabytedb/, I joined because I liked their approach and think it has a lot of merit.

But there are also https://www.cockroachlabs.com/product/, https://github.com/pingcap/tidb and https://ydb.tech/, each of which has promise.

There is definitely lots of progress to make in this space. From what I have seen they are all significantly slower on a single node and are not as battle-hardened. But I think with time we will have a few really nice options to pick from.

Gitea is very easy to use, but I find the Activity feature is a little slow.

I tried the "Try Gitea" service and migrated our TiDB repo https://github.com/pingcap/tidb to it. When I clicked the Activity tab and selected the "1 year" period, the page loaded very slowly, taking nearly 90s. I also found that Activity doesn't have a cache: when I selected "1 year" again, the page took nearly the same time to load.

I guess Gitea uses the git command to traverse all the logs for the period every time. Maybe it could use a database to speed this up or, like GitHub, only offer a maximum period of "1 month".

One of the founders of TiDB/TiKV here, from PingCAP (https://pingcap.com).

I have been thinking about this problem with my peers since I started to build TiDB (https://github.com/pingcap/tidb) seven years ago. At that time, nearly all of us were familiar with the Go language, so we decided to use Go to build the SQL layer of TiDB. Thanks to Go, we could develop TiDB very quickly and released the first MVP in half a year. I clearly remember the feeling when we first ran TPC-C successfully; although the tpmC was just 1 at that time, it was a good start for us.

But Go had some problems, e.g. the GC was not good back then, the fair scheduling could cause latency problems, and data races could happen sometimes. So when we decided to build a distributed storage layer (aha, TiKV: https://github.com/tikv/tikv), we wanted to use another language to guarantee safety. I really admire our courage: we chose Rust, which had just released 1.0 and was missing lots of libraries at that time. Now it seems that this was an awesome choice: TiKV has graduated from CNCF and is used as a building block not only for TiDB but also for other distributed systems. Thanks, Rust.

When TiDB started being used in many companies, we found that our customers not only ran lots of online transactions in TiDB but also wanted to run some real-time analytic queries directly, because the data was already in TiDB. So we decided to build an HTAP database by introducing a columnar storage engine beside TiKV: TiFlash (https://github.com/pingcap/tiflash). We built TiFlash based on ClickHouse, so of course we use C++.

As you can see, to build just one integrated database, TiDB, we use at least three languages, and each language was introduced for its own reason. We can treat the distributed database as a service system: each service can be built with your favorite language, and the services are linked by gRPC, as TiDB does now. You may object: “hey, guys, you are building a database, performance is very important”. Yes, this is true, but we are also building a complex distributed system, especially on the cloud. Scale-out, elasticity, and user experience must be important too. This is a trade-off for an engineer :-)

Fair enough. Indeed I didn't consider support costs. Thank you for your answer!

Actually let me ask another thing. Your FAQ mentions you're considering hosting CockroachDB as a drop-in distributed replacement for PostgreSQL [0], and also you currently offer a distributed, eventually consistent PostgreSQL replication solution [1].

Is either TiKV [2] (distributed key-value store) or TiDB [3] (distributed database with a MySQL interface, built on top of TiKV) on your radar?

You already offer Redis as a key-value store, but TiKV has an amazing property: it ensures strong consistency globally (not eventual consistency). TiDB, being built on top of TiKV, also has strong consistency.

[0] https://fly.io/blog/fly-answers-questions/#q-what-is-fly-doi...

[1] https://fly.io/blog/globally-distributed-postgres/

[2] https://github.com/tikv/tikv

[3] https://github.com/pingcap/tidb

Another Chinese database seems to be doing a better job at this front: https://github.com/pingcap/tidb
Shameless plug: You might be interested in TiDB (https://github.com/pingcap/tidb), an open-source distributed HTAP database.

- For OLTP workloads and some ad-hoc queries, you can use TiDB + TiKV. One of our adopters has a production cluster with 300+ TB of data and can easily cope with the spike brought by COVID-19.

- For more complex queries, TiSpark + TiKV might work well; for heavier queries, we added a columnar store, TiFlash, see https://pingcap.com/blog/delivering-real-time-analytics-and-...

PingCAP CTO here, thanks for these comments! We highly appreciate all the feedback!

First, it’s true that the current setup/deployment of TiDB is not easy. This is something that we're making serious moves to improve. For example,

A. We provide Ansible playbooks to simplify the deployment and rolling upgrade for on-prem users;

B. We built and open-sourced TiDB Operator (https://github.com/pingcap/tidb-operator) to enable TiDB on Kubernetes. We are working on a fully managed service in the public cloud (coming soon). Whether it is one binary or multiple binaries, it’ll be all transparent at the user level;

C. We are improving the default or self-adaptive parameters and are continuously refining the configuration process;

D. We are also trying to reduce the number of components. For example, the new version of CDC is implemented directly inside TiKV.

E. We are developing TiOps tools in a single binary to improve the experience of operating and maintaining the cluster. A fair number of customers around the world are using TiDB in their production environments, and we are making sure they get our help with the setup when needed, so it would not be a deal-breaker.

Second, TiDB’s multiple-component or highly-layered architecture is challenging for deployment but the benefits are also obvious:

A. The separation of the storage and computing layers makes it flexible and agile to scale/upgrade each layer as needed. Different layers need different types or different amounts of hardware resources. If the computing resources become the bottleneck, users can scale the SQL layer by adding more TiDB instances in real time; if the bottleneck is the storage layer, they can easily add more TiKV instances to increase the storage capacity.

B. As many know, we donated TiKV to CNCF last year. We are fully committed to the open-source community and would like to see TiKV be the building block and foundation of the next-generation infrastructure. For example, we are happy to see some community users put Redis on top of TiKV, and we ourselves built the TiSpark (https://github.com/pingcap/tispark) connector to run Apache Spark on TiKV.

For more thoughts about this, please take a look at my blog: https://pingcap.com/blog/9-whys-to-ask-when-evaluating-a-dis...

Feel free to give more feedback on https://github.com/pingcap/tidb and our community Slack channel https://pingcap.com/tidbslack. We're glad to discuss more with you on this issue!

TiDB is Apache 2.0 licensed, but it is MySQL compatible and features horizontal scalability. https://github.com/pingcap/tidb
Shameless self-plug: If you are looking for an open-source scale-out MySQL that supports both multiple writes and multiple reads, feel free to give TiDB (https://github.com/pingcap/tidb) a try.
This article is... not great journalism.

Some Chinese startups are adopting open-source databases instead of Oracle and IBM. One of them, PingCAP's TiDB [1], happens to be built by a Chinese startup. And that's it? I was expecting to hear that the cold dead hand of the Chinese Communist Party was forcing this, but no, it seems to be simple capitalism at play -- nobody wants to pay the Oracle tax if they can avoid it.

[1] https://github.com/pingcap/tidb

Somehow I've never heard of this before:

TiDB (The pronunciation is: /'taɪdiːbi:/ tai-D-B, etymology: titanium) is an open-source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features infinite horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads.

https://github.com/pingcap/tidb

So many implementations of this out there already, like tidb.

https://github.com/pingcap/tidb

So yes i can believe it!

TiDB is a distributed HTAP database compatible with the MySQL protocol(https://github.com/pingcap/tidb)
If I understand it correctly, the "business intelligence stack" you are looking for is something that bridges the gap between online transactional processing (OLTP) and online analytical processing (OLAP). If that's the case, then some new jargon might help you:

- hybrid transactional and analytical processing (HTAP), coined by Gartner

- hybrid operational and analytical workloads (HOAP), by 451 Research

- Translytical, by Forrester

If that's the solution you want to explore, TiDB (https://github.com/pingcap/tidb), the open source distributed scalable HTAP database, might be able to help you. ETL is no longer necessary with TiDB’s hybrid OLTP/OLAP architecture.

Here is a use case about how it helps the largest B2C fresh produce online marketplace in China to acquire real-time intelligence:

https://www.datanami.com/2018/02/22/hybrid-database-capturin...

Here is a tutorial about how you can try TiDB/TiSpark on your own laptop using Docker Compose: https://www.pingcap.com/blog/how_to_spin_up_an_htap_database...

Disclaimer: I work for TiDB.

This is a fairly good point. Personally, I believe this also opens a debate about open source software vs. proprietary software. Customers of proprietary software companies usually have minimal influence on the priorities and timelines of where and when the product is going. Not to mention the opacity, which makes it impossible to modify the code or debug it effectively. Most of the time when an issue occurs, you are faced with unclear error codes, messages, or documents. If there is no existing workaround or patch, you have to wait until your issue climbs to the top of the vendor's priority list. Whereas in the open source community, many eyes are examining the source code and many hands (including your own) are ready to jump in at any time, making it more likely that bugs are exposed and quicker to get a fix or workaround.

If you are open to trying other options, TiDB (https://github.com/pingcap/tidb) is also a good choice. A use case was just published on Datanami today: https://www.datanami.com/2018/02/22/hybrid-database-capturin...

TiDB seems to have a pretty active community, judging by its repo (https://github.com/pingcap/tidb). Also, saw this thread on TiDB v MySQL recently that's pretty detailed (https://www.quora.com/How-does-TiDB-compare-with-MySQL). Looks like a good option worth trying out.
>Of the list, AngularJS and MySQL have been the only ones to give us scaling problems. Our monolithic AngularJS code-bundle has got too big and the initial download takes quite a while and the application is a bit too slow. MySQL (in RDS) crashes and restarts due to growing BI query complexity and it’s been hard to fix this.

Maybe they should try TiDB(https://github.com/pingcap/tidb). It is a MySQL drop-in replacement that scales.

Another good candidate is TiDB (https://github.com/pingcap/tidb). It has elastic scalability, ACID compliance, high availability, etc.

At least, TiDB, CRDB, RethinkDB are open source :)

TiDB and CockroachDB also support distributed transactions. The transaction model in TiDB (https://github.com/pingcap/tidb) is inspired by Google Percolator. It’s mainly a two-phase commit protocol with some optimizations. More info on the blog: https://github.com/pingcap/blog/blob/master/_posts/2016-10-1...
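
For readers unfamiliar with the idea, here is a toy, single-process Go sketch of a Percolator-style two-phase commit: prewrite locks every key and stages its value, then the commit phase takes a commit timestamp and makes the values visible. It is only an illustration under heavy assumptions, not TiDB's or TiKV's actual code. All names (`Store`, `Write`, `prewrite`, `Commit`) are hypothetical, the timestamp oracle is a plain counter, and it leaves out crash recovery, lock resolution via the primary key, and readers waiting on locks.

```go
package main

import "errors"

// Write is one key/value pair buffered by a transaction.
type Write struct{ Key, Value string }

// version is a committed value, visible to readers at or after commitTS.
type version struct {
	commitTS uint64
	value    string
}

// lock marks a key as claimed by an in-flight transaction.
type lock struct {
	startTS uint64
	primary string // key whose commit would decide the whole transaction
	value   string // staged value, invisible until commit
}

// Store stands in for the distributed key-value layer, in one process.
type Store struct {
	ts       uint64 // stand-in for a timestamp oracle
	locks    map[string]lock
	versions map[string][]version
}

// ErrConflict is returned when prewrite detects a competing writer.
var ErrConflict = errors.New("write conflict")

func NewStore() *Store {
	return &Store{locks: map[string]lock{}, versions: map[string][]version{}}
}

// Begin hands out a start timestamp that defines the snapshot.
func (s *Store) Begin() uint64 { s.ts++; return s.ts }

// prewrite is phase one for a single key: check for conflicts, then lock.
func (s *Store) prewrite(w Write, primary string, startTS uint64) error {
	if _, locked := s.locks[w.Key]; locked {
		return ErrConflict // another transaction holds the key
	}
	if vs := s.versions[w.Key]; len(vs) > 0 && vs[len(vs)-1].commitTS > startTS {
		return ErrConflict // a newer commit landed after our snapshot
	}
	s.locks[w.Key] = lock{startTS: startTS, primary: primary, value: w.Value}
	return nil
}

// Commit runs both phases for a set of writes with distinct keys.
func (s *Store) Commit(startTS uint64, writes []Write) error {
	if len(writes) == 0 {
		return nil
	}
	primary := writes[0].Key
	for i, w := range writes {
		if err := s.prewrite(w, primary, startTS); err != nil {
			for _, done := range writes[:i] {
				delete(s.locks, done.Key) // roll back our own locks
			}
			return err
		}
	}
	commitTS := s.Begin() // phase two starts by taking a commit timestamp
	for _, w := range writes { // the primary is writes[0], so it commits first
		l := s.locks[w.Key]
		s.versions[w.Key] = append(s.versions[w.Key], version{commitTS, l.value})
		delete(s.locks, w.Key)
	}
	return nil
}

// Get reads the newest version committed at or before ts.
func (s *Store) Get(key string, ts uint64) (string, bool) {
	vs := s.versions[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].commitTS <= ts {
			return vs[i].value, true
		}
	}
	return "", false
}
```

In this sketch, a transaction that started before another one committed to the same key gets `ErrConflict` at prewrite and leaves no locks behind, which is the optimistic, snapshot-based behavior the protocol is built around.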
Wow! Congratulations! For those who want to check it out: https://github.com/pingcap/tidb
I have to bring this up again: NewSQL. There are quite a few new choices out there: Google Spanner, CockroachDB (https://github.com/cockroachdb/cockroach), TiDB (https://github.com/pingcap/tidb). All of them are very easy to scale while maintaining ACID transactions.
There are also quite some NewSQL solutions out there. TiDB, CockroachDB, Spanner, just to name a few. Besides, the first two are open source...

https://github.com/pingcap/tidb https://github.com/cockroachdb/cockroach

They provide both the features of an RDBMS, such as ACID transactions and SQL, and the scalability of NoSQL.

I'd be interested in any comparisons of CockroachDB vs TiDB[1], especially when it comes to speed.

1 - https://github.com/pingcap/tidb

You may be interested in CockroachDB[1] and TiDB[2], which are open-source NewSQL databases inspired by Spanner and F1.

1 - https://www.cockroachlabs.com 2 - https://github.com/pingcap/tidb

Very thoughtful notes, thanks. Waiting for your full blog posts.

Have you examined emerging databases like Tarantool https://tarantool.org/, GunDB http://gundb.io, TiDB https://github.com/pingcap/tidb, ClickHouse https://clickhouse.yandex/ ?

It would be great to read some deep and independent analysis of them too.

You can give TiDB a try. TiDB is a NewSQL database inspired by Google Spanner and F1. It supports the best features of both traditional RDBMS and NoSQL. Check it out here: https://github.com/pingcap/tidb
I know the authors of this project also read Hacker News, so it is probably better if we wait for them to state their motives. But I suppose I could give a brief summary first, since most of the confused comments are due to language barriers (sorry, if you don't know enough Chinese, you might not have read enough about this project).

This is actually affiliated with https://github.com/pingcap/tidb, which is another implementation of Google Spanner/F1, similar to CockroachDB. So think of TiKV as Spanner, and TiDB as F1.

They actually started out using Go. And the reason they are changing to Rust is not to be Rust cool kids, but real constraints: Go's GC introduces too much latency (and no, Go 1.6 is not enough despite being better), and they are also not satisfied with cgo's costs. That's why they are changing to Rust. Note that, in their own words, they still love Go better, and TiDB is also happily using Go. But for TiKV's case, Rust is perfect.

Disclaimer: I'm not working on TiKV or TiDB, I'm just a fan of their work (they are also the founders behind Codis: https://github.com/CodisLabs/codis, though not working on that anymore). I'm happy to list references for where I got all the information in case anyone is interested, but they are all in Chinese, which I think might not be that polite since not everyone here reads Chinese.
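
If you want a rough feel for the GC cost being discussed, here is a small Go sketch that reads the runtime's own counters around a workload. It only uses `runtime.ReadMemStats` and `runtime.GC` from the standard library; `MeasureGCPause`, `GCPauseReport`, and `allocateGarbage` are hypothetical helper names. It reports cumulative stop-the-world pause time, which is only one part of GC-related latency, so treat it as a starting point and not as how the TiKV authors measured anything.

```go
package main

import (
	"runtime"
	"time"
)

// GCPauseReport summarizes GC activity observed while a workload ran.
type GCPauseReport struct {
	Cycles     uint32        // completed GC cycles during the workload
	TotalPause time.Duration // cumulative stop-the-world pause time
}

// MeasureGCPause runs work and reports the GC cycles and pause time that
// accrued while it ran, based on the counters in runtime.MemStats.
func MeasureGCPause(work func()) GCPauseReport {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return GCPauseReport{
		Cycles:     after.NumGC - before.NumGC,
		TotalPause: time.Duration(after.PauseTotalNs - before.PauseTotalNs),
	}
}

// allocateGarbage is a hypothetical workload that creates many short-lived
// heap objects and then forces one collection so the report is never empty.
func allocateGarbage(n int) {
	var sink [][]byte
	for i := 0; i < n; i++ {
		sink = append(sink, make([]byte, 1024))
		if len(sink) > 1000 {
			sink = nil // drop references so the buffers become garbage
		}
	}
	runtime.GC()
}
```

Calling `MeasureGCPause(func() { allocateGarbage(100000) })` gives a report you can print or compare across Go versions; the actual numbers depend heavily on the machine, heap size, and Go release.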