What does HackerNews think of questdb?

An open source time-series database for fast ingest and SQL queries

Language: Java

#9 in C++
#21 in Database
#93 in Hacktoberfest
#16 in Java
#18 in PostgreSQL
#22 in PostgreSQL
#18 in SQL
QuestDB | Python/cloud engineers, UX Designer, Solution engineer, Core Database Engineers | 100% Remote |

QuestDB (YC S20) is building the fastest open source time series database. We hire talented and passionate people who share our mission to empower developers to solve their problems with data.

https://questdb.io/careers https://github.com/questdb/questdb

QuestDB | Backend engineer (Python) | Remote | https://github.com/questdb/questdb

We help developers handle explosive amounts of data while getting them started in just a few minutes with the fastest and most accessible open source time series database.

We're hiring a Python backend developer to build a SaaS solution from the ground up.

Link: https://questdb.io/careers/senior-backend-engineer-python/

QuestDB | remote | https://github.com/questdb/questdb

We help developers handle explosive amounts of data while getting them started in just a few minutes with the fastest and most accessible open source time series database.

We're hiring across engineering, developer relations, customer success and talent acquisition.

Link: https://questdb.io/careers/

QuestDB (YC S20) is an open source time-series database. We're growing the team and hiring core database engineers (low-latency Java/C++), cloud engineers, front-end engineers and developer relations engineers. https://github.com/questdb/questdb

Here is one: https://github.com/questdb/questdb. Disclaimer: I work on this project. The main reason we use Java is speed of development (which increased with the number of base libraries written) and ease of testing.

Author here.

A few weeks ago, we wrote about how we implemented SIMD instructions to aggregate a billion rows in milliseconds [1] thanks in great part to Agner Fog’s VCL library [2]. Although the initial scope was limited to table-wide aggregates into a unique scalar value, this was a first step towards very promising results on more complex aggregations. With the latest release of QuestDB, we are extending this level of performance to key-based aggregations.

To do this, we implemented Google’s fast hash table aka “Swisstable” [3] which can be found in the Abseil library [4]. In all modesty, we also found room to slightly accelerate it for our use case. Our version of Swisstable is dubbed “rosti”, after the traditional Swiss dish [5]. There were also a number of improvements thanks to techniques suggested by the community such as prefetch (which interestingly turned out to have no effect in the map code itself) [6]. Besides C++, we used our very own queue system written in Java to parallelise the execution [7].
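
As a rough illustration of the keyed aggregation described above, the Java sketch below sums values per key with a small open-addressing hash table and splits the rows across threads. It is not rosti or Swisstable: it uses plain linear probing rather than Swisstable's control bytes, and a standard ExecutorService stands in for QuestDB's own queue. All names (KeyedSumSketch, IntDoubleTable, sumByKey) are hypothetical, and the table assumes its capacity is a power of two larger than the number of distinct keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: a linear-probing table, not rosti or Swisstable.
final class KeyedSumSketch {

    // Open-addressing map from int key to double sum, backed by primitive arrays.
    // Assumption: capacity is a power of two and exceeds the number of distinct keys
    // (there is no resize, so a full table would never terminate its probe loop).
    static final class IntDoubleTable {
        private final int[] keys;
        private final double[] sums;
        private final boolean[] used;
        private final int mask;

        IntDoubleTable(int capacityPowerOfTwo) {
            keys = new int[capacityPowerOfTwo];
            sums = new double[capacityPowerOfTwo];
            used = new boolean[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        private int hash(int key) {
            // Multiplicative hash to spread keys over the slots.
            return ((key * 0x9E3779B9) >>> 7) & mask;
        }

        void add(int key, double value) {
            int slot = hash(key);
            while (used[slot] && keys[slot] != key) {
                slot = (slot + 1) & mask; // linear probing
            }
            used[slot] = true;
            keys[slot] = key;
            sums[slot] += value;
        }

        double get(int key) {
            int slot = hash(key);
            while (used[slot]) {
                if (keys[slot] == key) {
                    return sums[slot];
                }
                slot = (slot + 1) & mask;
            }
            return 0.0; // key absent
        }

        void mergeFrom(IntDoubleTable other) {
            for (int i = 0; i < other.keys.length; i++) {
                if (other.used[i]) {
                    add(other.keys[i], other.sums[i]);
                }
            }
        }
    }

    // Each thread aggregates its own slice into a private table; the partial
    // tables are merged at the end, so the hot loop needs no locking.
    static IntDoubleTable sumByKey(int[] keys, double[] values, int threads, int capacity) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<IntDoubleTable>> parts = new ArrayList<>();
            int chunk = (keys.length + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(keys.length, t * chunk);
                final int to = Math.min(keys.length, from + chunk);
                parts.add(pool.submit(() -> {
                    IntDoubleTable local = new IntDoubleTable(capacity);
                    for (int i = from; i < to; i++) {
                        local.add(keys[i], values[i]);
                    }
                    return local;
                }));
            }
            IntDoubleTable result = new IntDoubleTable(capacity);
            for (Future<IntDoubleTable> part : parts) {
                result.mergeFrom(part.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 1, 3, 2, 1};
        double[] values = {10, 20, 30, 40, 50, 60};
        IntDoubleTable totals = sumByKey(keys, values, 2, 16);
        System.out.println(totals.get(1) + " " + totals.get(2) + " " + totals.get(3));
    }
}
```

Giving each thread a private table and merging afterwards is one common way to avoid contention. Whether QuestDB merges its partial results this way is not stated in the post.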

The results are remarkable: millisecond latency on keyed aggregations that span over billions of rows.

We thought it could be a good occasion to show our progress by making this latest release available to try online with a pre-loaded dataset. It runs on an AWS instance using 23 threads. The data is stored on disk and includes a 1.6 billion row NYC taxi dataset, 10 years of weather data with around 30-minute resolution and weekly gas prices over the last decade. The instance is located in London, so folks outside of Europe may experience different network latencies. The server-side time is reported as “Execute”.

We provide sample queries to get started, but you are encouraged to modify them. However, please be aware that not every type of query is fast yet. Some are still running under an old single-threaded model. If you find one of these, you’ll know: it will take minutes instead of milliseconds. But bear with us: it is just a matter of time before we make these instantaneous as well. Next in our crosshairs is time-bucket aggregations using the SAMPLE BY clause.

If you are interested in checking out how we did this, our code is available open-source [8]. We look forward to receiving your feedback on our work so far. Even better, we would love to hear more ideas to further improve performance. Even after decades in high performance computing, we are still learning something new every day.

[1] https://questdb.io/blog/2020/04/02/using-simd-to-aggregate-b...

[2] https://www.agner.org/optimize/vectorclass.pdf

[3] https://www.youtube.com/watch?v=ncHmEUmJZf4

[4] https://github.com/abseil/abseil-cpp

[5] https://github.com/questdb/questdb/blob/master/core/src/main...

[6] https://github.com/questdb/questdb/blob/master/core/src/main...

[7] https://questdb.io/blog/2020/03/15/interthread

[8] https://github.com/questdb/questdb

One step beyond good Java GCs is to write fully zero-GC Java code. The advantage is complete control over performance, which means your software will be consistently fast. The disadvantage is that it is relatively difficult to achieve.
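
To make "zero-GC" a little more concrete, here is a minimal sketch of the general idea: allocate everything once up front, then keep the hot path free of `new`, boxing and iterators. It is not taken from QuestDB, and the names (ZeroGcSketch, DoubleRing) are hypothetical.

```java
// Hypothetical sketch of allocation-free steady-state code; not taken from QuestDB.
final class ZeroGcSketch {

    // Fixed-capacity ring buffer of primitive doubles, allocated once up front.
    static final class DoubleRing {
        private final double[] slots;
        private int head;   // index of the next slot to overwrite
        private int size;   // number of valid values, at most slots.length
        private double sum; // running sum of the valid values

        DoubleRing(int capacity) {
            slots = new double[capacity];
        }

        // Hot path: no 'new', no boxing, no iterator, so it creates no garbage.
        void push(double value) {
            if (size == slots.length) {
                sum -= slots[head]; // evict the oldest value
            } else {
                size++;
            }
            slots[head] = value;
            sum += value;
            head = (head + 1) % slots.length;
        }

        double mean() {
            return size == 0 ? Double.NaN : sum / size;
        }
    }

    public static void main(String[] args) {
        DoubleRing ring = new DoubleRing(3); // the only allocation for the buffer
        for (int i = 1; i <= 5; i++) {
            ring.push(i);
        }
        // Printing allocates a String, but it sits outside the hot path.
        System.out.println(ring.mean());
    }
}
```

The same loop written with a `List<Double>` would box every value and leave garbage for the collector, which is the kind of cost this style tries to avoid.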

If you want to see an example of fully zero-GC Java, you can check out QuestDB on GitHub [1]. Disclaimer: I work for QuestDB.

[1] https://github.com/questdb/questdb

I found this curious regarding QuestDB [1]:

> Java 8 64-bit. We recommend Oracle Java 8, but OpenJDK8 will also work (although a little slower).

Anyone have an idea why?

[1] https://github.com/questdb/questdb

Hi, QuestDB's author here, thanks for posting! I wanted to post this on Show HN, but someone beat me to it!

We are an open source (Apache 2.0) time-series database, programmed in zero-GC Java. You can find us on GitHub https://github.com/questdb/questdb. We would like to get your feedback.