It looks like a competent and thoughtful implementation but, as best I can determine and without taking anything away from it, it uses an old design. Performance and scalability are throttled by the use of secondary indexing structures. You would need some pretty expensive hardware for the performance cliffs not to be immediately evident.

I don’t do a lot of work on graph databases these days, but I’ve seen state-of-the-art implementations do 10x this many inserts/sec/server on EC2 VMs where the local data model size was 100x the available RAM. And in principle these architectures could easily scale out. Indexing structure and storage engine design figure prominently; both usually need to be purpose-built.

Excellent points. To add, and I’m not taking away from the technical effort: the use of “native graph” is rather misleading. Existing computer architecture exposes memory as a one-dimensional, sequentially addressed space, so it cannot lay out a general graph for purely sequential access. Any representation has to make assumptions and use random memory access patterns. A longer discussion here (obvious bias aside, the author no longer works at DSE): https://www.datastax.com/blog/2013/11/letter-regarding-nativ...
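To make the memory-layout point concrete, here is a minimal sketch of a compressed sparse row (CSR) adjacency layout, a common way to flatten a graph into one-dimensional arrays. It is not taken from any of the systems discussed; `build_csr` and `neighbors` are hypothetical names. One vertex's neighbors sit in a contiguous slice, but following an edge to the next vertex means jumping to an unrelated offset, which is the random access described above.

```python
def build_csr(num_vertices, edges):
    """Flatten (src, dst) pairs into CSR arrays: offsets and targets."""
    # offsets[v + 1] first counts the out-degree of v ...
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # ... then a prefix sum turns counts into slice boundaries.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]

    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies: next free slot per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, v):
    """Neighbors of v are one contiguous slice of targets."""
    return targets[offsets[v]:offsets[v + 1]]


offsets, targets = build_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
one_hop = neighbors(offsets, targets, 0)  # contiguous read
# A second hop jumps to a different offset for every vertex in one_hop.
two_hop = [neighbors(offsets, targets, u) for u in one_hop]
```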

shermanye

Thanks for the reply. "Native graph" here means the system (including the storage and query engine) is designed around the graph data structure. The opposite of "native graph" is usually called a "multi-model" database. In those systems, the storage is designed around tables or some other data structure, and a graph query interface is layered on top to simulate a graph query engine. Behind the scenes, they are still running SQL (or whatever) queries.

In Nebula, data is stored in such a way that getting all neighbors of a vertex is actually a sequential read.
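One way this kind of layout can work, as a rough sketch rather than Nebula's actual storage format: keep edges in a sorted key-value store with the source vertex as the key prefix, so all out-edges of a vertex are adjacent keys and a neighbor lookup becomes a single range scan. The `SortedEdgeStore` class and its `(src, edge_type, dst)` key layout are assumptions made for illustration.

```python
import bisect


class SortedEdgeStore:
    """Toy sorted key-value store; keys are (src, edge_type, dst) tuples."""

    def __init__(self):
        self._keys = []  # kept in sorted order at all times

    def add_edge(self, src, edge_type, dst):
        key = (src, edge_type, dst)
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def neighbors(self, src):
        # (src,) sorts before every (src, edge_type, dst), so this finds
        # the first key with that prefix; then scan forward until it ends.
        start = bisect.bisect_left(self._keys, (src,))
        result = []
        for key in self._keys[start:]:
            if key[0] != src:
                break
            result.append(key[2])
        return result


store = SortedEdgeStore()
store.add_edge(1, "follows", 3)
store.add_edge(2, "follows", 1)
store.add_edge(1, "follows", 2)
store.add_edge(1, "likes", 5)
followed_by_1 = store.neighbors(1)  # one contiguous scan over sorted keys
```

In a distributed setting the same idea would presumably apply per partition: if edges are partitioned by source vertex, the scan for one vertex stays on one machine.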

rajman187

I see, thanks for the clarification. Can you expand on that a bit more? Is this some sort of index-free adjacency then? I still don't understand how the neighbours can be stored sequentially in memory, especially if this is a distributed system.

ddorian43

dgraph.io uses posting lists.
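For anyone unfamiliar with the term, a posting list in the information-retrieval sense is a sorted list of ids stored under one key. Below is a minimal sketch, assuming lists keyed by (subject, predicate); the key layout and the `add`/`intersect` helpers are made up for illustration and are not Dgraph's API. Keeping each list sorted makes "common neighbors" a linear merge.

```python
import bisect
from collections import defaultdict

# (subject, predicate) -> sorted list of object ids
posting = defaultdict(list)


def add(subject, predicate, obj):
    """Insert obj into the posting list, keeping it sorted and unique."""
    plist = posting[(subject, predicate)]
    i = bisect.bisect_left(plist, obj)
    if i == len(plist) or plist[i] != obj:
        plist.insert(i, obj)


def intersect(a, b):
    """Merge-style intersection of two sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


for obj in (4, 2, 7):
    add(1, "friend", obj)
for obj in (7, 2, 9):
    add(3, "friend", obj)

mutual = intersect(posting[(1, "friend")], posting[(3, "friend")])
```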

I think the state of the art is https://github.com/GraphBLAS.

Examples: https://github.com/michelp/pggraphblas & https://github.com/RedisGraph/RedisGraph/
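The core idea behind GraphBLAS is to express graph algorithms as sparse linear algebra over semirings. Here is a plain-Python sketch of breadth-first search written in that style. It does not use the GraphBLAS API; `bfs_levels` and the dict-of-sets matrix are stand-ins chosen for illustration. Each step is a vector-matrix product over the boolean (OR, AND) semiring, followed by masking out vertices already visited.

```python
def bfs_levels(adj, source):
    """adj maps a row index to the set of column indices holding a True entry."""
    levels = {source: 0}
    frontier = {source}  # sparse boolean vector
    depth = 0
    while frontier:
        depth += 1
        # frontier * adj over (OR, AND): union the rows selected by frontier.
        reached = set()
        for u in frontier:
            reached |= adj.get(u, set())
        # Mask: keep only vertices that have no level assigned yet.
        frontier = {v for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


adj = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}
levels = bfs_levels(adj, 0)
```

A real implementation would store the matrix in a compressed sparse format and run the product in optimized kernels; the sketch only shows the shape of the computation.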