This looks really interesting and could make GraphBLAS much more accessible for exploratory work. A few questions for the author:

Does this work in blocking or non-blocking mode? Naively I imagine there might be more opportunity for the GraphBLAS implementation to optimize execution in non-blocking mode.

Is there a way to efficiently store and load matrices to and from files, ideally in such a way that the data is just mmap'ed or copied directly into memory on load?

Does this only work with SuiteSparse or could it potentially work with a GPU implementation like https://github.com/gunrock/graphblast too?
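
To make the storage question above concrete, here is a minimal sketch of what "mmap'ed on load" could look like for a CSR matrix, using only the Python standard library. The file layout (a 24-byte header followed by three raw arrays) and the names `save_csr`, `load_csr`, and `row_entries` are assumptions made up for this illustration; this is not a GraphBLAS or SuiteSparse file format.

```python
import mmap
import struct
from array import array

# Hypothetical layout for this sketch: nrows, ncols, nnz as native-endian int64,
# then indptr (int64), indices (int64), values (float64), back to back.
HEADER = struct.Struct("=3q")


def save_csr(path, nrows, ncols, indptr, indices, values):
    """Write a CSR matrix as a fixed header followed by three raw arrays."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(nrows, ncols, len(values)))
        array("q", indptr).tofile(f)    # row pointers, nrows + 1 entries
        array("q", indices).tofile(f)   # column index of each stored value
        array("d", values).tofile(f)    # the stored values themselves


def load_csr(path):
    """Map the file read-only and return typed views backed by the mapping."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    nrows, ncols, nnz = HEADER.unpack_from(mm, 0)
    buf = memoryview(mm)
    a = HEADER.size                 # start of indptr
    b = a + 8 * (nrows + 1)         # start of indices
    c = b + 8 * nnz                 # start of values
    indptr = buf[a:b].cast("q")
    indices = buf[b:c].cast("q")
    values = buf[c:c + 8 * nnz].cast("d")
    # The views stay valid only while mm is open, so mm is returned as well.
    return mm, (nrows, ncols), indptr, indices, values


def row_entries(indptr, indices, values, i):
    """Return the (column, value) pairs stored in row i."""
    return [(indices[k], values[k]) for k in range(indptr[i], indptr[i + 1])]
```

With this layout nothing is parsed at load time: the arrays are read from the mapped pages only when a row is touched. The views must be released before the mapping is closed.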

espeed

The GraphBLAS GPU code is in the works [1]. For storage, see the new RedisGraph 1.0 implementation released last year [2]; 'michelp also has a Python Postgres implementation in development [3].

[1] Sparse versus dense in GraphBLAS: sometimes dense is better http://aldenmath.com/sparse-verses-dense-in-graphblas-someti...

[2] RedisGraph: A Graph Database Module for Redis http://graphblas.org/?title=Graph_BLAS_Forum#Graph_analysis_...

[3] Graph Processing with Postgres and GraphBLAS https://news.ycombinator.com/item?id=19379800

laurencerowe

For the bioinformatics datasets I work with, it is not cost-efficient to load everything into a database. Some of these datasets (e.g. GWAS, Genome-Wide Association Studies, which are essentially sparse matrices) might be interesting to explore with graph queries. I guess my ideal would be a GraphBLAS equivalent of Spark SQL queries, working across files in cloud storage / NAS.
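
As a rough illustration of what a graph query over a sparse matrix means in the GraphBLAS style, the sketch below runs a breadth-first search where each step is a vector-times-matrix product over the boolean (OR, AND) semiring, masked by the vertices already visited. It is plain Python over a dict of sets, not the GraphBLAS API; the function name `bfs_levels` and the adjacency representation are assumptions made for this example.

```python
def bfs_levels(adj, source):
    """Breadth-first search over a sparse boolean matrix.

    adj maps a row index to the set of column indices with a stored entry.
    Returns a dict mapping each reachable vertex to its BFS level.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # frontier x adj over the (OR, AND) semiring:
        # the union of the rows selected by the current frontier.
        reached = set()
        for u in frontier:
            reached |= adj.get(u, set())
        # Complemented mask: drop vertices that already have a level.
        frontier = reached.difference(levels)
        for v in frontier:
            levels[v] = depth
    return levels
```

A real GraphBLAS implementation would express the inner loop as a single masked matrix-vector operation and leave the scheduling to the library.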

espeed

There are several teams working on distributed GraphBLAS (the GraphBLAS/D4M model was designed to run on supercomputers [0]). Kepner's team at MIT is one [1].

NB: D4M was the original name before it was changed to GraphBLAS and became a standard.

[0] GraphBLAS: Building Blocks For High Performance Graph Analytics https://crd.lbl.gov/news-and-publications/news/2017/graphbla...

[1] A Billion Updates per Second Using 30,000 Hierarchical In-Memory D4M Databases https://arxiv.org/abs/1902.00846

http://www.mit.edu/~kepner/

sandGorgon

This is very cool. I wonder if the GraphBLAS and Dask teams should collaborate.

Dask has a production-grade distributed computing system that is cloud-compatible (Kubernetes, YARN, EMR, Dataproc, etc.).

lmeyerov

RAPIDS has picked up Dask for the multi-GPU aspects of cudf (think Spark/pandas on GPUs). Since cugraph (https://github.com/rapidsai/cugraph) is single-GPU, for going fast on ~billion-row datasets, I'm guessing dask+cugraph will be happening for the next 100-1000X, if not already.

Graph partitioning is a weird world, so it will be interesting to see!