This looks really interesting and could make GraphBLAS much more accessible for exploratory work. A few questions for the author:
Does this work in blocking or non-blocking mode? Naively I imagine there might be more opportunity for the GraphBLAS implementation to optimize execution in non-blocking mode.
Is there a way to efficiently store and load matrices to and from files? Ideally in such a way that the data is just mmap'ed or copied directly into memory on load?
Does this only work with SuiteSparse or could it potentially work with a GPU implementation like https://github.com/gunrock/graphblast too?
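To make the mmap question concrete, here is a minimal sketch of what I have in mind. It is not something this library or SuiteSparse provides as far as I know: the file layout (a 24-byte header followed by raw CSR arrays in native byte order) and the names `save_csr`, `load_csr` and `matvec` are made up for illustration, and it uses only the Python standard library.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: nrows, ncols, nnz as native-endian int64,
# then indptr (int64), indices (int64), values (float64), back to back.
HEADER = struct.Struct("=qqq")


def save_csr(path, nrows, ncols, indptr, indices, values):
    """Write a CSR matrix as a fixed header followed by three raw arrays."""
    nnz = len(values)
    with open(path, "wb") as f:
        f.write(HEADER.pack(nrows, ncols, nnz))
        f.write(struct.pack(f"={nrows + 1}q", *indptr))
        f.write(struct.pack(f"={nnz}q", *indices))
        f.write(struct.pack(f"={nnz}d", *values))


def load_csr(path):
    """Map the file and return typed views over it; the arrays are not copied."""
    with open(path, "rb") as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    nrows, ncols, nnz = HEADER.unpack_from(buf, 0)
    view = memoryview(buf)
    off = HEADER.size
    indptr = view[off:off + 8 * (nrows + 1)].cast("q")
    off += 8 * (nrows + 1)
    indices = view[off:off + 8 * nnz].cast("q")
    off += 8 * nnz
    values = view[off:off + 8 * nnz].cast("d")
    return nrows, ncols, indptr, indices, values


def matvec(csr, x):
    """Compute y = A x by reading straight from the mapped arrays."""
    nrows, _, indptr, indices, values = csr
    y = [0.0] * nrows
    for i in range(nrows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += values[k] * x[indices[k]]
    return y


if __name__ == "__main__":
    # The 2x3 matrix [[1, 0, 2], [0, 0, 3]] in CSR form.
    path = os.path.join(tempfile.mkdtemp(), "example.csr")
    save_csr(path, 2, 3, [0, 2, 3], [0, 2, 2], [1.0, 2.0, 3.0])
    print(matvec(load_csr(path), [1.0, 1.0, 1.0]))
```

With a layout like this, loading costs one header read plus a mapping, and pages are only pulled in from disk when the arrays are actually touched. Whether a GraphBLAS implementation could adopt such a buffer without copying it is exactly what I am asking.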
The GraphBLAS GPU code is in the works [1]. For storage, see the new RedisGraph 1.0 implementation released last year [2]; 'michelp also has a Python Postgres implementation in development [3].
[1] Sparse versus dense in GraphBLAS: sometimes dense is better http://aldenmath.com/sparse-verses-dense-in-graphblas-someti...
[2] RedisGraph: A Graph Database Module for Redis http://graphblas.org/?title=Graph_BLAS_Forum#Graph_analysis_...
[3] Graph Processing with Postgres and GraphBLAS https://news.ycombinator.com/item?id=19379800
For the bioinformatics datasets I work with, it is not cost-efficient to load everything into a database. Some of these datasets (e.g. GWAS, genome-wide association studies, which are essentially sparse matrices) might be interesting to explore with graph queries. I guess my ideal would be a GraphBLAS equivalent of Spark SQL queries working across files in cloud storage / NAS.
There are several teams working on distributed GraphBLAS (the GraphBLAS/D4M model was designed to run on supercomputers [0]). Kepner's team at MIT is one [1].
NB: D4M was the original name before it was changed to GraphBLAS and became a standard.
[0] GraphBLAS: Building Blocks For High Performance Graph Analytics https://crd.lbl.gov/news-and-publications/news/2017/graphbla...
[1] A Billion Updates per Second Using 30,000 Hierarchical In-Memory D4M Databases https://arxiv.org/abs/1902.00846
This is very cool. I wonder if the GraphBLAS and Dask teams should collaborate.
Dask has a production-grade distributed computing system (that is cloud compatible with Kubernetes, YARN, EMR, Dataproc, etc.).
Graph partitioning is a weird world, so it will be interesting to see!