Would anyone be interested in discussing this paper together, especially through the lens of "how can we schedule algebraic expressions and checkpoint computed progress across a heterogeneous pool of consumer/donated machines?"

I'm an infrastructure engineer, mostly focused on databases, data pipelines, and ML infra for the past 10-15 years. Even when designing homogeneous compute clusters, I had to dig in and understand compiler-level implementations in MLIR and LLVM. I'm not a compiler expert by any measure, but I know just enough to be dangerous, and I'm curious about (safely) scheduling computations across a pool of volunteer machines. This seems especially important to chew on now that training foundational LLM weights costs 7-9 figures.

(This is more of a link dump than a paper discussion.)

For the line of inquiry w.r.t. tensor compilers and MLIR/LLVM (linalg, polyhedral, [sparse_]tensor, etc.), I personally found the following really helpful: https://news.ycombinator.com/item?id=25545373 (links to a survey), https://github.com/merrymercy/awesome-tensor-compilers

I'm also interested in the community more broadly associated with pandas/dataframe-like libraries (e.g. modin/dask/ray/polars/ibis), with Substrait/Calcite/Arrow as their IRs of choice. Some links: https://github.com/modin-project/modin, https://github.com/dask/dask/issues/8980, https://news.ycombinator.com/item?id=16510610, https://news.ycombinator.com/item?id=35521785

I group them this way because the former leans towards linear/tensor algebra and the latter towards relational algebra. It isn't yet clear (to me) how well innovations in one carry over to the other, if they do at all, so I'm also curious to hear more about proposals for a unified language spanning linear and relational algebra (e.g. https://news.ycombinator.com/item?id=36349015).
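One standard illustration of where the two algebras overlap (my own sketch, not taken from the linked threads): a sparse matrix multiply can be written as a relational join on the shared index followed by a GROUP BY with SUM. The COO-triple representation and the function name matmul_relational below are illustrative assumptions, not any particular library's API.

```python
from collections import defaultdict

# Sparse matrices as relations: lists of (row, col, value) tuples (COO format).
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]  # dense form: [[1, 2], [0, 3]]
B = [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)]  # dense form: [[4, 0], [5, 6]]


def matmul_relational(a, b):
    """C[i,k] = sum_j A[i,j] * B[j,k], i.e. JOIN on j, then GROUP BY (i, k) with SUM."""
    # Hash-index B on its join key (its row index j), as a hash join would.
    b_by_row = defaultdict(list)
    for j, k, v in b:
        b_by_row[j].append((k, v))
    # Join A.col = B.row, project the product, and aggregate with SUM per (i, k).
    out = defaultdict(float)
    for i, j, va in a:
        for k, vb in b_by_row.get(j, []):
            out[(i, k)] += va * vb
    return dict(out)


print(sorted(matmul_relational(A, B).items()))
# [((0, 0), 14.0), ((0, 1), 12.0), ((1, 0), 15.0), ((1, 1), 18.0)]
```

The result matches the dense product [[14, 12], [15, 18]]. A tensor compiler would see this as a contraction over j, while a query planner would see a join plus an aggregate, which is roughly the gap a unified IR would need to bridge.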

I'm particularly interested in pandas precisely because it seems to be right at the intersection of both forms of algebra (and draws a strong reaction from people who are familiar/comfortable with one community and not the other). See e.g. https://datapythonista.me/blog/pandas-20-and-the-arrow-revol... and https://wesmckinney.com/blog/apache-arrow-pandas-internals/