Hello HN! Post author here. I’m happy to answer questions and fix typos once morning rolls around here in Australia.
Hi josephg, I'm a CRDT researcher. It's great to see so much work on CRDTs nowadays!
Some of the optimizations you discuss have already been proposed in papers and implementations.
For instance, LogootSplit [1] proposes an implementation based on an AVL tree with extra metadata to get a range tree. LogootSplit also proposes a block-wise approach that stores strings instead of individual characters. Xray [2], an experimental editor built by GitHub and written in Rust, uses a copy-on-write B-tree. Teletype [3] uses a splay tree to speed up local insertions/deletions, based on the observation that a user performs several edits in the same region.
[1] https://members.loria.fr/CIgnat/files/pdf/AndreCollabCom13.p...
[2] https://github.com/atom-archive/xray
[3] https://github.com/atom/teletype
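To make the "tree with extra metadata" plus "block-wise" idea concrete, here is a minimal sketch, assuming a plain unbalanced binary tree (LogootSplit balances its tree as an AVL tree, and a real CRDT would also keep per-character IDs inside each block; both are omitted here). Each node stores a run of characters and the total length of its subtree, so locating a character offset costs time proportional to the tree height rather than the document length. The names `Node`, `insert`, `char_at` and `to_string` are hypothetical, not taken from any of the linked projects.

```python
from typing import Optional


class Node:
    """One block (run of characters) plus the size metadata for its subtree."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.size = len(text)  # total characters in this subtree


def subtree_size(node: Optional[Node]) -> int:
    return node.size if node is not None else 0


def update(node: Node) -> None:
    # Recompute the cached subtree length after a child changed.
    node.size = subtree_size(node.left) + len(node.text) + subtree_size(node.right)


def insert(node: Optional[Node], pos: int, text: str) -> Node:
    """Insert `text` at character offset `pos`; returns the new subtree root.
    No rebalancing is done (an AVL or B-tree would rotate/split here)."""
    if node is None:
        return Node(text)
    left_size = subtree_size(node.left)
    if pos <= left_size:
        node.left = insert(node.left, pos, text)
    elif pos >= left_size + len(node.text):
        node.right = insert(node.right, pos - left_size - len(node.text), text)
    else:
        # The offset falls inside this block: split it into before/after runs
        # and place the new run between them.
        offset = pos - left_size
        before, after = node.text[:offset], node.text[offset:]
        node.text = before
        node.right = insert(node.right, 0, after)
        node.right = insert(node.right, 0, text)
    update(node)
    return node


def char_at(node: Optional[Node], pos: int) -> str:
    """Walk down using the size metadata instead of scanning every block."""
    while node is not None:
        left_size = subtree_size(node.left)
        if pos < left_size:
            node = node.left
        elif pos < left_size + len(node.text):
            return node.text[pos - left_size]
        else:
            pos -= left_size + len(node.text)
            node = node.right
    raise IndexError(pos)


def to_string(node: Optional[Node]) -> str:
    # In-order traversal reconstructs the document.
    if node is None:
        return ""
    return to_string(node.left) + node.text + to_string(node.right)


if __name__ == "__main__":
    root = insert(None, 0, "hello world")
    root = insert(root, 5, ",")
    root = insert(root, 12, "!")
    print(to_string(root))  # prints "hello, world!"
```

Storing runs rather than single characters keeps the node count proportional to the number of edits instead of the number of characters, which is where much of the memory saving in block-wise designs comes from.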
And I'm not surprised these techniques have been invented before. Realising a tree is an appropriate data structure here is a pretty obvious step if you have a mind for data structures.
To name it, I often find myself feeling defensive when people read my work and respond with a bunch of links to academic papers. It's probably totally unfair and a complete projection on my part, but I hear a voice in my head rewording your comment into something awful like: "Cool, but everything you did was done before. Even if they didn't make any of their work practical, usable or good, they still published first, and you obviously didn't do a good enough literature review if you didn't know that." As a result, an unfair defensiveness arises in me that wants to find excuses to dismiss the work, even if the work might otherwise be interesting.
It's hard to compare their benchmark results because they used synthetic, randomized editing traces, which always have different performance profiles than real edits for this stuff. Their own university gathered some great real-world data in an earlier study; it would have been much more instructive to use that data set here. At a glance, their RAM usage looks to be about 2 orders of magnitude worse than diamond-types or yjs. And their CPU usage? I can't tell, because they have no tables of results, just some hard-to-read charts with log scales, so you can't even really eyeball the figures. So it's really hard to tell whether their work ends up performance-competitive without spending a couple of days getting their enterprise-style Java code running with a better data set. Do you think that's worth doing?