Awesome! I've been wishing something like FDB's record-layer w/o Java/JVM and this has a lot of potential.
I have a question though- How does it handle conflicts within a sqlite DB and among multiple sqlite DBs?
I believe FDB maintains strict serializability by detecting conflicts among concurrent transactions by checking conflict ranges.
I read the doc and the mvstore code and perhaps it's working by writing the deltas of changed pages with the read version obtained at the sqlite transaction creation?
If that's the case, I'm still unsure what you added to the conflict ranges other than the deltas of the pages. To make it actually serializable, you'd need to add conflict ranges for all the pages that are `select`ed within the sqlite transaction?
TLDR: The conflict range is the entire SQLite database. mvsqlite does not support concurrent read-write transactions to the same DB. If multiple RW transactions with overlapping [read_version, commit_version] ranges are are requested to be committed, only one of the commits will succeed.
To scale out writes, you can use smaller databases - one database per user, for example. It is possible to do multi-database serializable transactions with mvsqlite (not yet implemented, but the logic shouldn't be complex).
mvsqlite actually doesn't just use FDB's native transaction, as I would like to avoid FDB's low txn size and time limits. Instead, there are two separate keyspaces for each SQLite DB - one "page index" keyspace, and one content-addressed store keyspace.
For reads: Pages are fully versioned, so they are always snapshot-readable in the future. The read version is fetched from `mvstore` when each SQLite transaction starts, and is used as the page index per-page range scan upper bound in future read requests.
For writes: Pages are first written to the content-addressed store keyed by the page's hash. At commit, hashes of each written page in the SQLite transaction is written to the page index in a single FDB transaction to preserve atomicity. With 8K pages and ~60B per key-value entry in the page index, each SQLite transaction can be as large as 1.3 GB (compared to FDB's native txn size limit of 10 MB).
So actually, you can do one page read or write per FDB transaction and still preserve ACID properties.
Before you managed to produce mvsqlite, I was wondering if it is possible to rebase https://github.com/dolthub/dolt content-addressed page store(implemented with ProllyTree over a standard OS FS) onto FDB, so that there will be a MySQL-compat DB with similar properties to mvsqlite where actual page updates can be done outside FDB transactions to overcome 5sec FDB limit. Apparently, you already materialized a similar idea in a more sophisticated, practical, and complete way.
Thanks a lot for clarifying and keep up the great work. Your work is totally awesome!