What does HackerNews think of litefs?

FUSE-based file system for replicating SQLite databases across a cluster of machines

Language: Go

Author here. I think we could have set better expectations with our Postgres docs. It wasn't meant to be a managed service but rather some tooling to help streamline setting up a database and replicas. I'm sorry about the troubles you've had and that it's come off as us being disingenuous. We blog about things that we're working on and find interesting. It's not meant say that we've figured everything out but rather this is what we've tried.

As for this post, it's not managed SQLite but rather an open source project called LiteFS [1]. You can run it anywhere that runs Linux. We use it in few places in our infrastructure and found that sharing the underlying database for internal tooling was really helpful for that use case.

[1]: https://github.com/superfly/litefs

I think you are thinking that Fly is like a serverless platform. They aren't. They are the opposite. They are a server platform. They provide server for you and you have to manage your server yourself.

Nothing they provide is managed by them. You have to do that.

LiteFS is just a replication service for your sqlite database so you can keep your database synced across multiple nodes.

https://github.com/superfly/litefs

LiteFS Cloud which is the a service they provide just helps you backup and recover sqlite databases. You can do this yourself.

https://fly.io/docs/litefs/backup/

Litestream author here. The streaming replication was moved to a different project called LiteFS[1]. That may be a fit for you or, at the very least, it could be a helpful reference for your Rust port.

[1]: https://github.com/superfly/litefs

#. SQLite WAL mode

From https://www.sqlite.org/isolation.html https://news.ycombinator.com/item?id=32247085 :

> [sqlite] WAL mode permits simultaneous readers and writers. It can do this because changes do not overwrite the original database file, but rather go into the separate write-ahead log file. That means that readers can continue to read the old, original, unaltered content from the original database file at the same time that the writer is appending to the write-ahead log

#. superfly/litefs: a FUSE-based file system for replicating SQLite https://github.com/superfly/litefs

#. sqldiff: https://www.sqlite.org/sqldiff.html https://news.ycombinator.com/item?id=31265005

#. dolthub/dolt: https://github.com/dolthub/dolt :

> Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a Git repository. [...]

> Dolt can be set up as a replica of your existing MySQL or MariaDB database using standard MySQL binlog replication. Every write becomes a Dolt commit. This is a great way to get the version control benefits of Dolt and keep an existing MySQL or MariaDB database.

#. github/gh-ost: https://github.com/github/gh-ost :

> Instead, gh-ost uses the binary log stream to capture table changes, and asynchronously applies them onto the ghost table. gh-ost takes upon itself some tasks that other tools leave for the database to perform. As result, gh-ost has greater control over the migration process; can truly suspend it; can truly decouple the migration's write load from the master's workload.

#. vlcn-io/cr-sqlite: https://github.com/vlcn-io/cr-sqlite :

> Convergent, Replicated SQLite. Multi-writer and CRDT support for SQLite

> CR-SQLite is a run-time loadable extension for SQLite and libSQL. It allows merging different SQLite databases together that have taken independent writes.

> In other words, you can write to your SQLite database while offline. I can write to mine while offline. We can then both come online and merge our databases together, without conflict.

> In technical terms: cr-sqlite adds multi-master replication and partition tolerance to SQLite via conflict free replicated data types (CRDTs) and/or causally ordered event logs.

yjs also does CRDTs (Jupyter RTC,)

#. pganalyze/libpg_query: https://github.com/pganalyze/libpg_query :

> C library for accessing the PostgreSQL parser outside of the server environment

#. Ibis + Substrait [ + DuckDB ] https://ibis-project.org/blog/ibis_substrait_to_duckdb/ :

> ibis strives to provide a consistent interface for interacting with a multitude of different analytical execution engines, most of which (but not all) speak some dialect of SQL.

> Today, Ibis accomplishes this with a lot of help from `sqlalchemy` and `sqlglot` to handle differences in dialect, or we interact directly with available Python bindings (for instance with the pandas, datafusion, and polars backends).

> [...] `Substrait` is a new cross-language serialization format for communicating (among other things) query plans. It's still in its early days, but there is already nascent support for Substrait in Apache Arrow, DuckDB, and Velox.

#. ibis-project/ibis-substrait: https://github.com/ibis-project/ibis-substrait

#. tobymao/sqlglot: https://github.com/tobymao/sqlglot :

> SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. It can be used to format SQL or translate between 19 different dialects like DuckDB, Presto, Spark, Snowflake, and BigQuery. It aims to read a wide variety of SQL inputs and output syntactically and semantically correct SQL in the targeted dialects.

> It is a very comprehensive generic SQL parser with a robust test suite. It is also quite performant, while being written purely in Python.

> You can easily customize the parser, analyze queries, traverse expression trees, and programmatically build SQL.

> Syntax errors are highlighted and dialect incompatibilities can warn or raise depending on configurations. However, it should be noted that SQL validation is not SQLGlot’s goal, so some syntax errors may go unnoticed.

#. benbjohnson/postlite: https://github.com/benbjohnson/postlite :

> postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.

> The proxy works by translating Postgres frontend wire messages into SQLite transactions and converting results back into Postgres response wire messages. Many Postgres clients also inspect the pg_catalog to determine system information so Postlite mirrors this catalog by using an attached in-memory database with virtual tables. The proxy also performs minor rewriting on these system queries to convert them to usable SQLite syntax.

> Note: This software is in alpha. Please report bugs. Postlite doesn't alter your database unless you issue INSERT, UPDATE, DELETE commands so it's probably safe. If anything, the Postlite process may die but it shouldn't affect your database.

#. > "Hosting SQLite Databases on GitHub Pages" (2021) re: sql.js-httpvfs, DuckDB https://news.ycombinator.com/item?id=28021766

#. >> - bittorrent/sqltorrent https://github.com/bittorrent/sqltorrent

>> Sqltorrent is a custom VFS for sqlite which allows applications to query an sqlite database contained within a torrent. Queries can be processed immediately after the database has been opened, even though the database file is still being downloaded. Pieces of the file which are required to complete a query are prioritized so that queries complete reasonably quickly even if only a small fraction of the whole database has been downloaded.

#. simonw/datasette-lite: https://github.com/simonw/datasette-lite datasette, *-to-sqlite, dogsheep

"Loading SQLite databases" [w/ datasette] https://github.com/simonw/datasette-lite#loading-sqlite-data...

#. awesome-db-tools: https://github.com/mgramin/awesome-db-tools

Lots of neat SQLite/vtable/pg/replication things

Why is anyone on HN "dunking" on Fly.IO of all companies?

Michael - Don't take the bait.

As someone who has zero affiliation with Fly.IO other than a few PR's to their OSS(I don't even know Michael), I greatly appreciate the contributions they have given back to the community.

There are a lot of great hosting companies. Fly.IO stands out due to their revolutionary architecture and contributions back to the OSS community. I wish more companies operated like this.

It's understandable some are upset about an outage. But Fly is doing really interesting and game-changing things, not copying a traditional vmware, cpanel or k8s route.

Just as a reminder to what this company has offered back to everyone.

SQLite: Ben Johnson's OSS work around SQLite stands out. Fly.IO and his work have really made sqlite a contender. - https://fly.io/blog/all-in-on-sqlite-litestream/ - https://fly.io/blog/introducing-litefs/ - https://github.com/superfly/litefs - https://github.com/benbjohnson/litestream - https://fly.io/blog/sqlite-internals-wal/ - https://fly.io/blog/wal-mode-in-litefs/

Who really considered sqlite as a production option before Fly and Ben? Not me.

Firecracker: Firecracker is amazing, but difficult to debug when something bad happens. There aren't a ton of people in devops who would share what they have. If you've ever used Firecracker, you've really been helped a lot by the various guides they have provided back to the community like these: - https://fly.io/docs/reference/architecture/ - https://fly.io/blog/fly-machines/ - https://fly.io/blog/sandboxing-and-workload-isolation/

Their architecture is beautiful and revolutionary. They're probably the first or second ones to find a lot of the new edge cases as they grow.

It's a lot harder to be the first one over the wall than it is to copy. They've literally given the average developer a blueprint to build scalable businesses that compete with their own.

Basically, LiteFS: https://github.com/superfly/litefs

And then some load balancer cleverness that reroutes writes to a specific VM: https://fly.io/blog/globally-distributed-postgres/

Shouldn't require any tweaking, the sliding window should be able to work on sqlite files just as well as the game content files used in this article. If you want something smarter and very much more sqlite optimized, you probably want to look at litefs. https://github.com/superfly/litefs
Litestream is one-way replication, so it can't sync across multiple writers.

LiteFS is Ben Johnson's newer project that does SQLite synchronization across nodes: https://github.com/superfly/litefs

The very problem of SQLite: Single user only. Although SQLite does have WAL but it still doesn't allow you to do concurrent write unless you want to see file corruption.

This means SQLite is very much locked to things that works with one specific purpose and almost nothing else. Sure, you can be read-only, but you have to run alongside the app in the specific node, too.

Another problem (although without solving the single user mindset this wouldn't be a problem at all) is high availability. You want to make sure that your database won't get lost do you.

Things like Litestream [1] attempts to solve the SQLite backup problem it by continuously saving the database state and pack t up to S3-compatibles or file system but its just half the story. You want to make sure your operation not stopping. This is where HA comes in to save you from an emergency fixup when you are enjoying your holiday.

It doesn't mean that nobody tried to solve both these problems though. Ahem, introducing rqlite [2]. Although my own experience is not very great because the memory usage is quite high and does not fit my need (because the embedded device only has 512MB on it, and every byte counts, sorry), I guess that's the price to pay if you want to turn a non-multiuser, non-concurrently acccess database into one...Another honorable mention would be LiteFS [3] but I haven't used it yet so I have no say on it.

[1]: https://litestream.io/

[2]: https://github.com/rqlite/rqlite

[3]: https://github.com/superfly/litefs

We actually kinda see parts of this in https://github.com/superfly/litefs, albeit for SQLite.

And here we see some ideas forming around "pluggable storage for PostgresQL": https://wiki.postgresql.org/wiki/Future_of_storage#Pluggable...

Seriously! If any of this sounds interesting to build, reach out, and we'll make it happen!

I really like the ongoing innovation in the SQLite ecosystem! How does this compare to LiteFS?

https://github.com/superfly/litefs

Are you going to use LiteFS then for replication [1]? LiteFS replication is asynchronous, meaning failover can lose the latest data. Will LiteFS scale down to 0? Does scaling down to zero mean electing a leader when scaling back up to 1 and will that have a delay? Will the read replicas scale down to 0 along with LiteFS when the primary scales down to 0?

[1] https://github.com/superfly/litefs

For folks' context, the new tool that's being discussed in the thread mentioned by the parent here is litefs [0], as well as which you can also look at rqlite [1] and dqlite [2], which all provide different trade-offs (e.g. rqlite is 'more strongly consistent' than litefs).

[0]: https://github.com/superfly/litefs

[1]: https://github.com/rqlite/rqlite

[2]: https://github.com/canonical/dqlite

Relevant reading about FoundationDB building a SQL database on top of a distributed key-value store: https://www.voltactivedata.com/blog/2015/04/foundationdbs-le... (That one replaced SQLite's btree, this one puts pages of the btree as values in the key-value store.)

Another approach using FUSE, making arbitrary SQLite-using applications leader-replica style distributed for HA: https://github.com/superfly/litefs (see also https://litestream.io/ for WAL-streaming backups, that's the foundation of this)