What does HackerNews think of sqlightning?

SQLite3 ported to use LMDB instead of its original Btree code. See https://github.com/LumoSQL/LumoSQL for maintained fork.

Language: C

I'm guessing you mean "and the software is fast enough", because OpenLDAP itself has been the world's fastest distributed database for over a decade, and nothing else even comes close.

Since you built on top of SQLite, you're already at least an order of magnitude slower. You could speed it up a bit by using SQLightning, which replaces SQLite's Btree engine with OpenLDAP's LMDB.

https://github.com/LMDB/sqlightning
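Since SQLightning swaps only the storage engine underneath, code written against the plain SQLite C API shouldn't need to change. A minimal sketch (file and table names made up) of the surface that stays the same:

```c
#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    char *err = NULL;

    /* Open (or create) a database; with SQLightning the same call is
       expected to open an LMDB-backed store instead of a btree .db file. */
    if (sqlite3_open("example.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    /* Ordinary SQL; the storage-engine swap is invisible at this layer. */
    sqlite3_exec(db,
                 "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT);"
                 "INSERT OR REPLACE INTO kv VALUES ('greeting', 'hello');",
                 NULL, NULL, &err);
    if (err) {
        fprintf(stderr, "SQL error: %s\n", err);
        sqlite3_free(err);
    }

    sqlite3_close(db);
    return 0;
}
```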

But yes, it's obvious your aim is to provide bare minimum LDAP functionality for people who were only taught SQL and have no desire to learn LDAP. A bit of a loss, because the workloads LDAP is used for most often are things RDBMSs suck at.

Have you seen SQLightning and/or LumoSQL?

* https://github.com/LMDB/sqlightning

* https://github.com/LumoSQL/LumoSQL

SQLightning was the initial project combining SQLite3 with an LMDB backend. It seemed to be more of an experimental/proof-of-concept thing, and isn't maintained.

LumoSQL is an alternative, maintained project providing a SQLite3 front end with various optional storage backends, one of which is LMDB.

Note - I'm not affiliated with either project, I just remembered they exist. :)

This has been shared without context but I guess the SQLite team is starting to modularize the btree code in order to facilitate work like SQLightning: https://github.com/LMDB/sqlightning

At the time, SQLightning greatly improved SQLite performance, but due to LMDB's requirement that keys fit in 2/3 of a page it wasn't really useful as a general-purpose replacement for SQLite's internal b-tree implementation.
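For context on that constraint: LMDB exposes its key-size limit at runtime, so anything layered on top has to reject or restructure oversized keys itself. A tiny sketch, assuming an already-open MDB_env:

```c
#include <lmdb.h>

/* Check whether a proposed key fits within LMDB's key-size limit.
   mdb_env_get_maxkeysize() reports the compile-time maximum
   (511 bytes in a default build). */
int key_fits(MDB_env *env, size_t keylen) {
    return keylen <= (size_t)mdb_env_get_maxkeysize(env);
}
```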

EDIT: It looks like SQLightning got picked up and carried forward by a new team under the name LumoSQL. Here's the project's readme: https://lumosql.org/src/lumosql/doc/trunk/README.md which contains "A Brief History of LumoSQL" at the end.

LMDB and SQLite are not directly comparable. LMDB is a transactional B+tree-based key/value store. SQLite is an implementation of a transactional SQL data model on top of a B+tree-based key/value store, so it is logically at least one abstraction layer higher than LMDB. (Key/value stores underlie pretty much all of the other data models you'll ever use.)
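To make that layering concrete, here is a minimal sketch of what using LMDB directly at the key/value level looks like (paths and keys are placeholders, and most error checking is omitted for brevity):

```c
#include <stdio.h>
#include <string.h>
#include <lmdb.h>

int main(void) {
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_val key, data;

    /* Create and open the environment; "./db" must be an existing directory. */
    mdb_env_create(&env);
    mdb_env_set_mapsize(env, 10UL * 1024 * 1024);   /* 10 MiB map */
    mdb_env_open(env, "./db", 0, 0664);

    /* One write transaction: put a key/value pair. */
    mdb_txn_begin(env, NULL, 0, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);
    key.mv_size  = strlen("user:42");
    key.mv_data  = "user:42";
    data.mv_size = strlen("alice");
    data.mv_data = "alice";
    mdb_put(txn, dbi, &key, &data, 0);
    mdb_txn_commit(txn);

    /* One read transaction: get it back. */
    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    if (mdb_get(txn, dbi, &key, &data) == 0)
        printf("%.*s\n", (int)data.mv_size, (char *)data.mv_data);
    mdb_txn_abort(txn);

    mdb_env_close(env);
    return 0;
}
```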

That aside - LMDB is not just smaller, faster, and more reliable than SQLite, it is also smaller/faster/more reliable than SQLite's own B+tree implementation, and SQLite can be patched to use LMDB instead of its own B+tree code, resulting in a smaller/faster footprint for SQLite itself.

Proof of concept was done here https://github.com/LMDB/sqlightning

A new team has picked this up and carried it forward https://github.com/LumoSQL/LumoSQL

Generally, unless your application has fairly simple data storage needs, it's better to use some other data model built on top of LMDB than to use it (or any K/V store) directly. (But if building data storage servers and implementing higher level data models is your thing, then you'd most likely be building directly on top of LMDB.)
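As a rough illustration of what "building a data model on top" means in practice: a table row or a secondary index can be encoded as composite keys, and a prefix range scan stands in for an index lookup. The key layout below is invented for the example:

```c
#include <stdio.h>
#include <string.h>
#include <lmdb.h>

/* Scan every key beginning with a prefix, e.g. "user:" -- the kind of
   primitive a table or secondary index would be built from. Assumes an
   already-open read transaction and database handle. */
void scan_prefix(MDB_txn *txn, MDB_dbi dbi, const char *prefix) {
    MDB_cursor *cur;
    MDB_val key, data;
    size_t plen = strlen(prefix);

    mdb_cursor_open(txn, dbi, &cur);
    key.mv_size = plen;
    key.mv_data = (void *)prefix;

    /* MDB_SET_RANGE positions at the first key >= prefix; MDB_NEXT walks on. */
    if (mdb_cursor_get(cur, &key, &data, MDB_SET_RANGE) == 0) {
        do {
            if (key.mv_size < plen || memcmp(key.mv_data, prefix, plen) != 0)
                break;  /* walked past the prefix range */
            printf("%.*s => %.*s\n",
                   (int)key.mv_size, (char *)key.mv_data,
                   (int)data.mv_size, (char *)data.mv_data);
        } while (mdb_cursor_get(cur, &key, &data, MDB_NEXT) == 0);
    }
    mdb_cursor_close(cur);
}
```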

Can this be made into a backend for SQLite?

Edit: an example, like this: https://github.com/LMDB/sqlightning

LMDB is a storage engine whereas SQLite is a small database. There is even a version of SQLite that used LMDB as the underlying storage engine: https://github.com/LMDB/sqlightning.

We have only ever compared LMDB in synchronous mode to other DBs in synchronous mode, and LMDB in asynch mode to other DBs in asynch mode. Come on, that's too obvious.

And LMDB beats the crap out of SQLite, in any mode. http://www.lmdb.tech/bench/microbench/

Replacing SQLite's Btree engine with LMDB makes the SQLite footprint smaller, faster, and more reliable too. https://github.com/LMDB/sqlightning

The Btree engine in SQLite is quite inefficient, and also unreliable on its own. (Needs the transaction log support to provide reliability.) LMDB blows it away.

https://github.com/LMDB/sqlightning
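For readers unfamiliar with that point: stock SQLite gets its crash safety from a rollback journal or WAL layered on top of the btree, and that machinery is visible (and tunable) from the C API. A small sketch, only to show where the knob lives:

```c
#include <stdio.h>
#include <sqlite3.h>

/* Print the journaling mode SQLite is relying on for durability
   ("delete", "wal", etc.); the btree file alone, without the
   journal/WAL, is not crash-safe. */
static int print_mode(void *unused, int ncols, char **vals, char **names) {
    (void)unused; (void)ncols; (void)names;
    printf("journal_mode = %s\n", vals[0] ? vals[0] : "(null)");
    return 0;
}

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("example.db", &db) != SQLITE_OK) return 1;
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", print_mode, NULL, NULL);
    sqlite3_close(db);
    return 0;
}
```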

I'm surprised there wasn't more investigation of SQLite and LMDB:

https://github.com/LMDB/sqlightning

The benchmarks there show anywhere from little or no performance difference up to substantial speed increases.

Premature optimization is evil, but preemptive optimization is necessary unless you want to paint yourself into a corner. I realized this after implementing a bitcoin full node.

In my bitcoin implementation, as an experiment, I tried storing the blockchain in SQLite, Postgres, and LevelDB. I gathered up a bunch of data from the first ~200k blocks of the blockchain and benchmarked all three databases. I queried for something like 30,000 UTXOs out of a set of a couple million. What took 300-400ms in LevelDB took 1.6 seconds in Postgres (in the REPL; in my actual node it would have taken longer due to deserialization of the UTXOs). What took 1.6 seconds in Postgres took over 30 seconds in SQLite.

Now, you can tell me I did the benchmarks wrong, and "oh, if you just did this it would be faster!", but 30+ seconds is slow to an absolutely insane degree. Needless to say, I went the key-value store route, but I was still astounded at how slow SQLite became once there were a few million records in the database.

I actually like SQLite, but when you know you're going to be dealing with 70 GB of data and over 10 million records, preemptive optimization is key. If I were the author, I would consider switching to Postgres once more than 500k-1M records are expected. That being said, if they're partial to SQLite, SQLightning (https://github.com/LMDB/sqlightning) looks pretty interesting (SQLite with an LMDB backend).

edit: To clarify, these weren't particularly scientific benchmarks. This was me timing a very specific query to get an idea of the level of data management I was up against. Don't take my word for it.
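For what it's worth, the shape of that measurement is roughly the sketch below: time a large batch of point lookups through a prepared statement. The schema, counts, and file name are hypothetical, not the commenter's actual code:

```c
#include <stdio.h>
#include <time.h>
#include <sqlite3.h>

/* Time N point lookups against a keyed table, roughly mirroring the
   "fetch ~30,000 UTXOs out of a few million rows" experiment above. */
int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    struct timespec t0, t1;
    int found = 0;

    if (sqlite3_open("utxos.db", &db) != SQLITE_OK) return 1;
    sqlite3_prepare_v2(db, "SELECT value FROM utxo WHERE id = ?1;", -1, &stmt, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 30000; i++) {
        sqlite3_bind_int(stmt, 1, i * 37 % 2000000);   /* pseudo-random-ish ids */
        if (sqlite3_step(stmt) == SQLITE_ROW) found++;
        sqlite3_reset(stmt);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("%d rows found in %.1f ms\n", found, ms);

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
```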

Eh, you're right on some points and wrong on many others. An embedded key-value store can be a good foundation for an RDBMS - or many other data models. The OpenLDAP Project used BerkeleyDB for more than 15 years. BerkeleyDB architects still point to OpenLDAP as a reference standard for transactional BDB code. http://www.bozemanpass.com/services/bdb.html

As the primary author of OpenLDAP's BDB support, and of the LMDB library which now supersedes BDB, I've got quite a deep perspective into this topic.

BDB is deadlock-prone, no argument there. But that doesn't mean the embedded engine approach can't work. LMDB is deadlock-proof, and serves perfectly well as the engine for an RDBMS, for a hierarchical DB (X.500/LDAP directory), or for arbitrary graphs (http://sph.io/content/2faf, https://github.com/pietermartin/thundergraph, etc.). When you start with a model-less key/value store you can implement any other data model on top.

LMDB's robustness is, in a word, flawless. It's crash-proof by design and multiple independent researchers have borne out the integrity of that design. (https://www.usenix.org/conference/osdi14/technical-sessions/... https://www.usenix.org/conference/osdi14/technical-sessions/... ) Your assertion that you can't build a robust multi-client system in a lightweight embedded system is flat wrong. In fact, this approach is the only way to get maximum performance from a software system. LMDB is so much faster than every other data management mechanism in existence, nothing else even comes close. http://symas.com/mdb/#bench

Crap tools, crap locking - these are certainly weaknesses in BDB's locking design. LMDB's locking design doesn't have these weaknesses. Querying the size of the DB is an O(1) operation, simply reading a few words out of the DB header.
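That size query maps to a single stat call in the LMDB API; a minimal sketch, assuming an already-open environment:

```c
#include <stdio.h>
#include <lmdb.h>

/* Report how much data an LMDB environment holds; mdb_env_stat() just
   copies a few fields out of the DB header, so this is O(1). */
void report_size(MDB_env *env) {
    MDB_stat st;
    if (mdb_env_stat(env, &st) == 0)
        printf("entries=%zu depth=%u leaf_pages=%zu psize=%u\n",
               st.ms_entries, st.ms_depth, st.ms_leaf_pages, st.ms_psize);
}
```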

Underdocumentation - I have no idea what you're talking about here. To me, BDB was copiously documented, and I largely owe my understanding of transactional systems to the BDB documentation. Yes, BDB is complicated and has too many moving parts. LMDB doesn't have this flaw either. But I can't take anyone seriously who says BDB was underdocumented.

C++ interface - I don't use C++, so no comment here.

C >> C++.

SQLite on BDB - yes, it's a joke. I've run it, and the performance is pretty awful. But we do far better. https://github.com/LMDB/sqlightning

There are many positive and negative lessons in software design to be learned from Berkeley DB. It really started life as a test bed, for learning. In that, I think it succeeded admirably. Its authors were able to experiment with extensible hashing, B+trees, ARIES logging, and a multitude of other important software architecture and data management concepts. LMDB owes much of its design to lessons learned from BDB.

LMDB's use of a read-only memory map by default means you can safely allow multi-client access without fearing corruption due to memory bugs. LMDB's exclusive use of mmap instead of complex application-level caching means you avoid all of the horrors of BDB's cache tuning, bloat driving into swap space, and other admin/tuning nightmares. LMDB's crash-proof persistent on-disk format means you don't need to worry about error-prone and inherently unreliable transaction logging mechanisms. All of these features are part of the LMDB design because of our experience working with BDB over the past 15+ years.
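One practical consequence of the read-only map worth spelling out: a read transaction hands back pointers directly into the mapped file, with no copy and no separate cache in between. A minimal sketch, reusing the environment/handle setup from the earlier example:

```c
#include <stdio.h>
#include <string.h>
#include <lmdb.h>

/* Zero-copy read: data.mv_data points straight into LMDB's read-only
   memory map and stays valid until the transaction ends. Stray writes
   through that pointer fault instead of corrupting the database. */
int lookup(MDB_env *env, MDB_dbi dbi, const char *k) {
    MDB_txn *txn;
    MDB_val key = { strlen(k), (void *)k }, data;
    int rc;

    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    rc = mdb_get(txn, dbi, &key, &data);
    if (rc == 0)
        printf("%.*s\n", (int)data.mv_size, (char *)data.mv_data);
    mdb_txn_abort(txn);   /* read txns are simply aborted when done */
    return rc;
}
```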