For anyone thinking that Berkeley DB, with its transactions and logging, is a good replacement for an out-of-process DB: the answer is no. Its embedded nature means that any "client" of a shared DB is free to deadlock the whole thing (if the client dies holding locks) or corrupt the database's structure (if the client has a memory bug).

By the time you have a working, robust multi-client system, you will have reinvented half of what an RDBMS provides: you will have pushed most functionality into a central server to compensate for the fragility of the embedded design, to simplify coordinating recovery when it (inevitably) crashes, and to centralize the rest of the infrastructure you have to build around it.

Its tools are crap: they can't tell you how big a database is without traversing every single entry; they hold locks while writing to the terminal (meaning you can't pipe them into a pager without freezing the whole DB); and if one of them crashes (e.g. because an SSH connection drops), the stale locks they leave behind bring down the whole DB.

Everything is underdocumented, especially the knobs you have to tweak to keep it from running out of memory. Its more "advanced" features are unusably buggy: partition traversal by cursor is completely broken; combining partitions with key compression crashes; and MVCC inexplicably keeps thousands of transactions open, causing out-of-memory failures.
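
For a flavor of those knobs, here's a minimal sketch of the up-front environment tuning a shared BDB setup typically needs. The path and sizes are placeholders, and which limits actually matter depends on the workload; these are the standard DbEnv calls from the C++ API:

    #include <db_cxx.h>

    int main() {
        DbEnv env(0);
        // The cache must be sized before open; the 64 MB here is a guess
        // you revise after watching the app run out of memory or thrash.
        env.set_cachesize(0, 64 * 1024 * 1024, 1);
        // Lock-table limits are also fixed at open time; undersize them
        // and operations start failing once the tables fill up.
        env.set_lk_max_lockers(4096);
        env.set_lk_max_locks(4096);
        env.set_lk_max_objects(4096);
        env.open("/path/to/env",  // placeholder path
                 DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK |
                 DB_INIT_LOG | DB_INIT_TXN, 0);
        env.close(0);
        return 0;
    }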

Its C++ interface is a joke. Somehow they managed to take an "object-oriented" C library and jam it into C++ in a way that throws away every benefit of C++'s object orientation. RAII is not a concept the developers were familiar with: objects are created by factory functions and are freed, inconsistently, by their "parent" object, by their own methods, or by their destructors, depending on the type. Do the wrong thing and you trample memory, likely corrupting the database. That is surprisingly easy to do, given the lack of documentation.
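
For those who haven't seen it, here's a hedged sketch of the lifetime rules in question. Details vary across BDB versions, so treat the comments as the gist rather than gospel:

    #include <db_cxx.h>
    #include <cstddef>

    int main() {
        DbEnv env(0);
        env.open("/path/to/env",  // placeholder path
                 DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK |
                 DB_INIT_LOG | DB_INIT_TXN, 0);

        Db db(&env, 0);
        db.open(NULL, "test.db", NULL, DB_BTREE,
                DB_CREATE | DB_AUTO_COMMIT, 0);

        DbTxn *txn = NULL;
        env.txn_begin(NULL, &txn, 0);  // factory: the env hands you a raw pointer

        Dbc *cur = NULL;
        db.cursor(txn, &cur, 0);       // another factory, another raw pointer

        cur->close();    // cursors free themselves via close(), and must be
                         // closed before the owning txn ends
        txn->commit(0);  // commit()/abort() frees the txn; touching (or
                         // deleting) it afterwards tramples freed memory
        db.close(0);     // handles must be close()d explicitly; relying on
                         // the destructor is version-dependent at best
        env.close(0);
        return 0;
    }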

And my sympathy to anyone who tries to run SQLite on this bad boy… combined with SQLite's shoddy track record interpreting any query more complicated than a key-value lookup, it would be a wonder if you could get any data in and out intact.

Sorry for the rant. Berkeley DB is many things, but a usable building block for a reliable RDBMS is not one of them.

Eh, you're right on some points and wrong on many others. An embedded key-value store can be a good foundation for an RDBMS, or for many other data models. The OpenLDAP Project used BerkeleyDB for more than 15 years, and BerkeleyDB's architects still point to OpenLDAP as a reference standard for transactional BDB code. http://www.bozemanpass.com/services/bdb.html

As the primary author of OpenLDAP's BDB support, and of the LMDB library which now supersedes BDB, I've got quite a deep perspective on this topic.

BDB is deadlock-prone, no argument there. But that doesn't mean the embedded-engine approach can't work. LMDB is deadlock-proof, and serves perfectly well as the engine for an RDBMS, for a hierarchical DB (X.500/LDAP directory), or for arbitrary graphs (http://sph.io/content/2faf, https://github.com/pietermartin/thundergraph, etc.). When you start with a model-less key/value store, you can implement any other data model on top.
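
As one illustration (a sketch, not production code; the composite-key scheme and the make_key helper are my own invention for this example), here's a table-like model layered on LMDB's flat, sorted key space: prefix each primary key with its table name, and because keys stay sorted, all rows of a table sit contiguously in the B+tree, so a cursor range scan over the prefix is a table scan.

    #include <lmdb.h>
    #include <string>
    #include <cstdio>

    // Illustrative composite key: "table\0primary-key".
    static std::string make_key(const std::string &table,
                                const std::string &pk) {
        return table + '\0' + pk;
    }

    int main() {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        mdb_env_create(&env);
        mdb_env_open(env, "./testdb", 0, 0664);  // directory must exist
        mdb_txn_begin(env, NULL, 0, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);

        std::string k = make_key("users", "42");
        std::string v = "alice";
        MDB_val key = { k.size(), (void *)k.data() };
        MDB_val val = { v.size(), (void *)v.data() };
        mdb_put(txn, dbi, &key, &val, 0);        // one "row" in table "users"
        mdb_txn_commit(txn);
        mdb_env_close(env);
        return 0;
    }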

LMDB's robustness is, in a word, flawless. It's crash-proof by design, and multiple independent researchers have borne out the integrity of that design. (https://www.usenix.org/conference/osdi14/technical-sessions/... https://www.usenix.org/conference/osdi14/technical-sessions/...) Your assertion that you can't build a robust multi-client system on a lightweight embedded engine is flat wrong. In fact, this approach is the only way to get maximum performance from a software system: LMDB is so much faster than every other data management mechanism in existence that nothing else even comes close. http://symas.com/mdb/#bench

Crap tools, crap locking - these are certainly weaknesses in BDB's locking design. LMDB's locking design doesn't have these weaknesses. In LMDB, querying the size of the DB is an O(1) operation: it simply reads a few words out of the DB header.
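
Concretely, with the standard C API (callable from C++, as below), mdb_env_stat just copies counters out of the header page rather than walking the tree:

    #include <lmdb.h>
    #include <cstdio>

    int main() {
        MDB_env *env;
        MDB_stat st;
        mdb_env_create(&env);
        mdb_env_open(env, "./testdb", 0, 0664);  // directory must exist
        mdb_env_stat(env, &st);   // O(1): no traversal, just header fields
        printf("%zu entries, B+tree depth %u, %zu leaf pages\n",
               st.ms_entries, st.ms_depth, st.ms_leaf_pages);
        mdb_env_close(env);
        return 0;
    }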

Underdocumentation - I have no idea what you're talking about here. To me, BDB was copiously documented; I largely owe my understanding of transactional systems to the BDB documentation. Yes, BDB is complicated and has too many moving parts, and LMDB doesn't have that flaw either. But I can't take anyone seriously who says BDB was underdocumented.

C++ interface - I don't use C++, so no comment here.

C >> C++.

SQLite on BDB - yes, it's a joke. I've run it, and the performance is pretty awful. But we do far better. https://github.com/LMDB/sqlightning

There are many positive and negative lessons in software design to be learned from Berkeley DB. It really started life as a test bed for learning. In that, I think it succeeded admirably. Its authors were able to experiment with extensible hashing, B+trees, ARIES logging, and a multitude of other important software-architecture and data-management concepts. LMDB owes much of its design to lessons learned from BDB.

LMDB's use of a read-only memory map by default means you can safely allow multi-client access without fearing corruption from a client's memory bugs. Its exclusive use of mmap, instead of complex application-level caching, means you avoid all the horrors of BDB's cache tuning: bloat that drives the system into swap, and the rest of the admin/tuning nightmares. Its crash-proof persistent on-disk format means you don't need error-prone and inherently unreliable transaction logging at all. All of these features are part of the LMDB design because of our experience working with BDB over the past 15+ years.
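
To make the read-only-map point concrete, here's a small sketch (assuming a database populated as in the earlier example): mdb_get hands back a pointer straight into the memory map, and since the map is read-only by default (no MDB_WRITEMAP), a stray write through that pointer faults immediately instead of silently corrupting the file.

    #include <lmdb.h>
    #include <cstdio>

    int main() {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, val;
        mdb_env_create(&env);
        mdb_env_open(env, "./testdb", 0, 0664);  // default: read-only map

        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);

        // Same composite key as the earlier example: "users" '\0' "42".
        char kbuf[] = "users\0" "42";
        key.mv_size = sizeof(kbuf) - 1;   // count the embedded '\0', not the terminator
        key.mv_data = kbuf;
        if (mdb_get(txn, dbi, &key, &val) == MDB_SUCCESS) {
            // val.mv_data points directly into the map: zero-copy reads.
            printf("%.*s\n", (int)val.mv_size, (char *)val.mv_data);
            // ((char *)val.mv_data)[0] = 'X';  // would SIGSEGV: map is read-only
        }
        mdb_txn_abort(txn);   // read txns just release their snapshot
        mdb_env_close(env);
        return 0;
    }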