The killer feature I'm looking for is Riak's "it just works" -- especially in the case of nodes failing, soft failing, going offline, timing out, whatever.
In my situation, I don't care about the performance at all, because I don't have many keys at any given moment. The few that I have matter greatly.
I care that when I store a key, it's reliably and durably stored and replicated, and that when nodes fail I don't have to do anything special to keep running. (This is in contrast to PostgreSQL, MySQL, or Mongo replication, where you have to fail over, then switch back eventually, and it takes special effort.)
AFAICT, it's not provided by Redis or CouchDB either, because their replication is async -- keys can get lost.
Having looked at a bunch of options in the last couple weeks, it seems like only Riak and Cassandra truly offer durable, synced replication that isn't difficult to admin. (...and of the two of them, Riak's documentation gives much more confidence about the ongoing admin efforts.)
Has anyone used any solid options I've perhaps overlooked?
I don't want to turn this into a MongoDB vs Riak thread, but you may want to take another look at Mongo as the points you've mentioned are now handled. As of 1.6 we support automatic fail-over[1] and synchronous replication[2]. In 1.8 we added a journal for durability[3] which will be enabled by default in 2.0 (due out this month - rc2 released today). Optional automatic fail-back[4], for when your preferred primary (if any) comes back online, is also coming in 2.0.
[1] http://www.mongodb.org/display/DOCS/Why+Replica+Sets
[2] http://www.mongodb.org/display/DOCS/Verifying+Propagation+of...
[3] http://www.mongodb.org/display/DOCS/Journaling
[4] http://www.mongodb.org/display/DOCS/Replica+Sets+-+Priority
Mathias, what Riak (or other distributed databases) provides that Mongo doesn't is "just-works" scaling characteristics beyond one machine.
* If I want to ensure that my data is written to three machines, all my writes will stop working in MongoDB if one machine goes down. With Riak, it will start issuing the writes to another node in the cluster and rebalance when the missing node comes back online.
* If I'm running MongoDB in a sharded configuration, if one of the shards cannot be reached, all writes will stop. With Riak, any node will accept the writes and, once the network issues are resolved, move them to the appropriate node.
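To make the "write somewhere else, move it back later" behavior in the two points above concrete, here's a toy sketch in Python. It is not Riak's implementation or API; `Node`, `write`, and `handoff` are made-up names, and the four-node ring with three preferred owners is just an assumption for illustration.

```python
# Toy model only: hypothetical names, not Riak's actual code or API.
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # keys this node owns
        self.hints = {}  # owner name -> {key: value} held on that owner's behalf


def write(ring, preferred, key, value, n=3):
    """Write to n nodes, substituting a live fallback for each preferred node that is down."""
    fallbacks = [node for node in ring if node not in preferred and node.up]
    acks = 0
    for owner in preferred[:n]:
        if owner.up:
            owner.data[key] = value
            acks += 1
        elif fallbacks:
            # A stand-in accepts the write and remembers who it really belongs to.
            stand_in = fallbacks.pop(0)
            stand_in.hints.setdefault(owner.name, {})[key] = value
            acks += 1
    return acks


def handoff(ring):
    """Once an owner is reachable again, stand-ins move the held writes to it."""
    by_name = {node.name: node for node in ring}
    for node in ring:
        for owner_name in list(node.hints):
            owner = by_name[owner_name]
            if owner.up:
                owner.data.update(node.hints.pop(owner_name))


ring = [Node(c) for c in "abcd"]
preferred = ring[:3]           # assume a, b, c own the key
ring[1].up = False             # b goes down
acks = write(ring, preferred, "k", "v")
ring[1].up = True              # b comes back
handoff(ring)
```

In this sketch the write still gets three acknowledgements while `b` is down, because `d` holds the value until `b` returns. A system that insists on the original three machines would have to reject or block the write instead.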
That said, conflict resolution is hard and there's no real way to get around it when you're using a distributed database like Riak. With Riak, as Chad says, you get "increased development complexity for massively decreased deployment complexity." There's no silver bullet; it's important to look at the trade-offs of each option.
The riak-java-client[2] provides a simple way to hook conflict resolution into your code (thanks to the ideas and solutions in that talk[1]). There is also the excellent Statebox[3] from Mochi, which automates conflict resolution.
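For a rough idea of what "hooking in conflict resolution" means, here's a minimal Python sketch. It assumes the stored value is a set and that the store hands your code every conflicting version (sibling) on read. `resolve_siblings` is a hypothetical name, not the riak-java-client or Statebox API.

```python
def resolve_siblings(siblings):
    """Merge conflicting versions of a set-valued key by taking their union."""
    merged = set()
    for version in siblings:
        merged |= version
    return merged


# Two clients updated the same key concurrently, so a read returns both versions.
siblings = [{"alice", "bob"}, {"alice", "carol"}]
resolved = resolve_siblings(siblings)
```

Union is the easy case: nothing written is lost, but an element removed in one version can come back from the other. That kind of application-specific decision is the extra development complexity mentioned above.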
[1] Riak and Scala at Yammer - http://blog.basho.com/2011/03/28/Riak-and-Scala-at-Yammer/
[2] Riak-java-client - https://github.com/basho/riak-java-client/
[3] Statebox - https://github.com/mochi/statebox