If the databases in question (Elastic, MongoDB, others) make it too easy to set up unsecured access, possibly because they default to an unsecured state on installation, then some good may come of this: The reputation hit to the database vendors should encourage them to mend their ways.

If that happens, then the attack can arguably be justified despite the damage: think of all the future database installations that would never otherwise have been secured and are now spared not only destruction but theft.

I've said it on here before, but the way Elasticsearch used to lock critical security functionality (like TLS support and RBAC) behind a paid subscription, whilst making just enough functionality available for free that users could shoot themselves in the foot, is disgusting. This only changed after Open Distro for Elasticsearch came onto the scene and forced Elastic's hand.

I entirely agree the vendors are (partially) to blame here.

This is why we are refactoring our database so we can migrate from MongoDB to Amazon DocumentDB. Encryption at rest? Pay up!

Curious, why do you use Mongo? Does it give you something that a JSONB column in Postgres wouldn’t?

I use jsonb heavily. While it is amazing, I definitely wouldn’t rely on it as a general purpose replacement for NoSQL/schemaless data storage.

An example of an issue I'm dealing with currently: while you can create a gin index to speed up containment queries, Postgres doesn't keep any statistics about the contents of jsonb columns. This means the query planner will sometimes do stupid things, like using the index even for very non-selective containment conditions, where a plain sequential scan would be a lot faster.
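To make that concrete, this is the shape of it (table and column names are made up):

    -- a gin index on a jsonb column speeds up containment (@>) lookups
    CREATE INDEX records_attrs_gin ON records USING gin (attrs);

    -- Postgres has no per-key statistics for attrs, so it guesses the
    -- selectivity of @> and can pick a bitmap index scan even when the
    -- condition matches most of the table
    EXPLAIN ANALYZE
    SELECT *
    FROM records
    WHERE attrs @> '{"status": "active"}';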

Less of an issue for me but worth considering: in my use case the gin index seems to be about 5x bigger than the unindexed data, which surprised me. I only use the containment operator, so I could build a smaller/faster index using the jsonb_path_ops operator class. This is on my todo list :)
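Switching is just a matter of the operator class (same made-up names), at the cost of losing the key-existence operators:

    -- default jsonb_ops: supports ?, ?|, ?& and @>, but is bigger
    CREATE INDEX records_attrs_gin ON records USING gin (attrs);

    -- jsonb_path_ops: only supports containment, but the index is
    -- much smaller and usually faster for @> queries
    CREATE INDEX records_attrs_path_gin ON records USING gin (attrs jsonb_path_ops);

    -- compare the on-disk sizes
    SELECT pg_size_pretty(pg_relation_size('records_attrs_gin')),
           pg_size_pretty(pg_relation_size('records_attrs_path_gin'));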

Like all non-btree indexes in Postgres, the index is unordered. That means sorting by values in the jsonb column will always be slow. This doesn’t matter for selective queries, but exacerbates my already slow non-selective queries that return large result sets.
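Concretely (same hypothetical schema), a query like this still needs an explicit sort over every matching row, because gin returns matches in arbitrary order:

    EXPLAIN
    SELECT *
    FROM records
    WHERE attrs @> '{"status": "active"}'
    ORDER BY attrs->>'priority'
    LIMIT 50;
    -- the gin index can't provide this ordering, so the plan gets a Sort
    -- node (top-N heapsort with LIMIT) over all the matching rows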

That said, if your queries are selective, jsonb + gin indexes are surprisingly performant (in the 0.5-10ms range for small result sets). My use case is a mix of structured relational data with jsonb for user-defined values (which of course users want to query and sort on, and I was dumb enough to say “sure, why not?”).

For a sense of the magnitude of the data: there are roughly 10 million rows. Each team using this service has its queries scoped to about 500k-1 million records, and additional filters (on the jsonb column) then narrow that down to anywhere from 0 to 60k results.
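So the typical query looks roughly like this (schema and values made up, but it's the right shape):

    SELECT *
    FROM records
    WHERE team_id = 42                               -- ~500k-1M rows per team
      AND attrs @> '{"custom_field": "some value"}'  -- scopes that to 0-60k rows
    ORDER BY attrs->>'another_custom_field'          -- the painful part on big result sets
    LIMIT 50;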

Thanks for the detailed response. I'm curious, is the NoSQL store a canonical store of its own data? If not, how do you replicate from postgres to the secondary store?

I ask because where I work we sync postgres to a secondary store for search, but the piecemeal, application-specific way it's done gives me the heebie-jeebies. It will almost certainly result in that secondary store drifting. Unfortunately we can't use something like zombodb [1] as we're on Amazon RDS. It seems like you know your stuff, and seeing non-deterministic consistency irritates the heck out of me!

[1] https://github.com/zombodb/zombodb