As a long-time user of MySQL, I've switched to PostgreSQL for my most recent startup, precisely because the PostgreSQL team continues to innovate in a sensible way.

One of our core requirements is a stable, fast and reliable initial import of existing data from third-party services. Throttling concerns pushed us towards a batched, retryable solution - and with it a requirement for simple, fast job control.

We've built this around a parallelised solution using the FOR UPDATE/SKIP LOCKED feature from PostgreSQL 9.5 onwards.
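The core of the pattern looks something like this (a minimal sketch - the table and column names are illustrative, not our actual schema):

```sql
-- Each worker claims one pending job. SKIP LOCKED means concurrent
-- workers simply skip rows already locked by another transaction,
-- instead of blocking on them - so N workers can drain the queue
-- in parallel without coordination.
BEGIN;

SELECT id, payload
FROM import_jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ...do the work against the 3rd-party API, then mark the job done
-- (or bump a retry counter on failure) before committing...
UPDATE import_jobs SET status = 'done' WHERE id = :claimed_id;

COMMIT;
```

If the worker crashes mid-job, the transaction rolls back and the row becomes claimable again - which is what makes the batches retryable for free.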

It's not bleeding edge, but for us it was a great example of how the pg team pick up the useful features from e.g. Oracle, and incorporate them into the product in a solid way, without all the enterprisey marketing crap that Oracle make you swallow before you can understand the actual feature-set.

[the startup is a SaaS search tool that helps you index across Trello, email, GitHub, Slack, Drive - https://getctx.io if you're interested.]

For a product like yours, don't you think a NoSQL solution would be a better fit? If yes, why didn't you opt for it? If no, then why (especially considering that your schemas are always in a state of flux)? PS: This is under the assumption that you're using Postgres to store all the data from the different sources.

Hi devj, you're absolutely correct that an RDBMS would be an odd solution for a search index.*

Anyway, on CTX I use PostgreSQL for storing things that deserve to be relational - job control, batching, users, accounts, billing, invitation codes...

All the indexed content is in an Elasticsearch cluster (http://elastic.co if you're unfamiliar - it's a specialised search indexing data store built atop Apache Lucene).

* though actually PostgreSQL has really good capabilities as a JSON store, which means people do sometimes use it as a NoSQL solution
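For instance, something like this works out of the box (an illustrative sketch - the table and document shape are made up):

```sql
-- jsonb stores documents in a binary form with operators and
-- GIN index support, so schemaless data stays queryable.
CREATE TABLE events (id serial PRIMARY KEY, doc jsonb);

INSERT INTO events (doc)
VALUES ('{"type": "signup", "user": {"email": "a@example.com"}}');

-- index the whole document, then query by containment
CREATE INDEX ON events USING gin (doc);
SELECT * FROM events WHERE doc @> '{"type": "signup"}';
```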

PostgreSQL has really good capabilities not just as a JSON store, but also for search and indexing.

Its full-text search comes close to Elasticsearch, with - surprisingly - even better performance in some cases.
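The built-in machinery is a tsvector column plus a GIN index - roughly like this sketch (table and query terms are made up; the generated-column syntax needs PostgreSQL 12+, older versions can maintain the column with a trigger):

```sql
-- keep a precomputed tsvector alongside the text
ALTER TABLE documents ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;

CREATE INDEX ON documents USING gin (tsv);

-- match and rank: @@ tests the query against the vector,
-- ts_rank orders results by relevance
SELECT id, ts_rank(tsv, query) AS rank
FROM documents, to_tsquery('english', 'postgres & search') AS query
WHERE tsv @@ query
ORDER BY rank DESC;
```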

tbh I haven't looked into PostgreSQL as a direct competitor to ES in the text-search space, mostly because I have quite a lot of ES experience with things like aggregation for faceting, stemming and textual analysis, and I had a (perhaps incorrect) assumption that PostgreSQL was less capable in those more specialised areas.

One thing I would like to understand is how the PostgreSQL search experience scales with data volume. My main cluster for CTX will potentially get (very) big, and I don't know enough about PostgreSQL, multi-master replication and full-text search across large amounts of data.

I'll definitely do some digging though, thanks!

You might want to look at https://github.com/zombodb/zombodb. It extends PG to use ES as an external source for indexes, so you can do an ES-based text search and return data stored in PG, along with doing joins to other tables. ZomboDB even works with Citus to handle PG-level sharding.