Just from reading the documentation, the full text search features on Postgres already look pretty powerful. And it is encouraging that they are actively being worked on. I'm wondering how this compares to a dedicated search engine like Solr or Elasticsearch.
Are there huge differences in performance, features or search quality? Up to what scale does using Postgres for full text search still make sense?
While it's OK for our purposes, I would wish for better customisability of the text parser, and it definitely needs better support for compound words to be perfect.
The first issue relates to https://www.postgresql.org/docs/9.6/static/textsearch-parser...: the documentation says
> At present PostgreSQL provides just one built-in parser, which has been found to be useful for a wide range of applications
and it really means it: changing the behaviour of this component is not possible unless you write a completely different parser in C, which, while possible, is not a fun experience.
We're using the full text feature over product data, and we have to work around the parser sometimes too eagerly detecting email addresses and URLs, which interferes with properly detecting brand names that contain some of these special characters.
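To make this concrete (the product text below is made up), ts_debug shows which token type the built-in parser assigns to each fragment; anything that looks like an email address or a hostname comes back as one token, and there is no knob to turn that behaviour off:

    -- Made-up product text; the parser returns 'info@example.com' as a
    -- single "email" token and 'www.example.com' as a single "host" token.
    SELECT alias, description, token
    FROM ts_debug('german', 'Müllers Weisswürste info@example.com www.example.com');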
The other problem is compound support. A lot of our data is in German, which, like other languages, likes to concatenate nouns.
For example, you'd absolutely want to find the term "Weisswürste" for the query "wurst" (note the concatenation and the umlaut added to "Wurst" for the plural).
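A quick way to see the problem with the built-in german configuration, which only stems and never splits compounds (the exact lexeme it produces is approximate here):

    -- The snowball stemmer turns "Weisswürste" into a single lexeme along the
    -- lines of 'weisswurst', which the query lexeme 'wurst' never matches.
    SELECT to_tsvector('german', 'Weisswürste') @@ to_tsquery('german', 'Wurst');
    -- returns false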
Traditionally, you solve this with a dictionary, and while Postgres has support for both ispell and hunspell dictionaries, only hunspell has acceptable compound support, and that part in turn isn't supported by Postgres.
So we've ended up using a hacked ispell dictionary where we have to mark all known compounds ourselves, which is annoying and error-prone.
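For reference, wiring such a dictionary into a configuration looks roughly like this; the dictionary and file names are placeholders for our hacked files in $SHAREDIR/tsearch_data:

    -- Placeholder names; german_compound.dict/.affix stand in for our
    -- hand-maintained ispell files carrying the compound markers.
    CREATE TEXT SEARCH DICTIONARY german_compound_ispell (
        TEMPLATE  = ispell,
        DictFile  = german_compound,
        AffFile   = german_compound,
        StopWords = german
    );

    CREATE TEXT SEARCH CONFIGURATION german_products (COPY = pg_catalog.german);

    -- Try the ispell dictionary first and fall back to the snowball stemmer.
    ALTER TEXT SEARCH CONFIGURATION german_products
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
        WITH german_compound_ispell, german_stem;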
Also, once you have to use a dictionary, you run into a further issue: loading the dictionary takes time, and due to the way Postgres currently works, it has to happen once per connection. In our case, with the 20MB hacked German ispell dictionary, this takes ~0.5s, which is way too long.
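You can see this directly in a fresh psql session; the first statement that touches the ispell-backed configuration pays the load cost, later ones on the same connection don't (german_products is the placeholder configuration from above):

    \timing on
    -- First use on this connection: loads the ~20MB dictionary, ~0.5s for us.
    SELECT to_tsvector('german_products', 'Weisswürste');
    -- Same statement again on the same connection: a few milliseconds.
    SELECT to_tsvector('german_products', 'Weisswürste');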
One solution for this is to put a connection pooler in front of Postgres. This works fine but, of course, adds more overhead.
The other solution is http://pgxn.org/dist/shared_ispell/, but I've had multiple postmaster crashes due to corrupted shared memory (thank you, Postgres, for crashing instead of corrupting data) related to that extension, so I would not recommend it for production use.
Lucene, and by extension Elasticsearch, has much better built-in text analysis features, so we could probably fix both the parser and the compound issue there. But that would of course mean even more additional infrastructure, plus probably some performance issues: unfortunately, we absolutely cannot return all the FTS matches as-is, but have to check each of them for other reasons why it must not be shown, which of course hits the database again, and I'm wary of somehow pushing all of that logic into ES as well.
This is why we currently live with the Postgres tsearch limitations. But sooner or later, we'd probably want to bite the bullet and go with a dedicated solution.