There are two huge things your 5-minute setup is missing, both of which are technically very hard to tackle:

1. Incrementally updating the search space. This is not easy to do, and for larger datasets it becomes increasingly important not to just do the dumb thing of retraining the entire index on every update.

2. Combining vector search and database-like search efficiently. I don't know if this Google post really solves that problem or if they just do the vector lookup followed by a parallelized linear scan, but this is still an open, unsolved research problem.
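On point 1, here is a toy sketch of why some index types can take updates without retraining, assuming an IVF-style (inverted file) index: the expensive step is learning the centroids once, after which each insert or delete only touches one bucket. Faiss's IVF indexes follow the same train-once-then-add pattern, but ToyIVFIndex and its methods are hypothetical pure Python, not Faiss's API.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVFIndex:
    """Hypothetical inverted-file index: vectors are bucketed by nearest centroid."""

    def __init__(self, n_lists, seed=0):
        self.n_lists = n_lists
        self.rng = random.Random(seed)
        self.centroids = []
        self.lists = {}  # centroid index -> {vector_id: vector}
        self.where = {}  # vector_id -> centroid index, so deletes are O(1)

    def train(self, sample, iters=10):
        # Plain k-means on an initial sample (a list of vectors). This is the
        # expensive step we do NOT want to repeat on every update.
        self.centroids = [list(v) for v in self.rng.sample(sample, self.n_lists)]
        for _ in range(iters):
            buckets = [[] for _ in self.centroids]
            for v in sample:
                buckets[self._nearest_centroid(v)].append(v)
            for i, b in enumerate(buckets):
                if b:  # keep the old centroid if its bucket went empty
                    self.centroids[i] = [sum(col) / len(b) for col in zip(*b)]
        self.lists = {i: {} for i in range(self.n_lists)}

    def _nearest_centroid(self, v):
        return min(range(len(self.centroids)), key=lambda i: sq_dist(v, self.centroids[i]))

    def add(self, vid, v):
        # Incremental insert: assign to an existing centroid, no retraining.
        self.remove(vid)  # re-adding an id acts as an update
        c = self._nearest_centroid(v)
        self.lists[c][vid] = v
        self.where[vid] = c

    def remove(self, vid):
        c = self.where.pop(vid, None)
        if c is not None:
            del self.lists[c][vid]

    def search(self, q, k, nprobe=2):
        # Scan only the nprobe buckets whose centroids are closest to q.
        probes = sorted(range(len(self.centroids)),
                        key=lambda i: sq_dist(q, self.centroids[i]))[:nprobe]
        hits = [(sq_dist(q, v), vid) for c in probes for vid, v in self.lists[c].items()]
        return [vid for _, vid in sorted(hits)[:k]]
```

The catch is that the centroids were learned from the data as it looked at training time, so if the distribution drifts a lot you may still want to retrain periodically; the point is that individual updates don't require it.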
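On point 2, a toy sketch of the two naive ways to combine a nearest-neighbour lookup with a database-style filter, to show why neither is satisfying. Post-filtering is the "vector lookup followed by a scan" approach; pre-filtering is a filtered linear scan. The brute-force knn stands in for a real ANN index, and all names (including the overfetch parameter) are made up for illustration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(query, items, k):
    """Exact k-NN over (id, vector, metadata) triples; stands in for an ANN index."""
    return sorted(items, key=lambda it: sq_dist(query, it[1]))[:k]


def post_filter(query, items, predicate, k, overfetch=2):
    # Vector lookup first, then apply the metadata predicate to the candidates.
    # Fast, but a selective filter can leave fewer than k (or zero) hits.
    candidates = knn(query, items, k * overfetch)
    return [it[0] for it in candidates if predicate(it[2])][:k]


def pre_filter(query, items, predicate, k):
    # Apply the predicate first, then rank the survivors by a linear scan.
    # Always finds k hits if they exist, but costs O(n) per query.
    survivors = [it for it in items if predicate(it[2])]
    return [it[0] for it in knn(query, survivors, k)]
```

For example, with three English documents near the query and one French document far away, post_filter with a `lang == "fr"` predicate and k=1 returns an empty list, while pre_filter finds the French document at the price of scanning everything.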

Correct, that would take more than 5 minutes, although it is still possible to do with Faiss (and not that hard, relatively speaking). In the Teclis demo I did in fact do your second point: combine results with a keyword search engine. There are many simple solutions out there you can use for that, like Meilisearch, Sonic, etc. If you were to use an external API for vector search, you would still need to build keyword-based search separately (plus the logic for combining and ranking results), so you may be better off just building the entire stack anyway.
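One simple way to do the combining/ranking step (not necessarily what Teclis does) is reciprocal rank fusion. It only needs each engine's ranked list of ids, so scores from a vector index and from a keyword engine never have to be put on the same scale. A minimal sketch, with hypothetical doc ids:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the commonly used default; larger k
    flattens the advantage of the very top positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["d3", "d1", "d7"]   # e.g. from the vector index
keyword_hits = ["d1", "d9", "d3"]  # e.g. from a keyword engine
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# prints ['d1', 'd3', 'd9', 'd7']
```

Documents that both engines rank highly float to the top, which is usually what you want from a hybrid setup; weighting one engine more heavily is a small change to the per-list contribution.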

Anyway, for me the number one priority was latency, and it is hard to beat on-premise search for that.

Even then, a vector search API is just one component you will need in your stack. You need to pick the right model, create the vectors (GPU-intensive), and then possibly combine the search results with keyword-based search (say, BM25) to improve accuracy, etc. I am still waiting to see an end-to-end API that does all of this.
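For reference, BM25 is the classic keyword scoring function. A compact sketch of Okapi BM25, assuming documents are already tokenized and using k1=1.5 and b=0.75, which are common choices rather than anything prescribed here; the function name is hypothetical:

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # The +1 inside the log keeps idf positive even for very common terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) plus length normalisation (b).
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores
```

Sorting documents by these scores gives a keyword ranking that can then be fused with the vector-search ranking.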

Interesting. Did you also tackle the incremental update problem with FAISS?

hnswlib[1] allows for incremental updates. And I believe in terms of accuracy it stacks up fairly well against alternatives like FAISS or ScaNN.

[1]: https://github.com/nmslib/hnswlib/
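hnswlib implements a multi-layer graph (HNSW). The sketch below is a heavily simplified, single-layer navigable-small-world graph in pure Python, meant only to show why graph indexes take incremental updates well: a new vector is linked to its nearest existing nodes and nothing else is rebuilt, and deletes can be tombstones (hnswlib exposes a similar mark-deleted operation). ToyNSW is hypothetical: no layers, no neighbour pruning, and not hnswlib's API.

```python
import heapq


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyNSW:
    """Hypothetical single-layer navigable-small-world graph (no HNSW layers, no pruning)."""

    def __init__(self, m=4, ef=16):
        self.m = m            # edges added per inserted node
        self.ef = ef          # beam width used during search
        self.vectors = {}     # label -> vector
        self.edges = {}       # label -> set of neighbour labels
        self.deleted = set()  # tombstoned labels
        self.entry = None     # every search starts from this node

    def _beam_search(self, q, ef):
        """Best-first search keeping the ef closest nodes seen; returns sorted (dist, label)."""
        d0 = sq_dist(q, self.vectors[self.entry])
        visited = {self.entry}
        candidates = [(d0, self.entry)]  # min-heap: nodes still to expand
        best = [(-d0, self.entry)]       # max-heap via negation: current results
        while candidates:
            d, node = heapq.heappop(candidates)
            if len(best) >= ef and d > -best[0][0]:
                break  # closest unexpanded node is worse than every kept result
            for nb in self.edges[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                dn = sq_dist(q, self.vectors[nb])
                if len(best) < ef or dn < -best[0][0]:
                    heapq.heappush(candidates, (dn, nb))
                    heapq.heappush(best, (-dn, nb))
                    if len(best) > ef:
                        heapq.heappop(best)  # drop the farthest result
        return sorted((-nd, label) for nd, label in best)

    def add(self, label, v):
        """Insert one vector; existing nodes only gain edges, nothing is rebuilt."""
        if label in self.vectors:
            raise ValueError(f"label {label!r} already present")
        nearest = [] if self.entry is None else self._beam_search(v, max(self.ef, self.m))[: self.m]
        self.vectors[label] = v
        self.edges[label] = set()
        if self.entry is None:
            self.entry = label
        for _, nb in nearest:  # bidirectional links keep the graph connected
            self.edges[label].add(nb)
            self.edges[nb].add(label)

    def remove(self, label):
        """Tombstone delete: the node keeps routing searches but is never returned."""
        self.deleted.add(label)

    def search(self, q, k):
        if self.entry is None:
            return []
        # Widen the beam by the tombstone count so filtered results can still fill k.
        hits = self._beam_search(q, max(self.ef, k + len(self.deleted)))
        return [label for _, label in hits if label not in self.deleted][:k]
```

With ef at least as large as the number of stored vectors, the search visits the whole (connected) graph and becomes exact, which is handy for testing; real HNSW keeps ef small and relies on its layers and neighbour pruning to stay both fast and accurate.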