What does HackerNews think of ann-benchmarks?
Benchmarks of approximate nearest neighbor libraries in Python
Language: Python (#48 in Docker)
(Blog author here.) Thanks for the question. In this case the indexes for both DiskANN and pgvector HNSW are small enough to fit in memory on the machine (8 GB of RAM), so there's no need to touch the SSD. We plan to test a configuration where the index is larger than memory (we couldn't this time due to limitations in ann-benchmarks [0], the tool we use).
To your question about RAM usage: we provide a graph of index size. With PQ enabled, our new index is 10x smaller than pgvector HNSW. We don't have numbers for HNSWPQ in FAISS yet.
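For intuition on how an index does or doesn't fit in 8 GB, here is a back-of-envelope size estimate. Every number in it (10M vectors, 960 dimensions, HNSW M=16, 96 PQ subquantizers) is an illustrative assumption, not a figure from the blog post.

```python
# Rough index-size arithmetic: does the index fit in 8 GB of RAM?
# All dataset sizes and parameters below are assumptions for illustration.

def flat_vector_bytes(n: int, dim: int) -> int:
    """Raw float32 storage for the vectors themselves."""
    return n * dim * 4

def hnsw_graph_bytes(n: int, m: int = 16) -> int:
    """Approximate HNSW link overhead: ~2*M neighbor ids (4 bytes each)
    per vector on the base layer, ignoring the sparser upper layers."""
    return n * 2 * m * 4

def pq_code_bytes(n: int, m_subquantizers: int = 96) -> int:
    """Product quantization stores one 1-byte code per subquantizer."""
    return n * m_subquantizers

n, dim = 10_000_000, 960  # hypothetical GIST-style dataset, for illustration
full = flat_vector_bytes(n, dim) + hnsw_graph_bytes(n)
pq = pq_code_bytes(n) + hnsw_graph_bytes(n)

GiB = 1024 ** 3
print(f"HNSW + float32 vectors: {full / GiB:.1f} GiB")  # ~37 GiB: exceeds 8 GB
print(f"HNSW + PQ codes:        {pq / GiB:.1f} GiB")    # ~2.1 GiB: fits easily
```

Under these assumptions, PQ cuts per-vector storage from 3,840 bytes to 96, which is the kind of order-of-magnitude shrink behind a 10x smaller index-size graph.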
A more important question is why Supabase went with one of the slowest [0] implementations instead of building a better plugin, especially since you are a VC-funded company that's hardly lacking in cash. Simply gluing together a few random open-source extensions is not great for developer experience.
Check out https://github.com/erikbern/ann-benchmarks for benchmarks of the different ANN libraries out there. I'd be interested in hearing others' experiences using these libraries in production.
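To make that concrete, here is a minimal sketch of one library ann-benchmarks covers (hnswlib); the dataset is random and the parameter values are placeholders, not tuned production settings.

```python
import numpy as np
import hnswlib  # pip install hnswlib

dim, num_elements = 128, 10_000
data = np.random.rand(num_elements, dim).astype(np.float32)

# Build the index; "cosine", "l2", and "ip" spaces are supported.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

# ef trades recall for query speed at search time (must be >= k).
index.set_ef(50)
labels, distances = index.knn_query(data[:5], k=10)
print(labels.shape)  # (5, 10): ids of the 10 approximate neighbors per query
```

M, ef_construction, and ef are the main recall/latency knobs here; ann-benchmarks sweeps settings like these to produce its recall-vs-throughput curves.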
They used to publish some benchmarks on their site, but seem to have removed them. You can find them on archive.org [1]. I guess it is understandable, since vector search performance is pretty unpredictable and depends on a lot of factors. If their target market is people who want vector search without needing to read a bunch of papers first, benchmarks might be more confusing than they are helpful.
edit: While I do think it's understandable, it's not great for transparency. Even if they don't want to open-source their index, I would admire it if they were willing to give ann-benchmarks [2] an API key to publish some independent results.
Disclaimer: I work on vector search at a different company
[1] https://web.archive.org/web/20210227105542/https://www.pinec...
[2] https://github.com/erikbern/ann-benchmarks