What does HackerNews think of ann-benchmarks?

How We Made PostgreSQL a Better Vector Database | Sep 2023

(Blog author here). Thanks for the question. In this case the index for both DiskANN and pgvector HNSW is small enough to fit in memory on the machine (8GB RAM), so there's no need to touch the SSD. We plan to test on a config where the index size is larger than memory (we couldn't this time due to limitations in ann-benchmarks [0], the tool we use).

To your question about RAM usage, we provide a graph of index size. When enabling PQ, our new index is 10x smaller than pgvector HNSW. We don't have numbers for HNSWPQ in FAISS yet.

[0]: https://github.com/erikbern/ann-benchmarks/

Comparison of Vector Databases | Jul 2023

Try the original data source

https://github.com/erikbern/ann-benchmarks

20x faster than pgvector: HNSW index in Postgres with pg_embedding | Jul 2023

A more important question is why Supabase went with one of the slowest [0] implementations instead of building a better plugin, especially since you are a VC-funded company that's hardly lacking in cash. Simply gluing together a few random open source extensions is not great for developer experience.

[0] https://github.com/erikbern/ann-benchmarks

Faiss: A library for efficient similarity search | Mar 2023

Check out https://github.com/erikbern/ann-benchmarks for some benchmarks on some of the different ANN libraries out there. I'd be interested in hearing others' experiences using these libraries in production.

Vector search just got up to 10x faster and vertically scalable | Aug 2022

They used to publish some benchmarks on their site, but seem to have removed them. You can find them on archive.org[1]. I guess it is understandable, since vector search performance is pretty unpredictable, and depends on a lot of factors. If their target market is people who want vector search without needing to read a bunch of papers first, benchmarks might be more confusing than they are helpful.

edit: While I do think it's understandable, it's not great for transparency. Even if they don't want to open-source their index, I would admire it if they were willing to give ann-benchmarks[2] an API key to publish some independent results.

Disclaimer: I work on vector search at a different company

[1] https://web.archive.org/web/20210227105542/https://www.pinec...

[2] https://github.com/erikbern/ann-benchmarks