Lead engineer here, happy to answer any questions.

This looks really cool! I'm excited to see a production deployment of DiskANN.

According to the single-threaded QPS experiments, your DiskANN solution should clock in at about 4.5ms latency (1000ms/224QPS), whereas pgvector is at about 5.8ms (1000ms/173QPS). How is that possible? My (very shallow) knowledge of DiskANN vs HNSW tells me that DiskANN should generally have higher latency than HNSW, since DiskANN needs to touch the SSD while HNSW only touches RAM.
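For anyone checking the arithmetic: with a single thread and one query in flight at a time, mean latency is roughly the inverse of throughput. A minimal sketch of that conversion follows. The helper name `qps_to_latency_ms` is made up for illustration, and the only inputs are the two QPS figures quoted above.

```python
def qps_to_latency_ms(qps: float) -> float:
    """Approximate mean per-query latency in ms for a single-threaded run.

    Assumes one query is in flight at a time, so latency ~= 1 / throughput.
    """
    if qps <= 0:
        raise ValueError("qps must be positive")
    return 1000.0 / qps


# QPS figures quoted in the comment above
diskann_ms = qps_to_latency_ms(224)
pgvector_ms = qps_to_latency_ms(173)

print(f"DiskANN: {diskann_ms:.1f} ms, pgvector HNSW: {pgvector_ms:.1f} ms")
# prints "DiskANN: 4.5 ms, pgvector HNSW: 5.8 ms"
```

This only holds for the single-threaded numbers. Under concurrent load, QPS and latency are no longer simple inverses of each other.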

Also, compared to pgvector and HNSWPQ in faiss, how much less RAM does your DiskANN-based solution use?

(Blog author here). Thanks for the question. In this case the index for both DiskANN and pgvector HNSW is small enough to fit in memory on the machine (8GB RAM), so there's no need to touch the SSD. We plan to test a config where the index is larger than memory (we couldn't this time due to limitations in ANN benchmarks [0], the tool we use).

To your question about RAM usage, we provide a graph of index size. When enabling PQ, our new index is 10x smaller than pgvector HNSW. We don't have numbers for HNSWPQ in FAISS yet.

[0]: https://github.com/erikbern/ann-benchmarks/