Lead engineer here; happy to answer any questions.
This looks really cool! I'm excited to see a production deployment of DiskANN.
According to the single-threaded QPS experiments, your DiskANN solution should clock in at about 4.5ms latency (1000ms/224QPS) whereas pgvector is about 5.8ms latency (1000ms/173QPS). How is that possible? My (very shallow) knowledge of DiskANN vs HNSW tells me that DiskANN should generally have higher latency than HNSW — DiskANN needs to touch the SSD while HNSW only touches RAM.
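The latency figures above come from inverting throughput. In a single-threaded run only one query is in flight at a time, so mean per-query latency is roughly 1000 ms divided by QPS. Here is a minimal sketch of that arithmetic. It assumes the reported QPS is a steady-state average with no client-side overhead, and it says nothing about tail (p99) latency. The function name `mean_latency_ms` is hypothetical; the QPS values are the ones quoted in the comment.

```python
def mean_latency_ms(qps: float) -> float:
    """Mean per-query latency for a single-threaded benchmark.

    Assumption: exactly one query is in flight at a time, so
    latency is the reciprocal of throughput (converted to ms).
    """
    return 1000.0 / qps


# QPS numbers as quoted in the comment above.
for name, qps in [("DiskANN-based index", 224), ("pgvector HNSW", 173)]:
    print(f"{name}: {mean_latency_ms(qps):.1f} ms")
```

This reproduces the roughly 4.5 ms vs 5.8 ms comparison. Because it is an average, it cannot show whether occasional SSD reads produce a longer latency tail for the disk-based index.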
Also, compared to pgvector and HNSWPQ in faiss, how much less RAM does your DiskANN-based solution use?
To your question about RAM usage, we provide a graph of index size. When enabling PQ, our new index is 10x smaller than pgvector HNSW. We don't have numbers for HNSWPQ in FAISS yet.