What does HackerNews think of hnswlib?
Header-only C++/python library for fast approximate nearest neighbors
After working through several projects that used a local hnswlib index alongside various databases for text and vector persistence, I integrated hnswlib with SQLite to create an embedded vector search engine that can easily scale up to millions of embeddings. For self-hosted situations of under 10M embeddings and less than insane throughput, I think this combo is hard to beat.
Used it to index 40M text snippets in the legal domain. Allows incremental adding.
I love how it just works. You know, doesn’t ANNOY me or make a FAISS. ;-)
It is now supported in the Elasticsearch KNN index (they use HNSWLIB, but you can call it a descendant of the original LSH in a way).
Check out ANN benchmarks [0] for a comparison of LSH performance to other state-of-the-art methods like proximity graphs/HNSWLIB [1] and quantization/SCANN [2].
As an introduction, LSH (with MinHash) is also described in detail in the book "Mining of Massive Datasets", ch. 3, "Finding Similar Items"; highly recommended [3].
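For readers who have not met MinHash before, here is a rough standard-library sketch of the idea: shingle each text, keep the minimum hash per seeded hash function, and band the signature into LSH buckets. It is an illustration under stated assumptions, not code from the book; the function names, the character shingle size, the 128 hash functions, and the 32 bands are all arbitrary choices made here.

```python
import hashlib


def shingles(text, k=5):
    """Set of overlapping k-character substrings of the text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(signature, bands=32):
    """Split the signature into bands; each band is one bucket key."""
    rows = len(signature) // bands
    return {
        (b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)
    }


a = minhash_signature(shingles("the quick brown fox jumps over the lazy dog"))
b = minhash_signature(shingles("the quick brown fox jumped over a lazy dog"))
print(estimate_jaccard(a, b))
# Two documents become a candidate pair if they share at least one bucket.
print(bool(lsh_buckets(a) & lsh_buckets(b)))
```

More rows per band makes the bucket match stricter, while more bands makes it more permissive; tuning that trade-off is what turns MinHash signatures into an LSH index.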
If you want to play with LSH, the Python "annoy" library is the best place to start [4].
[0] https://github.com/erikbern/ann-benchmarks
[1] https://github.com/nmslib/hnswlib
[2] https://github.com/google-research/google-research/tree/mast...