What does HackerNews think of hnswlib?

Header-only C++/python library for fast approximate nearest neighbors

Language: C++

hnswlib (https://github.com/nmslib/hnswlib) is a strong alternative to faiss that I have enjoyed using for multiple projects. It is simple and has great performance on CPU.

After working through several projects that utilized local hnswlib and different databases for text and vector persistence, I integrated hnswlib with sqlite to create an embedded vector search engine that can easily scale up to millions of embeddings. For self-hosted situations of under 10M embeddings and less than insane throughput I think this combo is hard to beat.

https://github.com/jiggy-ai/hnsqlite

https://github.com/nmslib/hnswlib

Used it to index 40M text snippets in the legal domain. It supports adding items incrementally.

I love how it just works. You know, it doesn't ANNOY me or make a FAISS. ;-)

hnswlib is written in C++ and has Python bindings (you should be able to write your own for other languages). Faiss and Annoy (by Spotify) should provide similar functionality.

https://github.com/nmslib/hnswlib

hnswlib[1] allows for incremental updates. And I believe in terms of accuracy it stacks up fairly well against alternatives like FAISS or ScaNN.

[1]: https://github.com/nmslib/hnswlib/
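For anyone wondering what "accuracy" usually means for approximate nearest neighbor search: it is typically reported as recall, i.e. the fraction of the true nearest neighbors that the approximate index returns. Below is a minimal pure-Python sketch of that measurement. The helper names `exact_knn` and `recall_at_k` are made up for illustration, and the "approximate" result is only a stand-in (a brute-force search over half the data), not a call into hnswlib, FAISS, or ScaNN.

```python
import math
import random


def exact_knn(query, vectors, k):
    """Brute-force k nearest neighbours by Euclidean distance; returns indices."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbours that the approximate result contains."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
query = [0.5] * 8

# Ground truth from an exhaustive scan over all vectors.
truth = exact_knn(query, vectors, k=10)

# Stand-in for an approximate index (assumption): only the first half of
# the data is searched, so some true neighbours may be missed.
half_ids = exact_knn(query, vectors[:100], k=10)

print(recall_at_k(half_ids, truth))
```

With a real index you would swap the stand-in for the ids returned by the library's query call and keep the exhaustive scan as the ground truth.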

There's also hnswlib[1], which supposedly has lower memory requirements and allows adding new vectors to an existing index.

[1]: https://github.com/nmslib/hnswlib/

Great post, and my favorite CS topic. LSH is particularly relevant to machine learning because it is used to index embeddings, which are now omnipresent in ML (from word2vec to transformers to image and graph embeddings, etc.).

It is now supported in the ElasticSearch KNN index (they use HNSW, which you could loosely call a descendant of the original LSH).

Check out ANN benchmarks [0] for a comparison of LSH performance with other state-of-the-art methods like proximity graphs/HNSWLIB [2] and quantization/SCANN [1].

As an introduction, LSH (with MinHash) is also described in detail in the book "Mining of Massive Datasets", ch. 3, "Finding Similar Items", highly recommended [3].
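To give a feel for the MinHash idea, here is a rough pure-Python sketch: hash every shingle of a document under many seeded hash functions, keep the minimum per seed, and use the fraction of matching minima as an estimate of the Jaccard similarity of the two shingle sets. The function names, the 3-word shingles, and the use of seeded BLAKE2b as the hash family are my own assumptions for illustration; the book's presentation uses random permutations, and a real LSH index would additionally band the signatures into buckets.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of overlapping k-word shingles in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(token, seed):
    """64-bit hash of token under the hash function selected by seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """One minimum hash value per seeded hash function."""
    return [min(_hash(t, seed) for t in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


doc_a = shingles("the quick brown fox jumps over the lazy dog near the river bank")
doc_b = shingles("the quick brown fox jumps over the lazy cat near the river bank")

print("exact:   ", jaccard(doc_a, doc_b))
print("estimate:", estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b)))
```

The estimate gets tighter as `num_hashes` grows, at the cost of longer signatures.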

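If you want to play with LSH-style indexing, the Python "annoy" library is the best place to start [4] (note that it is built on random-projection trees rather than classic LSH hash tables).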

[0] https://github.com/erikbern/ann-benchmarks

[1] https://github.com/google-research/google-research/tree/mast...

[2] https://github.com/nmslib/hnswlib

[3] http://infolab.stanford.edu/~ullman/mmds

[4] https://github.com/spotify/annoy