Hello! This Firefox extension uses Bloom filters to show approximately how many points (if any) the current page scored on Hacker News. Clicking the extension navigates to the best discussion for the current page. It is great for finding interesting comments on pages you may not have known were submitted to HN. I have been using it for the past two months while building it, and it has been really fun to read insightful comments I otherwise never would have seen. It's also handy for deciding whether to submit a page, since it shows whether the page has been submitted before.

Rather than determining if the current site has been submitted by querying the Firebase/Algolia APIs with every page you visit, the extension contains regularly-updating Bloom filters for all submitted HN stories to preserve user privacy. A single C library underlies both the code that generates the Bloom filters and the extension itself (it is compiled to WebAssembly so the C functions can be called from JavaScript). I have included an architecture overview in the README to help prospective readers get through the important parts of the code, and I have tried to justify design decisions in extensive comments within the source.
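
To make the idea concrete, here is a minimal sketch of how a purely local lookup could work. It is not the extension's actual code (that is C compiled to WebAssembly); it is Python for brevity. The score thresholds, the filter parameters, and the names `BloomFilter`, `build_filters`, and `approximate_points` are all assumptions made up for illustration. It assumes one filter per score bucket, so the bucket whose filter matches gives an approximate score.

```python
import hashlib

# Hypothetical score buckets; the real thresholds are not stated in this post.
THRESHOLDS = [0, 10, 100]


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_filters(stories, num_bits=1 << 16, num_hashes=7):
    """Put each (url, points) pair into the highest bucket it qualifies for."""
    filters = {t: BloomFilter(num_bits, num_hashes) for t in THRESHOLDS}
    for url, points in stories:
        bucket = max(t for t in THRESHOLDS if points >= t)
        filters[bucket].add(url)
    return filters


def approximate_points(filters, url):
    """Return the highest matching bucket, or None if no filter matches."""
    for threshold in sorted(filters, reverse=True):
        if url in filters[threshold]:
            return threshold
    return None
```

The lookup never touches the network, which is where the privacy benefit comes from. The trade-off is that a Bloom filter can return false positives, so a page might occasionally be reported as submitted when it was not.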

I am happy to answer any questions! I'm also always looking for constructive criticism.

> Rather than determining if the current site has been submitted by querying the Firebase/Algolia APIs with every page you visit, the extension contains regularly-updating Bloom filters for all submitted HN stories to preserve user privacy.

Nice!

I built a Pi-hole-esque stub DNS resolver that uses Bloom Filters generated from hosts files (60 MiB, 5M entries --> 2 MiB with 1% false positives), and it worked like a charm. At some point, I also looked into Xor Filters, which are apparently even lighter and faster, but couldn't find a JavaScript implementation of them [0].

I, however, stopped using Bloom Filters because their immutability meant rebuilding them over and over again, which was a pain. Inverted Bloom Filters [1] or Spectral Bloom Filters [2] might have been useful since they can be updated in place. Instead, I went for storing hostnames in a Finite State Automaton [3], which, while not as compact as a Bloom Filter, can be updated in place, is deterministic, and is faster to search. It's likely not a fit for your use case, however.

PinSketches, otoh, might be a fit for efficient set reconciliation [4] of the filters you're distributing.

[0] https://github.com/FastFilter/xorfilter#implementations-of-x...

[1] https://www.youtube.com/watch?v=eIs9nJ-JFvA

[2] https://pncnmnp.github.io/blogs/spectral-bloom-filters.html

[3] http://stevehanov.ca/blog/?id=115

[4] https://github.com/sipa/minisketch