Credit goes to Chris Lee: https://www.chrisleeportfolio.com

Here's the explanation shown when you click “Wat this?”:

> This is a Streamlit app I prototyped for performing semantic search on the King James Bible. It conducts full text search as well as semantic search, which is useful for surfacing passages that are similar in meaning to the query, even if the passages don't explicitly contain the query keyword(s). Suppose you wanted to bring up all verses that reference the infamous snake that tempted Eve. In a traditional keyword search system, searching for 'snake' wouldn't yield any results because the KJV uses the term 'serpent'. A semantic search system would take that 'snake' query and retrieve the relevant verses that contain 'serpent' as well as similar verses like ones about reptiles.

> Under the hood, I've generated vector embeddings of every verse in the Bible using SBERT (https://www.sbert.net/), and stored those embeddings in a vector database called Pinecone (https://www.pinecone.io). Every time you submit a query, it's converted to its vector representation using SBERT. That query vector is then sent to Pinecone, which performs an Approximate Nearest Neighbor (https://www.pinecone.io/learn/what-is-similarity-search/) search, retrieving the top n verses that are the most semantically similar to our query. The verses returned are ranked in order of most to least similar.
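
For anyone who wants to see the idea in miniature, here is a rough sketch of the flow that explanation describes. It is not the demo's actual code. The 3-dimensional vectors are hand-made stand-ins for SBERT embeddings, the `QUERY_VECTOR` for "snake" is likewise invented, and the exact brute-force ranking stands in for Pinecone's approximate nearest neighbor search. The function names are hypothetical.

```python
import math

# Toy data: verse reference -> (text, embedding).
# ASSUMPTION: these 3-d vectors are made up for illustration; a real
# system would compute high-dimensional embeddings with an SBERT model.
VERSES = {
    "Genesis 3:1": ("Now the serpent was more subtil than any beast of the field", [0.9, 0.1, 0.0]),
    "Numbers 21:9": ("And Moses made a serpent of brass, and put it upon a pole", [0.7, 0.3, 0.1]),
    "Genesis 1:1": ("In the beginning God created the heaven and the earth", [0.0, 0.2, 0.9]),
    "Psalm 23:1": ("The LORD is my shepherd; I shall not want", [0.1, 0.9, 0.2]),
}

# ASSUMPTION: pretend this is what the embedding model returns for "snake".
QUERY_VECTOR = [0.85, 0.2, 0.05]


def keyword_search(term):
    """Plain full-text search: return verses whose text contains the term."""
    term = term.lower()
    return [ref for ref, (text, _) in VERSES.items() if term in text.lower()]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, top_n):
    """Exact nearest-neighbor search: score every verse, keep the top n.

    A vector database would use an approximate index here instead of
    scanning every vector, trading a little accuracy for speed.
    """
    scored = [
        (ref, cosine_similarity(query_vector, embedding))
        for ref, (_, embedding) in VERSES.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

With this toy data, `keyword_search("snake")` finds nothing because no verse contains that word, while `semantic_search(QUERY_VECTOR, 2)` ranks the two "serpent" verses first because their vectors point in nearly the same direction as the query vector.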

(Full disclosure: I work for Pinecone, but I have no connection to this demo.)

How does Pinecone compare to pgvector?

https://github.com/pgvector/pgvector