What does HackerNews think of pgvector?
Open-source vector similarity search for Postgres
https://github.com/pgvector/pgvector
They have a WHERE clause built in, no? And then you can additionally sort by semantic similarity? Or is this a bit different from that...
Here [1] is an article that shows how to do that with the modern Postgres extension pgvector [2].
The downside is the reliance on an ML model to generate the embeddings, either from OpenAI, as mentioned in the Supabase article, or from an open-source library.
[1] https://supabase.com/blog/openai-embeddings-postgres-vector [2] https://github.com/pgvector/pgvector
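The filter-then-rank pattern asked about above can be sketched as one query: an ordinary WHERE clause narrows the rows, and pgvector's `<=>` (cosine distance) operator orders what remains. This is only a sketch under assumptions: the `documents` table and its `id`, `content`, `category`, and `embedding` columns are hypothetical, and the `%s` placeholders assume a psycopg-style driver, which is not shown here.

```python
from typing import Sequence, Tuple


def to_vector_literal(vec: Sequence[float]) -> str:
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


# Hypothetical schema: documents(id, content, category, embedding vector(n)).
# `<=>` is pgvector's cosine distance operator; smaller means more similar.
FILTERED_SEARCH_SQL = (
    "SELECT id, content "
    "FROM documents "
    "WHERE category = %s "
    "ORDER BY embedding <=> %s::vector "
    "LIMIT %s"
)


def build_filtered_search(
    category: str, query_embedding: Sequence[float], limit: int = 5
) -> Tuple[str, tuple]:
    """Return the SQL text and its parameters; executing it is left to the caller."""
    params = (category, to_vector_literal(query_embedding), limit)
    return FILTERED_SEARCH_SQL, params
```

With a driver such as psycopg, the returned pair would typically be passed to `cursor.execute(sql, params)`. The query embedding has to come from the same model that produced the stored embeddings.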
For storing the vectors and doing the vector search: https://github.com/pgvector/pgvector
The `ankane/pgvector` Docker image is a drop-in replacement for the postgres image, so you can fire this up with Docker very quickly.
It's a normal postgres db with a vector datatype. It can index the vectors and allows efficient retrieval. Both AWS RDS and Google Cloud now support this in their managed Postgres offerings, so postgres+pgvector is a viable managed production vectordb solution.
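As a rough illustration of "a vector datatype" plus an index, the setup usually comes down to three statements. The sketch below only builds the SQL text. The table name, column names, dimension, and `lists` value are placeholder assumptions, and running the statements needs a Postgres connection with pgvector installed, which is not shown.

```python
from typing import List


def pgvector_setup_statements(
    table: str = "items", dims: int = 1536, lists: int = 100
) -> List[str]:
    """Build the SQL needed to enable pgvector, create a table, and index it.

    `table`, `dims`, and `lists` are illustrative defaults, not recommendations.
    The table name is interpolated directly, so it must not be untrusted input.
    """
    if dims <= 0 or lists <= 0:
        raise ValueError("dims and lists must be positive")
    return [
        # Enables the `vector` type and its distance operators.
        "CREATE EXTENSION IF NOT EXISTS vector",
        # A normal table with one extra column holding the embedding.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id bigserial PRIMARY KEY, content text, embedding vector({dims}))",
        # Approximate nearest-neighbor index for cosine distance.
        f"CREATE INDEX ON {table} USING ivfflat "
        f"(embedding vector_cosine_ops) WITH (lists = {lists})",
    ]
```

The operator class in the index should match the distance operator used in queries (`vector_cosine_ops` pairs with `<=>`), otherwise the index will not be used.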
> Also, how granular should the text chunks be?
That depends on the use case, the size of your corpus, the context length of the model you are using, and how much money you are willing to spend.
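One common way to make granularity tunable is fixed-size chunks with some overlap, so that a sentence cut at a boundary still appears whole in a neighboring chunk. A minimal sketch, assuming whitespace-separated words as the unit. The function name and default sizes are illustrative and not taken from any library.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of up to `chunk_size` words, with `overlap` words
    shared between consecutive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks give more precise matches but mean more embeddings to compute and store. Larger chunks carry more context per match but use up more of the model's context window.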
> Has anyone been able to achieve reliable results from these? Preferably w/o using Langchain.
Definitely. We use postgres+pgvector with PHP.
(1) https://github.com/pgvector/pgvector
(2) https://www.maths.tcd.ie/pub/HistMath/People/Hamilton/OnQuat...
(3) https://www.fda.gov/about-fda/changes-science-law-and-regula...
https://www.postgresql.org/docs/14/arrays.html
The full-text functionality kicks ass
https://www.postgresql.org/docs/14/textsearch.html
It can query JSON and XML documents directly
https://www.postgresql.org/docs/14/datatype-json.html https://www.postgresql.org/docs/14/functions-xml.html
It supports stored procedures
https://www.postgresql.org/docs/14/plpgsql.html
The extension mechanisms are very powerful. If you are interested in doing nearest-neighbor vector search like Pinecone or FAISS (super hot today), you can install
https://github.com/pgvector/pgvector
Adding it all up, pgsql has a lot of the functionality you'd expect in a database like Oracle but it is also a product engineering managers love because it is highly reliable and easy to maintain.
On pgvector, I tried to use LangChain's class (https://python.langchain.com/en/latest/modules/indexes/vecto...), but it was highly opinionated, and it didn't make sense to subclass it or implement its interfaces, so in this particular project I did it myself.
As part of implementing with SQLModel I absolutely leaned on https://github.com/pgvector/pgvector :)
Thanks for the observation.
Only a sucker being forced to by their investors would use pinecone.
Also: vector database shilling on HN is getting out of hand; multiple companies are literally plugging every mention on the radar, some actively begging for upvotes. Looking at it all makes you really appreciate pgvector [1], to the point where you would rather buy 3.2 TB of high-bandwidth NVMe and dedicate it to a large IVF index than ever have to deal with all of this "purpose-built vector database" bullshit.
If anyone from AWS/Google/Azure is listening, please add pgvector [1] into your managed Postgres offerings!
Edit: This, btw, is also why I think this popped up on the Hacker News front page a short while ago: https://github.com/pgvector/pgvector
Also special shoutout to pgvector (https://github.com/pgvector/pgvector), which is used to store all of the embeddings.
The author, Greg [0], wanted to use pgvector in a Postgres service, so he created a PR [1] in our Postgres repo. He then reached out and we decided it would be fun to collaborate on a project together, so he helped us build a "ChatGPT" interface for the Supabase docs (which we will release tomorrow).
This article explains all the steps you'd take to implement the same functionality yourself.
I want to give a shout-out to pgvector too, it's a great extension [2]
[0] Greg: https://twitter.com/ggrdson
[1] pgvector PR: https://github.com/supabase/postgres/pull/472
[2] pgvector: https://github.com/pgvector/pgvector