Can anyone please suggest a good stack for the following:
- calculating text embeddings using open-source/local methods (not OpenAI)
- storing them in a vector database; I'm confused by the myriad of options (ChromaDB, Pinecone, etc.)
- running vector similarity search using open-source/local methods.
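For the similarity-search piece, it may help to see how little is actually involved at small scale: a vector store is essentially a list of (id, vector) pairs plus a nearest-neighbor search, and brute-force cosine similarity is exactly what you'd replace with ChromaDB/FAISS once the collection grows. A minimal sketch (the toy 3-d vectors stand in for real embeddings, which you'd get from a local model such as sentence-transformers — that model choice is an assumption, not something from this post):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, index, top_k=2):
    """Brute-force similarity search over an in-memory index.

    index: list of (doc_id, vector) pairs. A real vector DB does the
    same thing, just with approximate-nearest-neighbor shortcuts.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]

# Toy 3-d "embeddings" for illustration only.
index = [
    ("doc1", [1.0, 0.0, 0.0]),
    ("doc2", [0.7, 0.7, 0.0]),
    ("doc3", [0.0, 1.0, 0.0]),
]
print(search([0.9, 0.1, 0.0], index, top_k=2))  # doc1 ranks first
```

Swapping this for a real store mostly means replacing `index` and `search` with a collection object; the embed-then-query flow stays the same.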
Also, how granular should the text chunks be? Too short and we'll end up with a huge database; too long and we'll probably miss relevant information buried inside some chunks.
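A common middle ground for exactly this trade-off is fixed-size chunks with overlap, so information that straddles a boundary still lands wholly inside at least one chunk. A sketch (the `chunk_size`/`overlap` values are illustrative placeholders, not recommendations from this thread):

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks of ~chunk_size words,
    with `overlap` words repeated between consecutive chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

In practice you'd tune `chunk_size` against your embedding model's input limit and evaluate retrieval quality on your own queries rather than trusting any fixed number.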
Has anyone been able to achieve reliable results with a stack like this? Preferably without using LangChain.
Also, here's a benchmark that lets you compare vector databases' performance through a user-friendly interface. It covers both cloud services and open-source options, and if you'd rather not run tests yourself, pre-run results on standard workloads are available too. Check it out here: VectorDBBench. https://github.com/zilliztech/VectorDBBench