What does HackerNews think of marqo?

Tensor search for humans.

Language: Python

#70 in Hacktoberfest
Try this https://github.com/marqo-ai/marqo which handles all the chunking for you (and is configurable). Also handles chunking of images in an analogous way. This enables highlighting in longer docs and also for images in a single retrieval step.
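To make the highlighting idea concrete, here is a minimal sketch in plain Python. It is not Marqo's implementation: it assumes each document is stored as a list of chunks with precomputed vectors, and the toy 2-d vectors and the names `cosine` and `search_with_highlights` are hypothetical. Each document is scored by its best-matching chunk, and that chunk doubles as the highlight, so ranking and highlighting come out of the same pass.

```python
import math
from typing import Dict, List, Sequence, Tuple

Chunk = Tuple[str, Sequence[float]]  # (chunk text, chunk vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_with_highlights(
    query_vec: Sequence[float], docs: Dict[str, List[Chunk]]
) -> List[Tuple[str, float, str]]:
    """Rank documents by their best chunk and return that chunk as the highlight."""
    results = []
    for doc_id, chunks in docs.items():
        if not chunks:
            continue  # nothing to score for this document
        text, vec = max(chunks, key=lambda c: cosine(query_vec, c[1]))
        results.append((doc_id, cosine(query_vec, vec), text))
    # Highest-scoring document first.
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Toy data: vectors are made up for illustration only.
docs = {
    "d1": [("cats purr", [1.0, 0.0]), ("dogs bark", [0.0, 1.0])],
    "d2": [("fish swim", [0.6, 0.8])],
}
hits = search_with_highlights([0.0, 1.0], docs)
```

In a real system the vectors would come from an embedding model and the lookup from an approximate index rather than a full scan; the sketch only shows how chunk-level scores give you the highlight for free.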
As others have correctly pointed out, making a vector search or recommendation application requires a lot more than similarity alone. We have seen HNSW become commoditised, and the real value lies elsewhere. Just because a database has vector functionality doesn’t mean it will actually service anything beyond “hello world” type semantic search applications. IMHO these have questionable value, much like the simple Q&A RAG applications that have proliferated. The elephant in the room with these systems is that if you are relying on machine learning models to produce the vectors, you are going to need to invest heavily in the ML components of the system. Domain-specific models are a must if you want to be a serious contender to an existing search system, and all the usual considerations still apply regarding frequent retraining and monitoring of the models. Currently this is left as an exercise to the reader - and a very large one at that. We (https://github.com/marqo-ai/marqo, I am a co-founder) are investing heavily in making the ML production-worthy and in making continuous learning from feedback part of the system. There are lots of other things to think about too: how you represent documents with multiple vectors, multimodality, late interactions, the interplay between embedding quality and HNSW graph quality (i.e. recall), and much more.
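For anyone unsure what "recall" means for an approximate index such as HNSW, a small sketch may help. It assumes you already have the top results from the approximate index and from an exact brute-force search over the same vectors; the function name `recall_at_k` and the ID lists are hypothetical.

```python
from typing import Hashable, Sequence


def recall_at_k(
    approx_ids: Sequence[Hashable], exact_ids: Sequence[Hashable], k: int
) -> float:
    """Fraction of the exact top-k neighbours that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth & set(approx_ids[:k])
    return len(found) / len(truth)


# Made-up example: the approximate index missed one of the four true neighbours.
score = recall_at_k(approx_ids=[1, 2, 3, 9], exact_ids=[1, 2, 3, 4], k=4)
```

Note that this only measures how well the index reproduces exact nearest neighbours. Whether those neighbours are relevant at all depends on the embedding model, which is the interplay the comment points at.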
Marqo lets you use state-of-the-art e5 embeddings (which are significantly more performant in retrieval than the OpenAI embeddings), and will handle the embedding generation and retrieval on Lucene indexes: https://www.marqo.ai/

It's also available as open source: https://github.com/marqo-ai/marqo

Marqo is an end-to-end vector search engine that handles both embedding creation and retrieval: https://github.com/marqo-ai/marqo
Just adding here that if you want a vector DB that handles inference for you too, you can try Marqo https://github.com/marqo-ai/marqo
Marqo will generate embeddings for you as well as run on a mac/in the cloud. https://github.com/marqo-ai/marqo
You could use Marqo, it is a vector search engine that includes the text chunking, inference for calculating embeddings, vector storage, and vector search. You can pick from a heap of open-source models or bring your own fine-tuned ones. It all runs locally in docker https://github.com/marqo-ai/marqo
Marqo provides automatic, configurable chunking (for example with overlap) and lets you bring your own model or choose from a wide range of open-source models. I think e5-large would be a good one to try. https://github.com/marqo-ai/marqo
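As a rough illustration of what chunking with overlap means, here is a minimal word-window sketch in plain Python. It is not Marqo's code; the function name `chunk_words` and its parameters are hypothetical, and real systems often split on sentences or tokens instead of whitespace-separated words.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_words("a b c d e f g h", chunk_size=4, overlap=2)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.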
Someone from Marqo here - if you're looking for an end-to-end vector search DB that handles vector search and transformation you should check out marqo. https://github.com/marqo-ai/marqo
If anyone is looking for a vector search engine, see here https://github.com/marqo-ai/marqo. Has additional functionality to make vector search much easier.
Not sure which one, but if you are after a vector search engine (not just a database) then I can recommend this https://github.com/marqo-ai/marqo. Includes inference, transformations, schemas, multi-modal search, multi-modal queries, multi-modal representations, text chunking and more.
You should check out https://github.com/marqo-ai/marqo for an end-to-end vector search database with batteries included.

Disclaimer, I'm from the Marqo team.

Not sure of your interest/use case but something that is designed for "documents in" -> "documents out" is here https://github.com/marqo-ai/marqo. It does retrieval using embeddings and combines all the text splitting and inference operations and can be easily deployed to production (it's designed for that, not pip install). Works across images and allows for multi-vector representations.
Barely grasping but something to do with https://github.com/marqo-ai/marqo, which has come heavily recommended in some other threads.

> marqo: tensor search for humans

Hadn't heard of the thing they were putting their data into, Marqo, a "tensor search for humans", https://github.com/marqo-ai/marqo
Marqo.ai | Senior Engineer | Hybrid (Melbourne) | Full-Time | https://marqo.ai

At Marqo we’re building an open-source search engine that thinks like humans. Marqo's open-source tensor search engine uses machine learning models for search, improving relevance, and providing solutions to problems that were previously difficult or impossible to solve.

We are growing and looking for a senior engineer to spearhead the development of our open source tensor search database. We are looking for someone who has experience building distributed databases or building search experiences. We are willing to help the right candidate relocate.

Check out our repo here: https://github.com/marqo-ai/marqo Send your resume to [email protected]

A big difficulty in using vector DBs in production for things like embeddings or LLMs is that there is a lot that goes into converting and processing raw input into a vector form (think chunking, formatting, encoding, inference, metadata, etc). DBs like Pinecone just don't handle any of that, and therefore you have to build out large systems to do it yourself.

There are some platforms and open source tools that handle it end to end. https://github.com/marqo-ai/marqo is one example that is both open source and has a cloud offering.

Looks really interesting! Are you looking for more vector search integrations? We have one here https://github.com/marqo-ai/marqo which includes a lot of the transformation logic (including inference). If so, we can do a PR.
If you want some more options (chunking, models, and more) check here https://github.com/marqo-ai/marqo and an example for RAG using context-aware trimming of text to fit into context windows https://github.com/marqo-ai/marqo/blob/mainline/examples/GPT...
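One simple way to fit retrieved text into a context window is to pack the highest-scoring passages under a fixed budget. The sketch below is an assumption for illustration, not necessarily what the linked example does: the name `pack_context` is hypothetical, and word counts stand in for model tokens.

```python
from typing import List, Tuple


def pack_context(passages: List[Tuple[str, float]], max_words: int) -> List[str]:
    """Greedily keep the best-scoring passages whose total length fits the budget."""
    if max_words < 0:
        raise ValueError("max_words must be non-negative")
    kept = []
    used = 0
    # Visit passages from most to least relevant.
    for text, _score in sorted(passages, key=lambda p: p[1], reverse=True):
        length = len(text.split())
        if used + length > max_words:
            continue  # too long for the remaining budget, try a shorter one
        kept.append(text)
        used += length
    return kept


# Made-up passages and scores.
context = pack_context(
    [("a b c", 0.9), ("d e f g", 0.8), ("h i", 0.7)], max_words=5
)
```

A production version would count tokens with the target model's tokenizer and might trim inside a passage around its most relevant span instead of dropping it whole.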
Didn't even realise Milvus was so lacking. https://github.com/marqo-ai/marqo also has a hybrid approach. It's just a more complete/end-to-end platform than pinecone, so it really just depends on what you're building
Thanks for the link. Nice to see Marqo on there (disclaimer: I am a co-founder of Marqo). For anyone who is interested, it includes a really nice API for handling a lot of the manipulations and operations you want to do (adding, updating, patching documents, filtering, embedding only a subset of fields, multi-modal querying, multi-modal document representations) which are absent from vector DBs. It also takes care of inference https://github.com/marqo-ai/marqo
If you want an end-to-end solution for something like this, you should check out https://github.com/marqo-ai/marqo. It handles all of this and it's entirely open source too.
You forgot to mention https://github.com/marqo-ai/marqo. It's a tensor search platform and has had traditional search features since Day 1.
It also depends on how you process your documents to create embedding(s). The open-source tensor search project [marqo](https://github.com/marqo-ai/marqo) does a great job of dealing with these types of fixed attention window problems.
I'd take a look at the open source project https://github.com/marqo-ai/marqo instead. It does 1 and 2 out of the box. You can use CLIP or any model you want really.
You should try https://github.com/marqo-ai/marqo instead. It does all this and more, and is still under very active development.