What does HackerNews think of txtai?
💡 Build AI-powered semantic search applications
Disclaimer: I am the author of txtai
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling and retrieval augmented generation.
txtai adopts a local-first approach. A production-ready instance can run locally within a single Python process. It can also scale out when needed.
txtai can use Faiss, Hnswlib or Annoy as its vector index backend. This is relevant when comparing ANN-Benchmarks scores.
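For a sense of what that looks like, here's a minimal local sketch (the model name and the "backend" config key are illustrative; check the docs for current options):

from txtai.embeddings import Embeddings

# Build an in-process index; backend can be "faiss", "hnsw" or "annoy"
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "backend": "faiss"})
embeddings.index([(0, "vector search with SQL", None), (1, "graph networks", None)])

# Returns (id, score) tuples
print(embeddings.search("semantic search", 1))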
GitHub: https://github.com/neuml/txtai
Article: https://medium.com/neuml/introducing-txtai-the-all-in-one-em...
txtai (https://github.com/neuml/txtai) sets out to be an all-in-one embeddings database. This is more than just being a vector database with semantic search. It can embed text into vectors, run LLM workflows, and has components for sparse/keyword indexing and graph-based search. It also has a relational layer built in for metadata filtering.
txtai currently supports SQLite/DuckDB for relational data but can be extended. For example, relational data could be stored in Postgres, sparse/dense vectors in Elasticsearch/Opensearch and graph data in Neo4j.
I believe modular solutions like this, where internal components can be swapped in and out, are the best option, though as the author of txtai I'm a bit biased. This setup balances the scaling and reliability of existing solutions with being able to get started quickly on a POC to evaluate the use case.
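As a rough sketch of what that swapping looks like in config (the "content" values here assume the documented SQLite/DuckDB options):

from txtai.embeddings import Embeddings

# Default relational layer: SQLite
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})

# Same index, DuckDB as the relational layer instead
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": "duckdb"})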
This major release adds sparse, hybrid and subindexes to the embeddings interface. It also makes significant improvements to the LLM pipeline workflow.
While new vector databases are continually popping up, txtai has now been around for 3 years and keeps expanding what is possible. This release has a lot of important changes and is the base for a lot to come.
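A quick sketch of the new hybrid setup (config keys per the 6.0 docs; treat this as illustrative):

from txtai.embeddings import Embeddings

# hybrid=True combines sparse (keyword) and dense (vector) scoring
embeddings = Embeddings({"hybrid": True, "content": True})
embeddings.index([(0, "sparse, hybrid and subindexes", None)])
print(embeddings.search("hybrid search", 1))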
See links below for more.
GitHub: https://github.com/neuml/txtai
Release Notes: https://github.com/neuml/txtai/releases/tag/v6.0.0
Article: https://medium.com/neuml/whats-new-in-txtai-6-0-7d93eeedf804
The ability to build indexes on these JSON functions is important. Found this article to be a good reference: https://www.delphitools.info/2021/06/17/sqlite-as-a-no-sql-d...
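For illustration, a minimal standalone example of such an expression index using Python's bundled sqlite3 (requires a SQLite build with the JSON1 functions, which modern Python ships with):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs(data TEXT)")
# Expression index on a JSON field; queries filtering on that
# expression can then use the index instead of scanning
con.execute("CREATE INDEX idx_flag ON docs(json_extract(data, '$.flag'))")
con.execute("""INSERT INTO docs VALUES ('{"flag": 1}')""")
print(con.execute("SELECT data FROM docs WHERE json_extract(data, '$.flag') = 1").fetchall())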
paperai is a sub-project focused on processing medical/scientific papers. https://github.com/neuml/paperai
Disclaimer: I am the author of both
Two key optimizations in EVA's AI-centric query optimizer are:
- Caching: EVA automatically caches and reuses previous query results (especially model inference results), eliminating redundant computation and reducing query processing time.
- Predicate Reordering: EVA optimizes the order in which the query predicates are evaluated (e.g., runs the faster, more selective model first), leading to faster queries and lower inference costs.
Consider these two exploratory queries on a dataset of dog images:
-- Query 1: Find all images of black-colored dogs
SELECT id, bbox FROM dogs
JOIN LATERAL UNNEST(YoloV5(data)) AS Obj(label, bbox, score)
WHERE Obj.label = 'dog'
AND Color(Crop(data, bbox)) = 'black';
-- Query 2: Find all Great Danes that are black-colored
SELECT id, bbox FROM dogs
JOIN LATERAL UNNEST(YoloV5(data)) AS Obj(label, bbox, score)
WHERE Obj.label = 'dog'
AND DogBreedClassifier(Crop(data, bbox)) = 'great dane'
AND Color(Crop(data, bbox)) = 'black';
By reusing the results of the first query and reordering the predicates based on the available cached inference results, EVA runs the second query 10 times faster!

More generally, EVA's query optimizer factors in the dollar cost of running models for a given AI task (like a question-answering LLM). It picks the model pipeline with the lowest cost that satisfies the user's accuracy requirement.
Query optimization with a declarative query language is the crucial difference between EVA and the AI pipeline frameworks that inspired it, like LangChain and txtai [1]. We would love to hear the community's thoughts on the pros and cons of these two approaches.
For example: SELECT id, text, date FROM txtai WHERE similar('machine learning') AND date >= '2023-03-30'
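In Python that runs through the standard search call; a sketch assuming an index built with content storage enabled and a "date" field on the indexed records:

from txtai.embeddings import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})
embeddings.index([(0, {"text": "intro to machine learning", "date": "2023-04-01"}, None)])

# similar() drives the vector match, the rest filters on stored metadata
print(embeddings.search("SELECT id, text, date FROM txtai WHERE similar('machine learning') AND date >= '2023-03-30'"))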
GitHub: https://github.com/neuml/txtai
This article is a deep dive on how the index format works: https://neuml.hashnode.dev/anatomy-of-a-txtai-index
Read more: https://medium.com/neuml/neuspo-d42a6e33031
neuspo is powered by txtai (https://github.com/neuml/txtai)
GitHub: https://github.com/neuml/codequestion
Article: https://medium.com/neuml/find-answers-with-codequestion-2-0-...
txtai can build vector indexes with Faiss/HNSW/Annoy and supports running SQL statements against them. External vector databases can also be plugged in.
https://github.com/neuml/txtai https://github.com/kuprel/min-dalle
txtai workflows can be containerized and run as a cloud serverless function - https://neuml.github.io/txtai/cloud/
Medium article - https://medium.com/neuml/neuspo-d42a6e33031
Much of the logic in neuspo builds on txtai - https://github.com/neuml/txtai
All the source is Apache 2.0 - https://github.com/neuml/txtai