What does HackerNews think of txtai?
💡 Build AI-powered semantic search applications
Disclaimer: I am the author of txtai
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling and retrieval augmented generation.
txtai adopts a local-first approach. A production-ready instance can run locally within a single Python process. It can also scale out when needed.
txtai can use Faiss, Hnswlib or Annoy as its vector index backend. This is relevant when comparing ANN-Benchmarks scores.
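For a sense of what that looks like, here's a minimal local sketch (the model name and the "backend" config key are illustrative; check the docs for current options):

from txtai.embeddings import Embeddings

# Build an in-process index; backend can be "faiss", "hnsw" or "annoy"
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "backend": "faiss"})
embeddings.index([(0, "vector search with SQL", None), (1, "graph networks", None)])

# Returns (id, score) tuples
print(embeddings.search("semantic search", 1))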
GitHub: https://github.com/neuml/txtai
Article: https://medium.com/neuml/introducing-txtai-the-all-in-one-em...
txtai (https://github.com/neuml/txtai) sets out to be an all-in-one embeddings database. This is more than just being a vector database with semantic search. It can embed text into vectors, run LLM workflows, and has components for sparse/keyword indexing and graph-based search. It also has a relational layer built in for metadata filtering.
txtai currently supports SQLite/DuckDB for relational data but can be extended. For example, relational data could be stored in Postgres, sparse/dense vectors in Elasticsearch/Opensearch and graph data in Neo4j.
I believe modular solutions like this, where internal components can be swapped in and out, are the best option, though as the author of txtai I'm a bit biased. This setup balances the scaling and reliability of existing solutions with being able to get started quickly on a POC to evaluate the use case.
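As a rough sketch of what that swapping looks like in config (the "content" values here assume the documented SQLite/DuckDB options):

from txtai.embeddings import Embeddings

# Default relational layer: SQLite
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})

# Same index, DuckDB as the relational layer instead
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": "duckdb"})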
This major release adds sparse, hybrid and subindexes to the embeddings interface. It also makes significant improvements to the LLM pipeline workflow.
While new vector databases are continually popping up, txtai has now been around for 3 years and keeps expanding what is possible. This release has a lot of important changes and is the base for a lot to come.
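A quick sketch of the new hybrid setup (config keys per the 6.0 docs; treat this as illustrative):

from txtai.embeddings import Embeddings

# hybrid=True combines sparse (keyword) and dense (vector) scoring
embeddings = Embeddings({"hybrid": True, "content": True})
embeddings.index([(0, "sparse, hybrid and subindexes", None)])
print(embeddings.search("hybrid search", 1))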
See links below for more.
GitHub: https://github.com/neuml/txtai
Release Notes: https://github.com/neuml/txtai/releases/tag/v6.0.0
Article: https://medium.com/neuml/whats-new-in-txtai-6-0-7d93eeedf804
The ability to build indexes on these JSON functions is important. Found this article to be a good reference: https://www.delphitools.info/2021/06/17/sqlite-as-a-no-sql-d...
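For illustration, a minimal standalone example of such an expression index using Python's bundled sqlite3 (requires a SQLite build with the JSON1 functions, which modern Python ships with):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs(data TEXT)")
# Expression index on a JSON field; queries filtering on that
# expression can then use the index instead of scanning
con.execute("CREATE INDEX idx_flag ON docs(json_extract(data, '$.flag'))")
con.execute("""INSERT INTO docs VALUES ('{"flag": 1}')""")
print(con.execute("SELECT data FROM docs WHERE json_extract(data, '$.flag') = 1").fetchall())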
paperai is a sub-project focused on processing medical/scientific papers. https://github.com/neuml/paperai
Disclaimer: I am the author of both
Two key optimizations in EVA's AI-centric query optimizer are:
- Caching: EVA automatically caches and reuses previous query results (especially model inference results), eliminating redundant computation and reducing query processing time.
- Predicate Reordering: EVA optimizes the order in which the query predicates are evaluated (e.g., runs the faster, more selective model first), leading to faster queries and lower inference costs.
Consider these two exploratory queries on a dataset of dog images:
-- Query 1: Find all images of black-colored dogs
SELECT id, bbox FROM dogs
JOIN LATERAL UNNEST(YoloV5(data)) AS Obj(label, bbox, score)
WHERE Obj.label = 'dog'
AND Color(Crop(data, bbox)) = 'black';
-- Query 2: Find all Great Danes that are black-colored
SELECT id, bbox FROM dogs
JOIN LATERAL UNNEST(YoloV5(data)) AS Obj(label, bbox, score)
WHERE Obj.label = 'dog'
AND DogBreedClassifier(Crop(data, bbox)) = 'great dane'
AND Color(Crop(data, bbox)) = 'black';
By reusing the results of the first query and reordering the predicates based on the available cached inference results, EVA runs the second query 10 times faster!

More generally, EVA's query optimizer factors in the dollar cost of running models for a given AI task (like a question-answering LLM). It picks the model pipeline with the lowest cost that satisfies the user's accuracy requirement.
Query optimization with a declarative query language is the crucial difference between EVA and the AI pipeline frameworks that inspired it, like LangChain and txtai [1]. We would love to hear the community's thoughts on the pros and cons of these two approaches.
For example: SELECT id, text, date FROM txtai WHERE similar('machine learning') AND date >= '2023-03-30'
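In Python that runs through the standard search call; a sketch assuming an index built with content storage enabled and a "date" field on the indexed records:

from txtai.embeddings import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})
embeddings.index([(0, {"text": "intro to machine learning", "date": "2023-04-01"}, None)])

# similar() drives the vector match, the rest filters on stored metadata
print(embeddings.search("SELECT id, text, date FROM txtai WHERE similar('machine learning') AND date >= '2023-03-30'"))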
GitHub: https://github.com/neuml/txtai
This article is a deep dive on how the index format works: https://neuml.hashnode.dev/anatomy-of-a-txtai-index
Read more: https://medium.com/neuml/neuspo-d42a6e33031
neuspo is powered by txtai (https://github.com/neuml/txtai)
GitHub: https://github.com/neuml/codequestion
Article: https://medium.com/neuml/find-answers-with-codequestion-2-0-...
txtai can build vector indexes with Faiss/HNSW/Annoy and supports running SQL statements against them. External vector databases can also be plugged in.
https://github.com/neuml/txtai https://github.com/kuprel/min-dalle
txtai workflows can be containerized and run as a cloud serverless function - https://neuml.github.io/txtai/cloud/
Medium article - https://medium.com/neuml/neuspo-d42a6e33031
Much of the logic in neuspo builds on txtai - https://github.com/neuml/txtai
All the source is Apache 2.0 - https://github.com/neuml/txtai