This sounds less like Embeddings as a Service and more like Semantic Search (which happens to use embeddings) as a Service.

Search is one use case we support, but you can perform a few other operations on your data, like clustering or fine-tuning. We're also working on a classification feature. Are there other async jobs you'd like to see?

The problem I'd like solved is that when I want to retrieve chunks of data for retrieval-augmented generation, it's challenging to optimize the choice of embedding model, chunking strategy, and overall retrieval algorithm. I'm not sure if that's the sort of problem you're focused on.
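To make the problem concrete: one way to compare chunking strategies is a small evaluation harness that chunks a document at different sizes, retrieves by cosine similarity against a query, and scores whether the chunk containing the known answer comes back. A minimal sketch, using a toy bag-of-words embedding as a stand-in for whatever real model you'd actually be comparing (the `embed` function and the whole harness are illustrative assumptions, not anyone's actual API):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding, normalized to unit length.
    # Stand-in for a real embedding model (assumption for illustration).
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}

def cosine(a, b):
    # Dot product of two sparse unit vectors.
    return sum(v * b.get(word, 0.0) for word, v in a.items())

def chunk(words, size):
    # Fixed-size, non-overlapping word chunks.
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def hit_rate(doc, queries, size, k=1):
    # Fraction of (query, answer) pairs where a top-k retrieved
    # chunk contains the answer string.
    chunks = chunk(doc.split(), size)
    vecs = [embed(c) for c in chunks]
    hits = 0
    for query, answer in queries:
        q = embed(query)
        top = sorted(range(len(chunks)),
                     key=lambda i: cosine(q, vecs[i]),
                     reverse=True)[:k]
        if any(answer in chunks[i] for i in top):
            hits += 1
    return hits / len(queries)

# Sweep chunk sizes to see which retrieves the answers best.
doc = "the cat sat on the mat . the dog ran in the park ."
queries = [("where did the cat sit", "mat"),
           ("where did the dog run", "park")]
for size in (3, 7, 12):
    print(size, hit_rate(doc, queries, size))
```

The same loop extends to sweeping embedding models and retrieval algorithms, which is exactly why doing it by hand for every combination gets tedious.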

If you want more options (chunking, models, and more), check out https://github.com/marqo-ai/marqo. There's also an example of RAG using context-aware trimming of text to fit context windows: https://github.com/marqo-ai/marqo/blob/mainline/examples/GPT...
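The context-aware trimming idea can be sketched independently of any particular library: given chunks already ranked by relevance, greedily pack the best ones into the prompt until a token budget is spent. This is a hedged illustration of the general technique, not Marqo's implementation; the word-count tokenizer is a placeholder assumption for a real one.

```python
def fit_context(ranked_chunks, budget, count_tokens=lambda t: len(t.split())):
    # Greedily keep the highest-ranked chunks that still fit in the
    # remaining token budget; skip chunks that would overflow it.
    # count_tokens defaults to whitespace word count (placeholder for
    # a real tokenizer).
    kept, used = [], 0
    for piece in ranked_chunks:
        cost = count_tokens(piece)
        if used + cost > budget:
            continue
        kept.append(piece)
        used += cost
    return "\n".join(kept)

# Most relevant chunks first; a 5-"token" budget keeps the first two.
print(fit_context(["a b c", "d e", "f g h i"], budget=5))
```

Skipping (rather than stopping at) an oversized chunk lets a smaller but still-relevant chunk further down the ranking use the leftover budget.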