Slightly related: what's the difference in use cases between llama-index and LangChain? I know that LangChain can do retrieval using embeddings, and I suspect that the node-synthesis step is exclusive to llama-index, but I might be wrong.

Can someone more knowledgeable chime in?

Not sure about your interest/use case, but something that is designed for "documents in" -> "documents out" is here: https://github.com/marqo-ai/marqo. It does retrieval using embeddings, combines all the text splitting and inference operations, and can be easily deployed to production (it's designed for that, rather than as a pip-install library). It works across images and allows for multi-vector representations.
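
For anyone unfamiliar with what "retrieval using embeddings" means here, a minimal framework-free sketch: embed documents and queries as vectors, then rank documents by cosine similarity. The vectors below are hand-picked toy values standing in for real model outputs (a real system would use an embedding model), so this is just an illustration of the idea, not any library's API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings": hand-picked so the example is self-contained.
# In practice these come from an embedding model.
docs = {
    "llama-index focuses on indexing and querying documents": [0.9, 0.1, 0.2],
    "langchain chains LLM calls and tools together": [0.2, 0.9, 0.1],
    "marqo is an end-to-end vector search engine": [0.1, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    # Rank all documents by similarity to the query vector, return top k.
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend-embedding of a query about indexing documents.
print(retrieve([0.85, 0.15, 0.25], k=1))
```

The frameworks mentioned above differ mainly in what they wrap around this core loop (text splitting, synthesis over retrieved nodes, serving), not in the retrieval idea itself.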