Slightly related: what's the difference in use cases between llama-index and LangChain? I know that LangChain can do retrieval using embeddings, and I suspect that the node-synthesis step is exclusive to llama-index, but I might be wrong.

Can someone more knowledgeable chime in?

Not sure about your interest/use case, but something that is designed for "documents in" -> "documents out" is here: https://github.com/marqo-ai/marqo. It does retrieval using embeddings, combines all the text splitting and inference operations, and can be easily deployed to production (it's designed for that, rather than as a pip-install library). It works across images and allows for multi-vector representations.
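
For anyone unfamiliar with what "retrieval using embeddings" means here, a minimal framework-free sketch: embed documents and queries as vectors, then rank documents by cosine similarity. The vectors below are hand-picked toy values standing in for real model outputs (a real system would use an embedding model), so this is just an illustration of the idea, not any library's API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings": hand-picked so the example is self-contained.
# In practice these come from an embedding model.
docs = {
    "llama-index focuses on indexing and querying documents": [0.9, 0.1, 0.2],
    "langchain chains LLM calls and tools together": [0.2, 0.9, 0.1],
    "marqo is an end-to-end vector search engine": [0.1, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    # Rank all documents by similarity to the query vector, return top k.
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend-embedding of a query about indexing documents.
print(retrieve([0.85, 0.15, 0.25], k=1))
```

The frameworks mentioned above differ mainly in what they wrap around this core loop (text splitting, synthesis over retrieved nodes, serving), not in the retrieval idea itself.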