I like the integrated telemetry and caching. It looks like the prompt formatter adds text to the system instruction - have you tried using function calling to get JSON responses? Does the retry mechanism validate against the zod schema?
I've had a bit of trouble getting function calling to work for cases that aren't just extracting data from the input. The format was correct, but it was harder to get accurate data when the task wasn't a simple extraction.

Hopefully OpenAI and others will offer something like https://github.com/guidance-ai/guidance at some point to guarantee overall output structure.

Failed validations will retry, but from what I've seen, JSON Schema plus generated JSON examples is decently reliable in practice with gpt-3.5-turbo and extremely reliable with gpt-4.