I like the integrated telemetry and caching. It looks like the prompt formatter adds text to the system instruction - have you tried using function calling to get JSON responses? Does the retry mechanism validate against the zod schema?
I've had a bit of trouble getting function calling to work for cases that aren't just extracting data from the input. The format was correct, but it was harder to get accurate data when the task wasn't a simple extraction.

Hopefully OpenAI and others will offer something like https://github.com/guidance-ai/guidance at some point to guarantee overall output structure.

Failed validations will retry, but from what I've seen, JSON Schema plus generated JSON examples is decently reliable in practice with gpt-3.5-turbo and extremely reliable with gpt-4.