> This is a serious bug that has so far resisted all attempts at a fix.

Does ChatGPT have a large test suite consisting of many input questions and expected responses that have to match?

Or an equivalent specification?

If not, there can be no "bug" in ChatGPT.

> Does ChatGPT have a large test suite consisting of a large number of input questions and expected responses that have to match?

They are trying to crowdsource this with OpenAI Evals: https://github.com/openai/evals

I'm sure they have a lot of internal benchmarks too, but of course they won't share them.
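For concreteness, here is a minimal sketch of what "input questions and expected responses that have to match" could look like as a test harness. This is not the OpenAI Evals API; `CASES`, `exact_match_accuracy`, and `stub_model` are hypothetical names made up for illustration, and the stub stands in for a real model call.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical test cases: (input question, expected response).
CASES = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]


def exact_match_accuracy(
    model: Callable[[str], str],
    cases: Sequence[Tuple[str, str]],
) -> float:
    """Return the fraction of cases where the model's answer equals the expected one."""
    if not cases:
        raise ValueError("no test cases")
    hits = sum(1 for question, expected in cases if model(question).strip() == expected)
    return hits / len(cases)


def stub_model(question: str) -> str:
    # Stand-in for a real model call (assumption); it only knows one answer.
    return {"What is 2 + 2?": "4"}.get(question, "I don't know")


print(exact_match_accuracy(stub_model, CASES))  # 0.5
```

Exact string matching is the strictest possible check; for free-form answers it would probably have to be relaxed, which is part of why a "spec" for a chatbot is hard to pin down.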

> If not, there can be no "bug" in ChatGPT.

I don't understand the objection. Are you claiming that bugs only exist if you have test cases for them?