> This is a serious bug that has so far resisted all attempts at a fix.
Does ChatGPT have a large test suite consisting of a large number of input questions and expected responses that have to match?
Or an equivalent specification?
If not, there can be no "bug" in ChatGPT.
>Does ChatGPT have a large test suite consisting of a large number of input questions and expected responses that have to match?
They are trying to crowdsource this with OpenAI evals. https://github.com/openai/evals
I'm sure they have a lot of internal benchmarks too, but of course they won't share them.
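For what it's worth, the "input question plus expected response" idea fits in a few lines. This is only a rough sketch: `ask_model`, `fake_model`, and the two cases are made up for illustration, and it does not use the openai/evals API.

```python
from typing import Callable, List, Tuple

# Hypothetical cases: (input question, expected response).
# Invented for this sketch, not taken from any real eval set.
CASES: List[Tuple[str, str]] = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

def exact_match_accuracy(ask_model: Callable[[str], str],
                         cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases where the answer equals the expected text."""
    hits = sum(ask_model(question).strip() == expected
               for question, expected in cases)
    return hits / len(cases)

# Stand-in "model" so the sketch runs without any API access (an assumption).
def fake_model(question: str) -> str:
    return {"What is 2 + 2?": "4"}.get(question, "I don't know")

print(exact_match_accuracy(fake_model, CASES))  # 0.5
```

Exact string matching is the crudest possible check. A model's output varies from run to run, so a real benchmark would probably report a pass rate over many samples rather than a strict pass/fail per question.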
>If not, there can be no "bug" in ChatGPT.
I don't understand the objection. Are you claiming that bugs only exist if you have test cases for them?