By that logic, humans are just compression with extra steps.

There's a fairly simple technical fix for Codex/Copilot anyway: put a search engine on the back end, index the training data, and don't output anything that turns up in the index.

That feature already exists. You can turn it on here:

https://github.com/settings/copilot

More info:

We built a filter to help detect and suppress the rare instances where a GitHub Copilot suggestion contains code that resembles public code on GitHub. You have the choice to turn that filter on or off during setup. With the filter on, GitHub Copilot checks code suggestions with its surrounding code for matches or near matches (ignoring whitespace) against public code on GitHub of about 150 characters. If there is a match, the suggestion will not be shown to you. In addition, we have announced that we are building a feature that will provide a reference for suggestions that resemble public code on GitHub so that you can make a more informed decision about whether and how to use that code, as well as explore and learn how that code is used in other projects.

https://github.com/features/copilot#what-can-i-do-to-reduce-...
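
For anyone curious what a filter like the one quoted above might look like, here is a minimal sketch. It is not GitHub's implementation: the function names (`normalize`, `build_index`, `should_suppress`) are made up for illustration, the index is a plain in-memory set, and it only catches exact matches after stripping whitespace, not the "near matches" the quote mentions. The 150-character window is taken from the quoted description.

```python
import re

WINDOW = 150  # approximate match length from the quoted description


def normalize(code: str) -> str:
    """Drop all whitespace so formatting differences don't hide a match."""
    return re.sub(r"\s+", "", code)


def build_index(public_sources, window=WINDOW):
    """Collect every `window`-character slice of each normalized source.

    Hypothetical stand-in for the "search engine on the back end".
    """
    index = set()
    for source in public_sources:
        text = normalize(source)
        for i in range(len(text) - window + 1):
            index.add(text[i:i + window])
    return index


def should_suppress(context, suggestion, index, window=WINDOW):
    """Return True if the suggestion, joined to the code before it,
    contains a `window`-character run that appears in the index."""
    suggested = normalize(suggestion)
    if not suggested:
        return False
    # Keep at most window - 1 characters of context, so every slice
    # checked below includes at least one character of the suggestion.
    tail = normalize(context)[-(window - 1):] if window > 1 else ""
    text = tail + suggested
    for i in range(len(text) - window + 1):
        if text[i:i + window] in index:
            return True
    return False
```

You would build the index once from the indexed corpus and then call `should_suppress(code_before_cursor, candidate, index)` for each candidate. A real system would presumably store hashes of the slices rather than the slices themselves, and would need some kind of fuzzy matching to handle near matches; both are left out here to keep the idea visible.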