Free idea for GitHub: a huge bit of missing context for the model right now is the last few edits made by the user. If you move your cursor to a different part of a long file, Copilot immediately forgets about the part you just left. If it knew your last few edits, it would be able to make much more intelligent suggestions based on the task you're working on, rather than just the current cursor position.

Not sure how easy it would be to make work. Code edit data is not that prevalent. The best source I can think of is GitHub commit diffs. That's one place where Repl.it has a big advantage, as it has live editing data from its users.

modeless

They could start by simply including the code around previous cursor positions as additional context the same way they do with code from other files. Nothing specific to the edits themselves. That alone would help a lot I think. Maybe they already do but I don't think so based on the behavior I see, and this article doesn't mention anything like that.
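To make that concrete, here is a rough Python sketch of what "include the code around previous cursor positions" could look like. Everything in it is an assumption for illustration: `build_prompt`, the `radius` and `prefix_lines` parameters, and the comment-style headers are made-up names, and this is not how Copilot actually assembles its prompt.

```python
from collections import deque


def build_prompt(lines, cursor, history, radius=2, prefix_lines=20):
    """Prepend code around earlier cursor positions to the local prefix.

    lines        -- the file contents as a list of lines
    cursor       -- current 0-based line index (completion happens here)
    history      -- earlier 0-based cursor line indices, oldest first
    radius       -- lines kept on each side of an earlier position
    prefix_lines -- lines kept immediately before the cursor
    """
    prefix_start = max(0, cursor - prefix_lines)
    parts = []
    for pos in history:
        # Skip positions that the local prefix already covers.
        if prefix_start <= pos < cursor:
            continue
        start = max(0, pos - radius)
        end = min(len(lines), pos + radius + 1)
        parts.append(f"# Recently visited: lines {start + 1}-{end}")
        parts.extend(lines[start:end])
    parts.append("# Current position:")
    parts.extend(lines[prefix_start:cursor])
    return "\n".join(parts)


# Hypothetical usage: keep only the three most recent cursor positions.
history = deque(maxlen=3)
history.append(5)
history.append(88)
file_lines = [f"line{i}" for i in range(100)]
prompt = build_prompt(file_lines, cursor=90, history=history, radius=1, prefix_lines=5)
```

A bounded `deque` keeps the history cheap, and skipping positions already inside the prefix avoids spending context on duplicated lines.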

But Copilot is getting tons of live editing data from its users too, and soon should be able to construct a nice dataset of edits. There's no way they aren't already doing that.

letitgo12345

You would be taking snippets of code (that are potentially unparseable), concatenating them together, and putting the result in the prompt. The issue is that it would be a kind of prompt the model has never seen in its training data. Maybe it would work with some clever zero-shot prompt. But if you look at the fill-in-the-middle paper from OpenAI, for example, they specifically pretrain the model on that kind of data to make it work.
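For anyone unfamiliar with the fill-in-the-middle setup, the training-data transformation can be sketched roughly like this: cut a document into prefix, middle, and suffix, then reorder it so the middle comes last. The sentinel strings below are placeholders I made up, not the real tokenizer tokens, and splitting at uniformly random character offsets is a simplifying assumption.

```python
import random

# Placeholder sentinel strings (assumed for illustration, not real tokens).
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def to_fim_example(document, rng):
    """Split `document` at two random character offsets and emit it in
    prefix-suffix-middle order, so the model learns to generate the middle."""
    a, b = sorted(rng.randint(0, len(document)) for _ in range(2))
    prefix, middle, suffix = document[:a], document[a:b], document[b:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def from_fim_example(example):
    """Invert the transformation (assumes the document contains no sentinels)."""
    rest = example[len(PRE):]
    prefix, rest = rest.split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix


doc = "def add(a, b):\n    return a + b\n"
example = to_fim_example(doc, random.Random(0))
```

The point is that the model sees this rearranged layout during pretraining, which is exactly what an ad hoc "concatenate recent snippets" prompt would lack.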

The live data is gonna be useful though ya. Is Copilot allowed to use it though under ToS?

thakkarparth007

The "Privacy – Copilot for Individuals" section under https://github.com/features/copilot does say that Copilot collects code snippets if allowed by telemetry.

> User Engagement Data: When you use GitHub Copilot it will collect usage information about events generated when interacting with the IDE or editor. These events include user edit actions like completions accepted and dismissed, and error and general usage data to identify metrics like latency and features engagement. This information may include personal data, such as pseudonymous identifiers.

> Code Snippets Data: Depending on your preferred telemetry settings, GitHub Copilot may also collect and retain the following, collectively referred to as “code snippets”: source code that you are editing, related files and other files open in the same IDE or editor, URLs of repositories and files path.