> We believe that no data has been lost, unless the [...] GitLab copy was the only one.
One difference between how GitLab and GitHub run their infrastructure is that GitLab doesn't keep reflogs, and uses git's default "gc" settings.
As a result, in many cases they won't have the data in question anymore[1]. Well, I don't know that 100% for sure, but it's the default configuration of their software, and I'm assuming they run it like that themselves.
GitHub, by contrast, does keep reflogs, and runs "git repack" with the "--keep-unreachable" option. They don't usually delete git data unless someone bothers to do it manually, and they usually have the data needed to reconstruct repositories as they were at any given point in time.
GitHub doesn't expose that to users in any way, although perhaps they'd take pity on some of their users after such an incident.
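For anyone curious what the "keep everything" side of that trade-off looks like with stock git, here is a rough sketch. It is not GitHub's or GitLab's actual setup, which I don't know; it just maps the reflog and "--keep-unreachable" behavior described above onto ordinary git options. The function names and the repository path are made up for illustration.

```python
import subprocess
from typing import List


def retention_commands(repo_dir: str) -> List[List[str]]:
    """Build git commands that make a repository hold on to old data.

    Hypothetical helper: an illustration of the settings discussed above,
    not any hosting site's real configuration.
    """
    git = ["git", "-C", repo_dir]
    return [
        # Bare repositories don't write reflogs by default; turn them on.
        git + ["config", "core.logAllRefUpdates", "always"],
        # Never expire reflog entries, reachable or not.
        git + ["config", "gc.reflogExpire", "never"],
        git + ["config", "gc.reflogExpireUnreachable", "never"],
        # Never prune loose unreachable objects during gc.
        git + ["config", "gc.pruneExpire", "never"],
        # Repack everything, carrying unreachable objects into the new pack.
        git + ["repack", "-a", "-d", "--keep-unreachable"],
    ]


def apply_retention(repo_dir: str) -> None:
    """Run the commands against an existing repository (requires git)."""
    for cmd in retention_commands(repo_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them; the path is a placeholder.
    for cmd in retention_commands("/srv/git/example.git"):
        print(" ".join(cmd))
```

With git's defaults instead, reflog entries expire after a while and gc eventually prunes unreachable objects, which is why force-pushed-over history can really be gone.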
This isn't a critique of GitLab, just trivia about the storage trade-offs different major Git hosting sites have made, which might be informative to some other people.
I'm surprised no major Git hosting site has opted to provide such a "we have a snapshot of every version ever" feature. People would probably pay for it. You could even let them pay after the fact for access to backups you'd already kept, if they screwed things up :)
[1] Well, maybe as disaster backups or something. But those are harder to access...
I'm surprised GitHub runs regular git. I'd always assumed they were emulating it, especially given the lag we've observed between the GitHub API and GitHub's git endpoints at $DAYJOB (update repo 1 via the API, update repo 2 via the API, then fetch repo 1 and repo 2 via git: we've had cases where the repo 2 update was visible but the repo 1 update was not).