Reminder that GitHub has blocked Git commit collisions since 2017, and as far as anybody is aware hasn't seen one in the wild.

https://github.blog/2017-03-20-sha-1-collision-detection-on-...

Random collisions in 160-bit space are incredibly unlikely. This is talking about intentional collision, and means that it's entirely feasible for someone with significant compute power to create a git commit that has the exact same hash as another git commit. This could allow someone to silently modify a git commit history to e.g. inject malware or a known "bug" into a piece of software. The modified repository would be indistinguishable if you're only using git hashes.
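To put a rough number on "incredibly unlikely", here is a minimal sketch of the standard birthday-bound approximation for a 160-bit hash. The function name and the object counts are made up for illustration, and it assumes SHA-1 output behaves like a uniform random value, which is the "random collision" case only, not the intentional one.

```python
import math

def random_collision_probability(num_objects: int, hash_bits: int = 160) -> float:
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1)/2 / 2^bits)."""
    pairs = num_objects * (num_objects - 1) / 2
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-pairs / 2.0 ** hash_bits)

# Hypothetical object counts, chosen only for illustration.
for n in (10**6, 10**9, 10**12):
    print(f"{n:>18,} objects -> p ~ {random_collision_probability(n):.3e}")
```

Under this approximation you need on the order of 2^80 objects before a random collision becomes likely, which is why the accidental case is not the concern here.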

Git uses SHA-1 for unique identifiers, which is technically okay as long as they are not considered secure. If git were designed today it would probably use SHA-2 or SHA-3, but it's probably not going to change due to the massive install base.
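For anyone who hasn't looked at how those identifiers are built: a blob's ID is the SHA-1 of a short header followed by the file contents. A small sketch (the function name `git_blob_id` is mine, not git's):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the object ID git assigns to a blob with this content."""
    # Git hashes "blob <size in bytes>\0" followed by the raw content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

This should match what `git hash-object` reports for the same bytes. Trees and commits are hashed the same way with a different type in the header, and they refer to their children by these IDs, which is why one colliding object could be swapped in without changing anything above it.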

Edit: anyone know if git's PGP signing feature creates a larger hash of the data in the repo? If not, maybe git should add a feature where signing is done after computing a larger hash, such as SHA-512, over all commits since the previous signature.
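A rough sketch of what that proposal might look like, purely as an illustration: this is not an existing git feature, and `range_digest` and its input format are hypothetical. It assumes you already have the raw bytes of every object to be covered. Note that commit objects only reference their trees by SHA-1, so the digest would need to cover the trees and blobs as well, not just the commits.

```python
import hashlib
from typing import Iterable

def range_digest(raw_objects: Iterable[bytes]) -> str:
    """SHA-512 over a sequence of raw objects (hypothetical scheme)."""
    h = hashlib.sha512()
    for raw in raw_objects:
        # Length-prefix each object so object boundaries cannot be shifted
        # without changing the digest.
        h.update(len(raw).to_bytes(8, "big"))
        h.update(raw)
    return h.hexdigest()

# Hypothetical stand-ins for raw object bytes since the previous signature.
objects = [b"commit one", b"tree one", b"blob one"]
print(range_digest(objects))
```

The resulting digest is what would be signed, so the signature no longer depends on SHA-1 alone.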

The defence used by GitHub specifically defends against these intentional collisions, not some mirage of random collisions.

Basically you collide a hash like SHA-1 or MD5 by getting it into a state where transitions don't twiddle as many bits, and then smashing the remaining bits by brute force trial. But, such states are weird so from inside the hash algorithm you can notice "Huh, this is that weird state I care about" and flag that at a cost of making the algorithm a little slower. The tweaked SHA1 code is publicly available.

If you're thinking "Oh! I should rip out our safe SHA256 code and use this unsafe but retroactively safer SHA1": no. Don't do that. SHA-256 is safer and faster. This is an emergency patch for people for whom apparently 20 years' notice wasn't enough warning.

In theory the known way to do this isn't the only way. But we have reassuring evidence from MD5: independent parties (probably the NSA), who have every reason to attack the hash a different way to avoid detection, still trigger the same weird states, even though they're spending an eye-watering sum of money to break the hash themselves rather than copy-pasting a result from a published paper.

So, if I understand correctly: the patched SHA-1 code generates the same hash, but it has checks on the internal state so that it will flag inputs which are likely to be intentionally colliding?