What does HackerNews think of sha1collisiondetection?

Library and command line tool to detect SHA-1 collision in a file

Language: C

Git no longer uses SHA-1. It instead uses a variant called SHA-1DC that detects some known problems, and in those cases returns a different answer. More info: <https://github.com/cr-marcstevens/sha1collisiondetection>. Git switched to SHA-1DC in its version 2.13 release in 2017. It's a decent stopgap but not a great long-term solution.

There is also work to support SHA-256, though that seems to have stalled: https://lwn.net/Articles/898522/

The fundamental problem is that git developers assumed that hash algorithms would never be changed, and that was a ridiculous assumption. It's much wiser to implement crypto agility.
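As a rough illustration of what crypto agility can look like in practice, here is a minimal Python sketch that stores the algorithm name alongside each digest, so a later migration does not have to guess which hash produced a given identifier. The `tagged_digest` and `verify` helpers and the `algo:hex` format are hypothetical, chosen for illustration; this is not how git encodes object names.

```python
import hashlib
import hmac

# Hypothetical allow-list; a real system would decide which algorithms to accept.
SUPPORTED = {"sha1", "sha256"}


def tagged_digest(data: bytes, algo: str = "sha256") -> str:
    """Return an identifier of the form '<algo>:<hex digest>'."""
    if algo not in SUPPORTED:
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


def verify(data: bytes, identifier: str) -> bool:
    """Recompute the digest with the algorithm named in the identifier."""
    algo, _, expected = identifier.partition(":")
    if algo not in SUPPORTED:
        return False
    actual = hashlib.new(algo, data).hexdigest()
    return hmac.compare_digest(actual, expected)
```

Because the algorithm travels with the digest, old `sha1:` identifiers and new `sha256:` identifiers can coexist while data is migrated.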

For SHA-1, people made a system where you can detect the patterns that lead to a collision, and (for example) replace it with a different hash only for inputs that would be a problem: https://github.com/cr-marcstevens/sha1collisiondetection. I think git does this to eke more life out of SHA-1.

I imagine you could take a similar counter-cryptanalysis approach to MD5. (I am out of my depth here, so there could be reasons this doesn't work for MD5 that I'm unaware of.)

They're more likely to employ counter-cryptanalysis [1] in the meantime.

[1] https://github.com/cr-marcstevens/sha1collisiondetection

> it's very hard to make a "bad" variant of the code [...] looking like sane code

That is very hard, but not what was quoted above. The length has no part in it. The core part needed for the shattered collision attacks involves basically binary data.

    $ curl -s https://shattered.io/static/shattered-1.pdf | hexdump -C > s1
    $ curl -s https://shattered.io/static/shattered-2.pdf | hexdump -C > s2
    $ diff s1 s2
    13,20c13,20
    < 000000c0  73 46 dc 91 66 b6 7e 11  8f 02 9a b6 21 b2 56 0f  |sF..f.~.....!.V.|
    < 000000d0  f9 ca 67 cc a8 c7 f8 5b  a8 4c 79 03 0c 2b 3d e2  |..g....[.Ly..+=.|
    < 000000e0  18 f8 6d b3 a9 09 01 d5  df 45 c1 4f 26 fe df b3  |..m......E.O&...|
    < 000000f0  dc 38 e9 6a c2 2f e7 bd  72 8f 0e 45 bc e0 46 d2  |.8.j./..r..E..F.|
    < 00000100  3c 57 0f eb 14 13 98 bb  55 2e f5 a0 a8 2b e3 31  |<W......U....+.1|
    [...]
    ---
    > 000000c0  7f 46 dc 93 a6 b6 7e 01  3b 02 9a aa 1d b2 56 0b  |.F....~.;.....V.|
    > 000000d0  45 ca 67 d6 88 c7 f8 4b  8c 4c 79 1f e0 2b 3d f6  |E.g....K.Ly..+=.|
    > 000000e0  14 f8 6d b1 69 09 01 c5  6b 45 c1 53 0a fe df b7  |..m.i...kE.S....|
    > 000000f0  60 38 e9 72 72 2f e7 ad  72 8f 0e 49 04 e0 46 c2  |`8.rr/..r..I..F.|
    > 00000100  30 57 0f e9 d4 13 98 ab  e1 2e f5 bc 94 2b e3 35  |0W...........+.5|
    > 00000110  42 a4 80 2d 98 b5 d7 0f  2a 33 2e c3 7f ac 35 14  |B..-....*3....5.|
    > 00000120  e7 4d dc 0f 2c c1 a8 74  cd 0c 78 30 5a 21 56 64  |.M..,..t..x0Z!Vd|
    > 00000130  61 30 97 89 60 6b d0 bf  3f 98 cd a8 04 46 29 a1  |a0..`k..?....F).|

An ASCII-formatted file only has text data. Also, with the shattered attack you can't choose what the two versions should be, so you are required to cross-reference the different-looking binary data to turn some functionality on or off. So the attack is mostly interesting when you include binary data. With the chosen-prefix attack, you can have two arbitrary components, even textual ones, but they still have to be followed by such a binary component.
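The differing region in the hexdump above spans offsets 0xc0 through 0x13f, which is 128 bytes, or two 64-byte SHA-1 input blocks. A small Python sketch for locating which 64-byte blocks differ between two equal-length inputs follows; `differing_blocks` is a hypothetical helper, and the inputs in the usage lines are synthetic stand-ins rather than the real PDFs.

```python
from typing import List

BLOCK_SIZE = 64  # SHA-1 processes its input in 64-byte blocks


def differing_blocks(a: bytes, b: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indices of the fixed-size blocks in which a and b differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [
        i // block_size
        for i in range(0, len(a), block_size)
        if a[i:i + block_size] != b[i:i + block_size]
    ]


# Synthetic example: flip one bit at each end of the 0xc0..0x13f range.
original = bytes(320)
modified = bytearray(original)
modified[0xC0] ^= 1
modified[0x13F] ^= 1
print(differing_blocks(original, bytes(modified)))  # [3, 4]
```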

Also now git has collision detection code from sha1collisiondetection [1], making attacks even harder.

[1]: https://github.com/cr-marcstevens/sha1collisiondetection

No, you and joeyh are incorrect about the test (but correct about the result). As can be seen in the output, SHA1(bar) = f1d2d2f924e986ac86fdf7b36c94bcdf32beec15 but git_SHA1(bar) = 257cc5642cb1a054f08cc83f2d943e56fd3ebe99. Why is there a difference? Not because of hardened SHA1. Hardened SHA1 essentially always produces identical outputs to SHA1:

> git doesn't really use SHA-1 anymore, it uses Hardened-SHA-1 (they just so happen to produce the same outputs 99.99999999999...% of the time).[1]

https://stackoverflow.com/questions/10434326/hash-collision-...

There's essentially no chance that the string "foo\n" fell into that tiny probability of difference. The reason there's a difference is because before git hashes something, git will do various processing to it (maybe appending and prepending various things) and those things broke the carefully created collision. But a chosen-prefix attack might mean those various things can be accounted for, and a collision could still be found.
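To make the preprocessing concrete: git names a blob by hashing the header `blob <size>\0` followed by the content, not the raw bytes alone. A minimal Python sketch using plain SHA-1 from `hashlib` (not the collision-detecting variant) reproduces both values quoted above for the content `foo\n`; `git_blob_sha1` is a hypothetical helper name.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Hash content the way git names a blob: 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


content = b"foo\n"
print(hashlib.sha1(content).hexdigest())  # f1d2d2f924e986ac86fdf7b36c94bcdf32beec15
print(git_blob_sha1(content))             # 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

Since the header includes the content length, an attacker has to fix the object type and size before computing a collision, which is why the published PDFs do not collide as git blobs.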

So we need to directly run hardened SHA1 on the data, which I believe is located at https://github.com/cr-marcstevens/sha1collisiondetection

As seen in https://github.com/git/git/blob/master/sha1dc_git.c

So I tested that one:

    $ sha1collisiondetection-master/bin/sha1dcsum bar baz messageA messageB shattered-1.pdf shattered-2.pdf
    f1d2d2f924e986ac86fdf7b36c94bcdf32beec15  bar
    f1d2d2f924e986ac86fdf7b36c94bcdf32beec15  baz
    4f3d9be4a472c4dae83c6314aa6c36a064c1fd14 *coll* messageA
    9ed5d77a4f48be1dbf3e9e15650733eb850897f2 *coll* messageB
    16e96b70000dd1e7c85b8368ee197754400e58ec *coll* shattered-1.pdf
    e1761773e6a35916d99f891b77663e6405313587 *coll* shattered-2.pdf

So it does protect against the new attack.

The shattered prefix was chosen as well; see my other comment in the thread: https://news.ycombinator.com/item?id=21980759

The only thing that prefixing the length makes difficult is using the same prefix multiple times: you basically have to make up your mind about the type and length before mounting the shattered attack. Also, the prefix means you have to do your own shattered attack and can't use the PDFs that Google provided as proof of their project's success. The price tag for that seems to be 11k.

[1]: https://github.com/cr-marcstevens/sha1collisiondetection

> SHAtter was waived by many because the threat model didn't convincingly apply to them. Example: git.

Git quickly switched to the sha1collisiondetection library[1] by default after the SHAttered attack was published. It's a SHA-1 library written by the authors of the paper that describes the attack.

Edit: Marc Stevens saying that existing library will mitigate this new attack: https://twitter.com/realhashbreaker/status/11284190295369236...

1. https://github.com/cr-marcstevens/sha1collisiondetection

For those who are paranoid, but can't move away from SHA-1 for whatever reason, consider using SHA-1DC. It's compatible with SHA-1, but will barf on the known collision attack against SHA-1: https://github.com/cr-marcstevens/sha1collisiondetection

It's what Git uses by default. Of course, there's no guarantee that new SHA-1 attacks won't be discovered, but it's better than nothing.

It is possible; the researchers estimate the likelihood of a false positive at 2^-90 (which puts us back in "Sun engulfs the Earth" territory).
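For a sense of scale, here is a back-of-the-envelope calculation in Python. It assumes the 2^-90 bound applies to each hashed input, so a union bound gives the chance of any false positive across N inputs as at most N * 2^-90. The workload of 2^40 hashes is a made-up figure for illustration, and `any_false_positive_bound` is a hypothetical helper.

```python
from fractions import Fraction

FALSE_POSITIVE_BOUND = Fraction(1, 2**90)  # per-input bound quoted above


def any_false_positive_bound(num_inputs: int) -> Fraction:
    """Union bound on the probability of at least one false positive."""
    return min(Fraction(1), num_inputs * FALSE_POSITIVE_BOUND)


# Hypothetical workload: 2**40 (about a trillion) hashed inputs.
bound = any_false_positive_bound(2**40)
print(bound == Fraction(1, 2**50))  # True
```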

There are metrics that will alert GitHub's infrastructure team if a collision is found (to confirm that we aren't seeing any false positives). Those metrics were quietly shipped (without the matching "die") for a week before flipping the final switch.

If you want to know more about the patterns, see the sha1collisiondetection project:

https://github.com/cr-marcstevens/sha1collisiondetection

There's a research paper linked in the README.

The article links to this repo that actually does the work of finding possible collisions: https://github.com/cr-marcstevens/sha1collisiondetection

My understanding is that it runs a SHA-1 and examines the internal state of the digest along the way to see if it matches up with known vectors that could be manipulated to cause a collision.

Correct. He's talking about the automated method used on shattered.io to detect files which use the attack. See: https://github.com/cr-marcstevens/sha1collisiondetection

They're basically building that into git so that if this specific collision attack is ever used, git will notice and throw a warning/error.

https://github.com/cr-marcstevens/sha1collisiondetection implements a hash that is compatible with SHA-1 for all non-nefarious purposes, and has no known weaknesses.

Somebody already submitted a patch series to (optionally) use it in git in place of SHA-1:

https://www.spinics.net/lists/git/msg296714.html

I love the fact that there is a tool for detecting any collision using this algorithm: https://github.com/cr-marcstevens/sha1collisiondetection

and it's super effective: The possibility of false positives can be neglected as the probability is smaller than 2^-90.

It's also interesting that this attack is from the same author that detected that Flame (the nation-state virus) was signed using an unknown collision algorithm on MD5 (cited in the shattered paper introduction).