What does HackerNews think of post-mortems?

A collection of postmortems. Sorry for the delay in merging PRs!

I did start a site where I was logging what I could at outagereports.net, but I let it lapse when cPanel fees shot up. Dan Luu's site referenced below is a good source [1]. I think there is also a big list on GitHub somewhere of all k8s-caused outages.

[1] https://github.com/danluu/post-mortems

This is a little more broad, beyond just cloud infra providers, but includes some of the kind of data you're looking for (post-mortems for outage events): https://github.com/danluu/post-mortems

This isn't a good example of an RCA. As other commenters have noted, it's outright lying about some issues during the incident, and using creative language to dance around other problems many people encountered.

If you want to dive into postmortems, there are some repos linking to other examples:

https://github.com/danluu/post-mortems

https://codeberg.org/hjacobs/kubernetes-failure-stories

Hi HN! Working on an incident management startup, I’ve become pretty obsessed with discovering and reading public postmortem documents after major incidents, so I built https://postmortem.io to better capture and organize the growing number of postmortems companies are publishing, in a familiar, Reddit-like format.

Besides being able to submit and discuss postmortems, posts are organized by tags that you can subscribe to so you can keep up to date about new postmortems you care about (either through a custom feed or a weekly email digest).

You can also submit your own postmortems as formatted text directly on the site, but admittedly this feature is pretty basic right now (I just started adding a few templates but would love feedback here and ideas for other templates).

I know there are already a few large postmortem lists[1] out there that are variously maintained, and HN itself already does a really good job of capturing discussion about postmortems for large enough incidents the moment they happen. But the lists tend to get outdated and lack discussion, and the HN stories get lost after a day or so, so I thought creating a separate place dedicated to these documents might be useful to others as more and more companies participate in the practice.

This is very much a v1 passion project (inspired by the feedback on talks I’ve given about our startup Kintaba), so I would love feedback or ideas!

[1] https://github.com/danluu/post-mortems - a commonly referenced existing list

The obvious choice is aviation -- there are thousands of commercial accident reports, many of which lead to process improvements for everyone else afterwards, and general aviation emergencies every day. YouTube especially is full of ATC audio combined with radar visuals and commentary for emergencies.

For tech, Dan Luu maintains a list of public post-mortems for tech company incidents: https://github.com/danluu/post-mortems

Blaming the intern instead of root causing it as a systemic failure just leads me to wonder if they will actually take the correct steps to prevent this from happening in the future.

Fwiw this is a good archive of mostly well done post mortems: https://github.com/danluu/post-mortems

Created an issue about a ToC, as it would be helpful when looking for things in that list.

But looks really good, thanks for creating and sharing it.

I like reading post-mortem posts about security incidents too. There's a repo on GitHub that I follow: https://github.com/danluu/post-mortems

A great source for what to do, what to avoid, etc. Not only for security.

Probably not what he's talking about but there's a nice collection here: https://github.com/danluu/post-mortems

The saying goes, "complex systems fail in complex ways". Check out some of the cloud provider postmortems here for a few fascinating and detailed examples: https://github.com/danluu/post-mortems

You could say certain failures only occur and cascade under Special Circumstances. :)

To me, it looks like just 'Your account details are safe.'

> Additionally, Dell cybersecurity measures are in place to limit the impact of any potential exposure. These include the hashing of our customers’ passwords and a mandatory Dell.com password reset.

Hashed, how? Still using MD5? Is there even a salt?
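
To make the salt question concrete, here is a minimal sketch of the difference between an unsalted MD5 digest and a salted, iterated hash. Nothing here reflects what Dell actually does; the function names (`unsalted_md5`, `hash_password`, `verify_password`) and the iteration count are hypothetical choices for illustration, using only Python's standard library.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor only (an assumption, not a recommendation).
ITERATIONS = 200_000


def unsalted_md5(password: str) -> str:
    """Weak scheme: identical passwords always produce identical digests."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Salted PBKDF2-HMAC-SHA256; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    # Two users with the same password: the MD5 digests match, the salted ones do not.
    print(unsalted_md5("hunter2") == unsalted_md5("hunter2"))
    salt_a, digest_a = hash_password("hunter2")
    salt_b, digest_b = hash_password("hunter2")
    print(digest_a == digest_b)
    print(verify_password("hunter2", salt_a, digest_a))
```

With the unsalted scheme, one precomputed table cracks every account sharing a password; with a per-user salt, each stolen hash has to be attacked separately. That is why "hashed" on its own tells customers very little.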

Verified, by whom? Tim's brother-in-law's new startup who have no security expert staff? Verified as in had the encryption technique tested for collisions? That Dell were using it in the correct manner? Or just, 'Hey, I know that library, it works if you use it right.'

> Dell also retained a digital forensics firm to conduct an independent investigation

Who? Is this just someone who will tick boxes? Or is it a group who know what they're doing? Or were they just hired by marketing based on a pretty website?

> We are disclosing this incident now based on findings communicated to us by our independent digital forensics firm about the attempted extraction.

Wait... This investigation has already been done? Okay... They would have told you a hell of a lot more than you're telling us... So we can't look forward to more information?

> Though it is possible some of this information was removed from Dell’s network, our investigations found no conclusive evidence that any was extracted.

> Credit card and other sensitive customer information was not targeted.

One cannot be said conclusively, whilst the other can... Why? Tell us that CC data is kept separately, and tell us it is safe too. Just saying it's hashed doesn't mean bupkus, so feel free to say it publicly, you reveal nothing about your security features.

> The potentially extracted customer information is limited to names, email addresses and hashed passwords. There is no conclusive evidence any customer information was extracted. Additionally, Dell cybersecurity measures are in place to limit the effects of a potential exposure.

What additional cybersecurity measures? If the data is gone, it's in the wind. Names, and emails and possibly-breakable passwords. Are you talking about how you closed the hole? Then say how you accidentally exposed your victims.

---

Finally, before anyone says that this is an excessive amount of information for Dell to give out... It's what other tech companies relay in their post-mortems. [0]

All this is, is Dell admitting they had a problem. Not saying what that problem was, and not saying what they're doing to prevent it in future. And assuring their victims that they're taking care of them, despite their victims possibly sitting on lost information (a password, possibly in the wild) for nearly a month.

[0] https://github.com/danluu/post-mortems

I wondered "is there a list of interesting postmortems, like there is for 'falsehoods programmers believe' posts?", and found one at https://github.com/danluu/post-mortems

Great link there! Also check out his list of public postmortems at https://github.com/danluu/post-mortems

PS. On HN you should use asterisks to italicize instead of > for quoting.