What does Hacker News think of chaosmonkey?
Chaos Monkey is a resiliency tool that helps applications tolerate random instance failures.
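As a rough illustration of that idea (not Chaos Monkey's actual code, which lives in the Go repo linked throughout this page), the core mechanism is simple: pick a random instance from a group and terminate it, forcing the rest of the system to prove it can recover. The `Instance` type and printed "terminate" step below are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Instance is a hypothetical stand-in for a cloud VM in an auto-scaling group.
type Instance struct {
	ID string
}

// pickVictim mirrors the core of the technique: choose one instance
// uniformly at random, so every instance is equally likely to die.
func pickVictim(group []Instance) Instance {
	return group[rand.Intn(len(group))]
}

func main() {
	group := []Instance{{ID: "i-0a1"}, {ID: "i-0b2"}, {ID: "i-0c3"}}
	victim := pickVictim(group)
	// A real tool would call the cloud provider's terminate API here;
	// this sketch only prints the decision.
	fmt.Println("would terminate", victim.ID)
}
```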
Few unit tests in the kernel would be able to compete with so many chaos monkeys (a reference to Netflix's https://github.com/Netflix/chaosmonkey).
Like the Well-Architected Tool/Framework, Trusted Advisor, etc.?
I'm impressed that Chaos Monkey [0] is now a first-party service. So you can break your own cloud account on demand and pay for breaking it?
[0]: https://github.com/Netflix/chaosmonkey (I think it was the first popular open-source implementation of chaos engineering)
Here's one from Netflix that will give you an ulcer: https://github.com/Netflix/chaosmonkey
Here's what Google does: https://www.usenix.org/conference/lisa15/conference-program/...
>> we all get dumber for it
Not _all_. Only those who feel inclined to reject the obvious.
I'd argue that it should be _easier_ for a 2-man company to adapt to cloud service outages, as they likely don't have to keep up with nearly as many backups or moving parts.
Doing so is hard, but it's the only way to reliably know how a system behaves under unpredictable failures.
So learn up:
But when the “goal” of the system is just “arbitrary short-term desires of management,” you can easily point out the problems, yet there is no agreement on what constraints you can trade off against them.
This is especially true for extensibility, where it's easy to get carried away making a system extensible for future changes, many of which turn out to be wasted effort because you never needed that flexibility anyway, and everything changed after Q2 earnings were announced, etc.
In those cases, it can actually be more effective engineering to “overfit” to just what management wants right now and accept that you have to pay the pain of hacking extensibility in on a case-by-case basis. This definitely reduces wasted effort from a YAGNI point of view.
The closest thing I could think of to the same idea of “regularizing” software complexity would be Netflix’s ChaosMonkey [0], which is basically like Dropout [1] but for deployed service networks instead of neural networks.
Extending this idea to actual software would be quite cool. Something like the QuickCheck library for Haskell, but one that somehow randomly samples extensibility needs and penalizes some notion of how hard the code would be to extend to each sampled case. Not even sure how it would work... (a rough sketch of the randomized-failure-testing half follows the links below).
[0]: https://github.com/Netflix/chaosmonkey
[1]: https://en.m.wikipedia.org/wiki/Dropout_(neural_networks)
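For the randomized-failure-testing half of that idea, a minimal sketch is already expressible with Go's standard `testing/quick` package, which is in the spirit of QuickCheck: let the framework sample random inputs (here, which replica to kill) and assert the system's contract holds for every sample. The `serviceUp` toy model and test below are hypothetical, purely for illustration; sampling "extensibility needs" rather than failures is the part nobody knows how to do yet.

```go
package chaos

import (
	"testing"
	"testing/quick"
)

// serviceUp is a toy model of a replicated service: it can serve
// traffic as long as at least one replica is healthy.
func serviceUp(replicas []bool) bool {
	for _, healthy := range replicas {
		if healthy {
			return true
		}
	}
	return false
}

// TestSurvivesOneRandomFailure lets testing/quick pick random values,
// kills one of three replicas accordingly, and asserts the service
// contract ("still up") holds every time.
func TestSurvivesOneRandomFailure(t *testing.T) {
	prop := func(seed uint8) bool {
		replicas := []bool{true, true, true}
		replicas[int(seed)%len(replicas)] = false // inject one random failure
		return serviceUp(replicas)
	}
	if err := quick.Check(prop, nil); err != nil {
		t.Error(err)
	}
}
```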
If Chaos Monkey had been responsible for setting off a global outage, I could imagine business leaders getting cold feet about using a tool like this. In traditional companies, anyway, they'd never have seen the benefit of it, and after hearing only the costs, they'd probably be livid that a widespread outage had been caused by something like this.
- Spinnaker
- Chaos Monkey (https://github.com/Netflix/chaosmonkey)
- Principles of Chaos Engineering (https://github.com/Netflix/chaosmonkey)
- https://medium.com/netflix-techblog/chaos-engineering-upgrad...
- http://www.oreilly.com/webops-perf/free/chaos-engineering.cs...
- Chaos Monkey by Netflix (https://github.com/Netflix/chaosmonkey)
- Jepsen Tests by Aphyr (http://jepsen.io/)
- PANIC by us (https://github.com/gundb/panic-server)