What does HackerNews think of fuse?

A Circuit Breaker for Erlang

Language: Erlang

> If you configure it well (the defaults are not always optimal) you can have your invisible mesh of services survive extended outages of the 3rd party APIs it depends on.

This is something that annoyed me a bit with OTP. The basic strategies aren't really enough for that, so you need something like https://github.com/jlouis/fuse

I wrote something like that myself, but it hasn't seen a ton of use: https://github.com/davidw/hardcore

This is the kind of thing I'm talking about though... we go from "Erlang is a piece of cake for distributed systems" to some pretty advanced concepts and telling people to "roll your own" for an extremely common use case: not wanting your entire application to be taken down by an unreachable external service.

Back in the day, I found this to be a pretty good 'circuit breaker' type of thing: https://github.com/jlouis/fuse

I wrote something similar for the (OTP) application level: https://github.com/davidw/hardcore

Not really, otherwise no one would have written things like this:

https://github.com/jlouis/fuse

> I mean I've got lots of books on OTP. But if i'm building a daemon which reads off a queue and talks to some databases - where are some patterns I can follow?

This is a fair point, even in Erlang land. There are a zillion things encouraging you to "let it crash" and far fewer going beyond that.

One thing that doesn't get mentioned often enough is a circuit-breaker like fuse: https://github.com/jlouis/fuse

This also has some more advanced topics:

https://www.erlang-in-anger.com/

Disclaimer: I'm the author of fuse.

Circuit Breakers provide three things not provided by your above scheme:

First, there is a configurable policy on how many errors to tolerate before breaking. This policy is not baked into your code, but lives outside, often in configuration files. Some of the work that is currently going on is the support for more advanced policies and more advanced ways of ramping up connectivity again on the flip side.

Second, there is no resource buildup. In your example, every request to iTunes will wait and use resources while it is waiting. Once the circuit breaks, you immediately respond with an error. In a system with 10k req/s the buildup is pretty serious if you have a timeout of 5 seconds, say, since it will effectively be 50k reqs waiting. Which could be 50k network sockets.

Third, fuse has a monitoring system built in. Any fuse you create will post its current state to an event manager which can be used to build monitoring applications (essentially this is a low-volume pub/sub pattern). Rolling your own, you have to provide this monitoring yourself, but using fuse, you get a nice way to plug into the fabric. This is used by Riak's Search system Yokozuna, for instance.

Finally, what sets fuse apart from other circuit breakers is that it has a full QuickCheck specification. I.e., we have a pseudo-formal account of fuse working as intended according to the specification. In particular, I tend to generate random fuse scenarios for a couple of hours before releasing new versions. This amounts to a couple million random test cases, and we approach a full model-check of the code as we spend more time generating test cases. There is some novel work in there with respect to handling randomness and time in test cases via Erlang QuickCheck's excellent mocking system. As a result, there have been few bug reports and likewise few fatal errors reported.

(Edit: for completeness, all the code of fuse is online here: https://github.com/jlouis/fuse including documentation)
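
To make the first two points of the comment above concrete (a tolerance policy that lives outside the calling code, and failing fast once the breaker is open), here is a minimal sketch in Python. It is not fuse's implementation or API: the class, the `POLICY` keys, and the injectable `clock` parameter are all hypothetical names chosen for illustration, and the semantics (tolerate `max_failures` within `window_s` seconds, stay open for `reset_s` seconds) are only loosely modelled on the description.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative breaker; names and semantics are assumptions, not fuse's API."""

    def __init__(self, max_failures, window_s, reset_s, clock=time.monotonic):
        self.max_failures = max_failures  # failures tolerated inside the window
        self.window_s = window_s          # length of the sliding window, seconds
        self.reset_s = reset_s            # how long the breaker stays open
        self.clock = clock                # injectable so tests can control time
        self.failures = []                # timestamps of recent failures
        self.opened_at = None             # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        now = self.clock()
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_s:
                # Fail fast: the protected function is never invoked,
                # so no socket or timeout is tied up while the service is down.
                raise CircuitOpenError("circuit open, failing fast")
            # Reset period has elapsed: close the breaker and try again.
            self.opened_at = None
            self.failures.clear()
        try:
            return fn(*args, **kwargs)
        except Exception:
            self._record_failure(self.clock())
            raise

    def _record_failure(self, now):
        # Keep only failures that are still inside the sliding window.
        self.failures = [t for t in self.failures if now - t <= self.window_s]
        self.failures.append(now)
        if len(self.failures) > self.max_failures:
            self.opened_at = now


# The policy is plain data, so it could equally be loaded from a config file.
POLICY = {"max_failures": 2, "window_s": 10.0, "reset_s": 60.0}
breaker = CircuitBreaker(**POLICY)
```

With the numbers from the comment, a timeout-only scheme at 10k req/s and a 5 second timeout holds roughly 10,000 × 5 = 50,000 requests in flight during an outage. With a breaker, callers get `CircuitOpenError` immediately while it is open, so that backlog does not build up.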

Erlang has circuit breakers too, like this: https://github.com/jlouis/fuse

Sadly, they are not mentioned much in books or other documentation, despite being a potentially extremely useful piece of infrastructure for some kinds of projects.

A simple approach: Set up your app's supervision tree so that you have poolboy [1] with a fixed pool of 20 worker processes and one GenServer that distributes the work.

If you go with one GenServer per node, then that one just connects to redis and just pulls jobs (using BLPOP or whatever). When it has gotten a job it checks out a worker from poolboy and assigns it that work.
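
The shape of that dispatcher can be sketched outside Elixir as well. The following Python sketch is only an analogy under stated assumptions: a `queue.Queue` stands in for the Redis list (its blocking `get` plays the role of BLPOP), a thread pool stands in for poolboy, and a semaphore stands in for checking a worker out and back in. The names `dispatch`, `handle`, and `STOP` are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 20
STOP = object()  # sentinel that tells the dispatcher to shut down


def dispatch(jobs, handle, pool_size=POOL_SIZE):
    """Pull jobs one at a time and hand each one to a free worker."""
    free_workers = threading.BoundedSemaphore(pool_size)
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        while True:
            job = jobs.get()        # blocks until a job arrives, like BLPOP
            if job is STOP:
                break
            free_workers.acquire()  # "check out" a worker; blocks if all are busy
            future = pool.submit(handle, job)
            # "Check in" the worker once the job finishes or fails.
            future.add_done_callback(lambda _f: free_workers.release())
    # Leaving the with-block waits for in-flight jobs to finish.


if __name__ == "__main__":
    jobs = queue.Queue()
    for n in range(5):
        jobs.put(n)
    jobs.put(STOP)
    dispatch(jobs, lambda job: print("handled job", job), pool_size=2)
```

Because the dispatcher blocks on the checkout, it never pulls more work off the queue than the pool can take, so the backlog stays in the queue rather than in the dispatcher's memory.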

You could also just have every single worker process go directly to redis and have it pull jobs in a loop. But where's the fun in that ;)

If you want a single global coordinator instead of one per node you can use :global [2] to globally register a process in the cluster. This process is then cluster-wide reachable under its registered name. It can talk to each of your worker pools in the cluster and round-robin try to check out workers and assign them work. And if you do this you might as well ask yourself if you really need redis instead of keeping it all within your Elixir system.

Deciding on which node this process lives is still up to you, but there are libraries like locks [3] that allow you to automatically determine a leader in your cluster.

And once this is done you can start dealing with overload :)

Of course this is just a simple and naive approach; there are a lot of really useful Erlang libraries to check out. Here's a list, in no particular order, of libs whose docs and sources helped me when I was getting started:

https://github.com/heroku/canal_lock - Erlang lock manager for concurrently variable resource numbers

https://github.com/jlouis/safetyvalve - queueing facilities for tasks to be executed so their concurrency and rate can be limited on a running system

https://github.com/fishcakez/sbroker - process broker for matchmaking between two groups of processes using sojourn time based active queue management to prevent congestion.

https://github.com/ferd/backoff - exponential backoffs and timers to be used within OTP processes when dealing with cyclical events, such as reconnections, or generally retrying things

https://github.com/jlouis/fuse - A Circuit Breaker for Erlang

https://github.com/basho/sidejob - Parallel worker and capacity limiting library for Erlang

https://github.com/pspdfkit-labs/sidetask - My humble Elixir wrapper for basho's sidejob

[1] https://github.com/devinus/poolboy also used in Ecto, look through the Ecto sources if you want to see how it's used in Elixir.

[2] http://erlang.org/doc/man/global.html

[3] https://github.com/uwiger/locks and then https://github.com/uwiger/locks/blob/master/doc/locks_leader...

> and if the supervisor crashes then its supervisor will restart it, and so on

Enough of this and it will crash the node. You need to design for this in an Erlang system.

The ever-helpful jlouis has some useful writing on the subject: http://jlouisramblings.blogspot.it/2010/11/on-erlang-state-a...

As well as this: https://github.com/jlouis/fuse

Sadly, this is not discussed as much as it should be in Erlang land.

Nah, there are multiple implementations of this in Erlang, and it isn't supervision: https://github.com/klarna/circuit_breaker and https://github.com/jlouis/fuse