What does HackerNews think of twemproxy?

A fast, light-weight proxy for memcached and redis

Language: C

I haven't created Redis itself but a Redis client library. It all started with a chat server I wanted to write on top of Boost.Beast (and by consequence Boost.Asio), and at the time none of the available clients met my needs and wishes:

- Be asynchronous and based on Boost.Asio.

- Perform automatic command pipelining [1]. All C++ libraries I looked at at the time would open new connections for that, which results in unacceptable performance losses.

- Parse Redis responses directly in their final data structures avoiding extra copies. This is useful for example to store json strings in Redis and read them back efficiently.

With time I built more performance features:

- Full duplex communication.

- Support for RESP3 and server pushes on the same connection that is being used for request/response.

- Event demultiplexing: It can serve thousands of requests (e.g. websocket sessions) on a single connection to Redis with back pressure. This is important to keep the number of connections to Redis low and avoid latency introduced by countermeasures like [3].

This client was proposed and accepted in Boost (but not yet integrated); interested readers can find links to the review here [2].

[1] https://redis.io/docs/manual/pipelining/

[2] https://github.com/boostorg/redis/issues/53

[3] https://github.com/twitter/twemproxy
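
As a rough illustration of the command pipelining described in [1], the sketch below encodes several commands in the Redis wire format (RESP arrays of bulk strings) and joins them into one buffer that could be written to a single connection in one go. The helper names `encode_command` and `pipeline` are made up for this example and are not part of the client library discussed above.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def pipeline(commands):
    """Concatenate encoded commands so they share one write on one connection."""
    return b"".join(encode_command(*cmd) for cmd in commands)


# Two commands, one buffer: no second connection and no waiting in between.
payload = pipeline([("SET", "greeting", "hello"), ("GET", "greeting")])
```

The server answers pipelined commands in the order they were sent, so a client that writes `payload` once would then read two replies back from the same connection.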

Another approach to this problem is to use Twemproxy: https://github.com/twitter/twemproxy, which can be used like a sidecar Redis load-balancer.

Some software that Twitter has put out:

[1] Heron, a realtime, distributed, fault-tolerant stream processing engine - https://github.com/twitter/heron

[2] Finagle, a fault tolerant, protocol-agnostic RPC system - https://github.com/twitter/finagle/

[3] FlockDB, a distributed, fault-tolerant graph database - https://github.com/twitter/flockdb

[4] Gizzard, a flexible sharding framework for creating eventually-consistent distributed datastores - https://github.com/twitter/gizzard

[5] Twemcache, a Twitter fork of Memcached - https://github.com/twitter/twemcache

[6] Twemproxy, a fast, light-weight proxy for memcached and redis - https://github.com/twitter/twemproxy

Twitter puts out a LOT of software, and open-sources a good portion. I don't know if this is smart in the sense of being overstaffed relative to their revenue, but they do it regardless. Some examples:

[1] Fabric, an SDK for mobile apps - https://docs.fabric.io/android/fabric/overview.html

[2] Heron, a realtime, distributed, fault-tolerant stream processing engine - https://github.com/twitter/heron

[3] Finagle, a fault tolerant, protocol-agnostic RPC system - https://github.com/twitter/finagle/

[4] FlockDB, a distributed, fault-tolerant graph database - https://github.com/twitter/flockdb

[5] Ruby implementation of the ICU (International Components for Unicode) - https://github.com/twitter/twitter-cldr-rb

[6] Clockwork Raven, Human-Powered Data Analysis with Mechanical Turk - https://github.com/twitter/clockworkraven

[7] Gizzard, a flexible sharding framework for creating eventually-consistent distributed datastores - https://github.com/twitter/gizzard

[8] Twemcache, a Twitter fork of Memcached - https://github.com/twitter/twemcache

[9] Twemproxy, a fast, light-weight proxy for memcached and redis - https://github.com/twitter/twemproxy

[10] Iago, a webapp load tester - https://github.com/twitter/iago

[11] Ospriet, a bestof/voting app - https://github.com/twitter/ospriet

I don't want to argue your point and you should read heavily into the "tragedy of the commons" problem that plagues many open source communities, but companies can give back in many ways, in terms of bug fixes, features and improvements to the projects they use (at least wise companies do).

In Twitter's case, we've also contributed to the Redis ecosystem via twemproxy: https://github.com/twitter/twemproxy

Twemproxy helps scale some of the traffic for the top websites in the world: https://github.com/twitter/twemproxy#users

How is this different from twemproxy[1]?

EDIT: Looks like instead of using distinct ports to delineate separate clusters like twemproxy, it uses key prefix routing. It also supports "replicated pools", and a few other fancy/neat things. Interesting!

[1]: https://github.com/twitter/twemproxy
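
A minimal sketch of what key prefix routing could look like, as opposed to giving each cluster its own listening port. The pool names, the prefix table, and the longest-prefix rule are all assumptions made for illustration; they are not taken from either project.

```python
# Hypothetical prefix -> pool table; every name here is illustrative only.
ROUTES = {
    "user:": "users-pool",
    "user:session:": "sessions-pool",
    "cache:": "cache-pool",
}
DEFAULT_POOL = "default-pool"


def route(key):
    """Pick the pool whose prefix is the longest match for the key."""
    best = ""
    for prefix in ROUTES:
        if key.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    # An empty best match means no prefix applied, so use the default pool.
    return ROUTES.get(best, DEFAULT_POOL)
```

With this scheme every client talks to the same port and the key alone decides the destination, e.g. `route("user:session:abc")` selects the sessions pool while `route("user:42")` selects the users pool.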

You can do it with shared storage and/or replication without worrying about data loss on a failure.

The key to scaling and maintaining HA with redis is using clustering, either built into the app or through something like nutcracker (https://github.com/twitter/twemproxy) and making sure you properly balance.

The performance of redis makes it very worthwhile to deal with the persistence/HA issues.
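
One common way to spread keys over several Redis nodes, whether in the app or in a proxy, is consistent hashing. The sketch below is a simplified hash ring and not twemproxy's actual implementation; the node names and the number of points per node are assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring mapping keys to node names."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5 as an integer; any stable hash would do here.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def node_for(self, key):
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
owner = ring.node_for("user:42")
```

The point of the ring is what happens on failure: if one node is removed, only the keys that lived on that node move, and every other key keeps its owner.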

I must say, the best thing to happen to memcached since memcached has been (in my experience) twemproxy[1].

We ($dayjob) have been using it in production and it has been _solid_. twemproxy is quality engineering.

[1]: https://github.com/twitter/twemproxy

Direct link to github repository: https://github.com/twitter/twemproxy

As a sidenote, look at the amount of complexity and redirect shenanigans in the "github repository link" contained in the article:

    http://links.services.disqus.com/api/click?
    format=go&
    key=cfdfcf52dffd0a702a61bad27507376d&
    loc=http%3A%2F%2Fantirez.com%2Fnews%2F44&
    subId=804356&
    v=1&
    libid=1354646989332&
    out=https%3A%2F%2Fgithub.com%2Ftwitter%2Ftwemproxy&
    ref=http%3A%2F%2Fnews.ycombinator.com%2Fnews&
    title=Twemproxy%2C%20a%20Redis%20proxy%20from%20Twitter%20-%20Antirez%20weblog&
    txt=https%3A%2F%2Fgithub.com%2Ftwitter%2Ftwemproxy&
    jsonp=vglnk_jsonp_13546470034491

HOLY MOLY!

Just to clarify things since I helped write the initial blog post about twemcache... in our original post, we had an error regarding the slab calcification problem we mentioned: this problem ONLY applied to our v1.4.4 fork of memcached. After speaking with the upstream maintainers, we learned that recent memcached versions have addressed some of these problems. These are the type of conversations we want to have.

At the time we adopted memcached, that's the version we went with, and we made sure it worked well in our production environment as we scaled as a company. We also open sourced twemproxy [https://github.com/twitter/twemproxy], a lightweight proxy for memcached that has worked well for us in combination with twemcache and may work well for others too.

We just want to reiterate that twemcache has worked well for our unique environment and any teams evaluating memcached should try all their options, just like any other piece of software you adopt in your stack.

One of the reasons for open sourcing our work was to share our ideas with the memcached community to see what worked well for us and help everyone. For example, this is also how we treat our work with our MySQL fork [https://github.com/twitter/mysql], which we maintain in the open; we have signed an OCA with Oracle to help get work pushed upstream so everyone benefits in the long run.

They also announced twemproxy (a fast, light-weight proxy for memcached) which was quietly open sourced before: https://github.com/twitter/twemproxy