My feeling is that Malone has misconstrued a bunch of Colm MacCárthaigh's arguments. I will now do the same thing, by attempting to summarize MacCárthaigh's arguments myself.

First: TLS (and mTLS) create secure channels. A channel bears many requests and responses. In many designs, a single channel will bear requests for many different users. The job of a textbook secure channel is to prevent an attacker with control of the underlying network from tampering with requests and responses. Binding identities and authorization claims to requests is a different job. The textbook solution to that problem is authenticated requests, not authenticated secure channels. I wouldn't have used the term "layering violation", which I would contend Is Not A Thing, but the point MacCárthaigh makes is important. So: when MacCárthaigh says SQLI and Request Smuggling are "still things", he's not saying that mTLS introduces SQLI, but rather that authenticating at the level of secure channels means that a request smuggling attack is almost automatically a game-over for your application, because you aren't authenticating the requests independently of the channel. Even Basic Auth can potentially avoid that problem, if forging a request on a compromised channel requires the attacker to know a secret to slip into it.
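
A minimal sketch of request-level authentication, to make the distinction concrete: each request carries an HMAC tag computed with a per-caller secret, so a request smuggled onto an already-authenticated channel fails verification unless the attacker also has the key. The key, the choice of signed fields, and the function names are assumptions, not any particular signing standard, and a real scheme would also bind a timestamp or nonce to prevent replay.

```python
import hashlib
import hmac


def sign_request(key: bytes, method: str, path: str, body: bytes) -> str:
    """HMAC-SHA256 tag binding the caller's key to this specific request."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in (method.encode(), path.encode(), body):
        # Length-prefix each field so different splits can't produce the same input.
        mac.update(len(part).to_bytes(8, "big"))
        mac.update(part)
    return mac.hexdigest()


def verify_request(key: bytes, method: str, path: str, body: bytes, tag: str) -> bool:
    """Recompute the tag server-side and compare in constant time."""
    return hmac.compare_digest(sign_request(key, method, path, body), tag)


key = b"hypothetical-per-caller-secret"
tag = sign_request(key, "POST", "/transfer", b'{"amount": 10}')
assert verify_request(key, "POST", "/transfer", b'{"amount": 10}', tag)
# A request smuggled onto the same channel doesn't carry a valid tag for itself.
assert not verify_request(key, "POST", "/transfer", b'{"amount": 9999}', tag)
```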

Second: TLS was designed for the WebPKI, and that's its most important application. A consequence of this is that a lot of TLS software is designed for WebPKI threat models. Like everything else the IETF produces, TLS has features that support a lot of other applications, like heartbeat messages for keeping DTLS sessions alive. But what matters in the real world is the installed base of TLS software, and that installed base does not have sane defaults and safe ergonomics for mTLS. You have to be extra careful rolling it out that way.
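
Python's standard library shows the defaults problem concretely: a server-side context from `ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)` doesn't request client certificates at all until you change `verify_mode`. This is a minimal sketch; `require_client_certs` and the file paths are hypothetical.

```python
import ssl


def require_client_certs(ctx: ssl.SSLContext) -> ssl.SSLContext:
    """Opt a server-side context into mTLS; the stdlib won't do this for you."""
    ctx.verify_mode = ssl.CERT_REQUIRED            # ask for, and verify, a client cert
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse legacy protocol versions
    return ctx


ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
assert ctx.verify_mode == ssl.CERT_NONE      # default: no client cert requested
require_client_certs(ctx)
assert ctx.verify_mode == ssl.CERT_REQUIRED

# Then load your own key pair and trust only your private root for client
# certs (hypothetical paths):
#   ctx.load_cert_chain("service.crt", "service.key")
#   ctx.load_verify_locations(cafile="internal-root.crt")
```

Even with `CERT_REQUIRED`, the stdlib only checks that the client's chain leads to a trusted root; deciding which client identities are allowed to do what is left to your code.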

Third: TLS is built on X.509, and X.509 is extremely complex and error prone. This complexity is deceptive, because we've spent 20 years filing down the complexity of X.509 in the WebPKI deployment mode, and developers mostly don't have to care about anything except the filenames for their certificates and keys. That's out the window with mTLS, where peers actually have to crack open certificates and look at them to accomplish authn and authz tasks.

Fourth: Revocation is a debacle. It's a debacle everywhere, but TLS complexifies it, because it has to work at Internet scale. If you're managing a fleet for even a large, complex application, you have recourse to designs that make revocation easier to handle.

Fifth: Fleet-wide credential rotation is much harder in mTLS than it is in simpler systems, where the root of trust might just be a simple secret you can quickly load onto a given machine and tabletop how to do a fleetwide deployment in a quick meeting. You can add mechanisms to automate certificate issuance, but those systems have to be resilient to the total loss of the root of trust, which is something the TLS ecosystem is not generally good at; the WebPKI could suffer a total loss of all trust roots if we broke RSA, but not so much if we just broke OpenSSL.

Sixth: The standard modern solution to revocation, short-lifetime certificates, also relies on secure clocks. Breaking clocks Internet-wide might be difficult enough to be outside a reasonable threat model, but breaking clocks in a single data center is not.
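
The clock dependence fits in a few lines. This is a minimal sketch of the validity-window check a relying party performs, with a hypothetical five-minute lifetime; `is_valid_at` and its `skew` parameter are illustrative, not any library's API.

```python
from datetime import datetime, timedelta, timezone


def is_valid_at(not_before: datetime, not_after: datetime,
                local_now: datetime, skew: timedelta = timedelta(0)) -> bool:
    """The relying party's validity check; 'now' is whatever its clock says."""
    return not_before - skew <= local_now <= not_after + skew


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(minutes=5)        # hypothetical short lifetime
true_now = issued + timedelta(minutes=10)      # the cert expired 5 minutes ago

assert not is_valid_at(issued, expires, true_now)  # honest clock: rejected
slow_clock = true_now - timedelta(minutes=10)      # clock set back 10 minutes
assert is_valid_at(issued, expires, slow_clock)    # expired cert: accepted
```

Anyone who can set a verifier's clock back can resurrect an expired (and so effectively revoked) certificate, and any skew tolerance you add widens the window further.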

Seventh: To make short-expiry work in a setting where a compromised certificate isn't just a potential coffee shop MITM for a subset of your users but instead a game-over compromise for the entire application, you might need lifetimes so short they're hard to operationalize without outages. Part of what we like about public key authentication schemes is that they don't require continuous access to a central trusted authority just to keep the system up and running!

Eighth: Serious applications want queryable audit trails for inter-service RPCs, which means having ready, often indexed access to principals and roles asserted by credentials. With expansive mTLS designs, this stuff is buried in X.509 certificates, which creates friction to getting these features built; even applications built on simple bearer tokens often do a bad job of getting this right, and adding friction just makes it harder.
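
On the audit-trail point, here is a minimal sketch of pulling principal data out of a certificate into a flat record you could index. It assumes Python's `ssl` module, whose `SSLSocket.getpeercert()` returns a parsed dict (and an empty one if the peer certificate wasn't validated); the record's field names and the one-SAN-per-principal convention are assumptions for illustration.

```python
import ssl


def audit_record(peercert: dict, method: str, path: str) -> dict:
    """Flatten a validated peer certificate (in SSLSocket.getpeercert()
    format) plus the request line into one indexable audit record."""
    if not peercert:
        # getpeercert() returns {} when the peer certificate wasn't validated.
        raise ValueError("no validated peer certificate")
    # Keep name-like SAN entries; assumes exactly one of them names the principal.
    sans = [value for kind, value in peercert.get("subjectAltName", ())
            if kind in ("DNS", "URI", "email")]
    issuer_cn = next((v for rdn in peercert.get("issuer", ())
                      for k, v in rdn if k == "commonName"), None)
    return {
        "principal": sans[0] if len(sans) == 1 else None,
        "issuer_cn": issuer_cn,
        "serial": peercert.get("serialNumber"),
        # Convert the certificate's notAfter string to epoch seconds.
        "not_after": ssl.cert_time_to_seconds(peercert["notAfter"]),
        "method": method,
        "path": path,
    }


# Hypothetical peer certificate, shaped like getpeercert() output.
cert = {
    "issuer": ((("commonName", "Example Internal Intermediate"),),),
    "serialNumber": "0A1B2C",
    "notAfter": "Jan  5 09:34:43 2018 GMT",
    "subjectAltName": (("DNS", "service-a.internal"),),
}
record = audit_record(cert, "GET", "/reports")
```

Once it's a flat record, it can go into whatever database or SIEM you already query.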

Finally: If you're authenticating and authorizing requests, you want a consistent, preferably rigid structure for expressing claims. X.509 provides you nothing here you'd ever want to use; instead, you're going to slip comma-separated lists into string fields, and then build ad hoc tools to grovel the information you need out of them. You can do this consistently if you (a) build everything in the same language and (b) get everyone to use the same authentication and authorization code paths (as a software security auditor: good luck with that), but you're boned as soon as your mostly-Python application introduces a Rust component.

I could rebut Malone's post point-by-point (for instance: it's totally unclear to me how a hello-world example of the most simplified APIs a library provides for handling certificates addresses the huge complexity of X.509), but I'm not yet sure I need to; I think a fuller recitation of what MacCárthaigh was trying to say does a better job.

I will say that I'm actually not an mTLS opponent. As I recall, Colm's thread was prompted in part by a draft of my "Child's Garden of Inter-Service Authentication Schemes" post, which was mostly motivated by hatred of JWT and originally spoke warmly about mTLS. And I still like mTLS just fine --- for simple network topologies and trust models that are flat and uniform at the network layer. By all means, use mTLS to make sure only real apps in your environment can talk to Consul, and to make it harder to SSRF things. But be wary about asking it to do more than it's good at.

I await Colm's savaging of my attempt to say in 489479837 words what his short Twitter thread did a better job of communicating.

mmalone

Appreciate the thoughtful response!

Reading it over, I think we mostly agree on the facts. It's easy to do mTLS and x509 wrong. The question, then, is what's easier / more secure: doing mTLS/x509 right or doing something else? I think that's somewhat subjective: it depends on your requirements, your environment, and your skillset.

One point that I'd like to reiterate is this: if you want a consistent cryptographic solution that works everywhere, TLS is pretty much your only choice. You could use something else for client authentication, but you probably still need TLS.

As a strawman, here's a sketch of how I'd recommend doing TLS in a microservice system. I consider this "right" for most garden-variety microservices-in-cloud scenarios and don't think it's particularly hard to do. Most of this is already implemented in https://github.com/smallstep/certificates:

  * Deploy the root cert via automation (so it's quickly rotatable) and/or keep it in a managed HSM/KMS. You might harden root rotation a bit by signing your new root with your old root. But, generally, trust config management or container orchestration to push root(s) (you already trust it to push code and secrets). Root rotation (and, thus, bulk revocation) is now as fast as secret rotation (secrets are generally pushed the same way).

  * Issue short-lived certificates per logical entity. If it gets a box and a name in your architecture diagram, it should get an identity and each instance should get a certificate. Use domain names and email addresses that you control for names. Keep certs simple: one SAN. Certificates bind a name to a public key. That's it.

  * Automate certificate issuance. ACME can work for this, but there are other options (single-use tokens issued by config management, cloud-managed instance identity documents or service accounts, an existing device certificate issued by a manufacturer, etc.)

  * Automate certificate renewal. A simple mTLS HTTPS request works for this. This is easy to implement and easy to scale out with multiple intermediates. "Revoking" a certificate just marks it as "not renewable". To reduce risk of outage, in this architecture, it's safe to renew an expired certificate as long as it's not revoked (ACME-STAR basically does this, but it's push instead of pull).

  * If you really need active revocation, fine. One good solution is to push a CRL to a cloud storage bucket. Short-lived certs will keep your CRLs small, since expired certificates can drop off the list. If you need to do a mass rotation, rotate roots (push new root, wait for rotation, pull old root).

  * Use secure NTP for time.

  * Index issued certificates. CT (Trillian) is cool if you want to be fancy. Your existing database or SIEM also works. zcertificate can parse x509 and output a JSON representation of a certificate that you can map to something like an Elasticsearch schema: https://github.com/zmap/zcertificate
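
The renewal and revocation bullets describe a small policy. Here is a minimal, in-memory sketch of that bookkeeping, assuming a hypothetical `ToyCA` class with no real cryptography; it is not how smallstep/certificates implements it. Revocation only marks a serial as not renewable, expired-but-unrevoked certs can renew, and the CRL lists only revoked serials that haven't expired yet.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class IssuedCert:
    serial: int
    san: str            # one SAN per cert
    not_after: datetime


@dataclass
class ToyCA:
    """Hypothetical bookkeeping for a short-lived-cert CA; no real crypto."""
    issued: dict[int, IssuedCert] = field(default_factory=dict)
    revoked: set[int] = field(default_factory=set)  # serials marked "not renewable"
    next_serial: int = 1

    def issue(self, san: str, now: datetime, lifetime: timedelta) -> IssuedCert:
        cert = IssuedCert(self.next_serial, san, now + lifetime)
        self.issued[cert.serial] = cert
        self.next_serial += 1
        return cert

    def revoke(self, serial: int) -> None:
        # "Revoking" only flips a bit: the cert can no longer be renewed.
        if serial not in self.issued:
            raise KeyError(serial)
        self.revoked.add(serial)

    def renew(self, serial: int, now: datetime, lifetime: timedelta) -> IssuedCert:
        # Assumes the caller already proved possession of this cert's key, e.g.
        # via an mTLS handshake on a renewal endpoint that tolerates expiry.
        # Expired-but-unrevoked certs stay renewable to avoid outages.
        old = self.issued.get(serial)
        if old is None or serial in self.revoked:
            raise PermissionError("certificate is not renewable")
        return self.issue(old.san, now, lifetime)

    def crl(self, now: datetime) -> list[int]:
        # Expired certs are dead anyway, so only unexpired revoked serials are
        # listed; short lifetimes keep this list small.
        return sorted(s for s in self.revoked if self.issued[s].not_after > now)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
ca = ToyCA()
a = ca.issue("service-a.internal", t0, timedelta(hours=1))
b = ca.issue("service-b.internal", t0, timedelta(hours=1))
a2 = ca.renew(a.serial, t0 + timedelta(hours=2), timedelta(hours=1))  # after expiry
ca.revoke(b.serial)
```

A real CA would presumably also cap how long after expiry renewal is allowed; that's a policy knob this sketch leaves out.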

I want to respond specifically to your first and final points.

On your first point: I understand that in theory an attacker could slip a request across a secure channel, and binding authentication to a request could in theory prevent that. I don't understand how that's likely to happen in the context I'm thinking of here, which may be different from the context you're thinking of. So let me clarify.

Suppose I have ` -> -> -> `. Let's focus on ` -> `. I don't see how using end-to-end mTLS, terminating in `` and `` application code, would be any more vulnerable to this variety of attack than an HTTP Basic header like `Authorization: Basic base64(service-a:password)`. Surely, the logic in `` is simply "insert HTTP Basic header into requests on their way out to ``". It doesn't matter if we're authenticating the request or the channel. If you're able to smuggle something malicious into that request, it's gonna get sent over to `` with proper authentication attached.
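
Here's a minimal sketch of that point, naming the caller `service-a` and the callee `service-b` after the `Authorization` example (the hook and the requests are hypothetical). The outbound hook attaches credentials to whatever request it is handed, so a request an attacker managed to smuggle goes out exactly as authenticated as a legitimate one.

```python
import base64


def add_basic_auth(raw_request: bytes, user: str, password: str) -> bytes:
    """Hypothetical outbound hook in service-a: attach credentials to every
    request leaving for service-b, whatever produced that request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    return head + f"\r\nAuthorization: Basic {token}".encode() + sep + body


legit = b"GET /reports HTTP/1.1\r\nHost: service-b\r\n\r\n"
# Suppose attacker input tricked service-a into building this one instead.
smuggled = b"DELETE /reports HTTP/1.1\r\nHost: service-b\r\n\r\n"

for req in (legit, smuggled):
    out = add_basic_auth(req, "service-a", "password")
    # Both go out with identical, valid credentials attached.
    assert b"\r\nAuthorization: Basic " in out
```

Swap the header for an mTLS handshake and nothing changes: service-a's key authenticates everything it sends, smuggled or not.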

Are we talking past one another? Are you trying to make ``'s authenticated identity carry through `` to ``? If that's the case, then yes: I see what you're saying and you shouldn't use mTLS for that. I'm not sure if there's a term-of-art here, but I call this "end user identity propagation". You need something like a top-of-stack ticket service (a bearer token) for that. Or, better yet, macaroons. I consider those two separate things, though. mTLS is for authenticating your immediate peer. For end-user identity propagation mTLS is a poor choice.
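
Since macaroons come up here, a minimal sketch of their core idea may help: a chained HMAC where any holder can append a caveat that narrows the token, and the verifier recomputes the chain from a root key only it holds. This omits serialization, third-party caveats, and everything else a real macaroon library handles; the names, keys, and caveat strings are hypothetical.

```python
import hashlib
import hmac
from typing import Callable

# (identifier, caveats, signature)
Token = tuple[bytes, tuple[bytes, ...], bytes]


def _mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: bytes) -> Token:
    """Issuer starts the chain: sig = HMAC(root_key, identifier)."""
    return identifier, (), _mac(root_key, identifier)


def attenuate(token: Token, caveat: bytes) -> Token:
    """Any holder can add a restriction without the root key: sig' = HMAC(sig, caveat)."""
    identifier, caveats, sig = token
    return identifier, caveats + (caveat,), _mac(sig, caveat)


def verify(root_key: bytes, token: Token, satisfied: Callable[[bytes], bool]) -> bool:
    """Recompute the chain from the root key; every caveat must also hold."""
    identifier, caveats, sig = token
    expected = _mac(root_key, identifier)
    for caveat in caveats:
        if not satisfied(caveat):
            return False
        expected = _mac(expected, caveat)
    return hmac.compare_digest(expected, sig)


root = b"hypothetical-root-key"
token = mint(root, b"user=alice")
token = attenuate(token, b"service=reports")  # e.g. narrowed before forwarding
ok = lambda caveat: caveat == b"service=reports"
assert verify(root, token, ok)
# Dropping a caveat doesn't work: the signature no longer matches.
assert not verify(root, (token[0], (), token[2]), ok)
```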

On your final point: you could, in theory, express claims in x509. I'm sure you're aware, but it's been tried before (e.g., SPKI/SDSI). However, I agree that, unless you really know what you're doing, x509 is too complicated for that. Don't do it. You'll likely screw it up. If you're parsing x509 and ASN.1, you're doing it wrong. If you're processing strings that you've extracted from a certificate, and you're not in the habit of writing your own formal languages, you're definitely doing it wrong. Just put a flat name in a SAN. The only thing you should ever need to do with that string is an exact string comparison. If you need to know roles or groups or some other metadata, look them up in a database.
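
A minimal sketch of "flat name in a SAN, exact comparison, roles from a database", with a dict standing in for the database. The names, roles, and helper functions are hypothetical, and the input is assumed to be an `SSLSocket.getpeercert()`-style dict from an already-validated connection.

```python
# Hypothetical stand-in for the roles database: exact principal -> roles.
ROLES: dict[str, set[str]] = {
    "service-a.internal": {"reports:read"},
    "service-b.internal": {"reports:read", "reports:write"},
}


def peer_principal(peercert: dict) -> str:
    """Return the single SAN value from a getpeercert()-style dict."""
    sans = peercert.get("subjectAltName", ())
    if len(sans) != 1:
        raise PermissionError("expected exactly one SAN")
    return sans[0][1]


def authorize(peercert: dict, needed_role: str) -> bool:
    # The dict lookup is an exact string comparison: no splitting, parsing,
    # prefix/suffix, or wildcard matching on the name.
    return needed_role in ROLES.get(peer_principal(peercert), set())


cert = {"subjectAltName": (("DNS", "service-a.internal"),)}
assert authorize(cert, "reports:read")
assert not authorize(cert, "reports:write")
# A lookalike name gets nothing.
lookalike = {"subjectAltName": (("DNS", "service-a.internal.attacker.example"),)}
assert not authorize(lookalike, "reports:read")
```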

(Or use macaroons)