A bit more on Twitter/X’s new encrypted messaging

Update 6/10: Based on a short conversation with an engineering lead at X, some of the machines used at X are claimed to be HSMs. More on this further below.

Matthew Garrett has a nice post about Twitter (uh, X)’s new end-to-end encrypted messaging protocol, which is now called XChat. The short version is that, from a cryptographic perspective, XChat isn’t great. The details are all in Matthew’s post, but here’s a quick summary:

  • There’s no forward secrecy. Unlike the Signal protocol, which uses a double ratchet to continuously update the user’s secret keys, the XChat cryptography just encrypts each message under a recipient’s long-term public key. The actual encryption mechanism is based on an encryption scheme from libsodium. (A short sketch of what this looks like appears just after this list.)
  • User private keys are stored at X. XChat stores user private keys on its own servers. To obtain your private keys, you first log into X’s key-storage system using a password such as a PIN. This is needed to support stateless clients like web browsers, and in fairness it’s not dissimilar to what Meta has done with its encryption for Facebook Messenger and Instagram. Of course, those services use Hardware Security Modules (HSMs.)
  • X’s key storage is based on “Juicebox.” To implement their secret-storage system, XChat uses a protocol called Juicebox. Juicebox “shards” your key material across three servers, so that in principle the loss or compromise of one server won’t hurt you.
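To illustrate the general shape of “encrypt each message under a recipient’s long-term public key,” here is a minimal sketch using libsodium’s sealed boxes (via the PyNaCl bindings; the key names and message are mine, and I’m not claiming this is XChat’s exact construction). The point to notice is that anyone who ever obtains the recipient’s long-term private key can decrypt every message that was ever sent to it, which is exactly what “no forward secrecy” means.

from nacl.public import PrivateKey, SealedBox

# Recipient's long-term keypair. In XChat, the private half is what ends up
# guarded by the Juicebox-based key-storage system discussed below.
recipient_sk = PrivateKey.generate()
recipient_pk = recipient_sk.public_key

# Sender encrypts directly to the long-term public key.
ciphertext = SealedBox(recipient_pk).encrypt(b"hello")

# The recipient, or anyone who later obtains the long-term private key,
# can decrypt this message and every other message ever sent to that key.
assert SealedBox(recipient_sk).decrypt(ciphertext) == b"hello"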

Matthew’s post correctly identifies that the major vulnerability in X’s system is this key storage approach. If decryption keys live in three non-HSM servers that are all under X’s control, then X could probably obtain anyone’s key and decrypt their messages. X could do this for their own internal purposes: for example because their famously chill owner got angry at some user. Or they could do it because a warrant or subpoena compels them to. If we judge XChat as an end-to-end encryption scheme, this seems like a pretty game-over type of vulnerability.

So in a sense, everything comes down to the security of Juicebox and the specific deployment choices that X made. Since Matthew wrote his post, I’ve learned a bit more about both of these things. In this post I’d like to go on a slightly deeper dive into the Juicebox portion of X’s system. This will hopefully shed some light on what X is up to, and why you shouldn’t use XChat.

What’s Juicebox even for?

Many end-to-end encryption (E2E) apps have run into a specific problem: these systems require users to store their own secret keys. Unfortunately, users are just plain bad at this.

Sometimes we lose keys because we lose our devices. Often we have more than one device, which means our keys end up in the wrong place. A much worse situation occurs when apps want to work in ordinary web browsers: this means that secret keys have to be airlifted into that context as well.

The obvious remedy for this problem is just to store secret keys with the service provider itself. This is convenient, but completely misses the whole point of end-to-end encryption, which is that service providers should not have access to your secrets! Storing decryption keys — in an accessible form — on the provider’s servers is absolutely a no-go.

One way out of this conundrum is for the user to encrypt their secret key, then upload the encrypted value to the service provider. In theory, they can download their secret keys anytime they want and they should know that their secrets are safe. But of course there’s a problem: what secret key are you going to use to encrypt your secret key!? Answering this question quickly leads you into an infinite pile of turtles.

Source: XKCD.

Rather than descend into paradox, systems like Juicebox, Signal SVR, and iCloud Key Vault offer an alternative. Their observation is that while cryptographic keys are hard to hang on to, users generally do remember simple passwords like PINs, particularly if they’re asked to re-enter them periodically. What if we use the user’s PIN/password to encrypt the stronger cryptographic key that we upload to the server?

While this is better than nothing, it isn’t good. Most human-selected passwords and PINs make for terrible cryptographic keys. In particular, short PINs (like the 6-digit decimal PINs many people use for their phone passcode) are vulnerable to efficient brute-force guessing attacks. A six-digit PIN provides at most about 2^20 security, which is what cryptographers call “a pretty small number.” Even if you use a “hard” key derivation function like scrypt or Argon2 with insane difficulty settings, you’re still probably going to lose your data.
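To see why, here is a toy sketch using Python’s standard-library scrypt (the parameters and example PIN are made up): the entire six-digit keyspace is only 10^6, roughly 2^20 values, so an attacker holding the encrypted blob and the salt can simply grind through every PIN offline.

import hashlib, os

salt = os.urandom(16)

def pin_to_key(pin: str) -> bytes:
    # "Hard" scrypt settings slow down each guess, but only by a constant factor.
    return hashlib.scrypt(pin.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)

user_key = pin_to_key("348712")   # the victim's (hypothetical) PIN

# Offline brute force: at most a million KDF calls, trivially parallelized.
recovered = next(f"{g:06d}" for g in range(10**6)
                 if pin_to_key(f"{g:06d}") == user_key)
assert recovered == "348712"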

Fortunately there is another way.

For many years, cryptographers have considered the problem of turning “weak secrets” into strong ones. The problem is sometimes known as password hardening, and doing it well usually requires additional components. First, you need some strong cryptographic secret that can be “mixed” into the user’s password to produce a truly strong encryption key. Second, you need some mechanism to limit the number of guessing attempts that the user makes, so an attacker can’t simply run an online attack to work through the PIN-space. This cannot be enforced using cryptography alone: you must add a server (or servers) to enforce these checks. Critically, the server will place limits on how many incorrect passwords the user can enter: e.g., after ten incorrect attempts, the user’s account gets locked or erased.

In one sense, we’re right back where we started: someone needs to operate a server. If that server is under the control of the service provider, then they can disable the guessing limits and/or extract the server’s secret key material, at which point you’re back to square one.

Many services have engineered some reasonable solutions to this problem, however. They boil down to the following alternatives, which can be implemented separately or together:

  1. The server can be implemented inside of a specialized Hardware Security Module, which is set up so that the provider cannot reprogram it or access its key material (at least, after it’s been configured.) This approach was pioneered by Apple, and is now in use by iCloud Key Vault, Signal SVR, WhatsApp and Facebook Messenger.
  2. Alternatively, the server can be “split up” into multiple pieces that are each run by different parties. The idea here is that the user must contact T out of N different servers in order to obtain the correct key (a common example is 2-of-3.) As long as an attacker cannot compromise T different servers, the combined system will still be able to enforce the guessing limits and prevent attackers from getting the key. Naturally this idea fundamentally depends on the assumption that the servers are not run by the same party! (A minimal sketch of this kind of key-splitting follows this list.)
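Juicebox’s actual construction is a threshold OPRF (more on that below), but the underlying “any T of N can reconstruct, fewer learn nothing” idea is the same one behind Shamir secret sharing. Here is a minimal, illustrative sketch of that splitting idea; it is not Juicebox’s protocol.

import secrets

P = 2**127 - 1  # a Mersenne prime, fine for a toy field

def split(secret: int, t: int, n: int):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

key = secrets.randbelow(P)
shares = split(key, t=2, n=3)            # 2-of-3, as in the example above
assert reconstruct(shares[:2]) == key    # any two shares suffice
assert reconstruct([shares[0], shares[2]]) == key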

And at last, we come to Juicebox.

Juicebox is a software-based distributed key hardening service that can be implemented across multiple servers. Users can “enroll” their account into the system, at which point the Juicebox servers will convert their PIN/password into a strong cryptographic key (by mixing it with a secret stored on the Juicebox servers.) Later on, they can contact the Juicebox servers and — assuming they enter the right password, and don’t try too many incorrect guesses — they can obtain that same cryptographic key from the system. Users can specify the number of servers (N) and the threshold (T), and the goal is that the system can survive the loss (or unavailability) of N-T servers, while retaining its security even if an attacker compromises fewer than T of them. Most critically, Juicebox enforces limits: if too many incorrect passwords are entered, Juicebox will lock or destroy the user’s account.

In principle, Juicebox servers (called “realms” in the project’s lingo) can be either “software based” or they can be deployed inside of HSMs. However, to the best of my knowledge the HSM capability is not fully supported outside of Juicebox’s one test deployment, and has not been used in deployments (at X or anywhere else.) Update 6/10: but see further below! This means the security of XChat’s version of Juicebox probably comes down to a question of who runs the servers.

So who runs X’s Juicebox servers, and do they use HSMs?

To the best of my knowledge, all of the XChat servers are run in software by Twitter/X itself. If this turns out to be incorrect, I will be thrilled to update this post.

Update: this tweet by an engineering lead claims that the realms are actually HSMs, and Twitter has just not publicized this or published the key ceremonies that were used to set them up. I am very confused by this because it seems extremely backwards! The problem with this late claim is that there’s really no way to verify it, other than one or two tweets from someone at X.

To put this more explicitly, without any protections like the verifiable use of HSMs and/or distributing Juicebox servers across mutually-distrustful operators, having three servers does relatively little to protect users’ secrets against the service operator. And even if X is secretly implementing these protections, implementing them in secret is stupid. As a wise man once said:

Verifying that the XChat Juicebox servers are software-only is more complicated. Digging around in the Juicebox Github, you’ll find a software-only as well as an HSM-specific implementation of their “realms.” Specifically, there is an entire repository dedicated to supporting Juicebox on Entrust nShield Solo XC HSMs (see here for instructions), although this same code can also be deployed outside of HSMs. There is even a cool “ceremony” document describing a procedure that a group of administrators can perform to certify that they set the HSM up correctly, and that they destroyed all of the cards that could allow it to be re-programmed!

Just one step from the Juicebox HSM ceremony document. I love this stuff!

However, after speaking to Juicebox’s protocol designer Nora Trapp, I’m doubtful that any of this is in use at X. Nora told me that the Juicebox project shut down over a year ago and the engineers moved on; the code that remains is open source but not actively maintained (this matches the project commits I can see.) Nora also looked at XChat’s Juicebox deployment and sent me the following commentary:

From what I’ve seen, there are four realms currently in use by Twitter: realm-a.x.com, realm-b.x.com, realm-east1.x.com, and realm-west1.x.com.

  • realm-a and realm-b are definitely software realms. They don’t use Noise (only true of software realms) and rely solely on TLS. In contrast, HSM-backed realms use Noise within TLS, with TLS terminated at the LB and Noise inside the HSM.
  • realm-east1 and realm-west1 appear to run code from juicebox-hsm-realm. For example, hitting https://realm-east1.x.com/livez shows they’re likely using the repo unmodified from main. However, this doesn’t mean they’re HSM-backed. The repo includes a “software HSM” for testing, which is insecure and doesn’t provide actual HSM properties.

Timing analysis (e.g. via the convenient x-exec-time response header X left in place) suggests these are indeed using the software HSM. Real HSMs are typically significantly slower. And even if by some chance they’re running real HSMs, no ceremony has been published, so there’s no reason to trust they’re secure or that the key material isn’t exfiltrated.
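For the curious, the sort of probe Nora describes is about as simple as it sounds. Here is a rough sketch using the endpoint and header named above; nothing about it should be read as a verified test of X’s infrastructure.

import requests

# Poll each realm's health endpoint and read the x-exec-time header that the
# commentary above mentions. Slow, consistent execution times would be more
# consistent with a real hardware HSM; fast ones suggest software.
for realm in ("realm-east1.x.com", "realm-west1.x.com"):
    resp = requests.get(f"https://{realm}/livez", timeout=10)
    print(realm, resp.status_code, resp.headers.get("x-exec-time"))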

Nora also wrote a post very recently warning deployers to stop placing all servers under the control of a single service provider, which seems very applicable to what Twitter/X is doing.

Obviously this doesn’t prove that X isn’t using HSMs. Though, obviously, there’s no reason that you should hope something is secure when the deployer is going out of their way not to tell you it is. When it comes to XChat, my advice is that you should assume this deployment is (1) entirely in software, and (2) all Juicebox “realms” are run by the same organization. This means you should assume that your decryption keys could be recovered by X’s server administrators with at most a little fuss, unless you use a very strong password.

That’s bad, but let’s talk more about the Juicebox protocol anyway!

If all you came for was a bit of discussion about the security posture of XChat, then Matthew’s post and the additional notes above are all you need. Unless and until X proves that they’re using HSMs (and have destroyed all programming cards) you should just assume that their Juicebox instantiation is based on software realms under X’s control, and that means it is likely vulnerable to brute-force password-guessing attacks.

The rest of this post is going to look past X.

Let’s assume that a deployer has configured Juicebox intelligently: meaning that some/all realms will be deployed inside of HSMs, and/or realms are spread across multiple organizational trust boundaries — such that no single organization can easily demand recovery of anyone’s password. The question we want to ask now is: what guarantees does Juicebox provide in this setting, and how does the protocol work?

Threshold OPRFs. The core cryptographic primitive inside of Juicebox is called a threshold oblivious pseudorandom function, or t-OPRF. I’ve written about OPRFs before, specifically in the context of Password-Authenticated Key Agreement (PAKE schemes.) Nonetheless, I think it’s helpful to start from the top.

Let’s leave aside the “oblivious” part for a minute. PRFs are functions that take in a key K and a string P (for example, a password), and output a string of bits. We might write this as:

O = PRF(K, P)

Provided that an attacker does not know the key, the resulting string O should look random, meaning that the output of a PRF makes for excellent cryptographic keys. In many practical implementations, PRFs are realized using functions like HMAC or CBC-MAC; however, there are many different ways to build them.
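For concreteness, here is a PRF instantiated with HMAC-SHA256, one of the standard practical realizations mentioned above (the key and password below are placeholders):

import hmac, hashlib, secrets

K = secrets.token_bytes(32)         # the PRF key (held by the server, below)

def prf(key: bytes, password: bytes) -> bytes:
    return hmac.new(key, password, hashlib.sha256).digest()

O = prf(K, b"234984")
# O looks random to anyone who doesn't know K, so it makes a fine encryption
# key. Computing it this way, however, means handing the password to whoever
# holds K -- that's the problem the "oblivious" part fixes.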

An oblivious PRF (OPRF) is a two-party cryptographic protocol that helps a client and server jointly compute the output of a PRF. It works like this:

  1. Imagine a server has the cryptographic key K.
  2. The client has their string P (such as a password.)
  3. At the end of a successful protocol run, the client should obtain O = PRF(K, P). The server gets no output at all.
  4. Critically: neither party should learn the other party’s input, and the server should not learn the client’s result, either.

With this tool in hand, it’s easy to build a very simple password-hardening protocol. Simply configure the server with a key K (preferably a different key for every user account), and then have the client run the OPRF protocol with the server to obtain O = PRF(K, P). The resulting value O will make for an excellent encryption key, which the client can use to encrypt any secret values it wants. The best part of this arrangement is that the OPRF protocol ensures that the server never learns the user’s password P, so no information leaks even if the server is fully malicious!
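Here is a toy walk-through of the blind/unblind flow of a DH-style OPRF. I’m not claiming this is Juicebox’s exact construction: to keep it short I use the multiplicative group modulo a Mersenne prime and a naive hash-to-group, whereas real deployments (including Juicebox’s, per the discussion below) use elliptic curves and a proper hash-to-curve. Treat it as an illustration of the message flow, not as secure code.

import hashlib, math, secrets

p = 2**127 - 1                      # toy prime modulus

def h1(password: bytes) -> int:     # naive "hash to group element" (NOT secure)
    return int.from_bytes(hashlib.sha256(b"H1" + password).digest(), "big") % p

def h2(password: bytes, elem: int) -> bytes:
    return hashlib.sha256(b"H2" + password + elem.to_bytes(16, "big")).digest()

# --- Server side: holds the per-user PRF key k, never sees the password ----
k = secrets.randbelow(p - 1) + 1

def server_evaluate(blinded: int) -> int:
    return pow(blinded, k, p)

# --- Client side: knows the password P, never learns k ---------------------
P = b"234984"
while True:                          # blinding factor must be invertible mod p-1
    r = secrets.randbelow(p - 1) + 1
    if math.gcd(r, p - 1) == 1:
        break

A = pow(h1(P), r, p)                      # 1. client sends the blinded element A
B = server_evaluate(A)                    # 2. server raises it to its key k
O = h2(P, pow(B, pow(r, -1, p - 1), p))   # 3. client unblinds: gets H2(P, H1(P)^k)

# Same value a direct (non-oblivious) evaluation of the PRF would produce.
assert O == h2(P, pow(h1(P), k, p))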

This basic design has some limitations, of course. It does not allow the server to limit the number of password guesses, nor does it allow us to spread the process over many different servers.

Addressing the first problem is easy. When the client first registers their account with the server, they can run the OPRF protocol to obtain O = PRF(K, P). Next, they compute some “authenticator tag” T that’s derived in some way from the secret O. They can then store that tag T on the server. When a user returns to log back into the system, the client and server can run the OPRF protocol, and then conduct some process to verify that the O received by the client is consistent with the tag T stored at the server. (The exact process for doing this is important, and I’ll discuss it further below.) If this is unsuccessful, then the server can increment a counter to indicate an incorrect password guess on that user’s account. If the protocol completes successfully, then the server should reset that counter back to zero.1

Critically, when the counter reaches some maximum (usually ten incorrect attempts), the server must lock the user’s account — or much better, delete the account-specific key K. This is what prevents attackers from systematically guessing their way through every possible PIN.
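Here is a minimal sketch of that server-side bookkeeping (the names, tag check, and storage layout are mine, not Juicebox’s wire format): store the tag T at registration, bump a counter on each mismatch, reset it on a match, and destroy the account’s key material once the limit is reached.

import hmac

MAX_ATTEMPTS = 10

class HardeningServer:
    def __init__(self):
        # user -> {"k": server-held PRF key (set at enrollment), "tag": T, "fails": n}
        self.accounts = {}

    def register(self, user: str, prf_key: bytes, tag: bytes):
        self.accounts[user] = {"k": prf_key, "tag": tag, "fails": 0}

    def check_tag(self, user: str, claimed_tag: bytes) -> bool:
        acct = self.accounts.get(user)
        if acct is None:
            return False                        # unknown, locked, or erased account
        if hmac.compare_digest(claimed_tag, acct["tag"]):
            acct["fails"] = 0                   # correct password: reset the counter
            return True
        acct["fails"] += 1
        if acct["fails"] >= MAX_ATTEMPTS:
            del self.accounts[user]             # destroy the account-specific key K
        return False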

Splitting up the PRF into multiple servers is only slightly more complicated. The basic idea relies on the fact that the specific OPRF used by Juicebox is based on elliptic curves, and this makes it very amenable to threshold implementations. I’m going to put the details into a separate page right here since it’s a little boring. But you should just take away that (1) the service operator can split up the key K across multiple servers, and (2) a client can talk to any T of them and eventually obtain PRF(K, P).

How does the client prove it got the right key, and what attacks are there?

You’ll notice that these “incorrect attempt” counters are a big part of the protection inside of a system like Juicebox. As long as an attacker can only make, say, a maximum of ten incorrect guessing attempts, then even a relatively weak password like 234984 is probably not too bad. If an attacker can make many more guesses, then the whole system is fairly weak.

Note that in most of these attacks we are going to make the very strong assumption that the server operator is the attacker and that they’ve deployed Juicebox in such a way that they can’t just read the secrets out of their own instance (e.g., in this hypothetical scenario, they have deployed Juicebox using HSMs or distributed trust.) Since the whole point of end-to-end encryption is that users’ secret keys should not be known to a server operator, this isn’t too strong of an assumption. Moreover, we’ve recently seen governments make secret requests to companies like Apple to force them to bypass their end-to-end encrypted services. This means that both hacking and legal compulsion are real concerns.

Within this setting there exist a handful of attacks that could come up in a system like Juicebox. Some are easier than others to prevent:

  1. An attacker could try to enter a few password guesses, then wait for the real user to log in. As long as the real user enters the correct password, the attempt counter on the server(s) will drop back to zero. This attack could allow you to make up to, say, nine invalid guesses for each genuine user login.2

    The existence of this attack is kind of unavoidable in systems like Juicebox. Fortunately it’s probably not a very efficient attack. Unless the user is logging on constantly (say, because a web browser caches the user’s password and runs the protocol automatically), the rate of attacker guesses is going to be very low. Moreover: this attack can be mitigated by having the server inform the client of the number of incorrect guesses it’s seen since the last time the user logged in, which should help the user to detect the fact that they’re under attack.
  2. If the servers don’t coordinate with each other, a smart attacker could try to guess passwords against different subsets of the Juicebox servers: for example, if 2-of-10 servers are needed to recover the key, then an attacker could actually obtain many more than ten guesses. This is because each incorrect guess only burns an attempt at the two servers the attacker contacts, so the attacker can spread their guesses across different two-server subsets before all of the servers lock them out (see the back-of-the-envelope calculation after this list.) Juicebox’s HSM implementation actually goes to some lengths to prevent this (as SVR did before them) by having servers share the current password counter values with each other using a consensus protocol.
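The arithmetic behind this subset trick is easy to check. A toy calculation for the 2-of-10 example, assuming a ten-guess limit per realm and no coordination between realms:

realms, per_realm_limit, threshold = 10, 10, 2
# Each wrong guess consumes one attempt at `threshold` realms, so the total
# number of guesses available before every realm locks is:
total_guesses = (realms * per_realm_limit) // threshold
assert total_guesses == 50   # versus the 10 guesses the designers intended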

Another possibility is that a malicious (software) server operator could try to attack the protocol directly. Just for fun, I thought it might be interesting to conclude with one possible attack I noticed while perusing the protocol description. I’m almost certain this won’t work in practice (and the Juicebox developers agree), but I thought it was a fun illustration of the types of vulnerabilities that show up in these systems.

Recall that whenever a Juicebox client successfully completes the protocol with a set of T servers, it must somehow convince those servers that it obtained the right key. This is necessary because a correct password entry should reset those servers’ attempt counters back to zero, whereas an incorrect guess will increase the attempt counter. (Without a mechanism to reset the counter, the counter will keep rising until the user’s account gets permanently locked.)

The client deals with this by first computing a value called the unlockKey from the t-OPRF output, and then calculating a set of “tags” called the unlockKeyTags, one for each server (realm) in the system:

The calculation of each unlockKeyTags[i] value is customized to include the “realm ID” of the server, which means that the tag for server i is specific only to that server. It cannot be extracted from a malicious server i and replayed against server j, which would be very bad. The client then sends each value unlockKeyTags[i] to the appropriate server. The servers can verify the received tag against a copy they stored during the account registration process. If it matches, they reset their counter back to zero.

However, the only thing that differentiates these tags from one another is the notion that realm IDs for different servers will always be different. What if this assumption isn’t correct?

The attack I’m concerned about is pretty bizarre, but it looks like this. Imagine a user has previously registered its key into several real HSM-based servers. Now someone hacks into the service provider. This hacker “tricks” clients into sending subsequent login requests to a new set of malicious servers that the attacker spins up using software (i.e., no HSM protections.)

Ordinarily this attack wouldn’t be a disaster, since the OPRF should prevent those new servers from learning the user’s password inputs. But let’s imagine that the hacker sets the “realm ID” of these new servers to be identical to the realm ID of the real (HSM) servers. In this case, the value of unlockKeyTags[i] sent to the malicious servers will be identical to the value that would have been stored within the HSM-based servers. Once the attacker learns this value, they can make an unlimited number of guesses against the HSM server with the same realm ID: this is because the stolen unlockKeyTags[i] value will reset the HSM server’s counter.
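To make the collision concrete, here is a small illustration. The tag derivation below is an assumption for illustration only (the real unlockKeyTags computation differs in its details); the only property that matters is that the tag depends on the unlock key and the realm ID, so two realms sharing an ID get the same tag.

import hmac, hashlib, secrets

unlock_key = secrets.token_bytes(32)

def unlock_key_tag(unlock_key: bytes, realm_id: bytes) -> bytes:
    # Illustrative derivation: per-realm tag bound to the realm's ID.
    return hmac.new(unlock_key, b"tag" + realm_id, hashlib.sha256).digest()

honest_hsm_realm = b"realm-id-1234"
malicious_realm  = b"realm-id-1234"   # attacker copies the honest realm's ID

# The tag the client sends to the malicious realm is byte-for-byte the tag the
# honest HSM realm expects, so the attacker can replay it to reset the honest
# realm's guess counter at will.
assert unlock_key_tag(unlock_key, malicious_realm) == \
       unlock_key_tag(unlock_key, honest_hsm_realm)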

I ran this past Nora and she pointed out a number of practical issues that will probably make this sort of attack much less likely to work, but it’s still fun to find this sort of thing. More importantly, I think it shows how delicate distributed protocols like this can be, and how sometimes one’s assumptions may not always be valid.

Post image: Noah Berger/AP Photo

Notes:

  1. Obviously this can be dangerous. An attacker who just wants to deny service to the user can enter deliberately-incorrect guesses until the user’s account becomes permanently locked. To prevent this, most services require that you log in using a traditional password first; only then can you access the password-strengthening server. Some systems also add time delays, to ensure that an attacker cannot quickly exhaust the counter.
  2. The “try N attempts and then let the client log in normally” attack was outlined to me by Ian Miers.

Dear Apple: add “Disappearing Messages” to iMessage right now

This is a cryptography blog and I always feel the need to apologize for any post that isn’t “straight cryptography.” I’m actually getting a little tired of apologizing for it (though if you want some hard-core cryptography content, there’s plenty here and here.)

Sometimes I have to remind my colleagues that out in the real world, our job is not to solve exciting mathematical problems: it’s to help people communicate securely. And people, at this very moment, need a lot of help. Many of my friends are Federal employees or work for contractors, and they’re terrified that they’re going to lose their job based on speech they post online. Unfortunately, “online” to many people includes thoughts sent over private text messages — so even private conversations are becoming chilled. And while this is hardly the worst thing happening to people in this world, it’s something that’s happening to my friends and neighbors.

(And just in case you think this is partisan: many of the folks I’m talking about are Republicans, and some are military veterans who work for the agencies that keep Americans safe. They’re afraid for their jobs and their kids, and that stuff transcends politics.)

So let me get to the point of this relatively short post.

Apple iMessage is encrypted, but is it “secure”?

A huge majority of my “normie” friends are iPhone users, and while some have started downloading the Signal app (and you should too!) — many of them favor one communications protocol: Apple iMessage. This is mostly because iMessage is just the built-in messaging app used on Apple phones. If you don’t know the branding, all you need to know is that iMessage is the “blue bubbles” you get when talking to other Apple users.

Apple boasts that the iMessage app encrypts your messages end-to-end, and that it has done so since 2011. This means that all your messages and attachments are encrypted under keys that Apple does not know. The company has been extremely consistent about this, discusses it in their platform security guide, and in recent years they’ve even extended their protocol to provide post-quantum security. (A few years ago my students and I found a bug in the protocol, but Apple fixed it quickly — so I am very personally confident in their encryption.)

This is nice. And it’s all true.

But encryption in transit is only one part of the story. After delivering your messages with its uncompromising post-quantum security, Apple allows two things that aren’t so great:

  1. iMessages stick around your phone forever unless you manually delete them (a process that may need to happen on both sides, and is painfully annoying.)
  2. iMessages are automatically backed up to Apple’s iCloud backup service, if you have that feature on — and since Apple sets this up as the default during iPhone setup, most ordinary people do.

The combination of these two features turns iMessage into a Star Trek-style Captain’s Log of your entire life. Searching around right now, I can find messages from 2015. Even though my brain tells me this was three years ago, I’m reliably informed that this is a whole decade in the past.

Now, while the messages above are harmless, I want to convince you that this is often a very bad thing. People want to have private conversations. They deserve to have private conversations. And their technology should make them feel safe doing so. That means they should know that their messaging software has their back and will make sure those embarrassing or political or risque text messages are not stored forever on someone’s phone or inside a backup.

We know exactly how to fix this, and every other messenger did so long ago

If you install WhatsApp, Facebook Messenger, Signal, Snap or even Telegram (please don’t!) you’ll encounter a simple feature that addresses this problem. It’s usually called “disappearing messages”, but sometimes goes by other names.

I’m almost embarrassed to explain what this feature does, since it’s like explaining how a steering wheel works. Nevertheless. When you start a chat, you can decide how long the messages should stick around for. If your answer is forever, you don’t need to do anything. However, if it’s a sensitive conversation and you want it to be ephemeral in the same way that a phone call is, you can pick a time, typically ranging from 5 minutes to 90 days. When that time expires, your messages just get erased — on both your phone and the phones of the people you’re talking to.

A separate feature of disappearing messages is that some platforms will omit these conversations from device backups, or at least they’ll make sure expired messages can’t be restored. This makes sense because those conversations are supposed to be ephemeral: people are clearly not expecting those text messages to be around in the future, so they’re not as angry if they lose them a few days early.

Beyond the basic technical functionality, a disappearing messages feature says something. It announces to your users that a conversation is actually intended to be private and short-lived, that its blast radius will be contained. You won’t have to think about it years or months down the line when it’s embarrassing. This is valuable not just for what it does technically but for the confidence it gives to users, who are already plenty insecure after years of abuse by tech companies.

Why won’t Apple add a disappearing messages feature?

I don’t know. I honestly cannot tell you. It is baffling and weird and wrong, and completely out of step with the industry they’re embedded in. It is particularly bizarre for a company that has formed its entire brand image around stuff like this:

To recap, nearly every single other messaging product that people use in large numbers (at least here in the US) has some kind of disappearing messages feature. Apple’s omission is starting to look downright conspicuous.

I do have some friends who work for Apple Security and I’ve tried to talk to them about this. They usually try to avoid me when I do stuff like this — sometimes they mention lawyers — but when I’m annoying enough (and I catch them in situations where exit is impossible) I can occasionally get them to answer my questions. For example, when I ask “why don’t you turn on end-to-end encrypted iCloud backup by default,” they give me thoughtful answers. They’ll tell me how users are afraid of losing data, and they’ll tell me sad stories of how difficult it is to make those features as usable as unencrypted backup. (I half believe them.)

When I ask about disappearing messages, I get embarrassed sighs and crickets. Nobody can explain why Apple is so far behind on this basic feature even as an option, long after it became standard in every other messenger. Hence the best I can do is speculate. Maybe the Apple executives are afraid that governments will pressure them if they activate a security feature like this? Maybe the Messages app is written in some obsolete language and they can’t update it? Maybe the iMessage servers have become sentient and now control Tim Cook like a puppet? These are not great answers, but they are better than anything the company has offered — and everyone at Apple Security kind of knows it.

In a monument to misaligned priorities, Apple even spent time adding post-quantum encryption to its iMessage protocol — this means Apple users are now safe from quantum computers that don’t exist. And yet users’ most intensely private secrets can still be read off their phone or from a backup by anyone who can guess their passcode and use a search box. This is not a good place to be in 2025, and Apple needs to do better.

A couple of technical notes

Since this is a technical blog I feel compelled to say a few things that are just a tad more detailed than the plea above.

First off, Gene Hoffman points me to a small feature in the Settings of your phone called “Keep Messages” (buried under “Messages” in Settings, and then scrolling way down.) This determines how long your messages will be kept around on your own phone. You cannot set this per conversation, but you can set it to something shorter than “Forever”, say, one year. This is a big decision for some people to make, however, since it will immediately delete any old messages you actually cared about.

More importantly (as mentioned in comments) this only affects your phone. Messages erased via this process will remain on the phones of your conversation partners.

Second, if you really want to secure your iMessages, you should turn on Apple’s Advanced Data Protection feature. This will activate end-to-end encryption for your iCloud backups, and will ensure that nobody but you can access those messages.

This is not the same thing as disappearing messages, because all it protects is backups. Your messages will still exist on your phone and in your encrypted backups. But it at least protects your backups better.

Third, Apple advertises a feature called Messages in iCloud, which is designed to back up and sync your messages between devices. Apple even advertises that this feature is end-to-end encrypted!

I hate this phrasing because it is disastrously misleading. Messages in iCloud may be encrypted, however… If you use iCloud Backup without ADP (which is the default for new iPhones) the Messages in iCloud encryption key itself will be backed up to Apple’s servers in a form that Apple itself can access. And so the content of the Messages in iCloud database will be completely available to Apple, or anyone who can guess your Apple account password.

None of this has anything to do with disappearing messages. However: that feature, with proper anti-backup protections, would go a long way to make these backup concerns obsolete.

Let’s talk about AI and end-to-end encryption

Recently I came across a fantastic new paper by a group of NYU and Cornell researchers entitled “How to think about end-to-end encryption and AI.” I’m extremely grateful to see this paper, because while I don’t agree with every one of its conclusions, it’s a good first stab at an incredibly important set of questions.

I was particularly happy to see people thinking about this topic, since it’s been on my mind in a half-formed state these past few months. On the one hand, my interest in the topic was piqued by the deployment of new AI assistant systems like Google’s scam call protection and Apple Intelligence, both of which aim to put AI basically everywhere on your phone — even, critically, right in the middle of your private messages. On the other hand, I’ve been thinking about the negative privacy implications of AI due to the recent European debate over “mandatory content scanning” laws that would require machine learning systems to scan virtually every private message you send.

While these two subjects arrive from very different directions, I’ve come to believe that they’re going to end up in the same place. And as someone who has been worrying about encryption — and debating the “crypto wars” — for over a decade, I’ve been forced to ask some uncomfortable questions about what the future of end-to-end encrypted privacy will look like. Maybe, even, whether it actually has a future.

But let’s start from an easier place.

What is end-to-end encryption, and what does AI have to do with it?

From a privacy perspective, the most important story of the past decade or so has been the rise of end-to-end encrypted communications platforms. Prior to 2011, most cloud-connected devices simply uploaded their data in plaintext. For many people this meant their private data was exposed to hackers, civil subpoenas, government warrants, or business exploitation by the platforms themselves. Unless you were an advanced user who used tools like PGP and OTR, most end-users had to suffer the consequences.

And (spoiler alert) oh boy, did those consequences turn out to be terrible.

Headline about the Salt Typhoon group, an APT threat that has essentially owned US telecom infrastructure by compromising “wiretapping” systems. Oh how the worm has turned.

Around 2011 our approach to data storage began to evolve. This began with messaging apps like Signal, Apple iMessage and WhatsApp, all of which began to roll out default end-to-end encryption for private messaging. This technology changed the way that keys are managed, to ensure that servers would never see the plaintext content of your messages. Shortly afterwards, phone (OS) designers like Google, Samsung and Apple began encrypting the data stored locally on your phone, a move that occasioned some famous arguments with law enforcement. More recently, Google introduced default end-to-end encryption for phone backups, and (somewhat belatedly) Apple has begun to follow.

Now I don’t want to minimize the engineering effort required here, which was massive. But these projects were also relatively simple. By this I mean: all of the data encrypted in these projects shared a common feature, which is that none of it needed to be processed by a server.

The obvious limitation of end-to-end encryption is that while it can hide content from servers, it can also make it very difficult for servers to compute on that data.1 This is basically fine for data like cloud backups and private messages, since that content is mostly interesting to clients. For data that actually requires serious processing, the options are much more limited. Wherever this is the case — for example, consider a phone that applies text recognition to the photos in your camera roll — designers typically get forced into a binary choice. On the one hand they can (1) send plaintext off to a server, in the process resurrecting many of the earlier vulnerabilities that end-to-end encryption sought to close. Or else (2) they can limit their processing to whatever can be performed on the device itself.

The problem with the second solution is that your typical phone only has a limited amount of processing power, not to mention RAM and battery capacity. Even the top-end iPhone will typically perform photo processing while it’s plugged in at night, just to avoid burning through your battery when you might need it. Moreover, there is a huge amount of variation in phone hardware. Some flagship phones will run you $1400 or more, and contain onboard GPUs and neural engines. But these phones aren’t typical, in the US and particularly around the world. Even in the US it’s still possible to buy a decent Android phone for a couple hundred bucks, and a lousy one for much less. There is going to be a vast world of difference between these devices in terms of what they can process.

So now we’re finally ready to talk about AI.

Unless you’ve been living under a rock, you’ve probably noticed the explosion of powerful new AI models with amazing capabilities. These include Large Language Models (LLMs) which can generate and understand complex human text, as well as new image processing models that can do amazing things in that medium. You’ve probably also noticed Silicon Valley’s enthusiasm to find applications for this tech. And since phone and messaging companies are Silicon Valley, your phone and its apps are no exception. Even if these firms don’t quite know how AI will be useful to their customers, they’ve already decided that models are the future. In practice this has already manifested in the deployment of applications such as text message summarization, systems that listen to and detect scam phone calls, models that detect offensive images, as well as new systems that help you compose text.

And these are obviously just the beginning.

If you believe the AI futurists — and in this case, I actually think you should — all these toys are really just a palate cleanser for the main entrée still to come: the deployment of “AI agents,” aka, Your Plastic Pal Who’s Fun To Be With.

In theory these new agent systems will remove your need to ever bother messing with your phone again. They’ll read your email and text messages and they’ll answer for you. They’ll order your food and find the best deals on shopping, swipe your dating profile, negotiate with your lenders, and generally anticipate your every want or need. The only ingredient they’ll need to realize this blissful future is virtually unrestricted access to all your private data, plus a whole gob of computing power to process it.

Which finally brings us to the sticky part.

Since most phones currently don’t have the compute to run very powerful models, and since models keep getting better and in some cases more proprietary, it is likely that much of this processing (and the data you need processed) will have to be offloaded to remote servers.

And that’s the first reason that I would say that AI is going to be the biggest privacy story of the decade. Not only will we soon be doing more of our compute off-device, but we’ll be sending a lot more of our private data. This data will be examined and summarized by increasingly powerful systems, producing relatively compact but valuable summaries of our lives. In principle those systems will eventually know everything about us and about our friends. They’ll read our most intimate private conversations, maybe they’ll even intuit our deepest innermost thoughts. We are about to face many hard questions about these systems, including some difficult questions about whether they will actually be working for us at all.

But I’ve gone about two steps farther than I intended to. Let’s start with the easier questions.

What does AI mean for end-to-end encrypted messaging?

Modern end-to-end encrypted messaging systems provide a very specific set of technical guarantees. Concretely, an end-to-end encrypted system is designed to ensure that plaintext message content in transit is not available anywhere except for the end-devices of the participants and (here comes the Big Asterisk) anyone the participants or their devices choose to share it with.

The last part of that sentence is very important, and it’s obvious that non-technical users get pretty confused about it. End-to-end encrypted messaging systems are intended to deliver data securely. They don’t dictate what happens to it next. If you take screenshots of your messages, back them up in plaintext, copy/paste your data onto Twitter, or hand over your device in response to a civil lawsuit — well, that has really nothing to do with end-to-end encryption. Once the data has been delivered from one end to the other, end-to-end encryption washes its hands and goes home.

Of course there’s a difference between technical guarantees and the promises that individual service providers make to their customers. For example, Apple says this about its iMessage service:

I can see how these promises might get a little bit confusing! For example, imagine that Apple keeps its promise to deliver messages securely, but then your (Apple) phone goes ahead and uploads (the plaintext) message content to a different set of servers where Apple really can decrypt it. Apple is absolutely using end-to-end encryption in the dullest technical sense… yet is the statement above really accurate? Is Apple keeping its broader promise that it “can’t decrypt the data”?

Note: I am not picking on Apple here! This is just one paragraph from a security overview. In a separate legal document, Apple’s lawyers have written a zillion words to cover their asses. My point here is just to point out that a technical guarantee is different from a user promise.

Making things more complicated, this might not be a question about your own actions. Consider the promise you receive every time you start a WhatsApp group chat (at right.) Now imagine that some other member of the group — not you, but one of your idiot friends — decides to turn on some service that uploads (your) received plaintext messages to WhatsApp. Would you say the message you received still accurately describes how WhatsApp treats your data? If not, how should it change?

In general, what we’re asking here is a question about informed consent. Consent is complicated because it’s both a moral concept and also a legal one, and because the law is different in every corner of the world. The NYU/Cornell paper does an excellent job giving an overview of the legal bits, but — and sorry to be a bummer here — I really don’t want you to expect the law to protect you. I imagine that some companies will do a good job informing their users, as a way to earn trust. Other companies, won’t. Here in the US, they’ll bury your consent inside of an inscrutable “terms of service” document you never read. In the EU they’ll probably just invent a new type of cookie banner.

This is obviously a joke. But, like, maybe not?

So to summarize: in the near term we’re headed to a world where we should expect increasing amounts of our data to be processed by AI, and a lot of that processing will (very likely) be off-device. Good end-to-end encrypted services will actively inform you that this is happening, so maybe you’ll have the chance to opt in or opt out. But if it’s truly ubiquitous (as our futurist friends tell us it will be) then your options will probably end up being pretty limited.

Some ideas for avoiding AI inference on your private messages (cribbed from Mickens.)

Fortunately all is not lost. In the short term, a few people have been thinking hard about this problem.

Trusted Hardware and Apple’s “Private Cloud Compute”

The good news is that our coming AI future isn’t going to be a total privacy disaster. And in large part that’s because a few privacy-focused companies like Apple have anticipated the problems that ML inference will create (namely the need to outsource processing to better hardware) and have tried to do something about it. This something is essentially a new type of cloud-based computer that Apple believes is trustworthy enough to hold your private data.

Quick reminder: as we already discussed, Apple can’t rely on every device possessing enough power to perform inference locally. This means inference will be outsourced to a remote cloud machine. Now the connection from your phone to the server can be encrypted, but the remote machine will still receive confidential data that it must see in cleartext. This concentrates a huge amount of risk at that machine. It could be compromised and the data stolen by hackers, or worse, Apple’s business development executives could find out about it.

Apple’s approach to this problem is called “Private Cloud Compute” and it involves the use of special trusted hardware devices that run in Apple’s data centers. A “trusted hardware device” is a kind of computer that is locked-down in both the physical and logical sense. Apple’s devices are built in-house, and use custom silicon and software features. One of these is Apple’s Secure Boot, which ensures that only “allowed” operating system software is loaded. The OS then uses code-signing to verify that only permitted software images can run. Apple ensures that no long-term state is stored on these machines, and also load-balances your request to a different random server every time you connect. The machines “attest” to the application software they run, meaning that they announce a hash of the software image (which must itself be stored in a verifiable transparency log to prevent any new software from surreptitiously being added.) Apple even says it will publish its software images (though unfortunately not [update: all of] the source code) so that security researchers can check them over for bugs.

The goal of this system is to make it hard for both attackers and Apple employees to exfiltrate data from these devices. I won’t say I’m thrilled about this. It goes without saying that Apple’s approach is a vastly weaker guarantee than encryption — it still centralizes a huge amount of valuable data, and its security relies on Apple getting a bunch of complicated software and hardware security features right, rather than the mathematics of an encryption algorithm. Yet this approach is obviously much better than what’s being done at companies like OpenAI, where the data is processed by servers that employees (presumably) can log into and access.

Although PCC is currently unique to Apple, we can hope that other privacy-focused services will soon crib the idea — after all, imitation is the best form of flattery. In this world, if we absolutely must have AI everywhere, at least the result will be a bit more secure than it might otherwise be.

Who does your AI agent actually work for?

There are so many important questions that I haven’t discussed in this post, and I feel guilty because I’m not going to get to them. I have not, for example, discussed the very important problem of the data used to train (and fine-tune) future AI models, something that opens entire oceans of privacy questions. If you’re interested, these problems are discussed in the NYU/Cornell paper.

But rather than go into that, I want to turn this discussion towards a different question: namely, who are these models actually going to be working for?

As I mentioned above, over the past couple of years I’ve been watching the UK and EU debate new laws that would mandate automated “scanning” of private, encrypted messages. The EU’s proposal is focused on detecting both existing and novel child sexual abuse material (CSAM); it has at various points also included proposals to detect audio and textual conversations that represent “grooming behavior.” The UK’s proposal is a bit wilder and covers many different types of illegal content, including hate speech, terrorism content and fraud — one proposed amendment even covered “images of immigrants crossing the Channel in small boats.”

While scanning for known CSAM does not require AI/ML, detecting nearly every one of the other types of content would require powerful ML inference that can operate on your private data. (Consider, for example, what is involved in detecting novel CSAM content.) Plans to detect audio or textual “grooming behavior” and hate speech will require even more powerful capabilities: not just speech-to-text capabilities, but also some ability to truly understand the topic of human conversations without false positives. It is hard to believe that politicians even understand what they’re asking for. And yet some of them are asking for it, in a few cases very eagerly.

These proposals haven’t been implemented yet. But to some degree this is because ML systems that securely process private data are very challenging to build, and technical platforms have been resistant to building them. Now I invite you to imagine a world where we voluntarily go ahead and build general-purpose agents that are capable of all of these tasks and more. You might do everything in your technical power to keep them under the user’s control, but can you guarantee that they will remain that way?

Or put differently: would you even blame governments for demanding access to a resource like this? And how would you stop them? After all, think about how much time and money a law enforcement agency could save by asking your agent sophisticated questions about your behavior and data, questions like: “does this user have any potential CSAM,” or “have they written anything that could potentially be hate speech in their private notes,” or “do you think maybe they’re cheating on their taxes?” You might even convince yourself that these questions are “privacy preserving,” since no human police officer would ever rummage through your papers, and law enforcement would only learn the answer if you were (probably) doing something illegal.

This future worries me because it doesn’t really matter what technical choices we make around privacy. It does not matter if your model is running locally, or if it uses trusted cloud hardware — once a sufficiently-powerful general-purpose agent has been deployed on your phone, the only question that remains is who is given access to talk to it. Will it be only you? Or will we prioritize the government’s interest in monitoring its citizens over various fuddy-duddy notions of individual privacy?

And while I’d like to hope that we, as a society, will make the right political choice in this instance, frankly I’m just not that confident.

Notes:

  1. (A quick note: some will suggest that Apple should use fully-homomorphic encryption [FHE] for this calculation, so the private data can remain encrypted. This is theoretically possible, but unlikely to be practical. The best FHE schemes we have today really only work for evaluating very tiny ML models, of the sort that would be practical to run on a weak client device. While schemes will get better and hardware will too, I suspect this barrier will exist for a long time to come.)

Is Telegram really an encrypted messaging app?

This blog is reserved for more serious things, and ordinarily I wouldn’t spend time on questions like the above. But much as I’d like to spend my time writing about exciting topics, sometimes the world requires a bit of what Brad Delong calls “Intellectual Garbage Pickup,” namely: correcting wrong, or mostly-wrong ideas that spread unchecked across the Internet.

This post is inspired by the recent and concerning news that Telegram’s CEO Pavel Durov has been arrested by French authorities for its failure to sufficiently moderate content. While I don’t know the details, the use of criminal charges to coerce social media companies is a pretty worrying escalation, and I hope there’s more to the story.

But this arrest is not what I want to talk about today.

What I do want to talk about is one specific detail of the reporting. Specifically: the fact that nearly every news report about the arrest refers to Telegram as an “encrypted messaging app.” Here are just a few examples:

This phrasing drives me nuts because in a very limited technical sense it’s not wrong. Yet in every sense that matters, it fundamentally misrepresents what Telegram is and how it works in practice. And this misrepresentation is bad for both journalists and particularly for Telegram’s users, many of whom could be badly hurt as a result.

Now to the details.

Does Telegram have encryption or doesn’t it?

Many systems use encryption in some way or another. However, when we talk about encryption in the context of modern private messaging services, the word typically has a very specific meaning: it refers to the use of default end-to-end encryption to protect users’ message content. When used in an industry-standard way, this feature ensures that every message will be encrypted using encryption keys that are only known to the communicating parties, and not to the service provider.

From your perspective as a user, an “encrypted messenger” ensures that each time you start a conversation, your messages will only be readable by the folks you intend to speak with. If the operator of a messaging service tries to view the content of your messages, all they’ll see is useless encrypted junk. That same guarantee holds for anyone who might hack into the provider’s servers, and also, for better or for worse, for law enforcement agencies that serve providers with a subpoena.

Telegram clearly fails to meet this stronger definition for a simple reason: it does not end-to-end encrypt conversations by default. If you want to use end-to-end encryption in Telegram, you must manually activate an optional end-to-end encryption feature called “Secret Chats” for every single private conversation you want to have. The feature is explicitly not turned on for the vast majority of conversations, and is only available for one-on-one conversations, and never for group chats with more than two people in them.

As a kind of a weird bonus, activating end-to-end encryption in Telegram is oddly difficult for non-expert users to actually do.

For one thing, the button that activates Telegram’s encryption feature is not visible from the main conversation pane, or from the home screen. To find it in the iOS app, I had to click at least four times — once to access the user’s profile, once to make a hidden menu pop up showing me the options, once to select “Start Secret Chat,” and a final time to “confirm” that I wanted to use encryption. And even after this I was not able to actually have an encrypted conversation, since Secret Chats only works if your conversation partner happens to be online when you do this.

Starting a “secret chat” with my friend Michael on the latest Telegram iOS app. From an ordinary chat screen this option isn’t directly visible. Getting it activated requires four clicks: (1) to get to Michael’s profile (left image), (2) on the “…” button to display a hidden set of options (center image), (3) on “Start Secret Chat”, and (4) on the “Are you sure…” confirmation dialog. After that I’m still unable to send Michael any messages, because Telegram’s Secret Chats can only be turned on if the other user is also online.

Overall this is quite different from the experience of starting a new encrypted chat in an industry-standard modern messaging application, which simply requires you to open a new chat window.

While it might seem like I’m being picky, the difference in adoption between default end-to-end encryption and this experience is likely very significant. The practical impact is that the vast majority of one-on-one Telegram conversations — and literally every single group chat — are probably visible on Telegram’s servers, which can see and record the content of all messages sent between users. That may or may not be a problem for every Telegram user, but it’s certainly not something we’d advertise as particularly well encrypted.

(If you’re interested in the details, as well as a little bit of further criticism of Telegram’s actual encryption protocols, I’ll get into what we know about that further below.)

But wait, does default encryption really matter?

Maybe yes, maybe no! There are two different ways to think about this.

One is that Telegram’s lack of default encryption is just fine for many people. The reality is that many users don’t choose Telegram for encrypted private messaging at all. For plenty of people, Telegram is used more like a social media network than a private messenger.

Getting more specific, Telegram has two popular features that make it ideal for this use-case. One of those is the ability to create and subscribe to “channels”, each of which works like a broadcast network where one person (or a small number of people) can push content out to millions of readers. When you’re broadcasting messages to thousands of strangers in public, maintaining the secrecy of your chat content isn’t as important.

Telegram also supports large public group chats that can include thousands of users. These groups can be made open for the general public to join, or they can be set up as invite-only. While I’ve never personally wanted to share a group chat with thousands of people, I’m told that many people enjoy this feature. In the large and public instantiation, it also doesn’t really matter that Telegram group chats are unencrypted — after all, who cares about confidentiality if you’re talking in the public square?

But Telegram is not limited to just those features, and many users who join for them will also do other things.

Imagine you’re in a “public square” having a large group conversation. In that setting there may be no expectation of strong privacy, and so end-to-end encryption doesn’t really matter to you. But let’s say that you and five friends step out of the square to have a side conversation. Does that conversation deserve strong privacy? It doesn’t really matter what you want, because Telegram won’t provide it, at least not with encryption that keeps your message content off Telegram’s servers.

Similarly, imagine you use Telegram for its social media-like features, meaning that you mainly consume content rather than producing it. But one day your friend, who also uses Telegram for similar reasons, notices you’re on the platform and decides she wants to send you a private message. Are you concerned about privacy now? And are you each going to manually turn on the “Secret Chat” feature — even though it requires four explicit clicks through hidden menus, and even though it will prevent you from communicating immediately if one of you is offline?

My strong suspicion is that many people who join Telegram for its social media features also end up using it to communicate privately. And I think Telegram knows this, and tends to advertise itself as a “secure messenger” and talk about the platform’s encryption features precisely because they know it makes people feel more comfortable. But in practice, I also suspect that very few of those users are actually using Telegram’s encryption. Many of those users may not even realize they have to turn encryption on manually, and think they’re already using it.

Which brings me to my next point.

Telegram knows its encryption is difficult to turn on, and they continue to promote their product as a secure messenger

Telegram’s encryption has been subject to heavy criticism since at least 2016 (and possibly earlier) for many of the reasons I outlined in this post. In fact, many of these criticisms were made by experts including myself, in years-old conversations with Pavel Durov on Twitter.1

Although the interaction with Durov could sometimes be harsh, I still mostly assumed good faith from Telegram back in those days. I believed that Telegram was busy growing their network and that, in time, they would improve the quality and usability of the platform’s end-to-end encryption: for example, by activating it as a default, providing support for group chats, and making it possible to start encrypted chats with offline users. I assumed that while Telegram might be a follower rather than a leader, it would eventually reach feature parity with the encryption protocols offered by Signal and WhatsApp. Of course, a second possibility was that Telegram would abandon encryption entirely — and just focus on being a social media platform.

What’s actually happened is a lot more confusing to me.

Instead of improving the usability of Telegram’s end-to-end encryption, the owners of Telegram have more or less kept their encryption UX unchanged since 2016. While there have been a few upgrades to the underlying encryption algorithms used by the platform, the user-facing experience of Secret Chats in 2024 is almost identical to the one you’d have seen eight years ago. This, despite the fact that the number of Telegram users has grown by 7-9x during the same time period.

At the same time, Telegram CEO Pavel Durov has continued to aggressively market Telegram as a “secure messenger.” Most recently he issued a scathing criticism of Signal and WhatsApp on his personal Telegram channel, implying that those systems were backdoored by the US government, and only Telegram’s independent encryption protocols were really trustworthy.

While this might be a reasonable nerd-argument if it was taking place between two platforms that both supported default end-to-end encryption, Telegram really has no leg to stand on in this particular discussion. Indeed, it no longer feels amusing to see the Telegram organization urge people away from default-encrypted messengers, while refusing to implement essential features that would widely encrypt their own users’ messages. In fact, it’s starting to feel a bit malicious.

What about the boring encryption details?

This is a cryptography blog and so I’d be remiss if I didn’t spend at least a little bit of time on the boring encryption protocols. I’d also be missing a good opportunity to let my mouth gape open in amazement, which is pretty much what happens every time I look at the internals of Telegram’s encryption.

I’m going to handle this in one paragraph to reduce the pain, and you can feel free to skip past it if you’re not interested.

According to what I think is the latest encryption spec, Telegram’s Secret Chats feature is based on a custom protocol called MTProto 2.0. This system uses 2048-bit* finite-field Diffie-Hellman key agreement, with group parameters (I think) chosen by the server.* (Since the Diffie-Hellman protocol is only executed interactively, this is why Secret Chats cannot be set up when one user is offline.*) MITM protection is handled by the end-users, who must compare key fingerprints. There are some weird random nonces provided by the server, whose purpose I don’t fully understand* — and that in the past used to actively make the key exchange totally insecure against a malicious server (but this has long since been fixed.*) The resulting keys are then used to power the most amazing, non-standard authenticated encryption mode ever invented, something called “Infinite Garble Extension” (IGE) based on AES and with SHA2 handling authentication.*

NB: Every place I put a “*” in the paragraph above is a point where expert cryptographers would, in the context of something like a professional security audit, raise their hands and ask a lot of questions. I’m not going to go further than this. Suffice it to say that Telegram’s encryption is unusual.
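For the curious, here is a minimal sketch of the IGE chaining rule in Python, using pycryptodome’s raw AES as the block cipher. This is only meant to illustrate how the mode chains both the previous ciphertext block and the previous plaintext block into each encryption; it is not Telegram’s implementation, it omits MTProto’s SHA2-based authentication and key derivation, and it should never be used in real code.

```python
# Illustration of AES-IGE ("Infinite Garble Extension") chaining, not Telegram's code.
from Crypto.Cipher import AES  # pycryptodome


def ige_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % 16 == 0 and len(iv) == 32  # IGE uses a double-length IV
    ecb = AES.new(key, AES.MODE_ECB)
    prev_ct, prev_pt = iv[:16], iv[16:]
    out = b""
    for i in range(0, len(plaintext), 16):
        block = plaintext[i:i + 16]
        xored = bytes(a ^ b for a, b in zip(block, prev_ct))
        ct = bytes(a ^ b for a, b in zip(ecb.encrypt(xored), prev_pt))
        out += ct
        prev_ct, prev_pt = ct, block  # chain both the ciphertext and the plaintext
    return out
```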

If you ask me to guess whether the protocol and implementation of Telegram Secret Chats is secure, I would say quite possibly. To be honest though, it doesn’t matter how secure something is if people aren’t actually using it.

Is there anything else I should know?

Yes, unfortunately. Even though end-to-end encryption is one of the best tools we’ve developed to prevent data compromise, it is hardly the end of the story. One of the biggest privacy problems in messaging is the availability of loads of meta-data — essentially data about who uses the service, who they talk to, and when they do that talking.

This data is not typically protected by end-to-end encryption. Even in applications that are broadcast-only, such as Telegram’s channels, there is plenty of useful metadata available about who is listening to a broadcast. That information alone is valuable to people, as evidenced by the enormous amounts of money that traditional broadcasters spend to collect it. Right now all of that information likely exists on Telegram’s servers, where it is available to anyone who wants to collect it.

I am not specifically calling out Telegram for this, since the same problem exists with virtually every other social media network and private messenger. But it should be mentioned, just to avoid leaving you with the conclusion that encryption is all we need.

Main photo “privacy screen” by Susan Jane Golding, used under CC license.

Notes:

  1. I will never find all of these conversations again, thanks to Twitter search being so broken. If anyone can turn them up I’d appreciate it.

Remarks on “Chat Control”

Remarks on “Chat Control”

On March 23 I was invited to participate in a panel discussion at the European Internet Services Providers Association (EuroISPA). The focus of this discussion was on recent legislative proposals, especially the EU Commission’s new “chat control” content scanning proposal, as well as the future of encryption and fundamental rights. These are the introductory remarks I prepared.

Thank you for inviting me today.

I should start by making a brief introduction. I am a professor of computer science and a researcher in the field of applied cryptography. On a day-to-day basis this means that I work on the design of encryption systems. Most of what I do involves building things: I design new encryption systems and try to make existing encryption technologies more useful.

Sometimes I and my colleagues also break encryption systems. I wish I could tell you this didn’t happen often, but it happens much more frequently than you’d imagine, and often in systems that have billions of users and that are very hard to fix. Encryption is a very exciting area to work in, but it’s also a young area. We don’t know all the ways we can get things wrong, and we’re still learning.

I’m here today to answer any questions about encryption in online communication systems. But mainly I’m here because the EU Commission has put forward a proposal that has me very concerned. This proposal, which is popularly called “chat control”, would mandate content scanning technology be added to private messaging applications. This proposal has not been properly analyzed at a technical level, and I’m very worried that the EU might turn it into law.

Before I get to those technical details, I would like to address the issue of where encryption fits into this discussion.

Some have argued that the new proposal is not about encryption at all. At some level these people are correct. The new legislation is fundamentally about privacy and confidentiality, and where law enforcement interests should balance against those things. I have opinions about this, but I’m not an EU citizen. Unfortunately this is a fraught debate that Europeans will have to have among themselves. I don’t envy you.

What concerns me is that the Commission does not appear to have a strong grasp on the technical implications of their proposal, and they do not seem to have considered how it will harm the security of our global communications systems. And this does affect me, because the security of our communications infrastructure is not localized to any one continent: if the 447 million citizens of the EU vote to weaken these technical systems, it could affect all consumers of computer security technology worldwide.

So why is encryption so critical to this debate?

Encryption matters because it is the single best tool we have for securing private data. My time here is limited, but if I thought that using all of it to convince you of this single fact was necessary, I would do that. Literally every other approach we’ve ever used to protect valuable data has been compromised, and often quite badly. And because encryption is the only tool that works for this purpose, any system that proposes to scan private data must — as a purely technical requirement — grapple with the technical challenges it raises when that data is protected with end-to-end encryption.

And those technical implications are significant. I have read the Impact Assessment authored by the Commission, and I hope I am not being rude to this audience when I say that I found it deeply naive and alarming. My impression is that the authors do not understand, at a purely technical level, that they are asking technology providers to deploy systems that none of them know how to build safely. Nor has the Commission consulted people with the technical and scientific expertise that would be needed to make this proposal viable.

In order to explain my concerns, I need to give some brief background on how content scanning systems work: both historically, and in the context that the EU is proposing.

Modern content scanning systems are a new creation. They have only been deployed since about 2009, and widely deployed only after about 2011. These systems normally evaluate messages uploaded to a server, often a social network or public repository. In historical systems — that is, older systems without end-to-end encryption — they would process unencrypted plaintext data, usually to look for known child sexual abuse media files (or CSAM.) Upon finding such an image, they undertake various reporting: typically alerting employees at the provider, who may then escalate to the police.

Historical scanning systems such as Microsoft’s PhotoDNA used a perceptual hashing algorithm to reduce each image to a “fingerprint” that can be checked against a database of known illicit content. These databases are maintained by child safety organizations such as NCMEC. The hashing algorithms themselves are deliberately imperfect: they are designed to produce similar fingerprints for files that appear (to the human eye) to be identical, even if a user has slightly altered the file’s data.
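To make the fingerprinting idea concrete, here is a toy “average hash” in Python. It is not PhotoDNA, whose algorithm is not public; it only illustrates the general pattern of reducing an image to a short fingerprint and matching fingerprints by Hamming distance, so that minor edits to a file still produce a hit. The file names in the final comment are hypothetical.

```python
# Toy perceptual hash: shrink to 8x8 grayscale, threshold against the mean.
from PIL import Image  # Pillow


def toy_fingerprint(path: str) -> int:
    pixels = list(Image.open(path).convert("L").resize((8, 8)).getdata())
    avg = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:  # build a 64-bit fingerprint
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


# Two files "match" if their fingerprints are within a small Hamming distance:
# is_hit = hamming(toy_fingerprint("upload.jpg"), toy_fingerprint("known.jpg")) <= 10
```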

A first limitation of these systems is that their inaccuracy can be exploited. It is relatively easy, using techniques that have only been developed recently, to make new images that appear to be harmless licit media files, but that will produce a fingerprint identical to that of harmful illicit CSAM.

A second limitation of these hash-based systems is that they cannot detect novel CSAM content. This means that criminals who post newly-created abuse media are effectively invisible to these scanners. Even a decade ago, the task of finding novel CSAM would have required human operators. However, recent advances in AI have made it possible to train deep neural networks on such imagery, so that these networks can try to detect new examples of it.

Of course, the key word in any machine-based image recognition system is “try.” All image recognition systems are somewhat fallible (see example at right) and even when they work well, they often fail to differentiate between licit and illicit content. Moreover these systems can be exploited by malicious users to produce surprising results. I’ll come back to that in a moment.

But allow me to return to the key challenge: integrating these systems with encrypted communication systems.

In end-to-end encrypted systems, such as WhatsApp or Apple iMessage or Signal, server-side scanning is no longer viable. The problem here is that private data is encrypted when it reaches the server, and cannot be scanned. The Commission proposal isn’t specific about how these systems should be handled, but it hints that this scanning should be done on the user’s device before the content is encrypted. This approach is called client side scanning.

There are several challenges here.

First, client-side scanning represents an exception to the privacy guarantees of encrypted systems. In a standard end-to-end encrypted system, your data is private to you and your intended recipient. In a system with client-side scanning, your data is confidential… with an asterisk. That is, the data itself will be private unless the scanning system determines a violation has occurred, at which point your confidentiality will be (silently) revoked and unencrypted data will be transmitted to the provider (and thus, anyone who has compromised your provider.)
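To make the “asterisk” concrete, here is a deliberately toy sketch of that flow in Python. Everything in it (the classifier, the “encryption”, the provider’s report queue) is a stand-in invented for illustration; the only point it makes is where your plaintext ends up when the scanner fires.

```python
# Toy sketch of client-side scanning, for illustration only.
def looks_like_flagged_content(plaintext: bytes) -> bool:
    return b"FLAGGED" in plaintext            # stand-in for a real classifier


def e2e_encrypt(plaintext: bytes) -> bytes:
    return plaintext[::-1]                    # stand-in for real encryption


def send(plaintext: bytes, provider_reports: list, provider_relay: list) -> None:
    if looks_like_flagged_content(plaintext):
        provider_reports.append(plaintext)    # confidentiality silently revoked
    provider_relay.append(e2e_encrypt(plaintext))


reports, relay = [], []
send(b"hello", reports, relay)
send(b"FLAGGED image bytes", reports, relay)
assert reports == [b"FLAGGED image bytes"]    # the provider received this plaintext
```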

This ability to selectively disable encryption creates new opportunities for attacks. If an attacker can identify the conditions that will cause the model to reduce the confidentiality of your encryption, she can generate new — and apparently harmless — content that will cause this to happen. This will very quickly overwhelm the scanning system, rendering it useless. But it will also seriously reduce the privacy of many users.

A mirror version of this attacker exists as well: he will use knowledge of the model to evade these systems, producing new imagery and content that appear unchanged, but that these systems cannot detect at all. Your most sophisticated criminals — most likely the ones who create this awful content in the first place — will hide in plain sight.

Finally, a more alarming possibility exists: many neural-network classifiers allow for the extraction of the images that were used to train the model. This means every complex neural network model may potentially contain images of abuse victims, who would be exposed to further harm if these models were revealed.

The only known defense against all of these attacks is to tightly protect the models themselves: that is, to ensure that the complex systems of neural network weights and/or hash fingerprints are never revealed. Historical server-side systems went to great lengths to protect this data, even keeping their very algorithms confidential. This was feasible in server-side scanning systems because the data only exists on a centralized server. It does not work well with client-side scanning, where models must be distributed to users’ phones. And so without some further technical ingredient, there is no safe place for these models to live: they cannot stay solely on the server, and they cannot safely be handed to users’ devices.

The only serious proposal that has attempted to address this technical challenge was devised — and then subsequently abandoned — by Apple in 2021. That proposal aimed only at detecting known content using a perceptual hash function. The company proposed to use advanced cryptography to “split” the evaluation of hash comparisons between the user’s device and Apple’s servers: this ensured that the device never received a readable copy of the hash database.

Apple’s proposal failed for a number of reasons, but its technical failures provided important lessons that have largely been ignored by the Commission. While Apple’s system protected the hash database, it did not protect the code of the proprietary neural-network-based hash function Apple devised. Within two weeks of the public announcement, users were able to extract this code and devise both the collision attacks and evasion attacks I mentioned above.

One of the first “meaningful” collisions against NeuralHash, found by Gregory Maxwell.
Evasion attacks against Apple’s NeuralHash, from Struppek et al. (source)

The Commission’s Impact Assessment deems the Apple approach to be a success, and does not grapple with this failure. I assure you that this is not how it is viewed within the technical community, and likely not within Apple itself. One of the most capable technology firms in the world threw all of its knowledge against this problem, and was embarrassed by a group of hackers: essentially before the ink was dry on its proposal.

This failure is important because it illustrates the limits of our capabilities: at present we do not have an efficient means for evaluating complex neural networks in a manner that allows us to keep them secret. And so model extraction is a real possibility in all proposed client-side scanning systems today. Moreover, as my colleagues and I have shown, even “traditional” perceptual hash functions like Microsoft’s PhotoDNA are vulnerable to evasion and collision attacks, once their code becomes available. These attacks will proliferate, if only because 4chan is a thing: and because some people on the Internet love nothing more than hurting other Internet users.

From Prokos et al. (source)
This example shows how a neural-network based hash function (NeuralHash) can be misled, by making imperceptible changes to an image.

In practice, the Commission’s proposal — if it is implemented in production systems — invites a range of technical attacks that we simply do not comprehend today, and that scientists have barely begun to think about. Moreover, the Commission is not content to restrain themselves to scanning for known CSAM content as Apple did. Their desire to target previously unknown content, as well as textual content such as “grooming behavior”, creates risks of abuse by many parties and requires countermeasures against abuse and surveillance that are completely undeveloped.

Worse: the “grooming behavior” requirement implies that untested, perhaps not-yet-developed AI language models will be a core part of tomorrow’s security systems. This is worrisome, since these models have failure modes and exploit opportunities that we are only beginning to explore.

In my discussion so far I have only scratched the surface of this issue. My analysis today does not consider even more basic issues, such as how we can trust that the purveyors of these opaque models are honest, and that the model contents have not been altered: perhaps by insider attack or malicious outside hackers. Each of these threats was once theoretical, and I have seen them all occur in just the last several years. Nor does it consider how the scope of these systems might be increased by future governments, and how this infrastructure will make future abuses more likely.

In conclusion, I hope that the Commission will rethink its hurried schedule and give this proposal enough time to be evaluated by scientists and researchers here in Europe and around the world. We should seek to understand these technical details as a precondition for mandating new technologies, rather than attempting to “build the airplane while we are flying in it”, which is very much what this proposal will encourage.

Thank you for your time.

Why is Signal asking users to set a PIN, or “A few thoughts on Secure Value Recovery”

Why is Signal asking users to set a PIN, or “A few thoughts on Secure Value Recovery”

Over the past several months, Signal has been rolling out a raft of new features to make its app more usable. One of those features has recently been raising a bit of controversy with users. This is a contact list backup feature based on a new system called Secure Value Recovery, or SVR. The SVR feature allows Signal to upload your contacts into Signal’s servers without — ostensibly — even Signal itself being able to access it.

The new Signal approach has created some trauma with security people, due to the fact that it was recently enabled without a particularly clear explanation. For a shorter summary of the issue, see this article. In this post, I want to delve a little bit deeper into why these decisions have made me so concerned, and what Signal is doing to try to mitigate those concerns.

What’s Signal, and why does it matter?

For those who aren’t familiar with it, Signal is an open-source app developed by Moxie Marlinspike’s Signal Technology Foundation. Signal has received a lot of love from the security community. There are basically two reasons for this. First: the Signal app has served as a sort of technology demo for the Signal Protocol, which is the fundamental underlying cryptography that powers popular apps like Facebook Messenger and WhatsApp, and all their billions of users.

Second, the Signal app itself is popular with security-minded people, mostly because the app, with its relatively smaller and more technical user base, has tended towards a no-compromises approach to the security experience. Wherever usability concerns have come into conflict with security, Signal has historically chosen the more cautious and safer approach — as compared to more commercial alternatives like WhatsApp. As a strategy for obtaining large-scale adoption, this is a lousy one. If your goal is to build a really secure messaging product, it’s very impressive.

Let me give an example.

Encrypted messengers like WhatsApp and Apple’s iMessage routinely back up your text message content and contact lists to remote cloud servers. These backups undo much of the strong security offered by end-to-end encryption — since they make it much easier for hackers and governments to obtain your plaintext content. You can disable these backups, but it’s surprisingly non-obvious to do it right (for me, at least). The larger services justify this backup default by pointing out that their less-technical users tend to be more worried about lost message history than about theoretical cloud hacks.

Signal, by contrast, has taken a much more cautious approach to backup. In June of this year, they finally added a way to manually transfer message history from one iPhone to another, and this transfer now involves scanning QR codes. For Android, cloud backup is possible, if users are willing to write down a thirty-digit encryption key. This is probably really annoying for many users, but it’s absolutely fantastic for security. Similarly, since Signal relies entirely on phone numbers in your contacts database (a point that, admittedly, many users hate), it never has to back up your contact lists to a server.

What’s changed recently is that Signal has begun to attract a larger user base. As users with traditional expectations enter the picture, they’ve been unhappy with Signal’s limitations. In response, the Signal developers have begun to explore ways by which they can offer these features without compromising security. This is just plain challenging, and I feel for the developers.

One area in which they’ve tried to square the circle is with their new solution for contacts backup: a system called “secure value recovery.”

What’s Secure Value Recovery?

Signal’s Secure Value Recovery (SVR) is a cloud-based system that allows users to store encrypted data on Signal’s servers — such that even Signal cannot access it — without the usability headaches that come from traditional encryption key management. At the moment, SVR is being used to store users’ contact lists and not message content, although that data may be on the menu for backup in the future.

The challenge in storing encrypted backup data is that strong encryption requires strong (or “high entropy”) cryptographic keys and passwords. Since most of us are terrible at selecting, let alone remembering strong passwords, this poses a challenging problem. Moreover, these keys can’t just be stored on your device — since the whole point of backup is to deal with lost devices.

The goal of SVR is to allow users to protect their data with much weaker passwords that humans can actually memorize, such as a 4-digit PIN. With traditional password-based encryption, such passwords would be completely insecure: a motivated attacker who obtained your encrypted data from the Signal servers could simply run a dictionary attack — trying all 10,000 such passwords in a few seconds or minutes, and thus obtaining your data.
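To see just how cheap such an attack is, here is a toy sketch. The key derivation and hashes below are assumptions made up for the example, not Signal’s actual scheme; the point is only that 10,000 candidate PINs is a tiny search space, and even a deliberately slow password hash just multiplies the attacker’s work by a constant.

```python
# Toy offline dictionary attack against a key derived from a 4-digit PIN.
import hashlib


def key_from_pin(pin: str, salt: bytes) -> bytes:
    # Hypothetical PIN-based key derivation (not Signal's actual scheme).
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 10_000)


def brute_force(salt: bytes, stolen_key_hash: bytes) -> str | None:
    for i in range(10_000):                     # every possible 4-digit PIN
        pin = f"{i:04d}"
        if hashlib.sha256(key_from_pin(pin, salt)).digest() == stolen_key_hash:
            return pin
    return None
```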

Signal’s SVR solves this problem in an age-old way: it introduces a computer that even Signal can’t hack. More specifically, Signal makes use of a new extension to Intel processors called Software Guard eXtensions, or SGX. SGX allows users to write programs, called “enclaves”, that run in a special virtualized processor mode. In this mode, enclaves are invisible to and untouchable by all other software on a computer, including the operating system. If storage is needed, enclaves can persistently store (or “seal”) data, such that any attempt to tamper with the program will render that data inaccessible. (Update: as a note, Signal’s SVR does not seal data persistently. I included this in the draft thinking that they did, but I misremembered this from the technology preview.)

Signal’s SVR deploys such an enclave program on the Signal servers. This program performs a simple function: for each user, it generates and stores a random 256-bit cryptographic secret “seed” along with a hash of the user’s PIN. When a user contacts the server, it can present a hash of its PIN to the enclave and ask for the cryptographic seed. If the hash matches what the enclave has stored, the server delivers the secret seed to the client, which can mix it together with the PIN. The result is a cryptographically strong encryption key that can be used to encrypt or decrypt backup data. (Update: thanks to Dino Dai Zovi for correcting some details in here.)

The key to this approach is that the encryption key now depends on both the user’s password and a strong cryptographic secret stored by an SGX enclave on the server. If SGX does its job, then even a user who hacks into the Signal servers — and here we include the Signal developers themselves, perhaps operating under duress — will be unable to retrieve this user’s secret value. The only way to access the backup encryption key is to actually run the enclave program and enter the user’s hashed PIN. To prevent brute-force guessing, the enclave keeps track of the number of incorrect PIN-entry attempts, and will only allow a limited number before it locks that user’s account entirely.
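Putting the last two paragraphs together, here is a minimal sketch of that enclave logic, written as ordinary Python purely to show the data flow: a per-user random seed, a stored PIN hash, and a guess counter. All names here are invented for illustration; the real code runs inside SGX and is considerably more involved.

```python
# Toy model of the SVR enclave's bookkeeping (illustration only).
import hmac
import os


class ToyEnclave:
    MAX_GUESSES = 10

    def __init__(self):
        self.records = {}  # user_id -> [seed, pin_hash, guesses_left]

    def enroll(self, user_id: str, pin_hash: bytes) -> None:
        seed = os.urandom(32)  # 256-bit secret seed, generated inside the enclave
        self.records[user_id] = [seed, pin_hash, self.MAX_GUESSES]

    def recover(self, user_id: str, pin_hash: bytes) -> bytes:
        record = self.records[user_id]
        if record[2] <= 0:
            raise PermissionError("account locked")
        if not hmac.compare_digest(pin_hash, record[1]):
            record[2] -= 1  # burn a guess
            raise PermissionError("wrong PIN")
        record[2] = self.MAX_GUESSES
        return record[0]


# The client then mixes the recovered seed with the PIN to derive a strong key,
# e.g. something like hashlib.sha256(seed + pin.encode()).digest().
```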

This is an elegant approach, and it’s conceptually quite similar to systems already deployed by Apple and Google, who use dedicated Hardware Security Modules to implement the trusted component, rather than SGX.

The key weakness of the SVR approach is that it depends strongly on the security and integrity of SGX execution. As we’ll discuss in just a moment, SGX does not exactly have a spotless record.

What happens if SGX isn’t secure?

Anytime you encounter a system that relies fundamentally on the trustworthiness of some component — particularly a component that exists in commodity hardware — your first question should be: “what happens if that component isn’t actually trustworthy?”

With SVR that question takes on a great deal of relevance.

Let’s step back. Recall that the goal of SVR is to ensure three things:

  1. The backup encryption key is based, at least in part, on the user’s chosen password. Strong passwords mean strong encryption keys.
  2. Even with a weak password, the encryption key will still have cryptographic strength. This comes from the integration of a random seed that gets chosen and stored by SGX.
  3. No attacker will be able to brute-force their way through the password space. This is enforced by SGX via guessing limits.
Example of a high-entropy passphrase (from this random manual). Please don’t use this as your Signal password.

Note that only the first goal is really enforced by cryptography itself. And this goal will only be achieved if the user selects a strong (high-entropy) password. For an example of what that looks like, see the picture at right.

The remaining goals rely entirely on the integrity of SGX. So let’s play devil’s advocate, and think about what happens to SVR if SGX is not secure.

If an attacker is able to dump the memory space of a running Signal SGX enclave, they’ll be able to expose secret seed values as well as user password hashes. With those values in hand, attackers can run a basic offline dictionary attack to recover the user’s backup keys and passphrase. The difficulty of completing this attack depends entirely on the strength of a user’s password. If it’s a BIP39 phrase, you’ll be fine. If it’s a 4-digit PIN, as strongly encouraged by the UI of the Signal app, you will not be.

(The sensitivity of this data becomes even worse if your PIN happens to be the same as your phone passcode. Make sure it’s not!)

Similarly, if an attacker is able to compromise the integrity of SGX execution: for example, to cause the enclave to run using stale “state” rather than new data, then they might be able to defeat the limits on the number of incorrect password (“retry”) attempts. This would allow the attacker to run an active guessing attack on the enclave until they recover your PIN. (Edit: As noted above, this shouldn’t be relevant in SVR because data is stored only in RAM, and never sealed or written to disk.)

A final, and more subtle concern comes from the fact that Signal’s SVR also allows for “replication” of the backup database. This addresses a real concern on Signal’s part that the backup server could fail — resulting in the loss of all user backup data. This would be a UX nightmare, and understandably, Signal does not want users to be exposed to it.

To deal with this, Signal’s operators can spin up a new instance of the Signal server on a cloud provider. The new instance will have a second copy of the SGX enclave software, and this software can request a copy of the full seed database from the original enclave. (There’s even a sophisticated consensus protocol to make sure the two copies stay in agreement about the state of the retry counters, once this copy is made.)

The important thing to keep in mind is that the security of this replication process depends entirely on the idea that the original enclave will only hand over its data to another instance of the same enclave software running on a secure SGX-enabled processor. If it was possible to trick the original enclave about the status of the new enclave — for example, to convince it to hand the database over to a system that was merely emulating an SGX enclave in normal execution mode — then a compromised Signal operator would be able to use this mechanism to exfiltrate a plaintext copy of the database. This would break the system entirely.

Prevention against this attack is accomplished via another feature of Intel SGX, which is called “remote attestation”. Essentially, each Intel processor contains a unique digital signing key that allows it to attest to the fact that it’s a legitimate Intel processor, and it’s running a specific piece of enclave software. These signatures can be verified with the assistance of Intel, which allows enclaves to verify that they’re talking directly to another legitimate enclave.
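As a rough illustration of that trust relationship (and nothing more), here is a simplified sketch that uses an Ed25519 key in place of the processor’s built-in signing key. Real SGX attestation involves Intel’s provisioning and verification infrastructure and a much more complex quote format; everything below is an assumption for illustration.

```python
# Highly simplified sketch of "a processor attests to the code it is running".
import hashlib

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

device_key = Ed25519PrivateKey.generate()           # stands in for the per-CPU key
enclave_code = b"...the enclave binary..."
measurement = hashlib.sha256(enclave_code).digest() # "what code am I running?"
quote = device_key.sign(measurement)                # the attestation "quote"

# The verifier (e.g., the original enclave) checks two things before trusting the
# peer: the signature chains to a genuine device key, and the measurement matches
# the exact enclave software it expects.
device_key.public_key().verify(quote, measurement)  # raises if the signature is bad
assert measurement == hashlib.sha256(enclave_code).digest()
```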

The power of this system also contains its fragility: if a single SGX attestation key were to be extracted from a single SGX-enabled processor, this would provide a backdoor for any entity who is able to compromise the Signal developers.

With these concerns in mind, it’s worth asking how realistic it is that SGX will meet the high security bar it needs to make this system work.

So how has SGX done so far?

Not well, to be honest. A list of SGX compromises is given on Wikipedia here, but this doesn’t really tell the whole story.

The various attacks against SGX are many and varied, but largely have a common cause: SGX is designed to provide virtualized execution of programs on a complex general-purpose processor, and said processors have a lot of weird and unexplored behavior. If an attacker can get the processor to misbehave, this will in turn undermine the security of SGX.

This leads to attacks such as “Plundervolt”, where malicious software is able to tamper with the voltage level of the processor in real-time, causing faults that leak critical data. It includes attacks that leverage glitches in the way that enclaves are loaded, which can allow an attacker to inject malicious code in place of a proper enclave.

The scariest attacks against SGX rely on “speculative execution” side channels, which can allow an attacker to extract secrets from SGX — up to and including basically all of the working memory used by an enclave. This could allow extraction of values like the seed keys used by Signal’s SVR, or the sealing keys (used to encrypt that data on disk.) Worse, these attacks have not once but twice been successful at extracting cryptographic signing keys used to perform cryptographic attestation. The most recent one was patched just a few weeks ago. These are very much live attacks, and you can bet that more will be forthcoming.

This last part is bad for SVR, because if an attacker can extract even a single copy of one processor’s attestation signing keys, and can compromise a Signal admin’s secrets, they can potentially force Signal to replicate their database onto a simulated SGX enclave that isn’t actually running inside SGX. Once SVR replicated its database to the system, everyone’s secret seed data would be available in plaintext.

But what really scares me is that these attacks I’ve listed above are simply the result of academic exploration of the system. At any given point in the past two years I’ve been able to have a beer with someone like Daniel Genkin of U. Mich or Daniel Gruss of TU Graz, and know that either of these professors (or their teams) is sitting on at least one catastrophic unpatched vulnerability in SGX. These are very smart people. But they are not the only smart people in the world. And there are smart people with more resources out there who would very much like access to backed-up Signal data.

It’s worth pointing out that most of the above attacks are software-only attacks — that is, they assume an attacker who is only able to get logical access to a server. The attacks are so restricted because SGX is not really designed to defend against sophisticated physical attackers, who might attempt to tap the system bus or make direct attempts to unpackage and attach probes to the processor itself. While these attacks are costly and challenging, there are certainly agencies that would have no difficulty executing them.

Finally, I should also mention that the security of the SVR approach assumes Intel is honest. Which, frankly, is probably an assumption we’re already making. So let’s punt on it.

So what’s the big deal?

My major issue with SVR is that it’s something I basically don’t want, and don’t trust. I’m happy with Signal offering it as an option to users, as long as users are allowed to choose not to use it. Unfortunately, up until this week, Signal was not giving users that choice.

More concretely: a few weeks ago Signal began nagging users to create a PIN code. The app itself didn’t really explain well that setting this PIN would start SVR backups. Many people I spoke to said they believed that the PIN was for protecting local storage, or to protect their account from hijacking.

And Signal didn’t just ask for this PIN. It followed a common “dark pattern” born in Silicon Valley of basically forcing users to add the PIN, first by nagging them repeatedly and then ultimately by blocking access to the entire app with a giant modal dialog.

This is bad behavior on its merits, and more critically: it probably doesn’t result in good PIN choices. To make it go away, I chose the simplest PIN that the app would allow me to, which was 9512. I assume many other users simply entered their phone passcodes, which is a nasty security risk all on its own.

Some will say that this is no big deal, since SVR currently protects only users’ contact lists — and those are already stored in cleartext on competing messaging systems. This is, in fact, one of the arguments Moxie has made.

But I don’t buy this. Nobody is going to engineer something as complex as Signal’s SVR just to store contact lists. Once you have a hammer like SVR, you’re going to want to use it to knock down other nails. You’ll find other critical data that users are tired of losing, and you’ll apply SVR to back that data up. Since message content backups are one of the bigger pain points in Signal’s user experience, sooner or later you’ll want to apply SVR to solving that problem too.

In the past, my view was that this would be fine — since Signal would surely give users the ability to opt into or out of message backups. The recent decisions by Signal have shaken my confidence.

Addendum: what does Signal say about this?

Originally this post had a section that summarized a discussion I had with Moxie around this issue. Out of respect for Moxie, I’ve removed some of this at his request because I think it’s more fair to let Moxie address the issue directly without being filtered through me.

So in this rewritten section I simply want to make the point that now (following some discussion on Twitter), there is a workaround to this issue. You can either choose to set a high-entropy passcode such as a BIP39 phrase, and then forget it. This will not screw up your account unless you turn on the “registration lock” feature. Or you can use the new “Disable PIN” advanced feature in Signal’s latest beta, which does essentially the same thing in an automated way. This seems like a good addition, and while I still think there’s a discussion to be had around consent and opt-in, this is a start for now.

Does Zoom use end-to-end encryption?

Does Zoom use end-to-end encryption?

TL;DR: It’s complicated.

Yesterday Zoom (the videoconferencing company, not the defunct telecom) put out a clarification post describing their encryption practices. This is a nice example of a company making necessary technical clarifications during a difficult time, although it comes following widespread criticism the company received over their previous, and frankly slightly misleading, explanation.

Unfortunately, Citizenlab just put out a few of their own results which are based on reverse-engineering the Zoom software. These raise further concerns that Zoom isn’t being 100% clear about how much end-to-end security their service really offers.

This situation leaves Zoom users with a bit of a conundrum: now that everyone in the world is relying on this software for so many critical purposes, should we trust it? In this mostly non-technical post I’m going to talk about what we know, what we don’t know, and why it matters.

End-to-end encryption: the world’s shortest explainer

The controversy around Zoom stems from some misleading marketing material that could have led users to believe that Zoom offers “end-to-end encryption”, or E2E. The basic idea of E2E encryption is that each endpoint — e.g., a Zoom client running on a phone or computer — maintains its own encryption keys, and sends only encrypted data through the service.

In a truly E2E system, the data is encrypted such that the service provider genuinely cannot decrypt it, even if it wants to. This ensures that the service provider can’t read your data, nor can anyone who hacks into the service provider or its cloud services provider, etc. Ideally this would include various national intelligence agencies, which is important in the unlikely event that we start using the system to conduct sensitive government business.

While end-to-end encryption doesn’t necessarily stop all possible attacks, it represents the best path we have to building secure communication systems. It also has a good track record in practice. Videoconferencing apps like Apple’s FaceTime, and messaging apps like WhatsApp and Signal, already use this form of encryption routinely to protect your traffic, and it works.

Zoom: the good news

The great news from the recent Zoom blog post is that, if we take the company at its word, Zoom has already made some progress towards building a genuinely end-to-end encrypted videoconferencing app. Specifically, Zoom claims that:

[I]n a meeting where all of the participants are using Zoom clients, and the meeting is not being recorded, we encrypt all video, audio, screen sharing, and chat content at the sending client, and do not decrypt it at any point before it reaches the receiving clients.

Note that the emphasis is mine. These sections represent important caveats.

Taken at face value, this statement seems like it should calm any fears about Zoom’s security. It indicates that the Zoom client — meaning the actual Zoom software running on a phone or desktop computer — is capable of encrypting audio/video data to other Zoom clients in the conversation, without exposing your sensitive data to Zoom servers. This isn’t a trivial technical problem to solve, so credit to Zoom for doing the engineering work.

Unfortunately the caveats matter quite a bit. And this is basically where the good news ends.

The “unavoidably bad”

The simplest caveats are already present in Zoom’s statement above. Zoom provides a number of value-added services, including “cloud recording” and support for dial-in telephony conferencing. Unfortunately, each of these features is fundamentally incompatible with end-to-end encryption.

Zoom supports these services in a fairly rational way. When those services are active, they provide a series of “endpoints” within their network. These endpoints act like normal Zoom clients, meaning that they participate in your group conversation, and they obtain the keys to decrypt and access the audio/video data: either to record it, or bridge to normal phones.

In theory this isn’t so bad. Even an end-to-end encrypted system can optionally allow these features: a user (e.g., the conference host) could simply send its encryption keys to a Zoom endpoint, allowing it to participate in the call. This would represent a potential loss of security, but at least users would be making the decision themselves.

Unfortunately, in Zoom’s system the decision to share keys may not be entirely left up to the users. And this is where Zoom gets a little scary.

The “pretty-damn-bad”, AKA key management

The real magic in an end-to-end encrypted system is not necessarily the encryption. Rather, it’s the fact that decryption keys never leave the endpoint devices, and are therefore never available to the service provider.

(If you need a stupid analogy here, try this one: availability of keys is like the difference between me when I don’t have access to a cheesecake, and me when a cheesecake is sitting in my refrigerator.)

So the question we should all be asking is: does Zoom have access to the decryption keys? On this issue, Zoom’s blog post becomes maddeningly vague:

Zoom currently maintains the key management system for these systems in the cloud. Importantly, Zoom has implemented robust and validated internal controls to prevent unauthorized access to any content that users share during meetings, including – but not limited to – the video, audio, and chat content of those meetings.

In other words: it sounds an awful lot like Zoom has access to decryption keys.

Thankfully we don’t have to wait for Zoom to clarify their answers to this question. Bill Marczak and John Scott-Railton over at CitizenLab have done it for us, by reverse-engineering and taking a close look at the Zoom protocol in operation. (I’ve worked with Bill and his speed at REing things amazes me.)

What they found should make your hair curl:

By default, all participants’ audio and video in a Zoom meeting appears to be encrypted and decrypted with a single AES-128 key shared amongst the participants. The AES key appears to be generated and distributed to the meeting’s participants by Zoom servers  ….

In addition, during multiple test calls in North America, we observed keys for encrypting and decrypting meetings transmitted to servers in Beijing, China.

In short, Zoom clients may be encrypting their connections, but Zoom generates the keys for communication, sometimes overseas, and hands them out to clients. This makes it easy for Zoom to add participants and services (e.g., cloud recording, telephony) to a conversation without any user action.

It also, unfortunately, makes it easy for hackers or a government intelligence agency to obtain access to those keys. This is problematic.

The good news

From the limited information in the Zoom and Citizenlab posts, the good news is that Zoom has already laid much of the groundwork for building a genuinely end-to-end encrypted service. That is, many of the hard problems have already been solved.

(NB: Zoom has some other cryptographic flaws, like using ECB mode encryption, eek, but compared to the key management issues this is a minor traffic violation.)

What Zoom needs now is to very rapidly deploy a new method of agreeing on cryptographic session keys, so that only legitimate participants will have access to them. Fortunately this “group key exchange” problem is relatively easy to solve, and an almost infinite number of papers have been written on the topic.

(The naive solution is simply to obtain the public encryption keys of each participating client, and then have the meeting host encrypt a random AES session key to each one, thus cutting Zoom’s servers out of the loop.)
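As a concrete sketch of that naive approach, here is what host-driven key distribution could look like, using libsodium-style “sealed boxes” via PyNaCl purely for illustration. This is not Zoom’s protocol, and it ignores the authentication and UX problems discussed next; it only shows that the server’s role can be reduced to relaying ciphertext.

```python
# Naive group key distribution: the host wraps one session key to each client.
import os

from nacl.public import PrivateKey, SealedBox

clients = [PrivateKey.generate() for _ in range(3)]  # each participant's keypair
session_key = os.urandom(16)                         # random AES-128 meeting key

# The host encrypts the session key to every participant's public key.
wrapped = [SealedBox(c.public_key).encrypt(session_key) for c in clients]

# Each client unwraps its own copy locally; the server only ever relays ciphertext.
recovered = [SealedBox(c).decrypt(w) for c, w in zip(clients, wrapped)]
assert all(k == session_key for k in recovered)
```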

This won’t be a panacea, of course. Even group key exchange will still suffer from potential attacks if Zoom’s servers are malicious. It will still be necessary to authenticate the identity and public key of different clients who join the system, because a malicious provider, or one compelled by a government, can simply modify public keys or add unauthorized clients to a conversation. (Some Western intelligence agencies have already proposed to do this in practice.) There will be many hard UX problems here, many of which we have not solved even in mature E2E systems.

We’ll also have to make sure the Zoom client software is trustworthy. All the end-to-end encryption in the world won’t save us if there’s a flaw in the endpoint software. And so far Zoom has given us some reasons to be concerned about this.

Still: the perfect is the enemy of the good, and the good news is that Zoom should be able to get better.

Are we being unfair to Zoom?

I want to close by saying that many people are doing the best they can during a very hard time. This includes Zoom’s engineers, who are dealing with an unprecedented surge of users, and somehow managing to keep their service from falling over. They deserve a lot of credit for this. It seems almost unfair to criticize the company over some hypothetical security concerns right now.

But at the end of the day, this stuff is important. The goal here isn’t to score points against Zoom, it’s to make the service more secure. And in the end, that will benefit Zoom as much as it will benefit all of the rest of us.


Attack of the Week: Group Messaging in WhatsApp and Signal

If you’ve read this blog before, you know that secure messaging is one of my favorite topics. However, recently I’ve been a bit disappointed. My sadness comes from the fact that lately these systems have been getting too damned good. That is, I was starting to believe that most of the interesting problems had finally been solved.

If nothing else, today’s post helped disabuse me of that notion.

This result comes from a new paper by Rösler, Mainka and Schwenk from Ruhr-Universität Bochum (affectionately known as “RUB”). The RUB paper takes a close look at the problem of group messaging, and finds that while messengers may be doing fine with normal (pairwise) messaging, group messaging is still kind of a hack.

If all you want is the TL;DR, here’s the headline finding: due to flaws in both Signal and WhatsApp (which I single out because I use them), it’s theoretically possible for strangers to add themselves to an encrypted group chat. However, the caveat is that these attacks are extremely difficult to pull off in practice, so nobody needs to panic. But both issues are very avoidable, and tend to undermine the logic of having an end-to-end encryption protocol in the first place. (Wired also has a good article.)

First, some background.

How do end-to-end encryption and group chats work?

In recent years we’ve seen plenty of evidence that centralized messaging servers aren’t a very good place to store confidential information. The good news is: we’re not stuck with them. One of the most promising advances in the area of secure communications has been the recent widespread deployment of end-to-end (e2e) encrypted messaging protocols. 

At a high level, e2e messaging protocols are simple: rather than sending plaintext to a server — where it can be stolen or read — the individual endpoints (typically smartphones) encrypt all of the data using keys that the server doesn’t possess. The server has a much more limited role, moving and storing only meaningless ciphertext. With plenty of caveats, this means a corrupt server shouldn’t be able to eavesdrop on the communications.

In pairwise communications (i.e., Alice communicates with only Bob) this encryption is conducted using a mix of public-key and symmetric key algorithms. One of the most popular mechanisms is the Signal protocol, which is used by Signal and WhatsApp (notable for having 1.3 billion users!) I won’t discuss the details of the Signal protocol here, except to say that it’s complicated, but it works pretty well.

A fly in the ointment is that the standard Signal protocol doesn’t work quite as well for group messaging, primarily because it’s not optimized for broadcasting messages to many users.

To handle that popular case, both WhatsApp and Signal use a small hack. It works like this: each group member generates a single “group key” that this member will use to encrypt all of her messages to everyone else in the group. When a new member joins, everyone who is already in the group needs to send a copy of their group key to the new member (using the normal Signal pairwise encryption protocol). This greatly simplifies the operation of group chats, while ensuring that they’re still end-to-end encrypted.
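Here is a minimal sketch of that arrangement, using PyNaCl’s SecretBox as a stand-in for the symmetric encryption. This is illustrative only: it is not the actual Signal or WhatsApp code, and it glosses over how the key material is delivered (which in the real protocols happens inside pairwise-encrypted sessions).

```python
# Toy "sender key" group messaging: one outgoing key per member.
from nacl.secret import SecretBox
from nacl.utils import random as random_bytes


class Member:
    def __init__(self, name: str):
        self.name = name
        self.my_group_key = random_bytes(SecretBox.KEY_SIZE)  # my outgoing key
        self.peer_keys = {}                                   # name -> their group key

    def share_key_with(self, new_member: "Member") -> None:
        # In the real protocols this travels inside a pairwise Signal session.
        new_member.peer_keys[self.name] = self.my_group_key

    def send(self, plaintext: bytes) -> bytes:
        return SecretBox(self.my_group_key).encrypt(plaintext)

    def receive(self, sender: str, ciphertext: bytes) -> bytes:
        return SecretBox(self.peer_keys[sender]).decrypt(ciphertext)


alice, bob = Member("alice"), Member("bob")
alice.share_key_with(bob)                  # bob joins; alice sends him her group key
assert bob.receive("alice", alice.send(b"hello group")) == b"hello group"
```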

How do members know when to add a new user to their chat?

Here is where things get problematic.

From a UX perspective, the idea is that only one person actually initiates the adding of a new group member. This person is called the “administrator”. This administrator is the only human being who should actually do anything — yet, her one click must cause some automated action on the part of every other group member’s devices. That is, in response to the administrator’s trigger, all devices in the group chat must send their keys to this new group member.

Notification messages in WhatsApp.

(In Signal, every group member is an administrator. In WhatsApp it’s just a subset of the members.)

The trigger is implemented using a special kind of message called (unimaginatively) a “group management message”. When I, as an administrator, add Tom to a group, my phone sends a group management message to all the existing group members. This instructs them to send their keys to Tom — and to notify the members visually so that they know Tom is now part of the group. Obviously this should only happen if I really did add Tom, and not if some outsider (like that sneaky bastard Tom himself!) tries to add Tom.

And this is where things get problematic.

Ok, what’s the problem?

According to the RUB paper, both Signal and WhatsApp fail to properly authenticate group management messages.

The upshot is that, at least in theory, this makes it possible for an unauthorized person — not a group administrator, possibly not even a member of the group — to add someone to your group chat.

The issues here are slightly different between Signal and WhatsApp. To paraphrase Tolstoy, every working implementation is alike, but every broken one is broken in its own way. And WhatsApp’s implementation is somewhat worse than Signal’s. Here I’ll break them down.

Signal. Signal takes a pragmatic (and reasonable) approach to group management. In Signal, every group member is considered an administrator — which means that any member can add a new member. Thus if I’m a member of a group, I can add a new member by sending a group management message to every other member. These messages are sent encrypted via the normal (pairwise) Signal protocol.

The group management message contains the “group ID” (a long, unpredictable number), along with the identity of the person I’m adding. Because messages are sent using the Signal (pairwise) protocol, they should be implicitly authenticated as coming from me — because authenticity is a property that the pairwise Signal protocol already offers. So far, this all sounds pretty good.

The problem that the RUB researchers discovered through testing is that while the Signal protocol does authenticate that the group management message comes from me, it doesn’t actually check that I am a member of the group — and thus authorized to add the new user!

In short, if this finding is correct, it turns out that any random Signal user in the world can send you a message of the form “Add Mallory to the Group 8374294372934722942947”, and (if you happen to belong to that group) your app will go ahead and try to do it.

The good news is that in Signal the attack is very difficult to execute. The reason is that in order to add someone to your group, I need to know the group ID. Since the group ID is a random 128-bit number (and is never revealed to non-group-members or even the server**) that pretty much blocks the attack. The main exception to this is former group members, who already know the group ID — and can now add themselves back to the group with impunity.

(And for the record, while the group ID may block the attack, it really seems like a lucky break — like falling out of a building and landing on a street awning. There’s no reason the app should process group management messages from random strangers.)

So that’s the good news. The bad news is that WhatsApp is a bit worse.

WhatsApp. WhatsApp uses a slightly different approach for its group chat. Unlike in Signal, the WhatsApp server plays a significant role in group management: it determines who is an administrator, and thus who is authorized to send group management messages.

Additionally, group management messages are not end-to-end encrypted or signed. They’re sent to and from the WhatsApp server using transport encryption, but not the actual Signal protocol.

When an administrator wishes to add a member to a group, her device sends a message to the server identifying the group and the member to add. The server then checks that the requester is authorized to administer that group and, if so, it sends a message to every member of the group indicating that they should add that user.

The flaw here is obvious: since the group management messages are not signed by the administrator, a malicious WhatsApp server can add any user it wants into the group. This means the privacy of your end-to-end encrypted group chat is only guaranteed if you actually trust the WhatsApp server.

This undermines the entire purpose of end-to-end encryption.

But this is silly. Don’t we trust the WhatsApp server? And what about visual notifications?

One perfectly reasonable response is that exploiting this vulnerability requires a compromise of the WhatsApp server (or legal compulsion, perhaps). This seems fairly unlikely.

And yet, the entire point of end-to-end encryption is to remove the server from the trusted computing base. We haven’t entirely achieved this yet, thanks to things like key servers. But we are making progress. This bug is a step backwards, and it’s one that a sophisticated attacker could potentially exploit.

A second obvious objection is that adding a new group member results in a visual notification to each group member. However, it’s not clear that these notifications are very effective: in practice they’re relatively easy to miss. So these are meaningful bugs, and they should be fixed.

How do you fix this?

The great thing about these bugs is that they’re both eminently fixable.

The RUB paper points out some obvious countermeasures. In Signal, just make sure that the group management messages come from a legitimate member of the group. In WhatsApp, make sure that the group management messages are signed by an administrator.*
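
As a rough illustration of those two checks, a receiving client might enforce something like the following. This is not the actual Signal or WhatsApp code; the function names and the verify_signature callback are hypothetical.

```python
def is_authorized_add_signal(sender_id: str, group_members: set) -> bool:
    """Signal-style fix: every member is an administrator, so it suffices to check
    that the (pairwise-authenticated) sender is actually a current group member."""
    return sender_id in group_members

def is_authorized_add_whatsapp(sender_id: str, group_admins: set,
                               signed_payload: bytes, signature: bytes,
                               verify_signature) -> bool:
    """WhatsApp-style fix: require a signature by a group administrator over the
    add request, so the server alone cannot forge group management messages.
    verify_signature(signer_id, payload, sig) is a hypothetical callback that
    checks the signature against signer_id's long-term identity key."""
    return (sender_id in group_admins
            and verify_signature(sender_id, signed_payload, signature))
```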

Obviously fixes like this are a bit complex to roll out, but none of these should be killers.

Is there anything else in the paper?

Oh yes, there’s quite a bit more. But none of it is quite as dramatic. For one thing, it’s possible for attackers to block message acknowledgements in group chats, which means that different group members could potentially see very different versions of the chat. There are also several cases where forward secrecy can be interrupted. There’s also some nice analysis of Threema, if you’re interested.

I need a lesson. What’s the moral of this story?

The biggest lesson is that protocol specifications are never enough. Both WhatsApp and Signal (to an extent) publish detailed protocol specifications that talk quite a bit about the cryptography used in their systems. And yet the issues reported in the RUB paper are not obvious from reading those documents. I certainly didn’t know about them.

In practice, these problems were only found through testing.

Mallory.

So the main lesson here is: test, test, test. This is a strong argument in favor of open-source applications and frameworks that can interact with walled-garden services like Signal and WhatsApp. It lets us see what these systems are getting right and getting wrong.

The second lesson — and a very old one — is that cryptography is only half the battle. There’s no point in building the most secure encryption protocol in the world if someone can simply instruct your client to send your keys to Mallory. The greatest lesson of all time is that real cryptosystems are always broken this way — and almost never through the fancy cryptographic attacks we love to write about.

Notes:

* The challenge here is that since WhatsApp itself determines who the administrators are, this isn’t quite so simple. But at very least you can ensure that someone in the group was responsible for the addition.

** According to the paper, the Signal group IDs are always sent encrypted between group members and are never revealed to the Signal server. Indeed, group chat messages look exactly like pairwise chats, as far as the server is concerned. This means only current or former group members should know the group ID.

Attack of the Week: Apple iMessage

Today’s Washington Post has a story entitled “Johns Hopkins researchers poke a hole in Apple’s encryption“, which describes the results of some research my students and I have been working on over the past few months.

As you might have guessed from the headline, the work concerns Apple, and specifically Apple’s iMessage text messaging protocol. Over the past months my students Christina Garman, Ian Miers, Gabe Kaptchuk and Mike Rushanan and I have been looking closely at the encryption used by iMessage, in order to determine how the system fares against sophisticated attackers. The results of this analysis include some very neat new attacks that allow us to — under very specific circumstances — decrypt the contents of iMessage attachments, such as photos and videos.

The research team. From left: Gabe Kaptchuk, Mike Rushanan, Ian Miers, Christina Garman

Now before I go further, it’s worth noting that the security of a text messaging protocol may not seem like the most important problem in computer security. And under normal circumstances I might agree with you. But today the circumstances are anything but normal: encryption systems like iMessage are at the center of a critical national debate over the role of technology companies in assisting law enforcement.

A particularly unfortunate aspect of this controversy has been the repeated call for U.S. technology companies to add “backdoors” to end-to-end encryption systems such as iMessage. I’ve always felt that one of the most compelling arguments against this approach, an argument I’ve made along with other colleagues, is that we just don’t know how to construct such backdoors securely. But lately I’ve come to believe that this position doesn’t go far enough, in the sense that it is woefully optimistic. The fact of the matter is this: forget backdoors, we barely know how to make encryption work at all. If anything, this work makes me much gloomier about the subject.

But enough with the generalities. The TL;DR of our work is this:

Apple iMessage, as implemented in versions of iOS prior to 9.3 and Mac OS X prior to 10.11.4, contains serious flaws in the encryption mechanism that could allow an attacker — who obtains iMessage ciphertexts — to decrypt the payload of certain attachment messages via a slow but remote and silent attack, provided that one sender or recipient device is online. While capturing encrypted messages is difficult in practice on recent iOS devices, thanks to certificate pinning, it could still be conducted by a nation state attacker or a hacker with access to Apple’s servers. You should probably patch now.

For those who want the gory details, I’ll proceed with the rest of this post using the “fun” question and answer format I save for this sort of post.

What is Apple iMessage and why should I care?

Those of you who read this blog will know that I have a particular obsession with Apple iMessage. This isn’t because I’m weirdly obsessed with Apple — although it is a little bit because of that. Mostly it’s because I think iMessage is an important protocol. The text messaging service, which was introduced in 2011, has the distinction of being the first widely-used end-to-end encrypted text messaging system in the world.

To understand the significance of this, it’s worth giving some background. Before iMessage, the vast majority of text messages were sent via SMS or MMS, meaning that they were handled by your cellular provider. Although these messages are technically encrypted, this encryption exists only on the link between your phone and the nearest cellular tower. Once an SMS reaches the tower, it’s decrypted, then stored and delivered without further protection. This means that your most personal messages are vulnerable to theft by telecom employees or sophisticated hackers. Worse, many U.S. carriers still use laughably weak encryption and protocols that are vulnerable to active interception.

So from a security point of view, iMessage was a pretty big deal. In a single stroke, Apple deployed encrypted messaging to millions of users, ensuring (in principle) that even Apple itself couldn’t decrypt their communications. The even greater accomplishment was that most people didn’t even notice this happened — the encryption was handled so transparently that few users are aware of it. And Apple did this at very large scale: today, iMessage handles peak throughput of more than 200,000 encrypted messages per second, with a supported base of nearly one billion devices.

So iMessage is important. But is it any good?

Answering this question has been kind of a hobby of mine for the past couple of years. In the past I’ve written about Apple’s failure to publish the iMessage protocol, and on iMessage’s dependence on a vulnerable centralized key server. Indeed, the use of a centralized key server is still one of iMessage’s biggest weaknesses, since an attacker who controls the keyserver can use it to inject keys and conduct man in the middle attacks on iMessage users.

But while key servers are a risk, attacks on a key server seem fundamentally challenging to implement, since they require the ability to actively manipulate Apple infrastructure without getting caught. Moreover, such attacks are only useful for prospective surveillance. If you fail to substitute a user’s key before they have an interesting conversation, you can’t recover their communications after the fact.

A more interesting question is whether iMessage’s encryption is secure enough to stand up against retrospective decryption attacks, that is, attempts to decrypt messages after they have been sent. Conducting such attacks is much more interesting than the naive attacks on iMessage’s key server, since any such attack would require the existence of a fundamental vulnerability in iMessage’s encryption itself. And in 2016 encryption seems like one of those things that we’ve basically figured out how to get right.

Which means, of course, that we probably haven’t.

How does iMessage encryption work?

What we know about the iMessage encryption protocol comes from a previous reverse-engineering effort by a group from Quarkslab, as well as from Apple’s iOS Security Guide. Based on these sources, we arrive at the following (simplified) picture of the basic iMessage encryption scheme:

To encrypt an iMessage, your phone first obtains the RSA public key of the person you’re sending to. It then generates a random AES key k and encrypts the message with that key using CTR mode. Then it encrypts k using the recipient’s RSA key. Finally, it signs the whole mess using the sender’s ECDSA signing key. This prevents tampering along the way.
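
For concreteness, here is a minimal Python sketch of that simplified scheme using the cryptography package: AES-CTR for the payload, RSA-OAEP to wrap the AES key, and an ECDSA signature over the whole blob. The key sizes, hash choices and packing are illustrative assumptions rather than Apple’s actual parameters, and note that nothing here applies a MAC to the AES ciphertext, which is exactly the problem discussed next.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, ec, padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_imessage_like(plaintext: bytes, recipient_rsa_pub, sender_ecdsa_priv):
    # 1. Encrypt the message body under a fresh AES key in CTR mode (no MAC).
    aes_key, nonce = os.urandom(16), os.urandom(16)
    enc = Cipher(algorithms.AES(aes_key), modes.CTR(nonce)).encryptor()
    body = enc.update(plaintext) + enc.finalize()

    # 2. Wrap the AES key under the recipient's RSA public key (OAEP padding).
    wrapped_key = recipient_rsa_pub.encrypt(
        aes_key,
        padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )

    # 3. Sign the whole blob with the sender's ECDSA key.
    blob = nonce + wrapped_key + body
    signature = sender_ecdsa_priv.sign(blob, ec.ECDSA(hashes.SHA256()))
    return blob, signature

# Usage sketch with freshly generated keys (sizes are illustrative):
recipient_rsa = rsa.generate_private_key(public_exponent=65537, key_size=2048)
sender_ecdsa = ec.generate_private_key(ec.SECP256R1())
blob, sig = encrypt_imessage_like(b"hello bob", recipient_rsa.public_key(), sender_ecdsa)
```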

So what’s missing here?

Well, the most obviously missing element is that iMessage does not use a Message Authentication Code (MAC) or authenticated encryption scheme to prevent tampering with the message. To simulate this functionality, iMessage simply uses an ECDSA signature formulated by the sender. Naively, this would appear to be good enough. Critically, it’s not.

The attack works as follows. Imagine that a clever attacker intercepts the message above and is able to register her own iMessage account. First, the attacker strips off the original ECDSA signature made by the legitimate sender, and replaces it with a signature of her own. Next, she sends the newly signed message to the original recipient using her own account.

The outcome is that the user receives and decrypts a copy of the message, which now appears to have originated from the attacker rather than from the original sender. Ordinarily this would be a pretty mild attack, but there’s a useful wrinkle. In replacing the sender’s signature with one of her own, the attacker has gained a powerful capability: she can now tamper with the AES ciphertext at will.

Specifically, since in iMessage the AES ciphertext is not protected by a MAC, it is therefore malleable. As long as the attacker signs the resulting message with her key, she can flip any bits in the AES ciphertext she wants — and this will produce a corresponding set of changes when the recipient ultimately decrypts the message. This means that, for example, if the attacker guesses that the message contains the word “cat” at some position, she can flip bits in the ciphertext to change that part of the message to read “dog” — and she can make this change even though she can’t actually read the encrypted message.
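
Here is a tiny, self-contained demonstration of that malleability using AES-CTR from the cryptography package; it has nothing to do with Apple’s code, but it shows the principle. Because CTR mode is a stream cipher, XORing the ciphertext with the difference between the guessed and desired plaintexts changes the decryption accordingly, with no knowledge of the key.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(16), os.urandom(16)

def aes_ctr(data: bytes) -> bytes:
    # In CTR mode, encryption and decryption are the same keystream XOR.
    c = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return c.update(data) + c.finalize()

ciphertext = aes_ctr(b"my cat is cute")

# The attacker guesses that bytes 3..5 spell "cat" and wants them to read "dog".
# She never touches the key: she simply XORs in the plaintext difference.
delta = bytes(a ^ b for a, b in zip(b"cat", b"dog"))
mauled = (ciphertext[:3]
          + bytes(c ^ d for c, d in zip(ciphertext[3:6], delta))
          + ciphertext[6:])

assert aes_ctr(mauled) == b"my dog is cute"
```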

Only one more big step to go.

Now further imagine that the recipient’s phone will process the message only if the plaintext that emerges from decryption is correctly formatted. If the plaintext is improperly formatted (for a silly example, if our tampering made it say “*7!” instead of “pig”), then the recipient’s phone might return an error that the attacker can observe.

It’s well known that such a configuration gives our attacker the ability to learn information about the original message, provided that she can submit many “mauled” variants to be decrypted. By mauling the underlying message in specific ways, e.g., attempting to turn “dog” into “pig” and observing whether decryption succeeds, the attacker can gradually learn the contents of the original message. The technique is known as a format oracle, and it’s similar to the padding oracle attack discovered by Vaudenay.

So how exactly does this format oracle work?

The format oracle in iMessage is not a padding oracle. Instead it has to do with the compression that iMessage uses on every message it sends.

You see, prior to encrypting each message payload, iMessage applies a complex formatting that happens to conclude with gzip compression. Gzip is a modestly complex compression scheme that internally identifies repeated strings, applies Huffman coding, and then tacks a CRC checksum computed over the original data onto the end of the compressed message. It’s this gzip-compressed payload that’s encrypted within the AES portion of an iMessage ciphertext.

It turns out that given the ability to maul a gzip-compressed, encrypted ciphertext, there exists a fairly complicated attack that allows us to gradually recover the contents of the message by mauling the original message thousands of times and sending the modified versions to be decrypted by the target device. The attack turns on our ability to maul the compressed data by flipping bits, then “fix up” the CRC checksum correspondingly so that it reflects the change we hope to see in the uncompressed data. Depending on whether that test succeeds, we can gradually recover the contents of a message — one byte at a time.
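
To get a feel for where that success/failure signal comes from, here is a toy Python experiment (emphatically not the real attack, which must also fix up the CRC and cope with the hidden Huffman tables): flip one bit in a gzip blob and check whether it still parses. In the actual attack, this signal leaks through the behavior of the recipient’s device rather than through a local exception.

```python
import gzip
import zlib

payload = gzip.compress(b"the secret attachment key is 0123456789abcdef")

def decompresses_ok(blob: bytes) -> bool:
    """Toy 'format oracle': does the blob still parse as valid gzip (CRC and all)?"""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, zlib.error, EOFError):
        return False

assert decompresses_ok(payload)

# Flip a single bit somewhere inside the compressed body and test again.
mauled = bytearray(payload)
mauled[len(mauled) // 2] ^= 0x01
print("mauled blob accepted?", decompresses_ok(bytes(mauled)))  # almost always False
```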

While I’m making this sound sort of simple, the truth is that it’s not. The message is encoded using Huffman coding with a dynamic Huffman table we can’t see, since it’s encrypted. This means we need to make surgically precise changes to the ciphertext such that we can predict the effect of those changes on the decrypted message, and we need to do this blind. Worse, iMessage has various countermeasures that make the attack more complex.

The complete details of the attack appear in the paper, and they’re pretty eye-glazing, so I won’t repeat them here. In a nutshell, we are able to decrypt a message under the following conditions:

  1. We can obtain a copy of the encrypted message
  2. We can send approximately 2^18 (invisible) encrypted messages to the target device
  3. We can determine whether or not those messages decrypted successfully or not

The first condition can be satisfied by obtaining ciphertexts from a compromise of Apple’s Push Notification Service servers (which are responsible for routing encrypted iMessages), or by intercepting TLS connections using a stolen certificate, something made more difficult by the addition of certificate pinning in iOS 9. The third condition is the one that initially seems the most challenging. After all, when I send an iMessage to your device, there’s no particular reason that your device should send me any sort of response when the message decrypts. And yet this information is fundamental to conducting the attack!

It turns out that there’s a big exception to this rule: attachment messages.

How do attachment messages differ from normal iMessages?

When I include a photo in an iMessage, I don’t actually send you the photograph through the normal iMessage channel. Instead, I first encrypt that photo using a random 256-bit AES key, then I compute a SHA1 hash and upload the encrypted photo to iCloud. What I send you via iMessage is actually just an iCloud.com URL to the encrypted photo, the SHA1 hash, and the decryption key.
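
Here is a rough sketch of how a client might build such an attachment pointer. The field names, the CTR packaging and the upload_to_icloud helper are illustrative assumptions, not Apple’s actual wire format.

```python
import hashlib
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def upload_to_icloud(encrypted_blob: bytes) -> str:
    """Hypothetical stand-in for the iCloud upload; returns a download URL."""
    return "https://example.icloud.com/attachments/" + hashlib.sha1(encrypted_blob).hexdigest()

def make_attachment_pointer(photo: bytes) -> dict:
    key, nonce = os.urandom(32), os.urandom(16)      # fresh 256-bit AES key per attachment
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    blob = nonce + enc.update(photo) + enc.finalize()
    return {
        "url": upload_to_icloud(blob),               # where the encrypted photo lives
        "sha1": hashlib.sha1(blob).hexdigest(),      # hash used to check the download
        "key": key.hex(),                            # decryption key, sent inside the iMessage
    }
```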

Contents of an “attachment” message.

When you successfully receive and decrypt an iMessage from some sender, your Messages client will automatically reach out and attempt to download that photo. It’s this download attempt, which happens only when the phone successfully decrypts an attachment message, that makes it possible for an attacker to learn whether or not the decryption has succeeded.

One approach for the attacker to detect this download attempt is to gain access to and control your local network connections. But this seems impractical. A more sophisticated approach is to actually maul the URL within the ciphertext so that rather than pointing to iCloud.com, it points to a related URL such as i8loud.com. Then the attacker can simply register that domain, place a server there and allow the client to reach out to it. This requires no access to the victim’s local network.

By capturing an attachment message, repeatedly mauling it, and monitoring the download attempts made by the victim device, we can gradually recover all of the digits of the encryption key stored within the attachment. Then we simply reach out to iCloud and download the attachment ourselves. And that’s game over. The attack is currently quite slow — it takes more than 70 hours to run — but mostly because our code is slow and not optimized. We believe with more engineering it could be made to run in a fraction of a day.

Result of decrypting the AES key for an attachment. Note that the ? symbol represents a digit we could not recover for various reasons, typically due to string repetitions. We can brute-force the remaining digits.

The need for an online response is why our attack currently works against attachment messages only: those are simply the messages that make the phone do visible things. However, this does not mean the flaw in iMessage encryption is somehow limited to attachments — it could very likely be used against other iMessages, given an appropriate side-channel.

How is Apple fixing this?

Apple’s fixes are twofold. First, starting in iOS 9.0 (and before our work), Apple began deploying aggressive certificate pinning across iOS applications. This doesn’t fix the attack on iMessage crypto, but it does make it much harder for attackers to recover iMessage ciphertexts to decrypt in the first place.

Unfortunately, even if this works perfectly, Apple still has access to iMessage ciphertexts. Worse, Apple’s servers will retain these messages for up to 30 days if they are not delivered to one of your devices. A vulnerability in Apple Push Network authentication, or a compromise of those servers, could expose all of these stored messages. This means that pinning is only a mitigation, not a true fix.

As of iOS 9.3, Apple has implemented a short-term mitigation that my student Ian Miers proposed. This relies on the fact that while the AES ciphertext is malleable, the RSA-OAEP portion of the ciphertext is not. The fix maintains a “cache” of recently received RSA ciphertexts and rejects any repeated ciphertexts. In practice, this shuts down our attack — provided the cache is large enough. We believe it probably is.
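
The de-duplication idea itself fits in a few lines; the cache size and eviction policy below are arbitrary assumptions, since Apple’s actual implementation isn’t public.

```python
import hashlib
from collections import OrderedDict

class RsaCiphertextCache:
    """Reject any RSA ciphertext seen recently (a bounded, roughly-LRU cache)."""

    def __init__(self, max_entries: int = 100_000):
        self.max_entries = max_entries
        self.seen = OrderedDict()

    def accept(self, rsa_ciphertext: bytes) -> bool:
        digest = hashlib.sha256(rsa_ciphertext).digest()
        if digest in self.seen:
            return False                    # replayed RSA ciphertext: drop the message
        self.seen[digest] = None
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)   # evict the oldest entry
        return True
```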

In the long term, Apple should drop iMessage like a hot rock and move to Signal/Axolotl.

So what does it all mean?

As much as I wish I had more to say, fundamentally, security is just plain hard. Over time we get better at this, but for the foreseeable future we’ll never be ahead. The only outcome I can hope for is that people realize how hard this process is — and stop asking technologists to add unacceptable complexity to systems that already have too much of it.

Let’s talk about iMessage (again)

Yesterday’s New York Times carried a story entitled “Apple and other tech companies tangle with U.S. over data access“. It’s a vague headline that manages to obscure the real thrust of the story, which is that according to reporters at the Times, Apple has not been forced to backdoor their popular encrypted iMessage system. This flies in the face of some rumors to the contrary.

While there’s not much new information in here, people on Twitter seem to have some renewed interest in how iMessage works; whether Apple could backdoor it if they wanted to; and whether the courts could force them to. The answers to those questions are respectively: “very well“, “absolutely“, and “do I look like a national security lawyer?”

So rather than tackle the last one, which nobody seems to know the answer to, I figure it would be informative to talk about the technical issues with iMessage (again). So here we go.

How does iMessage work?

Fundamentally the mantra of iMessage is “keep it simple, stupid”. It’s not really designed to be an encryption system as much as it is a text message system that happens to include encryption. As such, it’s designed to take away most of the painful bits you expect from modern encryption software, and in the process it makes the crypto essentially invisible to the user. Unfortunately, this simplicity comes at some cost to security.

Let’s start with the good: Apple’s marketing material makes it clear that iMessage encryption is “end-to-end” and that decryption keys never leave the device. This claim is bolstered by their public security documentation as well as outside efforts to reverse-engineer the system. In iMessage, messages are encrypted with a combination of 1280-bit RSA public key encryption and 128-bit AES, and signed with ECDSA under a 256-bit NIST curve. It’s honestly kind of ridiculous, but whatever. Let’s call it good enough.

iMessage encryption in a nutshell boils down to this: I get your public key, you get my public key, I can send you messages encrypted to you, and you can be sure that they’re authentic and really came from me. Everyone’s happy.

But here’s the wrinkle: where do those public keys come from?

Where do you get the keys?

Key request to Apple’s server.

It’s this detail that exposes the real weakness of iMessage. To make key distribution ‘simple’, Apple takes responsibility for handing out your friends’ public keys. It does this using a proprietary key server that Apple owns and operates. Your iPhone requests keys from Apple over a TLS-encrypted connection that employs some fancy cryptographic tokens. But fundamentally, it relies on the assumption that Apple is honest, and is really going to give you the right keys for the person you want to talk to.

But this honesty is just an assumption. Since the key lookup is completely invisible to the user, there’s nothing that forces Apple to be honest. They could, if inspired, give you a public key of their choosing, one that they hold the decryption key for. They could give you the FBI’s key. They could give you Dwayne “The Rock” Johnson’s key, though The Rock would presumably be nonplussed by this.

Indeed it gets worse. Because iMessage is designed to support several devices attached to the same account, each query to the directory server can bring back many keys — one for each of your devices. An attacker can simply add a device (or a fake ‘ghost device’) to Apple’s key server, and senders will encrypt messages to that key along with the legitimate ones. This enables wiretapping, provided you can get Apple to help you out.
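
A sketch of why this works so well: the sender’s client simply wraps the per-message key for every public key the directory returns, so one extra, attacker-registered key quietly gets its own copy. The query_key_server and rsa_encrypt callbacks here are hypothetical placeholders.

```python
def encrypt_to_all_devices(message_key: bytes, recipient: str,
                           query_key_server, rsa_encrypt) -> list:
    """Wrap the per-message key for every device key the directory hands back.
    The client trusts this list blindly: if the key server sneaks in a 'ghost'
    device key, that key receives a copy too, and the sender cannot tell."""
    device_public_keys = query_key_server(recipient)
    return [rsa_encrypt(pub, message_key) for pub in device_public_keys]
```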

But why do you need Apple to help you out?

As described, this attack doesn’t really require direct collaboration from Apple. In principle, the FBI could just guess the target’s email password, or reset the password and add a new device all on their own. Even with a simple subpoena, Apple might be forced to hand over security questions and/or password hashes.

The real difficulty is caused by a final security feature in iMessage: when you add a new device, or modify the devices attached to your account, Apple’s key server sends a notification to each of the devices already attached to the account. It’s not obvious how this feature is implemented, but one thing is clear: it seems likely that, at least in theory, Apple could shut it off if they needed to.* After all, this all comes down to code in the key server.

Fixing this problem seems hard. You could lock the key server in a giant cage, then throw away the key. But as long as Apple retains the ability to update their key server software, solving this problem seems fundamentally challenging. (Though not impossible — I’ll come back to this in a moment.)

Can governments force Apple to modify their key server?

It’s not clear. While it seems pretty obvious that Apple could in theory substitute keys and thus enable eavesdropping, in practice it may require substantial changes to Apple’s code. And while there are a few well-known cases in which the government has forced companies to turn over keys, changing the operation of a working system is a whole different ball of wax.

And iMessage is not just any working system. According to Apple, it handles several billion messages every day, and is fundamental to the operation of millions of iPhones. When you have a deployed system at that scale, the last thing you want to do is mess with it — particularly if it involves crypto code that may not even be well understood by its creators. There’s no amount of money you could pay me to be ‘the guy who broke iMessage’, even for an hour.

Any way you slice it, it’s a risky operation. But for a real answer, you’ll have to talk to a lawyer.

Why isn’t key substitution a good solution to the ‘escrow’ debate?

Another perspective on iMessage — one I’ve heard from some attorney friends — is that key server tampering sounds like a pretty good compromise solution to the problem of creating a ‘secure golden key‘ (AKA giving governments access to plaintext).

This view holds that key substitution allows only proactive eavesdropping: the government has to show up with a warrant before they can eavesdrop on a customer. They can’t spy on everyone, and they can’t go back and read your emails from last month. At the same time, most customers still get true ‘end to end’ encryption.

I see two problems with this view. First, tampering with the key server fundamentally betrays user trust, and undermines most of the guarantees offered by iMessage. Apple claims that they offer true end-to-end encryption that they can’t read — and that’s reasonable in the threat model they’ve defined for themselves. The minute they start selectively substituting keys, that theory goes out the window. If you can substitute a few keys, why not all of them? In this world, Apple should expect requests from every Tom, Dick and Harry who wants access to plaintext, ranging from divorce lawyers to foreign governments.

A snapshot of my seven (!) currently enrolled iMessage devices, courtesy Frederic Jacobs.

The second, more technical problem is that key substitution is relatively easy to detect. While Apple’s protocols are an obfuscated mess, it is at least in theory possible for users to reverse-engineer them and view the raw public keys being transmitted, and thus determine whether the key server is being honest. While most criminals are not this sophisticated, a few are. And for those who aren’t, tools can be built to make this relatively easy. (Indeed, people have already built such tools; see the snapshot of my enrolled devices above.)

Thus key substitution represents at most a temporary solution to the ‘government access’ problem, and one that’s fraught with peril for law enforcement, and probably disastrous for the corporations involved. It might seem tempting to head down this rabbit hole, but it’s rabbits all the way down.

What can providers do to prevent key substitution attacks?

Signal’s “key fingerprint” screen.


From a technical point of view, there are a number of things that providers can do to harden their key servers. One is to expose ‘key fingerprints’ to users who care, which would allow them to manually compare the keys they receive with the keys actually registered by other users. This approach is used by OpenWhisperSystems’ Signal, as well as PGP. But even I acknowledge that this kind of stinks.
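
Computing such a fingerprint is straightforward: hash the serialized public key and render it in a form two humans can compare out of band. The encoding below (truncated SHA-256, hex in groups of four) is just an illustrative choice.

```python
import hashlib

def key_fingerprint(public_key_bytes: bytes) -> str:
    """Derive a short, human-comparable fingerprint from a serialized public key."""
    digest = hashlib.sha256(public_key_bytes).hexdigest()[:32]
    return " ".join(digest[i:i + 4] for i in range(0, len(digest), 4))

# Two users read these aloud (or scan a QR code) and check that they match.
print(key_fingerprint(b"\x04" + b"\x01" * 64))
```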

A more user-friendly approach is to deploy a variant of Certificate Transparency, which would require providers to publish a publicly verifiable log of every public key they hand out. This allows each client to check that the server is handing out the keys that users actually registered, and by implication, that every other user is seeing the same thing.
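
Under the hood, this kind of transparency log is a Merkle tree: the provider publishes a small signed root, and every key lookup comes with an inclusion proof that the client can verify against it. Here is a generic sketch of that verification step (the bare Certificate Transparency idea; real systems layer more on top).

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    """Check that `leaf` is included in a Merkle tree with root hash `root`.
    `proof` is a list of (side, sibling_hash) pairs walking from the leaf up to
    the root, where side is "L" if the sibling sits to the left of the current node."""
    node = _h(b"\x00" + leaf)                      # domain-separated leaf hash
    for side, sibling in proof:
        if side == "L":
            node = _h(b"\x01" + sibling + node)    # sibling on the left
        else:
            node = _h(b"\x01" + node + sibling)    # sibling on the right
    return node == root
```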

The most complete published variant of this is called CONIKS, and it was proposed by a group at Princeton, Stanford and the EFF (one of the more notable authors is Ed Felten, now Deputy U.S. Chief Technology Officer). CONIKS combined key transparency with a ‘verification protocol’ that allows clients to ensure that they aren’t being sidelined and fed false information.

CONIKS isn’t necessarily the only game in town when it comes to preventing key substitution attacks, but it represents a powerful existence proof that real defenses can be mounted. Even though Apple hasn’t chosen to implement CONIKS, the fact that it’s out there should be a strong disincentive for law enforcement to rely heavily on this approach.

So what next?

That’s the real question. If we believe the New York Times, all is well, for the moment. But not for the future. In the long term, law enforcement continues to ask for an approach that allows them to access the plaintext of encrypted messages. And Silicon Valley continues to find new ways to protect the confidentiality of their users’ data, against a range of threats beginning in Washington and proceeding well beyond.

How this will pan out is anyone’s guess. All we can say is that it will be messy.

Notes:

* How they would do this is really a question for Apple. The feature may involve the key server sending an explicit push message to each of the devices, in which case it would be easy to turn this off. Alternatively, the devices may periodically retrieve their own keys to see what Apple’s server is sending out to the world, and alert the user when they see a new one. In the latter case, Apple could selectively transmit a doctored version of the key list to the device owner.