Axolotl and the implementations
Here’s how Axolotl is implemented in WhatsApp and TextSecure
NOTE: The colors are used to simplify the ECDHE down to a color-mixture example, as the math would take too much room.
Axolotl is an outstanding protocol that uses ratcheted Diffie-Hellman key exchanges. Instead of a long-term signing key, Axolotl initiates the key exchange with three ECDH operations, one of which is done with the long-term DH private value (the identity key).
This way Axolotl provides forward secrecy while removing the need to advertise public DH values. Messages are encrypted with message keys derived from chain keys, which in turn derive from a constantly changing root key. Unlike in OTR, the same symmetric key is not reused until the next DH handshake completes. Instead, the chain key is run through HMAC-SHA256 to ensure forward secrecy for each message separately. The recipient can regenerate all the message keys by repeatedly hashing the current chain key, and so decrypt any messages sent while their client was not replying.
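The per-message ratchet described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual TextSecure or WhatsApp code: the single-byte HMAC inputs `0x01` and `0x02` and all function names are assumptions made for the example.

```python
import hashlib
import hmac
from typing import List

# Assumed labels: one HMAC input for the message key, another for the next chain key.
MESSAGE_KEY_SEED = b"\x01"
CHAIN_KEY_SEED = b"\x02"


def derive_message_key(chain_key: bytes) -> bytes:
    """Derive the key that encrypts exactly one message."""
    return hmac.new(chain_key, MESSAGE_KEY_SEED, hashlib.sha256).digest()


def next_chain_key(chain_key: bytes) -> bytes:
    """Advance the chain; the old chain key can then be deleted."""
    return hmac.new(chain_key, CHAIN_KEY_SEED, hashlib.sha256).digest()


def message_keys(chain_key: bytes, count: int) -> List[bytes]:
    """Return the next `count` message keys, stepping the chain after each one."""
    keys = []
    for _ in range(count):
        keys.append(derive_message_key(chain_key))
        chain_key = next_chain_key(chain_key)
    return keys


# The sender uses three keys while the recipient is offline;
# the recipient later regenerates the same three keys from the same chain key.
initial_chain_key = hashlib.sha256(b"hypothetical shared chain key").digest()
sender_keys = message_keys(initial_chain_key, 3)
recipient_keys = message_keys(initial_chain_key, 3)
```

Because HMAC is one-way, someone who steals a later chain key cannot walk the chain backwards to recover earlier message keys.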
The root key is renewed by cyclic hashing through HMAC-SHA256 and HKDF, together with entropy obtained from the constant generation of DH shared secrets, so the users can trace the trust of the current root key back to the initial key exchange.
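As a rough sketch of that root key update, the snippet below implements HKDF with HMAC-SHA256 (following RFC 5869) and mixes a fresh DH shared secret into the previous root key. Using the old root key as the HKDF salt, the `info` label, the 64-byte output split, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Tuple


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def ratchet_root_key(root_key: bytes, dh_shared_secret: bytes) -> Tuple[bytes, bytes]:
    """Derive (new_root_key, new_chain_key) from the old root key and a new DH secret."""
    # Assumption: the old root key is the salt, so every new root key depends on all earlier ones.
    okm = hkdf_sha256(dh_shared_secret, root_key, b"hypothetical ratchet label", 64)
    return okm[:32], okm[32:]


old_root = bytes(32)
new_root, new_chain = ratchet_root_key(old_root, b"\x11" * 32)
other_root, _ = ratchet_root_key(old_root, b"\x22" * 32)
```

Since each new root key is derived from the previous one, an attacker who was absent from the initial handshake cannot later insert themselves by guessing a single DH exchange.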
The two implementations look almost completely identical. Can you spot the difference?
WhatsApp does it wrong
There is no fingerprint generation in WhatsApp. This means WhatsApp provides almost as bad security as iMessage. Assuming WhatsApp did not make any other changes to Axolotl in their proprietary source code, an undetectable man-in-the-middle attack can be mounted either from the Internet backbone by an HSA, or from within the WhatsApp server (compromised with either malware or an NSL). Whether the DH values are additionally protected by TLS between the client and server does not matter; PKI provides no meaningful protection against HSAs.
In iMessage, another public key can be added, or the current public key can be switched at any time, without the user noticing. In the case of WhatsApp, the MITM can only be done during the initial handshake (assuming the proprietary code does indeed pass the previous root key to the cyclic hashing process).
Here’s how the MITM against WhatsApp works (only the key exchange part is shown):
The missing fingerprint feature makes WhatsApp completely insecure. End-to-end encryption is usually defined as “user is in control of the encryption keys”. In the case of public key cryptography, it’s more complex. People seem to think it’s enough that they are in control of the secrecy of their private decryption and signing keys.
What they don’t realize is that they must also be in control of the integrity of all public encryption and signing keys of their peers. If this is not the case, i.e. if the user cannot be sure he or she is encrypting with the correct key, the decryption key may not belong to their contact, but to a man in the middle.
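The key substitution described above can be sketched with a toy Diffie-Hellman exchange. This is an illustration only: it uses plain modular arithmetic with a small Mersenne prime instead of the elliptic curve operations Axolotl uses, and the group parameters and all names (including the attacker "Mallory") are assumptions made for the example.

```python
import hashlib
import secrets

# Toy group parameters, far too small for real use.
P = 2**127 - 1
G = 3


def keypair():
    """Return a (private, public) toy DH key pair."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def shared_key(own_private: int, peer_public: int) -> bytes:
    """Hash the DH shared secret into a 32-byte symmetric key."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


def exchange(server_tampers: bool):
    """Run one key exchange through a server that may swap the public values."""
    a_priv, a_pub = keypair()  # Alice
    b_priv, b_pub = keypair()  # Bob
    m_priv, m_pub = keypair()  # Mallory, sitting on the server or the backbone

    # Without fingerprints, neither user can tell whose public value arrived.
    pub_seen_by_alice = m_pub if server_tampers else b_pub
    pub_seen_by_bob = m_pub if server_tampers else a_pub

    alice_key = shared_key(a_priv, pub_seen_by_alice)
    bob_key = shared_key(b_priv, pub_seen_by_bob)
    mallory_with_alice = shared_key(m_priv, a_pub)
    mallory_with_bob = shared_key(m_priv, b_pub)
    return alice_key, bob_key, mallory_with_alice, mallory_with_bob


honest = exchange(server_tampers=False)
attacked = exchange(server_tampers=True)
```

In the attacked run, Alice and Bob each end up sharing a key with Mallory rather than with each other, and nothing on either screen reveals it unless the public values can be compared out of band.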
If a secure messaging app prevents the user from verifying the public keys, the claim that the software is end-to-end encrypted is inarguably snake oil. The whole point of end-to-end encryption is to make it impossible for the server and any active MITM attacker to access messages. Companies will want to avoid legal issues, so keep an eye out for carefully worded comments such as “our company is not in possession of private keys of the users”.
TextSecure does it right
In the case of TextSecure, the users have a way to ensure they have actually received the public DH-value of their contact, by checking each other’s fingerprints:
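A fingerprint is just a short, human-comparable representation of a public key. The sketch below assumes the fingerprint is a SHA-256 digest of the identity key bytes shown in groups of four hex characters; the exact format TextSecure displays is not specified here, so the format, the placeholder key bytes, and the function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(identity_public_key: bytes) -> str:
    """Render a public key as grouped hex so two people can read it aloud or compare screens."""
    digest = hashlib.sha256(identity_public_key).hexdigest()
    return " ".join(digest[i:i + 4] for i in range(0, len(digest), 4))


def fingerprints_match(shown: str, expected: str) -> bool:
    """Compare two fingerprints, ignoring spacing and letter case."""
    def normalize(text: str) -> bytes:
        return "".join(text.split()).lower().encode("utf-8")

    return hmac.compare_digest(normalize(shown), normalize(expected))


# Placeholder key bytes, not real identity keys.
key_bob_received = b"\x05" + b"\x01" * 32
key_alice_holds = b"\x05" + b"\x01" * 32
key_from_mitm = b"\x05" + b"\x02" * 32

same = fingerprints_match(fingerprint(key_bob_received), fingerprint(key_alice_holds))
swapped = fingerprints_match(fingerprint(key_from_mitm), fingerprint(key_alice_holds))
```

The comparison only provides assurance when the two fingerprints travel over a channel the attacker cannot alter, such as meeting in person.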
But what about convenience? Every once in a while I come across a comment like
People use [fingerprintless products] to make their lives a little bit easier, not harder by checking some set of numbers they don’t even know what it’s for.
This style of argument is just horrible. The decentralized security of E2EE absolutely depends on users verifying identities. This is an inherent problem in cryptography and no application can fix it. Having a trusted third party who manages keys for you removes the end-to-end property from that software. This is just an example of a larger phenomenon:
Security is a process, not a product. Products provide some protection, but the only way to effectively do business in an insecure world is to put processes in place that recognize the inherent insecurity in the products.
To be frank, the tiny bit of convenience gained from not bothering to check keys can result in very inconvenient jail time if unjust laws and/or paranoid nation states are able to access your communication. This is especially scary considering parallel construction: you don’t have to be engaged in serious criminal activities for this type of surveillance to occur.
So, saying fingerprint checking is inconvenient is not only short-sighted, it’s also impolite: just because you don’t mind sharing your private life and giving HSAs leverage that possibly prevents you from changing the world, doesn’t mean your contact doesn’t mind either.
TextSecure already makes comparing fingerprints a breeze with QR-codes if the users have Barcode Scanner installed. Let’s take a closer look:
Currently, opening the first function (QR code or the scanner) from chat takes four taps. Then users have to perform the first scan. Then they make three more taps to move to the opposite function, perform another scan, and make two more taps to get back to the chat.
Here’s how OWS could significantly speed up the process even further:
Initially, TextSecure should come bundled with a barcode scanner. The software already defines the Alice and Bob roles for users by comparing which one has the larger base key. The “End secure session” option must be moved into the triple-dot menu at the top-right corner.
- Pressing the lock icon opens the barcode scanner for Bob, and the QR-code for Alice.
- Scanning both fingerprints from the same QR-code allows Bob to confirm both that
  - Bob has received the correct DH identity key from Alice
  - Alice has received the correct DH identity key from Bob
- Once the key has been scanned, the scanning device reports either “Identities verified” or “MITM risk”. The notification and QR code are closed with the second tap.
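The single-scan flow proposed above could look roughly like the following. This is a sketch of the proposal, not existing TextSecure code: the payload format (two hex strings joined by a colon), the choice that the larger base key becomes Alice, and all function names are assumptions.

```python
import hmac


def assign_role(my_base_key: bytes, their_base_key: bytes) -> str:
    """Pick a role by comparing base keys. Which side becomes Alice is an assumption."""
    return "alice" if my_base_key > their_base_key else "bob"


def qr_payload(own_identity_key: bytes, peer_key_as_received: bytes) -> str:
    """Alice's QR code: her own identity key plus the key she received as Bob's."""
    return own_identity_key.hex() + ":" + peer_key_as_received.hex()


def verify_scan(payload: str, own_identity_key: bytes, peer_key_as_received: bytes) -> str:
    """Bob's scanner: check both directions of the exchange from one scan."""
    try:
        alice_own_hex, alice_view_of_bob_hex = payload.split(":")
        alice_own = bytes.fromhex(alice_own_hex)
        alice_view_of_bob = bytes.fromhex(alice_view_of_bob_hex)
    except ValueError:
        return "MITM risk"  # malformed payloads are never treated as verified
    bob_got_alices_key = hmac.compare_digest(alice_own, peer_key_as_received)
    alice_got_bobs_key = hmac.compare_digest(alice_view_of_bob, own_identity_key)
    if bob_got_alices_key and alice_got_bobs_key:
        return "Identities verified"
    return "MITM risk"


# Placeholder identity keys for the example.
alice_key, bob_key, mallory_key = b"\xaa" * 33, b"\xbb" * 33, b"\xcc" * 33

honest_result = verify_scan(qr_payload(alice_key, bob_key), bob_key, alice_key)
# Under attack, both users received Mallory's key instead of each other's.
attacked_result = verify_scan(qr_payload(alice_key, mallory_key), bob_key, mallory_key)
```

Putting both keys into one QR code is what removes the second scan: Bob's device learns what Alice sees without Alice having to scan anything.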
Here’s a mock-up of how it would look:
This is less inconvenient than sending a picture from the phone. Currently the fastest QR-scan I managed to do with a friend took around 30 seconds. With my proposal, the process would appear to take way less than ten seconds.
Remote fingerprint verification
You have limited assurance when using the phone: HSAs have had 16 years to improve on this. If you’re a target, you might not be secure against such an active voice-morphing attack.
Sometimes people send me fingerprints through TextSecure. This is a bad idea. Here’s how the MITM can trivially remove all assurance from this practice:
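As a minimal sketch of that attack, assume the MITM already sits between the two users and can read and rewrite every message before re-encrypting it for the other side. The fingerprint strings and the `mitm_relay` helper are hypothetical placeholders.

```python
def mitm_relay(message: str, substitutions: dict) -> str:
    """Rewrite any genuine fingerprint in a relayed message to the attacker's one."""
    for genuine, forged in substitutions.items():
        message = message.replace(genuine, forged)
    return message


# Placeholder fingerprints for the example.
alice_real_fp = "05ab 12cd 34ef"
forged_fp_shown_to_bob = "05ff 99ee 88dd"  # the attacker's key, which Bob's app lists as Alice's

sent_by_alice = "My fingerprint is " + alice_real_fp
received_by_bob = mitm_relay(sent_by_alice, {alice_real_fp: forged_fp_shown_to_bob})

# Bob compares the fingerprint in the message with the one his app displays.
bob_thinks_verified = forged_fp_shown_to_bob in received_by_bob
```

Bob's check passes even though he never saw Alice's real fingerprint, because the channel carrying the fingerprint is the very channel under attack.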
While you can agree on obscure tactics for steganographically hiding pieces of the fingerprint in messages, you need yet another MITM-free channel to do this. So instead of agreeing on these practices face-to-face, just exchange the fingerprint.