The article on Bulk CNE showed how exploitation of client devices shreds the security of current top-of-the-line end-to-end encryption tools. While it’s not the job of a secure messaging client to patch vulnerabilities in the host OS, it must at least be possible to run the client in a secure configuration. What would that look like? In the case of current E2EE software, the Trusted Computing Base (TCB) is located on the networked device. This makes it easy for state-sponsored malware to exfiltrate keys and plaintexts.
This might seem like an impossible problem to solve: software always has vulnerabilities, and because a modern client is constantly connected to the Internet, the window of exposure for key exfiltration remains open. TextSecure does mitigate the threat of key exfiltration with its DH ratcheting: the second the MITM attack ends, the attacker needs to re-exfiltrate all keys for the next MITM attack to succeed.
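The self-healing effect of DH ratcheting can be sketched in a few lines: each round mixes a fresh DH shared secret into the chain key, so a key the attacker exfiltrated before that round no longer determines the keys that follow. This is a toy sketch with deliberately small parameters, not TextSecure’s actual implementation:

```python
import hashlib
import secrets

# Toy DH parameters for illustration only -- a real ratchet uses
# X25519-sized groups, not a 61-bit prime.
P = 2305843009213693951  # Mersenne prime 2**61 - 1
G = 5

def dh_step(chain_key: bytes, own_priv: int, their_pub: int) -> bytes:
    """Mix a fresh DH shared secret into the chain key, so keys
    exfiltrated before this step become useless afterwards."""
    shared = pow(their_pub, own_priv, P).to_bytes(8, "big")
    return hashlib.sha256(chain_key + shared).digest()

chain_a = chain_b = b"initial shared secret"
for _ in range(3):  # each round both parties contribute fresh DH values
    a_priv = secrets.randbelow(P - 2) + 1
    b_priv = secrets.randbelow(P - 2) + 1
    a_pub, b_pub = pow(G, a_priv, P), pow(G, b_priv, P)
    chain_a = dh_step(chain_a, a_priv, b_pub)
    chain_b = dh_step(chain_b, b_priv, a_pub)

assert chain_a == chain_b  # both chains stay in sync
```

An attacker who stole the chain key before a round, but missed the private DH values of that round, can no longer follow the chain.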
But we can do better than that. Google presented Project Vault in June. It’s a smart card in the shape of a microSD card that is able to store private keys and encrypt data inside its secure cryptoprocessor. This is a great improvement, in the sense that it guarantees that despite endpoint compromise, keys remain secure and encryption is done properly (assuming the smart card has no backdoors). However, in the case of instant messaging (IM), it’s not enough against endpoint compromise.
As you can see, the sensitive plaintext messages are passed to Project Vault through the insecure host OS. Additionally, all replies from contacts that the TCB decrypts are displayed via the host OS. While smart cards have many use cases, this does not seem like a viable one. We need an environment where the keyboard and display connect directly to the TCB. So what should we do? Let’s quote two cryptographers:
Each of the [reviewed] apps seem quite good, cryptographically speaking. But that’s not the problem. The real issue is that they each run on a vulnerable, networked platform. If I really had to trust my life to a piece of software, I would probably use something much less flashy — GnuPG, maybe, running on an isolated computer locked in a basement.
Assume that while your computer can be compromised, it would take work and risk on the part of the NSA — so it probably isn’t. If you have something really important, use an air gap. Since I started working with the Snowden documents, I bought a new computer that has never been connected to the Internet. If I want to transfer a file, I encrypt the file on the secure computer and walk it over to my Internet computer, using a USB stick. To decrypt something, I reverse the process. This might not be bulletproof, but it’s pretty good.
This approach would use two computers instead of one. In essence, it looks like this: However, this system only works if the HSA is unable to produce malware that spreads via USB drives. Compared to exploit research, such a feature isn’t very hard for an HSA to develop. Let’s go through the attack step by step.
- We encrypt a message using an air-gapped computer that functions as the TCB. We move the ciphertext from the air-gapped PC to the networked PC using a never-before-used USB drive.
- We send the ciphertext to our contact from the networked PC. We then copy the reply from our contact to the USB drive and move the reply back to the air-gapped PC for decryption. Since the networked computer is infected, the infection spreads to the air-gapped PC via that USB drive.
- When we write our second message, all encryption keys are transmitted by the malware inside the USB drive to the infected, networked computer, which will then exfiltrate those keys back to the adversary. Game over.
Now, while this configuration is not secure, it does show us an important thing. Before we transferred the reply to the air-gapped PC, our TCB was secure. We can send as many messages as we want, using a new USB drive every time and throwing the used ones in the shredder. (Don’t worry about the costs, we won’t be using USB drives after this article.) If we stop sending messages, we can also receive as many messages as we want on the air-gapped device, using a new USB drive every time; the keys remain in our possession. The compromise of keys/plaintexts happens only when we send a message after decrypting one or more messages. See where I’m going with this? If we split the two secure processes between two air-gapped computers, TCBs dedicated to either encrypting or decrypting messages, we can repeat the first two steps in isolation, forever.
This does in fact work. The lower (grey) TCB stays clean when it only outputs data; a clean system does not output keys on its own. The upper (red) TCB is compromised, but keys and decrypted plaintexts stay inside the endpoint, because the device only receives data.
Now, let’s remove the $4 cost per message. Douglas W. Jones and Tom C. Bowersox wrote a fantastic paper on RS232 data diodes. A data diode is a device that uses the laws of physics to enforce the direction of data flow in asynchronous data channels. This approach is so secure that commercial models have received EAL7+ (the best possible) Common Criteria certification. The cost of these devices, however, is nowhere near suitable for end users. Here’s how we can construct a sub-$10 data diode for RS232 (serial port):
The transmitting side has two LEDs connected in parallel with opposite polarities. The receiving side has two phototransistors that regenerate the signal by outputting power from 6V batteries (with opposite polarities in relation to Rx and GND) when the corresponding phototransistor starts receiving light. This optical gap is guaranteed one-way, because while LEDs do generate a very small current when light is cast upon them, phototransistors do not emit light.
When the data diode is used to replace the USB drives in the three-computer setup, here’s what the final assembly looks like:
Messages are written to the TxM (transmission module) and they are received by the RxM (receiver module) devices of both users. The NH (network handler) acts as a converter between the serial ports and the network. The data diodes provide high-assurance protection against exfiltration of sensitive data. TxM and RxM don’t have to be commercial HSM devices; a netbook should do fine for most situations, provided that the data diode is the only connection to the outside world: WLAN and Bluetooth interfaces must be removed, together with speakers, webcam and microphone. Batteries should be charged only when the device is powered off. This approach puts a one-time price tag on endpoint security.
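One practical consequence of the one-way link is that the RxM can never request a retransmission, so every frame must be verifiable on its own and silently dropped if corrupted. A minimal framing sketch (a hypothetical format, not the actual wire protocol):

```python
import binascii
import struct

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32. The receiver
    behind a data diode cannot NAK, so it must detect corruption
    on its own from the frame alone."""
    return struct.pack(">II", len(payload), binascii.crc32(payload)) + payload

def unframe(data: bytes):
    """Return the payload if the frame is intact, None otherwise."""
    if len(data) < 8:
        return None
    length, crc = struct.unpack(">II", data[:8])
    payload = data[8:8 + length]
    if len(payload) != length or binascii.crc32(payload) != crc:
        return None
    return payload

assert unframe(frame(b"hello")) == b"hello"
assert unframe(frame(b"hello")[:-1]) is None  # truncated frame is rejected
```

In practice the sender would also repeat frames a few times, since dropping is the only possible error handling over a unidirectional channel.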
Now I should immediately discuss the three vulnerabilities in this approach.
Firstly, if the TxM is compromised during setup, the malware can exfiltrate keys. However, this kind of compromise can be confirmed to some extent. As the TxM never knows what’s on the reception side of the data diode, the receiving end can be plugged into spectrum analyzers. These devices can reveal hidden signals: since the displayed output is the result of FFT calculations, no information is missed. Even if this is not done, compared to the continuous window of exposure of other E2EE systems, a ~10 minute window during TxM setup is a ground-breaking improvement.
Secondly, while the RxM has no window of exposure for exfiltrating data, the window of exposure for exploiting the RxM remains open. Thus, malware on the RxM can show arbitrary messages at any time. However, because there is no way for the attacker to find out what messages the users actually send each other, any displayed message is highly likely to be out of context (unless the malware has a sophisticated AI algorithm). Additionally, users can compare the log files of their RxMs to detect messages that the other participant never typed.
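Comparing logs doesn’t require exchanging full transcripts. A sketch of a hypothetical helper: each user hashes their message log into a short digest that can be compared out of band, and any injected message changes the digest.

```python
import hashlib

def log_digest(messages) -> str:
    """Hash a message log into a short digest two users can compare
    out of band (e.g. read aloud at a meeting) instead of diffing
    entire transcripts."""
    h = hashlib.sha256()
    for m in messages:
        # Length-prefix each entry so message boundaries are unambiguous.
        h.update(len(m).to_bytes(4, "big") + m.encode())
    return h.hexdigest()

sent_by_alice = ["hi bob", "lunch at noon?"]
shown_on_bobs_rxm = ["hi bob", "lunch at noon?", "send me your password"]

# A mismatch reveals that Bob's RxM displayed a message Alice never typed.
assert log_digest(sent_by_alice) != log_digest(shown_on_bobs_rxm)
```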
Thirdly, the endpoint is only as secure as the physical environment around it. Covert microphones, keyloggers and video cameras bypass the encryption completely. Physical compromise of the TxM also compromises security. However, these are actual targeted attacks: you can’t copy-paste human beings to spy on every user individually, the way you can copy-paste state-sponsored malware that has more or less ubiquitous access.
There is however one passive, remote attack that has been public knowledge since 1985:
The average consumer is unable to provide high-assurance EMSEC for their endpoints against TEMPEST monitoring. Briefly explained, all keyboard and display cables act as weak antennas when data passes through them. By collecting these signals with sophisticated equipment, the attacker is able to log keystrokes and view screen content. An active version of this attack, done by illuminating retro-reflector implants, extends the range from “across the street” to more than 10 km. As far as I know, TEMPEST still requires a tasked team, and even if it could be done with something like SIGINT drones, there would be no way to avoid the linearly increasing cost of scaling up surveillance. Currently, such an attack would be too expensive. The day it isn’t, you’ll know:
Physical attacks are the proper balance between privacy and security. As long as the privacy community keeps arguing that “unscalable” exploits are a functional alternative to not backdooring proprietary software and services, we will submit ourselves to a false dilemma on LEAs’ terms: “Let’s stop doing mass surveillance that hurts company reputation and switch to mass surveillance where companies have plausible deniability.” Unless we start communicating with high-assurance setups that are secure against mass-scale endpoint exploitation, neither outcome of the debate provides a real solution to stop mass surveillance.
That being said, let’s discuss the one last issue.
We can’t trust a possibly compromised RxM to generate private DH values or shared secrets from received public DH values (the received public value might have been sent in by the HSA). If we want to do a DH key exchange with three computers, we must
- Generate the private DH value on the TxM and move it directly to the RxM, either with a one-time USB drive or through an additional data diode.
- Type the very long public DH value from the RxM into the TxM by hand (we can’t have automated input to the TxM at any point after setup).
- Authenticate the integrity of the received public values, preferably in a face-to-face meeting (as discussed in the article on Axolotl).
- Finally, once the TxM has generated the shared secret, again move it directly to the RxM, either with a one-time USB drive or through an additional data diode. After this, a KDF can generate the two symmetric keys from the shared secret.
This is very inconvenient, unless the participants live across the globe and a physical meeting would consume even more of their resources.
What can be done instead is to generate the symmetric encryption keys on the TxM and move them directly to the two RxM devices, either with a USB drive or a data diode. The latter is more secure, as it removes the burden of sanitizing USB drives, but it requires a lot more hassle during the key-exchange rendezvous.
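Deriving the two symmetric keys from one pre-shared master key can be sketched with an HMAC-based KDF. This is a hypothetical illustration; a real design would use a vetted construction such as HKDF (RFC 5869):

```python
import hashlib
import hmac

def derive_key(master: bytes, label: bytes) -> bytes:
    """Derive an independent subkey from the pre-shared master key
    by using a per-purpose label. Sketch only -- real designs
    should use a vetted KDF such as HKDF (RFC 5869)."""
    return hmac.new(master, label, hashlib.sha256).digest()

# Master key generated on the TxM and carried to both RxM devices.
master = b"32-byte secret generated on TxM "
key_alice_to_bob = derive_key(master, b"alice->bob")
key_bob_to_alice = derive_key(master, b"bob->alice")

assert key_alice_to_bob != key_bob_to_alice  # one key per direction
```

Using one labeled subkey per direction keeps the two message streams cryptographically independent even though both stem from the same rendezvous.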
Let’s discuss the misconceptions about pre-shared keys:
“Physical key exchange is too inconvenient”
Physical key exchange is inconvenient, but it’s the highest-assurance method there is for providing integrity. Even if you were using Axolotl (TextSecure/Signal), you would have less assurance when verifying fingerprints over the phone. You should always compare fingerprints face to face. For this, TextSecure provides a convenient QR-code fingerprint verification feature. In my article on Axolotl, I made a proposal that would speed up the current QR-code fingerprint verification in TextSecure three-fold. Compared to my proposal, exchange of USB drives is four times faster (copying the keyfile and hammering the USB drive takes some time too, of course).
We can have forward secrecy by passing the encryption key through a PRF after each message. The ciphertext just needs to include a counter of how many iterations the initial key has been run through the PRF. We don’t even have to worry about keys getting out of sync if some packets are not received. The only problem is that if packets arrive out of order, any packet arriving after a more recent one becomes undecryptable (unless old keys are retained instead of being immediately overwritten).
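The counter-based catch-up can be sketched as follows, using SHA-256 as the PRF (an illustrative choice, not necessarily the one a real implementation would make):

```python
import hashlib

def ratchet(key: bytes, steps: int = 1) -> bytes:
    """Advance the key through the PRF (here SHA-256) `steps` times."""
    for _ in range(steps):
        key = hashlib.sha256(key).digest()
    return key

initial = b"pre-shared key"

# Sender attaches the iteration count to each ciphertext; this key
# corresponds to message number 5.
sender_key = ratchet(initial, 5)

# Receiver holds the key at iteration 3; the counter tells it to
# advance two more steps to catch up.
receiver_key = ratchet(ratchet(initial, 3), 5 - 3)
assert sender_key == receiver_key

# Forward secrecy: the old key is overwritten after each step, and
# SHA-256 can't be run backwards, so compromise of sender_key
# reveals nothing about earlier message keys.
```

The asymmetry in the last comment is exactly why out-of-order packets are a problem: the ratchet only ever moves forward, so a late packet’s key is gone unless old keys were deliberately kept around.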
Actually, very few modern cryptographic properties are lost with the three-computer setup.
Since messages are authenticated with symmetric MACs, we have deniability (the recipient also holds the key that can authenticate messages).
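The deniability argument in one sketch: because both parties hold the same MAC key, either of them could have produced any valid tag, so a transcript proves nothing about authorship to a third party.

```python
import hashlib
import hmac

# Symmetric key known to both Alice and Bob (illustrative value).
shared_key = b"key known to both alice and bob"

def mac(message: bytes) -> bytes:
    return hmac.new(shared_key, message, hashlib.sha256).digest()

# Bob can verify Alice's tag -- but since he holds the same key,
# he could have produced the identical tag himself. A valid
# (message, tag) pair therefore doesn't prove to a judge who
# wrote the message.
alice_tag = mac(b"meet at noon")
bob_forgery = mac(b"meet at noon")
assert alice_tag == bob_forgery
```

Contrast this with digital signatures, where only the holder of the private key could have produced a valid signature, making transcripts non-repudiable.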
Lack of DH ratcheting does take away the self-healing property that Axolotl has. But since no remote zero-day exploit is able to exfiltrate keys from this setup, it’s unlikely the feature will be needed. Self-healing might not even do the trick: as in the case of TextSecure, a compromised TCB might generate insecure keys or covertly exfiltrate keys and/or plaintext messages.
“But who’s going to write the program that supports this type of hardware layout?“
I already did.