An Analysis of Cloudflare's Email Address Obfuscation
Cloudflare provides a feature that obfuscates email addresses in order to protect them from spam bots. I have it enabled because that’s a pretty solid premise and it sounds useful enough.
It dynamically modifies markup, and adds its own scripts to aid in deobfuscating email addresses to display to the user:
<a href="mailto:jessejesse123@gmail.com">contact</a>
Turns into:
<a href="/cdn-cgi/l/email-protection#6e040b1d1d0b040b1d1d0b5f5c5d2e09030f0702400d0103">contact</a>
<script data-cfasync="false" src="/cdn-cgi/scripts/f2bf09f8/cloudflare-static/email-decode.min.js"></script>
A copy of email-decode.min.js is available here. All of my findings are a result of reverse engineering that script, and you can find my prettified version here.
Obfuscation Strategy
The part of the injected URL after the # encodes the email address. For reference, here it is again:
6e040b1d1d0b040b1d1d0b5f5c5d2e09030f0702400d0103
It is a hex-encoded series of bytes whose length depends on the length of the email address.
The first byte, in this case 6e (remember that two hex digits make one byte!), is a randomly (?) chosen key used to encrypt and decrypt the remaining bytes by bitwise XORing the key with each subsequent byte. For example, 0x6e ^ 0x04 is decimal 106, which is the ASCII code for j, the first character of my email address.
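Because XOR is its own inverse, the same operation that decodes an address also produces the obfuscated form. Here is a minimal sketch of the encoding direction for ASCII-only addresses. The names toHexByte and encodeEmail, and the choice to pass the key in explicitly, are assumptions made for illustration; they are not taken from Cloudflare's code.

```javascript
// Hypothetical helper: format a byte as exactly two lowercase hex digits.
function toHexByte(n) {
  return n.toString(16).padStart(2, "0");
}

// Hypothetical encoder: key byte first, then every character XORed with
// the key, all written as two-digit hex (the layout described above).
function encodeEmail(address, key) {
  let out = toHexByte(key);
  for (let i = 0; i < address.length; i++) {
    // Assumes every character is ASCII, so one character is one byte.
    out += toHexByte(address.charCodeAt(i) ^ key);
  }
  return out;
}

console.log(encodeEmail("j", 0x6e)); // "6e04"
```

Running the output back through the same XOR with 0x6e gives 0x6a again, which is j.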
What it does next is actually quite interesting, and allows the function to properly support Unicode code points (which take one to four bytes each in UTF-8) even though the decryption operates one byte at a time.
Consider the following character: 丂
It's made of three UTF-8 bytes, E4 B8 82, which taken individually are ä, ¸, and U+0082 (an invisible control character), respectively. Naively concatenating the String.fromCharCode() representations of each byte therefore results in the mess you'd expect:
ä¸ (followed by the invisible U+0082)
Cloudflare’s function then uses escape() on the resulting string, which percent-encodes the string’s bytes.
%E4%B8%82
After that, it decodes the string again using decodeURIComponent(), which reads the percent-encoded bytes as UTF-8 and so handles Unicode in the way we’d expect.
丂
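The round trip above can be reproduced in a few lines. Treat this as a sketch only: escape() is a legacy function that browsers and Node.js keep around for compatibility, and the TextDecoder variant at the end is an alternative added here for comparison, not something described above.

```javascript
// The three UTF-8 bytes of 丂 (U+4E02), as they would come out of the XOR step.
const bytes = [0xe4, 0xb8, 0x82];

// Naive approach: treat each byte as its own character.
const mangled = String.fromCharCode(...bytes);

// escape() percent-encodes each of those three characters as %XX ...
const percentEncoded = escape(mangled);

// ... and decodeURIComponent() reads the %XX sequence as UTF-8.
const decoded = decodeURIComponent(percentEncoded);

// Modern alternative (an assumption for comparison): decode the bytes directly.
const viaTextDecoder = new TextDecoder("utf-8").decode(new Uint8Array(bytes));

console.log(percentEncoded); // "%E4%B8%82"
console.log(decoded === "丂", viaTextDecoder === "丂"); // true true
```

The escape() route has the advantage of working in very old browsers, which may be why a widely deployed script would prefer it.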
Code
Here is a JavaScript function that decrypts the email address, given the hex string:
function hex_at(str, index) {
    // Read the two hex digits starting at `index` as one byte.
    var r = str.substr(index, 2);
    return parseInt(r, 16);
}

function decrypt(ciphertext) {
    var output = "";
    // The first byte is the XOR key.
    var key = hex_at(ciphertext, 0);
    for (var i = 2; i < ciphertext.length; i += 2) {
        var plaintext = hex_at(ciphertext, i) ^ key;
        output += String.fromCharCode(plaintext);
    }
    // Reinterpret the one-character-per-byte string as UTF-8.
    output = decodeURIComponent(escape(output));
    return output;
}
> decrypt("6e040b1d1d0b040b1d1d0b5f5c5d2e09030f0702400d0103")
'jessejesse123@gmail.com'
You might have noticed that this encryption strategy is super weak. Storing the key right next to the ciphertext is barely better than just sending the email address in plaintext, and a single byte XOR is trivial to detect and brute force—in fact, it’s the third exercise of the excellent Cryptopals challenge.
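To make the brute-force point concrete, here is a rough sketch that pretends the key byte was never sent and simply tries all 256 keys. The function name bruteForce and the "looks like an email" regular expression are assumptions for this example; the regex is a crude heuristic, not a real address grammar, and it only covers ASCII addresses.

```javascript
// Hypothetical attacker's view: ignore the leading key byte entirely and
// try every possible one-byte key against the rest of the hex string.
function bruteForce(hexWithoutKey) {
  const bytes = [];
  for (let i = 0; i < hexWithoutKey.length; i += 2) {
    bytes.push(parseInt(hexWithoutKey.slice(i, i + 2), 16));
  }

  // Crude heuristic for "looks like an email address" (an assumption).
  const looksLikeEmail = /^[\w.+-]+@[\w-]+(\.[\w-]+)+$/;

  const candidates = [];
  for (let key = 0; key < 256; key++) {
    const text = String.fromCharCode(...bytes.map((b) => b ^ key));
    if (looksLikeEmail.test(text)) {
      candidates.push({ key, text });
    }
  }
  return candidates;
}

// Everything after the first byte (6e) of the example string.
const found = bruteForce("040b1d1d0b040b1d1d0b5f5c5d2e09030f0702400d0103");
console.log(found.some((c) => c.key === 0x6e)); // true
```

Other keys can occasionally produce strings that also pass the heuristic, so the function returns every candidate rather than a single answer.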
Indeed, the encoding method isn’t designed to securely encrypt email addresses: while cryptographically weak, it’s enough to throw off the basic scripts that hunt for mailto: links. One Cloudflare security engineer wrote:
The scrape shield is designed to prevent low-level bots from crawling web pages for contact information. Although it is possible to reveal email addresses due to weak encryption, we do not consider this to be a significant issue. The feature is meant to obfuscate email addresses; not completely enforce their confidentiality. As the alternative would be to not use the scrape shield and display the emails in plaintext, we are of the opinion that this feature does not introduce a vulnerability.
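As a rough illustration of the kind of low-level bot the quote has in mind, consider a harvester that only looks for mailto: links. Everything in this sketch is hypothetical: the harvestMailto function, its regular expression, and the someone@example.com placeholder address are made up for the example, and real scrapers vary widely.

```javascript
// Hypothetical low-effort harvester: grab whatever follows "mailto:".
function harvestMailto(html) {
  const pattern = /mailto:([^"'?\s>]+)/g;
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

// Markup before the rewrite (placeholder address).
const original = '<a href="mailto:someone@example.com">contact</a>';

// Markup after the rewrite, using the href format shown earlier.
const rewritten =
  '<a href="/cdn-cgi/l/email-protection#6e040b1d1d0b040b1d1d0b5f5c5d2e09030f0702400d0103">contact</a>';

console.log(harvestMailto(original).length);  // 1
console.log(harvestMailto(rewritten).length); // 0
```

A bot this simple comes away empty-handed from the rewritten markup, which is all the feature claims to achieve.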
Prior art
It turns out that many people have done this sort of thing before:
- Raddle user sudo posted this C++ program
- Usama Ejaz wrote deobfuscation routines in six (!) languages
- SaltyCrane used a JS interpreter to decode email addresses
- /u/ck3k wrote a routine in Crystal Lang earlier this year for Nettis. Very cool!
- /u/nullableVoidPtr did this in a Python script