An Analysis of Cloudflare's Email Address Obfuscation

tl;dr: It’s a hex-encoded string whose first byte (the key) is XORed against each subsequent byte to decrypt the email address. This is not a vulnerability.

Cloudflare provides a feature that obfuscates email addresses in order to protect them from spam bots. I have it enabled because that’s a pretty solid premise and it sounds useful enough.

[Screenshot: the Scrape Shield option in the Cloudflare admin panel]

It rewrites the markup on the fly and injects its own script to deobfuscate email addresses for display to the user:

<a href="mailto:[email protected]">contact</a>

Turns into:

<a href="/cdn-cgi/l/email-protection#6e040b1d1d0b040b1d1d0b5f5c5d2e09030f0702400d0103">contact</a>
<script data-cfasync="false" src="/cdn-cgi/scripts/f2bf09f8/cloudflare-static/email-decode.min.js"></script>

A copy of email-decode.min.js is available here. All of my findings are a result of reverse engineering that script, and you can find my prettified version here.

Obfuscation Strategy

The part of the injected URL after the # encodes the email address. For reference, here it is again:

6e040b1d1d0b040b1d1d0b5f5c5d2e09030f0702400d0103

It is a hex-encoded series of bytes whose length depends on the length of the email address.

The first byte, in this case 6e (remember that two hex digits make one byte!), is a randomly (?) chosen key. The remaining bytes are encrypted and decrypted by bitwise XORing each of them with that key. For example, 0x6e ^ 0x04 is 0x6a, or decimal 106, which is the ASCII code for j, the first character of my email address.
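To make the XOR step concrete, here is a minimal sketch that splits an encoded string into bytes and XORs each one with the key. The name xorWithKey is made up for illustration and does not come from Cloudflare's script.

```javascript
// Hypothetical helper: returns the key-XORed byte values of an encoded string.
function xorWithKey(hex) {
  const key = parseInt(hex.slice(0, 2), 16); // the first byte is the key
  const out = [];
  for (let i = 2; i < hex.length; i += 2) {
    out.push(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return out;
}

// Only the key and the first encrypted byte of the string above.
const first = xorWithKey("6e04")[0];
console.log(first, String.fromCharCode(first)); // 106 j
```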

What it does next is actually quite interesting: it lets the function properly support Unicode code points (which take 1-4 bytes each in UTF-8) even though the decryption operates byte by byte.

Consider the following character: 丂

In UTF-8 it’s made of three bytes, E4 B8 82, which read as individual code points are ä, ¸, and U+0082 (a control character), respectively. Naively concatenating the String.fromCharCode() representation of each byte results in the mess you’d expect:

[Image: three separate Unicode characters]

Cloudflare’s function then calls escape() on the resulting string. Every character in it is below U+0100, so each non-ASCII one is percent-encoded as a single %XX byte:

%E4%B8%82

After that, it decodes the string again using decodeURIComponent(), which interprets the percent-encoded bytes as UTF-8 and reassembles the original character.
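The three steps can be traced by hand for 丂 (U+4E02). This is a small sketch of the trick and not Cloudflare's code; note that escape() is a legacy function, although browsers and Node still ship it.

```javascript
// The three UTF-8 bytes of 丂 (U+4E02).
const bytes = [0xe4, 0xb8, 0x82];

// Step 1: one UTF-16 code unit per byte, giving three unrelated characters.
const naive = String.fromCharCode(...bytes);

// Step 2: escape() percent-encodes each of those code units as %XX.
const escaped = escape(naive);

// Step 3: decodeURIComponent() reads the %XX sequence as UTF-8.
const decoded = decodeURIComponent(escaped);

console.log(naive.length); // 3
console.log(escaped);      // %E4%B8%82
console.log(decoded);      // 丂
```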

Code

Here is a JavaScript function that decrypts the email address, given the hex string:

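// Parse the two hex digits starting at `index` into a byte value.
function hex_at(str, index) {
  var r = str.slice(index, index + 2);
  return parseInt(r, 16);
}

function decrypt(ciphertext) {
  var output = "";
  // The first byte is the key.
  var key = hex_at(ciphertext, 0);
  // XOR every following byte with the key.
  for (var i = 2; i < ciphertext.length; i += 2) {
    var plaintext = hex_at(ciphertext, i) ^ key;
    output += String.fromCharCode(plaintext);
  }
  // Reinterpret the byte-per-character string as UTF-8.
  output = decodeURIComponent(escape(output));
  return output;
}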
> decrypt("6e040b1d1d0b040b1d1d0b5f5c5d2e09030f0702400d0103")
'[email protected]'
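The function above leans on escape(), which is a legacy API. As an alternative sketch, the same result can be had by collecting the XORed bytes in a Uint8Array and handing them to TextDecoder. The name decodeEmail and the sample input are made up for this example (the input is "a@b.c" encoded with key 0x2a); this is not how Cloudflare's script does it.

```javascript
// Hypothetical alternative to decrypt(): decode the XORed bytes as UTF-8
// with TextDecoder instead of the escape()/decodeURIComponent() round trip.
function decodeEmail(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  const bytes = new Uint8Array(hex.length / 2 - 1);
  for (let i = 2; i < hex.length; i += 2) {
    bytes[i / 2 - 1] = parseInt(hex.slice(i, i + 2), 16) ^ key;
  }
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(decodeEmail("2a4b6a480449")); // a@b.c
```

One difference worth knowing: on malformed UTF-8, decodeURIComponent() throws a URIError, while TextDecoder substitutes U+FFFD by default.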

You might have noticed that this encryption strategy is super weak. Storing the key right next to the ciphertext is barely better than just sending the email address in plaintext, and a single-byte XOR is trivial to detect and brute force. In fact, it’s the third exercise of the excellent Cryptopals challenges.
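To show how little stands between a determined scraper and the address, here is a rough sketch that pulls every encoded string out of a page and decodes it. It assumes the links look like the /cdn-cgi/l/email-protection#... markup shown earlier; extractEmails, decodeHex, and the sample page are invented for illustration.

```javascript
// Decode one hex string: first byte is the key, the rest is XORed UTF-8.
function decodeHex(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  const bytes = new Uint8Array(hex.length / 2 - 1);
  for (let i = 2; i < hex.length; i += 2) {
    bytes[i / 2 - 1] = parseInt(hex.slice(i, i + 2), 16) ^ key;
  }
  return new TextDecoder("utf-8").decode(bytes);
}

// Hypothetical scraper: find every email-protection fragment and decode it.
function extractEmails(html) {
  const pattern = /\/cdn-cgi\/l\/email-protection#([0-9a-f]+)/gi;
  return Array.from(html.matchAll(pattern), (m) => decodeHex(m[1]));
}

// Made-up page: "a@b.c" encoded with key 0x2a.
const page = '<a href="/cdn-cgi/l/email-protection#2a4b6a480449">contact</a>';
console.log(extractEmails(page).join(", ")); // a@b.c
```

No brute forcing is needed here, since the key travels with the ciphertext.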

Indeed, the encoding method isn’t designed to securely encrypt email addresses: while cryptographically weak, it’s enough to throw off the basic scripts that hunt for mailto: links. One Cloudflare security engineer wrote:

The scrape shield is designed to prevent low-level bots from crawling web pages for contact information. Although it is possible to reveal email addresses due to weak encryption, we do not consider this to be a significant issue. The feature is meant to obfuscate email addresses; not completely enforce their confidentiality. As the alternative would be to not use the scrape shield and display the emails in plaintext, we are of the opinion that this feature does not introduce a vulnerability.

Prior art

It turns out that many people have done this sort of thing before: