Even if You Don’t Invent Your Own Crypto….It’s Still Hard

By: Patrick Toomey

So, yet another crypto related post.  Often times crypto related attacks fall on deaf ears, as for most people crypto is like magic (include me in that group every now and again), and it becomes difficult for the average developer to differentiate the risks associated with the attacks circulated within the security community.  There is a vast chasm between the crypto community and just about everyone else.  The cryptographers will talk about differential cryptanalysis, meet in the middle attacks, perfect forward secrecy, etc, etc.  Often, the cryptographers’ goals are different than your average developer.  Cryptographers are trying to achieve perfection.  For them, any flaw in a cryptographic protocol is something to be discussed, researched, and fixed.  As is a famous saying in the crypto community, “Attacks only get better”.  So, even the smallest flaw is worth their time and effort, as it may lead to a practical crack that will warrant more immediate attention.

Unlike cryptographers, for the vast majority of developers, the goal is “good enough” security.  Often times the result is not good enough, but the goal is admirable, as it doesn’t make sense to over invest in security.  As a result, there is a chasm between “good enough” for cryptographers and “good enough” for developers.  In this chasm are the real world attacks that are too boring for your average cryptographer to discuss, yet too subtle as to be obvious to your average developer.  Many of these attacks have nothing to do with the cryptography itself.  Instead, many real world crypto attacks have more to do with how developers piece together proven crypto building blocks insecurely.  It was on a recent assessment that I came upon the following story that perfectly illustrates how cryptography is difficult to get right, even when you think you have made an informed decision.

On a recent assessment a colleague of mine came across a typical SQL injection attack.  In fact, it was just the kind of SQL injection you hope for as an attacker, as it returned data to the user in a nice tabular output format.  After a bit of mucking around we managed to find where the system was storing credit card information for its users.  A few minutes later we had output that looked like the following:

Base64 Credit Card Numbers

Base64 Credit Card Numbers

Hmmm, base64 encoded credit card numbers.  I wonder what they look like when we base64 decode them.  A simple base64 decode resulted in the following hex values:

Base64 Decoded Credit Card Numbers

Base64 Decoded Credit Card Numbers

If you start trying to translate the above hex values to their ASCII equivalent you will quickly find that the values do not represent plaintext ASCII credit card numbers.  Our first guess was that they were likely encrypted.  So, step two is trying to figure out how they were encrypted.  We took a look at the base64 decoded values and observed a simple fact: the vast majority of them were either 15 or 16 bytes long.  Well, a 16 byte long ciphertext could be any 128 bit symmetric algorithm in simple ECB mode.  But, the 15 byte entries were evidence against that, as they would get padded out to 16 bytes due to the requirement that symmetric algorithms encrypt a full block (16 bytes in the case of AES).  So, my first thought was the use of a stream cipher.  In their most basic form, a stream cipher simply takes a secret value as input (the key), and begins to generate a pseudorandom output stream of bytes.  Encryption simply involves XORing the stream of random bytes with your plaintext.  Unlike a block cipher, there is no need to pad out your plaintext, as you only use as many pseudorandom bytes as is necessary to XOR with your plaintext.  Decryption is also simple, as XORing is a trivially invertible operation.  The user simply feeds the algorithm the same key, generates the same pseudorandom byte stream, and XORs that with the ciphertext.

To see why this works let’s take a look at how this works for a single plaintext/ciphertext.  Let us say we wish to encrypt the credit card number “4012888888881881”. The ASCII values in hex result in a plaintext array of the following values:

Plaintext Credit Card Number ASCII in Hex

Plaintext Credit Card Number ASCII in Hex

Also, let us assume that we are using a stream cipher with a strong key whose first 16 output bytes are:

Pseudorandom Byte Stream in Hex

Pseudorandom Byte Stream in Hex

When we XOR the plaintext with the pseudorandom stream we obtain the ciphertext.

Resulting Ciphertext

Resulting Ciphertext

To decrypt we simply take the ciphertext and perform the same XOR operation.  Let’s quickly take our ciphertext and see how we decrypt it.

Decrypted Ciphertext

Decrypted Ciphertext

When we XOR the ciphertext with the stream we obtain the plaintext we started with.  Through the whole encryption/decryption process we have effectively XORed the plaintext with the stream twice.  XORing anything with the same value twice is effectively a NOP on the original value.  Pictorially:

Full Encryption Decryption Cycle

Full Encryption Decryption Cycle

However, our examples so far have assumed knowledge of the pseudorandom stream.  Without knowledge of the key it is difficult to reproduce the pseudorandom stream, and thus very difficult to perform the steps we just outlined to obtain the plaintext for a given ciphertext.  Moreover, we don’t even know what algorithm is in use here.  Did the developers use RC4?  Did they try to make their own pseudorandom generator by repetitively hashing a secret key?  We didn’t really know, but  regardless of the algorithm, there is inherent risk with leveraging a stream cipher in its most basic form.  If the stream cipher uses a shared key, the pseudorandom stream will be identical every time it is initialized.  To test this conjecture we used several test user accounts  to register several credit cards.  Sure enough, identical credit card numbers resulted in identical ciphertext.  So, what can we do with this?  We have no idea what algorithm was used, and we have no idea of what the value of the secret key is.  However, it turns out that we don’t need any of that information.  To see why let’s first take a look at the hex of two encrypted credit cards we base64 decoded above.

First Encrypted Credit Card

First Encrypted Credit Card

The above shows the 16 ciphertext bytes for one of the credit card numbers in the database.

Second Encrypted Credit Card

Second Encrypted Credit Card

The above shows the 16 ciphertext bytes for a second credit card number in the database.

Let’s assume that the ciphertext from the first encrypted credit card was ours, where we know the plaintext ASCII hex values for the card.

Plaintext Bytes for the First Credit Card

Plaintext Bytes for the First Credit Card

Let’s also assume that the second credit card represents a credit card of unknown origin (i.e.  all we can observe is the ciphertext.  We have no ideas what the digits of the card are).  Similarly, without knowledge of the key, we have no idea how to generate the pseudorandom byte stream used to encrypt the cards.  But, what happens if we simply XOR these two ciphertexts together.

XOR of the Two Encrypted Credit Cards

XOR of the Two Encrypted Credit Cards

Well, that doesn’t really look like it got us anywhere, as we still don’t have anything that looks like an ASCII credit card number.  However, let’s take a look at what actually happened when we did the above operation.  We are confident that both credit card numbers were encrypted with the same pseudorandom byte stream (remember that encryption of the same credit card under different users resulted in the same ciphertext).  Furthermore, we know that the first credit card number is “4012888888881881”, as that is our own credit card number.  The key insight is that we have effectively XORed four different byte sequences together.  We have XORed the plaintext bytes of our credit card, the plaintext bytes of an unknown credit card, and the same pseudorandom sequence twice.

Encrypted Credit Card XOR Expanded

Encrypted Credit Card XOR Expanded

In the above picture the unknown plaintext credit card number is represented with a series of “YY”s.  Similarly, the unknown pseudorandom stream is represented with a series of “XX”s.  However, despite the fact that we don’t know the value of the pseudorandom stream, we do know they are the same.  As a result, whenever you XOR anything with the same value twice you effectively create a NOP. So, the above three XOR operations can be simplified to the following:

Simplified XOR of Encrypted Credit Cards

Simplified XOR of Encrypted Credit Cards

So, what we actually did by XORing the two ciphertexts is create a new value that is the XOR of the two plaintexts.  Wow, that is kind of cool.  We just got rid of the pseudorandom byte stream that is supposed to provide all the security without ever knowing the key.  But, what do we do now?  We still don’t have plaintext credit card numbers yet.  Ah, but we do!  You can think of what we now have as the unknown credit card number encrypted with our known credit card number as the pseudorandom stream.  In other words, we can simply XOR our result with our known credit card value again and we will obtain the resulting plaintext for the unknown credit card number.

Decrypted Credit Card

Decrypted Credit Card

One you translate the resulting hex into the equivalent ASCII string you obtain “5362016689976147”, a valid MasterCard number (obviously we have used test credit card numbers throughout this blog entry).  We can simply apply the above steps to recover the full unencrypted credit card numbers for the entire database.  We have effectively used our own encrypted credit card number as a key to decrypt all of the other unknown credit cards in the database.  This is a pretty impressive feat for not knowing the algorithm and/or the secret key used to encrypt all of the credit card information.

As it turns out, we eventually received source code for the functionality under investigation and discovered that the developers were using RC4.  RC4 has had its fair share of issues, and is generally not a recommended algorithm, but none of what we just discussed had to do with any weakness with RC4.  The above attack would have worked on any stream cipher used in the same way.  So, as is typical, the attack had more to do with how cryptography is used than the cryptographic algorithm itself.  We can’t say it enough….crypto is hard.  It just is.  There are just too many things that developers need to get right for crypto to be done correctly.  Algorithm selection, IVs, padding, modes of encryption, secret key generation, key expiration, key revocation…the list goes on and on.  We in the security community have been pounding the “don’t invent your own cryptography” mantra for a long time.  However, that has lead to a swarm of developers simply leveraging proven cryptography insecurely.  So, we need to take this one step further.  On average, it is best for developers to defer to proven solutions that abstract the notion of cryptography even further.  For example, GPG, Keyczar, et al have abstractions that simply let you encrypt/decrypt/sign/verify things based off of a secret that you provide.  They handle proper padding, encryption mode selection, integrity protection, etc.  That is not to say that there aren’t use cases for needing to delve into the weeds sometimes, but for the average use case a trusted library is your friend.



8 thoughts on “Even if You Don’t Invent Your Own Crypto….It’s Still Hard

  1. It is worth pointing out that even if you _don’t_ have a plaintext/ciphertext pair (like you did in this case), you can still usually figure out the keystream when you have a large number of ciphertexts to work with – particularly when you know that the unknown plaintexts follow a nice structure, as credit card numbers do.

    • hehe…yeah, I just replied to another comment (before I saw your comment) which basically concurs with what you said. Even if you don’t know a ciphertext/plaintext pair you still can proceed pretty easily with highly formatted data (such as credit card numbers).

  2. You could just XOR together your plain + ciphertext and recover the keystream, then use it to decrypt everything else.

    • Yes,
      You definitely can do that. The only reason I focused in on this approach is that you don’t always know the plaintext for one of the ciphertexts. In that situation you would be forced to simply XOR the two ciphertexts and work from there. This is definitely doable (especisally with credit card numbers that have known prefixes, must pass luhn checks, etc) but would have made the explanation needlessly complex. But yes, in this particular situation you can retrieve the pseudorandom stream directly.

  3. Patrick,

    This is a very impressive post that certainly caught my attention. Doing good cryptographic storage seems a difficult challenge to most developers. OWASP with Kevin Keenan and “The Security Ninja” have been working on a one-page “cheat-sheet” for developers. Are we on the right track?

    http://www.owasp.org/index.php/Cryptographic_Storage_Cheat_Sheet

    Great post – keep em’ coming!

    – Jim

    • Hey,
      So, yeah, I’ll try to take a more in depth look at the link you posted to OWASP when I have a chance. However, I will say this after only looking at that link for a few seconds. First, while I admire the goal of the link, I think there should be a focus on discouraging developers from touching raw crypto. Making them choose algorithms, modes, etc. is a risky proposition. I think most developers should be encouraged to look at packages such as Keyczar, as it defers the difficult decision making to folks that are generally in a better position to make those decisions.
      Secondly, while i haven’t looked that the page in depth, the first thing that caught my eye was the note related to MAC/MIC usage. There was a sentence that talked about creating a MIC using “MIC(msg) = Secure_Hash( salt + msg )”. That is actually a dangerous construction, as it lends itself to length extension attacks (this is a perfect example of why it is dangerous to give crypto advice unless you understand the full ramifications of a given construction). This was why HMAC was developed…to avoid issues related to these ad-hoc MAC/MIC construction techniques.

  4. Anyone who thinks this is only for small sites / products, think again. A very similar issue occurred with Microsoft’s MS-CHAP v1. They were using the same key with a stream cipher to encrypt two separate channels – one in each direction. By xoring together the channels (out, in), you would get (out ⊕ key) ⊕ (in ⊕ key) = (out ⊕ in) ⊕ (key ⊕ key) = out ⊕ in. In other words, even though each stream was encrypted with an unknown key, if you knew any part of one stream you could get the corresponding part of the other.

    In this case you could apply a similar attack even if you couldn’t get it to encrypt a card for you. Simply xor together two encrypted credit card numbers of the same length and you’ll get the xor of the unencrypted credit card numbers. Combine that with the expectation that we have a series of ASCII digits, knowledge of common prefixes and the checksums built in to the number and you could probably reconstruct the numbers from a few ciphertexts. This is even weaker than the MS issue – once you confirm a digit of one credit card, you get the corresponding digit of all the others. A digit can be ruled out if it implies an invalid digit in that position in any other card. Once you got one whole card, you could use the technique in this blog to get the rest.

  5. Pingback: LRBlog » Blog Archive » Thoughts on Keybase.io

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s