By: Patrick Toomey
So, yet another crypto related post. Often times crypto related attacks fall on deaf ears, as for most people crypto is like magic (include me in that group every now and again), and it becomes difficult for the average developer to differentiate the risks associated with the attacks circulated within the security community. There is a vast chasm between the crypto community and just about everyone else. The cryptographers will talk about differential cryptanalysis, meet in the middle attacks, perfect forward secrecy, etc, etc. Often, the cryptographers’ goals are different than your average developer. Cryptographers are trying to achieve perfection. For them, any flaw in a cryptographic protocol is something to be discussed, researched, and fixed. As is a famous saying in the crypto community, “Attacks only get better”. So, even the smallest flaw is worth their time and effort, as it may lead to a practical crack that will warrant more immediate attention.
Unlike cryptographers, for the vast majority of developers, the goal is “good enough” security. Often times the result is not good enough, but the goal is admirable, as it doesn’t make sense to over invest in security. As a result, there is a chasm between “good enough” for cryptographers and “good enough” for developers. In this chasm are the real world attacks that are too boring for your average cryptographer to discuss, yet too subtle as to be obvious to your average developer. Many of these attacks have nothing to do with the cryptography itself. Instead, many real world crypto attacks have more to do with how developers piece together proven crypto building blocks insecurely. It was on a recent assessment that I came upon the following story that perfectly illustrates how cryptography is difficult to get right, even when you think you have made an informed decision.
On a recent assessment a colleague of mine came across a typical SQL injection attack. In fact, it was just the kind of SQL injection you hope for as an attacker, as it returned data to the user in a nice tabular output format. After a bit of mucking around we managed to find where the system was storing credit card information for its users. A few minutes later we had output that looked like the following:
Hmmm, base64 encoded credit card numbers. I wonder what they look like when we base64 decode them. A simple base64 decode resulted in the following hex values:
If you start trying to translate the above hex values to their ASCII equivalent you will quickly find that the values do not represent plaintext ASCII credit card numbers. Our first guess was that they were likely encrypted. So, step two is trying to figure out how they were encrypted. We took a look at the base64 decoded values and observed a simple fact: the vast majority of them were either 15 or 16 bytes long. Well, a 16 byte long ciphertext could be any 128 bit symmetric algorithm in simple ECB mode. But, the 15 byte entries were evidence against that, as they would get padded out to 16 bytes due to the requirement that symmetric algorithms encrypt a full block (16 bytes in the case of AES). So, my first thought was the use of a stream cipher. In their most basic form, a stream cipher simply takes a secret value as input (the key), and begins to generate a pseudorandom output stream of bytes. Encryption simply involves XORing the stream of random bytes with your plaintext. Unlike a block cipher, there is no need to pad out your plaintext, as you only use as many pseudorandom bytes as is necessary to XOR with your plaintext. Decryption is also simple, as XORing is a trivially invertible operation. The user simply feeds the algorithm the same key, generates the same pseudorandom byte stream, and XORs that with the ciphertext.
To see why this works let’s take a look at how this works for a single plaintext/ciphertext. Let us say we wish to encrypt the credit card number “4012888888881881”. The ASCII values in hex result in a plaintext array of the following values:
Also, let us assume that we are using a stream cipher with a strong key whose first 16 output bytes are:
When we XOR the plaintext with the pseudorandom stream we obtain the ciphertext.
To decrypt we simply take the ciphertext and perform the same XOR operation. Let’s quickly take our ciphertext and see how we decrypt it.
When we XOR the ciphertext with the stream we obtain the plaintext we started with. Through the whole encryption/decryption process we have effectively XORed the plaintext with the stream twice. XORing anything with the same value twice is effectively a NOP on the original value. Pictorially:
However, our examples so far have assumed knowledge of the pseudorandom stream. Without knowledge of the key it is difficult to reproduce the pseudorandom stream, and thus very difficult to perform the steps we just outlined to obtain the plaintext for a given ciphertext. Moreover, we don’t even know what algorithm is in use here. Did the developers use RC4? Did they try to make their own pseudorandom generator by repetitively hashing a secret key? We didn’t really know, but regardless of the algorithm, there is inherent risk with leveraging a stream cipher in its most basic form. If the stream cipher uses a shared key, the pseudorandom stream will be identical every time it is initialized. To test this conjecture we used several test user accounts to register several credit cards. Sure enough, identical credit card numbers resulted in identical ciphertext. So, what can we do with this? We have no idea what algorithm was used, and we have no idea of what the value of the secret key is. However, it turns out that we don’t need any of that information. To see why let’s first take a look at the hex of two encrypted credit cards we base64 decoded above.
The above shows the 16 ciphertext bytes for one of the credit card numbers in the database.
The above shows the 16 ciphertext bytes for a second credit card number in the database.
Let’s assume that the ciphertext from the first encrypted credit card was ours, where we know the plaintext ASCII hex values for the card.
Let’s also assume that the second credit card represents a credit card of unknown origin (i.e. all we can observe is the ciphertext. We have no ideas what the digits of the card are). Similarly, without knowledge of the key, we have no idea how to generate the pseudorandom byte stream used to encrypt the cards. But, what happens if we simply XOR these two ciphertexts together.
Well, that doesn’t really look like it got us anywhere, as we still don’t have anything that looks like an ASCII credit card number. However, let’s take a look at what actually happened when we did the above operation. We are confident that both credit card numbers were encrypted with the same pseudorandom byte stream (remember that encryption of the same credit card under different users resulted in the same ciphertext). Furthermore, we know that the first credit card number is “4012888888881881”, as that is our own credit card number. The key insight is that we have effectively XORed four different byte sequences together. We have XORed the plaintext bytes of our credit card, the plaintext bytes of an unknown credit card, and the same pseudorandom sequence twice.
In the above picture the unknown plaintext credit card number is represented with a series of “YY”s. Similarly, the unknown pseudorandom stream is represented with a series of “XX”s. However, despite the fact that we don’t know the value of the pseudorandom stream, we do know they are the same. As a result, whenever you XOR anything with the same value twice you effectively create a NOP. So, the above three XOR operations can be simplified to the following:
So, what we actually did by XORing the two ciphertexts is create a new value that is the XOR of the two plaintexts. Wow, that is kind of cool. We just got rid of the pseudorandom byte stream that is supposed to provide all the security without ever knowing the key. But, what do we do now? We still don’t have plaintext credit card numbers yet. Ah, but we do! You can think of what we now have as the unknown credit card number encrypted with our known credit card number as the pseudorandom stream. In other words, we can simply XOR our result with our known credit card value again and we will obtain the resulting plaintext for the unknown credit card number.
One you translate the resulting hex into the equivalent ASCII string you obtain “5362016689976147”, a valid MasterCard number (obviously we have used test credit card numbers throughout this blog entry). We can simply apply the above steps to recover the full unencrypted credit card numbers for the entire database. We have effectively used our own encrypted credit card number as a key to decrypt all of the other unknown credit cards in the database. This is a pretty impressive feat for not knowing the algorithm and/or the secret key used to encrypt all of the credit card information.
As it turns out, we eventually received source code for the functionality under investigation and discovered that the developers were using RC4. RC4 has had its fair share of issues, and is generally not a recommended algorithm, but none of what we just discussed had to do with any weakness with RC4. The above attack would have worked on any stream cipher used in the same way. So, as is typical, the attack had more to do with how cryptography is used than the cryptographic algorithm itself. We can’t say it enough….crypto is hard. It just is. There are just too many things that developers need to get right for crypto to be done correctly. Algorithm selection, IVs, padding, modes of encryption, secret key generation, key expiration, key revocation…the list goes on and on. We in the security community have been pounding the “don’t invent your own cryptography” mantra for a long time. However, that has lead to a swarm of developers simply leveraging proven cryptography insecurely. So, we need to take this one step further. On average, it is best for developers to defer to proven solutions that abstract the notion of cryptography even further. For example, GPG, Keyczar, et al have abstractions that simply let you encrypt/decrypt/sign/verify things based off of a secret that you provide. They handle proper padding, encryption mode selection, integrity protection, etc. That is not to say that there aren’t use cases for needing to delve into the weeds sometimes, but for the average use case a trusted library is your friend.