“Researchers steal iPhone passwords in 6 minutes”…true…but not the whole story

By Patrick Toomey

Direct link to keychaindumper (for those that want to skip the article and get straight to the code)

So, a few weeks ago a wave of articles hit the usual sites about research that came out of the Fraunhofer Institute (yes, the MP3 folks) regarding some issues found in Apple’s Keychain service.  The vast majority of the articles, while factually accurate, didn’t quite present the full details of what the researchers found.  What the researchers actually found was more nuanced than what was reported.  But, before we get to what they actually found, let’s bring everyone up to speed on Apple’s keychain service.

Apple’s keychain service is a library/API provided by Apple that developers can use to store sensitive information on an iOS device “securely” (a similar service is provided in Mac OS X).  The idea is that instead of storing sensitive information in plaintext configuration files, developers can leverage the keychain service to have the operating system store sensitive information securely on their behalf.  We’ll get into what is meant by “securely” in a minute, but at a high level the keychain encrypts (using a unique per-device key that cannot be exported off of the device) data stored in the keychain database and attempts to restrict which applications can access the stored data.  Each application on an iOS device has a unique “application-identifier” that is cryptographically signed into the application before being submitted to the Apple App Store.  The keychain service restricts which data an application can access based on this identifier.  By default, applications can only access data associated with their own application-identifier.  Apple realized this was a bit restrictive, so they also created another mechanism that can be used to share data between applications by using “keychain-access-groups”.  As an example, a developer could release two distinct applications (each with their own application-identifier) and assign each of them a shared access group.  When writing/reading data to the keychain a developer can specify which access group to use.  By default, when no access group is specified, the application will use the unique application-identifier as the access group (thus limiting access to the application itself).  Ok, so that should be all we need to know about the Keychain.  If you want to dig a little deeper Apple has a good doc here.

Ok, so we know the keychain is basically a protected storage facility that the iOS kernel delegates read/write privileges to based on the cryptographic signature of each application.  These cryptographic signatures are known as “entitlements” in iOS parlance.  Essentially, an application must have the correct entitlement to access a given item in the keychain.  So, the most obvious way to go about attacking the keychain is to figure out a way to sign fake entitlements into an application (ok, patching the kernel would be another way to go, but that is a topic for another day).  As an example, if we can sign our application with the “apple” access group then we would be able to access any keychain item stored using this access group.  Hmmm…well, it just so happens that we can do exactly that with the “ldid” tool that is available in the Cydia repositories once you Jailbreak your iOS device.  When a user jailbreaks their phone, the portion of the kernel responsible for validating cryptographic signatures is patched so that any signature will validate. So, ldid basically allows you to sign an application using a bogus signature. But, because it is technically signed, a Jailbroken device will honor the signature as if it were from Apple itself.

Based on the above description, so long as we can determine all of the access groups that were used to store items in a user’s keychain, we should be able to dump all of them, sign our own application to be a member of all of them using ldid, and then be allowed access to every single keychain item in a user’s keychain.  So, how do we go about getting a list of all the access group entitlements we will need?  Well, the keychain is nothing more than a SQLite database stored in:

/var/Keychains/keychain-2.db

And, it turns out, the access group is stored with each item that is stored in the keychain database.  We can get a complete list of these groups with the following query:

SELECT DISTINCT agrp FROM genp
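
If you want to poke at the database by hand, the same lookup is easy to script.  Below is a minimal Python sketch (not part of keychain_dumper; it simply runs the query above against a copy of the database using the standard sqlite3 module):

import sqlite3

# Minimal sketch: list every distinct access group in the generic password
# table, using the database path and query from the article above.
db = sqlite3.connect("/var/Keychains/keychain-2.db")
for (group,) in db.execute("SELECT DISTINCT agrp FROM genp"):
    print(group)
db.close()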

Once we have a list of all the access groups we just need to create an XML file that contains all of these groups and then sign our own application with ldid.  So, I created a tool that does exactly that called keychain_dumper.  You can first get a properly formatted XML document with all the entitlements you will need by doing the following:

./keychain_dumper -e > /var/tmp/entitlements.xml
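
For context, the generated file is an XML property list; presumably its “keychain-access-groups” array holds one entry per access group found in the database.  A hypothetical, hand-rolled equivalent (a sketch only; the group values below are illustrative, and would normally come from the query shown earlier) could be produced with Python’s plistlib:

import plistlib

# Hypothetical example: write an entitlements plist granting membership in a
# couple of keychain access groups (values are illustrative, not exhaustive).
groups = ["apple", "R96HGCUQ8V.*"]
entitlements = {"keychain-access-groups": groups}
with open("/var/tmp/entitlements.xml", "wb") as f:
    plistlib.dump(entitlements, f)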

You can then sign all of these entitlements into keychain_dumper itself (please note the lack of a space between the flag and the path argument):

ldid -S/var/tmp/entitlements.xml keychain_dumper

After that, you can dump all of the entries within the keychain:

./keychain_dumper

If all of the above worked you will see numerous entries that look similar to the following:

Service: Dropbox
Account: remote
Entitlement Group: R96HGCUQ8V.*
Label: Generic
Field: data
Keychain Data: SenSiTive_PassWorD_Here

Ok, so what does any of this have to do with what was being reported on a few weeks ago?  We basically just showed that you can in fact dump all of the keychain items using a jailbroken iOS device.  Here is where the discussion is more nuanced than what was reported.  The steps we took above will only dump the entire keychain on devices that have no PIN set or are currently unlocked.  If you set a PIN on your device, lock the device, and rerun the above steps, you will find that some keychain data items are returned, while others are not.  You will find a number of entries now look like this:

Service: Dropbox
Account: remote
Entitlement Group: R96HGCUQ8V.*
Label: Generic
Field: data
Keychain Data: <Not Accessible>

This fundamental point was either glossed over or simply ignored in every single article I happened to come across (I’m sure at least one person will find the article that does mention this point :-)).  This is an important point, as it completely reframes the discussion.  The way it was reported, it looks like the point is to show how insecure iOS is.  In reality the point should have been to show how security is all about trading off various factors (security, convenience, etc).  This point was not lost on Apple, and the keychain allows developers to choose the appropriate level of security for their application.  Stealing a small section from the keychain document from Apple, they allow six levels of access for a given keychain item:

CFTypeRef kSecAttrAccessibleWhenUnlocked;
CFTypeRef kSecAttrAccessibleAfterFirstUnlock;
CFTypeRef kSecAttrAccessibleAlways;
CFTypeRef kSecAttrAccessibleWhenUnlockedThisDeviceOnly;
CFTypeRef kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly;
CFTypeRef kSecAttrAccessibleAlwaysThisDeviceOnly;

The names are pretty self-descriptive, but the main thing to focus on is the “WhenUnlocked” accessibility constants.  If a developer chooses the “WhenUnlocked” constant then the keychain item is encrypted using a cryptographic key that is created using the user’s PIN as well as the per-device key mentioned above.  In other words, if a device is locked, the cryptographic key material does not exist on the phone to decrypt the related keychain item.  Thus, when the device is locked, keychain_dumper, despite having the correct entitlements, does not have the ability to access keychain items stored using the “WhenUnlocked” constant.  We won’t talk about the “ThisDeviceOnly” constant, but it is basically the most strict security constant available for a keychain item, as it prevents the items from being backed up through iTunes (see the Apple docs for more detail).

If a developer does not specify an accessibility constant, a keychain item will use “kSecAttrAccessibleWhenUnlocked”, which makes the item available only when the device is unlocked.  In other words, applications that store items in the keychain using the default security settings would not have been leaked using the approach used by Fraunhofer and/or keychain_dumper (I assume we are both just using the Keychain API as it is documented).  That said, quite a few items appear to be set with “kSecAttrAccessibleAlways”.  Such items include wireless access point passwords, MS Exchange passwords, etc.  So, what was Apple thinking; why does Apple let developers choose among all these options?  Well, let’s use some pretty typical use cases to think about it.  A user boots their phone and they expect their device to connect to their wireless access point without intervention.  I guess that requires that iOS be able to retrieve their access point’s password regardless of whether the device is locked or not.  How about MS Exchange?  Let’s say I lost my iPhone on the subway this morning.  Once I get to work I let the system administrator know and they proceed to initiate a remote wipe of my Exchange data.  Oh, right, my device would have to be able to login to the Exchange server, even when locked, for that to work.  So, Apple is left in the position of having to balance the privacy of user’s data with a number of use cases where less privacy is potentially worthwhile.  We can probably go through each keychain item and debate whether Apple chose the right accessibility constant for each service, but I think the main point still stands.

Wow…that turned out to be way longer than I thought it would be.  Anyway, if you want to grab the code for keychain_dumper to reproduce the above steps yourself you can grab the code on github.  I’ve included the source as well as a binary just in case you don’t want/have the developer tools on your machine.  Hopefully this tool will be useful for security professionals that are trying to evaluate whether an application has chosen the appropriate accessibility parameters during blackbox assessments. Oh, and if you want to read the original paper by Fraunhofer you can find that here.

Even if You Don’t Invent Your Own Crypto….It’s Still Hard

By: Patrick Toomey

So, yet another crypto related post.  Oftentimes crypto related attacks fall on deaf ears, as for most people crypto is like magic (include me in that group every now and again), and it becomes difficult for the average developer to differentiate the risks associated with the attacks circulated within the security community.  There is a vast chasm between the crypto community and just about everyone else.  The cryptographers will talk about differential cryptanalysis, meet in the middle attacks, perfect forward secrecy, etc, etc.  Often, the cryptographers’ goals are different than your average developer’s.  Cryptographers are trying to achieve perfection.  For them, any flaw in a cryptographic protocol is something to be discussed, researched, and fixed.  As the famous saying in the crypto community goes, “Attacks only get better”.  So, even the smallest flaw is worth their time and effort, as it may lead to a practical crack that will warrant more immediate attention.

Unlike cryptographers, for the vast majority of developers, the goal is “good enough” security.  Oftentimes the result is not good enough, but the goal is admirable, as it doesn’t make sense to over-invest in security.  As a result, there is a chasm between “good enough” for cryptographers and “good enough” for developers.  In this chasm are the real world attacks that are too boring for your average cryptographer to discuss, yet too subtle to be obvious to your average developer.  Many of these attacks have nothing to do with the cryptography itself.  Instead, many real world crypto attacks have more to do with how developers piece together proven crypto building blocks insecurely.  It was on a recent assessment that I came upon the following story that perfectly illustrates how cryptography is difficult to get right, even when you think you have made an informed decision.

On a recent assessment a colleague of mine came across a typical SQL injection attack.  In fact, it was just the kind of SQL injection you hope for as an attacker, as it returned data to the user in a nice tabular output format.  After a bit of mucking around we managed to find where the system was storing credit card information for its users.  A few minutes later we had output that looked like the following:

[Figure: Base64 Credit Card Numbers]

Hmmm, base64 encoded credit card numbers.  I wonder what they look like when we base64 decode them.  A simple base64 decode resulted in the following hex values:

[Figure: Base64 Decoded Credit Card Numbers]

If you start trying to translate the above hex values to their ASCII equivalent you will quickly find that the values do not represent plaintext ASCII credit card numbers.  Our first guess was that they were likely encrypted.  So, step two is trying to figure out how they were encrypted.  We took a look at the base64 decoded values and observed a simple fact: the vast majority of them were either 15 or 16 bytes long.  Well, a 16 byte long ciphertext could be any 128 bit symmetric algorithm in simple ECB mode.  But, the 15 byte entries were evidence against that, as they would get padded out to 16 bytes due to the requirement that symmetric algorithms encrypt a full block (16 bytes in the case of AES).  So, my first thought was the use of a stream cipher.  In their most basic form, a stream cipher simply takes a secret value as input (the key), and begins to generate a pseudorandom output stream of bytes.  Encryption simply involves XORing the stream of random bytes with your plaintext.  Unlike a block cipher, there is no need to pad out your plaintext, as you only use as many pseudorandom bytes as is necessary to XOR with your plaintext.  Decryption is also simple, as XORing is a trivially invertible operation.  The user simply feeds the algorithm the same key, generates the same pseudorandom byte stream, and XORs that with the ciphertext.

To see why this works, let’s walk through a single plaintext/ciphertext example.  Let us say we wish to encrypt the credit card number “4012888888881881”. The ASCII values in hex result in a plaintext array of the following values:

[Figure: Plaintext Credit Card Number ASCII in Hex]

Also, let us assume that we are using a stream cipher with a strong key whose first 16 output bytes are:

[Figure: Pseudorandom Byte Stream in Hex]

When we XOR the plaintext with the pseudorandom stream we obtain the ciphertext.

[Figure: Resulting Ciphertext]

To decrypt we simply take the ciphertext and perform the same XOR operation.  Let’s quickly take our ciphertext and see how we decrypt it.

[Figure: Decrypted Ciphertext]

When we XOR the ciphertext with the stream we obtain the plaintext we started with.  Through the whole encryption/decryption process we have effectively XORed the plaintext with the stream twice.  XORing anything with the same value twice is effectively a NOP on the original value.  Pictorially:

[Figure: Full Encryption Decryption Cycle]
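
To make the round trip concrete, here is a minimal Python sketch of the same idea, using the test card number from above and an arbitrary made-up keystream (the actual keystream bytes are irrelevant to the point being made):

import os

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"4012888888881881"           # test card number from above
keystream = os.urandom(len(plaintext))    # stand-in pseudorandom stream

ciphertext = xor_bytes(plaintext, keystream)   # encrypt: XOR once
recovered = xor_bytes(ciphertext, keystream)   # decrypt: XOR again (net effect is a NOP)

assert recovered == plaintext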

However, our examples so far have assumed knowledge of the pseudorandom stream.  Without knowledge of the key it is difficult to reproduce the pseudorandom stream, and thus very difficult to perform the steps we just outlined to obtain the plaintext for a given ciphertext.  Moreover, we don’t even know what algorithm is in use here.  Did the developers use RC4?  Did they try to make their own pseudorandom generator by repetitively hashing a secret key?  We didn’t really know, but  regardless of the algorithm, there is inherent risk with leveraging a stream cipher in its most basic form.  If the stream cipher uses a shared key, the pseudorandom stream will be identical every time it is initialized.  To test this conjecture we used several test user accounts  to register several credit cards.  Sure enough, identical credit card numbers resulted in identical ciphertext.  So, what can we do with this?  We have no idea what algorithm was used, and we have no idea of what the value of the secret key is.  However, it turns out that we don’t need any of that information.  To see why let’s first take a look at the hex of two encrypted credit cards we base64 decoded above.

[Figure: First Encrypted Credit Card]

The above shows the 16 ciphertext bytes for one of the credit card numbers in the database.

[Figure: Second Encrypted Credit Card]

The above shows the 16 ciphertext bytes for a second credit card number in the database.

Let’s assume that the ciphertext from the first encrypted credit card was ours, where we know the plaintext ASCII hex values for the card.

[Figure: Plaintext Bytes for the First Credit Card]

Let’s also assume that the second credit card represents a credit card of unknown origin (i.e., all we can observe is the ciphertext; we have no idea what the digits of the card are).  Similarly, without knowledge of the key, we have no idea how to generate the pseudorandom byte stream used to encrypt the cards.  But, what happens if we simply XOR these two ciphertexts together?

[Figure: XOR of the Two Encrypted Credit Cards]

Well, that doesn’t really look like it got us anywhere, as we still don’t have anything that looks like an ASCII credit card number.  However, let’s take a look at what actually happened when we did the above operation.  We are confident that both credit card numbers were encrypted with the same pseudorandom byte stream (remember that encryption of the same credit card under different users resulted in the same ciphertext).  Furthermore, we know that the first credit card number is “4012888888881881”, as that is our own credit card number.  The key insight is that we have effectively XORed four different byte sequences together.  We have XORed the plaintext bytes of our credit card, the plaintext bytes of an unknown credit card, and the same pseudorandom sequence twice.

[Figure: Encrypted Credit Card XOR Expanded]

In the above picture the unknown plaintext credit card number is represented with a series of “YY”s.  Similarly, the unknown pseudorandom stream is represented with a series of “XX”s.  However, despite the fact that we don’t know the value of the pseudorandom stream, we do know they are the same.  As a result, whenever you XOR anything with the same value twice you effectively create a NOP. So, the above three XOR operations can be simplified to the following:

[Figure: Simplified XOR of Encrypted Credit Cards]

So, what we actually did by XORing the two ciphertexts is create a new value that is the XOR of the two plaintexts.  Wow, that is kind of cool.  We just got rid of the pseudorandom byte stream that is supposed to provide all the security without ever knowing the key.  But, what do we do now?  We still don’t have plaintext credit card numbers yet.  Ah, but we do!  You can think of what we now have as the unknown credit card number encrypted with our known credit card number as the pseudorandom stream.  In other words, we can simply XOR our result with our known credit card value again and we will obtain the resulting plaintext for the unknown credit card number.

[Figure: Decrypted Credit Card]

Once you translate the resulting hex into the equivalent ASCII string you obtain “5362016689976147”, a valid MasterCard number (obviously we have used test credit card numbers throughout this blog entry).  We can simply apply the above steps to recover the full unencrypted credit card numbers for the entire database.  We have effectively used our own encrypted credit card number as a key to decrypt all of the other unknown credit cards in the database.  This is a pretty impressive feat for not knowing the algorithm and/or the secret key used to encrypt all of the credit card information.
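
The entire attack fits in a few lines of Python.  The sketch below simulates the application’s fixed keystream with random bytes purely so the example is self-contained; the attacker-side steps at the bottom never touch it:

import os

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# What the application does (simulated; the real keystream/key is unknown to us).
keystream = os.urandom(16)
known_card = b"4012888888881881"     # our card: we know both plaintext and ciphertext
unknown_card = b"5362016689976147"   # victim card: we only ever see its ciphertext
ct_known = xor_bytes(known_card, keystream)
ct_unknown = xor_bytes(unknown_card, keystream)

# What the attacker does, using only ct_known, ct_unknown, and known_card.
xor_of_plaintexts = xor_bytes(ct_known, ct_unknown)   # the keystream cancels out
recovered = xor_bytes(xor_of_plaintexts, known_card)  # leaves the victim's plaintext
print(recovered.decode())                             # -> 5362016689976147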

As it turns out, we eventually received source code for the functionality under investigation and discovered that the developers were using RC4.  RC4 has had its fair share of issues, and is generally not a recommended algorithm, but none of what we just discussed had to do with any weakness with RC4.  The above attack would have worked on any stream cipher used in the same way.  So, as is typical, the attack had more to do with how cryptography is used than the cryptographic algorithm itself.  We can’t say it enough….crypto is hard.  It just is.  There are just too many things that developers need to get right for crypto to be done correctly.  Algorithm selection, IVs, padding, modes of encryption, secret key generation, key expiration, key revocation…the list goes on and on.  We in the security community have been pounding the “don’t invent your own cryptography” mantra for a long time.  However, that has led to a swarm of developers simply leveraging proven cryptography insecurely.  So, we need to take this one step further.  On average, it is best for developers to defer to proven solutions that abstract the notion of cryptography even further.  For example, GPG, Keyczar, et al have abstractions that simply let you encrypt/decrypt/sign/verify things based off of a secret that you provide.  They handle proper padding, encryption mode selection, integrity protection, etc.  That is not to say that there aren’t use cases for needing to delve into the weeds sometimes, but for the average use case a trusted library is your friend.



ViewStateViewer: A GUI Tool for deserializing/reserializing ViewState

By: Patrick Toomey

Background

So, I was reading the usual blogs and came across a post by Mike Tracy from Matasano (Matasano has been having some technical difficulties…this link should work once they have recovered).  In the blog post Mike talks about the development of a ViewState serializer/deserializer for his WWMD web application security assessment console (please see Mike’s original post for more details).  Mike noted that the tools for viewing/manipulating ViewState from an application testing perspective are pretty weak, and I can’t agree more.  There are a number of ViewState tools floating around that do a tolerable job of taking in a Base64 encoded ViewState string and presenting the user with a static view of the deserialized object. However, no tools up until this point, save for Mike’s new implementation, have allowed a user to change the values within the deserialized object and then reserialize them into a new Base64 string. Mike’s tool does exactly this, and is immensely useful for web application testing. The only downside to Mike’s tool is that it is built for his workflow and not mine (how can I fault the man for that?).  So, I decided to build an equivalent that works well for me (and hopefully for you as well).

I tend to have a web proxy of some sort running on my machine throughout the day. There are tons of them out there and everyone seems to have their personal favorite. There is Paros, WebScarab, BurpSuite, and I am sure many others. In the last few months I have been using a newer entrant into the category, Fiddler.  Fiddler is a great web proxy whose only big drawback is that it is Windows only.  However, at least for me, the upsides to Fiddler tend to outweigh the negatives.  Fiddler has a fairly refined workflow (don’t get me started on WebScarab), is stable (don’t get me started on Paros), and is pretty extensible.  There are a number of ways to extend Fiddler, most trivially using their own FiddlerScript hooks.  In addition, there is a public API for extending the application using .NET.  Fiddler has a number of interfaces that can be extended to allow for inspecting and manipulating requests or responses. Please see the Fiddler site for more details on extending Fiddler using either FiddlerScript or .NET.  In particular, take a look at the development section to get a better feel for the facilities provided by Fiddler for extending the application.

Anyway, I had been thinking about writing a ViewState serializer/deserializer for Fiddler for the past month or two when I saw Mike’s blog post.  I decided that it was about time to set aside a little time and write some code.  I was lucky that Fiddler uses .NET, as I was able to leverage all of the system assemblies that Mike had to decompile in Reflector. After a bit of coding I ended up with my ViewStateViewer Fiddler inspector.  Let’s take a quick tour…

ViewStateViewer seamlessly integrates into the Fiddler workflow, allowing a user to manipulate ViewState just as they would any other variable in a HTTP request.  An extremely common scenario for testing involves submitting a request in the browser, trapping the request in a proxy, changing a variable’s value, and forwarding the request on to the server. ViewStateViewer tries to integrate into this workflow as seamlessly as possible.  Upon trapping a request that contains ViewState, ViewStateViewer extracts the Base64 encoded ViewState, Base64 decodes it, deserializes the ViewState, and allows a user to manipulate the ViewState as they would any other variable sent to the server.  Let’s take a look at some screenshots to get a better idea of how this works.

ViewStateViewer Tour

[Figure: Serialized ViewState]

By default, Fiddler lets a user trap requests and view/edit a POST body before submitting the request to the server.  In this case, the POST body contains serialized ViewState that we would like to work with.  Without ViewStateViewer this is non-trivial, as Fiddler only shows us the Base64 encoded serialization.

[Figure: Deserialized ViewState]

ViewStateViewer adds a new “ViewState” tab within Fiddler that dynamically identifies and deserializes ViewState on the fly.  The top half of the inspector shows the original Base64 serialized ViewState string.  The bottom half of the inspector shows an XML representation of the deserialized ViewState.  In between these two views the user can see if the ViewState is MAC protected and the version of the ViewState being deserialized.  In this case we can see that this is .NET 2.X ViewState and that MAC protection is not enabled.

[Figure: Reserialized ViewState]

Once the ViewState is deserialized we can manipulate the ViewState by changing the values in the XML representation.  In this example we changed one of the string values to “foobar”, as can be seen in the figure above.  Once we change the value we can reserialize the ViewState using the “encode” button.  The reserialized Base64 encoded ViewState string can be seen in the top half of the ViewStateViewer window.  Once we have “encoded” the ViewState with our modifications, ViewStateViewer automatically updates the POST body with the new serialized ViewState string.  This request can now be “Run to Completion”, which lets Fiddler submit the updated request to the server.

Limitations

It should be noted that if the original ViewState had used MAC protection ViewStateViewer would not be able to reserialize the manipulated ViewState with a valid MAC.  While ViewStateViewer will not prevent you from deserializing, manipulating, and reserializing MAC protected ViewState, it will not be able to append a valid MAC to modified ViewState.  ViewStateViewer will warn us that “Modified ViewState does not contain a valid MAC”.  Modified requests made using reserialized MAC protected ViewState will ultimately fail, as we don’t know the machine key used to produce a valid MAC.  Regardless, sometimes simply being able to view what is being stored in MAC protected ViewState can be useful during an application assessment.

In addition to MAC protection, ViewState can be optionally protected using encryption.  Encryption will prevent any attempt by ViewStateViewer to deserialize the ViewState.  If encryption is detected ViewStateViewer will simply show the user the original Base64 ViewState string.  However, as any application security consultant can attest, there are many applications that do not encrypt or MAC protect their ViewState.  ViewStateViewer is aimed squarely at these use cases.

Finally, ViewStateViewer was written entirely with deserializing/serializing ViewState 2.X in mind. While ViewState 1.X is supported, the support at this time is limited, though completely functional. ViewState 1.X, unlike ViewState 2.X, Base64 decodes into a completely human readable ASCII based serialization format (think JSON).  As such, ViewStateViewer simply Base64 decodes ViewState 1.X and displays the result to the user. The user is then free to make any changes they wish to the decoded ViewState. This works exactly the same as the example shown above, except that the decoded ViewState object will not be a nicely formatted XML document. I might get around to adding true ViewState 1.X support, but the benefits would be purely cosmetic, as the current implementation has no functional limitations.

Wrap Up

ViewState is one of those areas that tends to be under-tested during application assessments, as there have been no good tools for efficiently evaluating the effects of modifying ViewState.  Hopefully the release of ViewStateViewer will make evaluation of ViewState a more common practice.  If you want to give ViewStateViewer a try you can download the Fiddler plugin (with source) here. Simply copy ViewStateViewer.dll into your “Inspectors” folder within your Fiddler install directory.  It should be noted that this inspector is for Fiddler2, so please update your copy of Fiddler if you are out of date.  Finally, this is very much a work in progress.  As I found out during development, there is a ton of ViewState out there that is either non-trivial to deserialize/reserialize or is impossible to deserialize/reserialize (non-standard objects being serialized, for example).  Maybe I’ll detail some of these difficult/impossible-to-handle edge cases in a subsequent post.  So, while I have done some limited testing, I am sure there are some bugs that will crop up.  If you happen to find one please don’t hesitate to contact me.

Hulu…client-side “encryption”…seriously?

By: Patrick Toomey

I remember being pretty excited by the prospect of a service like Hulu.  The idea that major networks were actually coming together to stream mainstream video content was impressive.  It was such a departure from the locked-down, share-nothing mentality of old.  I thought to myself, “Wow, does Hollywood finally get it?”. Apparently my optimism was exactly that…optimistic.

Sometime in the last week or so it was reported that Hulu, a video streaming service run by NBC and FOX, started “encrypting” Ajax responses to block unauthorized software clients (Boxee et al.) from sidestepping the hulu.com website to view content.  However, encryption is purposefully in quotes, as what Hulu actually implemented is a client-side obfuscation mechanism.  It is well known that such protection mechanisms are flawed by design and bound to be circumvented quickly.

The protective measure that is implemented rests on the obfuscation of Ajax responses made against hulu.com.  Instead of returning plaintext HTML content, Ajax requests return obfuscated URL encoded strings.  These URL encoded strings are reverted to plaintext on the client-side using JavaScript.  For example, a request to:

http://www.hulu.com/channels/Home-and-Garden?kind=videos&sort=popularity

returns a URL encoded string that begins:

dobfu__%F2%9E%84%88%EE%99%81%9F%BD%89%D0%DC …

The entire string is approximately 141KB long.  Other than the “dobfu__” prefix, the remainder of the string is URL encoded.  This obfuscated string is transformed into plaintext by a JavaScript function called “_dobfu()”.  This function, after a bit of reformatting, is reproduced below:

function _dobfu(text) {
  return text.slice(0,7)!='dobfu__'?text:
    $A(unescape(text.substring(7)).tol()).map(function(i) {
      i=0xfeedface^i;
      return String.fromCharCode(i&0xFF,i>>>8&0xFF,i>>>16&0xFF,i>>>24&0xFF);
    }
  ).join('').replace(/\0+$/,'');
}

All of the above code is pretty easy to follow, save for the references to the $A() and tol() functions.  The $A() function is a Prototype global function that creates a full array object from any other object that can pass for an array (supports indexing, etc).  This is done so that the new object inherits the full functionality of an array (the map method is needed in this case).  The second piece of ambiguous logic, the tol() method, is defined in another JavaScript file and is reproduced below:

String.prototype.tol=function(){
  var s=this;
  return $R(0,Math.ceil(s.length/4)-1).map(
    function(i){
      return s.charCodeAt(i*4)+(s.charCodeAt(i*4+1)<<8)+(s.charCodeAt(i*4+2)<<16)+(s.charCodeAt(i*4+3)<<24);
    }
  );
};

Essentially this method takes a string of bytes and creates an array of 32-bit integers from each 4-byte chunk.  For example, if the string processed in the method was “\x01\x23\x45\x67\x89\xab\xcd\xef” the method would return the array [0x67452301, 0xefcdab89].  The ordering of the individual bytes is a result of the “tol()” method parsing the data as little-endian.

So, with those two functions defined we can quickly describe how Hulu de-obfuscates responses.  The obfuscated string is broken up into 4-byte integers.  Since the length of the obfuscated string is always evenly divisible by four we are guaranteed that a string of length x will turn into an array of 4-byte integers of length x/4.  Then, for each 4-byte integer, the value is XORed with the constant “0xfeedface”.  Once XORed, the individual bytes from the integer are split apart and converted back to their equivalent ASCII value.  Finally, all trailing NULL bytes are removed from the de-obfuscated string.
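
The whole scheme is just as easy to reproduce outside the browser.  Here is a rough Python equivalent of the two JavaScript functions above (a sketch, under the assumption that the response body is the raw “dobfu__”-prefixed, URL-encoded string shown earlier):

import struct
from urllib.parse import unquote_to_bytes

def dobfu(text):
    # Anything without the marker prefix is passed through untouched.
    if not text.startswith("dobfu__"):
        return text
    data = unquote_to_bytes(text[7:])              # URL-decode the payload
    out = bytearray()
    for i in range(0, len(data), 4):
        # Like tol(): read each 4-byte chunk as a little-endian integer,
        # XOR it with the hard-coded key, then split it back into bytes.
        word = struct.unpack("<I", data[i:i + 4])[0] ^ 0xFEEDFACE
        out += struct.pack("<I", word)
    return out.rstrip(b"\x00").decode("latin-1")   # drop trailing NULL padding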

It is a bit difficult to imagine what Hulu thought they might accomplish with the above scheme.  It effectively does nothing to prevent third-party tools from performing the same obfuscation/de-obfuscation.  Any scheme that attempts to implement client-side “decryption”, particularly in JavaScript, is bound for failure.  The client possesses the obfuscated message, the key to de-obfuscate the message, and the JavaScript that executes the algorithm.  Using these components, it is a trivial exercise to transform any obfuscated response back into plaintext.  Hulu likely thwarted unauthorized software for the better part of an afternoon and no more.  Client-side security mechanisms simply don’t work.  Even complex systems implemented in native code, such as popular DRM schemes, that may go unbroken for a period of time, will eventually be circumvented.  However, to implement a similar preventative measure in JavaScript lowers the difficulty of circumvention dramatically.

Beyond the technical discussion there is also a more broad question to be asked.  What was the net gain for Hulu?  They failed to accomplish their implicit goal: to block unauthorized software.  Hulu simply received another  bit of bad press for treating their customers like thieves.  Hulu, and other such services, need to realize that the ubiquitous availability of their content will ultimately grow their fan base.  There is ever increasing competition for a viewer’s eyes and ears.  Podcasts, YouTube, gaming, etc are all competing.  Third-party products, such as Boxee, only serve to increase the ubiquity of their content, which shouldn’t be viewed as a bad thing.  Thwarting their own customers only sours the experience and reinforces the presumption that a good chunk of the entertainment industry just doesn’t get it.  Besides being bad security, this latest debacle is just bad business.

Crypto Pet Peeves: Hashing…Encoding…It’s All The Same, Right?

Patrick Toomey

© 2008 Neohapsis

We all know cryptography is hard. Time and time again we in the security community give advice that goes something like, “Unless you have an unbelievably good reason for developing your own cryptography, don’t!”. Even if you think you have an unbelievably good reason I would still take pause and make sure there is no other alternative. Nearly every aspect of cryptography is painstakingly difficult: developing new crypto primitives is hard, correctly implementing them is nearly just as hard, and even using existing crypto APIs can be fraught with subtlety. As discussed in a prior post, Seed Racing, even fairly simple random number generation is prone to developer error. Whenever I audit source I keep my eyes open for unfamiliar crypto code. So was the case on a recent engagement; I found myself reviewing an application in a language that I was less familiar with: Progress ABL.

Progress ABL is similar to a number of other 4GL languages, simplifying development given the proper problem set. Most notably, Progress ABL allows for rapid development of typical business CRUD applications, as the language has a number of features that make database interactions fairly transparent. For those of you interested to learn more, the language reference manual can be found on Progress’ website.

As I began my review of the application I found myself starting where I usually do: staring at the login page. The application was a fairly standard web app that required authentication via login credentials before accessing the sensitive components of the application. Being relatively unfamiliar with ABL, I was curious how they would handle session management. Sure enough, just as with many other web apps, the application set a secure cookie that uniquely identifies my session upon login. However, I noticed that the session ID was relatively short (sixteen lower/upper case letters and four digits). I decided to pull down a few thousand of the tokens to see if I noticed any anomalies. The first thing I noticed was that the four digit number on the end was obviously not random, as values tended to repeat, cluster together, etc. So, the security of the session ID must lie in the sixteen characters that precede the four digits. However, even the sixteen characters did not look so random. Certain letters appeared to occur more than others. Certain characters seemed to follow other characters more than others. But, this was totally unscientific; strange patterns can be found in any small sample of data. So, I decided to do a bit more scientific investigation into what was going on.

Just to confirm my suspicions I coded up a quick python script to pull down a few thousand tokens and count the frequency of each character in the token. Several minutes later I had a nice graph in excel.

[Figure: Histogram of Encode Character Frequency]
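
The counting side of that script is only a few lines.  A sketch of roughly what it looked like (assuming the harvested tokens have already been saved, one per line, in a file named tokens.txt):

from collections import Counter

# Count how often each character appears across the sixteen-letter portion
# of the harvested session tokens (the trailing four digits are ignored).
with open("tokens.txt") as f:
    tokens = [line.strip() for line in f if line.strip()]

counts = Counter()
for token in tokens:
    counts.update(token[:16])

for char, n in sorted(counts.items()):
    print(f"{char}\t{n}")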

Ouch! That sure doesn’t look very random. So, I opened up Burp Proxy and used their Sequencer to pull down a few thousand more session cookies. The Burp Sequencer has support for running a number of tests, including a set of FIPS-compliant statistical tests for randomness. To obtain a statistically significant result Burp analyzes a sample size of 20,000 tokens. Since I saw that the four digits at the end of the session ID provided little to no entropy, I discarded them from the analysis. It seemed obvious that the sixteen character sequence was generated using some sort of cryptographic hash, and the four digit number was generated in some other way. I was more interested in the entropy provided by the hash. So, after twenty minutes of downloading tokens, I let Burp crunch the numbers. About 25 seconds later Burp returned an entropy value of 0 bits, along with a graph like the one below, showing the entropy of the data at various significance levels.

[Figure: Encode Entropy Estimation]

Hmmm, maybe Burp is broken. I was pretty sure I had successfully used the Burp Sequencer before. Maybe it was user error, a bug in the current version, who knows. I decided that a control was needed, just to ensure that the tool was working the way I thought it should. So, I wrote a bit more python to simply print the hex-encoded value of a SHA1 hash on the numbers 1-20,000. I loaded this data into Burp and analyzed the data set. Burp estimated the entropy at 153 bits. Just to compare with the prior results, here is the distribution graph and the Burp entropy results for the SHA1 output:

[Figure: Histogram of SHA1 Character Frequency]

[Figure: SHA1 Entropy Estimation]
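
For reference, generating that control data set takes only a few lines of Python (a sketch of the script described above; it writes one hex-encoded SHA1 digest per line for loading into the Sequencer):

import hashlib

# Control data set: the hex-encoded SHA1 of the integers 1 through 20,000.
with open("sha1_tokens.txt", "w") as f:
    for i in range(1, 20001):
        f.write(hashlib.sha1(str(i).encode()).hexdigest() + "\n")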

I repeated the same test against a set of JSESSIONID tokens and found a similarly acceptable result. Ok, so the Burp Sequencer seems to be working.

So, I next went hunting for the session token generation code in the application. After a little grepping I found the function for generating new session tokens. Ultimately the function took a number of values and ran them through a function called “ENCODE”. Hmmm, ENCODE, that didn’t sound familiar. Some more grepping through the source did not reveal any function definitions, so I assumed the function must be part of the standard library for ABL. Sure enough, on page 480 of the language reference manual there was a description of the ENCODE function.

“Encodes a source character string and returns the encoded character string result”

The documentation then goes on to state:

“The ENCODE function performs a one-way encoding operation that you cannot reverse.  It is useful for storing scrambled copies of passwords in a database. It is impossible to determine the original password by examining the database. However, a procedure can prompt a user for a password, encode it, and compare the result with the stored, encoded password to determine if the user supplied the correct password.”

That is the least committal description of a hash function I’ve ever had the pleasure of reading. It turns out the application, as well as a third party library the application depends upon, uses this function for generating session tokens, storing passwords, and generating encryption keys. For the sake of reproducibility I wanted to be sure my data was not the result of some strange artifact in their environment. I installed the ABL runtime locally and coded up a simple ABL script to call ENCODE on the numbers 1-20000. I reran the Burp Sequencer and got the exact same result, 0 bits.

At this point I was fairly sure that ENCODE was flawed from a hashing perspective. A good quality secure hash function, regardless of how correlated the inputs are (as the numbers 1-20000 obviously would be), should produce output that is indistinguishable from truly random values (see Cryptographic Hash Functions and Random Oracle Model for more information). ENCODE clearly does not meet this definition of a secure hash function. But, 0 bits, that seems almost inconceivably flawed.  So, giving them the benefit of the doubt, I wondered if the result is dependent on the input. In other words, I conjectured that ENCODE might perform some unsophisticated “scrambling” operation on the input, and thus input with low entropy will have low entropy on the output. Conversely, input with high entropy might retain its entropy on output. This still wouldn’t excuse the final result, but I was curious nonetheless. My final test was to use the output of my SHA1 results and feed them each through the ENCODE function. Since the output of the SHA1 function contains high entropy I conjectured that ENCODE, despite its obvious flaws, might retain this entropy. The results are shown below:

[Figure: Histogram of SHA1 then Encode Character Frequency]

[Figure: SHA1 then Encode Entropy Estimation]

ENCODE manages to transform an input with approximately 160 bits of entropy into an output that, statistically speaking, contains 0 bits of entropy. In fact, the frequency distribution of the character output is nearly identical to the first graph in this post.

This brings me back to my opening statement, “Unless you have an unbelievably good reason for developing your own cryptography, don’t!”. I can’t figure out why this ENCODE function even exists. Surely the ABL library has support for a proper hash function like SHA1, right? Yes, in fact it does. The best explanation I could come up with is that it is a legacy API call. If that is the case then the call should be deprecated and/or documented as suitable only in cases where security is of no importance. The current API does the exact opposite, encouraging developers to use the function for storing passwords. Cryptography is hard, even for those of us that understand the subtlety involved. Anything that blurs the line between safe and unsafe behavior only makes the burden on developers even greater.

It is unclear, based on this analysis, how much effort it would take to find collisions in ABL’s ENCODE function. But, even this simple statistical analysis should be enough for anyone to steer clear of its use for anything security related. If you are an ABL developer I would recommend that you try replacing ENCODE with something else. As a trivial example, you could try: HEX-ENCODE(SHA1-DIGEST(input)). Obviously you need to test and refactor any code that this breaks. But, you can at least be assured that SHA1 is relatively secure from a hashing perspective. That said, you might want to start looking at SHA-256 or SHA-512, given the recent chinks in the armor of SHA1.

Unfortunately, it does not appear that ABL has support for these more contemporary SHA functions in their current release.

Ok….slowly stepping down off my soapbox now.   Bad crypto just happens to be one of my pet peeves.

Footnote:

Just before posting this blog entry I decided to email Progress to see if they were aware of the behavior of the ENCODE function.  After a few back-and-forth emails I eventually got a reply that described the ENCODE function as using a CRC-16 to generate its output (it is not the direct output, but CRC-16 is the basic primitive used to derive the output).  Unfortunately, CRCs were never meant to provide any security guarantees.  CRCs do an excellent job of detecting accidental bit errors in a noisy transmission medium.  However, they provide no guarantees if a malicious user tries to find a collision.  In fact, maliciously generating inputs that produce identical CRC outputs is fairly trivial.  As an example, the linearity of the CRC-32 algorithm was noted as problematic in an analysis of WEP.  Thus, despite the API doc recommendation, I would highly recommend that you not use ENCODE as a means of securely storing your users’ passwords.
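
To illustrate just how much structure CRCs leak, the XOR linearity exploited in the WEP analysis is easy to demonstrate with Python’s standard binascii module (CRC-32 here, since that is what the standard library exposes; the messages are arbitrary equal-length examples):

import binascii

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"transfer $100 to account 1234567"
m2 = b"transfer $999 to account 9999999"
m3 = b"some other equal-length message!"

# For equal-length inputs, the CRC of an odd number of messages XORed together
# equals the XOR of their individual CRCs.
lhs = binascii.crc32(xor_bytes(xor_bytes(m1, m2), m3))
rhs = binascii.crc32(m1) ^ binascii.crc32(m2) ^ binascii.crc32(m3)
assert lhs == rhs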