Crypto Pet Peeves: Hashing…Encoding…It’s All The Same, Right?

August 25, 2008

Patrick Toomey

© 2008 Neohapsis

We all know cryptography is hard. Time and time again we in the security community give advice that goes something like, “Unless you have an unbelievably good reason for developing your own cryptography, don’t!”. Even if you think you have an unbelievably good reason I would still take pause and make sure there is no other alternative. Nearly every aspect of cryptography is painstakingly difficult: developing new crypto primitives is hard, correctly implementing them is nearly just as hard, and even using existing crypto APIs can be fraught with subtlety. As discussed in a prior post, Seed Racing, even fairly simple random number generation is prone to developer error. Whenever I audit source I keep my eyes open for unfamiliar crypto code. So was the case on a recent engagement; I found myself reviewing an application in a language that I was less familiar with: Progress ABL.

Progress ABL is similar to a number of other 4GL languages, simplifying development given the proper problem set. Most notably, Progress ABL allows for rapid development of typical business CRUD applications, as the language has a number of features that make database interactions fairly transparent. For those of you interested to learn more, the language reference manual can be found on Progress’ website.

As I began my review of the application I found myself starting where I usually do: staring at the login page. The application was a fairly standard web app that required authentication via login credentials before accessing the sensitive components of the application. Being relatively unfamiliar with ABL, I was curious how they would handle session management. Sure enough, just as with many other web apps, the application set a secure cookie that uniquely identifies my session upon login. However, I noticed that the session ID was relatively short (sixteen lower/upper case letters and four digits). I decided to pull down a few thousand of the tokens to see if I noticed any anomalies. The first thing I noticed was that the four digit number on the end was obviously not random, as values tended to repeat, cluster together, etc. So, the security of the session ID must lie in the sixteen characters that precede the four digits. However, even the sixteen characters did not look so random. Certain letters appeared to occur more than others. Certain characters seemed to follow other characters more than others. But, this was totally unscientific; strange patterns can be found in any small sample of data. So, I decided to do a bit more scientific investigation into what was going on.

Just to confirm my suspicions I coded up a quick python script to pull down a few thousand tokens and count the frequency of each character in the token. Several minutes later I had a nice graph in excel.

Histogram of Encode Character Frequency
Histogram of Encode Character Frequency

Ouch! That sure doesn’t look very random. So, I opened up Burp Proxy and used their Sequencer to pull down a few thousand more session cookies. The Burp Sequencer has support for running a number of tests, including a set of FIPS-compliant statistical tests for randomness. To obtain a statistically significant result Burp analyzes a sample size of 20,000 tokens. Since I saw that the four digit token at the end of the session ID provided little to no entropy, I discarded them from the analysis. It seemed obvious that the sixteen character sequence was generated using some sort of cryptographic hash, and the four digit number was generated in some other way. I was more interested in the entropy provided by the hash. So, after twenty minutes of downloading tokens, I let Burp crunch the numbers. About 25 seconds later Burp returned an entropy value of 0 bits. Burp returned a graph that looked like the one below, showing the entropy of the data at various significance levels.

Encode Entropy Estimation
Encode Entropy Estimation

Hmmm, maybe Burp is broken. I was pretty sure I had successfully used the Burp Sequencer before. Maybe it was user error, a bug in the current version, who knows. I decided that a control was needed, just to ensure that the tool was working the way I thought it should. So, I wrote a bit more python to simply print the hex-encoded value of a SHA1 hash on the numbers 1-20,000. I loaded this data into Burp and analyzed the data set. Burp estimated the entropy at 153 bits. Just to compare with the prior results, here is the distribution graph and the Burp entropy results for the SHA1 output:

Histogram of SHA1 Character Frequency
Histogram of SHA1 Character Frequency

SHA1 Entropy Estimation
SHA1 Entropy Estimation

I repeated the same test against a set of JSESSIONID tokens and found a similarly acceptable result. Ok, so the Burp Sequencer seems to be working.

So, I next went hunting for the session token generation code in the application. After a little greping I found the function for generating new session tokens. Ultimately the function took a number of values and ran them through a function called “ENCODE”. Hmmm, ENCODE, that didn’t sound familiar. Some more greping through the source did not reveal any function definitions, so I assumed the function must be part of the standard library for ABL. Sure enough, on page 480 of the language reference manual there was a description of the ENCODE function.

“Encodes a source character string and returns the encoded character string result”

The documentation then goes on to state:

“The ENCODE function performs a one-way encoding operation that you cannot reverse.  It is useful for storing scrambled copies of passwords in a database. It is impossible to determine the original password by examining the database. However, a procedure can prompt a user for a password, encode it, and compare the result with the stored, encoded password to determine if the user supplied the correct password.”

That is the least committal description of a hash function I’ve ever had the pleasure reading. It turns out the application, as well as a third party library the application depends upon, uses this function for generating session tokens, storing passwords, and generating encryption keys. For the sake of reproducibility I wanted to be sure my data was not the result of some strange artifact in their environment. I installed the ABL runtime locally and coded up a simple ABL script to call ENCODE on the numbers 1-20000. I reran the Burp Sequencer and got the exact same result, 0 bits.

At this point I was fairly sure that ENCODE was flawed from a hashing perspective. A good quality secure hash function, regardless of how correlated the inputs are (as the number 1-20000 obviously would be), should produce output that is indistinguishable from truly random values (see Cryptographic Hash Functions and  Random Oracle Model for more information). ENCODE clearly does not meet this definition of a secure hash function. But, 0 bits, that seems almost inconceivably flawed.  So, giving them the benefit of the doubt, I wondered if the result is dependent on the input. In other words, I conjectured that ENCODE might perform some unsophisticated “scrambling” operation on the input, and thus input with low entropy will have low entropy on the output. Conversely, input with high entropy might retain it’s entropy on output. This still wouldn’t excuse the final result, but I was curious none the less. My final test was to use the output of my SHA1 results and feed them each through the ENCODE function. Since the output of the SHA1 function contains high entropy I conjectured that ENCODE, despite its obvious flaws, might retain this entropy. The results are shown below:

Histogram of SHA1 then Encode Character Frequency
Histogram of SHA1 then Encode Character Frequency

SHA1 then Encode Entropy Estimation
SHA1 then Encode Entropy Estimation

ENCODE manages to transform an input with approximately 160 bits of entropy into an output that, statistically speaking, contains 0 bits of entropy. In fact, the frequency distribution of the character output is nearly identical to the first graph in this post.

This brings me back to my opening statement, “Unless you have an unbelievably good reason for developing your own cryptography, don’t!”. I can’t figure out why this ENCODE function exists? Surely the ABL library has support for a proper hash function like SHA1, right? Yes, in fact it does. The best explanation I could come up with is that it is a legacy API call. If that is the case then the call should be deprecated and/or  documented as suitable only in cases where security is of no importance. The current API does the exact opposite, encouraging developers to use the function for storing passwords. Cryptography is hard, even for those of us that understand the subtlety involved. Anything that blurs the line between safe and unsafe behavior only makes the burden on developers even greater.

It is unclear, based on this analysis, how much effort it would require to find collisions in ABL’s ENCODE function. But, even this simple statistical analysis should be enough for anyone to steer clear of its use for anything security related. If you are an ABL developer I would recommend that you try replacing ENCODE with something else. As a trivial example, you could try: HEX-ENCODE(SHA1-DIGEST(input)). Obviously you need to test and refactor any code that this breaks. But, you can at least be assured that SHA1 is relatively secure from a hashing perspective. That said, you might want to start looking at SHA-256 or SHA-512, given the recent chinks in the armor of SHA1:

Unfortunately, it does not appear that ABL has support for these more contemporary SHA functions in their current release.

Ok….slowly stepping down off my soapbox now.   Bad crypto just happens to be one of my pet peeves.

Footnote:

Just before posting this blog entry I decided to email Progress to see if they were aware of the behavior of the ENCODE function.  After a bit a few back and forth emails I eventually got an email that desribed the ENCODE function as using a CRC-16 to generate it’s output (it is not the direct output, but CRC-16 is the basic primitive used to derive the output).  Unfortunately, CRCs were never meant to have any security gurantees.  CRCs do an excellent job of detecting accidental bit errors in a noisy transmission medium.  However, they provide no gurantees if a malicous user tries to find a collision.  In fact, maliciously generating inputs that produce identical CRC outputs is fairly trivial.  As an example, the linearity of the CRC-32 alogirthm was noted as problematic in an analysis of WEP.   Thus, despite the API doc recommendation, I would highly recommend that you not use ENCODE as a means of securely storing your user’s passwords.


16-bit debugger goodness

August 2, 2008

It’s Saturday around Noon.  A friend is building a new server and wants to add 8 gigs of RAM to his MSI mother board but needs to flash the motherboard.  I’m bored and seems easy plus he hasn’t seen Afro Samurai so I figure we’ll upgrade his motherboard watch some anime and it will be a Saturday afternoon well spent.

Well, turns out the instructions say use a floppy…guess what?  He has no floppy drives.  He has ten computers counting the laptop I brought but no floppy drives.  Ah ok,  well this is for a Linux server but we supposedly need windows to install.  He installed windows before I got there but no go.  The software won’t run and without a floppy, and we can’t make a DOS floppy.  Believe it our not I keep a small win98boot.img file on my server for just such a reason but without a floppy I’ve got nothing.

So I go back to my house where I have tons of computers, all with floppies.  Now that I think about it I have no idea why I always add a floppy when I build a machine… but I do.  Plus I have extra drives so I grab one and some blank disks.  Hey I still have blanks from the days of Slackware boot floppies, and Novell server.exes… yeah, you remember ;)

So get this, the mother board doesn’t boot with the floppy attached.  Whatever.  We figure it should be trivial to modify the installer to use another drive letter rather than A:.  We divided our efforts.  He goes to task making a USB DOS bootable thumbdrive and I go to modify the motherboard’s installer.

As you can probably guess from this post that it was a 16-bit installer.  No problem, right?  I do 32-bit in Olly and IDA so trivial eh?  Nope,  only IDA can open such a thing but it can’t run as a debugger.  Which wouldn’t be a big deal but the binary is packed.  Packed!  Yeah for real, 16-bit and packed.

At this stage there is no turning back for me.  Meanwhile he already has his thumbdrive in DOS mode.  But I can’t debug 16-bits even in IDA.  Huh, who knew?  Well probably a lot of you but I didn’t.  So I figure, well I start up debug.exe and do it from there…nope.  No breakpoint support or anything useful.  Hmmmm, well I stumble on GRDB.  It stands for ‘Get Real’ Debugger and I have to say I was impressed.  It can handle 16-bit apps as well as 32-bit functionality.  It actually has a lot of cool functionality like PCI bus support but I didn’t need that.  What it did have was breakpoints, single stepping and step over commands.  Plus in comments it would show if a jump was taken or the contents of ES:[ax+dx].  GRDB, I love you!  :)   And just for icing on the cake they have ANSI colors.  Now how can you not love that?

GRDB comes with source:  All in ASM and it can be compiled with MASM.  All code is licensed under the GPL as well!  Woot!  By now my friend has already updated his motherboard and is intalling Linux but I’m still in fascinated mode.  I missed the RE scene from back in the 16-bit DOS days.  Granted I used debug.exe to change binaries but it was based on offsets about which people had told me.  I was not in the scene that set the precedent for debuggers today.  Now to only modify the binary to support Ruby and a .gdbinit script… ;)

But if you are stuck with a 16-bit app I recommend GRDB, or if you want to write a packer that modern debuggers choke on…go old school.  (PS.  I laughed when I unpacked it and fired up ImpRec just to see NTVDM.EXE was all that was running :P )  I forget how spoiled I am nowadays but this Saturday afternoon I found a surprisingly useful gem.

–Craig


Exploiting Erroneous Errata

August 1, 2008

Recently I was reading through the line-up for the Hack in the Box Conference which will be held in Malaysia this October. The following talk made my ears perk up: “Remote Code Execution Through Intel CPU Bugs” By Kris Kaspersky. Briefly, this talk will cover the exploitation of Intel processor errata. Yes you heard that right, Kris has managed to exploit hardware bugs. He goes on to say that they have developed PoC code which allows for remote exploitation of at least on of these bugs.

When I first came across this I was impressed to say the least. I decided to re-read the Intel Errata to see if I could spot the exploitable conditions. There was some discussion when these were first released, including speculation into the exploitability of several of these, but like most people I didn’t think much of it (OS developers flip out all the time).

After reading through the Intel Core 2 documents I decided to check out the revisions for the Athlon 64 as well. That’s where I ran across this gem:

Errata 95: “RET Instruction May Return To Incorrect EIP”

Speaking of exploitability… Lets see what causes this.

In order to efficiently predict return addresses, the processor implements a 12-deep return address stack to pair RETs with previous CALLs.

Under the following unusual and specific conditions, an overflowed hardware return stack may cause a RET instruction to return to an incorrect EIP in 64-bit systems running 32-bit compatibility mode applications:

• A CALL near is executed in 32-bit compatibility mode.
• Prior to a return from the called procedure, the processor is switched to 64-bit mode.
• In 64-bit mode, subsequent CALLs are nested 12 levels deep or greater.
• The lower 32 bits of the 64-bit return address for the 12th-most deeply nested CALL in 64-bit mode matches exactly the 32-bit return address of the original 32-bit mode CALL.
• A series of RETs is executed from the nested 64-bit procedures.
• The processor returns to 32-bit compatibility mode.
• A RET is executed from the originally called procedure.

So lets assume you have a 32 bit application running in compatibility mode that you would like to exploit and can force a somewhat long function to be repeatedly called.  You could create a 64 bit thread with a very tight function that recursively calls itself 12 times and returns to a address which matches (lower 32 bits) the return of the function you are targeting.This would create a bit of a race, but it would be very winnable given a slightly complex target and a tight exploit loop.

Of course the errata doesn’t detail what the incorrect return address might be but assuming it can somehow be predicted or controled this could be a fun little bug. This specific bug only exists on a small subset of AMD hardware, specifically CPUIDs 0xF51, 0xF58, and 0xF48. If anyone has a processor with the bug and would like to experiment with it I would love to hear from you.


Follow

Get every new post delivered to your Inbox.