Directory Traversal in Archives

April 21, 2009

By: Greg Ose and Patrick Toomey

I’m sure on the top of everyone’s list of resolutions from the New Year is the ever forgotten “I will write more secure code” and it seems that each year this task gets harder. With more complex and abstracted frameworks and APIs, the ways security related bugs are being introduced to a code base has become equally complex and abstracted. Being a few months into 2009, hopefully we can help you catch up on your resolutions by presenting something else to look for when reviewing or writing secure code.

In recent engagements, we have run into a slew of issues focusing around the well-known vulnerability of directory path traversal. As a refresher, this typically involves injecting file path meta-characters into a filename string to reference arbitrary files and usually results in the modification or disclosure of files on the system. For example, a user supplies the filename /../../etc/passwd which is appended to the path /tmp/uploaded_pictures and ends up referencing the password file instead of a file under the intended directory.

We all know, or at least should know, what a typical directory traversal vulnerability and exploit looks like, however, we have recently seen these issues manifest themselves in the handling of user-provided archive files instead of file path strings. Typically, these user provided files are sent via HTTP uploads. Almost all of the common high-level application APIs provide a means, or a third-party library, to handle archive files. Additionally, almost all of these libraries do not check for potential directory path traversal when they perform the extraction of these files. This puts the liability on the developer to check for malicious archives. While file operation calls with a user controlled variable may be obvious, filenames within user-controlled archives may be the vulnerability that slips by. Developers should not only validate user supplied file paths for directory traversal, but also check file paths included in archive files. As a note, this type of vulnerability has been mentioned before and is not groundbreaking by any means, but we want to take a detailed look into what to be aware of as a developer and how to test for this during vulnerability assessments.

To get started lets take a look at an example provided by Sun themselves (!!!) in a technical article for the java.util.zip package. Code Sample 1 from the article provides their base example for extracting an archive and is shown below.

import java.io.*;
import java.util.zip.*;

public class UnZip {
  final int BUFFER = 2048;
  public static void main (String argv[]) {
    try {
      BufferedOutputStream dest = null;
      FileInputStream fis = new FileInputStream(argv[0]);
      ZipInputStream zis = new ZipInputStream(
                               new BufferedInputStream(fis));
      ZipEntry entry;
      while((entry = zis.getNextEntry()) != null) {
        System.out.println("Extracting: " +entry);
        int count;
        byte data[] = new byte[BUFFER];
        // write the files to the disk
        FileOutputStream fos = new FileOutputStream(
                                   entry.getName());
        dest = new BufferedOutputStream(fos, BUFFER);
        while ((count = zis.read(data, 0, BUFFER)) != -1) {
          dest.write(data, 0, count);
        }
        dest.flush();
        dest.close();
      }
      zis.close();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}

We can see where the vulnerability manifests itself in processing each entry of the provided ZIP file:

FileOutputStream fos = new FileOutputStream(entry.getName());

entry is the current ZIP entry being processed and getName() returns the filename stored in that entry. After retrieving this filename, the uncompressed data is written to its value. We can see that by using directory traversal in the filename a malicious user may be able to make arbitrary writes anywhere on the filesystem. Unfortunately, on most platforms, if an attacker can arbitrarily write files they can most likely also get arbitrary code executed on the affected server.

Similar issues exist with a number of ZIP library implementations across various languages. As one might expect, the equivalent Python code is far less verbose. While Python doesn’t provide any sample code, a simple, and vulnerable, ZIP extraction would look as follows:

from zipfile import ZipFile
import sys
zf = ZipFile(sys.argv[1])
zf.extractall()

The extractall method does what one would expect it to do, except that it does not check for directory traversal in the ZIP entries’ file paths. Python also provides equivalent objects for handling tar archives. Interestingly, the tar archive library documentation does make mention of the risk associated with path traversal within archive files. The documentation for the extractall method states:

Warning: Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of path, e.g. members that have absolute filenames starting with “/” or filenames with two dots “..”.

How about PHP, surely they provide a function to work with ZIP files (what don’t they have a function for). The PHP manual provides the following example code for extracting ZIP files.

<?php
$zip = new ZipArchive;
$res = $zip->open('test.zip');
if ($res === TRUE) {
  echo 'ok';
  $zip->extractTo('test');
  $zip->close();
} else {
  echo 'failed, code:' . $res;
}
?>

Sure enough, this code is also vulnerable to file path manipulation within the archive.

What about everyone’s favorite language du jour, Ruby? Ruby itself does not have ZIP file extraction built in to the language’s core library. However, rubyzip is a popular third-party library and like the prior libraries, is also vulnerable to directory traversal. The example below was stated in a post by the library’s author as how to extract a ZIP file and all of its directories:

require 'rubygems'
require 'zip/zipfilesystem'
require 'fileutils'

OUTDIR="out"

Zip::ZipFile::open("all.zip") {
  |zf|

  zf.each { |e|
    fpath = File.join(OUTDIR, e.name)
    zf.extract(e, fpath)
    FileUtils.mkdir_p(File.dirname(fpath))
  }
}

Finally, similar to Ruby, the .Net environment does not have ZIP archive handling built in to the core library. A quick googling for “.Net zip files” leads to an article on MSDN. In this article, the authors detail this gap in the .Net library and then go on to present a solution. The tools released include a signed DLL for use during development and a set of command-line utility programs that utilize the library. One of these command-line utilities is Unzip.exe. Sure enough, Unzip.exe is vulnerable to path traversal within an archive. No warning is presented and the archive is extracted without concern to the fully resolved path of the files within the archive.

How do mainstream, standalone, compression utility programs handle this vulnerability? We tested a large number of archive extraction programs (Winzip, Winrar, command line Info-Zip, unzip on Unix, etc) and noted that all of them either provide a warning when a ZIP file entry contains directory traversal, escape the meta-characters, or just ignore the traversed directory path all together.

When writing code that interacts with archives, the same precautions used by mainstream extraction utilities must be performed by the developer. As with any user-controlled input, the directory filenames should be validated before being processed by any file operation. The developer should verify that path traversal characters do not occur in any entries within the archive. Similarly, the developer may also leverage utility functions within their language to first determine the fully resolved path before extracting an entry (ex. os.path.normpath(path) in Python).

A more drastic mitigation, though perhaps the better long-term solution, would involve modifying these default libraries to work similarly to their standalone application counterparts by default. It is extremely rare to require path traversal characters in a legitimate archive. Perhaps, the libraries should be modified to secure the common case, requiring a developer to explicitly request the atypical case. For example, what if the Python ZipFile object changed its default behavior to throw an exception in the presence of file traversal characters? The extractall method signature could be modified as follows:

ZipFile.extractall([path[,members[,pwd[,allow_traverse]]]])

By default the allow_traverse is set to False, throwing zipfile.BadZipfile if path traversal characters are encountered. This would provide a secure by default configuration for the library while still allowing the existing behavior if necessary. This requires the developer to explicitly request support for path traversal, thus mitigating accidental and insecure usage. This is unlikely to impact existing code, as archives with path traversal characters are not easy to create and it is extremely unlikely a legitimate archive would accidentally include such characters.

During the course of this write-up we grew tired of hand-editing zip archives in a hex-editor to add directory traversal characters. So, we put together a Python script that can be used to generate ZIP archives with path traversal sequences automatically inserted. It can create directories in both Unix and Windows environments for ZIP files (including jar) and tar files with and without compression (gzip or bzip2). You can specify an arbitrary number of directories to traverse and an additional path to append (think var/www or Windows\System32). The full usage follows:

$ ./evilarc.py --help
Usage: evilarc <input file>

Create archive containing a file with directory traversal

Options:
  --version      show program's version number and exit
  -h, --help     show this help message and exit
  -f OUT, --output-file=OUT
                 File to output archive to.  Archive type is
                 based off of file extension.  Supported
                 extensions are zip, jar, tar, tar.bz2, tar.gz,
                 and tgz.  Defaults to evil.zip.
  -d DEPTH, --depth=DEPTH
                 Number directories to traverse. Defaults to 8.
  -o PLATFORM, --os=PLATFORM
                 OS platform for archive (win|unix). Defaults
                 to win.
  -p PATH, --path=PATH  Path to include in filename after
                 traversal.  Ex:WINDOWS\System32\

The following example shows the file test.txt being added to an archive and extracted to the C:\Windows\System32 directory through the vulnerable Java class we previously discussed:

$ ./evilarc.py test.txt -p Windows\\System32\\
Creating evil.zip containing ..\..\..\..\..\..\..\..\Windows\System32\test.txt

$ java javaunzip evil.zip
Extracting: ..\..\..\..\..\..\..\..\Windows\System32\test.txt

$ ls -al /cygdrive/c/Windows/System32/test.txt
-rwxr-x---+ 1 gose mkgroup-l-d 21 Feb 24 11:52 /cygdrive/c/Windows/System32/test.txt

We have made the script available for download here:

https://github.com/Neohapsis/evilarc


Spring Forward

April 20, 2009

As with many, the economic climate has made it challenging to publish as many interesting and insightful concepts and considerations on our blog and in articles as we’d like.   We’ve been focusing our energy on our services and product development staying steadfast in our commitments to our customers and staff.  It’s Spring though and time again to shake off the cobwebs, put on the rubber boots, and march through the mud and puddles to join the tulips and blog for a renewed beginning.

Our work is about exploring the possibilities as much as it is about identifying vulnerabilities, assessing and managing risks, and strategically advising our customers.  Our history and future, as with much of the industry, is predicated on both dotting the i’s, crossing the t’s and delving deeper into ‘why’s’ and ‘what if’s.’  It’s often about conspiring to understand the likes of:

1) Why a seemingly meaningless design, development or implementation trend may cause meaningful and unexpected repercussions in the future

2)  How best practices can come to terms with a Linux distribution when volumes of modules may be installed and loaded by default

3)  What PCI merchants should do to continually be compliant and mitigate their risks and liability

4)  How global earthquakes in the financial sector and a renewed desire to re-establish integrity and transparency may be represented logically in a series of meta-models, frameworks and content which can be visualized to articulate the complexity of associated risks

While many of our explorations have not been published in past weeks, the discussions have continued.  Along with the day to day and a new website, we’ve been researching and writing and debating and discussing findings, theories and concepts, that enlighten our days with meaning and thoughtfulness.    We have been grappling with an assortment of grandiose ideas and mundane mutterings to develop momentum and content that will provide discussions on a more regular and consistent basis.

Welcome to spring, we look forward to conversing with you and appreciate any feedback and thoughts you have that are relevant to you and your challenges.


Hulu…client-side “encryption”…seriously?

April 14, 2009

By: Patrick Toomey

I remember being pretty excited by the prospect of a service like Hulu.   The idea that major networks were actually coming together to stream mainstream video content was impressive.  It was such a departure from the locked down, share nothing, mentality of old.   I thought to myself, “Wow, does Hollywood finally get it?”. Apparently my optimism was exactly that…optimistic.

Sometime in the last week or so it was reported that Hulu, a video streaming service run by NBC and FOX, started “encrypting” Ajax responses to block unauthorized software clients (Boxee et al.) from sidestepping the hulu.com website to view content.  However, encryption is purposefully in quotes, as what Hulu actually implemented is a client-side obfuscation mechanism.  It it well known that such protection mechanisms are flawed by design and bound to be circumvented quickly.

The protective measure that is implemented rests on the obfuscation of Ajax responses made against hulu.com.  Instead of returning plaintext HTML content, Ajax requests return obfuscated URL encoded strings.  These URL encoded strings are reverted to plaintext on the client-side using JavaScript.  For example, a request to:

http://www.hulu.com/channels/Home-and-Garden?kind=videos&sort=popularity

returns a URL encoded string that begins:

dobfu__%F2%9E%84%88%EE%99%81%9F%BD%89%D0%DC …

The entire string is approximately 141KB long.  Other than the “dobfu__” prefix, the remainder of the string is URL encoded.  This obfuscated string is transformed into plaintext by a JavaScript function called “_dobfu()”.  This function, after a bit of reformatting, is reproduced below:

function _dobfu(text) {
  return text.slice(0,7)!='dobfu__'?text:
    $A(unescape(text.substring(7)).tol()).map(function(i) {
      i=0xfeedface^i;
      return String.fromCharCode(i&0xFF,i>>>8&0xFF,i>>>16&0xFF,i>>>24&0xFF);
    }
  ).join('').replace(/\+$/,'');
}

All of the above code is pretty easy to follow, save for the references to $A() and the the tol() functions.  The $A() function is a Prototype global function that creates a full array object from any other object that can pass for an array (supports indexing, etc).  This is done so that the new object inherits the full functionality of an array (the map method is needed in this case).  The second piece of ambiguous logic , the tol() method, is defined in another JavaScript file and is reproduced below:

String.prototype.tol=function(){
  var s=this;
  return $R(0,Math.ceil(s.length/4)-1).map(
    function(i){
      return s.charCodeAt(i*4)+(s.charCodeAt(i*4+1)<<8)+(s.charCodeAt(i*4+2)<<16)+(s.charCodeAt(i*4+3)<<24);
    }
  );
};

Essentially this method takes a string of bytes and creates an array of 32-bit integers from each 4-byte chunk.  For example, if the string processed in the method was “\x01\x23\x45\x67\x89\xab\xcd\xef” the method would return the array [0x67452301, 0xefcdab89].  The ordering of the individual bytes is a result of the “tol()” method parsing the data as little-endian.

So, with those two functions defined we can quickly describe how Hulu de-obfuscates responses.  The obfuscated string is broken up into 4-byte integers.  Since the length of the obfuscated string is always evenly divisible by four we are guaranteed that a string of length x will turn into an array of 4-byte integers of length x/4.  Then, for each 4-byte integer, the value is XORed with the constant “0xfeedface”.  Once XORed, the individual bytes from the integer are split apart and converted back to their equivalent ASCII value.  Finally, all trailing NULL bytes are removed from the de-obfuscated string.

It is a bit difficult to imagine what Hulu thought they might accomplish with the above scheme.  It effectively does nothing to prevent third-party tools from performing the same obfuscation/de-obfuscation.  Any scheme that attempts to implement client-side “decryption”, particularly in JavaScript, is bound for failure.  The client possesses the obfuscated message, the key to de-obfuscate the message, and the Javascript that executes the algorithm.   Using these components, it is a trivial exercise to transform any obfuscated response back into plaintext.  Hulu likely thwarted unauthorized software for the better part of an afternoon and no more.  Client-side security mechanisms simply don’t work.  Even complex systems implemented in native code, such as popular DRM schemes, that may go unbroken for a period of time, will eventually be circumvented.  However, to implement a similar preventative measure in JavaScript lowers the difficulty of circumvention dramatically.

Beyond the technical discussion there is also a more broad question to be asked.  What was the net gain for Hulu?  They failed to accomplish their implicit goal: to block unauthorized software.  Hulu simply received another  bit of bad press for treating their customers like thieves.  Hulu, and other such services, need to realize that the ubiquitous availability of their content will ultimately grow their fan base.  There is ever increasing competition for a viewer’s eyes and ears.  Podcasts, YouTube, gaming, etc are all competing.  Third-party products, such as Boxee, only serve to increase the ubiquity of their content, which shouldn’t be viewed as a bad thing.  Thwarting their own customers only sours the experience and reinforces the presumption that a good chunk of the entertainment industry just doesn’t get it.  Besides being bad security, this latest debacle is just bad business.


About CVE-2009-1151

April 6, 2009

During an evaluation of tools for internal use, we took a look at phpMyAdmin. During the assessment, we identified that the scripts/setup.php script is used to generate a configuration file to config/config.inc.php. Anytime PHP code is being generated, extremely careful filtering must be done to ensure that the intended output cannot be escaped and will not allow the injection of arbitrary code.

While the most obvious inputs, those set by the configuration fields, were escaped properly, other attacker accessible data was not. The script passes PHP serialized data back and forth through the configuration parameter. When a save action is performed, this data is then written as PHP variables to the configuration file. The data contains associative arrays with key and value pairs. On output, the values are properly escaped using add_slashes, however the keys that are also output are not filtered. By modifying the array keys in the serialized data passed to a save POST request, the key name can be escaped and arbitrary PHP code injected. If config/ is writable by the web server user, the config.inc.php file is written to it and can be executed directly out of the document root.

The issue was disclosed to the phpMyAdmin team and they did an amazing job responding to this disclosure with a patch out in less than 24 hours!

Lessons learned? Anytime you are programmatically generating code (be it HTML, JavaScript, PHP, etc.) ensure that your output is properly filtered and make sure all installation scripts and unneeded administration tools are removed.

References:
Advisory: http://www.phpmyadmin.net/home_page/security/PMASA-2009-3.php
Patch: http://phpmyadmin.svn.sourceforge.net/viewvc/phpmyadmin?view=rev&revision=12301
CVE: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-1151


Follow

Get every new post delivered to your Inbox.