Ambiguous RFC leads to Cross Site Scripting

By Patrick Toomey

Sometime in January I was on an application assessment and noticed that user input was being used to generate a link to another application.  In other words, I would send a request that looked like:

and the application was generating some HTML that looked like:

<a href="">Click Me</a>

This is not atypical and I have probably seen it a fair number of times in the past.  This functionality can be implemented in a number of ways, but the way it was implemented in this application was the following:

return ""+request.getQueryString();

So, depending on what request.getQueryString() returns, this may be exploitable for XSS.  In other words, submitting the following: "><script>alert(1)</script>

could lead to pretty straightforward XSS, with the above Java code generating the following HTML:

<a href=""><script>alert(1)</script>">Click Me</a>
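Whatever the browser does, the server side can close this hole by HTML-encoding the value before writing it into the attribute. The following is only a minimal sketch of that idea, not the code from the assessed application: the class name `LinkBuilder`, the helper `escapeHtmlAttribute`, and the `http://other.example/app` base URL are all hypothetical stand-ins.

```java
final class LinkBuilder {
    // Replace the characters that can end an attribute value or open a tag
    // with their HTML entity equivalents.
    static String escapeHtmlAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // baseUrl is a placeholder for the other application's URL (assumption).
    // queryString is what request.getQueryString() would return; it may be null.
    static String buildLink(String baseUrl, String queryString) {
        String qs = (queryString == null) ? "" : queryString;
        return "<a href=\"" + escapeHtmlAttribute(baseUrl + "?" + qs) + "\">Click Me</a>";
    }

    public static void main(String[] args) {
        // The payload stays inside the href value instead of breaking out of it.
        System.out.println(buildLink("http://other.example/app",
                "var1=val1\"><script>alert(1)</script>"));
    }
}
```

With the encoding in place, the quote and angle brackets arrive in the page as entities, so the attribute is never closed early no matter which browser sent the request.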

Ok, before you chastise me for demonstrating basic XSS, let’s dig a bit deeper.  It turns out that the above request fails to inject JavaScript into the generated link in Chrome, Safari, and Firefox.  However, it does work in Internet Explorer.  Chrome, Safari, and Firefox all URL encode the ", >, and < characters, while IE encodes none of them.  For example, Safari encodes the request as follows:

GET /test.jsp?var1=val1%22%3E%3Cscript%3Ealert(1)%3C/script%3E HTTP/1.1

while IE sends the following:

GET /test.jsp?var1=val1"><script>alert(1)</script>

As can be seen, IE does not URL encode any of the characters, while Safari (et al.) URL encode values that might be misinterpreted in another context.  My memory is not what it once was, but I feel like I had run into this in the past and just passed it off as an IEism.  Every browser seems to have its own set of strange edge cases that only work in that particular browser (whether we are talking about security or functionality).  I would have probably just blown this off as another IEism, forgotten about it, and remembered it months down the line when I ran into it again in another application.  Instead, not a week went by before a coworker emailed me with the exact same observation.
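The browser difference matters because the servlet container does not decode the value returned by getQueryString(), so the server concatenates exactly what was sent on the wire. The sketch below reproduces both cases under stated assumptions: `QueryEncodingDemo`, `percentEncodeQuoteAndAngles`, and the `BASE` URL are hypothetical, and the encoder handles only the three characters discussed here rather than any browser's full encoding rules.

```java
final class QueryEncodingDemo {
    // Hypothetical stand-in for the target URL, which is not shown in the original snippet.
    static final String BASE = "http://other.example/app";

    // Percent-encode only ", <, and > (an illustration, not a full URL encoder).
    static String percentEncodeQuoteAndAngles(String rawQuery) {
        StringBuilder out = new StringBuilder();
        for (char c : rawQuery.toCharArray()) {
            if (c == '"' || c == '<' || c == '>') {
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Same shape as the vulnerable code: the query string is concatenated as received.
    static String vulnerableLink(String queryString) {
        return "<a href=\"" + BASE + "?" + queryString + "\">Click Me</a>";
    }

    public static void main(String[] args) {
        String raw = "var1=val1\"><script>alert(1)</script>";
        // What the server builds when the browser sends the characters unencoded (IE).
        System.out.println(vulnerableLink(raw));
        // What the server builds when the browser percent-encodes them first (Safari et al.).
        System.out.println(vulnerableLink(percentEncodeQuoteAndAngles(raw)));
    }
}
```

In the first case the quote closes the href and the script tag lands in the page; in the second the percent-encoded text stays inside the attribute value.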

In the email he described, almost verbatim, the application he was assessing and noted how IE seemed to be aberrant when it came to this edge case.  At this point I started thinking that maybe I had not run into this in the past, and maybe it was not just my memory failing me.  Maybe IE had changed its encoding rules and this behavior was introduced in a more recent version of IE.  A quick round of searching turned up a blog post from Imperva.  It turns out Imperva had submitted this exact issue to MSFT about a week before my coworker and I noticed the issue.  At this point I was totally confused.  Surely my coworker, the engineer at Imperva, and I all noticing the issue within the same month could not be a coincidence.

After reading through the Imperva blog post I brought up my Windows XP VM to test this on every version of IE since 6 to see when this oversight was introduced.  Well, to cut things short, I found identical behavior in IE 6, 7, 8, 9, and 10 (I had to test 10 in the Windows 8 public release).  I did not dig into prior releases of other browsers to see when they implemented the URL encoding of ", >, and < (among other characters), but if my poor memory serves me, I feel like this has been a common practice for quite a while.  Quoting from the Imperva blog post, Microsoft’s response to the observed behavior was the following:

Thank you for writing to us.  The behavior you are describing is something that we are aware of and are evaluating for changes in future versions of IE, however it’s not something that we consider to be a security vulnerability that will be addressed in a security update.

So, is MSFT just being stubborn and knowingly violating the RFC?  Well, as far as I can tell, no.  I believe there are some other drafts, but the most current finalized RFC dealing with URIs is RFC 3986.  In particular, Section 2 talks about characters, reserved characters, unreserved characters, URL encoding, etc.  One would think that if you are going to use the terms “reserved characters” and “unreserved characters”, these would divide the world of all characters into the “reserved character” set and the “unreserved character” set.  That only makes sense, right?  Well, here is a list of the reserved characters:

":" / "/" / "?" / "#" / "[" / "]" / "@" / "!" / "$" / "&" / "'" 
/ "(" / ")" / "*" / "+" / "," / ";" / "="

and the unreserved characters:

ALPHA / DIGIT / "-" / "." / "_" / "~"

Conspicuously absent from both lists are the ", >, and < characters (as well as others).  This is strange for a number of reasons, one of which has to do with the fact that this exact issue is mentioned in RFC 1738 (RFC 3986 updated RFC 1738).  In RFC 1738 there is a section that explicitly mentions “Unsafe” characters.  This section, in part, states:

The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems.
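To make the gap concrete, here is a small sketch that checks a character against the two RFC 3986 sets listed above. The class and method names (`Rfc3986Chars`, `classify`) are my own and purely illustrative; the character sets are copied from the reserved and unreserved lists quoted earlier.

```java
final class Rfc3986Chars {
    // Reserved set as quoted above: gen-delims followed by sub-delims.
    static final String RESERVED = ":/?#[]@!$&'()*+,;=";
    // Punctuation in the unreserved set; ALPHA and DIGIT are checked by range below.
    static final String UNRESERVED_PUNCT = "-._~";

    static String classify(char c) {
        if (RESERVED.indexOf(c) >= 0) {
            return "reserved";
        }
        boolean alpha = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        boolean digit = (c >= '0' && c <= '9');
        if (alpha || digit || UNRESERVED_PUNCT.indexOf(c) >= 0) {
            return "unreserved";
        }
        // Characters that appear in neither list.
        return "neither";
    }

    public static void main(String[] args) {
        for (char c : new char[] {'"', '<', '>', '?', 'a', '~'}) {
            System.out.println(c + " -> " + classify(c));
        }
    }
}
```

The three characters that matter for this XSS case fall through to the last branch: they are in neither set.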

RFC 3986 doesn’t mention unsafe characters anywhere (correct me if I am wrong; it is easy to miss a line in an RFC).  It would appear that IE is actually not violating the RFC.  Instead, they just happened to have implemented their URL encoding scheme in a way that is in line with the “reserved characters” and “unreserved characters” definitions, but different from everyone else.  My guess is that IE has left this in place for the same reason MSFT often leaves things in place: backward compatibility.  I see no reason why they would not prefer the more secure behavior if they were confident it would not break existing applications.  Moreover, I would imagine that they would have gladly implemented it the same way as everyone else if the RFC had actually unambiguously defined the expected behavior.

So, in the end, the fact that three different security engineers all noted the same odd IE behavior in the same month was actually just coincidence.  I guess I can only dream of a day when some sort of formal verification system will free us from RFC ambiguity.  But, for now, we can probably bank on these ambiguities continuing to introduce strange edge case security issues.
