Category Archives: web scanner evaluation

Bringin’ on the Heartbreak

As web applications stretch beyond borders they need to adopt strategies to work in multiple languages. Without the right tools or adequate knowledge of Unicode, a programmer will quickly descend into hysteria. The explanations in this post won’t leave you in euphoria, but, like the previous one, they should adrenalize your efforts to understand character sets.

Previously, we touched on the relative simplicity of music vs. language (in terms of a focus on character sets). Once you know the pattern for the Am pentatonic on a guitar, you can move the fingering around the neck to transpose it to any other key. From there, it’s just a matter of finding a drummer who can count to four and you’re on your way to a band.

Unicode has its own patterns. We’ll eventually get around to discussing those. But first, let’s examine how browsers deal with the narrow set of HTML characters and the wide possibilities of text characters.

It takes an English band to make some great American rock & roll. However, there’s not much to show off between character sets for lyrics from England and America.* Instead, we’ll turn to continental Europe. I was going to choose Lordi as an example, but not nearly as many Finns visit this site as Ukrainians. Plus, neither the words Lordi nor The Arockalypse require “interesting” characters with which to demonstrate encodings (sorry, ISO-8859-1, you’re just boring).

Okay. Consider the following HTML (and, hey, look at that doctype, this is official HTML5). We care about the two links. One has so-called “non-English” characters in the query string, the other has them in the path:

<!DOCTYPE html>
<meta http-equiv="content-type" content="text/html; charset=utf-8" >
<a href="/music?band=ВопліВідоплясова">query string</a>
<a href="/bands/ВопліВідоплясова/?songs=all">path</a>

The charset is explicitly set to UTF-8. Check out what the web server’s logs record for a click on each link:

GET /music?band=%D0%92%D0%BE%D0%BF%D0%BB%D1%96%D0%92%D1%96%D0%B4%D0%BE%D0%BF%D0%BB%D1%8F%D1%81%D0%BE%D0%B2%D0%B0
GET /bands/%D0%92%D0%BE%D0%BF%D0%BB%D1%96%D0%92%D1%96%D0%B4%D0%BE%D0%BF%D0%BB%D1%8F%D1%81%D0%BE%D0%B2%D0%B0/?songs=all

Next, we’ll convert the page using the handy iconv tool:

iconv -f UTF-8 -t KOI8-U utf8.html > koi8u.html

Another tool you should be familiar with is xxd. (No time to cover it here; it’s easy to figure out.) We use it to examine the byte sequences of the converted query string values.
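If xxd isn’t handy, the same byte sequences can be checked from Python. A quick sketch (the codec name koi8_u follows Python’s spelling; urllib’s quote does the percent-encoding a browser would):

```python
from urllib.parse import quote

band = "ВопліВідоплясова"

# UTF-8 uses two bytes per Cyrillic letter...
print(quote(band.encode("utf-8")))
# %D0%92%D0%BE%D0%BF%D0%BB%D1%96%D0%92%D1%96%D0%B4%D0%BE%D0%BF%D0%BB%D1%8F%D1%81%D0%BE%D0%B2%D0%B0

# ...while KOI8-U uses one
print(quote(band.encode("koi8_u")))
# %F7%CF%D0%CC%A6%F7%A6%C4%CF%D0%CC%D1%D3%CF%D7%C1
```

Note that the two outputs match the server log entries above and below: same characters, entirely different bytes.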

The page is converted to KOI8-U, but the <meta> tag still says it’s in UTF-8. This leads to bad bytes if a browser requests either link: the KOI8-U bytes aren’t valid UTF-8, so the browser substitutes the replacement character U+FFFD, which shows up in the logs as runs of %EF%BF%BD.
If we fix the <meta> tag to set the encoding as KOI8-U, then things improve. However, notice the difference between encodings in the query string vs. those in the path:

GET /music?band=%F7%CF%D0%CC%A6%F7%A6%C4%CF%D0%CC%D1%D3%CF%D7%C1
GET /bands/%D0%92%D0%BE%D0%BF%D0%BB%D1%96%D0%92%D1%96%D0%B4%D0%BE%D0%BF%D0%BB%D1%8F%D1%81%D0%BE%D0%B2%D0%B0/?songs=all

The path becomes UTF-8, but the query string remains in its native character set. This isn’t a quirk of the encoding scheme. It’s a behavior of browsers. To emphasize the point, here’s another example web page:

<!DOCTYPE html>
<meta http-equiv="content-type" content="text/html; charset=utf-8" >
<a href="/actors?name=成龍">query string</a>
<a href="/actors/成龍/?movies=all">path</a>

When all is UTF-8, the web logs record the bytes we expect:

GET /actors?name=%E6%88%90%E9%BE%8D
GET /actors/%E6%88%90%E9%BE%8D/?movies=all

Now, convert the encoding to GBK:

iconv -f UTF-8 -t GBK utf8.html > gbk.html

And the unchanged <meta> tag produces bad bytes in the logs:

GET /actors?name=%EF%BF%BD%EF%BF%BD%EF%BF%BD
GET /actors/%EF%BF%BD%EF%BF%BD%EF%BF%BD/?movies=all

So, we fix the charset to GBK and all is well:

GET /actors?name=%B3%C9%FD%88
GET /actors/%E6%88%90%E9%BE%8D/?movies=all
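The split behavior is easy to reproduce outside a browser. A Python sketch of what those two log lines show: the query string is percent-encoded with the page’s declared charset, while browsers convert path characters to UTF-8 regardless:

```python
from urllib.parse import quote

name = "成龍"

# Query string: percent-encoded with the page's charset (GBK here)
print(quote(name.encode("gbk")))    # %B3%C9%FD%88
# Path: browsers re-encode to UTF-8 no matter what the page declares
print(quote(name.encode("utf-8")))  # %E6%88%90%E9%BE%8D
```

Any scraper or scanner that assumes one encoding rule for the whole URL will mangle one half or the other.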

So, if you were planning to use curl (an excellent tool and about the friendliest mailing list ever) to spider a web site and regexes (pcre, another excellent piece of software) to scrape its content for links, then you’ll have to be careful about character sets once you depart the land of UTF-8. (And you’ll have completely different worries should you ever venture into the just-about-uncharted territory of U+F8D0 – U+F8FF Unicode charts.)
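A minimal sketch of that care in Python: decode the response body with the server’s declared charset before running any regexes, because matching text patterns against raw non-UTF-8 bytes falls apart (the body and charset below are stand-ins for what an HTTP client would hand you):

```python
import re

# Assume the body arrived as KOI8-U bytes, with a matching
# charset declared in the Content-Type header.
body = "ВопліВідоплясова".encode("koi8_u")
charset = "koi8-u"

# Decode first, then scrape; \w happily matches Cyrillic on str objects.
links = re.findall(r"Воплі\w+", body.decode(charset))
print(links)  # ['ВопліВідоплясова']
```

The same pattern applied to the undecoded bytes would find nothing, silently dropping links from the crawl.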

Rather abrupt ending here. Need to wrap up because I’m packing my bags for the Misty Mountains. Bye.


* I think the fabled English reserve also creates better innuendo. Led Zeppelin has “The Lemon Song”, although Def Leppard weren’t exactly subtle with their ultimate “Pour Some Sugar on Me”. Poison simply said, “I Want Action”. Down under is a different story: AC/DC were pretty straightforward with “You Shook Me All Night Long”. Zep aside (for obvious reasons), the ’80s were apparently big on hair, not ideas.


Music has a universal appeal uninhibited by language. A metal head in Istanbul, Tokyo, or Oslo instinctively knows the deep power chords of Black Sabbath — it takes maybe two beats to recognize a classic like “N.I.B.” or “Paranoid.” The same guitars that screamed the tapping mastery of Van Halen or led to the spandex hair excess of ’80s metal also served The Beatles, Pink Floyd, and Eric Clapton. And before them was Chuck Berry, laying the groundwork with the power chords of “Roll Over Beethoven”.

And all this with six strings and five notes: E – A – D – G – B – E. Awesome.

And then there’s the writing on the web. Thousands of symbols, 8 bits, 16 bits, 32 bits. With ASCII, or US-ASCII as RFC 2616 puts it. Or rather ISO-8859-1. But UTF-8 is easier because it’s like an extended ASCII. On the other hand if you’re dealing with GB2312 then UTF-8 isn’t necessarily for you. Of course, in that case you should really be using GBK instead of GB2312. Or was it supposed to be GB18030? I can’t remember.

What a wonderful world of character encodings can be found on the web. And confusion. Our metal head friends like their own genre of müzik / 音楽 / musikk. One word, three languages, and, in this example, one encoding: UTF-8. Programmers need to know programming languages, but they don’t need to know different spoken languages in order to work them into their web sites correctly and securely. (And based on email lists and flame wars I’ve seen, rudimentary knowledge in one spoken language isn’t a prerequisite for some coders.)

You don’t need to speak the language in order to work with its characters, words, and sentences. You just need Unicode. As some random dude (not really) put it, “The W3C was founded to develop common protocols to lead the evolution of the World Wide Web. The path W3C follows to making text on the Web truly global is Unicode. Unicode is fundamental to the work of the W3C; it is a component of W3C Specifications, from the early days of HTML, to the growing XML Family of specifications and beyond.”

Unicode has its learning curve. With Normalization Forms. Characters. Code Units. Glyphs. Collation. And so on. The gist of Unicode is that it’s a universal coding scheme to represent all that’s to come of the characters used for written language; hopefully never to be eclipsed.

The security problems of Unicode stem from the conversion from one character set to another. When home-town fans of 少年ナイフ want to praise their heroes in a site’s comment section, they’ll do so in Japanese. Yet behind the scenes, the browser, web site, or operating systems involved might be handling the characters in UTF-8, Shift-JIS, or EUC.

The conversion of character sets introduces the chance for mistakes and broken assumptions. The number of bytes might change, leading to a buffer overflow or underflow. The string may no longer be the C-friendly NULL-terminated array. Unsupported characters cause errors, which might lead an XSS filter to skip over a script tag. A lot of these concerns have been documented. Some have even been demonstrated as exploitable vulns in the real world (as opposed to conceptual problems that run rampant through security conferences, but never see a decent hack).
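The byte-count hazard is easy to demonstrate. A quick Python sketch (codec names per Python’s standard library):

```python
band = "少年ナイフ"

print(len(band))                      # 5 characters
print(len(band.encode("utf-8")))      # 15 bytes: three per character
print(len(band.encode("shift_jis")))  # 10 bytes: two per character
print(len(band.encode("euc_jp")))     # 10 bytes here, too
```

Code that sizes a buffer for one encoding and then fills it from another is exactly where the overflow comes from.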

Unicode got more popular scrutiny when it was proposed for Internationalized Domain Names (IDN). Researchers warned of “homoglyph” attacks: situations where phishers or malware authors craft URLs that use alternate characters to spoof popular sites. The first attacks didn’t need IDNs, relying on trivial tricks like replacing the letter l with the number 1. However, IDNs provided more sophistication by allowing domains with harder-to-detect changes, such as swapping in a visually identical character from another script.

What hasn’t been well documented (at least not anywhere I could find) is the range of support for character set encodings in security tools. The primary language of web security seems to be English (at least based on the popular conferences and books). But useful tools come from all over. Wivet originated from Türkiye (here’s some more UTF-8: Web Güvenlik Topluluğu), but it goes easy on scanners in terms of character set support. Sqlmap and w3af support Unicode. So, maybe this is a non-issue for modern tools.

In any case, it never hurts to have more “how to hack” tools in non-English languages or test suites to verify that the latest XSS finder, SQL injector, or web tool can deal with sites that aren’t friendly enough to serve content as UTF-8. Or you could help out with documentation projects like the OWASP Development Guide. Don’t be afraid to care. It would be disastrous if an anti-virus, malware detector, WAF, or scanner was tripped up by encoding issues.

Sometimes translation is really easy. The phrase for “heavy metal” in French is “heavy metal” — although you’d be correct to use “Métal Hurlant” if you were talking about the movie. Character conversion can be easy, too. As long as you stick with a single representation. Once you start to dabble in the Unicode conversions from UTF-8, UTF-16, UTF-32, and beyond you’ll be well-served by keeping up to date on encoding concerns and having tools that spare you the brain damage of implementing everything from scratch.
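To make the single-representation point concrete, here’s one word in three Unicode encoding forms (a Python sketch; the -le variants are used so the byte order mark doesn’t pad the counts):

```python
word = "müzik"

print(len(word.encode("utf-8")))      # 6 bytes: ü takes two
print(len(word.encode("utf-16-le")))  # 10 bytes: two per character
print(len(word.encode("utf-32-le")))  # 20 bytes: four per character
```

Same word, same code points, three different byte streams. Pick one representation and stick with it until you have a tool that handles the conversions for you.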

p.s. Sorry, Canada, looks like I’ve hit my word count and neglected to mention Rush. Maybe next year.

p.p.s. And eventually I’ll work in a reference to all 10 tracks of DSotM in a single post.

The Death of Web Scanners

I come here not to bury web application scanners, but to praise them.1 And then bury them a bit. Perhaps just up to the neck. On the beach. At low tide.

Web application scanning has historically been challenging, even in the early days of so-called simple web sites with low complexity and little JavaScript. Such simple sites would have static HTML, gobs of links, and icons proclaiming “Best viewed in so-and-so screen size” or “Optimized for so-and-so browser”. Even though links were relatively easy to extract, scanners would run into deeply nested links, infinitely recursive links, or redirect loops.

Setting aside web security scanning, the necessary feat of QA testing web sites has been variably difficult, non-existent, or so manually intensive that it lags the pace of development. The challenges of automation aren’t specific to security, but security testing does impose some unique requirements and diverges in distinct ways from QA testing.

Reading through a complete list of automation challenges would exceed the patience of many more than the 140-character crowd. Consider these happy few:2

  • Efficiently crawl sites with thousands or tens of thousands (or more!) links
  • Populate form fields correctly in order to obtain new links and exercise workflows
  • Interact with the site as a browser does, in other words the ability to deal with JavaScript and DOM manipulation
  • Identify multi-step workflows and non-idempotent requests

I hesitate to arrange these in significance or difficulty, or to explain which have reasonable solutions. Instead, I want to focus on the implications that web application design and technology trends have for automation.

Web sites are progressing towards more dynamic, interactive UIs nearly indistinguishable from the apps of the neolithic, disconnected desktop age and strongly divorced from the click-reload pages of the ’90s. That web sites have nifty UIs isn’t news if the first ones that come to mind fall into the Alexa Top 20 or so. Those are the easy examples of early pioneers of this trend. Once you look lower on the Alexa list or drop off it entirely to consider large organizations’ internal networks you’ll find sites still designed for IE6 or that have never heard of the XHR object.

The trend towards dynamic UIs won’t affect legacy sites, which is good news for scanners that choose to remain in the past. Modern sites, however, rely heavily on JavaScript (and CSS and the DOM) to attain that modern look. Such browser-heavy apps will only increase as HTML5 creeps closer to being officially standardized.3 Consequently, scanners must be able to handle complex JavaScript libraries rather than employ regex-based fakery.

Using pattern matching to statically parse the content from these dynamic sites may work to a degree, but such a technique misses certain classes of vulnerabilities. DOM-based XSS is one example. While in theory it’s possible to create regex patterns to search for particular manifestations of this problem, regexes are too limited to be of real use, or produce so much noise that they threaten to increase the workload of manually testing a site. HTML5’s Cross-Origin Resource Sharing (CORS) is another example where analyzing JavaScript and the XHR object requires a browser. Count the Web Storage API as yet another example.
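A small illustration of why regexes fall short here. The pattern below is a naive, hypothetical check for location data flowing into document.write(); one level of variable indirection defeats it, and only executing the JavaScript (in a browser or emulated DOM) follows the data flow:

```python
import re

# Hypothetical sink pattern: location data passed to document.write()
sink = re.compile(r"document\.write\([^)]*location")

direct = "document.write(location.hash)"
indirect = "var h = location.hash; document.write(h)"

print(bool(sink.search(direct)))    # True
print(bool(sink.search(indirect)))  # False: same bug, invisible to the regex
```

Tightening the pattern to catch the indirect form means matching every way JavaScript can move a value around, which is no longer a regex problem.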

Regardless of your perspective on the pace of web security’s evolution, web application technologies have been changing quickly. It’s unlikely that the SQL database will disappear from the traditional web site stack, but the NoSQL movement will require a new set of vuln tests largely unrelated to traditional SQL injection. There are no publicly known examples of “NoSQL injection” attacks, nor even clear ideas on what an attack of that kind would look like. Yet that’s no reason to avoid applying security theory to the practice of testing NoSQL-backed web sites.

Single sign-on (SSO) solutions should eventually become more widely adopted. They alleviate the burden of managing and securing passwords, which is evidently difficult to do right. (Compromised credentials from database attacks number in the millions.) The distrust6 of early solutions like Microsoft Passport (and its MS Wallet) and the Liberty Alliance has been forgotten in light of Facebook Connect, Google Account, OpenID, Twitter, and Yahoo! ID. (There’s possibly an SSO service for each letter of the alphabet even though they mostly use OAuth.) Privacy issues haven’t been forgotten; they’ve just been swept aside in the face of millions of users with accounts on one or more of these sites.

By this point, you might have forgotten that we were originally discussing automated web scanning. The implication of single sign-on is that scanners must be able to support it. Once again this boils down to robust browser emulation — or a lot of customization to different sites’ use of the SSO APIs.

Browser plugins have always been a hindrance to scanners. Not only do ActiveX-based plugins and Flash have their own inherent security risks, but the content generated from them is either rarely parseable or poorly parsed by tools other than the plugin itself. Many web developers and users would rejoice if HTML5 heralded the demise of plugins. Unfortunately, efforts like Google’s Native Client promise to bring back the era of write once, run oncewhere (in Chrome) rather than Java’s write once, run anywhere. To hijack the title of an excellent book (and graphic novel), it would be nice to relegate plugins to write nonce, run neverwhere. Until sites stick universally to HTML and JavaScript, scanners will need to handle plugin-based content that drives a site’s navigation.

Forward-looking site developers aren’t satisfied with HTTP. Now that they’ve been getting a taste of HTML5’s features, they’re turning their sights to the deficiencies in HTTP and HTTPS. This means scanners should start thinking about things like SPDY, designed for network performance, and HSTS, designed for improved transport security. Few sites have adopted these, but considering those few include behemoths like Google and PayPal, expect others to follow.

The acronym assault hasn’t yet finished. REST is the new SOAP (at least I think so, I’m not sure if SOAP ever caught on). I’ve noted elsewhere the security benefits of a well-defined separation between the server-side API and client-side HTML. As a reminder, a server-side API call that performs a single action (e.g. get list of foo) can be easier to examine and secure as opposed to a function that gets a list, rewrites the page’s HTML, and has to update other unrelated content.

In one way, the move towards well-defined APIs makes a scanner’s job easier. If it’s possible to fully enumerate a site’s functions and their associated parameters, then the scanner doesn’t necessarily have to crawl thousands of different links trying to figure out where the important, unique points of functionality are — it can pull this information from a documented API.

Alas, a raw list of API calls emphasizes a problem scanners already have: state context. You and I can review a list of functions, then come up with a series of security tests. For example, calling events.create with an XSS payload followed by a call to events.get to see if the payload was filtered, or calling admin.banUsers from a non-admin account to see if authorization controls are correctly enforced. A dull scanner, on the other hand, might make calls in a poorly chosen order. In a somewhat contrived example, the scanner might call events.get followed by auth.expireSession (which logs out the user). This causes any subsequent API call to fail (at least it should) if the call requires an authenticated user.
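A scanner that at least knows which calls destroy state can schedule them last. A toy sketch of that ordering, with the API names borrowed from the example above (they, and the set of destructive calls, are hypothetical; a real scanner needs per-site knowledge):

```python
# Hypothetical: calls known to log out, delete, or otherwise destroy state
DESTRUCTIVE = {"auth.expireSession", "admin.banUsers"}

def order_calls(calls):
    """Run state-preserving calls first, state-destroying calls last.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(calls, key=lambda c: c in DESTRUCTIVE)

calls = ["events.get", "auth.expireSession", "events.create"]
print(order_calls(calls))
# ['events.get', 'events.create', 'auth.expireSession']
```

This is the crudest possible state model — it doesn’t capture dependencies like “create before get” — but even it avoids the logged-out-scanner failure mode described above.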

Before we finish, permit a brief aside to address the inevitable concern trolls. There’s a Don Quixote contingent fighting straw man arguments that automation is useless, unusable, disusable, will never replace manual testing, and so on. This article doesn’t aim to engage these distractions.12 I can control quote-mining no more than I can raise the sun. This paragraph serves as a warning about taking statements out of context or twisting their intent. To be clear: I think a degree of automation is important, accurate, and scalable. And possible. The goal is to accompany technology trends, not trail them.

HTML and HTTP may not have changed very much in the past decade, but the way web sites cobble them together surely has. As web apps grow into more complex UIs, scanners must more accurately emulate (or outright embed) a browser to make sure they’re not missing swaths of functionality. As APIs become more common, scanners must dive into stateful modeling of a site. And as new web specifications and protocols become widely adopted, scanners must avoid the laziness of dealing solely with HTTP and HTML4.

It’s twilight for the era of simple scripting and unsophisticated scanners. The coming tide of HTML5, new plugins, protocols, and complexity make for a long night. With luck, some scanners will survive until dawn.


1 A uniambic inversion of Mark Antony’s speech.

2 An allusion to King Branagh’s speech to his bowmen, incidentally preventing St. Crispian’s descent into obscurity.

3 Regardless of its draft status, all modern browsers support at least a few features of HTML5.



6 s/Microsoft/Google/g and fast-forward a decade. Plus ça change, eh?






12 Another of Shakespeare’s Romans, Menenius, provides apt words, “…more of your conversation would infect my brain…”

Click depth is a useless scanner option

When web site owners want to measure how their visitors get from point A (say, the home page) to point B (such as finalizing a purchase), they might use a metric called click depth or link depth. This represents the number of clicks required to get from link A to link B. Sites strive to minimize this value so users may more efficiently perform actions without becoming distracted or frustrated and departing for other venues. The depth of a link also implies that popular or important pages should have lower values (i.e. “closer” to the home page, easier to find) than less important pages. This train of thought might make sense superficially, but this reasoning derails quickly for web scanners.

There’s merit to designing a web application, or any human interface, to have a high degree of usability. Minimizing the steps necessary to complete an action helps achieve this. Plus, your users will appreciate good design. But web application scanners are not your users; they don’t visit your web site and follow workflows the way humans do.

Click depth for web scanning is useless. Pointless. It’s a long string of synonyms for pointless when used as a configuration option, doubly so when scanning web sites that use a JavaScript-driven UI or implement simple Search Engine Optimization (SEO) techniques.

There’s a long list of excuses why someone might want to rely on click depth as an option for web scanning: Links on the home page are more likely to be attacked, vulnerabilities with low click depth are easier to find, opportunistic attackers are the biggest threat, scans run faster. Basically, these arguments directly correlate link popularity with risk. The simple rejoinder is that all links have a depth of 1 in the face of automation. An attacker who invests effort into scripts that search for vulnerable links doesn’t care how deep a link is, just that the script finds one.

Whether the correlation of link popularity and risk rings true or not, having the scanner calculate the click depth is fundamentally incorrect. Visitors’ behavior influences a link’s popularity, not the calculation of a scanner. A superior approach would be to use analytics data to identify popular links, then feed that list to the scanner.

Another reason for click depth’s inutility is the positive trend in web application design to create richer, more interactive interfaces in the browser that use lighter-weight data requests back to the web site. This is reflected in the explosion of Ext JS, Prototype, YUI, and other JavaScript libraries designed to provide powerful UI features along with concise request/response handling using JSON and asynchronous requests. This also has the effect of flattening web applications in terms of the number of clicks required to accomplish tasks. Even more significantly it has the effect of separating links into two conceptual buckets: one for links that show up in the browser bar and another for “behind the scenes” links used for API requests. Both link buckets are important to security testing, but the idea of click depth among them has little meaning.

SEO techniques can also flatten a page’s apparent link depth. A technique common to e-commerce sites is to create a long list of links on the home page that reach deep into the site’s product catalog. It’s not uncommon to see several dozen links at the bottom of a home page that point to different product pages ad nauseam. (The purpose of which is to make sure search engines find all of the site’s products so users looking for a particular shade of Unicorn-skin rugs will find that site over all others.) This sets an artificially low depth for many, many pages. A human is unlikely to care about the slew of links, but a scanner won’t know the difference.

We’ve reached three reasons so far: Automated scanning gives every link an effective click depth of 1, browser-heavy sites have flat APIs, and SEO techniques further reduce apparent link depth. In spite of this, click depth appeared at some point in scanner designs, an OWASP project makes it a point of evaluation (among several poor criteria), and users often ask for it.

One understandable motivation behind click depth is trying to obtain some degree of depth or breadth in the coverage of a web site’s functionality. Notice that coverage of a site’s functionality differs from coverage of the site’s links. Sites might contain vast numbers of links that all exercise the same, small number of code paths. It’s these code paths in the web application where vulnerabilities appear. This sense of click depth actually intends to convey coverage. It’s highly desirable to have a scanner that avoids making dozens of redundant requests, following recursive links, or getting stuck in redirect loops. A good scanner handles such situations automatically rather than burdening the user with a slew of configuration options that may not even have a bearing on the problem.

Login forms

Designing a web application scanner is easy. A good design requires a few sentences; a great design might need two paragraphs or so. It’s easy to find messages on e-mail lists that describe the One True Way to scan a web site.

Implementing a scanner is hard. The core of a web vulnerability scanner performs two functions: find a link, test that link. The task of finding links falls to a crawling engine. The crawler must be fundamentally strong, otherwise links will be missed and a missed link is an untested link. Untested links lead to holes in the site coverage which raise uncertainty in the state of the site’s security. It’s rarely necessary to hit every link of a web site in order to adequately scan it. Security testing requires comprehensive coverage of the site’s functionality, which is different from covering every single link. A SQL injection vulnerability in the thread ID of a forum can be found by crawling a few sample discussion threads. It’s not necessary to fully enumerate 100,000 threads about nerfing warlocks or debating Mal vs. Kirk.
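One way to get functionality coverage without enumerating all 100,000 threads is to collapse links into templates and test a few samples of each. A rough sketch, treating runs of digits as IDs (real crawlers need subtler normalization than this):

```python
import re

def template(path):
    # /forum/thread/8231 and /forum/thread/9 exercise the same code path
    return re.sub(r"\d+", "{id}", path)

def sample(links, per_template=3):
    """Keep only a few representative links per URL template."""
    buckets = {}
    for link in links:
        group = buckets.setdefault(template(link), [])
        if len(group) < per_template:
            group.append(link)
    return [l for group in buckets.values() for l in group]

links = ["/forum/thread/%d" % i for i in range(100000)]
print(len(sample(links)))  # 3 requests instead of 100,000
```

The SQL injection in the thread ID shows up just as readily in three sampled threads as in all hundred thousand.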

In addition to crawling strategies, scanners must also be able to crawl a site as an authenticated user. Maintaining an authenticated state requires coordinating several pieces of information (tracking the session cookie, avoiding logout links). But first the scanner must find and submit the login form.

Simple login forms have a text field, password field, and submit button. The HTML standard provides the markup to create these forms. The standard only defines syntax, not usage. This gives web developers leeway to abuse HTML through ignorance, inefficiency, and what can only be termed outright malice.

Consider the login form created by Sun’s OpenSSO Enterprise 8.0. The HTML roughly breaks down to the following:

<script language="JavaScript">
var defaultBtn = 'Submit';
var elmCount = 0;
/** submit form with default command button */
function defaultSubmit() {
<script language="javascript">
<form name="frm1" action="blank" onSubmit="defaultSubmit(); return false;" method="post">
User Name: <input type="text" name="IDToken1" id="IDToken1" value="" class="TxtFld">
<form name="frm2" action="blank" onSubmit="defaultSubmit(); return false;" method="post">
Password: <input type="password" name="IDToken2" id="IDToken2" value="" class="TxtFld">
<form name="Login" action="/login/UI/Login?AuthCookie=..."  method="post">
<script language="javascript">
if (elmCount != null) {
  for (var i = 0; i < elmCount; i++) {
    document.write("<input name=\"IDToken" + i + "\" type=\"hidden\">");
  document.write("<input name=\"IDButton" + "\" type=\"hidden\">");
<input type="hidden" name="goto" value="aHR0cHM6Ly93d3cuZGVhZGxpZXN0d2ViYXR0YWNrcy5jb20vc2VjcmV0L2xpbmsvaW4vYmFzZS82NC8=">
<input type="hidden" name="SunQueryParamsString" value="">
<input type="hidden" name="encoded" value="true">
<input type="hidden" name="gx_charset" value="UTF-8">

So we have a single page with three forms, two of which have no purpose other than to display a form field, and a final one with JavaScript whose sole purpose is to copy values from the other two forms into its own hidden fields. There’s also a single <script> tag dedicated to nothing more than incrementing an element counter. Not only might this offend the sensibilities of JavaScript developers who appreciate more programmatic approaches as found in jQuery or Prototype.JS, it also causes headaches for web security scanners. The two forms’ actions are “blank” and the onSubmit events always return false. That should at least inform the scanner that they shouldn’t be submitted directly – but that’s an assumption that might prove false, since the site may go into an error state if it receives an incomplete set of form fields.

Even uglier login form patterns exist in the wild. In some cases the login form is wrapped within its own HTML element:

...other content
...other content

Some forms use unnamed input fields and programmatically enumerate them via JavaScript upon submission:

Username: <input type="text" value="">
Password: <input type="password" value="">

Then there’s the doPostBack function in .NET sites along with their penchant for multiple submit buttons (e.g. one for authentication, another for a search). Now the scanner has to identify the salient fields for authentication and hit the correct submit button; it’s no good to fill out the username and password only to submit the search button.
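Heuristics help here. A toy sketch of one: among a page’s forms, prefer the one with exactly one password field, and within it, a submit button whose name hints at logging in. The form structures below are hypothetical stand-ins for whatever the crawler’s HTML parser produces:

```python
LOGIN_HINTS = ("login", "logon", "signin")

def find_login_form(forms):
    # Favor forms containing exactly one password field
    for form in forms:
        passwords = [f for f in form["fields"] if f["type"] == "password"]
        if len(passwords) == 1:
            return form
    return None

def pick_submit(form):
    # Prefer a submit button whose name suggests authentication
    buttons = [f for f in form["fields"] if f["type"] == "submit"]
    for button in buttons:
        if any(h in button.get("name", "").lower() for h in LOGIN_HINTS):
            return button
    return buttons[0] if buttons else None

forms = [
    {"name": "search", "fields": [{"type": "text", "name": "q"},
                                  {"type": "submit", "name": "btnSearch"}]},
    {"name": "auth",   "fields": [{"type": "text", "name": "IDToken1"},
                                  {"type": "password", "name": "IDToken2"},
                                  {"type": "submit", "name": "btnLogin"}]},
]
login = find_login_form(forms)
print(login["name"], pick_submit(login)["name"])  # auth btnLogin
```

Every clause of a heuristic like this fails against some site in the wild (unnamed fields, three decoy forms, doPostBack), which is exactly why generic authentication remains hard.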

Sure, a user could manually coax the scanner through any of these login processes, but that places an unnecessary burden on the user’s time. This is less of a problem when dealing with a single web site, but becomes overwhelming when trying to scan a dozen or even hundreds of web applications.

These types of logins also highlight the difficulty scanners have with understanding the logic of a web page, let alone the logic of a group of pages or some workflow within the site.

It’s still possible to automate the login process for these forms, but doing so requires customization at the expense of having a generic authentication mechanism. In the end, dealing with login pages often provides insight into the madness of HTML editing (it’s hard to call some of these methods programming) and the bizarre steps developers will take just to “make it work.”

Scanners should automate the crawl and test phases as much as possible. After all, it’s dangerous to tie too much of a scan’s effectiveness to the user’s knowledge of web security. It may not be every day that a web developer answers your question about the robots.txt file with, “I don’t know what that is,” but it’s a good idea to have a scanner that will be comprehensive and accurate regardless of whether the user knows the UTF-7 encoding for an angle bracket or wonders why web sites don’t just strip the alert function to prevent XSS attacks.

Ceci n’est pas une web site

Web scanner evaluations collect metrics by comparing scan results against a (typically far too small) field of test sites. One quick way to build the test field might be to collect intentionally vulnerable sites from the Web. That approach, though fast, does a disservice to the scanners and more importantly the real web applications that need to be scanned. After all, does your web application really look like WebGoat or the latest HacmeFoo? Does it even resemble an Open Source application like WordPress or phpBB?

The choice of targets highly influences results — not necessarily in terms of a scanner’s perceived accuracy or capabilities, but whether those properties will translate to a completely different site. This doesn’t imply that a scanner that fails miserably against a test site will miraculously work against your own. There should always be a baseline that establishes confidence, however meager, in a scanner’s capabilities. However, success against one of those sites doesn’t ensure equal performance against a site with completely different workflows and design patterns.

Peruse Alexa’s top 500 web sites for a moment. They differ not only in category — adult and arts, science and shopping — but in technology and design patterns. Category influences the types of workflows that might be present. In terms of interaction, a shopping site looks and works differently from a news site. A search portal works differently than an auction site. I, of course, don’t know how the adult sites work, other than they make lots of money and are often laced with malware (even tame ones). Dealing with workflows in general creates problems for web scanners.

Remember that scanners have no real understanding of a site’s purpose nor visibility into its source code. Scan reports provide identification of, not insight into, a vulnerability. For example, a cross-site scripting bug might be due to a developer who neglected the site’s coding guidelines and used an insecure print() function rather than a centralized, well-tested print_safe() version. Perhaps the site has no centralized library for displaying user-supplied data in a secure manner. The first case was a mistake in a single page, the latter points to a fundamental problem that won’t disappear after fixing one page. Identifying underlying security problems remains the purview of manual testing and analysis — along with great hourly rates.
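To make the distinction concrete, here is a hypothetical sketch of those two functions. The names render_unsafe and render_safe stand in for the print() and print_safe() of the example above; they're illustrative, not from any real codebase.

```python
import html

def render_unsafe(comment):
    # The per-page mistake: user input interpolated straight into markup.
    return "<p>%s</p>" % comment

def render_safe(comment):
    # One centralized, well-tested escaping routine used everywhere turns
    # a page-by-page judgment call into a single reviewable function.
    return "<p>%s</p>" % html.escape(comment, quote=True)

payload = '<script>alert(1)</script>'
print(render_unsafe(payload))   # markup that executes in a browser
print(render_safe(payload))     # harmless &lt;script&gt;... text
```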

Design patterns also influence navigation and workflows, whether pages are built statically, incrementally with calls to document.write(), or dynamically with event-based calls to XMLHttpRequest objects that populate the DOM. These patterns influence, for example, how forms are put together, a common one being .NET forms with VIEWSTATE fields and doPostBack() functions. Some sites are driven by a single index page that handles dozens of actions based on query parameters. Other patterns put more emphasis on JavaScript execution in the browser, whether in the style of frameworks like jQuery, Prototype, or YUI, or more tightly integrated client/server frameworks like DWR. Other technologies might affect the site’s response to invalid input or non-existent pages.
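A sketch of why these patterns matter to a crawler: Python's standard html.parser sees only static markup, so a link created inside a script block never surfaces unless the crawler actually executes JavaScript. The page below is hypothetical and purely illustrative.

```python
from html.parser import HTMLParser

PAGE = """
<a href="/about">About</a>
<script>
  // A crawler that never executes JavaScript will not see this link.
  document.write('<a href="/admin/reports">Reports</a>');
</script>
"""

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

collector = LinkCollector()
collector.feed(PAGE)
# Script content is raw data to the parser, so the dynamic link is invisible.
print(collector.links)   # ['/about']
```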

A web application scanner should be agnostic to the mishmash of acronyms and technologies that underpin a site. As long as the server communicates via HTTP and throws up (in varying senses) HTML, then the scanner should be able to look for vulnerabilities. Neither HTTP nor HTML changes in any appreciable way whether a site uses PHP-On-Dot-Rails or another language du jour. If pages render in a web browser, then it’s a web site. Such is the ideal world of scanners and unicorns.

Cracks in this world emerge when a scanner has to deal with pages written by programmers whose experience ranges from burning toast to creating JavaScript singleton prototypes. Commercial scanners have been available for a decade, in turn predated by Open Source tools like whisker. Since the universal scanner doesn’t exist, it’s necessary to create an evaluation or look at metrics that more closely match the web applications to be secured.

How well does the scanner scale? If it can deal with a site that has 1,000 links, what happens when it hits 10,000? It might melt LAN ports scanning a site one network hop away that serves pages with an average size of 1KB, only to blow up against a site with a network latency of a few hundred milliseconds and page sizes averaging a few hundred kilobytes.

Scalability also speaks to the time and resources necessary to scan large numbers of sites. This point won’t concern users who are only dealing with a single property, but some organizations have to deal with dozens, hundreds, or possibly a few thousand web properties. Manual approaches, although important for in-depth assessments, do not scale. For its part, automation needs to enhance scalability, not create new hindrances. This applies to managing scan results over time as well as managing the dozens (or more!) scans necessary. If a scanner requires a high-end desktop system to scan a single site in a few hours, then simple math tells you how long it will take to test N sites that each require M hours on average to complete. You could parallelize the scans, but buying and maintaining more systems induces additional costs for hardware, software, and maintenance.
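The simple math can be sketched in a few lines. This is a back-of-the-envelope estimate that ignores setup time and report triage; the numbers are illustrative.

```python
import math

def total_scan_hours(n_sites, hours_per_site, parallel_scanners=1):
    """Wall-clock estimate: sites are split into batches across scanners."""
    batches = math.ceil(n_sites / parallel_scanners)
    return batches * hours_per_site

# 200 sites at 4 hours each on one scanner is over a month of scanning.
print(total_scan_hours(200, 4))       # 800
# Ten parallel scanners cut the wall-clock time, not the total work
# (or the hardware and maintenance bill).
print(total_scan_hours(200, 4, 10))   # 80
```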

A test site (preferably plural) must provide areas where scanners can be evaluated for accuracy. A good field of test sites includes applications that exercise different aspects of web security scanning, especially those most relevant to the types of sites to be secured. Web sites require different levels of emphasis on

  • Reliance on client-side execution (e.g. lots of JavaScript)
  • Customized error handling
  • Large numbers of links, including highly redundant areas like product pages
  • Large page sizes
  • Varying degrees of server responsiveness (there’s a big difference between scanning a site supported by a load-balanced web farm and bludgeoning a Mac Mini with hundreds of requests per second)

A good web security scanner adapts to the peculiarities of its targets, from mistyped markup that browsers silently fix to complex pages rife with bizarre coding styles. HTML may be an established web standard, but few sites follow standard ways of creating and using it. Not only does this challenge web scanner developers, it complicates the process of in-depth scanner comparisons. No example site with a few dozen links will ever be representative or fully expose the pros and cons of a scanner.

¹ At this point we’ve passed the Twitterati’s mystical 140-character limit four times over. So, for those who choose TLDR as a way of life, stop reading and go enjoy something completely different.

Web Scanner Evaluation: Accuracy

This is the first in a series of essays describing suggested metrics for evaluating web application security scanners.

Accuracy measures the scanner’s ability to detect vulnerabilities. The basic function of a web scanner is to use automation to identify the same, or most of the same, vulnerabilities as a web security auditor. Rather than focus on the intricate details of specific vulnerabilities, this essay describes two major areas of evaluation for a scanner: its precision and faults.

Precise scanners produce results that not only indicate the potential for compromise of the web site, but provide actionable information that helps developers understand and address security issues.

Comprehensive tests should be measured on the different ways a vulnerability might manifest as opposed to establishing a raw count of payloads. The scope of a test is affected by several factors:
  • Alternate payloads (e.g. tests for XSS within an href attribute or the value attribute of an input element, using intrinsic events, within a JavaScript variable, or that create new elements)
  • Encoding and obfuscation (e.g. employing various character sets or encoding techniques to bypass possible filters)
  • Applicability to different technologies in a web platform (e.g. PHP superglobals, .NET VIEWSTATE)

Robust detection mechanisms correctly confirm the presence of a vulnerability. Detection techniques typically fall into one of three categories:
  • Signatures such as HTTP response code, string patterns, and reflected content (e.g. matching ODBC errors or looking for alert(1) strings)
  • Inference based on interpreting the results of a group of requests (e.g. “blind” SQL injection that affects the content of a response based on specific SQL query constructions)
  • Time-based tests measure the responsiveness of the web site to various payloads. Not only can they extend inference-based tests, but they can also indicate potential areas of Denial of Service if a payload causes the site to spend more resources processing a request than the attacker spends making it (e.g. a query that performs a complete table scan of a database)
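A minimal sketch of the time-based idea, using a local stub in place of real HTTP round trips. The SLEEP payload and the detection threshold are illustrative; a real scanner would time actual requests and repeat them to smooth out network jitter.

```python
import time

def simulated_response(query):
    # Stand-in for the target web site. The "vulnerable" page evaluates
    # the injected expression, which burns server time.
    if "SLEEP" in query:
        time.sleep(0.5)
    return "<html>ok</html>"

def timed_probe(query):
    start = time.monotonic()
    simulated_response(query)
    return time.monotonic() - start

baseline = timed_probe("id=1")
probe = timed_probe("id=1 AND SLEEP(0.5)")
# A delay that tracks the injected sleep suggests the payload executed.
print(probe - baseline > 0.3)   # True
```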

Injection vector refers to the areas to which a scanner applies security tests. The most obvious injection points are query string parameters and visible form fields. The web browser should at least be considered an untrusted source if not an adversarial one. Consequently, a web scanner should be able to run security checks via all aspects of HTTP communication, including:
  • URI parameters
  • POST data
  • Client-side headers, especially the ones more commonly interpreted by web sites, including User-Agent, the forever-misspelled Referer, and Cookie
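One way to picture fanning a single payload across those vectors (a hypothetical helper, not any real scanner's API — the URL, parameter name, and request shape are made up for illustration):

```python
PAYLOAD = "'\"><x>"

def build_probes(base_url, param, payload):
    """Return one probe per injection vector for a single test payload."""
    return [
        # URI parameter
        {"method": "GET", "url": f"{base_url}?{param}={payload}"},
        # POST data
        {"method": "POST", "url": base_url, "body": f"{param}={payload}"},
        # Client-side headers the application may echo, log, or parse
        {"method": "GET", "url": base_url,
         "headers": {"User-Agent": payload,
                     "Referer": payload,
                     "Cookie": f"session={payload}"}},
    ]

probes = build_probes("http://example.test/search", "q", PAYLOAD)
print(len(probes))   # 3
```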

Errors in a scanner’s results take away from the time-saving gains of automation by requiring users to dig into non-existent vulnerabilities or to spend too much time repeating the scanner’s tests in order to satisfy themselves that certain vulnerabilities do not exist.

False positives indicate insufficient analysis of a potential vulnerability. The cause of a false positive can be hard to discern without intimate knowledge of the scanner’s internals, but it often falls into one of these categories:
  • Misdiagnosis of generic error page or landing page
  • Poor test implementation that misinterprets correlated events to infer cause from effect (e.g. changing a profile page’s parameter value from Mike to Mary to view another user’s public information is not a case of account impersonation – the web site intentionally displays the content)
  • Sole reliance on inadequate test signature to claim the vulnerability exists (e.g. a poor regex or stating an HTTP 200 response code for ../../../../etc/passwd indicates the password file is accessible)
  • Web application goes into error state due to load (e.g. database error occurs because the server has become overloaded by scan traffic, not because a double quote character was injected into a parameter)
  • Lack of security impact (e.g. an unauthenticated, anonymous search form is vulnerable to CSRF – search engines like Google, Yahoo!, and Bing are technically vulnerable but the security relevance is questionable)
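The directory-traversal item in the list above can be sketched like this: an HTTP 200 alone is a weak signature, since many sites answer every request with a friendly landing page. Matching the characteristic shape of /etc/passwd content is a stronger (though still fallible) check.

```python
import re

# A passwd entry for root has a recognizable structure: "root", any
# password field, then uid 0 and gid 0.
PASSWD_LINE = re.compile(r"^root:[^:]*:0:0:", re.MULTILINE)

def traversal_confirmed(status_code, body):
    # Require both a successful response AND content that actually
    # looks like the file, not just a 200 for ../../../../etc/passwd.
    return status_code == 200 and bool(PASSWD_LINE.search(body))

print(traversal_confirmed(200, "<html>Welcome back!</html>"))       # False
print(traversal_confirmed(200, "root:x:0:0:root:/root:/bin/bash"))  # True
```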

The effort expended to invalidate an erroneous vulnerability wastes time better spent investigating and verifying actual vulnerabilities. False positives also reduce trust in the scanner.

False negatives expose a more worrisome aspect of the scanner because the web site owner may gain a false sense of security by assuming, incorrectly, that a report with no vulnerabilities implies the site is fully secure. Several situations lead to a missed vulnerability:
  • Lack of test. The scanner simply does not try to identify the particular type of vulnerability.
  • Poor test implementation that too strictly defines the vulnerability (e.g. XSS tests that always contain <script> or javascript: under the mistaken assumption that those are required to exploit an XSS vuln)
  • Inadequate signatures (e.g. the scanner does not recognize SQL errors generated by Oracle)
  • Insufficient replay of requests (e.g. a form submission requires a valid e-mail address in one field in order to exploit an XSS vulnerability in another field)
  • Inability to automate (e.g. the vulnerability is related to a process that requires understanding of a sequence of steps, knowledge of the site’s business logic). The topic of vulnerabilities for which scanners cannot test (or have great difficulty testing) will be addressed separately.
  • Lack of authentication state (e.g. the scanner is able to authenticate at the beginning of the scan, but unknowingly loses its state, perhaps by hitting a logout link, and does not attempt to restore authentication)
  • Link not discovered by the scanner. This falls under the broader scope of site coverage, which will be addressed separately.
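The too-strict-test item is easy to demonstrate with a deliberately naive check. The detector below embodies the mistaken assumption that XSS requires a literal <script> tag or javascript: URI, so an event-handler payload sails straight past it.

```python
def naive_xss_check(reflected):
    # The flawed assumption: these two strings are "required" for XSS.
    return "<script>" in reflected or "javascript:" in reflected

payloads = [
    "<script>alert(1)</script>",
    '"><img src=x onerror=alert(1)>',        # no <script>, still exploitable
    '<a href="javascript:alert(1)">x</a>',
]
for p in payloads:
    print(naive_xss_check(p))   # True, False, True
```

The middle payload is the false negative: it executes in a browser just fine, but the signature never fires.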

The optimistic aspect of false negatives is that a scanner’s test repository can always grow. Here a good metric is the ease with which false negatives are addressed.

Accuracy is an important aspect of a web scanner. Inadequate tests might make a scanner more cumbersome to use than a simple collection of tests scripted in Perl or Python. Too many false positives reduce the user’s confidence in the scanner and waste valuable time on items that should never have been identified. False negatives may or may not be a problem depending on how the web site’s owners rely on the scanner and whether the missed vulnerabilities are due to lack of tests or poor methodology within the scanner.

One aspect not addressed here is measuring how accuracy scales against larger web sites. A scanner might be able to effectively scan a hundred-link test application, but suffer in the face of a complex site with various technologies, error patterns, and behaviors.

Finally, accuracy is only one measure of the utility of a web application scanner. Future essays will address other topics such as site coverage, efficiency, and usability.

Observations on Larry Suto’s Paper about Web Application Security Scanners

Note: I’m the lead developer for the Web Application Scanning service at Qualys, and I worked at NTO for about three years starting in July 2003 — both tools were included in this February 2010 report by Larry Suto. Nevertheless, I most humbly assure you that I am the world’s foremost authority on my opinion, however biased it may be.

The February 2010 report, Analyzing the Accuracy and Time Costs of Web Application Security Scanners, once again generated heated discussion about web application security scanners. (A similar report was previously published in October 2007.) The new report addressed some criticisms of the 2007 version and included more scanners and more transparent targets. The 2010 version, along with some strong reactions to it, engenders some additional questions on the topic of scanners in general:

How much should the ability of the user affect the accuracy of a scan?

Set aside basic requirements to know what a link is or whether a site requires credentials to be scanned in a useful manner. Should a scan result be significantly more accurate or comprehensive for a user who has several years of web security experience than for someone who just picked up a book in order to have spare paper for origami practice?

I’ll quickly concede the importance of in-depth manual security testing of web applications as well as the fact that it cannot be replaced by automated scanning. (That is, in-depth manual testing can’t be replaced; unsophisticated tests or inexperienced testers are another matter.) Tools that aid the manual process have an important place, but how much disparity should there really be between “out of the box” and “well-tuned” scans? The difference should be as little as possible, with clear exceptions for complicated sequences, hidden links, or complex authentication schemes. Tools that require too much time to configure, maintain, and interpret don’t scale well for efforts that require scanning more than a handful of sites at a time. Tools whose accuracy correlates to the user’s web security knowledge scale at the rate of finding smart users, not at the rate of deploying software.

What’s a representative web application?

A default installation of osCommerce or Amazon? An old version of phpBB or the WoW forums? Web sites have wildly different coding styles, design patterns, underlying technologies, and levels of sophistication. A scanner that works well against a few dozen links might grind to a halt against a few thousand.

Accuracy against a few dozen hand-crafted links doesn’t necessarily scale against more complicated sites. Then there are web sites — in production and earning money no less — with bizarre and inefficient designs such as 60KB .NET VIEWSTATE fields or forms with seven dozen fields. A good test should include observations on a scanner’s performance at both ends of the spectrum.

Isn’t gaming a vendor-created web site redundant?

A post on the Acunetix blog accuses NTO of gaming the Acunetix test site based on a Referer field from web server log entries. First, there’s no indication that the particular scan cited was the one used in the comparison; the accusation has very flimsy support. Second, vendor-developed test sites are designed for the very purpose of showing off the web scanner. It’s a fair assumption that Acunetix created its test sites to highlight its scanner in the most positive manner possible, just as HP, IBM, Cenzic, and other web scanner vendors would (or should) do for their own products. There’s nothing wrong with ensuring a scanner — the vendor’s or any other’s — performs most effectively against a web site offered for no other purpose than to show off the scanner’s capabilities.

This point really highlights one of the drawbacks of using vendor-oriented sites for comparison. Your web site probably doesn’t have the contrived HTML, forms, and vulnerabilities of a vendor-created intentionally-vulnerable site. Nor is it necessarily helpful that a scanner proves it can find vulnerabilities in a well-controlled scenario. Vendor sites help demonstrate the scanner, they provide a point of reference for discussing capabilities with potential customers, and they support marketing efforts. You probably care how the scanner fares against your site, not the vendor’s.

What about the time cost of scaling scans?

The report included a metric that attempted to demonstrate the investment of resources necessary to train a scanner. This is useful for users who need tools to aid in manual security testing or users who have only a few web sites to evaluate.

Yet what about environments where there are dozens, hundreds, or — yes, it’s possible — thousands of web sites to secure within an organization? The very requirement of training a scanner to deal with authentication or crawling works against running scans at a large scale. This is why point-and-shoot comparison should be a valid metric. (In opposition to at least one other opinion.)

Scaling scans requires more than time to train a tool. It also requires hardware resources to manage configurations, run scans, and centralize reporting. This is a point where Software as a Service begins to seriously outpace other solutions.

Where’s the analysis of normalized data?

I mentioned previously that point-and-shoot should be one component of scanner comparison, but it shouldn’t be the only point — especially for tools intended to provide some degree of customization, whether it simply be authenticating to the web application or something more complex.

Data should be normalized not only within vulnerabilities (e.g. comparing reflected and persistent XSS separately, comparing error-based SQL injection separately from inference-based detections), but also within the type of scan. Results without authentication shouldn’t be compared to results with authentication. Another step would be to compare false positive/negative rates only for checks a scanner actually performs rather than for checks it doesn’t implement. It’s important to note where a tool does or does not perform a check versus other scanners, but not performing a check reflects differently on accuracy than performing a check that still doesn’t identify a vulnerability.
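One way to sketch that normalization — the data, field names, and check identifiers below are hypothetical, purely for illustration. The point is that a scanner is scored only against vulnerabilities of types it claims to test for, so "no test" isn't conflated with "test that missed."

```python
def detection_rate(findings, implemented_checks, true_vulns):
    """Score a scanner only on vulnerability types it actually tests for."""
    applicable = [v for v in true_vulns if v["type"] in implemented_checks]
    if not applicable:
        return None   # nothing to normalize against
    found = sum(1 for v in applicable if v["id"] in findings)
    return found / len(applicable)

true_vulns = [
    {"id": 1, "type": "xss_reflected"},
    {"id": 2, "type": "xss_persistent"},
    {"id": 3, "type": "sqli_blind"},
]
# Scanner A implements blind SQL injection checks; Scanner B does not.
# Both score perfectly on the checks they perform, but the raw finding
# counts (2 vs. 1) would have told a misleading story.
print(detection_rate({1, 3}, {"xss_reflected", "sqli_blind"}, true_vulns))  # 1.0
print(detection_rate({1}, {"xss_reflected"}, true_vulns))                   # 1.0
```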

What’s really going on here?

Designing a web application scanner is easy, implementing one is hard. Web security has complex problems, many of which have different levels of importance, relevance, and even nomenclature. The OWASP Top 10 project continues to refine its list by bringing more coherence to the difference between attacks, countermeasures, and vulnerabilities. The WASC-TC aims for a more comprehensive list defined by attacks and weaknesses. Contrasting the two approaches highlights different methodologies for testing web sites and evaluating their security.

So, if performing a comprehensive security review of a web site is already hard, then it’s likely to have a transitive effect on comparing scanners. Comparisons are useful and provide a service to potential customers, who want to find the best scanner for their environment, and useful to vendors, who want to create the best scanner for any environment. The report demonstrates areas not only where scanners need to improve, but where evaluation methodologies need to improve. Over time both of these aspects should evolve in a positive direction.

(UFO label = Unabashed Flamebait Observations)