Web Scanner Evaluation: Accuracy

This is the first in a series of essays describing suggested metrics for evaluating web application security scanners.

Accuracy measures the scanner’s ability to detect vulnerabilities. The basic function of a web scanner is to use automation to identify the same, or most of the same, vulnerabilities as a web security auditor. Rather than focus on the intricate details of specific vulnerabilities, this essay describes two major areas of evaluation for a scanner: its precision and faults.

Precise scanners produce results that not only indicate the potential for compromise of the web site, but provide actionable information that helps developers understand and address security issues.
Comprehensive tests should be measured on the different ways a vulnerability might manifest as opposed to establishing a raw count of payloads. The scope of a test is affected by several factors:
  • Alternate payloads (e.g. tests for XSS within an href attribute or the value attribute of an input element, using intrinsic events, within a JavaScript variable, or that create new elements)
  • Encoding and obfuscation (e.g. employing various character sets or encoding techniques to bypass possible filters)
  • Applicability to different technologies in a web platform (e.g. PHP superglobals, .NET VIEWSTATE)
Robust detection mechanisms will correctly confirm the presence of a vulnerability, which typically fall into one of three categories:
  • Signatures such as HTTP response code, string patterns, and reflected content (e.g. matching ODBC errors or looking for alert(1) strings)
  • Inference based on interpreting the results of a group of requests (e.g. “blind” SQL injection that affects the content of a response based on specific SQL query constructions)
  • Time-based tests measure the responsiveness of the web site to various payloads. Not only can they extend inference-based tests, but they can also indicate potential areas of Denial of Service if a payload can cause the site to spend more resources processing a request than an attacker requires making the request (e.g. a query that performs a complete table scan of a database).
Injection vector refers to the areas that a scanner applies security tests. The most obvious injection points are query string parameters and visible form fields. The web browser should at least be considered an untrusted source if not an adversarial one. Consequently, a web scanner should be able to run security checks via all aspects of HTTP communication including:
  • URI parameters
  • POST data
  • Client-site headers, especially the ones more commonly interpreted by web sites including User-Agent, the-forever-misspelled Referer, and Cookie.
Errors in a scanner’s results take away from the time-saving gains of automation by requiring users to dig into non-existent vulnerabilities or spending too much time repeating the scanner’s tests in order to satisfy that certain vulnerabilities do not exist.
False positives indicate insufficient analysis of a potential vulnerability. The cause of a false positive can be hard to discern without intimate knowledge of the scanner’s internals, but often falls into one of these categories:
  • Misdiagnosis of generic error page or landing page
  • Poor test implementation that misinterprets correlated events to infer cause from effect (e.g. changing a profile page’s parameter value from Mike to Mary to view another user’s public information is not a case of account impersonation – the web site intentionally displays the content)
  • Sole reliance on inadequate test signature to claim the vulnerability exists (e.g. a poor regex or stating an HTTP 200 response code for ../../../../etc/passwd indicates the password file is accessible)
  • Web application goes into error state due to load (e.g. database error occurs because the server has become overloaded by scan traffic, not because a double quote character was injected into a parameter)
  • Lack of security impact (e.g. an unauthenticated, anonymous search form is vulnerable to CSRF – search engines like Google, Yahoo!, and Bing are technically vulnerable but the security relevance is questionable)
The effort expended to invalidate an erroneous vulnerability wastes time better spent investigating and verifying actual vulnerabilities. False positives also reduce trust in the scanner.
False negatives expose a more worrisome aspect of the scanner because the web site owner may gain a false sense of security by assuming, incorrectly, that a report with no vulnerabilities implies the site is fully secure. Several situations lead to a missed vulnerability:
  • Lack of test. The scanner simply does not try to identify the particular type of vulnerability.
  • Poor test implementation that too strictly defines the vulnerability (e.g. XSS tests that always contain <script> or javascript: under the mistaken assumption that those are required to exploit an XSS vuln)
  • Inadequate signatures (e.g. the scanner does not recognize SQL errors generated by Oracle)
  • Insufficient replay of requests (e.g. a form submission requires a valid e-mail address in one field in order to exploit an XSS vulnerability in another field)
  • Inability to automate (e.g. the vulnerability is related to a process that requires understanding of a sequence of steps, knowledge of the site’s business logic). The topic of vulnerabilities for which scanners cannot test (or have great difficulty testing) will be addressed separately.
  • Lack of authentication state (e.g. the scanner is able to authenticate at the beginning of the scan, but unknowingly loses its state, perhaps by hitting a logout link, and does not attempt to restore authentication)
  • Link not discovered by the scanner. This falls under the broader scope of site coverage, which will be addressed separately.
The optimistic aspect of false negatives is that a scanner’s test repository can always grow. In this case a good metric is determining the ease with which false negatives are addressed.
Accuracy is an important aspect of a web scanner. Inadequate tests might make a scanner more cumbersome to use than a simple collection of tests scripted in Perl or Python. Too many false positives reduces the user’s confidence in the scanner and wastes valuable time on items that should never have been identified. False negatives may or may not be a problem depending on how the web site’s owners rely on the scanner and whether the missed vulnerabilities are due to lack of tests or poor methodology within the scanner.
One aspect not addressed here is measuring how accuracy scales against larger web sites. A scanner might be able to effectively scan a hundred-link test application, but suffer in the face of a complex site with various technologies, error patterns, and behaviors.
Finally, accuracy is only one measure of the utility of a web application scanner. Future essays will address other topics such as site coverage, efficiency, and usability.