The Futility of Web Pen Testing

I previously lamented the death of web scanners [1], so it’s only fair to turn this nihilistic gaze to the futility of manual web security testing. This isn’t to say it’s impossible to perform a comprehensive review of a web app. The problem lies in repeating that review after developers modify the app. Or in obtaining consistent results from different testers. Or, if we continue to look for problems, in repeating an in-depth review of one app across the few thousand apps that might exist in an organization.

It’s not controversial to state that web apps need to be tested. The trick is keeping up with the pace of development for a single web app, compounded by the pace at which new apps arise. Security testers must deal with the fear that a once-secure app might be crippled by the introduction of a new feature or a change that inadvertently breaks an old one. With this in mind, the fundamental challenge of manual pen testing is managing the effort required to maintain one site’s security and scaling that effort to thousands of sites.

If we look at recent compromises, and accept anecdotal reasoning, we see simple attacks succeeding against huge, well-known web sites. It’s not as if Google, Twitter, Facebook, and their peers don’t understand security, don’t have budgets for security testing, or don’t have people testing their apps. Google actively encourages ethical testing against several of its properties [2]. This explicit permission to find vulns isn’t a concession to the impossibility of writing secure code; it’s a nod to the difficulty of scale in manually testing large, complex web applications. It’s also interesting to see the vulns considered worthy of reward versus those deemed cruft or a “vector for petty mischief” [3]. This highlights the minefield of subjectivity when terms like risk, attack, threat, and impact are ambiguously defined or too broadly applied.

Consider the Sony PlayStation Network (PSN) breach [4] that compromised user data “by hacking into an application server behind a Web server and two firewalls” [5]. This attack [6] is a topical example of the impact of (what seems to be [7]) a straightforward web vuln like SQL injection. One could wonder whether the vulnerable web site ever received security testing. If so, the quality of the test must be called into question, especially if the alleged attack vector was so simple. On the other hand, we could ruminate on reasons why the site wasn’t tested. Was it missed and forgotten because it was so old? Assumed to have been tested before? Behind too many other sites considered more important? We could go on, but the point of the exercise is to convey the difficulty of maintaining web security, not to second-guess situations we know too little about.

The presence of a vulnerability is usually indisputable. Its risk or exploitation impact, on the other hand, is often open to debate. SQL injection is a clear example of a vuln that should be addressed immediately rather than arguing over whether it exposes an empty database or encrypted credit card numbers. SQL injection flaws are due to fundamentally improper programming. Even if an exploit is questionable, there’s bad code sitting on the server that should be fixed.
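To make the “bad code” concrete, here’s a minimal sqlite3 sketch (the table, data, and hostile input are invented for illustration) showing why the flaw is a programming error regardless of what the database holds: string formatting splices attacker text into the query grammar, while a bound parameter keeps it as data.

```python
import sqlite3

# Throwaway schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

name = "x' OR '1'='1"  # hostile input

# Broken: the input becomes part of the SQL itself, so the WHERE
# clause collapses to a tautology and matches every row.
leaked = conn.execute(
    "SELECT secret FROM users WHERE name = '%s'" % name).fetchall()

# Fixed: a parameterized query treats the input strictly as data.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

print(leaked)  # [('s3cret',)] -- the injection succeeded
print(safe)    # [] -- no user is literally named "x' OR '1'='1"
```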

CSRF is a different matter, especially depending on how it manifests. Arguments over risk ratings too often devolve into opinions biased by imaginative threats rather than focusing on the situation at hand. (For example, mistakenly considering a CSRF countermeasure broken because it can be bypassed by XSS, or incorrectly assuming CSRF tokens prevent sniffing attacks.)
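To ground that parenthetical: a synchronizer token defends against exactly one threat, a forged cross-site request. The sketch below is a hypothetical Flask app (the route, form, and token names are invented), with comments marking what the token does and does not buy you.

```python
import secrets
from flask import Flask, Response, abort, request, session

app = Flask(__name__)
app.secret_key = secrets.token_bytes(32)

@app.route("/transfer", methods=["GET"])
def form():
    # Bind a random token to the user's session and embed it in the form.
    session["csrf_token"] = secrets.token_urlsafe(32)
    return Response(
        '<form method="post" action="/transfer">'
        '<input type="hidden" name="csrf_token" value="%s">'
        '<input type="submit" value="Send"></form>' % session["csrf_token"],
        mimetype="text/html")

@app.route("/transfer", methods=["POST"])
def transfer():
    # A forged cross-site request can't supply the session's token.
    # This check does NOT stop an XSS payload running in the page (it can
    # read the token from the DOM), and it does NOT shield the token or
    # the traffic from network sniffing -- those are different threats.
    sent = request.form.get("csrf_token", "")
    if not secrets.compare_digest(sent, session.get("csrf_token", "")):
        abort(403)
    return "transfer accepted"
```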

The details of web vulns differ enough that it’s hard, and perhaps unwise, to assign each a static risk. Efforts like CVSS [8] try to bring consistency and common terminology to describing software vulns, but such scoring systems see infrequent use within web security. Fortunately, the risk calculations can be avoided without great loss if you treat vulns like the software defects they are. Assign priorities using methodologies already familiar to your dev team. And if the effort required to fix a bug is greater than the effort required to prove it’s really, really, truly a problem, then have a discussion about impact and risk.
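For what it’s worth, the scoring itself is plain arithmetic. The sketch below implements the CVSS v2 base-score formula with the metric weights from the v2 specification, scoring a typical SQL injection vector (network access, low complexity, no authentication, partial confidentiality/integrity/availability impact):

```python
def cvss2_base(av, ac, au, c, i, a):
    """CVSS v2 base score computed from the six base-metric weights."""
    impact = 10.41 * (1 - (1 - c) * (1 - i) * (1 - a))
    exploitability = 20 * av * ac * au
    f_impact = 0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)

# AV:N/AC:L/Au:N/C:P/I:P/A:P -- a common vector for SQL injection.
# Weights: Network=1.0, Low complexity=0.71, No auth=0.704, Partial=0.275.
print(cvss2_base(av=1.0, ac=0.71, au=0.704, c=0.275, i=0.275, a=0.275))  # 7.5
```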

Manual testing has a high degree of variance in quality and coverage. I don’t make these statements from a position of blamelessness. In the past I’ve written a chapter or two that omitted an attack variant or didn’t highlight something well enough. These are things I’ve tried to address in revised editions and on this web site. My point is that manual tests have an unavoidable bias in focus that depends on the person conducting them. Rather than passing judgement on quality, this is more of a judgement on coverage and that inevitable human factor of making mistakes. (After all, lots of vulns boil down to mistakes in code, albeit fairly consequential ones.)

CSRF arrived on the OWASP Top 10 in 2007. How many people were testing their web apps for it before that year? (To be fair, how many apps were being compromised that way? Especially when SQL injection remains(!) so much easier.) I like to pick on CSRF because the vuln is easily misunderstood, and its purported impacts and countermeasures vary immensely.

There are two methodologies for addressing test coverage: blackbox (review the deployed app) and whitebox (review the app’s code). Blackbox testing rarely requires knowledge of the app’s underlying language, although cases like PHP show that prior knowledge can be helpful because configuration settings influence code execution: the same code may run securely on one server and be wide open under a different configuration. Blackbox testing is the easiest path because it requires nothing more than a browser to begin. The snag is that blackbox testing won’t necessarily find the dusty corners of the web site where an insecure link or form is hiding.

Then why not just step towards a code review where every corner can be checked? A pen tester could spend an hour investigating different XSS vectors against a web page, whereas reviewing the page’s source code could provide higher confidence in whether it’s secure. After all, not every security fix is securely fixed [9].
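Footnote [9] is a real-world case; the toy filter below (a hypothetical sanitize() helper, not taken from any actual codebase) shows the pattern in miniature. A code review spots in seconds what blackbox probing might circle for an hour: the “fix” only strips one literal tag.

```python
def sanitize(value):
    # The "fix": strip the one payload the original report demonstrated.
    return value.replace("<script>", "")

# Payloads the fix never anticipated sail straight through:
print(sanitize("<scr<script>ipt>alert(1)</script>"))  # -> <script>alert(1)</script>
print(sanitize("<SCRIPT>alert(1)</SCRIPT>"))          # case isn't normalized
print(sanitize("<img src=x onerror=alert(1)>"))       # needs no script tag at all
```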

The drawback is that whitebox testing reduces the population of testers capable of finding security problems. This exacerbates the problem of scale in matching testers to web apps. The tester needs to have good comprehension of the app’s programming language in addition to security concepts. Someone very good at reviewing Java may miss a vuln in C#. Knowledge of good software design translates well between languages. Yet the problem remains that too few people have too many sites to review.

As with the article on web scanner mortality, this one requires a caveat. There will be a vocal group hurling clichéd invectives at the notion that manual testing is useless, never requires tools, or is a worthless endeavor. None of those claims were asserted here. In fact, manual testing must be part of any exhaustive security testing. Humans have the ability to analyze the design of a web site, whereas an automated scanner largely focuses on implementation problems. Humans also possess the creative thinking required to turn QA’s use cases into abuse and misuse cases that bypass security.

Web QA testing groups are likely already overloaded with the combinatorial craziness inherent to reviewing UIs. Yet this is a perfect place to identify security problems alongside bugs in features. Tools like Selenium [10] would be prime platforms for security testing. Yet how many times has a web pen test produced findings, delivered a PDF, then left? Why haven’t Selenium scripts (or anything similar) become a lingua franca of pen test results?

One of the biggest changes necessary to manual testing is translating the hands-on tests that a human performs into a script that someone else (or a tool) can repeat [11]. Treating pen tests as snapshots in time misses the opportunity to build a repository of security knowledge. Rather than just manage vulns, a group could manage the techniques and scripts used to find such vulns. This not only turns a one-time security review into a repeatable one; it also lets the quality of testing improve over time.
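As a sketch of what that translation might look like (the target URL and the form field name “q” are placeholders), here is a hand-performed reflected-XSS probe captured as a repeatable Selenium script that anyone on the team, or a CI job, can rerun after every change:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# The probe a tester would otherwise type into the form by hand.
PROBE = '"><script>alert(1)</script>'

driver = webdriver.Firefox()
try:
    driver.get("http://staging.example.test/search")  # placeholder URL
    field = driver.find_element(By.NAME, "q")         # placeholder field name
    field.send_keys(PROBE)
    field.submit()
    # If the raw probe appears in the response, the page reflected our
    # markup without encoding it -- flag it for a human to confirm.
    assert PROBE not in driver.page_source, "possible reflected XSS"
finally:
    driver.quit()
```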

The pessimism of this article and the previous one isn’t intended to be a capitulation to web vulnerabilities. By identifying the fundamental challenges of security testing, it’s possible to start thinking of creative ways to solve them. It’s important to understand a problem well to avoid branching off into solutions that address false issues or have too narrow a focus. In future articles we’ll turn the tables on this bleak landscape and look at effective ways to apply automation and manual testing to web sites.

=====

[1] http://www.deadliestwebattacks.com/2011/05/death-of-web-scanners.html

[2] http://googleonlinesecurity.blogspot.com/2010/11/rewarding-web-application-security.html

[3] http://www.google.com/corporate/rewardprogram.html

[4] http://us.playstation.com/support/answer/index.htm?a_id=2356

[5] http://news.cnet.com/8301-31021_3-20058950-260.html?tag=mncol;txt

[6] http://blog.us.playstation.com/2011/04/26/update-on-playstation-network-and-qriocity/

[7] I’ve yet to find definitive explanations of the attack, so reserve a little skepticism for reports like this: http://news.cnet.com/8301-27080_3-20063789-245.html

[8] http://nvd.nist.gov/cvss.cfm

[9] http://blog.mindedsecurity.com/2010/09/twitter-domxss-wrong-fix-and-something.html

[10] http://seleniumhq.org/

[11] Dinis Cruz recognized this type of problem and started the O2 project, https://www.owasp.org/index.php/OWASP_O2_Platform. However, O2 focuses more on the tools to implement repeatability than on defining a grammar to describe vuln tests. Selenium is a similar tool that uses JavaScript to define tests that can be driven by any of several programming languages; however, it does not have an explicit security bent.

5 thoughts on “The Futility of Web Pen Testing”

  1. dre

    I suggest watir-webdriver over Selenium — yet both appear to be converging there. This is a good long-term solution. For a short-term solution, see the w3af project, which includes an Export Request Tool. The tests that it creates can be run with HtmlFixture (FitNesse), Nose, or Cucumber (or really any test case runner or modern CI/build server).

    I'm going to have to say that we must use automation whenever we can — manual testing is usually the wrong way to spend your valuable time. However, this doesn't mean to automate past the point of technology gain. Crawlers are only useful to the tester — they are not a means to an end. Scanners are useful tools in a tester's toolbox — they are also not a means to an end.

    Secure code review shouldn't be done without tools (e.g. Yasca). Web app pen-testing shouldn't be done without tools (e.g. Multi Links in Firefox or Snap Links Lite in Chrome) and automation (e.g. controlled, iterative programming with a data source, such as URL/form/param/header lists, from a command line that launches/drives browsers and other HTTP tools). Both should be done as simultaneous activities that feed each other, and by experts.

  2. book

    I agree with your points, Dre. The valuable information from a pen test is the workflow that reproduces a vulnerability, not the tool or people that find it. Ideally, a test "grammar" would allow the test to never be forgotten, so that someone else can verify it X months later using Y tool without having to manually translate to Y tool's abilities, even if the grammar was generated by tool Z.

    Think of the pen test data in terms of a social network profile or a cloud computing provider. You should be able to move your data from one network to another or from one provider to another. If you can't extract your data, or your data is meaningless on another platform, then it's lost some of its utility.

  3. psiinon (@psiinon)

    For the applications we develop, we have a set of Selenium tests which test the UI functionality.
    We proxy these via the OWASP Zed Attack Proxy (ZAP).
    There’s no reason why other similar tools can’t be proxied via ZAP, so it’s not restricted to Selenium.
    ZAP has an API we drive via Ant, which runs the spider and then the active scanner. Any alerts raised are reported and cause the build to fail.
    I’ve documented an example of this using the BodgeIt Store: http://code.google.com/p/bodgeit/wiki/RegTests
    It’s not going to find everything, in the same way that regression tests don’t prove an app does what it’s supposed to do, only that it passes the tests.
    You still need QA to check the app really does what you want, and you need pentesters to find things like application-level vulnerabilities.
    But it will hopefully pick up stupid mistakes very early on (e.g. the night after you check in code which fails to escape one bit of user-supplied input!).

    Psiinon (ZAP project lead)

  4. Stephen de Vries (@stephendv)

    Until we have proper AI, I think automating (or even partially automating) existing manual processes is the way forward.

    To this end I’m about to release a framework specifically designed to integrate security testing into a BDD framework (JBehave). It uses Selenium WebDriver and the page object pattern to make tests more maintainable, and comes with a templated set of security specifications that can be applied to a broad range of apps. Finally, it integrates with Burp (through another plugin with imminent release) so that Burp scanning can be controlled from within the test scripts.

    This will be the subject of my talk at BH EU on the 14th. For a sneak peek of the tools, you can find them here: http://www.continuumsecurity.net/bdd-intro.html and http://www.continuumsecurity.net/resty-intro.html

  5. Pingback: The Forlorn Followup | Deadliest Web Attacks
