BSides San Francisco

Voting on BSides SF presentations closes this Friday (Feb 2nd). If you’ll be in San Francisco for RSA, make sure to check out BSides as well. It’s also a chance to learn about a JavaScript-based approach to fingerprinting web app frameworks — but only if you vote for Blind Fury!

Blind Fury: An Alternate Web App Fingerprinting Technique

Web app fingerprinting attempts to identify the type and version of frameworks installed on a web site. Knowledge of frameworks and their version helps determine whether a site has kept up to date with security patches. Accurate fingerprinting can be more efficient and less intrusive than blackbox vulnerability scanning for identifying potential vulnerabilities.

Traditional approaches to fingerprinting web applications rely on brute force enumeration of pages, scraping content with regexes, or hybrids of the two. These are suboptimal. Page enumeration is bandwidth-intensive. Its accuracy falls when “install” files are removed or pages are minified. Regexes are prone to errors of matching incorrect content or are defeated by simple site modification (such as removing <meta> content). These techniques tend to identify the presence of pages on a site, but do not indicate whether the files are actually used by the application.

Blind Fury uses a new approach that does not rely on page enumeration or regexes. Yet it is still able to identify several popular frameworks. In fact, the technique can be extended to generate fingerprints for almost any type of web site. It can create and analyze fingerprints from a completely blackbox perspective; it does not require prior knowledge of a target’s directory structure.

If you love Rutger Hauer movies, vote for Blind Fury.

Fear not, regardless of the outcome of voting, I’ll be posting more about it at the end of the month.

p.s. Regular visitors may have noticed that the site has moved to WordPress.com from Blogger (saying good-bye to negative privacy and policy changes). The only drawback so far is that some of the archive links are broken because they were originally saved as year/month rather than year/month/day. All of the content remains, just under a slightly different link.

 

 

The Twelve Web Security Truths

My current writing project has taken time away from adding new content lately. Here’s a brief interlude of The Twelve Web Security Truths I’ve been toying with as a side project. They are modeled on The Twelve Networking Truths from RFC 1925.

  1. Software execution is less secure than software design, but running code has more users.
  2. The time saved by not using parameterized queries to build SQL statements should be used to read about using parameterized queries.
  3. Same Origin Policy restricts the DOM access and JavaScript behavior of content loaded from multiple origins. Malware only cares about plugin and browser versions.
  4. Content like XSS exploits are affected by the Same Origin Policy, which is nice for XSS attacks that inject into the site’s origin.
  5. CSRF countermeasures like Origin headers mitigate CSRF, not XSS. Just like X-Frame-Options mitigates clickjacking, not XSS.
  6. Making data safe for serialization with JSON does not make the data safe for the site.
  7. There are four XSS vulns in your site today. Hackers will find two of them, the security team will find one, the dev team will introduce another one tomorrow.
  8. Blacklists miss the payload syntax that works.
  9. A site that secures user data still needs to work on the privacy of user data.
  10. Hashing passwords with 1000-round PBKDF2 increases the work factor to brute force the login page by a factor of 1. Increasing this to a 10,000-round PBKDF2 scheme provides an additional increase by a factor of 1.
  11. The vulnerabilities in “web 2.0” sites occur against the same HTML and JavaScript capabilities of “web 1.0” sites. HTML5 makes this different in the same way.
  12. A site is secure when a compromise can be detected, defined, and fixed with minimal effort and users are notified about it.
  13. Off-by-one errors only happen in C.

Enjoy. And stick around for (the not quite yet imminent arrival of) new content. Thanks for reading!

The Futility of Web Pen Testing

I previously lamented the death of web scanners1 so it’s only fair to turn this nihilistic gaze to the futility of manual web security testing. This isn’t to say it’s impossible to perform a comprehensive review of a web app. The problem lies in repeating that review after developers modify the app. Or the problem of obtaining consistent results from different testers. Or, if we continue to look for problems, repeating an in-depth review of one app across the few thousand that might exist in an organization.

It’s not controversial to state that web apps need to be tested. The trick is trying to keep up with the pace of development for a single web app, compounded by the pace at which new apps arise. Security testers must deal with the fear that a once-secure app might be crippled by the introduction of a new feature or a change that inadvertently breaks an old one. With this in mind, the fundamental challenge of manual pen testing is managing the effort required to maintain one site’s security and scaling that effort to thousands.

If we look at recent compromises, and accept reasoning from an anecdotal basis, we see simple attacks succeeding against huge, well-known web sites. It’s not like Google, Twitter, Facebook, and similar don’t understand security, don’t have budgets for security testing, or don’t have people testing their apps. Google actively encourages ethical testing against several of its properties.2 This nod to explicit permission to find vulns isn’t a concession to the impossibility of writing secure code; it’s a nod towards the difficulty of scale in manually testing large, complex web applications. It’s also interesting to see the vulns considered worthy of reward versus those deemed cruft or a “vector for petty mischief”.3 This highlights the minefield within the subjectivity of risk when terms like risk, attack, threat, and impact are ambiguously defined or too-broadly applied.

Consider the Sony Playstation Network (PSN) breach4 that compromised user data “by hacking into an application server behind a Web server and two firewalls”.5 This attack6 is a topical example of the impact of (what seems to be7) straight-forward web vulns like SQL injection. One could wonder whether the vulnerable web site ever received security testing. If so, the quality of the test must be called into question, especially if the alleged attack vector was so simple. On the other hand, we could ruminate on reasons why the site wasn’t tested. Missed and forgotten because the site was so old? Assumed it had been tested before? Too many other sites considered more important needed to be tested first? We could go on, but the point of the exercise is to express the difficulty of maintaining web security, not second-guessing situations we know too little about.

The presence of a vulnerability is usually indisputable. On the other hand, its risk or exploitation impact often falls to debate. SQL injection is a clear example of a vuln that should be addressed immediately rather than arguing if the vuln exposes an empty database or encrypted credit card numbers. SQL injection flaws are due to fundamentally improper programming. Even if an exploit is questionable, there’s bad code sitting on the server that should be fixed.

CSRF is a different matter, especially depending on how it manifests. Arguments over risk ratings too often devolve into opinions biased by imaginative threats rather than focus on the situation. (For example, mistakenly considering a CSRF countermeasure broken because it can be bypassed by XSS or incorrectly assuming CSRF tokens prevent sniffing attacks.)

The details of web vulns differ enough that it’s hard, and perhaps unwise, to assign each a static risk. Efforts like CVSS8 try to bring consistency and terminology to describing software vulns, but such scoring systems seem to be used infrequently within web security. Fortunately, the risk calculations can be avoided without great loss if you treat vulns like the software defects they are. Assign priorities using methodologies already familiar to your dev team. And if the effort required to fix a bug is greater than the effort required to prove it’s really, really, truly a problem, then have a discussion about impact and risk.

Manual testing has a high degree of variance in quality and coverage. I don’t make these statements as someone who is blameless. In the past I’ve written a chapter or two that omitted an attack variant or didn’t highlight something well enough. These are things I’ve tried to address in revised editions and on this web site. My point is that manual tests have an unavoidable bias in focus that depends on the person conducting them. Rather than passing judgement on quality, this is more of a judgement on coverage and that inevitable human factor of making mistakes. (After all, lots of vulns boil down to mistakes in code, albeit fairly consequential ones.)

CSRF arrived on the OWASP Top 10 in 2007. How many people were testing their web apps for it before that year? (To be fair, how many apps were being compromised that way? Especially when SQL injection remains(!) so much easier.) I like to pick on CSRF because the vuln is easily misunderstood and its purported impacts and countermeasures vary immensely.

There are two methodologies for addressing test coverage: blackbox (review the deployed app) and whitebox (review the app’s code). Blackbox testing rarely requires knowledge of the app’s underlying language, although cases like PHP show that prior knowledge can be helpful because configuration settings influence code execution. The same code may run securely on one server and be wide open under a different configuration. Blackbox testing is the easiest path because it requires nothing more than a browser to begin. The snag is that blackbox testing won’t necessarily find the dusty corners of the web site where an insecure link or form is hiding.

Then why not just step towards a code review where every corner can be checked? A pen tester could spend an hour investigating different XSS vectors against a web page whereas reviewing the page’s source code could provide higher confidence whether it’s secure. After all, not every security fix is securely fixed.9

The drawback is that whitebox testing reduces the population of testers capable of finding security problems. This exacerbates the problem of scale in matching testers to web apps. The tester needs to have good comprehension of the app’s programming language in addition to security concepts. Someone very good at reviewing Java may miss a vuln in C#. Knowledge of good software design translates well between languages. Yet the problem remains that too few people have too many sites to review.

As with the article on web scanner mortality, this one requires a caveat. There will be a vocal group hurling cliched invectives against the idea that manual testing is useless, never requires tools, and is a worthless endeavor. None of those were asserted here. In fact, manual testing must be part of any exhaustive security testing. Humans have the ability to analyze the design of a web site, where an automated scanner largely focuses on implementation problems. Humans also possess the creative thinking required to turn QA’s use cases into abuse and misuse cases that bypass security.

Web QA testing groups are likely already overloaded with the combinatorial craziness inherent to reviewing UIs. Yet this is a perfect step for identifying security problems alongside bugs in features. Tools like Selenium10 would be prime platforms for security testing. Yet how many times did a web pen test produce findings, provide a PDF, then leave? Why haven’t Selenium scripts (or anything similar) become a lingua franca of pen test results?

One of the biggest changes necessary to manual testing is translating the hands-on tests that a human performs into a script that someone else (or a tool) can repeat.11 Treating pen tests as snapshots in time misses the opportunity to build a repository of security knowledge. Rather than just manage vulns, a group could manage the techniques and scripts used to find such vulns. This not only enables a one-time security review to become repeatable, but the quality of testing can improve.

The pessimism of this article and the previous one isn’t intended to be a capitulation to web vulnerabilities. By identifying the fundamental challenges to security testing it’s possible to start thinking of creative ways to solve them. It’s important to understand a problem well to avoid branching off into solutions that address false issues or have too narrow of a focus. In future articles we’ll turn the tables on this bleak landscape and look at the effective ways to apply automation and manual testing to web sites.

=====

1 http://www.deadliestwebattacks.com/2011/05/death-of-web-scanners.html

2 http://googleonlinesecurity.blogspot.com/2010/11/rewarding-web-application-security.html

3 http://www.google.com/corporate/rewardprogram.html

4 http://us.playstation.com/support/answer/index.htm?a_id=2356

5 http://news.cnet.com/8301-31021_3-20058950-260.html?tag=mncol;txt

6 http://blog.us.playstation.com/2011/04/26/update-on-playstation-network-and-qriocity/

7 I’ve yet to find definitive explanations of the attack, so reserve a little skepticism for reports like this: http://news.cnet.com/8301-27080_3-20063789-245.html

8 http://nvd.nist.gov/cvss.cfm

9 http://blog.mindedsecurity.com/2010/09/twitter-domxss-wrong-fix-and-something.html

10 http://seleniumhq.org/

11 Dinis Cruz recognized this type of problem and started the O2 project, https://www.owasp.org/index.php/OWASP_O2_Platform. However, O2 focuses more on the tools to implement repeatability rather than defining a grammar to describe vuln tests. Selenium is a similar tool that uses JavaScript to define tests that can be driven by one of several different programming languages; however, it does not have an explicit security bent.

How web security will change with HTML5

Here’s an article with musings on potential security1 issues of The Web’s favorite new buzzword, HTML5.

Before you get too excited about breaking the spec, consider this bit:

The most dangerous security problems won’t be due to features of HTML5. Too many experienced people have been working on the specs to leave egregious errors in the design or in browsers’ implementation of it. The worst problems will come from developers who rush into new technologies without remembering sins of the past. It’s far too easy to fall into the trap of trusting data from the browser just because some hefty JavaScript routines have been assumed to perform all sorts of security validation on the data.

I can’t post the original article here because Mashable’s evil contract means I no longer have any rights to it. (Give us your content for free and receive Exposure!) I obviously agreed to these terms; hopefully they serve Mashable and me well.

If you’d like to hear more about HTML5 along with more technical details, stick around. There’s plenty to talk about!

=====

1 http://mashable.com/2011/04/29/html5-web-security/

CSRF and Beyond

Identifying CSRF vulnerabilities is more interesting than just scraping HTML for hidden fields or forging requests. CSRF stems from a design issue of HTTP and HTML that is in one aspect a positive feature of the web, but leads to unexpected consequences for web sites. We’ll start with a brief description of detection methods before diverging onto interesting(?!) tangents.

A passive detection method that is simple to automate looks for the presence or absence of CSRF tokens in web pages. This HTML scraping is prone to many errors and generates noisy results that don’t scale well for someone dealing with more than one web site at a time. This approach just assumes the identity of a token; it doesn’t verify that it is a valid one or more importantly that the application verifies it. Unless the page is examined after JavaScript has updated the DOM, this technique misses dynamically generated tokens, form fields, or forms.
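The passive check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TOKEN_HINTS` name fragments and the `FormTokenScanner` class are hypothetical, and the scan works on static HTML, so it has exactly the weaknesses listed (it guesses a token's identity by name, never verifies that the server checks it, and misses anything JavaScript adds later).

```python
from html.parser import HTMLParser

# Hypothetical name fragments that suggest an anti-CSRF token field.
TOKEN_HINTS = ("csrf", "xsrf", "token", "nonce")


class FormTokenScanner(HTMLParser):
    """Record, for each <form>, whether it holds a hidden token-like field."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form, in document order
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "has_token": False}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            name = (attrs.get("name") or "").lower()
            is_hidden = (attrs.get("type") or "").lower() == "hidden"
            if is_hidden and any(hint in name for hint in TOKEN_HINTS):
                self._current["has_token"] = True

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def scan(html_text):
    """Return the list of forms found in html_text with a has_token guess."""
    scanner = FormTokenScanner()
    scanner.feed(html_text)
    return scanner.forms
```

A form flagged `has_token: False` is only a candidate for review, and a form flagged `True` may still be forgeable if the server never validates the value.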

An active detection method that can be automated replays requests under different user sessions. This approach follows the assumption that CSRF tokens are unique to a user’s session, such as the session cookie1 or other pseudo-random value. There’s also a secondary assumption that concurrent sessions are possible. To be effective, this approach requires a browser or good browser emulation to deal with any JavaScript and DOM updates. Basically, this technique swaps forms between two user sessions. If the submission succeeds, then it’s more likely request forgery is possible. If the submission fails, then it’s more likely a CSRF countermeasure has blocked it. There’s still potential for false negatives if some static state token or other form field wasn’t updated properly. The benefit of this approach is that it’s not necessary to guess the identity of a token and that the test is actually determining whether a request can be forged.
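To make the form-swapping idea concrete, here is a small simulation. `ToySite` is a hypothetical in-memory stand-in for a web app, not a real framework or scanner API; a real test would drive a browser against the target. The sketch assumes per-session tokens and concurrent sessions, the same two assumptions named above.

```python
import secrets


class ToySite:
    """In-memory stand-in for a web app (hypothetical, for illustration)."""

    def __init__(self, check_tokens):
        self.check_tokens = check_tokens
        self.tokens = {}  # session id -> CSRF token issued to that session

    def login(self):
        session_id = secrets.token_hex(8)
        self.tokens[session_id] = secrets.token_hex(16)
        return session_id

    def get_form(self, session_id):
        # The form a browser would receive, including its hidden token field.
        return {"email": "", "csrf_token": self.tokens[session_id]}

    def submit(self, session_id, form):
        # Return an HTTP-like status code for the submission.
        if self.check_tokens and not secrets.compare_digest(
            form.get("csrf_token", ""), self.tokens[session_id]
        ):
            return 403
        return 200


def swap_test(site):
    """Return True if a form taken from one session is accepted in another."""
    session_a, session_b = site.login(), site.login()
    form_from_a = site.get_form(session_a)
    form_from_a["email"] = "swapped@example.test"
    return site.submit(session_b, form_from_a) == 200
```

A `True` result suggests request forgery is likely possible; a `False` result suggests a countermeasure rejected the swapped form, subject to the false negatives already noted.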

Once more countermeasures become based on the Origin header, the replay approach might be as simple as setting an off-origin value for this header. A server will either reject or accept the request. This would be a nice, reliable detection (not to mention simple, strong countermeasure), but sadly not an imminent one.2
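A server-side Origin comparison might look like the sketch below. The function names are hypothetical, and treating a missing or `null` Origin as a rejection is an assumption of this example; as the footnote points out, a real site has to decide how to handle browsers that don't send the header.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_tuple(value):
    """Reduce an origin string to a comparable (scheme, host, port) tuple."""
    parts = urlsplit(value)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def is_same_origin(origin_header, site_origin):
    """Return True only if the Origin header matches the site's own origin."""
    # Assumption for this sketch: an absent or "null" Origin is rejected.
    if not origin_header or origin_header == "null":
        return False
    return origin_tuple(origin_header) == origin_tuple(site_origin)
```

A scanner would then replay a request with an off-origin value and see whether the server's answer changes.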

Almost by default, an HTML form is vulnerable to CSRF. (For the sake of word count forms will be synonymous with any “resource” like a link or XHR request.) WhiteHat Security described one way to narrow the scope of CSRF reporting from any form whatsoever to resources that fall into a particular category. Read the original post3, then come back. I’ve slightly modified WhiteHat’s three criteria to be resources:

  • with a security context or that cross a security boundary, such as password or profile management
  • that deliver an HTML injection (XSS) or HTTP response splitting payload to a vulnerable page on the target site. This answers the question for people who react to those vulns with, “That’s nice, but so what if you can only hack your own browser.” This seems more geared towards increasing the risk of a pre-existing vuln rather than qualifying it as a CSRF. We’ll come back to this one.
  • where sensitive actions are executed, such as anything involving money, updating a friend list, or sending a message

The interesting discussion starts with WhiteHat’s “benign” example. To summarize, imagine a site with a so-called non-obvious CSRF, one XSS vuln, one Local File Inclusion (LFI) vuln, and a CSRF-protected file upload form. The attack uses the non-obvious CSRF to exploit the XSS vuln, which in turn triggers the file upload to exploit the LFI. For example, the attacker creates the JavaScript necessary to upload a file and exploit the LFI, places this payload in an image tag on an unrelated domain, and waits for a victim to visit the booby-trapped page so their browser loads <img src="http://target.site/xss_inject.page?arg=payload">.

This attack was highlighted as a scenario where CSRF detection methods would usually produce false negatives because the vulnerable link, http://target.site/xss_inject.page, doesn’t otherwise affect the user’s security context or perform a sensitive action.

Let’s review the three vulns:

  • Ability to forge a request to a resource, considered “non-obvious” because the resource doesn’t affect a security context or execute a sensitive action.
  • Presence of HTML injection, HTTP Response Splitting, or other clever code injection vulnerability in said resource.
  • Presence of Local File Inclusion.

Using XSS to upload a file isn’t a vulnerability. There’s nothing that says JavaScript within the Same Origin Rule (under which the XSS falls once it’s reflected) can’t use XHR to POST data to a file upload form. It also doesn’t matter if the file upload form has CSRF tokens because the code is executing under the Same Origin Rule and therefore has access to the tokens.

I think these two recommendations would be made by all and accepted by the site developers as necessary:

  • Fix the XSS vulnerability using recommended practices (let’s just assume the arg variable is just reflected in xss_inject.page)
  • Fix the Local File Inclusion (by verifying file content, forcing MIME types, not making the file readable, not using PHP at all, etc.)

But it was CSRF that started us off on this attack scenario. This leads to the question of how the “non-obvious” CSRF should be reported, especially from an automation perspective:

  • Is a non-obvious CSRF vulnerability actually obvious if the resource has another vuln (e.g. XSS)? Does the CSRF become non-reportable once the other vuln has been fixed?
  • Should a non-obvious CSRF vulnerability be obvious anyway if it has a query string (or form fields, etc.) that might be vulnerable?

If you already believe CSRF countermeasures should be on every page, then clearly you would have already marked the example vulnerable just by inspection because it didn’t have an explicit countermeasure. But what about those who don’t follow the absolutist prescription of CSRF countermeasures everywhere? (For performance reasons, or because the resource doesn’t affect the user’s state or security context.)

Think about pages that use “referrer” arguments. For example:

http://web.site/redir.page?url=http://from.here

In addition to possibly being an open redirect, these are prime targets for XSS with payloads like

http://web.site/redir.page?url=javascript:nasty_stuff()

It seems that in these cases the presence of CSRF just serves to increase the XSS risk rather than be a vuln on its own. Otherwise, you risk producing too much noise by calling any resource with a query string vulnerable. In this case CSRF provides a rejoinder to the comment, “That’s a nice reflected XSS, but you can only hack yourself with it. So what.” Without the XSS vuln you probably wouldn’t waste time protecting that particular resource.
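For the redirect examples above, one way to close off the javascript: payload is to validate the url argument before using it. This is a minimal sketch, assuming a hypothetical `is_safe_redirect` helper and an allow list of hosts chosen by the site; it is not a complete defense against open redirects.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_safe_redirect(url, allowed_hosts):
    """Accept only http(s) URLs whose host is on the site's allow list."""
    parts = urlsplit(url.strip())
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False  # rejects javascript:, data:, and scheme-less values
    return parts.hostname in allowed_hosts
```

With this check, `http://from.here` passes only if `from.here` is on the allow list, while `javascript:nasty_stuff()` is rejected on its scheme alone.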

Look at a few of the other WhiteHat examples. They clearly fall into CSRF (changing shipping address, password set/reset mechanisms) with no doubt successful exploits could be demonstrated.

What’s interesting is that they seem to require race conditions or to happen during specific workflows to be successful4, e.g. execute the CSRF so the shipping address is changed before the transaction is completed. That neither detracts from the impact nor obviates it as a vulnerability. Instead, it highlights a more subtle aspect of web security: state management.

Let’s set aside malicious attackers and consider a beneficent CSRF donor. Our scenario begins with an ecommerce site. The victim, a lucky recipient in this case, has selected an item and placed it into a virtual shopping cart.

1) The victim (lucky recipient!) fills out a shipping destination.
2) The attacker (benefactor!) uses a CSRF attack to apply a discount coupon.
3) The recipient supplies a credit card number.
4) Maybe the web site is really bad and the benefactor knows that the same coupon can be applied twice. A second CSRF applies another discount.
5) The recipient completes the transaction.
6) Our unknown benefactor looks for the new victim of this CSRF attack.

I chose this Robin Hood-esque scenario to take your attention away from the malicious attacker/victim formula of CSRF to focus on the abuse of workflows.

A CSRF countermeasure would have prevented the discount coupon from being applied to the transaction, but that wouldn’t fully address the underlying issues here. Consider the state management for this transaction.

One problem is that the coupon can be applied multiple times. During a normal workflow the site’s UI leads the user through a check-out sequence that must be followed. On the other hand, if the site only prevented users from revisiting the coupon step in the UI, then the site’s developers have forgotten how trivial it is to replay GET and POST requests. This is an example of a state management issue where an action that should be performed only once can be executed multiple times.

A less obvious problem of state management is the order in which the actions were performed. The user submitted a discount coupon in two different steps: right after the shipping destination and right after providing payment info. In the UI, let’s assume the option to apply a discount shows up only after the user provides payment information. A strict adherence to this transaction’s state management should have rejected the first discount coupon since it arrived out of order.

Sadly, we have to interrupt this thought to address real-world challenges of web apps. I’ve defined a strict workflow as (1) shipping address required, (2) payment info required, (3) discount coupon optional, (4) confirm transaction required. A site’s UI design influences how strictly these steps will be enforced. For example, the checkout process might be a single page that updates with XHR calls as the user fills out each section in any order. Conversely, this single page checkout might enable each step as the user completes them in order.

UI enforcement cannot guarantee that requests be made in order. This is where decisions have to be made regarding how strictly the sequence is to be enforced. It’s relatively easy to have a server-side state object track these steps and only update itself for requests in the correct order. The challenge is keeping the state flexible enough to deal with users who abandon a shopping cart, or decide at the last minute to add an extra widget before confirming the transaction, or a multitude of other actions that affect the state. These aren’t insurmountable challenges, but they induce complexity and require careful testing. This trade-off between coarse state management and granular control leads more to a balance of software correctness rather than security. You can still have a secure site if steps can be performed in order of 3, 1, 2, 4 rather than the expected 1, 2, 3, 4.
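A server-side state object for the strict workflow described above could be as small as the following sketch. The class name, step names, and prerequisite table are hypothetical; a real checkout would also need to handle abandoned carts and last-minute cart changes, which is where the complexity comes from.

```python
class CheckoutState:
    """Server-side record of which checkout steps a session has completed."""

    # Steps that must already be done before each step is accepted.
    REQUIRED_BEFORE = {
        "shipping": set(),
        "payment": {"shipping"},
        "coupon": {"shipping", "payment"},   # optional step
        "confirm": {"shipping", "payment"},
    }

    def __init__(self):
        self.done = set()

    def apply(self, step):
        """Accept a step only if it is known, in order, and not a replay."""
        if step not in self.REQUIRED_BEFORE:
            return False  # unknown step
        if "confirm" in self.done:
            return False  # transaction already closed
        if step == "coupon" and "coupon" in self.done:
            return False  # a coupon may be applied only once
        if not self.REQUIRED_BEFORE[step] <= self.done:
            return False  # request arrived out of order
        self.done.add(step)
        return True
```

Replaying the Robin Hood scenario against this object, the coupon sent right after the shipping step is rejected as out of order, and the second coupon is rejected as a repeat, regardless of whether either request was forged.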

CSRF is about requests made in the victim’s session context (by the victim’s browser) on behalf of the attacker (initiated from an unrelated domain) without the victim’s interaction. If a link, iframe, image tag, JavaScript, etc. causes the victim’s browser to make a request that affects that user’s state in another web site, then the CSRF attack succeeded. The conceptual way to fix CSRF is to identify forged requests and reject them. CSRF tokens are intended to identify legitimate requests because they’re a shared secret between the site and the user’s browser. An attacker who doesn’t know the secret can’t forge a legitimate request.
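One common way to build such a shared secret is to derive the token from the session with a keyed hash. The sketch below assumes this HMAC-based design and hypothetical names (`SERVER_KEY`, `issue_token`, `is_legitimate`); it is one possible construction, not the only one.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret, generated here for the example.
SERVER_KEY = secrets.token_bytes(32)


def issue_token(session_id):
    """Derive a token bound to one session; embed it in that user's forms."""
    return hmac.new(SERVER_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def is_legitimate(session_id, submitted_token):
    """Check a submitted token in constant time against the expected value."""
    expected = issue_token(session_id).encode()
    return hmac.compare_digest(expected, (submitted_token or "").encode())
```

A request forged from another domain carries the victim's session cookie but not the token, so `is_legitimate` fails unless the attacker can read the page, which is what XSS or sniffing plain HTTP would allow.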

What this discussion of CSRF attacks has highlighted is the soft underbelly of web sites’ state management mechanisms.

Automated scanners should excel at scalability and consistent accuracy, but woe to those who believe they fully replace manual testing. Scanners find implementation errors (forgetting to use a prepared statement for a SQL query, not filtering angle brackets, etc.), but they don’t have the capacity to understand fundamental design flaws. Nor should they be expected to. Complex interactions are best understood and analyzed by manual testing. CSRF stands astride the gap between automation and manual testing. Automation identifies whether a site accepts forged requests, whereas manual testing can delve deeper into underlying state vulnerabilities or chains of exploits that CSRF might enable.

=====

[1] Using the session cookie as a token runs counter to recommendations that session cookies have the HttpOnly attribute so their value is not accessible via JavaScript. HttpOnly is intended to mitigate some XSS exploits. However, the presence of XSS basically negates any CSRF countermeasure since the XSS payload can perform requests in the Same Origin without having to resort to forged, off-origin requests.

[2] A chicken-and-egg problem since there needs to be enough adoption of browsers that include the Origin header for such a countermeasure to be useful without rejecting legitimate users. Although such rejection would be excellent motivation for users to update old, likely insecure, and definitely less secure browsers.

[3] https://blog.whitehatsec.com/whitehat-security’s-approach-to-detecting-cross-site-request-forgery-csrf/

[4] These particular attacks are significantly easier on unencrypted Wi-Fi networks with sites that don’t use HTTPS, but in that case there are a lot of different attacks. Plus, the basis of CSRF tokens is that they are confidential to the user’s browser — not the case when sniffing HTTP.

Advanced Persistent Ignorance

The biggest threat to modern web applications is developers who exhibit Advanced Persistent Ignorance. Developers rely on all sorts of APIs to build complex software. This one makes code insecure by default. API is the willful disregard of simple, established security designs.

First, we must step back into history to establish a departure point for ignorance. This is just one of many. Almost seven years ago on July 13, 2004 PHP 5.0.0 was officially released. Importantly, it included this note:

A new MySQL extension named MySQLi for developers using MySQL 4.1 and
later. This new extension includes an object-oriented interface in addition
to a traditional interface; as well as support for many of MySQL’s new
features, such as prepared statements.

Of course, any new feature can be expected to have bugs and implementation issues. Even with an assumption that serious bugs would take a year to be worked out, that means PHP has had a secure database query mechanism for the past six years.1

The first OWASP Top 10 list from 2004 mentioned prepared statements as a countermeasure.2 Along with PHP and MySQL, .NET and Java supported these, as did Perl (before its popularity was subsumed by buzzword-building Python and Ruby On Rails). In fact, PHP and MySQL trailed other languages and databases in their support for prepared statements.

SQL injection itself predates the first OWASP Top 10 list by several years. One of the first summations of the general class of injection attacks was the 1999 Phrack article, Perl CGI problems.3 SQL injection was simply a specialization of these problems to database queries.

So, we’ve established the age of injection attacks at over a dozen years old and reliable countermeasures at least six years old. These are geologic timescales for the Internet.4

There’s no excuse for SQL injection vulnerabilities to exist in 2011.

It’s not a forgivable coding mistake anymore. Coding mistakes most often imply implementation errors: bugs due to typos, forgetfulness, or syntax. Modern SQL injection vulns are a sign of bad design. For six years, prepared statements have offered a means of establishing a fundamentally secure design for database queries. It takes actual effort to make them insecure. SQL injection attacks could still happen against a prepared statement, but only due to egregiously poor code that shouldn’t pass a basic review. (Yes, yes, stored procedures can be broken, too. String concatenation happens all over the place. Nevertheless, writing an insecure stored procedure or prepared statement should be more difficult than writing an insecure raw SQL statement.)
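The difference is easy to demonstrate. The sketch below uses Python's built-in sqlite3 module as a stand-in for the PHP and MySQLi pairing discussed above; the table, the function names, and the payload are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")]
)


def find_unsafe(name):
    # String concatenation: the input becomes part of the SQL grammar.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(name):
    # Placeholder: the input is bound as data and never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
```

Here `find_unsafe(payload)` returns every row in the table, while `find_safe(payload)` returns nothing because no user has that literal name.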

Maybe one of the two billion PHP hobby projects on Sourceforge could be expected to still have these vulns, but not real web sites. And, please, never in sites for security firms. Let’s review the previous few months:

November 2010, military web site.

December 2010, open source code repository web site.

February 2011, HBGary Federal. Sauron’s inept little brother. You might have heard about this one.

February 2011, Dating web site.

March 2011, MySQL.com. Umm…speechless. Let’s move on.

April 2011, web security firm Barracuda (and this)

Looking back on the list, you might first notice that The Register is the xssed.org of SQL injection vulns. (That is, in addition to a fine repository of typos and pun-based innuendos. I guess they’re just journalists after all; hackers don’t bother with such subtleties.)

The list will expand throughout 2011.

XSS is a little more forgivable, though no less embarrassing. HTML injection flaws continue to plague sites because of implementation bugs. There’s no equivalent of the prepared statement for building HTML or HTML snippets. This is why the vuln remains so pervasive: No one has figured out the secure, reliable, and fast way to build HTML with user-supplied data. This doesn’t imply that attempting to do so is a hopeless cause. On the contrary, JavaScript libraries can reduce these problems significantly.

For all the articles, lists, and books published on SQL injection one must assume that developers are being persistently ignorant of security concepts to such a degree that five years from now we may hear yet again of a database hack that disclosed unencrypted passwords.

If you’re going to use performance as an excuse for avoiding prepared statements then either you haven’t bothered to measure the impact or you haven’t understood how to scale web architectures, and you might as well turn off HTTPS for the login page so you can get more users logging in per second. If you have other excuses for avoiding database security, ask yourself if it takes longer to write a ranting rebuttal or a wrapper for secure database queries.

There may in fact be hope for the future. The rush to scalability and the pious invocation of “Cloud” have created a new beast of NoSQL data stores. These NoSQL databases typically just have key-value stores with grammars that aren’t so easily corrupted by a stray apostrophe or semi-colon in the way that traditional SQL can be corrupted. Who knows, maybe security conferences will finally do away with presentations on yet another SQL injection exploit and find someone with a novel NoSQL injection vulnerability.

Advanced Persistent Ignorance isn’t limited to SQL injection vulnerabilities. It has just spectacularly manifested itself in them. There are many unsolved problems in information security, but there are also many mostly-solved problems. Big unsolved problems in web security are password resets (overwhelmingly relying on e-mail) and using static credit card numbers to purchase items.

SQL injection countermeasures are an example of a mostly-solved problem. Using prepared statements isn’t 100% secure, but it makes a significant improvement. User authentication and password storage are another area of web security rife with errors. Adopting a solution like OpenID can reduce the burden of security around authentication. As with all things crypto-related, using well-maintained libraries and system calls is far superior to writing your own hash function or encryption scheme.

Excuses that prioritize security last in a web site design miss the point that not all security has to be hard. Nor does it have to impede usability or speed of development. Crypto and JavaScript libraries provide high-quality code primitives to build sites. Simple education about current development practices goes just as far. Sometimes the state of the art is actually several years old — because it’s been proven to work.

The antidote to API is the continuous acquisition of knowledge and experience. Yes, you can have your cake and eat it, too.

=====

1 MySQL introduced support for prepared statements in version 4.1, which was first released April 3, 2003.
2 https://www.owasp.org/index.php/A6_2004_Injection_Flaws
3 http://www.phrack.com/issues.html?issue=55&id=7#article
4 Perhaps a dangerous metaphor since here in the U.S. we still have school boards and prominent politicians for whom a complete geologic time spans a meager 4,000 years. Maybe some developers enjoy using ludicrous design patterns.

Stop Building HTML on the Server

Many web sites conceptually fit into the Model-View-Controller (MVC) design pattern despite (or often in spite of) the site’s actual code design. The server pieces together user data, state data, and other bits of HTML to send to the browser. One of the biggest frustrations with automating web app testing is dealing with poorly written web applications, especially the “View” — the HTML to be shown in the browser. A brief list of complaints includes: HTML with typos, outright invalid HTML, HTTP headers omitted, headers incorrect, strange state mechanisms, and weird case sensitivity issues.

Blackbox scanning fundamentally interacts only with content (HTML) and transport (HTTP and its accompanying statefulness as created by cookies). A scanner primarily bases its understanding of a web application on the site’s conceptual View. Whitebox testing, e.g. source code scanning, has more insight into the application’s functionality and can therefore more easily find vulns in our conceptually defined Model and Controller components that reside on the server.

HTML injection (a.k.a. XSS) is probably the best example of vulnerabilities that arise due to View problems. For those few still unfamiliar with the term: HTML injection occurs when data to be displayed in a web page ends up modifying the page’s underlying structure. The structure is referred to as the Document Object Model (DOM). For example, a text node expecting to hold the user’s first name may unintentionally turn into a script node when the user’s name becomes “<script>alert(0)</script>” instead of “Trudy”.

The DOM also changes when characters like extraneous quotes appear in the syntax of href or input value attributes, for example:

<a href="http://web.site/search.page?s=cars"onMouseover=alert(0);a=">Search again</a>

Zip Code:<input name=zip_code value='90210'onMouseover=alert(0);a='>

In the two previous cases the data, a search string and a zip code, changed the page’s structure by adding an onMouseover attribute (with innocuous payload, yawn) to the anchor and input tags. The data wasn’t even intended to be displayed to the user, yet it is used in a way that pollutes the View.
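One way to keep such data from changing the page’s structure is to encode it for the attribute context before writing it into the markup. The sketch below is an assumption-laden illustration, not a complete defense: escapeHtmlAttribute is a hypothetical helper, it assumes the attribute value is always quoted, and data placed inside an href would additionally need percent-encoding.

```javascript
// Hypothetical helper: encode the characters that can close a quoted
// attribute value or open a new tag.
function escapeHtmlAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first to avoid double-encoding
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The zip code payload from the example above.
const zip = "90210'onMouseover=alert(0);a='";

// The quotes in the payload are encoded, so they can no longer end the
// value attribute and introduce an onMouseover attribute.
const html = "<input name=\"zip_code\" value='" + escapeHtmlAttribute(zip) + "'>";
console.log(html);
```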

Ah, so we’ve finally come back to that concept: the View. Now we get to mention frameworks. Specifically, JavaScript-based browser development frameworks like Dojo, Ext JS, Prototype, and YUI. These frameworks help move the uninteresting part of the UI Controller into the browser. In other words, the server doesn’t have to care about the visual effects of dragging and dropping an email message between folders, displaying data in a tree view, sorting a list, or other interactions that take place in the browser.

JavaScript frameworks enable the site’s UI (its View) to be well-separated from the server-side processing. It should follow that the web site can create a simpler, stricter set of functions that operate on well-defined data.

In other words, the server exposes APIs that respond to very small, atomic requests with very small, atomic responses. The server no longer has to deal with writing complete HTML pages for every response. This doesn’t absolve the server from dealing with encoding issues, but it centralizes the problem more cleanly because functions can be better focused on a single purpose. For example, a drag and drop of an email between folders in a browser requires an API call that verifies access to the message, its source and destination, and updates the message’s state. The server can respond as simply as “succeeded” or “failed” and leave the HTML updating to the browser’s JavaScript library.
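As a rough sketch of what such an atomic API call might look like, consider the drag-and-drop example. Everything here is hypothetical: the in-memory messages and folders maps stand in for real storage, and moveMessage is an invented name. The point is only that the function checks access, source, and destination, updates state, and returns a tiny status instead of HTML.

```javascript
// Hypothetical in-memory data; a real site would use its own storage layer.
const messages = new Map([
  [101, { owner: "trudy", folder: "inbox" }],
]);
const folders = new Map([
  ["trudy", new Set(["inbox", "archive"])],
]);

// One small, atomic operation: verify access to the message, its source,
// and its destination, then update the message's state.
function moveMessage(user, messageId, fromFolder, toFolder) {
  const message = messages.get(messageId);
  const userFolders = folders.get(user);
  if (!message || message.owner !== user) return { status: "failed" };
  if (message.folder !== fromFolder) return { status: "failed" };
  if (!userFolders || !userFolders.has(toFolder)) return { status: "failed" };
  message.folder = toFolder;
  return { status: "succeeded" };
}

// The owner may move the message; another user may not.
console.log(JSON.stringify(moveMessage("trudy", 101, "inbox", "archive")));
console.log(JSON.stringify(moveMessage("mallory", 101, "archive", "inbox")));
```

The browser’s JavaScript library then decides how to redraw the folder list based on the status it receives.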

Design vs. Implementation

Adopting a browser-side framework entails a significant amount of design effort (in terms of both visual style and software). This design effort can have several positive side-effects, including a reduction in the kinds of implementation errors that lead to HTML injection vulnerabilities. For example, the server can concern itself with ensuring a consistent encoding format for all output and the browser can display that output using well-defined, centralized functions. This is a far cry from ad-hoc HTML generation on the server. While software will always suffer from typos and bugs, a framework at least provides the tools to make a web site more secure by default.

There are some other areas where frameworks quite easily improve design security: CSRF and click-jacking countermeasures. Several frameworks have built-in countermeasures for these two types of attacks. Keep in mind that the countermeasures may not provide complete protection, but the defenses they provide are infinitely better than zero if your site currently lacks any defense and most likely better than a home-grown countermeasure based on an incomplete understanding of the problem.1 After all, the framework’s countermeasures have been designed and tested by a large group of developers. Some of them even know a thing or two about security.

Right now frameworks largely rely on JavaScript-based tricks to block CSRF and click-jacking rather than the more reliable header-based ones that recent browsers provide. Of course, it’s a bit early to expect the majority of users’ browsers to include the Origin header2 so this still favors frameworks. (The Origin header is a more reliable version of the Referer header and as such can thwart CSRF attacks. And before you write in with nifty counterexamples of all the ways the Origin header can fail and shouldn’t be trusted, make sure your example doesn’t require an HTML injection, a.k.a. XSS, exploit as well; that’s a different problem.) Similarly, web sites can add click-jacking protection by setting the X-Frame-Options header3, but the number of users with browsers that would benefit is still limited.
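For the header-based approach, a server-side check might look something like the following sketch. The TRUSTED_ORIGINS list and both function names are hypothetical, and the headers argument is assumed to be a plain object with lowercase names (the shape Node’s http module uses for request headers). DENY is one of the values defined for X-Frame-Options.

```javascript
// Hypothetical allow-list; replace with the site's own origins.
const TRUSTED_ORIGINS = new Set(["https://web.site"]);

// `headers` is assumed to be a plain object keyed by lowercase header name.
function isTrustedOrigin(headers) {
  const origin = headers["origin"];
  // Browsers that don't send Origin need a fallback decision (for example
  // a CSRF token); this sketch simply rejects such requests.
  if (typeof origin !== "string") return false;
  return TRUSTED_ORIGINS.has(origin);
}

// Response header asking supporting browsers not to render the page in a frame.
function antiFramingHeaders() {
  return { "X-Frame-Options": "DENY" };
}

console.log(isTrustedOrigin({ origin: "https://web.site" }));
console.log(isTrustedOrigin({ origin: "http://evil.lair" }));
console.log(antiFramingHeaders());
```

Note the exact comparison against complete origin strings; a substring or unanchored pattern match would reintroduce the kind of bypass these headers are meant to prevent.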

Frameworks do not always reduce design errors. Obviously, they have no bearing on server-side design or implementation. User impersonation, data access controls, and authorization will not magically improve. A richer UI might introduce more complex interactions between data and user objects, which implies more chances for security failures.

JavaScript frameworks make the automated analysis of sites more difficult. Web application scanners have trouble testing them and even manually-aided tools like Selenium run into problems with dynamically changing DOMs. We’ll cover those issues in more detail in a separate post. Until then, consider introducing a browser development framework into your site in order to better consolidate server-side processing to a discrete set of APIs and leave HTML manipulation to the browser.

=====
1 You could always start with a book.
2 https://wiki.mozilla.org/Security/Origin
3 http://blogs.msdn.com/b/ie/archive/2009/01/27/ie8-security-part-vii-clickjacking-defenses.aspx

Ignore the OWASP Top 10 in Favor of Mike’s Top 10

Skip the stagnant list of attacks and weaknesses in favor of an approach that deals with the design and implementation of your site. Many items relate to a site’s use of HTTP and HTML. Neither of these arrived in the 21st century. In fact, they date back to the ’90s. HTML 4.01 was made official in December 1999, when Jessica Simpson and Whitney Houston had hits on the Billboard Top 10, and The Green Mile and Galaxy Quest were in the Top 10 grossing movies. (I realize these references reinforce the mighty U.S. cultural hegemony, but the U.K. album charts for that month included Boyzone and Westlife so you’ll have to forgive me.)

But let’s get back to the OWASP Top 10. There’s an implication here that some compelling reason exists for ignoring it. Simply put, it’s uninteresting over time and unhelpful for more than nomenclature.

If you love the OWASP Top 10, don’t bother reading the rest of this post. Skip to the comments section and start typing. The curious may read on. The impatient may skip to the last paragraph.

The OWASP Top 10 list originated as a clone of the SANS Top 10 for Windows and Unix vulnerabilities (ostensibly the most popular ways those operating systems were compromised). The list made an initial mistake of putting itself forward as a standard, which encouraged adoption without comprehension — taking the list as a compliance buzzword rather than a security starting point. The 2010 update mercifully raises red flags about the danger of falling into the trap of myopic adherence to the list.

The list has a suspicious confirmation bias such that popular, trendy vulnerabilities rise to the top, possibly because that’s what researchers are looking for. And these researchers are coming from a proactive rather than forensics perspective of web security, meaning that they rely on vulnerabilities discovered in a web site vs. vulnerabilities actively exploited to compromise a web site.

Two of the list’s metrics, Prevalence and Detectability, appear curiously correlated. A vulnerability that’s easy to detect (e.g. cross-site scripting) has a widespread prevalence. I question the causality of this relationship: Are they widespread because they’re easy to detect? This arises because an entry like A7: Insecure Cryptographic Storage has a difficult detectability and (therefore?) uncommon prevalence. Yet the last few months marked clear instances of web sites that stored users’ passwords with no salt and poor encryption [1][2]. This seems to reinforce the idea that the list is also biased towards a blackbox perspective of web applications at the expense of viewpoints from developers and architects.

Six of the list’s entries have easy detectability. This seems strange. If more than half of the risks to a web application are easy to find, why do these problems remain so widespread that site owners can’t squash them? The list of well-funded, well-established applications that have had HTML injection problems has more than a few familiar names: Amazon, eBay, Facebook, GMail, Paypal, and Twitter [3]. Maybe detectability isn’t so easy when dealing with large, complex systems.

If you’ve ever “tested for the OWASP Top 10” or asked a vendor if their scanner “tests for the OWASP Top 10” then you’ve tested the implementation of that web site, but not necessarily its design.

Web application vulnerabilities fall roughly into two categories: design errors and implementation errors. A great example of a design error is cross-site request forgery (CSRF). CSRF exploits the natural, expected way a browser submits requests from the iframe and img tags that appear in HTML. It was manifest from the first form tag in the ’90s, but didn’t reach critical mass (i.e. inclusion on the Top 10) until 2007. SQL injection was another design error: the ever-insecure commingling of code and data. (SQL injection occurs because the grammar of a database query could be modified by the data used by the query.)

These design problems inspired efforts to create viable solutions. Naive solutions blacklisted the common characteristics of an exploit, but such approaches dealt with implementation rather than design. Superior solutions introduced new concepts, the Origin header for CSRF and prepared statements for SQL injection, that enabled developers to address the systemic cause of the vulnerabilities rather than paper over the bugs that allowed them to emerge on the site.

Alas, not every vulnerability gets the secure by design treatment. HTML injection (known as cross-site scripting in the Vulgar Latin) attacks seem destined to be the ever-living cockroaches of the web. Where SQL injection combined code and data in the form of database queries, HTML injection combines user-supplied data with HTML. No one has yet created a reliable “prepared statement/parameterized query” equivalent for updating the DOM.

This doesn’t mean implementation errors can’t be dealt with: Coding guidelines provide secure alternatives to common patterns, frameworks enable consistent usage of recommended techniques, automated scanners provide a degree of verification.

This delineation of design and implementation plays second fiddle to the Siren’s Song of the OWASP Top 10. (Anecdote alert.) All too often the phrase, “Does it scan for the OWASP Top 10?” or “How does this compare to the OWASP Top 10?” arises when discussing a scanner’s capability or the outcome of a penetration test. Not exactly the list’s fault, but the inquirer’s. Nevertheless, the list has become a beast that drives web security in spite of the fact that applications have narrowly-defined, easy-to-detect problems like HTML injection (XSS) as well as widely-defined, broad, combined concepts like Broken Authentication and Session Management. After all, twenty years into the web, sites still ask for an email address and password to authenticate users.

All software has bugs and bugs lead to implementation problems like HTML injection or forgetting to use a prepared statement for a SQL query. Such problems are usually independent of the site’s design. On the other hand, the security of authentication and session management has strong ties to a site’s design. Yet a scanner or penetration test that can’t execute a login bypass via SQL injection or brute force an administrator’s password doesn’t imply the site’s design is strong or correct; it simply means those attacks failed.

Maybe this misinterpretation of the OWASP Top 10 was already obvious: A scanner or penetration test that goes through the list verifies that common attacks can or cannot compromise the site, but strict adherence to the list doesn’t mean that the site is well-designed.

Fine, you say, complain about the rot in Denmark, but where’s the better replacement?

The Common Weakness Enumeration [4] (CWE) provides an excellent alternative to the stale OWASP list. To quote directly from its site, CWE “provides a unified, measurable set of software weaknesses that is enabling more effective discussion, description, selection, and use of software security tools and services that can find these weaknesses in source code and operational systems as well as better understanding and management of software weaknesses related to architecture and design.”

Several of the weaknesses aren’t even specific to web applications, but they’ve clearly informed attacks against web applications. CSRF evolved from the “Confused Deputy” described in 1988 [5]. SQL injection and HTML injection have ancestors in Unix command injection used against SMTP and finger daemons, to name just two. Pointing developers to these concepts provides a richer background on security.

If you care about your site’s security, engage your developers and security team in the site’s design and architecture. Use tools or manual testing to verify its implementation. Refer to the CWE for coding guidelines. And keep the OWASP Top 10 list around as an artifact of the types of vulnerabilities that interest the security community (a different community, one might guess, from the attackers targeting web sites).

As a final experiment, invert the sense of the attacks and weaknesses to a prescriptive list of Mike’s Top 10 and see how that influences developers:

M1. Validate data from the client.
M2. Sanitize data for the appropriate context when displayed in the client.
M3. Apply strong authentication schemes.
M4. Use strong PRNG and high entropy when access control relies on knowledge of a value.
M5. Enforce strong session management and workflows (and check the Origin header).
M6. Configure the platform securely.
M7. Use established, recommended cryptographic systems and techniques.
M8. Apply authorization checks consistently.
M9. Turn on SSL wherever possible.
M10. Restrict data to expected values.

=====

[1] http://www.theregister.co.uk/2010/12/13/gawker_hacked/
[2] http://www.theregister.co.uk/2011/01/21/trapster_website_hack/
[3] http://xssed.org/
[4] http://cwe.mitre.org/
[5] http://portal.acm.org/citation.cfm?id=871709

Regex-based security filters sink without anchors

On June 7th the Stanford Web Security Research Group released a study of clickjacking countermeasures employed across Alexa Top-500 web sites. It’s an excellent survey of different approaches taken by web developers to prevent their sites from being framed (i.e. subsumed by an <iframe> tag). To better understand the dangers of framing pages, read the paper or check out Chapter Two of The Book.

One interesting point emphasized in the paper is how easily regular expressions can be misused or misunderstood as security filters. Regexes can be used to create positive or negative security models — either match acceptable content (whitelisting) or match attack patterns (blacklisting). Inadequate regexes lead to more vulnerabilities than just clickjacking.

One of the biggest mistakes made in regex patterns is leaving them unanchored. Anchors determine the span of a pattern’s match against an input string. The ‘^’ anchor matches the beginning of a line. The ‘$’ anchor matches the end of a line. (Just to confuse the situation, when ‘^’ appears as the first character inside the square brackets of a character class it indicates negation, e.g. ‘[^a]+’ means match one or more characters that are not ‘a’.)

Consider the example of nytimes.com’s document.referrer check as shown in Section 3.5 of the Stanford paper. The weak regex is the pattern passed to match():

if(window.self != window.top &&
   !document.referrer.match(/https?:\/\/[^?\/]+\.nytimes\.com\//))
{
    top.location.replace(
        window.location.pathname);
}

As the authors point out (and anyone who is using regexes as part of a security or input validation filter should know), the pattern is unanchored and therefore easily bypassed. The site developers intended to check the referrer for links like these:

http://www.nytimes.com/
https://www.nytimes.com/
http://www.nytimes.com/auth/login
http://firstlook.blogs.nytimes.com/

Since the pattern isn’t anchored, it will look through the entire input string for a match, which leaves the attacker with a simple bypass technique. In the following example, the pattern matches the http://www.nytimes.com/ at the end of the query string, which is clearly not the developers’ intent:

http://evil.lair/clickjack.html?a=http://www.nytimes.com/

The devs wanted to match a URI whose domain included “.nytimes.com”, but the pattern would match anywhere within the referrer string.

The regex would be improved by requiring the pattern to begin at the first character of the input string. The new, anchored pattern would look more like this:

^https?:\/\/[^?\/]+\.nytimes\.com\/
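The difference is easy to check by running both patterns against the referrers discussed above. This is just a quick sketch; the variable names are mine, while the patterns and URLs are the ones from this post.

```javascript
// The original pattern and the anchored version from above.
const unanchored = /https?:\/\/[^?\/]+\.nytimes\.com\//;
const anchored = /^https?:\/\/[^?\/]+\.nytimes\.com\//;

const referrers = [
  "http://www.nytimes.com/",
  "https://www.nytimes.com/",
  "http://www.nytimes.com/auth/login",
  "http://firstlook.blogs.nytimes.com/",
  "http://evil.lair/clickjack.html?a=http://www.nytimes.com/",
];

// Print each referrer with the result of the unanchored and anchored tests.
// Only the last referrer should differ between the two patterns.
for (const referrer of referrers) {
  console.log(referrer, unanchored.test(referrer), anchored.test(referrer));
}
```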

The same concept applies to input validation for form fields and URI parameters. Imagine a web developer, we’ll call him Wilberforce for alliterative purposes, who wishes to validate zip codes submitted in credit card forms. The simplest pattern would check for five digits, using any of these approaches:

[0-9]{5}
\d{5}
[[:digit:]]{5}

At first glance the pattern works. Wilberforce even tests some basic XSS and SQL injection attacks with nefarious payloads like <script src=...> and 'OR 19=19. The regex rejects them all.

Then our attacker, let’s call her Agatha, happens to come by the site. She’s a little savvier and, whether or not she knows exactly what the validation pattern looks like, tries a few malicious zip codes (each one contains a run of five digits):

90210'
<script>alert(0x42)<script>57732
10118<script>alert(0x42)<script>

Poor Wilberforce’s unanchored pattern finds a matching string in all three cases, thereby allowing the malicious content through the filter and enabling Agatha to compromise the site. If the pattern had been anchored to match the complete input string from beginning to end then the filter wouldn’t have failed so spectacularly:

^\d{5}$
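Here is a small sketch of Wilberforce’s filter before and after the fix, run against Agatha’s inputs exactly as listed above. The names looseZip and strictZip are made up for the example.

```javascript
const looseZip = /\d{5}/;    // matches five digits anywhere in the input
const strictZip = /^\d{5}$/; // the whole input must be exactly five digits

const attempts = [
  "90210'",
  "<script>alert(0x42)<script>57732",
  "10118<script>alert(0x42)<script>",
];

// The loose pattern accepts every attempt; the strict one rejects them all.
for (const attempt of attempts) {
  console.log(attempt, looseZip.test(attempt), strictZip.test(attempt));
}

// A legitimate zip code passes both filters.
console.log("90210", looseZip.test("90210"), strictZip.test("90210"));
```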

Unravelling Strings

Even basic string-matching approaches can fall victim to the unanchored problem; after all they’re nothing more than regex patterns without the syntax for wildcards, alternation, and grouping. Let’s go back to the Stanford paper for an example of walmart.com’s document.referrer check based on a JavaScript String object’s indexOf function. This function returns the first position in the input string of the argument or -1 in case the argument isn’t found:

if(top.location != location) {
    if(document.referrer &&
       document.referrer.indexOf("walmart.com") == -1)
    {
        top.location.replace(document.location.href);
    }
}

Sigh. As long as the document.referrer contains the string “walmart.com” the anti-framing code won’t trigger. For Agatha, the bypass is as simple as putting her booby-trapped clickjacking page on a site with a domain name like “walmart.com.evil.lair” or maybe using a URI fragment, http://evil.lair/clickjack.html#walmart.com. The developers neglected to ensure that the host from the referrer URI ends in walmart.com rather than merely contains walmart.com.

The previous sentence is very important. Read it again. The referrer string isn’t supposed to end in walmart.com, the referrer’s host is supposed to end with that domain. That’s an important distinction considering the bypass techniques we’ve already mentioned:

http://walmart.com.evil.lair/clickjack.html
http://evil.lair/clickjack.html#walmart.com
http://evil.lair/clickjack.html?a=walmart.com
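A quick sketch shows why all three strings sail past the substring test. The function name containsWalmart is hypothetical; it simply isolates the indexOf condition from the original check so it can be run on its own.

```javascript
// The original test in isolation: the referrer is treated as friendly
// whenever it merely contains the string "walmart.com".
function containsWalmart(referrer) {
  return referrer.indexOf("walmart.com") !== -1;
}

const bypasses = [
  "http://walmart.com.evil.lair/clickjack.html",
  "http://evil.lair/clickjack.html#walmart.com",
  "http://evil.lair/clickjack.html?a=walmart.com",
];

// Every attacker-controlled string passes, so the anti-framing code
// would never trigger for any of them.
for (const referrer of bypasses) {
  console.log(referrer, containsWalmart(referrer));
}
```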

Parsers Before Patterns

Input validation filters often require an understanding of a data type’s grammar. Sometimes this is simple, such as a five-digit zip code. More complex cases, such as email addresses and URIs, require that the input string be parsed before pattern matching is applied.

The previous indexOf string example failed because it doesn’t actually parse the referrer’s URI; it just looks for the presence of a string. The regex pattern in the nytimes.com example was superior because it at least tried to understand the URI grammar by matching content between the URI’s scheme (http or https) and the first slash (/)1.

A good security filter must understand the context of the pattern to be matched. The improved walmart.com referrer check is shown below. Notice that the get_hostname_from_url function now uses a regex to extract the host name from the referrer’s URI and the string comparison ensures the host name either exactly matches “walmart.com” or ends with “.walmart.com”. (You could quibble that the regex in get_hostname_from_url isn’t anchored, but in this case the pattern works because it’s not possible to smuggle malicious content inside the URI’s scheme. The pattern would fail if it returned the last match instead of the first match. And, yes, the typo in the comment in the killFrames function is in the original JavaScript.)

function killFrames() {
    if (top.location != location) {
        if (document.referrer) {
            var referrerHostname = get_hostname_from_url(document.referrer);
            var strLength = referrerHostname.length;
            if ((strLength == 11) && (referrerHostname != "walmart.com")){ // to take care of http://walmart.com url - length of "walmart.com" string is 11.
                top.location.replace(document.location.href);
            } else if (strLength != 11 && referrerHostname.substring(referrerHostname.length - 12) != ".walmart.com") { // length og ".walmart.com" string is 12.
                top.location.replace(document.location.href);
            }
        }
    }
}

function get_hostname_from_url(url) {
    return url.match(/:\/\/(.[^/?]+)/)[1];
}
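For comparison, here is a sketch of the same host check that leans on a URL parser instead of a hand-rolled regex. This is not the site’s code; isWalmartReferrer is a hypothetical name, and the sketch assumes a runtime that provides the standard URL constructor (current browsers and Node.js do).

```javascript
// Returns true when the referrer's host is walmart.com or a subdomain of it.
function isWalmartReferrer(referrer) {
  let hostname;
  try {
    // The parser separates the host from the path, query, and fragment.
    hostname = new URL(referrer).hostname;
  } catch (err) {
    return false; // not an absolute URL, so nothing to trust
  }
  return hostname === "walmart.com" || hostname.endsWith(".walmart.com");
}

console.log(isWalmartReferrer("http://www.walmart.com/cart"));
console.log(isWalmartReferrer("http://walmart.com.evil.lair/clickjack.html"));
console.log(isWalmartReferrer("http://evil.lair/clickjack.html?a=walmart.com"));
```

The leading dot in ".walmart.com" matters: without it, a host such as evilwalmart.com would pass the suffix test.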

Conclusion

Regexes and string matching functions are ubiquitous throughout web applications. If you’re implementing security filters with these functions, keep these points in mind:

Normalize the character set. Ensure the string functions and regex patterns match the character encoding, e.g. multi-byte string functions for multi-byte sequences.

Always match the entire input string. Anchor patterns to the start (‘^’) and end (‘$’) of input strings. If you expect input strings to include multiple lines, understand how multiline (?m) and single line (?s) flags will affect the pattern — if you’re not sure then treat it as a single line. Where appropriate to the context, the results of string matching functions should be tested to see if the match occurred at the beginning, within, or end of a string.

Prefer a positive security model over a negative one. Whitelist content that you expect to receive and reject anything that doesn’t fit. Whitelist filters should be as strict as possible to avoid incorrectly matching malicious content. If you go the route of blacklisting content, make the patterns as lenient as possible to better match unexpected scenarios — an attacker may have an encoding technique or JavaScript trick you’ve never heard of.

Consider a parser instead of a regex. If you want to match a URI attribute, make sure your pattern extracts the right value. URIs can be complex. If you’re trying to use regexes to parse HTML content…good luck.
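The point about multiline flags deserves a quick demonstration. In JavaScript the multiline option is written as a trailing m flag on the pattern; the sketch below uses the anchored zip code pattern from earlier, and the variable names are only illustrative.

```javascript
const singleLine = /^\d{5}$/; // anchors bind to the start and end of the input
const multiLine = /^\d{5}$/m; // anchors bind to the start and end of any line

// A valid zip code followed by a newline and a payload.
const input = "90210\n<script>alert(0x42)</script>";

// Without the m flag the whole input must be five digits, so this fails.
console.log(singleLine.test(input));

// With the m flag the first line alone satisfies the pattern, and the
// payload on the second line rides along.
console.log(multiLine.test(input));
```

Anchoring is necessary, then, but only effective when the anchors mean what you think they mean for the flags in use.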

Don’t shy away from regexes because their syntax looks daunting; just remember to test your patterns against a wide array of malicious and valid input strings.

=====
1 Technically, the pattern should match the host portion of the URI’s authority. Check out RFC 3986 for specifics, especially the regexes mentioned in Appendix B.