Posts

Electric Skillet Dec 11, 2010
Of John Brunner’s novels, I recommend reading Stand on Zanzibar first. It’s a well-known classic. Follow that with The Sheep Look Up. If you’re interested in novelty, The Squares of the City has the peculiar attribute of being written to the rules of a chess game (the book’s appendix maps each character’s role to its relevant piece).

Two of Brunner’s books contain computer security concepts and activities. The first one, The Shockwave Rider, was written in 1975 and is largely responsible for generating the concept of a worm. A character, Sandy, explains:

What you need is a worm with a completely different structure. The type they call a replicating phage.

The character continues with a short history of replicating phages, including one developed at a facility called Electric Skillet:

…and its function is to shut the net down and prevent it being exploited by a conquering army. They think the job would be complete in thirty seconds.

The main character, Nick Haflinger, creates a variant of the self-replicating phage. Instead of devouring its way towards the destruction of the net, the program grows off data as a virtual parthenogenetic tapeworm. Nick is a smart computer sabotage consultant (among other things). His creation “won’t expand to indefinite size and clog the net for other use. It has built-in limits.” No spoilers, but the tapeworm has a very specific purpose.

In his 1988 novel, Children of the Thunder, Brunner mentions a logic bomb as he introduces a freelance writer who had been covering a computer security conference. Brunner didn’t coin this term, though. Malicious insiders were creating logic bombs by 1985 at the latest¹, the concept was famously described by a computer scientist in 1984, and it was known in the late 70s² (including in a U.S. law covering cybercrime in 1979).

The history of the term is almost beside the point, because the whimsical nature of the fictional version deserves note³:

Two months ago a logic bomb had burst in a computer at British Gas, planted, no doubt, by an employee disgruntled about the performance of his or her shares, which resulted in each of its customers in the London area being sent the bill intended for the next person on the list – whereupon all record of the sums due had been erased.

A paragraph later we’re treated to a sly commentary embedded in the description of the newspaper that hired the journalist:

The paper…was in effect a news digest, aimed at people with intellectual pretensions but whose attention span was conditioned by the brevity of radio and TV bulletins, and what the [editor] wanted was a string of sensational snippets about his readers’ privacy being infringed, bent programmers blackmailing famous corporations, saboteurs worming their way into GCHQ and the Ministry of Defense…

The fictional newspaper is called the Comet, but it sounds like an ancestor of El Reg (with the addition of pervasive typos and suggestive puns). It’s amusing to see commentary on the attenuation of attention spans due to radio and TV in 1988, a multi-decade precursor to contemporary screeds against Twitter, texting, and Facebook.

Should you have any attention left to continue reading, I encourage you to try one or more of these books.
1. “Man Guilty in ‘Logic Bomb’ Case.” Los Angeles Times 4 July 1985, Southland ed., Metro; 2; Metro Desk sec.: 3. “[Dennis Lee Williams], who could face up to three years in prison when sentenced by Los Angeles Superior Court Judge Kathleen Parker on July 31, was convicted of setting up the program designed to shut down important data files.”
2. Communications of the ACM: Volume 22. 1979. “…logic bomb (programmed functions triggered to execute upon occurrence of future events)…”
3. Brunner, John. Children of the Thunder. New York: Ballantine, 1989. 8-9.
• • •
Carborundum Saw Dec 11, 2010

It’s entertaining to come across references to computer security in fiction. Sometimes the reference may be grating, infused with hyperbole, or laughably flawed. Sometimes it may seem surprisingly prescient, falling somewhere positive along a spectrum of precision and detail.

Even more rewarding is to encounter such a quote within a good book. Readers who rarely venture outside of modern bestsellers, science fiction or otherwise, may not recognize the author Stanisław Lem, but they may be familiar with the movie based on his book of the same name: Solaris. Lem wrote many books; two of my favorites are The Cyberiad and Fiasco.

One Human Minute, from 1983 (the English translation appeared in February 1986), isn’t about computers in particular. The story is presented as a book review of an imagined tome that describes one minute of the entire Earth’s population. It includes this fun gem:

Meanwhile, computer crime has moved from fantasy into reality. A bank can indeed be robbed by remote control, with electronic impulses that break or fool security codes, much as a safecracker uses a skeleton key, crowbar, or carborundum saw. Presumably, banks suffer serious losses in this way, but here One Human Minute is silent, because – again, presumably – the world of High Finance does not want to make such losses public, fearing to expose this new Achilles’ heel: the electronic sabotage of automated bookkeeping.¹

Carborundum saw would also make a great name for a hacking tool.
1. Lem, Stanisław. One Human Minute. Trans. Catherine S. Leach. San Diego: Harvest Book, 1986. 34.
• • •
Regex-based security filters drift without anchors Jun 15, 2010

In June 2010 the Stanford Web Security Research Group released a study of clickjacking countermeasures employed across Alexa Top-500 web sites. It’s an excellent survey of different approaches taken by web developers to prevent their sites from being subsumed by an iframe tag.

One interesting point emphasized in the paper is how easily regular expressions can be misused or misunderstood as security filters. Regexes can be used to create positive or negative security models – either match acceptable content (allow listing) or match attack patterns (deny listing). Inadequate regexes lead to more vulnerabilities than just clickjacking.
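To make the two models concrete, here is a minimal JavaScript sketch for a hypothetical numeric ID field. The function names and both patterns are illustrative assumptions, not taken from the Stanford paper: the deny list looks for a couple of known attack strings, while the allow list describes the only shape the field may take.

```javascript
// Negative (deny list) model: reject inputs that match known attack patterns.
// These two patterns are illustrative only, not a complete list.
const denyPatterns = [/<script/i, /'\s*or\s/i];

function passesDenyList(input) {
  return !denyPatterns.some((re) => re.test(input));
}

// Positive (allow list) model: accept only inputs with the expected shape,
// here one to ten ASCII digits for a hypothetical numeric ID field.
const allowPattern = /^\d{1,10}$/;

function passesAllowList(input) {
  return allowPattern.test(input);
}

const samples = [
  "12345",
  "<script>alert(0x42)</script>",
  "<img src=x onerror=alert(0x42)>",
];

for (const s of samples) {
  console.log(JSON.stringify(s), "deny:", passesDenyList(s), "allow:", passesAllowList(s));
}
```

The `<img>` payload slips past the deny list simply because nobody wrote a pattern for it, while the allow list rejects it without needing to know it exists.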

One of the biggest mistakes made in regex patterns is leaving them unanchored. Anchors determine the span of a pattern’s match against an input string. The ^ anchor matches the beginning of the input and the $ anchor matches its end; in multiline mode they also match at the beginning and end of each line.

(Just to confuse the situation, when ^ appears as the first character inside square brackets it negates the character class, e.g. [^a]+ matches one or more characters that are not a.)
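A few one-liners make these roles of ^ and $ concrete. This is a JavaScript sketch with arbitrary sample strings; in JavaScript, ^ and $ match only at the start and end of the whole input unless the m flag is set.

```javascript
// ^ outside brackets anchors the match to the start of the input.
const startsWithA = /^a/;
// ^ as the first character inside brackets negates the character class.
const runOfNonA = /[^a]+/;
// ^ and $ together force the pattern to cover the entire input.
const exactlyAbc = /^abc$/;

console.log(startsWithA.test("abc"));      // true
console.log(startsWithA.test("cab"));      // false: "a" appears, but not first
console.log("banana".match(runOfNonA)[0]); // "b": first run of non-"a" characters
console.log(exactlyAbc.test("abc"));       // true
console.log(exactlyAbc.test("xabc"));      // false: extra leading character
console.log(/abc/.test("xabc"));           // true: unanchored, so a substring is enough
```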

Consider nytimes.com’s document.referrer check as shown in Section 3.5 of the Stanford paper. The weak regex is the argument passed to match():
```
if(window.self != window.top &&
   !document.referrer.match(/https?:\/\/[^?\/]+\.nytimes\.com\//)) {
  top.location.replace(window.location.pathname);
}
```
As the study’s authors point out (and anyone who is using regexes as part of a security or input validation filter should know), the pattern is unanchored and therefore easily bypassed. The site developers intended to check the referrer for links like these:
```
http://www.nytimes.com/
https://www.nytimes.com/
https://www.nytimes.com/auth/login
https://firstlook.blogs.nytimes.com/
```
Since the pattern isn’t anchored, the engine searches the entire input string for a match, which leaves the attacker with a simple bypass technique. In the following example, the pattern matches the https://www.nytimes.com/ embedded in the query string, which is clearly not the developers’ intent:

https://evil.lair/clickjack.html?a=https://www.nytimes.com/

The devs wanted to match a URI whose domain included “.nytimes.com”, but the pattern would match anywhere within the referrer string.

The regex would be improved by requiring the pattern to begin at the first character of the input string. The new, anchored pattern would look more like this:

^https?:\/\/[^?\/]+\.nytimes\.com\/
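The difference is easy to check. This sketch runs the nytimes.com pattern from the listing above and an anchored version of it (keeping the optional s in https?) against the sample referrers; the variable names are hypothetical, and only the regex bodies come from the text.

```javascript
// Pattern from the nytimes.com anti-framing code (unanchored).
const unanchored = /https?:\/\/[^?\/]+\.nytimes\.com\//;
// The same pattern anchored to the start of the referrer string.
const anchored = /^https?:\/\/[^?\/]+\.nytimes\.com\//;

const referrers = [
  "http://www.nytimes.com/",
  "https://www.nytimes.com/auth/login",
  "https://firstlook.blogs.nytimes.com/",
  "https://evil.lair/clickjack.html?a=https://www.nytimes.com/",
];

for (const ref of referrers) {
  // Both patterns accept the legitimate referrers; only the unanchored
  // one also accepts the attacker's page.
  console.log(ref, "unanchored:", unanchored.test(ref), "anchored:", anchored.test(ref));
}
```

Anchoring closes this particular bypass, but the pattern still inspects a raw string rather than a parsed host, which is why parsing the URI first remains the sturdier approach.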

The same concept applies to input validation for form fields and URI parameters. Imagine a web developer, we’ll call him Wilberforce for alliterative purposes, who wishes to validate U.S. zip codes submitted in credit card forms. The simplest pattern would check for five digits, using any of these approaches:

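[0-9]{5}, \d{5}, or [[:digit:]]{5} (the last is POSIX bracket syntax, which engines such as PCRE support but JavaScript does not)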

At first glance the pattern works. Wilberforce even tests some basic XSS and SQL injection attacks with nefarious payloads like 'OR 19=19. The regex rejects them all.

Then our attacker, let’s call her Agatha, happens to come by the site. She’s a little savvier and, whether or not she knows exactly what the validation pattern looks like, tries a few malicious zip codes, each of which hides a run of five digits:

90210' alert(0x42)57732 10118alert(0x42)

Poor Wilberforce’s unanchored pattern finds a matching string in all three cases, thereby allowing the malicious content through the filter and enabling Agatha to compromise the site. If the pattern had been anchored to match the complete input string from beginning to end then the filter wouldn’t have failed so spectacularly:

^\d{5}$
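Here is Wilberforce’s mistake and its fix as a JavaScript sketch. The three malicious strings are Agatha’s payloads as they appear in the text above; the variable names are made up for illustration.

```javascript
// Unanchored: accepts anything that merely contains five digits in a row.
const looseZip = /\d{5}/;
// Anchored: the entire input must be exactly five digits.
const strictZip = /^\d{5}$/;

const inputs = ["90210", "90210'", "alert(0x42)57732", "10118alert(0x42)"];

for (const zip of inputs) {
  console.log(JSON.stringify(zip), "loose:", looseZip.test(zip), "strict:", strictZip.test(zip));
}
```

In JavaScript a $ without the m flag matches only at the very end of the input, so "90210\n" is rejected as well. Perl, PCRE, and Python’s re let a bare $ match just before a final newline, so check your engine’s rules (in Python, re.fullmatch sidesteps the question).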

Unravelling Strings

Even basic string-matching approaches can fall victim to the unanchored problem; after all, they’re nothing more than regex patterns without the syntax for wildcards, alternation, and grouping. Let’s go back to the Stanford paper for an example: walmart.com’s document.referrer check, which relies on the JavaScript String indexOf method. This method returns the index of the first occurrence of its argument within the string, or -1 if the argument isn’t found:
```
if(top.location != location) {
  if(document.referrer && document.referrer.indexOf("walmart.com") == -1) {
    top.location.replace(document.location.href);
  }
}
```
Sigh. As long as the document.referrer contains the string “walmart.com” the anti-framing code won’t trigger. For Agatha, the bypass is as simple as putting her booby-trapped clickjacking page on a site with a domain name like “walmart.com.evil.lair” or maybe using a URI fragment, https://evil.lair/clickjack.html#walmart.com. The developers neglected to ensure that the host from the referrer URI ends in walmart.com rather than merely contains walmart.com.

The previous sentence is very important. The referrer string isn’t supposed to end in walmart.com; the referrer’s host is. That’s an important distinction considering the bypass techniques we’ve already mentioned, two of which produce referrer strings that do end in walmart.com:
```
https://walmart.com.evil.lair/clickjack.html
https://evil.lair/clickjack.html#walmart.com
https://evil.lair/clickjack.html?a=walmart.com
```
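A short JavaScript sketch shows why the distinction matters. The two function names are hypothetical: the first mirrors the indexOf test from walmart.com’s code, and the second applies the “ends with” rule to the whole referrer string instead of its host.

```javascript
// Mirrors the walmart.com check: framing is tolerated whenever the referrer
// contains "walmart.com" anywhere in the string.
function containsCheck(referrer) {
  return referrer.indexOf("walmart.com") !== -1;
}

// A slightly better but still wrong idea: require the whole referrer
// string (not its host) to end with "walmart.com".
function stringEndsCheck(referrer) {
  return referrer.endsWith("walmart.com");
}

const bypasses = [
  "https://walmart.com.evil.lair/clickjack.html",
  "https://evil.lair/clickjack.html#walmart.com",
  "https://evil.lair/clickjack.html?a=walmart.com",
];

for (const ref of bypasses) {
  console.log(ref, "contains:", containsCheck(ref), "ends:", stringEndsCheck(ref));
}
```

Every referrer passes the contains check, and two of the three still pass the string-ending check, because the attacker controls the path, query, and fragment of her own URI.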
Prefer Parsers Before Patterns

Input validation filters often require an understanding of a data type’s grammar. Sometimes this is simple, such as a five-digit zip code (assuming it’s a US zip code and that it’s not in the zip+4 format). More complex cases, such as email addresses and URIs, require that the input string be parsed before pattern matching is applied.
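When the grammar really is that small, an anchored regex can express it completely. This sketch extends the five-digit pattern to also accept the zip+4 form (a hyphen followed by four more digits); the function name isUsZip is hypothetical.

```javascript
// Five digits, optionally followed by "-" and four digits (zip+4).
// The anchors make the whole input match this grammar, not just a piece of it.
const usZip = /^\d{5}(?:-\d{4})?$/;

function isUsZip(input) {
  return usZip.test(input);
}

console.log(isUsZip("90210"));      // true
console.log(isUsZip("90210-1234")); // true
console.log(isUsZip("90210-12"));   // false: incomplete +4 part
console.log(isUsZip("90210'"));     // false: trailing junk
```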

The previous indexOf string example failed because it doesn’t actually parse the referrer’s URI; it just looks for the presence of a string. The regex pattern in the nytimes.com example was superior because it at least tried to understand the URI grammar by matching content between the URI’s scheme (http or https) and the first slash (/)¹.

A good security filter must understand the context of the pattern to be matched. The improved walmart.com referrer check is shown below. Notice that the get_hostname_from_url function now uses a regex to extract the host name from the referrer’s URI, and the string comparisons ensure the host name either exactly matches “walmart.com” or ends with “.walmart.com”.

(You could quibble that the regex in get_hostname_from_url isn’t anchored, but in this case the pattern works because it’s not possible to smuggle malicious content inside the URI’s scheme. The pattern would fail if it returned the last match instead of the first match. And, yes, the typo in the comment in the killFrames function is in the original JavaScript.)
```
function killFrames() {
  if(top.location != location) {
    if(document.referrer) {
      var referrerHostname = get_hostname_from_url(document.referrer);
      var strLength = referrerHostname.length;
      if((strLength == 11)
         && (referrerHostname != "walmart.com")) { // to take care of https://walmart.com url - length of "walmart.com" string is 11.
        top.location.replace(document.location.href);
      }
      else if(strLength != 11
              && referrerHostname.substring(referrerHostname.length - 12) != ".walmart.com") { // length of ".walmart.com" string is 12.
        top.location.replace(document.location.href);
      }
    }
  }
}

function get_hostname_from_url(url) {
  return url.match(/:\/\/(.[^/?]+)/)[1];
}
```
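For comparison, here is a sketch of the same check that leans on a real URL parser rather than a hand-written regex. It uses the WHATWG URL class available in modern browsers and Node.js, which is a swapped-in technique and not what walmart.com used; the function name isTrustedReferrer and its domain parameter are hypothetical.

```javascript
// Decide whether a referrer belongs to `domain` or one of its subdomains by
// parsing it with the WHATWG URL class rather than a hand-written regex.
function isTrustedReferrer(referrer, domain) {
  let host;
  try {
    // hostname excludes the scheme, userinfo, port, path, query, and fragment.
    host = new URL(referrer).hostname.toLowerCase();
  } catch {
    // Anything the parser rejects is treated as untrusted.
    return false;
  }
  // Exact match, or a subdomain. The leading dot keeps "evilwalmart.com"
  // from counting as part of "walmart.com".
  return host === domain || host.endsWith("." + domain);
}

const candidates = [
  "https://walmart.com/",
  "https://www.walmart.com/cart",
  "https://walmart.com.evil.lair/clickjack.html",
  "https://evil.lair/clickjack.html#walmart.com",
  "https://evil.lair/clickjack.html?a=walmart.com",
  "https://evilwalmart.com/",
  "https://www.walmart.com@evil.lair/",
];

for (const ref of candidates) {
  console.log(ref, isTrustedReferrer(ref, "walmart.com"));
}
```

Only the first two referrers are accepted. Note the last candidate: the parser correctly treats “www.walmart.com” as userinfo and “evil.lair” as the host. An anti-framing script would call top.location.replace(...) when this returns false; since an empty referrer fails to parse here, you would still need to decide separately how to handle pages loaded with no referrer at all, as the original code does with its `if(document.referrer)` guard.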
Conclusion

Regexes and string matching functions are ubiquitous throughout web applications. If you’re implementing security filters with these functions, keep these points in mind:

Normalize the character set – Ensure the string functions and regex patterns match the character encoding, e.g. multi-byte string functions for multi-byte sequences.

Always match the entire input string – Anchor patterns to the start (^) and end ($) of input strings. If you expect input strings to include multiple lines, understand how multiline (?m) and single line (?s) flags will affect the pattern. If you’re not sure then explicitly treat it as a single line. Where appropriate to the context, the results of string matching functions should be tested to see if the match occurred at the beginning, within, or at the end of a string.

Prefer a positive security model over a negative one – Define what you want to explicitly accept rather than what to reject.

Allow content that you expect to receive and deny anything that doesn’t fit – Allow list filters should be as strict as possible to avoid incorrectly matching malicious content. If you go the route of deny listing content, make the patterns as lenient as possible to better match unexpected scenarios – an attacker may have an encoding technique or JavaScript trick you’ve never heard of.

Use a parser instead of a regex – If you want to match a URI attribute, make sure your pattern extracts the right value. URIs can be complex. If you’re trying to use regexes to parse HTML content…good luck.
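The multiline caveat in the list above is easy to trip over. In JavaScript these behaviors are usually switched on with the m and s flags on the regex rather than inline (?m) or (?s) syntax. This sketch, with a made-up payload, shows how adding m quietly undoes the protection that anchoring provided, and how s changes what . can consume.

```javascript
// Intended rule: the whole input is exactly five digits.
const singleLine = /^\d{5}$/;
// Same pattern with the m (multiline) flag: ^ and $ now also match at
// the start and end of every line, not just of the whole input.
const multiLine = /^\d{5}$/m;

const smuggled = "90210\n<script>alert(0x42)</script>";

console.log(singleLine.test(smuggled)); // false: rejected as intended
console.log(multiLine.test(smuggled));  // true: the first line alone satisfies the pattern

// s (dotAll) flag: "." also matches line terminators.
console.log(/^.+$/.test("a\nb"));  // false: "." stops at the newline
console.log(/^.+$/s.test("a\nb")); // true: "." consumes the newline too
```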

Don’t shy away from regexes because their syntax looks daunting; just remember to test your patterns against a wide array of both malicious and valid input strings.

But do avoid regexes if you’re parsing complex grammars like HTML and URLs.
1. Technically, the pattern should match the host portion of the URI’s authority. Check out RFC 3986 for specifics, especially the regular expression in Appendix B.
• • •
