• O[Utf-8]12

    Music has a universal appeal uninhibited by language. A metal head in Istanbul, Tokyo, or Oslo instinctively knows the deep power chords of Black Sabbath – it takes maybe two beats to recognize a classic like “N.I.B.” or “Paranoid.” The same guitars that screamed the tapping mastery of Van Halen or led to the spandex hair excess of 80s metal also served The Beatles and Pink Floyd. And before them was Chuck Berry, laying the ground work with the power chords of “Roll Over Beethoven”.

    All that from six strings and five notes: E - A - D - G - B - E. Awesome.

    Then there’s the written word on the web. Thousands of symbols in 8 bits, 16 bits, and 32 bits. In ASCII, or US-ASCII as RFC 2616 puts it, or rather ISO-8859-1. Or UTF-8, which is easier to adopt because it’s like an extended ASCII. On the other hand if you’re dealing with GB2312 then UTF-8 isn’t necessarily for you. Of course, in that case you should really be using GBK instead of GB2312. Or was it supposed to be GB18030?

    Character encodings get messy and confusing quickly. Our metal head friends like their own genre of müzik / 音楽 / musikk – one word, three languages, many symbols. In this page, those symbols (or glyphs) share one encoding: UTF-8.

    You don’t need to speak a language in order to work with its characters, words, and sentences. You just need Unicode. As Tim Berners-Lee put it,

    The W3C was founded to develop common protocols to lead the evolution of the World Wide Web. The path W3C follows to making text on the Web truly global is Unicode. Unicode is fundamental to the work of the W3C; it is a component of W3C Specifications, from the early days of HTML, to the growing XML Family of specifications and beyond.

    Unicode has its learning curve. With Normalization Forms. Characters. Code Units. Glyphs. Collation. And so on. The gist of Unicode is that it’s a universal coding scheme to represent all that’s to come of the characters used for written language; hopefully never to be eclipsed.

    The security problems of Unicode stem from the conversion from one character set to another. When home-town fans of 少年ナイフ want to praise their heroes in a site’s comment section, they’ll do so in Japanese. Yet behind the scenes, the browser, web site, or operating systems involved might be handling the characters in UTF-8, Shift-JIS, or EUC.

    The conversion of character sets introduces the chance for mistakes and breaking assumptions. The number of bytes might change, leading to a buffer overflow or underflow. The string may no longer be the C-friendly NULL-terminated array. Unsupported characters cause errors, possibly causing an XSS filter to skip over a script tag. A lot of these concerns have been documented (and here). Some have demonstrable exploits, as opposed to conceptual problems that run rampant through security conferences, but never see a decent hack.

    Unicode got more scrutiny when it was proposed for Internationalized Domain Names (IDN). Researchers warned of “homoglyph” attacks, situations where phishers or malware authors would craft URLs that used alternate characters to spoof popular sites. (Here’s an example of JavaScript’s early problems.)

    The first attacks didn’t need IDNs. They used trivial letter substitution with look-alikes, such as swapping l (the letter L) and 1 (the number one) in dead1iestattacks.com. IDNs provided more sophistication by allowing domains with changes visually harder to detect like deạdliestattacks.com.

    What’s been less well documented (from what I could find) is the range of support for character set encodings in security tools. The primary language of web security seems to be English based on the popular conferences and books. But useful tools come from all over. Wivet originated from Türkiye (here’s some more UTF-8: Web Güvenlik Topluluğu). Sqlmap and w3af support Unicode. So maybe this is a non-issue for modern tools.

    In any case, it never hurts to have more “how to hack” tools in non-English languages or test suites to verify that the latest XSS finder, SQL injector, or web tool can deal with sites that aren’t friendly enough to serve content as UTF-8. Or you could help out with documentation projects like the OWASP Developer Guide.

    Sometimes translation is really easy. The phrase for “heavy metal” in French is “heavy metal” – although you’d be correct to use “Métal Hurlant” if you were talking about the magazine or movie. Character conversion can be easy, too. As long as you stick with a single representation. Once you start to dabble in the Unicode conversions from UTF-8, UTF-16, UTF-32, and beyond you’ll be well-served by keeping up to date on encoding concerns and having tools that spare you the brain damage of implementing everything from scratch.

    • • •
  • My current writing project has taken time away from adding new content lately. Here’s a brief interlude of The Twelve Web Security Truths I’ve been toying with as a side project. They are modeled on The Twelve Networking Truths from RFC 1925.

    1. Software execution is less secure than software design, but executing code attracts actual users.
    2. The time saved by not using parameterized queries to build SQL statements should be used to read about using parameterized queries.
    3. Same Origin Policy restricts the DOM access and JavaScript behavior of content loaded from multiple origins. Malware only cares about plugin and browser versions.
    4. Content with XSS vulns are affected by the Same Origin Policy, which is nice for XSS attacks that inject into the site’s origin.
    5. CSRF countermeasures like Origin headers mitigate CSRF, not XSS. Just like X-Frame-Options mitigates clickjacking, not XSS.
    6. Making data safe for serialization with JSON does not make the data safe for the site.
    7. There are four HTML injection vulns in your site today. Hackers will find two of them, the security team will find one, the dev team will introduce another one tomorrow.
    8. Deny lists miss the attack payload that works.
    9. A site that secures user data still needs to work on the privacy of user data.
    10. Hashing passwords with 1,000-round PBKDF2 increases the work factor to brute force the login page by a factor of 1. Increasing this to a 10,000-round PBKDF2 scheme provides an additional increase by a factor of 1.
    11. The vulnerabilities in “web 2.0” sites occur against the same HTML and JavaScript capabilities of “web 1.0” sites. HTML5 makes this different in the same way.
    12. A site is secure when a compromise can be detected, defined, and fixed with minimal effort and users are notified about it.
    13. Off-by-one errors only happen in C.
    • • •
  • The Bogey-Owl

    The Advanced Persistent Threat (APT) is competing with Cyberwar for security word of the year. It would have been nice if we had given other important words like HTTPS or prepared statements their chance to catch enough collective attention to drive security fixes. Alas, we still deal with these fundamental security problems due to Advanced Persistent Ignorance.

    It’s not wrong to seek out APT examples. It helps to have an idea about their nature, but we must be careful about seeing the APT boogeyman everywhere.

    Threats have agency. They are persons (or natural events like earthquakes and tsunamis) that take action against your assets (like customer information or encryption keys). An XSS vuln in an email site isn’t a threat – the person trying to hijack an account with it is. With this in mind, the term APT helpfully self-describes two important properties of a threat:

    • it’s persistent
    • it’s advanced

    Persistence is uncomplicated. The threat actor has a continuous focus on the target. This doesn’t mean around-the-clock port scanning just waiting for an interesting port to appear. It means active collection of data about the target as well as development of tools, techniques, and plans once a compromise is attained. Persistent implies patience in searching for “simple” vulns as much as enumerating resources vulnerable to more sophisticated exploits.

    A random hacker joyriding the internet with sqlmap or metasploit be persistent, but the persistence is geared towards finding any instance of a vuln rather than finding one instance of a vuln in a specific target. It’s the difference between a creep stalking his ex versus a creep hanging out in a bar waiting for someone to get really drunk.

    The advanced aspect of a threat leads to more confusion than its persistent aspect. An advanced threat may still exploit simple vulns like SQL injection. The advanced nature need not even be the nature of the exploit (like using sqlmap). What may be advanced is how they leverage the exploit. Remember, the threat actor most likely wants to do more than grab passwords from a database. With passwords in hand it’s possible to reach deeper into the target network, steal information, cause disruption, and establish more permanent access.

    Stolen passwords are one of the easiest backdoors and the most difficult to detect. Several months ago RSA systems were hacked. Enough information was allegedly stolen that observers at the time imagined it would enable attackers to spoof or otherwise attack SecurID tokens and authentication schemes.

    Now it seems those expectations have been met with not one, but two major defense contractors reporting breaches that apparently used SecurID as a vector.

    I don’t want you to leave without a good understanding of what an insidious threat looks like. Let’s turn to the metaphor and allegory of television influenced by the Cold War.

    Specifically, The Twilight Zone season 2, episode 28, “Will the Real Martian Please Stand Up” written by the show’s creator, Rod Serling.

    Spoilers ahead. Watch the episode before reading further. It’ll be 25 minutes of entertainment you won’t regret.

    The set-up of the episode is that a Sheriff and his deputy find possible evidence of a crashed UFO, along with very human-like footprints leading from the forested crash site into town.

    The two men follow the tracks to a diner where a bus is parked out front. They enter the diner and ask if anyone’s seen someone suspicious. You know, like an alien. The bus driver explains the bus is delayed by the snowy weather and they had just recently stopped at this diner. The lawmen scan the room, “Know how many you had?”

    “Six.”

    In addition to the driver and the diner’s counterman, Haley, there are seven people in the diner. Two couples, a dandy in a fedora, an old man, and a woman. Ha! Someone’s a Martian in disguise!

    What follows are questions, doubt, confusion, and a jukebox. With no clear evidence of who the Martian may be, the lawmen reluctantly give up and allow everyone to depart. The passengers reload the bus1. The sheriff and his deputy leave. The bus drives away.

    But this is the Twilight Zone. It wouldn’t leave you with a such a simple parable of paranoia; there’s always a catch.

    The man in the fedora and overcoat, Mr. Ross, returns to the diner. He laments that the bus didn’t make it across the bridge. (“Kerplunk. Right into the river.”)

    Dismayed, he sits down at the counter, cradling a cup of coffee in his left hand. The next instant, with marvelous understatement, he pulls a pack of cigarettes from his overcoat and extracts a cigarette – using a third hand to retrieve some matches.

    We Martians (he explains) are looking for a nice remote, pleasant spot to start colonizing Earth.

    Oh, but we’re not finished. Haley nods at Mr. Ross’ story. You see, the inhabitants of Venus thought the same thing. In fact, they’ve already intercepted and blocked the Ross’ Martian friends in order to establish a colony of their own.

    Haley smiles, pushing back his diner hat to reveal a third eye in his forehead.

    That, my friends, is an advanced persistent threat!


    1. The server rings up their bills, charging one of them $1.40 for 14 cups of coffee. I’m not sure which is more astonishing – drinking 14 cups or paying 10 cents for each one. 

    • • •