65 million years ago, dinosaurs ruled the earth. (Which also seems about the last time I wrote something new here.)
In 45 million lines of code, Windows XP dominated the desktop. Yes it had far too many security holes and people held onto it for far too long — even after Microsoft tried to pull support for the first time. But its duration is still a testament to a certain measure of success.
Much of today’s web still uses code that dates from the dawn of internet time, some new code is still written by dinosaurs, and even more code is written by the avian descendants of dinosaurs. These birds flock to new languages and new frameworks. Yet, looking at some of the trivial vulns that emerge (like hard-coded passwords and SQL built from string concatenation), it seems the bird brain hasn’t evolved as much security knowledge as we might wish.
I’m a fan of dead languages. I’ve mentioned before my admiration of Latin (as well as Harry Potter Latin). And hieroglyphs have an attractive mystery to them. This appreciation doesn’t carry over to Perl. (I wish I could find the original comment that noted an obfuscated Perl contest is a redundant effort.)
But I do love regular expressions. I’ve crafted, tweaked, optimized, and obscured my fair share of regexes over the years. And I’ve discovered the performance benefits of pcre_study() and JIT compilation mode.
Yet woe betide anyone using regexes as a comprehensive parser (especially for HTML). And if you’re trying to match quoted strings, be prepared to deal with complexities that turn a few-character pattern into a monstrous composition.
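As a rough illustration of how quickly a quoted-string pattern grows, here is a minimal sketch using Python's re module rather than PCRE. The sample input is made up, and the second pattern still ignores single quotes and any language-specific quoting rules.

```python
import re

# Naive: a quote, anything that is not a quote, then a closing quote.
NAIVE = re.compile(r'"[^"]*"')

# Escape-aware: also accept backslash-escaped characters inside the quotes.
ESCAPED = re.compile(r'"(?:[^"\\]|\\.)*"')

# Made-up sample containing escaped quotes inside a quoted string.
sample = r'say("a \"quoted\" word")'

naive_match = NAIVE.search(sample).group(0)      # stops at the first escaped quote
escaped_match = ESCAPED.search(sample).group(0)  # spans the whole quoted string

print(naive_match)
print(escaped_match)
```

The naive pattern ends at the first escaped quote, while the longer one reaches the real closing quote. Each further rule (single quotes, doubled quotes, line continuations) adds another alternation.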
Seeing modern day humans still rely on poorly written regexes to conduct code scanning made me wonder how little mammals have advanced beyond the dinosaurs of prehistory. They might not be burning themselves with fire, but they’re burning their chances of accurate, effective scans.
That was how I discovered pfff and its companion, sgrep. At the SOURCE Seattle conference this year I spoke a little about lessons learned from regexes and the advancements possible should you desire to venture into the realm of OCaml: SOURCE Seattle 2015 – Code Scanning. Who knows, if you can conquer fire you might be able to handle stone tools.
I have yet to create a full taxonomy of the mistakes developers make that lead to insecure code.
As a brief note towards that effort, here’s an HTML injection (aka cross-site scripting) example that’s due to a series of tragic assumptions that conspire to not only leave the site vulnerable, but waste lines of code doing so.
The first clue lies in the querystring’s state parameter. The site renders the state’s value into a title element. Naturally, a first probe for HTML injection would be attempting to terminate that tag. If successful, then it’s trivial to append arbitrary markup such as <script> tags. A simple probe looks like this:
The site responds by stripping the payload’s </title> tag (plus any subsequent characters). Only the text leading up to the injected tag is rendered within the title.
This seems to have effectively countered the attack and not exposed any vuln. Of course, if you’ve been reading this blog for any length of time, you’ll know this trope of deceitful appearances always leads to a vuln. That which seems secure shatters under scrutiny.
The developers knew that an attacker might try to inject a closing </title> tag. Consequently, they created a filter to watch for such things and strip them. This could be implemented as a basic case-insensitive string comparison or a trivial regex.
And it could be bypassed by just a few characters.
Consider the following closing tags. Regardless of whether they seem surprising or silly, the extraneous characters are meaningless to HTML yet meaningful to our exploit because they foil the assumption that regexes make good parsers.
After inspecting how the site responds to each of the tags, it’s apparent that the site’s filter only expected a so-called “good” </title> tag. Browsers don’t care about an attribute on the closing tag. (They’ll ignore such characters as long as they don’t violate parsing rules.)
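The kind of filter described above can be sketched in a few lines. This is a guess at its shape, not the site's actual code: the regex, the naive_filter name, and the list of closing-tag variants are all assumptions chosen for illustration.

```python
import re

# Hypothetical filter: drop a literal </title> and everything after it.
# This only approximates the behavior observed from the outside.
TITLE_CLOSE = re.compile(r"</title>.*", re.IGNORECASE | re.DOTALL)

def naive_filter(value: str) -> str:
    return TITLE_CLOSE.sub("", value)

# Illustrative closing-tag variants. An HTML tokenizer treats each of
# them as the end of the title element.
variants = ["</title>", "</TITLE>", "</title >", "</title foo>", "</title\n>", "</title/>"]

def survives(tag: str) -> bool:
    """True when the filter leaves the closing tag in place."""
    return tag in naive_filter("abc" + tag + "xyz")

results = {tag: survives(tag) for tag in variants}
for tag, passed in results.items():
    print(repr(tag), "bypasses filter" if passed else "stripped")
```

Only the two exact spellings are stripped. Anything with whitespace, an attribute, or a trailing slash before the closing bracket passes through untouched.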
Next, we combine the filter bypass with a payload. In this case, we’ll use an image onerror event.
The tragedy of this vuln is that it proves the site’s developers were aware of the concept of HTML injection exploits, but failed to grasp the fundamental characteristics of the vuln. The effort spent blocking an attack (i.e. countering an injected closing tag) not only wasted lines of code on an incorrect fix, but left the naive developers with a false sense of security. The code became more complex and less secure.
The mistake also highlights the danger of assuming that well-formed markup is the only kind of markup. Browsers are capricious beasts; they must dance around typos, stomp upon (or skirt around) errors, and walk bravely amongst bizarrely nested tags. This syntactic havoc is why regexes are notoriously worse at dealing with HTML than proper parsers.
There’s an ancillary lesson here in terms of automated testing (or quality manual pen testing, for that matter). A scan of the site might easily miss the vuln if it uses a payload that the filter blocks, or doesn’t apply any attack variants. This is one way sites “become” vulnerable when code doesn’t change, but attacks do.
And it’s one way developers must change their attitudes from trying to outsmart attackers to focusing on basic security principles.
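One such basic principle is encoding untrusted data for the context where it is rendered instead of hunting for bad tags. A minimal sketch, assuming a Python back end and a hypothetical render_title() helper (the site's real stack is unknown, and the payload below is only an illustrative stand-in):

```python
import html

def render_title(state: str) -> str:
    """Encode the untrusted value for an HTML text context before rendering."""
    return "<title>" + html.escape(state, quote=True) + "</title>"

# Illustrative payload: a closing tag with a junk attribute, then an image
# element with an onerror handler.
payload = "x</title foo><img src=x onerror=alert(1)>"
page = render_title(payload)
print(page)
```

Because every angle bracket is encoded, no spelling of the closing tag matters. The browser receives text, not markup.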
The slides convey more theory than practical examples, but the ideas should come across without too much confusion. I expect to revisit the idea of a Rot network (a play on Tor) and toy with an implementation. Instead of blocking tracking bugs, the concept is to reduce their utility by sharing them across unrelated browsers — essentially polluting the data.
In any case, with this presentation over and out of the way, it’s time to start working on more articles!
You taught me language, and my profit on’t
Is, I know how to curse: the red plague rid you,
For learning me your language!
Caliban, (The Tempest, I.ii.363-365)
The announcement of the Heartbleed vulnerability revealed a flaw in OpenSSL that could be exploited by a simple mechanism against a large population of targets to extract random memory from the victim. At worst, that pilfered memory would contain sensitive information like HTTP requests (with cookies, credentials, etc.) or even parts of the server’s private key. (Or malicious servers could extract similarly sensitive data from vulnerable clients.)
In the spirit of Shakespeare’s freckled whelp, I combined a desire to learn about Heartbleed’s underpinnings with my ongoing experimentation with the new language features of C++11. The result is a demo tool named Hemorrhage.
Hemorrhage shows two different approaches to sending modified TLS heartbeats. One relies on the Boost.ASIO library to set up a TCP connection, then handles the SSL/TLS layer manually. The other uses a more complete adoption of Boost.ASIO and its asynchronous capabilities. It was this async aspect where C++11 really shone. Lambdas made setting up callbacks a pleasure — especially in terms of readability compared to prior techniques that required binds and placeholders.
Readable code is hackable (in the creation sense) code. Being able to declare variables with auto made code easier to read, especially when dealing with iterators. Although Hemorrhage only takes minimal advantage of move semantics and unique_ptr, they are currently my favorite aspects following lambdas and auto.
Hemorrhage itself is simple. Check out the README.md for more details about compiling it. (Hint: As long as you have Boost and OpenSSL it’s easy on Unix-based systems.)
The core of the tool is taking the tls1_heartbeat() function from OpenSSL’s ssl/t1_lib.c file and changing the payload length — essentially a one-line modification. Yet another approach might be to use the original tls1_heartbeat() function and modify the heartbeat data directly by manipulating the SSL* pointer’s s3->wrec data via the SSL_CTX_set_msg_callback().
In any case, the tool’s purpose was to “learn by implementing something” as opposed to crafting more insidious exploits against Heartbleed. That’s why I didn’t bother with more handshake protocols or STARTTLS. It did give me a better understanding of OpenSSL’s internals (of which I’ll add my voice to the chorus bemoaning its readability).
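The length mismatch behind Heartbleed is easiest to see from the receiving side. The sketch below is not taken from Hemorrhage or OpenSSL. It is a small Python model of the heartbeat message layout in RFC 6520 (one type byte, a two-byte payload length, the payload, then at least 16 bytes of padding), with a hypothetical heartbeat_payload() function that applies the bounds check the vulnerable code lacked.

```python
import struct
from typing import Optional

MIN_PADDING = 16  # RFC 6520 requires at least 16 bytes of padding

def heartbeat_payload(message: bytes) -> Optional[bytes]:
    """Return the payload of a heartbeat message, or None if its length lies."""
    if len(message) < 3:
        return None
    _msg_type, claimed = struct.unpack("!BH", message[:3])
    # The claimed payload length must fit inside the bytes that actually
    # arrived; otherwise the message is discarded.
    if 1 + 2 + claimed + MIN_PADDING > len(message):
        return None
    return message[3:3 + claimed]

# An honest message claims the four bytes it carries.
honest = struct.pack("!BH", 1, 4) + b"ping" + bytes(MIN_PADDING)
# A lying message claims far more than it carries.
lying = struct.pack("!BH", 1, 0x4000) + b"ping" + bytes(MIN_PADDING)

print(heartbeat_payload(honest))
print(heartbeat_payload(lying))
```

Without that comparison, a receiver that trusts the claimed length copies whatever memory happens to follow the real payload into its reply.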
The oracles of ancient Greece claimed to have the power of precognition, derived from the gods themselves. In the 17th century, John Locke wrote of more experiential sources for ideas, where sensation and reflection were two fountains of knowledge.
But none of these philosophical considerations are necessary to predict the effect of plugins on browser security. In the course of putting a presentation together, I’ve annotated two particular items.
Java. The Comic Sans of browser plugins. Inexplicably, some people think it’s a clever design choice. But it really conveys an uninformed decision, is ultimately useless (especially for modern browsers), and is a sign of incompetence.
Flash. Wingdings. At first it looks pretty. But its overuse quickly leads to annoyance. There’s no reason for a Flash plugin other than to look at legacy cat videos and suffer agitating ad banners.
They are nothing more than fonts of malware. When was the last time you installed a non-critical update for either one? If you haven’t disabled these plugins, do so now.
I’ll have new content soon. And, with my own knowledge of the future, here’s a peek at what those topics might be:
– Ruminations on privacy.
– Identity, passwords, personas.
– More examples of HTML injection.
And that doesn’t include putting up content for the newly released Anti-Hacker Tool Kit. Alas, my fountain of writing is a mere trickle!
The idea: Penalize a site’s ranking in search engine results if the site suffers a security breach.
Now, for some background and details…
In December 2013 Target revealed that it had suffered a significant breach that exposed over 40 million credit card numbers. A month later it upped the count to 70 million and noted the stolen information included customers’ names, mailing addresses, phone numbers, and email addresses.
Does anybody care?
Or rather, what do they care about? Sure, the media likes stories about hacking, especially ones that affect millions of their readers. The security community marks it as another failure to point to. Banks will have to reissue cards and their fraud departments be more vigilant. Target will bear some costs. But will customers really avoid it to any degree?
Years ago, in 2007, a different company disclosed its discovery of a significant breach that affected at least 40 million credit cards. Check out the following graph of the stock price of the company (TJX Companies) from 2006 to the end of 2013.
Notice the dip in 2009 and the nice angle of recovery. The company’s stock didn’t take a hit until 2009 when TJX announced terms of its settlement. The price nose-dived, only to steadily recover as consumers stopped caring and spent money (amongst any number of arbitrary reasons, markets not being as rational or objective as one might wish).
Consider who bears the cost of breaches like these. Ultimately, merchants pay higher fees to accept credit cards and consumers pay higher fees to have cards. And, yes, TJX paid in lost valuation over a rather long period (roughly a year), but only when the settlement was announced, not when the breach occurred. The settlement suggests that lax security has consequences, but a breach in and of itself might not.
Truth of Consequences
But what if a company weighs the costs of a breach as more favorable than the costs of increasing security efforts? What if a company doesn’t even deal with financial information and therefore has no exposure to losses related to fraud? What about companies that deal in personal information or data, like Snapchat?
Now check out another chart. The following data from Quantcast shows daily visitors to a lyrics site. The number is steady until one day — boom! — visits drop by over 60% when the site is relegated to the backwaters of search results.
Google caught the site (Rap Genius) undertaking sociopathic search optimization techniques like spreading link spam. Not only does spammy, vapid content annoy users, but Google ostensibly suffers by losing users who flee poor quality results for alternate engines. (How much impact it has on advertising revenue is a different matter.) Google loses revenue if advertisers care about where the users are or they perceive the value of users to be low.
The two previous charts have different time scales and measure different dimensions. But there’s an underlying sense that they reflect values that companies care about.
Think back to the Target breach. (Or TJX, or any one of many breaches reported over the years, whether they affected passwords or credit cards.)
What if a penalty affected a site’s ranking in search results? For example, it could be a threshold for the “best” page in which it could appear, e.g. no greater than the fourth page (where pages are defined as blocks of N results, say 10). Or an absolute rank, e.g. no higher than the 40th entry in a list.
The penalty would decay over time at a rate, linear or exponential, based on any number of mathematical details. For example, a page-based penalty might decay by one page per month. A list-based penalty might decay by one on a weekly basis.
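To make the decay concrete, here is a minimal sketch of both flavors. The function names, the ceiling rounding, and the four-week half-life are assumptions picked for illustration. Only the starting values (fourth page, 40th entry) and the one-page-per-month rate come from the examples above.

```python
import math

def page_cap(initial_page: int, months_elapsed: int) -> int:
    """Linear decay: the best results page a breached site may appear on.

    The cap improves by one page per month until it reaches page 1,
    which means no penalty.
    """
    return max(1, initial_page - months_elapsed)

def rank_cap_exponential(initial_rank: int, weeks_elapsed: int,
                         half_life_weeks: float = 4.0) -> int:
    """Exponential decay: the distance from rank 1 halves every half-life."""
    remaining = (initial_rank - 1) * 0.5 ** (weeks_elapsed / half_life_weeks)
    return math.ceil(1 + remaining)

for month in range(5):
    print("month", month, "best page:", page_cap(4, month))
for week in (0, 4, 8, 12):
    print("week", week, "best rank:", rank_cap_exponential(40, week))
```

The linear version is easy to explain to a penalized site. The exponential one forgives quickly at first and then lingers, which may or may not be the incentive you want.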
If a search engine drives a significant portion of a site’s traffic, and that traffic results in revenue or influences valuation, then this creates an incentive for the site to maintain strong security. It’s like PCI with different teeth. It might incentivize the site to react promptly to breaches. At least one hopes.
But such a proposal could have insidious consequences.
Suppose a site were able to merely buy advertising to artificially offset the rank penalty? After a breach you could have a search engine that’d love to penalize the “natural” ranking of a site only to rake in money as the site buys advertising to overcome the penalty. It’s not a smart idea to pay an executioner per head, let alone combine the role with judge and jury.
A penalty that a company fears might be one for which it suppresses the penalty’s triggers. Keeping a breach secret is a disservice to consumers. And companies subject to the S.E.C. may be required to disclose such events. But rules (and penalties) need to be clear in order to minimize legal maneuvering through loopholes.
The proposal also implies that a search engine has a near monopoly on directing traffic. Yes, I’m talking about Google. The hand waving about “search engines” is supposed to include sites like Yahoo! and Bing, even DuckDuckGo. But if you’re worried about one measure, it’s likely the Google PageRank. This is a lot of power for a company that may wish to direct traffic to its own services (like email, shopping, travel, news, etc.) in preference to competing ones.
It could also be that the Emperor wears no clothes. Google search and advertisements may not be the ultimate arbiter of traffic that turns into purchases. Strong, well-established sites may find that the traffic that drives engagement and money comes just as well from alternate sources like social media. Then again, losing any traffic source may be something no site wants to suffer.
Target is just the most recent example of breaches that will not end. Even so, Target demonstrated several positive actions before and after the breach:
Thankfully, there were no denials, diminishing comments, or signs of incompetence on the part of Target. Breaches are inevitable for complex, distributed systems. Beyond prevention, goals should be minimizing their time to discovery and maximizing their containment.
And whether this rank idea decays from indifference or infeasibility, its sentiment should persist.
It’s a new year, so it’s time to start counting days until we hear about the first database breach of 2014 to reveal a few million passwords. Before that inevitable compromise happens, take the time to clean up your web accounts and passwords. Don’t be a prisoner of bad habits.
It’s good Operations Security (OpSec) to avoid password reuse across your accounts. Partition your password choices so that each account on each web site uses a distinct value. This prevents an attacker who compromises one password (hashed or otherwise) from jumping to another account that uses the same credentials.
At the very least, your email, Facebook, and Twitter accounts should have different passwords. Protecting email is especially important because so many sites rely on it for password resets.
And if you’re still using the password kar120c I salute your sci-fi dedication, but pity your password creation skills.
Start with a list of all the sites for which you have an account. In order to make this easier to review in the future, create a specific bookmarks folder for these in your browser.
Each account should have a unique password. The latest Safari, for example, can suggest these for you.
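If your browser or password manager won't suggest one, generating a distinct random password per site takes only a few lines. This is a sketch, not a recommendation of a particular policy: the 20-character length, the alphabet, and the example site names are assumptions.

```python
import secrets
import string

# Assumed alphabet; some sites reject certain punctuation characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 20) -> str:
    """Build a password from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# Placeholder site names; one independent password for each.
sites = ["mail.example", "social.example", "shop.example"]
passwords = {site: generate_password() for site in sites}

for site, password in passwords.items():
    print(site, password)
```

The important property is independence. Learning one site's password tells an attacker nothing about the others.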
Next, consider improving account security through the following steps.
Consider Using OAuth — Passwords vs. Privacy
Many sites now support OAuth for managing authentication. Essentially, OAuth is a protocol in which a site asks a provider (like Facebook or Twitter) to verify a user’s identity without having to reveal that user’s password to the inquiring site. This way, the site can create user accounts without having to store passwords. Instead, the site ties your identity to a token that the provider verifies. You prove your identity to Facebook (with a password) and Facebook proves to the site that you are who you claim to be.
If a site allows you to migrate an existing account from a password-based authentication scheme to an OAuth-based one, make the switch. Otherwise, keep this option in mind whenever you create an account in the future.
But there’s a catch. A few, actually. OAuth shifts a site’s security burden from password management to token management and correct protocol implementation. It also introduces privacy considerations related to centralizing auth to a provider as well as how much providers share data.
Be wary about how sites mix authentication and authorization. Too many sites ask for access to your data in exchange for using something like Facebook Connect. Under OAuth, the site can assume your identity to the degree you’ve authorized, from reading your list of friends to posting status updates on your behalf.
Grant the minimum permissions whenever a site requests access (i.e. authorization) to your data. Weigh this decision against your desired level of privacy and security. For example, a site or mobile app might insist on access to your full contacts list or the ability to send Tweets. If this is too much for you, then forgo OAuth and set up a password-based account.
(The complexity of OAuth has many implications for users and site developers. We’ll return to this topic in future articles.)
Two-Factor Auth — One Equation in Two Unknowns
Many sites now support two-factor auth for supplementing your password with a temporary passcode. Use it. This means that access to your account is contingent on both knowing a shared secret (the password you’ve given the site) and being able to generate a temporary code.
Your password should be known only to you because that’s how you prove your identity. Anyone who knows that password — whether it’s been shared or stolen — can use it to assume your identity within that account.
A second factor is intended to be a stronger proof of your identity by tying it to something more unique to you, such as a smartphone. For example, a site may send a temporary passcode via text message or rely on a dedicated app to generate one. (Such an app must already have been synchronized with the site; it’s another example of a shared secret.) In either case, you’re proving that you have access to the smartphone tied to the account. Ideally, no one else is able to receive those text messages or generate the same sequence of passcodes.
The limited lifespan of a passcode is intended to reduce the window of opportunity for brute force attacks. Imagine an attacker knows the account’s static password. There’s nothing to prevent them from guessing a six-digit passcode. However, they only have a few minutes to guess one correct value out of a million. When the passcode changes, the attacker has to throw away all previous guesses and start the brute force anew.
The two factor auth concept is typically summarized as the combination of “something you know” with “something you possess”. It really boils down to combining “something easy to share” with “something hard to share”.
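For the app-generated variety, the passcode is typically derived from the shared secret and the current time. The sketch below follows the HOTP and TOTP algorithms from RFC 4226 and RFC 6238 with HMAC-SHA1. The 30-second step and six digits are common defaults assumed here, not something any particular site is known to use.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time window."""
    return hotp(secret, unix_time // step, digits)

# The ASCII test secret from the RFCs; a real secret is random and private.
shared_secret = b"12345678901234567890"
print(totp(shared_secret, 59))
print(totp(shared_secret, 60))  # the next 30-second window yields a new code
```

Each window offers a million possible six-digit codes, so an attacker who gets n guesses before the code rolls over succeeds with probability n in 1,000,000. That is why sites should also limit the number of attempts.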
Beware Password Recovery — It’s Like Shouting Secret in a Crowded Theater
If you’ve forgotten your password, use the site’s password reset mechanism. And cross your fingers that the account recovery process is secure. If an attacker can successfully exploit this mechanism, then it doesn’t matter how well-chosen your password was (or possibly even if you’re relying on two-factor auth).
If the site emails you your original password, then the site is insecure and its developers are incompetent. It implies the password has not even been hashed.
If the site relies on security questions, consider creating unique answers for each site. This means you’ll have to remember dozens of question/response pairs. Make sure to encrypt this list with something like the OS X Keychain.
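A site that can email your original password is storing it in a recoverable form. For contrast, here is a minimal sketch of what storing a salted, slow hash looks like, using PBKDF2 from Python's standard library. The iteration count and function names are illustrative assumptions, and a real deployment would follow current guidance for its chosen algorithm.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); the site stores these instead of the password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("kar120c")
print(verify_password("kar120c", salt, stored))
print(verify_password("kar120d", salt, stored))
```

With this design the site can check a login but has nothing to email back, so a reset has to issue a new password or a one-time link.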
Review Your OAuth Grants
For sites you use as OAuth providers (like Facebook, Twitter, LinkedIn, Google+, etc.), review the third-party apps to which you’ve granted access. You should recognize the sites that you’ve just gone through a password refresh for. Delete all the others.
Where possible, reduce permissions to a minimum. You’re relying on this for authentication, not information leakage.
Universal adoption of HTTPS remains elusive. Fortunately, sites like Facebook and Twitter have set this by default. If the site has an option to force HTTPS, use it. After all, if you’re going to rely on these sites for OAuth, then the security of these accounts becomes paramount.
Maintain Constant Vigilance
Watch out for fake OAuth prompts, such as windows that spoof Facebook and Twitter.
Keep your browser secure.
Keep your system up to date.
Set a reminder to go through this all over again a year from now — if not earlier.
Otherwise, you risk losing more than one account should your password be exposed among the millions. You are not a number, you’re a human being.
Silicon Valley green is made of people. This is succinctly captured in the phrase: When you don’t pay for the product, the product is you. It explains how companies attain multi-billion dollar valuations despite offering their services for free. They promise revenue through the glorification of advertising.
Investors argue that high valuations reflect a company’s potential for growth. That growth comes from attracting new users. Those users in turn become targets for advertising. And sites, once bastions of clean design, become concoctions of user-generated content, ad banners, and sponsored features.
Sites measure their popularity by a single serving size: the user. Therefore, one way to interpret a company’s valuation is in its price per user. That is, how many calories can a site gain from a single serving? How many servings must it consume to become a hulking giant of the web?
You know where this is going.
The movie Soylent Green presented a future where a corporation provided seemingly beneficent services to a hungry world. It wasn’t the only story with themes of overpopulation and environmental catastrophe to emerge from the late ’60s and early ’70s. The movie was based on the novel Make Room! Make Room!, by Harry Harrison. And it had peers in John Brunner’s Stand on Zanzibar (and The Sheep Look Up) and Ursula K. Le Guin’s The Lathe of Heaven. These imagined worlds contained people powerful and poor. And they all had to feed.
A Furniture Arrangement
To sell is to feed. To feed is to buy.
In Soylent Green, Detective Thorn (Charlton Heston) visits an apartment to investigate the murder of a corporation’s board member, i.e. someone rich. He is unsurprised to encounter a woman there and, already knowing the answer, asks if she’s “the furniture.” It’s trivial to decipher this insinuation about a woman’s role in a world afflicted by overpopulation, famine, and disparate wealth. That an observation made in a movie forty years ago about a future ten years hence rings true today is distressing.
We are becoming products of web sites as we become targets for ads. But we are also becoming parts of those ads. Becoming furnishings for fancy apartments in a dystopian New York.
Women have been components of advertising for ages, selected as images relevant to manipulating a buyer no matter the irrelevance of their image to the product. That’s not changing. What is changing is some sites’ desire to turn all users into billboards. They want to create endorsements by you that target your friends. Your friends are as much a commodity as your information.
In this quest to build advertising revenue, sites also distill millions of users’ activity into individual recommendations of what they might want to buy or predictions of what they might be searching for.
And what a sludge that distillation produces.
There may be the occasional welcome discovery from a targeted ad, but there is also an unwelcome consequence of placing too much faith in algorithms. A few suggestions can become dominant viewpoints based more on others’ voices than personal preferences. More data does not always mean more accurate data.
We should not excuse algorithms as impartial oracles to society. They are tuned, after all. And those adjustments may reflect the bias and beliefs of the adjusters. For example, an ad campaign created for UN Women employed a simple premise: superimpose upon pictures of women a search engine’s autocomplete suggestions for phrases related to women. The result exposes biases reinforced by the louder voices of technology. More generally, a site can grow or die based on a search engine’s ranking. An algorithm collects data through a lens. It’s as important to know where the lens is not focused as where it is.
There is a point where information for services is no longer a fair trade. Where apps collect the maximum information to offer the minimum functionality. There should be more of a push for apps that work on an Information-to-Functionality relationship of the minimum information requested for the maximum functionality delivered.
In the movie, Sol (Edward G. Robinson) talks about going Home after a long life. Throughout the movie, Home is alluded to as the ultimate, welcoming destination. It’s a place of peace and respect. Home is where Sol reveals to Detective Thorn the infamous ingredient of Soylent Green.
Web sites want to be your home on the web. You’ll find them exhorting you to make their URL your browser’s homepage.
Web sites want your attention. They trade free services for personal information. At the very least, they want to sell your eyeballs. We’ve seen aggressive escalation of this in various privacy grabs, contact list pilfering, and weak apologies that “mistakes were made.”
More web sites and mobile apps are releasing features outright described as “creepy but cool” in the hope that the latter outweighs the former in a user’s mind. Services need not be expected to be free without some form of compensation; the Web doesn’t have to be uniformly altruistic. But there’s growing suspicion that personal information and privacy are being undervalued and under-protected by sites offering those services. There should be a balance between what a site offers to users and how much information it collects about users (and how long it keeps that information).
The Do Not Track effort fizzled, hobbled by indecision over a default setting. Browser makers have long encouraged default settings that favor stronger security, but they seem to have less consensus about what default privacy settings should be.
Third-party cookies will be devoured by progress; they are losing traction within the browser and mobile apps. Safari has long blocked them by default. Chrome has not. Mozilla has considered it. Their descendants may be cookie-less tracking mechanisms, which the web titans are already investigating. This isn’t necessarily a bad thing. Done well, a tracking mechanism can be limited to an app’s sandboxed perspective as opposed to full view of a device. Such a restriction can limit the correlation of a user’s activity, thereby tipping the balance back towards privacy.
If you rely on advertising to feed your company and you do not control the browser, you risk going hungry. For example, only Chrome embeds the Flash plugin. A plugin that eternally produces vulnerabilities while coincidentally playing videos for a revenue-generating site.
There are few means to make the browser an agent that prioritizes a user’s desires over a site’s. The Ghostery plugin is an active counteraction to tracking; it’s available for all the major browsers. Mozilla’s Lightbeam does not block tracking mechanisms by default; it reveals how interconnected tracking has become due to ubiquitous cookies.
Browsers are becoming more secure, but they need a site’s cooperation to protect personal information. At the very least, sites should be using HTTPS to protect traffic as it flows from browser to server. To do so is laudable yet insufficient for protecting data. And even this positive step moves slowly. Privacy on mobile devices moves perhaps even more slowly. The recent iOS 7 finally forbids apps from accessing a device’s unique identifier, while Android struggles to offer comprehensive tools.
The browser is Home. Apps are Home. These are places where processing takes on new connotations. This is where our data becomes their food.
Soylent Green’s year 2022 approaches. Humanity must know.