Click depth is a useless scanner option

When web site owners want to measure how their visitors get from point A (say, the home page) to point B (such as finalizing a purchase), they might use a metric called click depth or link depth. This represents the number of clicks required to get from page A to page B. Sites strive to minimize this value so users can complete actions efficiently without becoming distracted or frustrated and departing for other venues. The metric also implies that popular or important pages should have lower values (i.e. be “closer” to the home page and easier to find) than less important pages. This train of thought might make sense superficially, but it derails quickly for web scanners.

There’s merit to designing a web application, or any human interface, with a high degree of usability. Minimizing the steps necessary to complete an action helps achieve this, and your users will appreciate good design. Web application scanners are not your users; they don’t visit your web site or follow workflows the way humans do.

Click depth for web scanning is useless. Pointless. It’s a long string of synonyms for pointless when used as a configuration option, doubly so when scanning web sites that use a JavaScript-driven UI or implement simple Search Engine Optimization (SEO) techniques.

There’s a long list of excuses why someone might want to rely on click depth as an option for web scanning: links on the home page are more likely to be attacked, vulnerabilities with low click depth are easier to find, opportunistic attackers are the biggest threat, scans run faster. Basically, these arguments directly correlate link popularity with risk. The simple rejoinder is that all links have a depth of 1 in the face of automation. An attacker who invests effort into scripts that search for vulnerable links doesn’t care how deep a link is, just that the scripts find one.
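To make that point concrete, here’s a minimal sketch of what automation looks like from a script’s point of view. It assumes the target publishes a standard sitemap.xml and uses the placeholder host example.com; every URL it recovers is exactly one request away, no matter how many clicks a human would need to reach it.

```python
import xml.etree.ElementTree as ET

import requests

# Placeholder target; any site that publishes a sitemap will do.
SITEMAP_URL = "https://example.com/sitemap.xml"

def urls_from_sitemap(sitemap_url):
    """Return every <loc> entry in a sitemap: no crawling, no clicking."""
    resp = requests.get(sitemap_url, timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

if __name__ == "__main__":
    for url in urls_from_sitemap(SITEMAP_URL):
        # To a script, every one of these is exactly one request away;
        # the number of clicks a human needs to reach it is irrelevant.
        print(url)
```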

Whether or not the correlation between link popularity and risk rings true, having the scanner calculate click depth gets the relationship backwards. Visitors’ behavior determines a link’s popularity, not a scanner’s calculation. A superior approach is to use analytics data to identify popular links, then feed that list to the scanner.
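A rough sketch of that approach follows. It assumes a hypothetical analytics export in CSV form with page_path and pageviews columns (adjust the names to whatever your tool actually emits) and writes the most-visited URLs to a one-URL-per-line seed file that a scanner could ingest.

```python
import csv

def top_pages(analytics_csv, site_root, limit=100):
    """Return the most-visited URLs from a (hypothetical) analytics export.

    The column names "page_path" and "pageviews" are assumptions; adjust
    them to match whatever your analytics tool actually produces.
    """
    with open(analytics_csv, newline="") as f:
        rows = sorted(csv.DictReader(f),
                      key=lambda r: int(r["pageviews"]),
                      reverse=True)
    return [site_root.rstrip("/") + r["page_path"] for r in rows[:limit]]

if __name__ == "__main__":
    # Write a one-URL-per-line seed file for the scanner to start from.
    seeds = top_pages("analytics_export.csv", "https://example.com")
    with open("scan_seeds.txt", "w") as out:
        out.write("\n".join(seeds) + "\n")
```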

Another reason for click depth’s uselessness is the positive trend in web application design toward richer, more interactive interfaces in the browser that use lighter-weight data requests back to the web site. This is reflected in the explosion of Ext JS, Prototype, YUI, and other JavaScript libraries designed to provide powerful UI features along with concise request/response handling using JSON and asynchronous requests. This trend also flattens web applications in terms of the number of clicks required to accomplish tasks. Even more significantly, it separates links into two conceptual buckets: one for links that show up in the browser’s address bar and another for the “behind the scenes” links used for API requests. Both buckets matter for security testing, but click depth has little meaning for either.
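One way to make those two buckets concrete is to sort recorded traffic by response type. The sketch below assumes a HAR capture (e.g. exported from browser developer tools or an intercepting proxy) and uses a crude MIME-type heuristic; a real scanner would classify requests far more carefully.

```python
import json

def bucket_har_entries(har_path):
    """Split recorded traffic into page-style links and API-style links.

    Assumes a standard HAR capture; the MIME-type check is a crude
    heuristic, not a complete classifier.
    """
    with open(har_path) as f:
        entries = json.load(f)["log"]["entries"]

    pages, apis = [], []
    for entry in entries:
        url = entry["request"]["url"]
        mime = entry["response"]["content"].get("mimeType", "")
        if "json" in mime or "xml" in mime:
            apis.append(url)    # behind-the-scenes data/API requests
        elif "html" in mime:
            pages.append(url)   # links that show up in the address bar
    return pages, apis
```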

SEO techniques can also flatten a page’s apparent link depth. A technique common to e-commerce sites is to create a long list of links on the home page that reach deep into the site’s product catalog. It’s not uncommon to see several dozen links at the bottom of a home page that point to different product pages ad nauseam. (The purpose is to make sure search engines find all of the site’s products so users looking for a particular shade of Unicorn-skin rug will find that site over all others.) This sets an artificially low depth for many, many pages. A human is unlikely to care about the slew of links, but a scanner won’t know the difference.
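The effect is easy to observe by measuring depth the way a naive scanner would. The breadth-first crawl sketched below records each page’s click distance from a starting URL; run it against a site with a big SEO footer and most of the catalog comes back at depth 1. (It uses only the standard library and ignores robots.txt, rate limiting, and everything else a polite crawler needs.)

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.request

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def click_depths(start_url, max_pages=200):
    """Breadth-first crawl recording each page's click distance from start_url."""
    host = urlparse(start_url).netloc
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue and len(depths) < max_pages:
        url = queue.popleft()
        try:
            html = urllib.request.urlopen(url, timeout=10).read()
        except Exception:
            continue
        parser = LinkExtractor()
        parser.feed(html.decode("utf-8", "replace"))
        for href in parser.links:
            target = urljoin(url, href)
            # Stay on the same host and keep the shallowest depth seen.
            if urlparse(target).netloc == host and target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths
```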

We’ve reached three reasons so far: Automated scanning gives every link an effective click depth of 1, browser-heavy sites have flat APIs, and SEO techniques further reduce apparent link depth. In spite of this, click depth appeared at some point in scanner designs, an OWASP project makes it a point of evaluation (among several poor criteria), and users often ask for it.

One understandable motivation behind click depth is the desire for some degree of depth or breadth in the coverage of a web site’s functionality. Notice that coverage of a site’s functionality differs from coverage of its links. Sites might contain vast numbers of links that all exercise the same small number of code paths, and it’s these code paths in the web application where vulnerabilities appear. What click depth actually intends to convey is coverage. It’s highly desirable to have a scanner that avoids making dozens of redundant requests, following recursive links, or getting stuck in redirect loops. A good scanner handles such situations automatically rather than burdening the user with a slew of configuration options that may not even have a bearing on the problem.
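Here’s a sketch of one piece of that automatic handling: collapsing URLs that differ only in parameter values into a single code path, so a scanner doesn’t waste requests on /product?id=1 through /product?id=9999. The key used here (host, path, and sorted parameter names) is an illustrative heuristic, not how any particular scanner actually does it.

```python
from urllib.parse import urlparse, parse_qsl

class CodePathTracker:
    """Treat URLs that differ only in parameter values as one code path.

    The key (host, path, sorted parameter names) is an illustrative
    heuristic; a real scanner would combine several signals.
    """
    def __init__(self):
        self.seen = set()

    def is_new_code_path(self, url):
        parts = urlparse(url)
        shape = (parts.netloc.lower(),
                 parts.path.rstrip("/") or "/",
                 tuple(sorted(name for name, _ in parse_qsl(parts.query))))
        if shape in self.seen:
            return False    # e.g. /product?id=2 after seeing /product?id=1
        self.seen.add(shape)
        return True
```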