Click depth is a useless scanner option

When web site owners want to measure how their visitors get from point A (say, the home page) to point B (such as finalizing a purchase), they might use a metric called click depth or link depth. This represents the number of clicks required to get from link A to link B. Sites strive to minimize this value so users may more efficiently perform actions without being distracted or frustrated — and consequently depart for other venues. The depth of a link also implies that popular or important pages should have lower values (i.e. “closer” to the home page, easier to find) than less important pages. This train of thought might make sense superficially, but this reasoning derails quickly for web scanners.

There’s merit to designing a web application, or any human interface, to have a high degree of usability. Minimizing the steps necessary to complete an action helps achieve this. Plus, your users will appreciate good design. But web application scanners are not your users. They don’t visit your web site and follow the workflows that humans do.

Click depth for web scanning is useless. Pointless. It’s a long string of synonyms for pointless when used as a configuration option, doubly so when scanning web sites that use a JavaScript-driven UI or implement simple Search Engine Optimization (SEO) techniques.

There’s a long list of excuses why someone might want to rely on click depth as an option for web scanning: Links on the home page are more likely to be attacked, vulnerabilities with low click depth are easier to find, opportunistic attackers are the biggest threat, scans run faster. Basically, these arguments directly correlate link popularity with risk. The simple rejoinder is that all links have a depth of 1 in the face of automation. An attacker who invests effort into scripts that search for vulnerable links doesn’t care how deep a link is, just that the scripts find one.

Whether the correlation of link popularity and risk rings true or not, having the scanner calculate the click depth is fundamentally incorrect. Visitors’ behavior influences a link’s popularity, not the calculation of a scanner. A superior approach would be to use analytics data to identify popular links, then feed that list to the scanner.
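
As a rough illustration of the analytics-driven approach, here is a minimal Python sketch. It assumes the analytics data has already been reduced to a plain list of requested URLs; the function name `popular_links`, the `shop.example` host, and the sample hits are all hypothetical, and a real analytics export would need its own parsing.

```python
from collections import Counter
from urllib.parse import urlsplit

def popular_links(requested_urls, top_n=3):
    """Return the top_n most requested paths, most popular first."""
    # Count hits per path, ignoring query strings (an assumption of this sketch).
    counts = Counter(urlsplit(u).path or "/" for u in requested_urls)
    return [path for path, _ in counts.most_common(top_n)]

# Hypothetical sample data standing in for an analytics export.
sample_hits = [
    "https://shop.example/",
    "https://shop.example/cart",
    "https://shop.example/cart",
    "https://shop.example/checkout?step=2",
    "https://shop.example/cart",
    "https://shop.example/checkout?step=1",
    "https://shop.example/about",
]

# The resulting list is what would be handed to the scanner as its seed URLs.
seeds = popular_links(sample_hits, top_n=2)
print(seeds)
```

The ranking here comes from what visitors actually requested, not from where a link happens to sit in the site's structure.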

Another reason for click depth’s inutility is the positive trend in web application design to create richer, more interactive interfaces in the browser that use lighter-weight data requests back to the web site. This is reflected in the explosion of Ext JS, Prototype, YUI, and other JavaScript libraries designed to provide powerful UI features along with concise request/response handling using JSON and asynchronous requests. This also has the effect of flattening web applications in terms of the number of clicks required to accomplish tasks. Even more significantly it has the effect of separating links into two conceptual buckets: one for links that show up in the browser bar and another for “behind the scenes” links used for API requests. Both link buckets are important to security testing, but the idea of click depth among them has little meaning.

SEO techniques can also flatten a page’s apparent link depth. A technique common to e-commerce sites is to create a long list of links on the home page that reach deep into the site’s product catalog. It’s not uncommon to see several dozen links at the bottom of a home page that point to different product pages ad nauseam. (The purpose of which is to make sure search engines find all of the site’s products so users looking for a particular shade of Unicorn-skin rugs will find that site over all others.) This sets an artificially low depth for many, many pages. A human is unlikely to care about the slew of links, but a scanner won’t know the difference.
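
To make the flattening concrete, the sketch below computes click depth the way a crawler plausibly would, with a breadth-first search over a link graph. The tiny site map and its paths are invented for illustration, and no particular scanner is claimed to work this way.

```python
from collections import deque

def click_depths(links, start):
    """Breadth-first search: fewest clicks from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site: the product page sits three clicks from the home page.
site = {
    "/": ["/catalog"],
    "/catalog": ["/catalog/rugs"],
    "/catalog/rugs": ["/catalog/rugs/unicorn"],
}
before = click_depths(site, "/")

# Same site after an SEO footer on the home page links straight to the product.
with_footer = dict(site)
with_footer["/"] = site["/"] + ["/catalog/rugs/unicorn"]
after = click_depths(with_footer, "/")

print(before["/catalog/rugs/unicorn"], after["/catalog/rugs/unicorn"])  # prints "3 1"
```

Nothing about the product page or its code changed between the two runs. Only the footer did, yet the computed depth dropped from 3 to 1.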

We’ve reached three reasons so far: Automated scanning gives every link an effective click depth of 1, browser-heavy sites have flat APIs, and SEO techniques further reduce apparent link depth. In spite of this, click depth appeared at some point in scanner designs, an OWASP project makes it a point of evaluation (among several poor criteria), and users often ask for it.

One understandable motivation behind click depth is trying to obtain some degree of depth or breadth in the coverage of a web site’s functionality. Notice that coverage of a site’s functionality differs from coverage of the site’s links. Sites might contain vast numbers of links that all exercise the same, small number of code paths. It’s these code paths in the web application where vulnerabilities appear. This sense of click depth actually intends to convey coverage. It’s highly desirable to have a scanner that avoids making dozens of redundant requests, following recursive links, or getting stuck in redirect loops. A good scanner handles such situations automatically rather than burdening the user with a slew of configuration options that may not even have a bearing on the problem.
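
One way to picture the difference between link coverage and functionality coverage is to collapse links that probably hit the same code path. The Python sketch below uses a deliberately crude heuristic that is my assumption, not something any given scanner is known to implement: numeric path segments are treated as interchangeable IDs, and query strings are compared by parameter names only. The `signature` and `representatives` helpers and the sample links are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

def signature(url):
    """Approximate the code path behind a URL (heuristic, see assumptions above)."""
    parts = urlsplit(url)
    # Treat purely numeric path segments as interchangeable record IDs.
    path = "/".join("{id}" if seg.isdigit() else seg for seg in parts.path.split("/"))
    # Keep parameter names only; their values rarely select different code.
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return (path, tuple(sorted(names)))

def representatives(urls):
    """Keep the first URL seen for each distinct signature."""
    seen = {}
    for url in urls:
        seen.setdefault(signature(url), url)
    return list(seen.values())

# Hypothetical crawl results: five links, three distinct signatures.
crawled = [
    "/product/101?color=red",
    "/product/102?color=blue",
    "/product/103?color=green",
    "/search?q=rugs",
    "/search?q=lamps&page=2",
]

to_test = representatives(crawled)
print(len(crawled), "links ->", len(to_test), "requests worth testing")
```

A heuristic like this will misjudge some sites, for example ones that route on parameter values, which is one more reason the scanner should own the decision instead of exposing it as a depth setting.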

One thought on “Click depth is a useless scanner option”

  1. While I agree with the theory here, in practice it is very nice to have an app scanner include a feature to control link/crawl depth.

     In particular, this feature is excellent if you already have a good crawl (or full knowledge, including a directory tree). Many app scanners/crawlers will brute-force directories, and there is no point brute-forcing farther than one below the lowest possible directory (for files in that directory).

     One might run, for example, wget in spider mode, in order to quickly determine this number. Then, a fast app scanner such as skipfish could be run. I cannot stress this enough, but providing skipfish the "-c" argument with the correct link depth will most certainly complete the scan in a more timely manner, with fewer false positives and with less strain on the web server, especially its logging interface.

     Other app scanners such as Netsparker or NTOSpider need not necessarily bother with these link depth settings, because they are built with these problems in mind.

     I find it is increasingly unnecessary and detrimental to run a spider/crawler in a web application security assessment. In my case, I prefer to visually look at the pages I'll be testing and fill in the HTML forms, and otherwise deal with the workflow by myself. I use a browser to do this, but I certainly use tools to help me automate this work. In Firefox, I utilize Multi Links (to select and open all links on a page); in Chrome I use Snap Links Lite. For filling out forms, Firefox has a popular add-on called Web Developer that can "Populate Form Fields", while Chrome has an extension called Form Fuzzer. Of course, these tools have trouble with Flash, Ajax, and other RIA frameworks, but those particular technologies call for their own special, customized automation for other assessment activities. I suggest the popular and common unit and component testing frameworks, especially ones that include a protocol/application/browser driver combination, such as WebDriver, flash-selenium, flex-ui-selenium, or silverlight-selenium.

     However, there are times when an automated view of a very large web infrastructure can be helpful. Besides digging DNS and IPs with BGP views and tools like Fierce-v2, and busting vhosting with tools like MyIPNeighbors, Metasploit, and host-extract.rb scripts, there are a few good web-based tools out there. Skipfish, already mentioned, has the capability to scan all domains found in the crawling process with "-D". I also think that SHODAN is incredibly powerful for analyzing web server activity over time. There are a few nice discovery and grep features in w3af that make it a good candidate to aid in this work, but it is a bit problematic compared to skipfish, wget, and others; I find it to be unstable unless configured with care. It is also possible to build your own custom crawler. One of the best I've seen lately (a balance of performance and API capability) is crawler4j.googlecode.com

     May your scans and crawls be ever fruitful, but do not forget the power of careful manual review with a few good tools that automate some (but not all) of your manual efforts.
