Modern PHP has successfully shed many of the problematic functions and features that contributed to the poor security reputation the language earned in its early days. Settings like safe_mode mislead developers about what was really being made “safe” and magic_quotes caused unending headaches. And naive developers caused more security problems because they knew just enough to throw some code together, but not enough to understand the implications of blindly trusting data from the browser.

In some cases, the language tried to help developers – prepared statements are an excellent counter to SQL injection attacks. The catch is that developers actually have to use them. In other cases, the language’s quirks weakened code. For example, register_globals allowed attackers to define uninitialized values (among other things); and settings like magic_quotes might be enabled or disabled by a server setting, which made deployment unpredictable.

x=logb(by)

But the language alone isn’t to blame. Developers make mistakes, both subtle and simple. These mistakes inevitably lead to vulns like our ever-favorite HTML injection.

Consider the intval() function. It’s a typical PHP function in the sense that it has one argument that accepts mixed types and a second argument with a default value. (The base is used in the numeric conversion from string to integer):

int intval ( mixed $var , int $base = 10 )

The function returns the integer representation of $var (or “casts it to an int” in more type-safe programming parlance). If $var cannot be cast to an integer, then the function returns 0. (Confusingly, if $var is an object type, then the function returns 1.)

Using intval() is a great way to get a “safe” number from a request parameter. Safe in the sense that the value should either be 0 or an integer representable on the platform. Pesky characters like apostrophes or angle brackets that show up in injection attacks will disappear – at least, they should.

The problem is that you must be careful if you commingle the newly cast integer value with the raw $var that went into the function. Otherwise, you may end up with an HTML injection vuln – and some moments of confusion in finding the problem in the first place.

The following code is a trivial example condensed from a web page in the wild:

<?php $s = isset($_GET['s']) ? $_GET['s'] : '';
$n = intval($s);
$val = $n > 0 ? $s : '';
?>
<!doctype html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<form>
<input type="text" name="s" value="<?php print $val;?>"><br>
<input type="submit">
</form>
</body>
</html>

At first glance, a developer might assume this to be safe from HTML injection. Especially if they test the code with a simple payload:

https://web.site/intval.php?s="><script>alert(9)<script>

As a consequence of the non-numeric payload, the intval() has nothing to cast to an integer, so the greater than zero check fails and the code path sets $val to an empty string. Such security is short-lived. Try the following link:

https://web.site/intval.php?s=19"><script>alert(9)<script>

With the new payload, intval() returns 19 and the original parameter gets written into the page. The programming mistake is clear: don’t rely on intval() to act as your validation filter and then fall back to using the original parameter value.

Since we’re on the subject of PHP, we’ll take a moment to explore some nuances of its parameter handling. The following behaviors have no direct bearing on the HTML injection example, but you should be aware of them since they could come in handy for different situations.

One idiosyncrasy of PHP is the relation of URL parameters to superglobals and arrays. Superglobals are request variables like $_GET, $_POST, and $_REQUEST that contain arrays of parameters. Arrays are actually containers of key/value pairs whose keys or values may be extracted independently (they are implemented as an ordered map).

It’s the array type that often surprises developers. Surprise is undesirable in secure software. With this in mind, let’s return to the example. The following link has turned the s parameter into an array:

https://web.site/intval.php?s[]=19

The sample code will print Array in the form field because intval() returns 1 for a non-empty array.

We could define the array with several tricks, such as an indexed array (i.e. integer indices):

https://web.site/intval.php?s[0]=19&s[1]=42
https://web.site/intval.php?s[0][0]=19

Note that we can’t pull off any clever memory-hogging attacks using large indices. PHP won’t allocate space for missing elements since the underlying container is really a map.

https://web.site/intval.php?s[0]=19&s[4294967295]=42

This also implies that we can create negative indices:

https://web.site/intval.php?s[-1]=19

Or we can create an array with named keys:

https://web.site/intval.php?s["a"]=19
https://web.site/intval.php?s["<script>"]=19

For the moment, we’ll leave the “parameter array” examples as trivia about the PHP language. However, just as it’s good to understand how a function like intval() handles mixed-type input to produce an integer output; it’s good to understand how a parameter can be promoted from a single value to an array.

The intval() example is specific to PHP, but the issue represents broader concepts around input validation that apply to programming in general:

First, when passing any data through a filter or conversion, make sure to consistently use the “new” form of the data and throw away the “raw” input. If you find your code switching between the two, reconsider why it apparently needs to do so.

Second, make sure a security filter inspects the entirety of a value. This covers things like making sure validation regexes are anchored to the beginning and end of input, or being strict with string comparisons.

Third, decide on a consistent policy for dealing with invalid data. The intval() is convenient for converting to integers; it makes it easy to take strings like “19”, “19abc”, or “abc” and turn them into 19, 19, or 0. But you may wish to treat data that contains non-numeric characters with more suspicion. Plus, fixing up data like “19abc” into 19 is hazardous when applied to strings. The simplest example is stripping a word like “script” to defeat HTML injection attacks – it misses a payload like “<scrscriptipt>”.