Egregious PHP Code

I know many folks reading this will be thinking that “Egregious PHP Code” is redundant. However, it need not be if proper discipline and understanding of PHP is deployed. Rather than discussion how to do PHP right, whatever that means, I’m going to discuss some examples of how I have seen it done wrong. As with many things, it is easier to identify things that are wrong than to know that something is unequivocably right.

On a recent project, I encountered some of the most shockingly bad PHP code I have ever seen. This is much worse than the typical reliance on “register_globals” or various other trivial errors made by novices and inattentive copy-pasters. No, this was shockingly bad because it was trying to be clever, and failed miserably.

Let me start with the most obvious problem. Anyone old enough to remember Basic on the old micros will remember line numbers and the old GOSUB command. Now GOSUB was wonderful for many purposes. But the only way to pass parameters to a subroutine (or get a value out) was using global variables. That is because the old Basic dialects did not have local variables and they did not have the notion of functions in the sense that modern (or even not-so-modern) languages have them. For the C types reading, it is the equivalent of having every function declared as “void foo(void)”. Now PHP is a modern language (say what you will about it, it is modern) so it has proper functions with local variables.

How does all of that relate to this egregious code? Consider the following:

if ($condition)
{
    require('foo.php');
}
else
{
    require('bar.php');
    $oldval = $val;
    require('foo.php');
}

That is the basic structure of code I encountered all over this one project. The project is littered with similar examples. Within a single source file, it is not uncommon to find a handful of require statements referencing the same source file. In other words, the PHP require keyword is being used in the same manner as the Basic GOSUB statement.

This is bad for more than the obvious reason of global variables being required to use these subroutines properly. For one, it can significantly increase the parsing and compilation overhead at the start of each script, though this is hardly noticeable on modern servers unless the site using this technique is very busy. More importantly, it prevents recursion and makes identifying what would be a safe variable name to use difficult. And finally, it makes tracing the code extremely difficult if you need to figure out how it works, a problem I encountered immediately. Until I figured out that the require statement was being used like GOSUB, I found it difficult to comprehend.

This same code was attempting to be a general purpose data entry/editing system driven by configuration files. It succeeded at that goal, to a degree. It was structured so that each table in the database would have its own configuration file which would be loaded by the body of the code as needed. What I found in a configuration file looked something like this:

$i = 0;
$attrib1[$i] = 'val1';
$attrib2[$i] = 'val2';
$attrib3[$i] = 4;
$i++;
$attrib1[$i] = 'anotherval';
$attrib2[$i] = 'anotherval2';
$attrib3[$i] = 42;
$i++;
$numobjects = $i

That’s another interesting structure. It is also reminiscent of the old Basic days. Basic lacked any data structure support, and also lacked arrays indexed by strings. This was largely due to resource limitations of the computers those early Basic interpreters ran on. It was not uncommon to see a data structure constructed similarly to the above example.

Now there are two interesting problems with the above example. The first is obvious. It would be better structured as a multi-level array which would eliminate the need for both the $i counter and the $numobjects value at the end. The other problem is that it is a configuration file but it is also PHP code. While this is not necessarily an problem, it does, again, mean that anything including that file must take steps to avoid having its own variables clobbered.

The above does work well as long as you only ever need to have one configuration loaded at any given time. But what happens if you need to operate on multiple configurations at the same time? What if you wish to process the configurations recursively for some reason? With the global variable configuration using the same variable name(s) in each different table configuration, such operation is nearly impossible, and is certainly inconvenient. Combine that with the require-as-gosub technique and it becomes impossibly complicated.

The code does a number of other things that naïve programmers often do, such as using an input value from the browser directly in a file inclusion decision.  It attempts to prevent exploitation of this by appending this value to a directory name and checking if the resulting file exists, and if not it throws a cryptic error. However, it does nothing to project against directory separators or the “..” parent directory name in the referenced string. That means, someone with sufficient cleverness could convince the scripts to dump out the contents of, say, /etc/passwd on a unix system, since that file exists and thus will not trigger the cryptic error message. That is trivially corrected by checking for both “/” and “.” in the passed value as well as whether the file exists. But the code does not check this in just one place! There are at least two dozen separate cases that duplicate the “does the file exist” check!

Most of the rest of the nastiness in this particular project stems from the require-as-gosub and configuration file patterns described above. In most cases, the require statements could be replaced with a single library file that is included at the start of each source file. Then each separate file in the existing scheme could be replaced with a function in the global library file. This would eliminate the global variable problem, even if the configuration files were not changed since including the configuration file within a function would set the configuration values in the local namespace of the function rather than the global one.

There is one use of the require-as-gosub pattern that is more difficult to eliminate, however. It is being used as an iterator over the configuration values. This could be solved by having a generic iteration function in the global library that takes as a parameter a callback function to perform actions on each item. There are other options, as well, depending how much of the configuration file structure one wishes to hide.

By eliminating the require-as-gosub pattern entirely, the proliferation of cryptic variable names can disappear. Additionally, the need to scan through as many as a dozen files to figure out what is going on with any given piece of code can be substantially reduced to include the global library file and the file being examined. It even becomes reasonable to split the global library into bits that are not necessarily always all needed. Even this way, it becomes clearer because those bits needed will be referenced in exactly one place in the code referencing them instead of potentially dozens.

Most of the rest of the complexity comes from the fact that the code is attempting to be a general purpose data entry/editing framework. But by using proper functions instead of require-as-gosub, the actual logic of the framework would be clearer and it would be much more flexible, and, as a consequence, much more maintainable.

 

Leave a Reply

Your email address will not be published. Required fields are marked *