Writing Reasonable PHP

PHP gets ragged on a lot for various reasons. One of the biggest complaints I see is that PHP is “insecure”, as if writing bad code in PHP is somehow PHP’s fault. The other major complaint is aimed less at the core language than at the standard library and runtime environment, and in particular at the chaotic nature of the standard functions. The complaint about the library has merit, but PHP is far from the only popular language to have that problem. The security complaint might have some merit too, but blaming PHP for insecure code is just as ridiculous as blaming C because programmers write buffer overflows. It is not strictly PHP’s fault when programmers do stupid things. Granted, PHP makes a lot of stupid things very easy, and some of the early design decisions for the PHP runtime environment are questionable in hindsight, but writing sensible PHP code is not impossible or even especially difficult.

Types of PHP Code

Before I delve too far into the intricacies of PHP, let me touch on the types of coding that PHP can be used for.

PHP was designed (or evolved, really) as a means to enhance largely static web pages. It fit into the same niche as Microsoft’s Active Server Pages: it made adding a small amount of dynamic content to an otherwise largely static page easy. While this is still common today, it is no longer the primary use case. This origin is also the reason for a lot of the somewhat questionable design decisions for the runtime environment (such as the ever popular and justifiably maligned “register_globals” feature).

As it gained popularity, it began to edge out the use of CGI scripts written in Perl or other languages. This was partly due to the complexity of dealing with CGI on most servers and partly due to the fact that PHP itself handled all of the boilerplate needed to deal with the CGI interface, primarily decoding script input. Thus, PHP scripts moved more toward being PHP code with HTML content embedded in it instead of HTML code with PHP embedded in it. Some of the more unfortunate design decisions were addressed at this point (during the 4.x series), including the “register_globals” problem, with the introduction of the “superglobal” arrays and a few other things. PHP also gained a sort of object orientation and a massive collection of “extensions”, many of which are bundled and/or enabled by default. This type of coding is the most common today: programs that are still intended to run in a web server environment and resemble the classic CGI script more than the classic “active page” model.

Finally, PHP gained a command line variant. With a few tweaks to the runtime environment, it became possible to write programs that do not depend on the presence of a web server or the CGI interface specification. Most of the historical runtime design issues do not apply to a command line PHP program. However, the source format remains the same including the PHP open/close tags.

A Sensible PHP Environment

A great deal of sanity can be obtained before a single PHP statement is written by setting up the environment in a sensible manner. Most of the features of PHP that are maligned (often justifiably) by critics can be turned off in the PHP configuration file. Notably, one should turn off register_globals, all magic quotes variants, register_long_arrays, allow_url_include, and allow_url_fopen. There are other settings that make sense to disable too, depending on which extensions you are using.

It should be noted that disabling some of these settings makes coding less convenient. However, often the convenience comes at the cost of clarity or even security.
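As a rough sketch of how one might audit these settings from inside a script, the following uses ini_get() to report which of them are switched on. The helper name enabled_risky_settings() is my own invention for illustration. Note that on PHP 5.4 and later several of these options (register_globals, the magic quotes family, register_long_arrays) no longer exist at all, in which case ini_get() returns false and the sketch simply skips them.

```php
<?php
// The settings named in the text above.
$risky = [
    'register_globals',
    'magic_quotes_gpc',
    'magic_quotes_runtime',
    'magic_quotes_sybase',
    'register_long_arrays',
    'allow_url_include',
    'allow_url_fopen',
];

// Hypothetical helper: return the names of the listed settings
// that are currently enabled.
function enabled_risky_settings(array $names)
{
    $enabled = [];
    foreach ($names as $name) {
        $value = ini_get($name);
        if ($value === false) {
            continue; // the option does not exist in this PHP version
        }
        // Boolean settings may come back as "1", "On", "", "0", etc.
        if (filter_var($value, FILTER_VALIDATE_BOOLEAN)) {
            $enabled[] = $name;
        }
    }
    return $enabled;
}

foreach (enabled_risky_settings($risky) as $name) {
    echo "Warning: $name is enabled\n";
}
```

The actual fix still belongs in the configuration file (for example, a line such as "allow_url_include = Off"); a check like this only tells you when the environment is not what you expected.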

Writing PHP Code

Most of the recommendations here apply to all programming languages. Let me stress that. Writing good code requires discipline in any language.

Check Inputs

One of the biggest sources of problems with any program is failure to check input data. Anything input by a user must be viewed as suspect. After all, the user might be malicious or simply make an error. Relying on user input to be correct is never the right thing to do. Steps must be taken to ensure that bogus input data does not cause your program to misbehave. Inputs that cannot be handled should produce error conditions in a controlled manner.

Many programmers do grasp this concept intuitively. Input checking code is often present when handling direct user input. However, most overlook the simple fact that data coming from anywhere outside the program code itself must be treated as suspect. You cannot be certain that what you wrote to a data file is still in that file. It could have been corrupted by a hardware failure, user error, or the file could have been replaced with another type of file, all without your program being aware of it. The same applies to data stored in a database system like MySQL or in a session cache or a shared memory cache somewhere.

The advice here: Verify everything. Failure to do so correctly is not a weakness in PHP but in the programmer. It is also the single largest source of security problems. Careful adherence to this principle will quickly yield much better code.
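To make that concrete, here is a minimal sketch of validating one input value with PHP’s filter_var(). The function name parse_quantity(), the form field “quantity”, and the 1 to 100 range are all assumptions made up for the example, not anything prescribed by PHP.

```php
<?php
// Hypothetical validator: accept only whole numbers from 1 to 100.
// Returns the integer on success or null when the input is unusable.
function parse_quantity($raw)
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 100],
    ]);
    // filter_var() returns false when validation fails.
    return $value === false ? null : $value;
}

// Usage: the "quantity" field is an assumed form field name.
$quantity = parse_quantity(isset($_POST['quantity']) ? $_POST['quantity'] : null);
if ($quantity === null) {
    // Fail in a controlled manner instead of carrying on with junk.
    echo "Invalid quantity\n";
}
```

The same sort of check applies to values read back from a file, a database, or a session, not just to form fields.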

Check Returns

Closely related to the previous item, and high up on the list of programmer errors, is failing to check return values from function calls. Most library functions will have some sort of return value. For functions that can fail for whatever reason (bad parameters fed in, external state, etc.), it is absolutely critical to check for those failure conditions and handle them in a manner that is appropriate for your program. These conditions can be as simple as a data file being missing or as complicated as a remote socket connection timing out or the database server going away.

Study all function calls you use and make certain you understand what failure conditions exist. If a failure condition will cause your program to fail or otherwise misbehave, handle it. If a failure condition seems impossible, it is doubly critical to handle it, because nobody will be expecting it when it does happen. That said, if a failure condition will not cause your program to misbehave or otherwise fail, it can be ignored, but make absolutely certain that is the case and document why.

The advice here: Always check return values.
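Here is a small sketch of what that looks like in practice, using the “missing data file” case mentioned above. The function name load_config() and the idea of a JSON configuration file are assumptions for the sake of the example. Each call that can fail is checked before its result is used.

```php
<?php
// Hypothetical loader: read a JSON file and return it as an array,
// or null if anything along the way fails.
function load_config($path)
{
    if (!is_readable($path)) {
        return null; // file is missing or not readable
    }

    $raw = file_get_contents($path);
    if ($raw === false) {
        return null; // the read itself failed
    }

    $data = json_decode($raw, true);
    if (!is_array($data)) {
        return null; // not valid JSON, or not the structure we expect
    }

    return $data;
}

// Usage: 'config.json' is an assumed file name.
$config = load_config('config.json');
if ($config === null) {
    echo "Could not load configuration\n";
}
```

Returning null for every failure is just one way to do it; throwing an exception or logging the specific cause may suit your program better. The point is that no failure passes silently.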

Protect Output

This one is a lot less obvious and is best explained by example. Suppose you are outputting some text into an HTML document and you do not know in advance what characters that text contains. In HTML, some characters have special meanings (such as quotes) but are also valid in actual text. These special characters have to be protected in a medium-appropriate way. In the HTML case, they would be replaced with appropriate entities. This is a common case in PHP programming but it is not the only one. The same applies when passing data to a database system like MySQL using SQL or when passing command arguments to an external program. Failure to protect output properly is the leading cause of a class of security vulnerabilities known as SQL injection attacks. There are analogs for other output streams too, such as cross-site scripting in the HTML case. Sometimes the corruption of the output stream is mostly harmless, like when an unprotected comma is inserted into a CSV field in an informational spreadsheet. Other times, it can cause cascading failures or even allow clever attackers to obtain private data.

The advice: Always protect output, no matter where it is destined.
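As a hedged illustration of the two cases named above, the sketch below protects text bound for HTML with htmlspecialchars() and text bound for SQL with a PDO prepared statement. The helper names html_escape() and find_user_by_name(), and the “users” table with its “name” column, are hypothetical and exist only for the example.

```php
<?php
// HTML output: replace the special characters with entities.
function html_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// SQL output: let the database driver handle the value via a placeholder
// instead of pasting it into the query string.
// The "users" table and "name" column are assumptions.
function find_user_by_name(PDO $db, $name)
{
    $stmt = $db->prepare('SELECT * FROM users WHERE name = ?');
    $stmt->execute([$name]);
    return $stmt->fetch(PDO::FETCH_ASSOC); // false when no row matches
}

echo html_escape('<a href="x">Tom & Jerry\'s</a>'), "\n";
// prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#039;s&lt;/a&gt;
```

For arguments to an external program, escapeshellarg() plays the same role. The mechanism differs for each destination, which is exactly why the protection has to happen at the point of output.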

Use Correct Operators

This is more specific to PHP but there are similar situations in other languages. In PHP specifically, there are two equality and two inequality operators. One set does loose type handling and attempts to find some means to compare its operands, to the point of doing type conversions behind the scenes. The other set treats the operands as unequal if their underlying types are different, even if the apparent values are the same. The “==” and “!=” operators are the first set and “===” and “!==” are the second set. Using the former, the string “0” and the number 0 will compare as equal, while with the latter they will not. This is important because many functions will return “false” on an error but some other type (like a number) on success. If you use the loose comparisons, “false” and the number 0 are equal, but they are not with the strict comparisons.

PHP also has a number of functions which can be used to identify NULL values, arrays, and so on, which can also be employed when the type of a value is important.

In most cases, the strict comparison operator is probably the better choice but the loose comparison can be useful. In short, write what you mean using the correct operators. Make sure you know exactly what the operator you choose is doing.
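The classic place this bites is strpos(), which returns the position of a match (possibly 0) or false when there is no match. The sketch below contrasts a loose and a strict check. Both function names are made up for the example, and the loose one is deliberately wrong.

```php
<?php
// Buggy on purpose: a match at position 0 is loosely equal to false,
// so this reports "not found" when the needle is at the very start.
function contains_loose($haystack, $needle)
{
    return strpos($haystack, $needle) != false;
}

// Correct: only a genuine false means "not found".
function contains_strict($haystack, $needle)
{
    return strpos($haystack, $needle) !== false;
}

var_dump(contains_loose('hello', 'he'));   // bool(false)
var_dump(contains_strict('hello', 'he'));  // bool(true)
var_dump('0' == 0);                        // bool(true)
var_dump('0' === 0);                       // bool(false)
```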

Using Language Constructs

Like any programming language, PHP has a number of language constructs that are very useful but there are other ways that similar effects can be achieved. For a trivial example, consider the use of a long “if/elseif/elseif/else” structure comparing a single variable against a series of values. This can also be expressed using a “switch” statement. In this trivial example, either one is valid and is about equivalent though the “switch” statement has a few features that might make it more useful in some circumstances. Likewise, a “for” loop can always be faked using “while”.
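For what it is worth, here is a sketch of that trivial example written both ways. The function names and the status codes are arbitrary choices for illustration. One thing to keep in mind is that “switch” compares loosely (like “==”), so the “if” version uses “==” as well to keep the two equivalent.

```php
<?php
// The if/elseif chain.
function label_if($code)
{
    if ($code == 200) {
        return 'ok';
    } elseif ($code == 404) {
        return 'not found';
    } elseif ($code == 500 || $code == 503) {
        return 'server error';
    } else {
        return 'unknown';
    }
}

// The same logic as a switch. Stacking two case labels lets
// several values share one branch.
function label_switch($code)
{
    switch ($code) {
        case 200:
            return 'ok';
        case 404:
            return 'not found';
        case 500:
        case 503:
            return 'server error';
        default:
            return 'unknown';
    }
}
```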

On the other hand, there are cases where an alternative is not equivalent. Consider the case of “include/require” vs. a function call. While the fact that you can include the same file in dozens of different places looks a lot like a function call, and can often be used for a similar effect, it is not the same thing. The included code runs in the same scope as the location of the include directive, for instance, which means that any variables in the including file might be scribbled over by the included file. Parameters also must be passed in variables and return values returned the same way. It is also not possible to use such a “function” recursively. On the other hand, an actual function call gains its own local variable scope, preventing the function from clobbering variables in the caller, and also has a formalized parameter list and return value. Furthermore, functions can be called recursively, which is also incredibly useful. Thus, it is important to use the right construct for the job. “include” is not the right construct to execute a chunk of code from random locations. (I have singled this particular one out because it shows up far too often in PHP code.)
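The scoping difference is easy to demonstrate. In the sketch below, the “included” snippet is written to a temporary file on the fly purely so the example is self-contained; the variable and function names are invented for illustration.

```php
<?php
// A stand-in for an included file that uses a variable named $total.
$snippet = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($snippet, '<?php $total = 0; $total += 5;');

// The included code runs in the caller's scope and overwrites $total.
function via_include($snippet)
{
    $total = 100;
    include $snippet;
    return $total; // now 5, not 100
}

// The same work as a real function with a parameter and a return value.
function add_five($start)
{
    $total = $start; // local to add_five()
    $total += 5;
    return $total;
}

// The caller's $total survives the call untouched.
function via_function()
{
    $total = 100;
    $result = add_five(0);
    return [$total, $result]; // [100, 5]
}

var_dump(via_include($snippet)); // int(5)
var_dump(via_function());        // array with int(100) and int(5)
```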

The advice: use the right language construct for the job. This applies not only to things like “include” but also to things like objects. Creating an object to “encapsulate” a behaviour adequately described by a single function is just as silly as using “while” to simulate “for”.

Wrap Up

The preceding is, by no means, exhaustive. However, by following the above recommendations, it is possible to write reasonable PHP code. All it requires is a bit of discipline and an understanding of the language you are using.

I should note that this is not an apology for PHP but merely a set of suggestions to avoid writing bad code. Remember: just because PHP allows you to do something in a particularly unfortunate way does not mean that you have to do it that way. If it looks like a bad way to do things, look for a better way. Odds are pretty good you will find one.
