Bicycles on Roads

In recent years, there has been an insane amount of effort to improve cities with respect to non-automobile traffic. Most of this effort has been focused on bicycles, often to the exclusion of even pedestrians. Now, don’t get me wrong. Bicycles are a good idea. So are tricycles, which, in my opinion, are a bit safer in stop-and-go traffic. However, it is my considered opinion that while encouraging bicycle use is a good idea, we’re going about it the wrong way.

First off, there is a major conflict between bicycle traffic and automobile traffic. The fact that automobiles can travel upwards of 100 km/h while bicycles are limited to much lower speeds (in most circumstances) causes an irreconcilable conflict. Have you ever wondered why slow moving vehicles are not permitted on many freeways? Think about it for a bit and you’ll understand. The same applies to bicycles.

Now, this problem is not inherently unsolvable. (Well, it might be in some well built-up areas.) But the solution of painting a bike lane on the right-hand side of a road and somehow magically expecting that to solve things is, pardon the language, fucking stupid. This solution works well on a through road with no cross traffic and a sufficient buffer space between the bike lane and the rest of the traffic. However, that’s not the usual case. More usually, the bike lane is painted on the right-hand side of a regular street with fairly closely packed cross streets. At each intersection, right-turning automobiles have to cross the bike lane. Sound fine to you? Think about this. Why are right turns only permitted from lanes where all traffic to the right (if any) also turns right? That’s right. Safety. Yet we somehow think that a through lane for bicycles to the right of right-turning automobiles is a good idea. Are we fucking stupid‽ And better yet, we legislate that the bicycle has the right of way.

If that still sounds reasonable to you, let’s examine a very common situation. I will be driving the automobile and you will be riding the bicycle. I want to turn right but there are pedestrians in the crosswalk and I have to wait for them. I have my blinker on and I am waiting. Finally, the crosswalk clears and I proceed. I checked the bike lane as far back as I can reasonably see, but it disappears around a curve a short distance behind me. Unimpeded by the traffic snarl caused by my wait for the pedestrians, you are travelling along the bike lane at 40 km/h (which is not much of a stretch for someone in decent condition). By the time I am halfway across the bike lane (automobiles take a bit to get moving from a standing start), you appear to my right and one of three things happens: you smash into me, you swerve in front of me and I hit you, or you stop. After all, I am not going to see you because once I have entered the bike lane, there is no point watching it any more. Instead, I absolutely must be watching the road ahead of me to avoid any obstacle there.

Now you’re thinking that in that accident, I am at fault because I crossed the bike lane when it was unsafe to do so. And, legally, you would probably be correct. Yet the lane was clear when I entered it, based on the visibility I had from the automobile. The geometry of the roads prevented me from seeing you. (Maybe it wasn’t a curve. Maybe it was the crest of a hill or a massive delivery vehicle blocking the view.) Even if it were your responsibility to avoid me (an arguably reasonable position), the same thing that blocked my view would likely also block yours. While your stopping distance and maneuverability might be superior, it might still be difficult or impossible to safely avoid a collision.

The problem gets even worse when you introduce street-side parking since the bike lane is almost certainly to the left of the parked cars. Now the cyclist has to worry about parked cars starting from a standstill and trying to get into traffic, and about traffic coming at speed trying to park. Both such activities are complex and require looking in multiple places. It becomes very easy to miss that bicycle coming along from the rear. Even if the cyclist must yield, there is still the problem of car doors flinging open and other such hazards. Of course, if you put the bike lane on the right-hand side of the parked cars, you can avoid a lot of this, leaving only the previous issue at intersections and possibly car doors opening. Even then, it assumes that I, the motorist, can actually park correctly, which is doubtful.

The solution is fairly obvious. For roads with any amount of traffic, cyclists and motor vehicles need to be separated, much the same way pedestrians and motor vehicles already are. The notion that a bicycle is a “vehicle” that should be treated like all others is patently absurd (or fucking stupid). It may be that pedestrians and bicycles also need to be separated (for similar reasons) but pedestrians and sensibly operated bicycles can more easily coexist than bicycles and automobile traffic for the simple reason that collisions between the two are, on the whole, less dangerous due to the more equal weights involved. Of course, there is still a conflict at intersections but that can generally be solved with signal phasing, grade separation, or other things that work fairly well for pedestrians. As with all cases of conflicting traffic, there will be no ideal solution.

Now let’s assume that what we’re doing with bike lanes, paths, etc., is all brilliantly workable. Cyclists themselves need to take some responsibility for their own safety and realize that they are participating in a larger traffic system. The number of times I’ve seen cyclists blasting along at unsafe speeds, going the wrong way on one-way roads, ignoring traffic signals, crossing lanes of traffic without looking, mowing pedestrians down in crosswalks and on sidewalks, and generally ignoring even the rules of sanity let alone the rules of the road is depressing. Until we force cyclists to obey the rules, there is no way motorists or pedestrians are going to be pleased with more of them out there. There need to be consequences for violating the rules. Speeding? You get a ticket. Ignore a signal? Get a ticket. Cause an injury? See you in court. Unsafe riding? Get a ticket. Improper equipment on the bicycle (lack of brakes, no lights, etc.)? The bike is impounded. Perhaps bicycles should require registration just like motor vehicles? Clearly, we need to do something to get cyclists to behave.

So what is the point of this rant? Well, I am all for improved safety for cyclists. But it requires more thought than simply throwing unfair rules at motorists or painting a few lines on roads that are already too narrow for the traffic they have. It requires effort from the cyclists, too. And it also requires that we recognize that cycling is not practical in all climates. In a Calgary winter, for instance, cycling is suicidal even without motor vehicles. It’s simply too cold or too variable a lot of the time.

So let’s be smart about this shall we?

Why Wikis Are Stupid

For a while, having a wiki was what all the cool kids were doing. It was the ultimate in collaboration, allowing anybody who cared to edit any page they wanted and read anything they wanted. Certain high-profile successes (Wikipedia for instance) only served to make the idea even more cool. For a time, wikis sprang up everywhere, even for sites where a traditional content management system made as much or more sense.

What people failed to realize at the time was that the very feature of wikis that makes them so useful for collaboration – their wide open editing policies – also makes them wide open for abuse by the less scrupulous types out there. It should not have come as any surprise that open wikis suddenly became the hottest venue for spammers and other nefarious types to peddle their wares. After all, it happened on Usenet, email, forums, and every other open medium.

These days, running an open wiki requires intense oversight from the administrator. Once the spammers find an open wiki, they will hammer away at it until the end of time, adding ever more garbage to the content. Even simply reverting the spam edits does not remove the content from the wiki, which, after all, stores previous revisions that remain accessible. No, running an open wiki requires being able to permanently remove the garbage, and that must be done continually.

Of course, most wikis are really targeted at a fairly small community so restricting edits to authorized users is a reasonable solution. But that also requires some oversight from the administrators. If one simply allows anyone who wishes to create an account and start editing immediately, the abuse noted above will still occur. After all, spammers can register accounts as easily as anyone else. That means there must be a manual vetting stage for new accounts and that requires administrator intervention. And even once an account is approved, the activity must still be monitored and abusive accounts stopped in their tracks.

In the light of all that, is a wiki a good idea? In the general case, no, it is not. Not even a moderated one. Before putting up a wiki, you should consider carefully whether you really need the functionality. Is there a real benefit to it? Are you really just creating a site where people can discuss things? If so, maybe a forum is better. Are you just trying to get your information out and only a handful of people in your organization will be editing content? If so, a standard content management system is probably all you need.

The fact that wikis are fairly rare compared to the number of sites out there should tell you something. And among the wikis that do exist, almost all require some form of authorization before edit access is allowed. That should also tell you something.

In short, wikis (as originally imagined) are stupid. They simply ignore the nature of the population in general.

Patent Reform

Patents as currently implemented are totally nonfunctional and a generally stupid idea. Note that I’m talking about patents here, not copyright, which is something totally different. In general terms, a patent is a monopoly grant to the patent holder, a monopoly which has legal force, usually for a limited time. Now, the notion of rewarding an inventor with a limited monopoly is, in general, a sound idea. However, patents have become particularly problematic in their current incarnation.

The biggest problem with most current patent systems is that they permit patenting things that are clearly not inventions. Computer software, for instance, is not an invention and should not be patentable in any form. In particular, algorithms for accomplishing tasks on computers should not be any more patentable than mathematical algorithms. After all, an algorithm itself is not a device! Similarly, simply finding a genetic structure in nature somewhere should not grant a patent to the discoverer. Even if the genes are artificial, it is dangerous to grant a monopoly on something that is inherently uncontrollable. What happens when the same genetic code appears in a human being?

Patents generally have a limited term, which is good. The term might be too long for many things, but it does, at least, expire in a predictable fashion. However, some patent systems, such as the one in the United States, make any patent under review secret and allow extensions to the review process. This allows a nefarious actor to essentially hide a patent until someone else manages to come up with the same thing, let the patent pass the review process, and then sue the poor sucker who had no possible way of knowing he was violating a patent that was not available to learn about. This is your basic submarine patent, if I have my terminology right.

I will avoid turning this into a long rant about the ills of patents. Instead, I will switch to what I think a reasonable patent system would look like.

  • A working version of the invention is required in order to receive a patent. Regardless of what the patent is for, if you can’t construct a working version, you haven’t invented anything. Whether you should be compensated for having an idea that later turns into a real invention is a separate issue and has nothing to do with protecting a limited monopoly on a real invention.
  • A patentable invention must be a physically distinct device that must accomplish something clearly beyond the scope of all of its specific components. That means a physical device that is simply a general purpose computer in a special housing running a program is not an invention. It also means that anyone duplicating a physical invention on a general purpose computer has not violated the patent. Yes, this would make a great many things non-patentable. This is a good thing.
  • An invention must not be substantially similar to any previous invention, patented or otherwise. This is the so-called “prior art” exception. Thus, it should not be possible to patent a wheel given that there is clear prior art going back thousands of years, regardless of whether a patent was ever filed on it.
  • Nothing which forms a crucial underpinning for human life itself should be patentable. That means no gene patents. It is reasonable to consider molecular patents as long as they are not crucial components of human life. That means vitamins, genes, water, naturally occurring hormones, etc., cannot be patented.
  • Anything capable of self-replicating without intervention must not be patentable. That means plants which grow on their own in a field and can reproduce are not patentable. If there is no reasonable means to protect oneself from infringing on a patent for an invention, the invention is not patentable. There is no reasonable means to prevent your crop from cross-pollinating with your neighbour’s crop or to prevent seeds from your neighbour’s crop from ending up in your field.
  • Patents must be public and searchable for their entire duration from initial application to final expiry. It is not reasonable to expect anyone to be liable for infringing on something they had no possible way to know about.
  • Patents must be written in a language that is intelligible to an ordinary citizen competent in a related field of endeavour. If it is not intelligible to such a person, then how can he possibly avoid infringing on it?
  • Patents must cover exactly one clearly described invention. The current practice of including multiple claims on a single patent, starting with a ridiculously general description and moving to ever more detailed and complex claims is deleterious to understanding patents. Instead, each claim must be its own patent application with its own attendant fees and investigation.
  • Patents cannot be transferred except in the case of succession (death of the original holder, corporate restructuring). Simply disbanding a corporation would terminate patent protection.
  • Failure to take action immediately upon discovering patent infringement is deemed to be a non-revocable license grant to the infringer. Thus, if a defendant can demonstrate that the patent holder reasonably had knowledge of his activities, that is considered an adequate defense to a patent suit. Things like sending a traceable request to the patent holder for a license grant but receiving no reply would qualify. The duration of “immediate” must, of course, take into account commercially reasonable response time based on the method of discovery. Enforcement can be as simple as “we grant you a royalty free license to do what you are already doing.”
  • A pattern of neglect in patent enforcement (failing to act upon discovery of infringement) may be construed as a general royalty free license grant to the world. In general, the more cases where enforcement is neglected, the more easily a future defendant can use this defense.

There are many other points I could raise but many of them are more general. Things such as punitive fines being based on cash flow and assets of the perpetrators should obviously apply in general.

I have no illusions that the above will ever happen or that it will work out exactly as I would expect if it ever did happen. Still, in my opinion it makes a reasonable starting point.

Web Site Development and Sessions

Sessions are used all the time by web site developers, often without the developer realizing it. It turns out, however, that sessions are immensely overused and they tend to cause all manner of random problems. My perspective on this is by no means unique but I do wear multiple hats. One hat is as server administrator with hundreds of sites hosted. Another hat is as a web developer. The final relevant hat is as a web site operator. All three hats lead to the following conclusions.

Sessions are over-used

The biggest thing I have noticed over the years is that sessions are overused. Sure, some sort of session makes sense when you need to track a login through portions of a site. But the portions of the site which are public should not need access to any session information, period. If there is no session already in use, there is no need to initiate one if some random member of the public arrives on a public page on your site. You may think you need the session to change the navigation or some other element for a logged in user, and you would be correct, to a point. But if you do not initiate a session for a user unless he logs in, you can still identify a logged in user by the presence of a session combined with whatever session validation you use.
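
Here is a minimal sketch of what I mean, assuming PHP’s default cookie-based sessions; the is_valid_login() check is a hypothetical stand-in for whatever session validation you use:

    <?php
    // Only resume a session the browser already presents; never create one
    // for an anonymous visitor landing on a public page.
    if (isset($_COOKIE[session_name()])) {
        session_start();
        $loggedIn = isset($_SESSION['user_id']) && is_valid_login($_SESSION);
    } else {
        $loggedIn = false;   // anonymous visitor; no session file created
    }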

Of course, login tracking is not the only thing sessions get used for. It is simply the most common. However, if you are using a session to track users through your site or something more nefarious, you should consider whether you really need to do that. Are you actually deriving any concrete benefit from doing so? Do you really need a session to collect the information you desire? Do you really need to personalize every page with the visitor’s name or whatever cutesy thing you’re doing?

Sessions are poorly implemented

Completely orthogonal to whether sessions are used needlessly is the fact that sessions are often implemented poorly, or that a session mechanism not well suited to the task at hand is used for whatever reason.

I will pick on a particularly common example of session handling which illustrates several problematic features quite nicely. This particular session handling scheme is the one implemented by default in PHP.

By default, a PHP session exists as a file stored on the web server paired with a cookie that holds the session identifier. When a PHP script activates a session, PHP looks for the cookie and if it finds one, it reads the session data file. But not only does it read the data file, it also locks it, preventing another PHP script from activating the same session at the same time. Then, when the session is released, often implicitly by the end of the script, PHP writes the session data back to the file and finally unlocks it. Note that it rewrites the session data even if nothing has changed.

There are two major things wrong with this approach, as commonly used.

Request serialization

First, because almost nobody writing PHP code knows about the locking or even understands how locking works, this leads to scripts that start with “session_start();” and never release the session. As a result, any scripts that run as part of the same session will run serially. If one script is already running and another tries to start the same session, it will block at session_start() until the previous script finishes.
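
One way around this, sketched below, is to copy what you need out of $_SESSION and then release the lock explicitly with session_write_close() before doing any slow work (the report-building function here is made up for illustration):

    <?php
    session_start();                    // acquires the lock on the session file
    $userId = isset($_SESSION['user_id']) ? $_SESSION['user_id'] : null;
    session_write_close();              // writes the data back and releases the lock

    // Anything slow below here no longer blocks other requests in the same session.
    $report = build_expensive_report($userId);   // hypothetical slow operation
    echo json_encode($report);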

This is not terribly problematic for cases where only a single script is likely to be running at the same time within a session. However, with the advent of such things as ajax, a single ajax request will block all other ajax requests on the page until it completes. Indeed, even the initial page load might block any ajax requests. Thus, instead of the page loading and asynchronously filling in any ajax type content, elements load up one by one, harkening back to the days of really slow dial-up networking. This is particularly frustrating for the user who clicks on something while the page is still loading and then nothing happens for long seconds while other scripts finish churning away on the server.

But even if the programmer is aware of this problem and defends against it by releasing the session immediately after it is no longer needed, the session still must be maintained for the entire duration where it is possible that session data will need to be modified.

Rewriting unchanged data

The PHP session system also rewrites the session data when nothing has changed. This is generally unseen by users of the site or even by the programmer. The people who notice this are server operators who witness higher write volumes on their disks. This, in turn, leads to slower performance on the server and generally annoys server operators and users alike. However, since this is a problem in aggregate rather than from one single source, there is little a server operator can do to mitigate it.

One interesting thing, though, is that rewriting data needlessly can lead to data corruption if a process crashes. Of course, a crash can happen at any time, but if you are not writing data when you crash, there is no chance of corrupting that data. Thus, rewriting unchanged data is generally a dumb idea anyway.

Storing too much

Another common issue with sessions is that too much data is stored in the session manager’s data store. That means in the $_SESSION superglobal in PHP but it could as easily be some custom rolled scheme.

Because the session has to be read by everything that needs information about the current, well, session, the more data that is thrown around, the longer every script takes to execute. If every time something needs the logged in user ID, it also needs to read half a megabyte of other stored state, then you create a needlessly high memory, I/O, processing, and storage overhead for every script. A few milliseconds may not be noticeable on a single script, but consider if your site suddenly gets busy. Those few milliseconds here or there suddenly start adding up to your web server falling over dead.

Instead, use a separate data store for the mutable data like, say, shopping cart contents or cached credentials. Further, don’t cache anything that you can easily and quickly recalculate for less than the cost of caching it. If you have a rather baroque permissions system, it might make sense to cache that information somewhere, for instance. However, make sure that anything you cache can be recreated if the cache is unavailable. For other things, like shopping cart data, you might consider using a more permanent storage system and garbage collecting it periodically. If your site already requires access to a database server, for instance, you might consider using that to store the authoritative copy of the cart. Caching might make sense for that sort of thing, but the previously mentioned caveats still apply.

The data stored in the actual session should be small, and largely immutable. There is rarely any need for more than a single identifier, possibly identifying the particular user or particular session. Multiple identifiers might make some sense depending on your circumstances. However, storing the entire result of sorting a list of 500 items in the session store is ridiculous. (That is recalculable and should be cached separately if caching is indicated.)
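
To make that concrete, here is a rough sketch of the split: the session holds a single identifier and the mutable shopping cart lives in the database. The PDO connection $db and the carts table are illustrative assumptions, not prescriptions:

    <?php
    // The session carries only a small, stable identifier.
    session_start();
    $userId = isset($_SESSION['user_id']) ? $_SESSION['user_id'] : null;
    session_write_close();

    // The mutable state (the cart) lives in its own store, keyed by that identifier.
    $stmt = $db->prepare('SELECT item_id, quantity FROM carts WHERE user_id = ?');
    $stmt->execute(array($userId));
    $cart = $stmt->fetchAll(PDO::FETCH_ASSOC);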

Those of you well familiar with web technologies may have realized that with this sort of minimalist session scheme, the entire “managed” session data can be stored in a single cookie. Indeed, this scheme eliminates both of the PHP specific problems identified above. Also, using a single cookie to store the managed session data largely eliminates any storage bottlenecks on the server as it avoids any unneeded disk writes there.

Session data is unprotected

The final problem I see all the time is that session data is not authenticated or protected at all. The session manager is trusted to get this right and, realistically, it should. That means the PHP session system needs to authenticate its sessions.

Exactly what this means depends a great deal on the session system. PHP can be relatively sure that the session data itself is not visible to a remote user because it is stored in a file on the server. However, other users on the server can potentially read that data. PHP makes no attempt to obfuscate or otherwise encrypt the data it writes on the server. This is likely due to performance concerns and code complexity. Similarly, it makes no attempt to verify that what it is reading from the session file is what it previously wrote out to it. That means any random third party can modify the file and possibly corrupt the session.

Storing everything in a cookie (or even just the session identifier) has a similar problem but now anything in the communication path can potentially see the contents, including proxy servers, possible network sniffers, and software at either end. Thus, some steps need to be taken to be certain that the cookie contents you get back are contents you created in the first place. If you set anything remotely sensitive in the cookie (which you shouldn’t), you also need to make certain the contents cannot be easily read by third parties. Fortunately, relatively common cryptography techniques can be used to provide adequate protection for most situations. (The same techniques can be applied to local cache files, too.) Look up HMAC for more information on such schemes.
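
As a rough sketch of the HMAC idea: the cookie name, the value, and where the secret comes from are all purely illustrative, and hash_equals() exists only in PHP 5.6 and later (older versions need a constant-time comparison of their own):

    <?php
    $secret = getenv('COOKIE_SECRET');   // server-side secret, never sent to the client

    // When issuing the cookie, sign the value.
    $value = '42';                                        // e.g. a user or session ID
    $mac   = hash_hmac('sha256', $value, $secret);
    setcookie('auth', $value . '|' . $mac, 0, '/', '', true, true);  // secure + httponly

    // When reading it back, recompute the HMAC and compare before trusting anything.
    if (isset($_COOKIE['auth'])) {
        list($value, $mac) = explode('|', $_COOKIE['auth'], 2) + array('', '');
        if (hash_equals(hash_hmac('sha256', $value, $secret), $mac)) {
            $trustedId = $value;   // authenticated; still validate it as input!
        }
    }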

Conclusions

The above leads to the following specific advice.

  • Store as little as possible in whatever session manager you use
  • Release the session as soon as possible. If you are only reading data, release it as soon as you’ve read what you need. If you are writing data, release it immediately after writing the data. If what you write doesn’t depend on what you read, you might release the session and re-acquire it later to update it if your script is going to run for a while.
  • Store only data that is likely to remain unchanged once you put it in the session and use other data stores for data that is likely to change.
  • Do not use the session store as a connection-specific cache. If your cached data depends on the session information, use another data store for the cache and store an index in the session.
  • This is more general, but only acquire the resources you need for the script rather than everything all the time. That means do not acquire the shopping cart information when your script is processing a login. A bottleneck on one resource should not affect scripts that do not operate on that resource.
  • If something needs to be locked to maintain consistency, only lock it when you are going to operate on it and unlock it immediately after you are finished. Do not rely on a “big session lock” to do this for you.

WTF, CP?

Anyone who isn’t living under a rock knows that Calgary has experienced an unprecedented flood on both the Bow and Elbow rivers. While the water is down to manageable levels now and cleanup is proceeding at a staggering pace, the state of emergency persists and a large chunk of the downtown core is still without power, not to mention low lying areas outside the core.

In the wake of all this, at roughly 4:00 this morning, some genius at Canadian Pacific Railway thought it would be a good idea to run a loaded freight train across a bridge that is well over a century old (probably over 125 years). Ordinarily, this would not be a particularly dangerous thing but given the unprecedented flooding and the fact that the Bow river is still running very high, the logic of this decision totally escapes me. I suspect this is the sort of decision that controllers have gotten away with many times over the years, with questionable structures surviving by pure fluke. Alas, that was not to be the case today. The bridge started to collapse just as the train had mostly finished passing over it. Clearly the bridge was not sound and, from the descriptions of the failure, it sounds like one of the bridge piers was undermined and the weight and vibration of the train’s passage caused whatever was still holding the pier up to collapse. Of course, once that happened, the river flow would have ensured it continued to collapse.

So the question then becomes why did the city allow the bridge to be used? After all, it is within the city limits. Well, it turns out that the city has no authority over railroads at all. That’s right. Zero. None. The city cannot even enforce noise bylaws or bar trains from blocking intersections during rush hour. It further turns out that even the province can do nothing. Apparently railroads are only beholden to the federal government and its agencies. What that means is that the city had no authority or access to inspect any of the rail bridges or to bar the railroad from operating trains. Yet it turns out that within the city limits, the city is responsible for the safety and response to any problems caused by railroads.

So, not only is the city still dealing with the aftermath of an unprecedented flood, but it also has to deal with the aftermath of a boneheaded decision by a flunky working for a private company over which the city has no authority whatsoever. Thanks to this #nenshinoun, the city has to divert resources from handling the flood cleanup to dealing with this secondary crisis.

It seems clear now that regulatory reform is absolutely required. Make railroads beholden to municipalities in the same way other transportation companies are. Let the municipalities manage all infrastructure within their boundaries instead of everything except the railways. After all, municipalities are uniquely qualified to manage infrastructure in their geographic areas. Furthermore, allow the provincial transportation departments to enforce their regulations as well. No more of this incomplete oversight from federal authorities who are either understaffed or simply not up to the job.

Update 19:11. The bridge is actually 101 years old according to current news reports. It was also apparently inspected several times before the train was driven across it. It seems I was also correct in assuming that it was a failure at the bottom of a bridge pier, which, to be fair, there is no way they could have seen in an inspection. However, since they apparently didn’t even know that the neighbouring bridge was not connected at the foundations, it is clear that they should not have been opening the bridge until they could inspect the foundations. After all, if you don’t even know what is connected together, how do you know the foundations of the bridge are still sound? Calgary was able to have some certainty about its bridges because they are anchored into actual bedrock. Any bridge not so anchored should probably be considered suspect after a flood such as the one we have had.

Writing Reasonable PHP

PHP gets ragged on a lot for various reasons. One of the biggest complaints I see is that PHP is “insecure” as if writing bad code in PHP is somehow PHP’s fault. The other major complaint is not so much a complaint against the core language as against the standard library and runtime environment and refers to the chaotic nature of the standard functions in particular. Complaints about the latter have merit but PHP is far from the only popular language to have that problem. The former might have some merit but it is just as ridiculous as blaming C because programmers write buffer overflows. It is not strictly PHP’s fault when programmers do stupid things. Granted, PHP makes a lot of stupid things very easy and some of the early design decisions for the PHP runtime environment are questionable in hindsight, but writing sensible PHP code is not impossible or even especially difficult.

Types of PHP Code

Before I delve too far into the intricacies of PHP, let me touch on the types of coding that PHP can be used for.

PHP was designed (or evolved, really) as a means to enhance largely static web pages. It fit into the same niche as Microsoft’s Active Server Pages. It was designed to make adding a small amount of dynamic content to an otherwise largely static page easy. While this is still common today, it is no longer the primary use case. This is also the reason for a lot of the somewhat questionable design decisions for the runtime environment (such as the ever popular and justifiably maligned “register_globals” feature).

As it gained popularity, it began to edge out the use of CGI scripts written in Perl or other languages. This was partly due to the complexity of dealing with CGI on most servers and partly due to the fact that PHP itself handled all of the boilerplate stuff needed to deal with the CGI interface – decoding script input primarily. Thus, PHP scripts moved more toward being PHP code with HTML content embedded in it instead of HTML code with PHP embedded in it. Some of the more unfortunate design decisions were addressed at this point (during the 4.x series), including the “register_globals” problem, with the introduction of the “superglobal” arrays and a few other things. PHP also gained a sort of object orientation and a massive collection of “extensions”, many of which are bundled and/or enabled by default. This type of coding is the most common today – programs that are still intended to run in a web server environment and resemble the classic CGI script more than the classic “active page” model.

Finally, PHP gained a command line variant. With a few tweaks to the runtime environment, it became possible to write programs that do not depend on the presence of a web server or the CGI interface specification. Most of the historical runtime design issues do not apply to a command line PHP program. However, the source format remains the same including the PHP open/close tags.

A Sensible PHP Environment

A great deal of sanity can be obtained before a single PHP statement is written by setting up the environment in a sensible manner. Most of the features of PHP that are maligned (often justifiably) by critics can be turned off in the PHP configuration file. Notably, one should turn off register_globals, all magic quotes variants, register_long_arrays, allow_url_include, and allow_url_fopen. There are other configuration options that make sense to disable too, depending on which extensions you are using.
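
For reference, the relevant php.ini directives look roughly like this (directive names as of the PHP 5.x series; several of them have since been removed from the language outright):

    ; Legacy convenience features that cause far more harm than good.
    register_globals     = Off
    register_long_arrays = Off
    magic_quotes_gpc     = Off
    magic_quotes_runtime = Off
    magic_quotes_sybase  = Off

    ; Remote code and data inclusion.
    allow_url_include = Off
    allow_url_fopen   = Off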

It should be noted that disabling some of these settings makes coding less convenient. However, often the convenience comes at the cost of clarity or even security.

Writing PHP Code

Most of the recommendations here apply to all programming languages. Let me stress that. Writing good code requires discipline in any language.

Check Inputs

One of the biggest sources of problems with any program is failure to check input data. Anything input by a user must be viewed as suspect. After all, the user might be malicious or simply make an error. Relying on user input to be correct is never the right thing to do. Steps must be taken to ensure that bogus input data does not cause your program to misbehave. Inputs that cannot be handled should produce error conditions in a controlled manner.

Many programmers do grasp this concept intuitively. Input checking code is often present when handling direct user input. However, most overlook the simple fact that data coming from anywhere outside the program code itself must be treated as suspect. You cannot be certain that what you wrote to a data file is still in that file. It could have been corrupted by a hardware failure, user error, or the file could have been replaced with another type of file, all without your program being aware of it. The same applies to data stored in a database system like MySQL or in a session cache or a shared memory cache somewhere.

The advice here: Verify everything. Failure to do so correctly is not a weakness in PHP but in the programmer. It is also the single largest source of security problems. Careful adherence to this principle will quickly yield much better code.
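
A small example of the kind of check I mean, using PHP’s filter functions (the parameter name and the accepted range are just for illustration):

    <?php
    // Validate a numeric ID arriving via the query string.
    $id = filter_input(INPUT_GET, 'id', FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => 1),
    ));
    if ($id === false || $id === null) {
        // null: parameter absent; false: present but not a valid positive integer.
        http_response_code(400);
        exit('Invalid id');
    }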

Check Returns

Closely related to the previous item, and high up on the list of programmer errors, is failing to check return values from function calls. Most library functions will have some sort of return value. For functions that can fail for whatever reason (bad parameters fed in, external state, etc.), it is absolutely critical to check for those failure conditions and handle them in a manner that is appropriate for your program. These conditions can be as simple as a data file being missing or as complicated as a remote socket connection timing out or the database server going away.

Study all function calls you use and make certain you understand what failure conditions exist. If a failure condition will cause your program to fail or otherwise misbehave, handle it. If a failure condition is impossible, it is doubly critical to handle it. That said, if a failure condition will not cause your program to misbehave or otherwise fail, it can be ignored, but make absolutely certain that is the case and document why.

The advice here: Always check return values.
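
A trivial sketch of what that looks like in practice (the file path is made up):

    <?php
    // file_get_contents() returns false on failure, so the check must be strict.
    $raw = file_get_contents('/etc/myapp/config.json');
    if ($raw === false) {
        error_log('Could not read configuration file');
        exit(1);
    }

    $config = json_decode($raw, true);
    if ($config === null) {
        error_log('Configuration file is not valid JSON');
        exit(1);
    }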

Protect Output

This one is a lot less obvious and is best explained by example. Suppose you are outputting some text into an HTML document and you do not know in advance what characters that text contains. In HTML, some characters have special meanings (such as quotes) but are also valid in actual text. These special characters have to be protected in a medium-appropriate way. In the HTML case, they would be replaced with appropriate entities. This is a common case in PHP programming but it is not the only one. The same applies when passing data to a database system like MySQL using SQL or when passing command arguments to an external program. Failure to protect output properly is the leading cause of a class of security vulnerabilities known as SQL injection attacks. There are analogs for other output streams too. Sometimes the corruption of the output stream is mostly harmless, as when an unprotected comma is inserted into a CSV field in an informational spreadsheet. Other times, it can cause cascading failures or even allow clever attackers to obtain private data.

The advice: Always protect output, no matter where it is destined.
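
A few representative examples, one per output medium. This is only a sketch: the PDO connection $db and the variables $name, $email, and $directory are assumed to hold values already.

    <?php
    // HTML: escape special characters so user-supplied text cannot become markup.
    echo '<p>Hello, ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';

    // SQL: let the driver handle quoting by using a prepared statement.
    $stmt = $db->prepare('SELECT id FROM users WHERE email = ?');
    $stmt->execute(array($email));

    // Shell: escape every argument handed to an external program.
    $listing = shell_exec('ls -l ' . escapeshellarg($directory));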

Use Correct Operators

This is more specific to PHP but there are similar situations in other languages. In PHP specifically, there are two equality and two inequality operators. One set does loose type handling and attempts to find some means to compare its operands, to the point of doing type conversions behind the scenes. The other set treats operands of different underlying types as unequal even if the apparent values are the same. The “==” and “!=” operators are the first set and “===” and “!==” are the second set. Using the former, the string “0” and the number 0 will compare as equal while with the latter they will not. This is important because many functions will return “false” on an error but some other type (like a number) on success. With the loose comparisons, “false” and “0” are equal but with the strict comparisons they are not.

PHP also has a number of functions which can be used to identify NULL values, arrays, and so on, which can also be employed when the type of a value is important.

In most cases, the strict comparison operator is probably the better choice but the loose comparison can be useful. In short, write what you mean using the correct operators. Make sure you know exactly what the operator you choose is doing.
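
The classic illustration is strpos(), which returns the integer offset of a match or false when there is none:

    <?php
    $pos = strpos('abcdef', 'abc');   // matches at offset 0

    if ($pos == false) {
        // Runs, incorrectly: loose comparison treats offset 0 the same as false.
        echo "not found?\n";
    }

    if ($pos === false) {
        echo "not found\n";
    } else {
        // Runs: strict comparison distinguishes the integer 0 from false.
        echo "found at offset $pos\n";
    }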

Using Language Constructs

Like any programming language, PHP has a number of language constructs that are very useful but there are other ways that similar effects can be achieved. For a trivial example, consider the use of a long “if/elseif/elseif/else” structure comparing a single variable against a series of values. This can also be expressed using a “switch” statement. In this trivial example, either one is valid and the two are roughly equivalent, though the “switch” statement has a few features that might make it more useful in some circumstances. Likewise, a “for” loop can always be faked using “while”.

On the other hand, there are cases where an alternative is not equivalent. Consider the case of “include/require” vs. a function call. While the fact that you can include the same file in dozens of different places looks a lot like a function call, and can often be used for a similar effect, it is not the same thing. The included code runs in the same scope as the location of the include directive, for instance, which means that any variables in the including file might be scribbled over by the included file. Parameters also must be passed in variables and return values returned the same way. It is also not possible to use such a “function” recursively. On the other hand, an actual function call gains its own local variable scope, preventing the function from clobbering variables in the caller, and also has a formalized parameter list and return value. Furthermore, functions can be called recursively, which is also incredibly useful. Thus, it is important to use the right construct for the job. “include” is not the right construct for executing a chunk of code from random locations. (I have singled this particular one out because it shows up far too often in PHP code.)

The advice: use the right language construct for the job. This applies not only to things like “include” but also to things like objects. Creating an object to “encapsulate” a behaviour adequately described by a single function is just as silly as using “while” to simulate “for”.
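
A tiny illustration of the scope difference (the file name and its contents are made up):

    <?php
    // Suppose fragment.php contains:   <?php $total = 0;
    // Included code runs in the caller's scope, so it silently clobbers variables.
    $total = 42;
    include 'fragment.php';             // $total is now 0

    // A real function gets its own scope, a parameter list, and a return value.
    function compute_total(array $items) {
        $total = 0;                     // local; cannot touch the caller's $total
        foreach ($items as $item) {
            $total += $item;
        }
        return $total;
    }

    $total = 42;
    $sum = compute_total(array(1, 2, 3));   // 6; $total here is still 42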

Wrap Up

The preceding is, by no means, exhaustive. However, by following the above recommendations, it is possible to write reasonable PHP code. All it requires is a bit of discipline and an understanding of the language you are using.

I should note that this is not an apology for PHP but merely a set of suggestions to avoid writing bad code. Remember. Just because PHP allows you to do something in a particularly unfortunate way, it does not mean that you have to do it that way. If it looks like a bad way to do things, look for a better way. Odds are pretty good you will find one.


Frameworks – Solution or Problem?

Frameworks are all the rage these days. Frameworks for building web sites. Frameworks for building applications. Frameworks for building databases. Frameworks for building frameworks. Okay, I made the last one up but I’m sure that sufficient noodling around the net will reveal at least fifty dozen attempts to do just that. But are frameworks really all they’re cracked up to be?


Climate Change and Heat

It’s currently the “in” thing to talk about anthropogenic global warming (AGW), which is the notion of global warming being caused by human activity. Whether AGW is real or not is not the point of this post, however. Neither is debate over whether “climate change” (a term usually conflated with AGW in popular culture) is a bad thing or not. Rather, I’m going to consider a couple of mechanisms that might lead to the AGW effect.

Sustainable Settlement

Sustainability is the buzzword of the day. Everyone wants sustainability. But somehow, everyone seems to miss the point of sustainability. Have you heard a policy maker talk about “sustainable growth”? That’s utter nonsense. Anyone putting a bit of thought into the matter will realize that growth cannot be sustainable indefinitely. After all, there is only a finite set of resources available to fuel it. Leaving aside systemic biases toward perpetual growth, however, let’s muse about what a sustainable settlement on any planet would need to look like.
