Web Site Development and Sessions

Sessions are used all the time by web site developers, often without the developer realizing it. It turns out, however, that sessions are immensely overused and they tend to cause all manner of random problems. My perspective on this is by no means unique but I do wear multiple hats. One hat is as server administrator with hundreds of sites hosted. Another hat is as a web developer. The final relevant hat is as a web site operator. All three hats lead to the following conclusions.

Sessions are overused

The biggest thing I have noticed over the years is that sessions are overused. Sure, some sort of session makes sense when you need to track a login through portions of a site. But the portions of the site which are public should not need access to any session information, period. If there is no session already in use, there is no need to initiate one when some random member of the public arrives on a public page on your site. You may think you need the session to change the navigation or some other element for a logged in user, and you would be correct, to a point. But if you do not initiate a session for a user unless he logs in, you can still identify a logged in user by the presence of a session combined with whatever session validation you use.
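As a rough sketch of that idea in PHP: resume a session only when the visitor already sends a session cookie, and never create one for an anonymous visitor. The helper name resumeSessionIfPresent and the 'user_id' key are hypothetical, and real code would still validate the session contents however it normally does.

```php
<?php
// Hypothetical helper: resume a session only when the visitor already
// presents a session cookie, so anonymous visitors never get one.
function resumeSessionIfPresent(array $cookies): bool
{
    // session_name() returns the cookie name PHP uses (PHPSESSID by default).
    if (!isset($cookies[session_name()])) {
        return false; // public visitor: no cookie, no session file, no lock
    }
    session_start();
    return true;
}

// 'user_id' is an assumed key that the login script would have set.
$loggedIn = resumeSessionIfPresent($_COOKIE) && isset($_SESSION['user_id']);
```

Public pages can then branch on $loggedIn to adjust the navigation without ever touching the session store for anonymous traffic.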

Of course, login tracking is not the only thing sessions get used for. It is simply the most common. However, if you are using a session to track users through your site or something more nefarious, you should consider whether you really need to do that. Are you actually deriving any concrete benefit from doing so? Do you really need a session to collect the information you desire? Do you really need to personalize every page with the visitor’s name or whatever cutesy thing you’re doing?

Sessions are poorly implemented

Completely orthogonal to whether sessions are used needlessly is the fact that sessions are often implemented poorly, or that a session mechanism not well suited to the task at hand is used for whatever reason.

I will pick on a particularly common example of session handling which illustrates several problematic features quite nicely. This particular session handling scheme is the one implemented by default in PHP.

By default, a PHP session exists as a file stored on the web server paired with a cookie that holds the session identifier. When a PHP script activates a session, PHP looks for the cookie and if it finds one, it reads the session data file. But not only does it read the data file, it also locks it, preventing another PHP script from activating the same session at the same time. Then, when the session is released, often implicitly by the end of the script, PHP writes the session data back to the file and finally unlocks it. Note that it rewrites the session data even if nothing has changed.

There are two major things wrong with this approach, as commonly used.

Request serialization

First, because almost nobody writing PHP code knows about the locking or even understands how locking works, this leads to scripts that start with “session_start();” and never release the session. As a result, any scripts that run as part of the same session will run serially. If one script is already running and another tries to start the same session, it will block at session_start() until the previous script finishes.

This is not terribly problematic for cases where only a single script is likely to be running at the same time within a session. However, with the advent of such things as ajax, a single ajax request will block all other ajax requests in the same session until it completes. Indeed, even the initial page load might block any ajax requests. Thus, instead of the page loading and asynchronously filling in any ajax type content, elements load one by one, harking back to the days of really slow dial-up networking. This is particularly frustrating for the user who clicks on something while the page is loading and then sees nothing happen for long seconds while other scripts finish churning away on the server.

But even if the programmer is aware of this problem and defends against it by releasing the session immediately after it is no longer needed, the session still must be maintained for the entire duration where it is possible that session data will need to be modified.
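A minimal sketch of releasing the lock early in PHP, using the built-in session_write_close() and the 'read_and_close' option of session_start() (available in PHP 7.0 and later). The function names and the 'user_id' and 'last_seen' keys are hypothetical.

```php
<?php
// Read-only request: load the session data and release the lock at once.
// The 'read_and_close' option requires PHP 7.0 or later.
function readUserId(): ?int
{
    session_start(['read_and_close' => true]);
    // $_SESSION stays readable after the close; 'user_id' is an assumed key.
    return isset($_SESSION['user_id']) ? (int) $_SESSION['user_id'] : null;
}

// Writing request: hold the lock only for the update itself.
function recordLastSeen(int $timestamp): void
{
    session_start();
    $_SESSION['last_seen'] = $timestamp; // assumed key
    session_write_close();               // save and unlock before any slow work
}
```

Anything slow (database queries, remote calls, rendering) then runs after the lock is gone, so parallel requests in the same session no longer queue behind it.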

Rewriting unchanged data

The PHP session system also rewrites the session data when nothing has changed. (PHP 7.0 and later can skip this rewrite through the session.lazy_write setting, which is enabled by default; older versions always rewrite.) This is generally unseen by users of the site or even by the programmer. The people who notice it are server operators who witness higher write volumes on their disks. This, in turn, leads to slower performance on the server and generally annoys server operators and users alike. However, since this is a problem in aggregate rather than from one single source, there is little a server operator can do to mitigate it.

One interesting thing, though, is that rewriting data needlessly can lead to data corruption in the case a process crashes. Of course, that can happen any time, but if you are not writing data when you crash, there is no chance you corrupt that data. Thus, rewriting unchanged data is generally a dumb idea anyway.
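For a custom rolled store, one way to avoid the needless write is to remember a hash of what was loaded and compare it at save time. This is only a sketch under assumptions: the file-backed JSON format and the names loadData and saveIfChanged are hypothetical, not anything PHP provides.

```php
<?php
// Hypothetical file-backed store: remember a hash of what was loaded and
// skip the write when the data is still identical at save time.
function loadData(string $path, ?string &$hash = null): array
{
    $raw  = is_file($path) ? (string) file_get_contents($path) : '';
    $data = $raw === '' ? [] : json_decode($raw, true);
    $data = is_array($data) ? $data : [];
    // Hash the normalized encoding so load and save compare like with like.
    $hash = hash('sha256', json_encode($data));
    return $data;
}

function saveIfChanged(string $path, array $data, string $loadedHash): bool
{
    $raw = json_encode($data);
    if (hash('sha256', $raw) === $loadedHash) {
        return false; // nothing changed: no disk write at all
    }
    return file_put_contents($path, $raw, LOCK_EX) !== false;
}
```

A request that only reads the data then costs one read and no write, and a crash during that request cannot damage the stored copy.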

Storing too much

Another common issue with sessions is that too much data is stored in the session manager’s data store. That means in the $_SESSION superglobal in PHP but it could as easily be some custom rolled scheme.

Because the session has to be read by everything that needs information about the current, well, session, the more data that is thrown around, the longer every script takes to execute. If every time something needs the logged in user ID, it also needs to read half a megabyte of other stored state, then you create a needlessly high memory, I/O, processing, and storage overhead for every script. A few milliseconds may not be noticeable on a single script, but consider if your site suddenly gets busy. Those few milliseconds here or there suddenly start adding up to your web server falling over dead.

Instead, use a separate data store for the mutable data like, say, shopping cart contents or cached credentials. Further, don’t cache anything that you can easily and quickly recalculate for less than the cost of caching it. If you have a rather baroque permissions system, it might make sense to cache that information somewhere, for instance. However, make sure you can recreate anything you cache if the cache is unavailable. For other things, like shopping cart data, you might consider using a more permanent storage system and garbage collecting it periodically. If your site already requires access to a database server, for instance, you might consider using that to store the authoritative copy of the cart. Caching might make sense for that sort of thing, but the previously mentioned caveats still apply.

The data stored in the actual session should be small and largely immutable. There is rarely any need for anything more than a single identifier, possibly identifying the particular user or particular session. Multiple identifiers might make some sense depending on your circumstances. However, storing the entire result of sorting a list of 500 items in the session store is ridiculous. (That is recalculable and should be cached separately if caching is indicated.)

For those of you well familiar with web technologies, you may have realized that with this sort of minimalist session scheme, the entire “managed” session data can be stored in a single cookie. Indeed, this scheme eliminates both of the PHP-specific problems identified above. Also, using a single cookie to store the managed session data largely eliminates any storage bottlenecks on the server as it avoids any unneeded disk writes there.

Session data is unprotected

The final problem I see all the time is that session data is not authenticated or protected at all. The session manager is trusted to get this right, and, realistically, it should. That means the PHP session system needs to authenticate its sessions.

Exactly what this means depends a great deal on the session system. PHP can be relatively sure that the session data itself is not visible to a remote user because it is stored in a file on the server. However, other users on the server can potentially read that data. PHP makes no attempt to obfuscate or otherwise encrypt the data it writes on the server. This is likely due to performance concerns and code complexity. Similarly, it makes no attempt to verify that what it is reading from the session file is what it previously wrote out to it. That means any random third party can modify the file and possibly corrupt the session.

Storing everything in a cookie (or even just the session identifier) has a similar problem, but now anything in the communication path can potentially see the contents, including proxy servers, possible network sniffers, and software at either end. Thus, some steps need to be taken to be certain that the cookie contents you get back are the contents you created in the first place. If you set anything remotely sensitive in the cookie (which you shouldn’t), you also need to make certain the contents cannot be easily read by third parties. Fortunately, relatively common cryptography techniques can be used to provide adequate protection for most situations. (The same techniques can be applied to local cache files, too.) Look up HMAC for more information on such schemes.
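A minimal sketch of HMAC authentication for a cookie value, using PHP's built-in hash_hmac() and hash_equals(). The function names, the "body.mac" layout, and the JSON payload are assumptions made for illustration. Note that this only detects tampering; it does not hide the contents, and a real scheme would also want an expiry time inside the payload and a key kept out of the source tree.

```php
<?php
// Sign a small session payload so tampering can be detected when it returns.
// This authenticates the value; it does not hide it from anyone who sees it.
function signSessionValue(array $payload, string $key): string
{
    $body = base64_encode(json_encode($payload));
    $mac  = hash_hmac('sha256', $body, $key);
    return $body . '.' . $mac;
}

// Returns the payload, or null if the value is malformed or was altered.
function verifySessionValue(string $value, string $key): ?array
{
    $parts = explode('.', $value, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$body, $mac] = $parts;
    // hash_equals() compares in constant time to avoid timing leaks.
    if (!hash_equals(hash_hmac('sha256', $body, $key), $mac)) {
        return null;
    }
    $payload = json_decode((string) base64_decode($body, true), true);
    return is_array($payload) ? $payload : null;
}
```

The signed string would be handed to setcookie() at login and passed through verifySessionValue() on every later request; anything that fails verification is treated as no session at all.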

Conclusions

The above leads to the following specific advice.

  • Store as little as possible in whatever session manager you use
  • Release the session as soon as possible. If you are only reading data, release it as soon as you’ve read what you need. If you are writing data, do so immediately after writing the data. If what you write doesn’t depend on what you read, you might release the session and re-acquire it later to update it if your script is going to run for a while.
  • Store only data that is likely to remain unchanged once you put it in the session and use other data stores for data that is likely to change.
  • Do not use the session store as a connection-specific cache. If your cached data depends on the session information, use another data store for the cache and store an index in the session.
  • This is more general, but only acquire the resources you need for the script rather than everything all the time. That means do not acquire the shopping cart information when your script is processing a login. A bottleneck on one resource should not affect scripts that do not operate on that resource.
  • If something needs to be locked to maintain consistency, only lock it when you are going to operate on it and unlock it immediately after you are finished. Do not rely on a “big session lock” to do this for you.
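The last point can be sketched with PHP's flock(): lock the one resource being updated, for only as long as the update takes. The file-backed cart and the addToCart name are hypothetical; a database row lock or transaction would serve the same purpose.

```php
<?php
// Hypothetical per-resource lock: lock one cart file only while updating it.
function addToCart(string $cartFile, string $item): array
{
    $fh = fopen($cartFile, 'c+'); // create if missing, do not truncate
    if ($fh === false) {
        throw new RuntimeException("cannot open $cartFile");
    }
    try {
        flock($fh, LOCK_EX);      // blocks only other writers of this cart
        $raw  = stream_get_contents($fh);
        $cart = ($raw === '' || $raw === false) ? [] : json_decode($raw, true);
        $cart = is_array($cart) ? $cart : [];
        $cart[] = $item;
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, json_encode($cart));
        fflush($fh);
        return $cart;
    } finally {
        flock($fh, LOCK_UN);      // release as soon as the update is done
        fclose($fh);
    }
}
```

Scripts that never touch the cart, such as the login handler, never wait on this lock.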
