One of the unfortunate things about my day job is that I have to manage a server running cPanel. Some folks insist on cPanel because it has all these fancy gewgaws, features, widgets, and the like. However, once you start trying to manage a server running cPanel for more than a few trivial web sites, you start to discover just how terribly engineered it is, and it has absolutely no excuse for that. One particular feature I recently tripped over hard is cphulkd, which is cPanel’s answer to brute force detection.
Now, the notion of detecting brute force attacks is a good one. I have no issues with that at all, and, in fact, it’s a very good thing to be operating on a server. However, cphulkd has some serious design flaws. This is best illustrated by anecdote.
Today, I received a report from a customer that their web site was showing the dreaded “Internal Server Error” page. Because I have PHP configured to use suPHP (for good reasons which I won’t get into here), every request to a PHP page will spawn a PHP interpreter process. When too many of those are spawned, the internal server error appears. This is actually a good thing because, in most cases, it prevents a runaway web site from totally exhausting server resources. However, in this case, I could see that the PHP processes were spawning as desired but they were not exiting. Also, the load average on the system was through the roof. (It took almost 10 minutes to actually log in.)
The first thing I did was kill off the PHP processes. Sometimes it’s just a case of the system getting hit hard and then thrashing itself to death trying to recover when the load goes back to normal. That did seem to help some, but not nearly as much as it should have. (The load average went from 40 to 20 and response was still slow.) That meant there had to be something else going on. However, the server was so slow to respond that I did something that Unix types rarely do. I rebooted it. In a thrashing situation, that can be the best solution because it clears any state that might be exacerbating the problem and the time during which the server is offline during the reboot can cause traffic to abate somewhat.
The reboot seemed to help for a while. However, as I usually do in these situations, I kept watching the load using the trusty top command. After a while, I saw the load slowly creeping upward. But I also noticed that the process working the system the hardest was mysql. I looked around in mysql and found that there was this cphulk stuff going on there. It was making fairly regular queries of one sort or another, often enough that “show processlist” would show them. “Aha!” I thought. It looked like I had a smoking gun.
I poked around a bit and realized that this was the brute force protection gimmick in cPanel. But why was it using mysql to store its state? I can think of less efficient ways to do it, but not many. I mean, something like that is going to get hit many times per second when an actual attack is going on. Why use something that inefficient to store its state? So, I decided to disable cphulkd and see what happened.
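To put some shape on what I mean by a cheaper way to store that state, here is a minimal sketch of a failed-login tracker that keeps everything in memory. To be clear, this is not how cphulkd works, nor how any particular replacement works. The class name, the thresholds, and the sliding-window approach are all my own assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class FailedLoginTracker:
    """Hypothetical in-memory tracker: counts failed logins per IP
    inside a sliding time window. No database round trips involved."""

    def __init__(self, max_failures=5, window_seconds=300.0, clock=time.monotonic):
        self.max_failures = max_failures      # assumed threshold, not a cPanel setting
        self.window_seconds = window_seconds  # assumed window, not a cPanel setting
        self.clock = clock                    # injectable so the logic can be tested
        self._failures = defaultdict(deque)   # ip -> timestamps of recent failures

    def _prune(self, ip, now):
        """Drop failures that have aged out of the window."""
        recent = self._failures.get(ip)
        if recent is None:
            return
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if not recent:
            del self._failures[ip]

    def record_failure(self, ip):
        """Record one failed login; return True if the IP should now be blocked."""
        now = self.clock()
        self._prune(ip, now)
        self._failures[ip].append(now)
        return len(self._failures[ip]) >= self.max_failures

    def is_blocked(self, ip):
        """Check an IP without recording anything."""
        self._prune(ip, self.clock())
        return len(self._failures.get(ip, ())) >= self.max_failures
```

Each failed login costs a dictionary lookup and a deque append, which is the kind of work a server can do many times per second without noticing. A real daemon would still need to persist or share its block list somehow, but that does not require a database query on every login attempt.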
Meanwhile, the load average had crept back up to the 20 range and the WHM interface in cPanel wasn’t responding. No worries. I broke out my trusty google-fu and discovered how to disable cphulkd on the command line. (It’s not as obvious as it ought to be, but there you go.) Within seconds of stopping cphulkd, the load average on the server dropped from 20 to around 1. With a single brute force probe from a single IP address going on (I checked the logs), cphulkd was using almost 100% of the server’s resources. Meanwhile, with cphulkd disabled, SSHD was happily rejecting bad logins at the rate of many per second without the server even noticing the resource usage.
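For anyone wanting to do the same sort of log check, something along these lines will count failed SSH logins per source address. This is only a sketch: the regular expression assumes the usual OpenSSH “Failed password for … from … port …” message, the sample lines and addresses are made up for illustration, and where the log actually lives depends on the distribution.

```python
import re
from collections import Counter

# Assumes the common OpenSSH failure message format.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)


def count_failed_logins(lines):
    """Return a Counter mapping source IP -> number of failed password attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; in practice, iterate over the real auth log instead.
sample = [
    "Jan  1 00:00:01 host sshd[101]: Failed password for root from 203.0.113.7 port 40001 ssh2",
    "Jan  1 00:00:02 host sshd[102]: Failed password for invalid user admin from 203.0.113.7 port 40002 ssh2",
    "Jan  1 00:00:03 host sshd[103]: Accepted password for alice from 198.51.100.1 port 40003 ssh2",
]

for ip, failures in count_failed_logins(sample).most_common():
    print(ip, failures)
```

If nearly all the failures come from one address, you are looking at a single probe rather than a distributed attack, which is what I found here.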
Now, remember when I said that brute force protection is a good thing? I didn’t want to leave the server without it so I did some research. I found an alternate system for doing the same thing which had a cPanel interface and installed it. I did a bit of testing and configured it. Once I enabled it fully, it blocked the brute force attack almost instantly. Meanwhile, the server load has barely gotten as high as 1.
Here’s the thing. This server has never been this responsive. Ever. Sure, it’s a virtual server and you expect that to be less efficient, but the slowness could not be accounted for by resource contention between virtual servers. For one thing, other virtual servers on the same physical server, with fewer resources allocated to them, were performing substantially better. So now I know why cPanel was performing so abysmally. It was all down to one service that was so horribly designed that it brought the entire server to its knees simply due to the internet background radiation.
There are other aspects of cPanel that are dreadfully designed, but I think cphulkd is probably the absolute worst of the lot.