Five Things Every PHP Developer Should Know
November 29, 2009
HTTP – Your platform is the web, so learn how it works. The deeper details of the HTTP protocol can be absolutely invaluable for any PHP developer to know. This will also help you understand proxies and caches in depth.
FastCGI – Most PHP deployments are on mod_php, but FastCGI can be a great alternative interface. It’s also universal, and will let you run many languages with only a single front-end service.
C – PHP is written in C, so it’s fundamental to writing extensions. By learning C well, you can not just write extensions, but also find and debug problems in the PHP scripting engine and existing extensions.
Network Programming – If you ever need to do something complicated with PHP, such as off-loading work to background processes, network programming is an extremely useful skill set. Even a basic understanding of sockets and daemons is a huge benefit.
Your Operating System – I originally wanted to write ‘UNIX’ here. However, I realize that some of you deploy on Windows systems. I am not touching that argument with a ten-foot pole. Whatever your operating system is, learn it in depth. You’ll be surprised at what you can do.
A C10k nerd tries node.js
November 26, 2009
I’ve spent a lot of time as of late thinking about C10K-type problems. Obviously, we all know there are faster ways to do certain things than PHP. In this context, I mean simple performance, of course. I don’t consider development time because I won’t waste it with overly verbose stacks – but I digress.
In any case, I spent some time looking at node.js lately. node.js is not a client-side Javascript library; it is an addition to Google’s V8, which is an extremely fast Javascript engine. node.js describes itself as “Evented I/O for V8 Javascript”, and that is exactly what it is. As an example, here is a simple HTTP server written for node.js:
var sys = require('sys');
var multipart = require('multipart');
var http = require('http');

// Fallback handler for any path the router does not recognize.
var handler_404 = function(req,res) {
    res.rtext(404,'four-oh-fail');
}

// Handler for the site root.
var handler_index = function(req,res) {
    res.rtext(200,'Oh, hai!');
}

var router = function(req,res) {
    // Convenience helper: send the status, the headers and the body, then close.
    res.easyResponse = function(rescode,restype,resdata) {
        res.sendHeader(rescode,{'Content-Type': restype});
        res.sendBody(resdata);
        res.finish();
    }
    // Plain-text shortcut; passes the caller's status code through.
    res.rtext = function(code,text) { this.easyResponse(code,'text/plain',text); }

    switch(req.uri.path) {
        case '/':
            handler_index(req,res);
            break;
        default:
            handler_404(req,res);
            break;
    }
}

http.createServer(router).listen(8000);
sys.puts('Server running at http://127.0.0.1:8000/');
As you can see, it’s extremely simple to write something moderately complex. node.js is also incredibly fast. My little 256 MB virtual machine for testing spat out over 3,000 requests per second for the above, after a little bit of tuning. I really like node.js so far. That being said, there are still a lot of things missing. For example, processing POST form variables is a no-go, although you can decode multipart forms. There are no drivers for any database that I am aware of at this time, et cetera, et cetera. In other words, it’s still pretty rough, but it’s getting there.
I also looked at Google’s Go. I like Go, and I think that if I were writing something complicated, I’d probably prefer it over node.js. However, node.js’s immediacy and low barrier to entry make it extremely attractive, and I am really looking forward to seeing it grow.
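To make the POST gap concrete, here is a minimal sketch of decoding a URL-encoded form body by hand. It assumes you have already collected the raw request body into a string; the function name parseFormBody is made up for this example and is not part of node.js.

```javascript
// Hypothetical helper (not a node.js API): decode a raw
// application/x-www-form-urlencoded body into a plain object.
function parseFormBody(body) {
  var fields = {};
  // In form encoding a '+' stands for a space; everything else is %XX escapes.
  // Note: decodeURIComponent throws on malformed escapes such as '%zz'.
  var decode = function (s) {
    return decodeURIComponent(s.replace(/\+/g, ' '));
  };
  body.split('&').forEach(function (pair) {
    if (pair === '') return;            // skip empty segments
    var eq = pair.indexOf('=');
    var key = eq === -1 ? pair : pair.slice(0, eq);
    var value = eq === -1 ? '' : pair.slice(eq + 1);
    fields[decode(key)] = decode(value); // a repeated key keeps the last value
  });
  return fields;
}

var form = parseFormBody('name=Puppy+Trainer&lang=PHP%205');
console.log(form.name + ' / ' + form.lang);
```

This ignores repeated keys and array-style field names, so treat it as a starting point rather than a complete form parser.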
Why Your PHP Site is Slow
November 19, 2009
I’m going to take a wild guess and say that your PHP site is hosted on an Apache 2 web server and that you’re using mod_php for your PHP request handling. That’s cool. That is a solid, dependable stack. I’m going to guess, though, that it went down the last time you were slashdotted. Let me show you how to fix that without having lolcats upgrade your RAM.
Let’s take a look at a page view to see what’s going on with your server. I am going to throw out a random server configuration for us to examine this. Let’s say you’re sitting on some sort of Pentium 4 with a gig of RAM, which is a reasonable expectation for a moderately-aged ‘personal’ web server. Let’s also say that you’re really kind of boring in your requirements, and you really just host your blog, which sits at myawesomeblog.tld. I am also going to assume you use WordPress, because it’s quite popular.
A user visits your website, and their web browser issues the following requests:
- /2008/01/01/how-to-train-puppies-with-php/ which is routed to index.php, which is WordPress.
- /wp-content/themes/puppies/style.css which is your style sheet.
- /wp-content/themes/puppies/jquery-1.3.2.js which is a Javascript library you need for the item below.
- /wp-content/themes/puppies/mycode.js which is some fancy javascript you use to make things be shiny.
- a ton of images in /wp-content/themes/puppies, which are support images for your layout. Let’s say you are loading 7 of them.
The WordPress page, plus the style sheet, the two javascript files, and seven images makes a total of eleven requests in order to satisfy the user’s page view. It almost certainly adds up to a relatively marginal amount of bandwidth, but it is quite a number of requests. Many sites have more; I have seen as many as 114 on poorly built sites, and I am sure there are worse out there still. But, wait, you ask, why are these requests bad? I am glad you asked!
So, as it would turn out, PHP is not what we call “thread-safe”. This is not necessarily a bad thing, and we won’t discuss the technical reasons, but it basically means that PHP doesn’t run in a multi-threaded model. Instead, each request is served by its own process, and those processes run concurrently. On UNIX systems, we accomplish this on Apache with a pre-forking model. That is, when you start Apache, it spawns a certain number of worker processes. These worker processes wait for the main process to receive a request and then forward it to them. If it needs more worker processes, due to there being more visitors, it spawns more worker processes, and so on. This is somewhat simplified, and the concurrency limits and other settings are adjustable.
That sounds pretty smart. Why can this be bad? Well, the thing to consider is that a forked process is a complete copy of the original. As such, the workers are pretty large. On an example installation I have in another window, each Apache worker process is 33 megabytes. When this process loads mod_php and loads a bunch of PHP files, it will consume some more memory, so let’s make it a flat 50 megabytes for the sake of example. Requests for static resources, like image and CSS files, are going to consume less than 50 and more than 33 megabytes, but not much more. Let’s assume 50 megabytes though, since we are looking for a capacity maximum. As such, if your server has 1024 megabytes of memory, and we estimate that 200 megabytes are being used by the operating system and other processes, that leaves us 824 megabytes for Apache processes. 824 divided by 50 leaves us 16 and change. That means you can only safely process 16 simultaneous requests; otherwise you risk running out of memory and hitting virtual memory, which means your disks start thrashing, and your request performance will fall through the floor. This will cause user requests to start queueing up, which means your load is going to go up, and everyone has a really bad time. The stock option for the maximum number of simultaneous Apache worker processes in most packages is 150. 150! That will sink the server 9 times over!
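If you want to plug in your own numbers, the arithmetic above fits in a few lines. This is only a sketch: the function name maxWorkers is made up for this example, and the figures are the rough estimates from this post, not measurements of your server.

```javascript
// Hypothetical helper: how many workers fit in RAM before the box starts swapping?
function maxWorkers(totalMb, reservedMb, workerMb) {
  return Math.floor((totalMb - reservedMb) / workerMb);
}

var safeWorkers = maxWorkers(1024, 200, 50); // 824 MB left over, 50 MB per worker
var stockLimit = 150;                        // the packaged default mentioned above

console.log('safe workers: ' + safeWorkers);                                 // 16
console.log('stock limit overshoots by ' + (stockLimit / safeWorkers) + 'x'); // 9.375x
```

Measure the real size of your own workers before trusting the result; the 50 megabyte figure is deliberately pessimistic.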
Each of our blog users needs 11 requests to satisfy their page view. The HTTP/1.1 specification recommends that clients limit themselves to 2 simultaneous connections per host, so their requests will only arrive 2 at a time. However, this means you are limited to 8 simultaneous users at any given point in time. That is not a lot, and it’s certainly a lot less than your hardware is capable of.
How do we fix this? Well, for starters, you can move all your static resources (your CSS and Javascript files, your images, etc.) to a CDN. This is pretty easy these days. For example, you can upload them to Rackspace’s Cloud Files, which gives you instant CDN abilities for, at the time of this writing, 22 cents a gigabyte, if I remember correctly. All you really need to do is edit your theme to point your CSS and Javascript files to a different location.
Some people may not want to get a CDN account for their blog or such, even if it is very affordable. You can also, effectively, create your own one-node CDN by replacing the performance problem: Apache. There are many web servers that work well with PHP; one of them is nginx. Nginx works differently than Apache in that it does not spawn a process for every request. Rather, to explain it in a simplified way, nginx jumps back and forth between connections to serve requests, and only maintains the one process. This is an incredibly fast model, but we noted earlier that PHP needs to have a process per request. So how does that work? Well, there is an interface called FastCGI, which grew largely out of discontent with the CGI model. This interface allows us to create pools of PHP worker processes, much the same way that Apache created pools of Apache worker processes. The key issue here is that PHP fastcgi workers have much lower memory requirements than an Apache worker process. On average, I find them to be 1/10th as large, with some PHP code loaded. The way an nginx + php configuration works is basically this:
- A request comes in to nginx.
- If the request is a simple static file, nginx serves it without further commentary, and does so incredibly quickly and with very little resource usage, especially in comparison to Apache.
- If the request is for a PHP page, nginx connects to a pool of php worker processes, grabs one, and has it process the PHP request.
- nginx takes the output from the PHP worker and serves the request.
Because the PHP workers are much smaller than the Apache workers, you can have the very fast nginx server process all your static materials, and simultaneously increase your capacity to serve PHP requests by as much as 10 times, depending on what your exact memory requirements end up being.
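Here is a rough sketch of where the “10 times” comes from, using the same example server as before. It assumes a PHP FastCGI worker is one tenth the size of the 50 megabyte Apache worker, and it ignores the memory nginx itself uses, so treat the result as an upper bound.

```javascript
// Assumed figures from this post: a 1024 MB machine with 200 MB reserved
// for the operating system and other processes.
var availableMb = 1024 - 200;           // 824 MB for workers
var apacheWorkerMb = 50;                // Apache + mod_php worker
var phpWorkerMb = apacheWorkerMb / 10;  // FastCGI worker, about 1/10th the size

var apacheWorkers = Math.floor(availableMb / apacheWorkerMb); // 16
var phpWorkers = Math.floor(availableMb / phpWorkerMb);       // 164

console.log('Apache workers: ' + apacheWorkers);
console.log('PHP FastCGI workers: ' + phpWorkers);
console.log('ratio: ' + (phpWorkers / apacheWorkers));        // 10.25
```

On top of that, those PHP workers no longer spend any time on style sheets and images, because nginx answers those requests itself.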
If you don’t want to do that, you can also make sure your static files are being served with a far-future expires header, so that they are retained in the browser cache, and the client doesn’t come back with conditional GET requests on every page load to check that those files are still unchanged. If you don’t know what I am talking about, google it; it’s a big topic, but it’s worth reading. Once we’ve removed all these extraneous requests, you are down to 1 request per page view. That, in our model of a page that requires 11 resources, is a 90.909% reduction in capacity use.
So, that’s why your PHP app is slow, and that is my two cents. I’ll address specific questions in comments if they are asked.
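As a sketch of what “far-future” means in practice, these are the two response headers involved. The one-year lifetime is my assumption, as is the helper name farFutureHeaders; how you actually attach the headers depends on your web server.

```javascript
// Hypothetical helper: build caching headers for a static file that rarely changes.
var ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // 31536000

function farFutureHeaders(now) {
  var expires = new Date(now.getTime() + ONE_YEAR_SECONDS * 1000);
  return {
    // Modern caches prefer max-age; Expires is kept for older clients.
    'Cache-Control': 'public, max-age=' + ONE_YEAR_SECONDS,
    'Expires': expires.toUTCString() // HTTP date format, always in GMT
  };
}

console.log(farFutureHeaders(new Date()));
```

The catch is that browsers will keep the file for the whole year, so when a file changes you need to change its URL too, for example by putting a version number in the file name.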
On Making it Better
November 7, 2009
If you develop software for a living, you’re plagued ceaselessly by thoughts about how to make the software you build better. Sometimes, this results in feature creep. One will often try to stuff more into an application in an effort to make it ‘better’, to make it ‘do more’. After some years of experience, you realize that ‘more’ is not the same thing as ‘better’, and you stop doing that.
Once you move past that, you start trying to make your software ‘better’ in other ways. Sometimes, this means that you start to over-think the problems that your software is trying to solve. By this time you understand that it becomes exponentially more difficult to change your software as it becomes more specialized, so you start to prepare for eventualities. You write code ‘safely’. In popular programming culture, this usually leads to someone coming up with some new paradigm that will end hunger and cure cancer, such as “object-oriented programming”, or “modular programming”, or lists of acronyms such as RAD, SCRUM, TDD, etc. Mind you, none of these are bad ideas, but in the context of the argument I am making, they are ways of looking at your code to make it ‘better’, by securing its malleability in the future.
The problem is that, for most of us, finding the right balance between making it “better” and making it work is something we never really master to perfection. While you are planning and re-factoring code, you aren’t actually getting anything done in terms of man-hours or deadlines. You’re just making it “better”. This is not a bad thing. That’s called computer science, and thinking is encouraged. However, I think the people who pay us may occasionally have very different ideas, and hence, we need that balance.
I pride myself on trying to stay very pragmatic in my approach to developing software for clients. I pride myself on the fact that I usually get the balance of “right or done” well adjusted, or at least close to it. However, I am horribly, horribly guilty of failing in that balance on this blog. I’ve started at least four separate code bases for my blog – and to what end? Because I wanted to make it better. I’ve decided that that is a complete waste of time, as I should have done in the beginning. So, I am starting over; one more time. Completely fresh, and hosted by someone else so that I can’t be tempted to touch it.
So, fresh start for me. Good. Now, a fresh start for you. What are you working on that should be done by now? Maybe it’s time to be a bit more pragmatic.