Getting into multiprocessing

int pcntl_fork ( void )

int pcntl_waitpid ( int pid, int &status, int options)

int pcntl_wexitstatus ( int status)

We already discussed the problems inherent in using a single process for a PHP script: multi-processor boxes are under-utilised, and single-processor boxes are often left waiting for slow operations to complete before they can continue work. Any modern operating system performs multitasking behind the scenes, which means that several programs appear to run at the same time because the computer assigns a few hundredths of a second to each process in turn for it to perform its tasks.

However, if we confine all our work to just one process, it will only be able to do one thing at a time, which means we need to make our one-process task into a multi-process task so that we can take advantage of the operating system's multitasking capabilities.

Before I continue, let me first explain the difference between multiprocessing and multithreading. A process, as we have already discussed, is a unique instance of a program with its own memory space, its own process ID number, and so on. A thread can be thought of as a virtual process: it does not have its own process ID or its own memory space, but it is still able to take advantage of multitasking. A hyperthreading-enabled CPU, such as the Pentium 4 "Prescott" chip, takes this a step further in hardware by presenting one physical processor to the operating system as two logical ones, so that one thread can make progress while another is stalled.

Although some might disagree, most Unix programmers view threads with a degree of distrust. Unix systems have always preferred multiprocessing to multithreading, partly because creating a process (often called "spawning" or "forking" a child process) is very fast on Unix. In other operating systems, such as Windows, creating a process is comparatively slow, so the threads concept is much more popular.

With that in mind, it is no surprise that the Unix-only process control extension in PHP only supports forking, and this is done using the pcntl_fork() function. Now, I suggest you pay extra attention because many people find it a little confusing when this function is described to them!

When pcntl_fork() is called, it will return one of three values. If the return value is -1, the fork failed and there is no child process. This may be as a result of a lack of memory, or because the system limit on the number of user processes has been reached. If the return value is any number higher than 0, the current script is the parent that called pcntl_fork() and the return value is the process ID (PID) of the child that was forked. Finally, if the return value is 0, the current script is the child that was forked.

If you successfully fork, there will be two copies of PHP executing the same script at the same time. Both of them carry on from the pcntl_fork() line, and, most importantly, the child gets a copy of all the variables that were set in the parent, even down to the resources. One key thing that people forget is that a copy of a resource does not make it a unique resource - they will both point to the same thing, and this might be problematic - more on that later. For now, here's an example of basic use of pcntl_fork():

<?php
    $pid = pcntl_fork();

    switch ($pid) {
        case -1:
            print "Could not fork!\n";
            exit;
        case 0:
            print "In child!\n";
            break;
        default:
            print "In parent!\n";
    }
?>

The above script just prints out a message in both the parent and child processes. However, it does not show how the parent's variable data has been copied across to the children, so take a look at this script:

<?php
    for ($i = 1; $i <= 5; ++$i) {
        $pid = pcntl_fork();

        if (!$pid) {
            sleep(1);
            print "In child $i\n";
            exit;
        }
    }
?>

This time five child processes are forked off, and, because each one takes a copy of the $i variable as it was last set by the parent, the script prints out "In child 1", "In child 2", "In child 3", "In child 4", and "In child 5". However, all is not quite so simple as that, as there are two key things to notice as you run the above script.

Firstly, notice that each child script calls exit after it prints out its little message. In normal cases this would exit the script immediately, and it does here too, except that it exits only the child PHP script, not the parent or any of the other children. As such, each of the other children and the parent can and do carry on executing after one child has terminated.
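To see that a child really is working on its own copy of the data, a minimal sketch along these lines may help. The variable name $message is purely illustrative, and the parent calls pcntl_waitpid(), which is covered later in this chapter, only so that it prints its own value after the child has finished.

```php
<?php
// Illustrative sketch: a change made in the child does not reach the parent.
$message = "set by parent";

$pid = pcntl_fork();

if ($pid === -1) {
    fwrite(STDERR, "Could not fork!\n");
    exit(1);
}

if ($pid === 0) {
    // Child: this modifies the child's private copy only.
    $message = "changed by child";
    print "Child sees: $message\n";
    exit(0); // ends the child; the parent carries on below
}

// Parent: wait for the child so that the output order is predictable.
pcntl_waitpid($pid, $status);
print "Parent sees: $message\n";
```

The parent still reports "set by parent", because the assignment happened in the child's copy and the child's exit did not stop the parent.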

Secondly, when the script is run, its output may be quite confusing. Here is what I got when I ran the script:

[paul@wilbur paul]$ php fork2.php
[paul@wilbur paul]$ In child 1
In child 2
In child 3
In child 4
In child 5

[paul@wilbur paul]$

Notice how the children print out their messages in order. Although this is likely to be the case quite often, you cannot rely on your children to be executed in a certain order. This is one of the basic tenets of multiprocessing: once you spawn the process, it is the OS that decides when it is executed and how much time it is given. Also notice how I am returned to my shell prompt immediately, then all five children print out their messages despite me apparently having had control back.

The reason for this is that, although the children are attached to the terminal, they are essentially running in the background. As soon as the parent terminates, the command prompt reappears and you can start executing other programs; however, as you can see, the children will still butt in when they want to (as children are wont to do). Without the sleep() call in the children this would be less obvious, but it is important to remember that child processes essentially have a life of their own.

I say "essentially", because PHP, like any parent, can be made to watch over its children to make sure they are doing the right thing. This is accomplished through two new functions: pcntl_waitpid(), which instructs PHP to wait for a child, and pcntl_wexitstatus(), which fetches the value returned by a terminated child. We already looked at the exit() function and how it can be used to return a value to the system - we are going to use this to send a value back to our parent process and then retrieve it using pcntl_wexitstatus().

Before you dive into the code, let me first explain how these new functions are used. Firstly, pcntl_waitpid() takes a minimum of two parameters, which should be what kind of child process the parent should wait for, and a variable where the child's status code can be placed. By default, pcntl_waitpid() will cause the parent process to pause indefinitely, waiting for a child to terminate. When a child quits, pcntl_waitpid() returns the PID of the terminated child, then fills its status variable with information about how the child quit.

Alternatively, if pcntl_waitpid() is called and there are no children running, it returns immediately with -1 and does not fill the status variable.
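The no-children case can be sketched as follows. This is a minimal example that assumes the script has not forked anything; it also uses pcntl_get_last_error() and pcntl_strerror(), which are not discussed in this chapter, purely to show the reason the system gives for the failure.

```php
<?php
// Illustrative sketch: this script has forked nothing, so there is nothing to wait for.
$status = 0;
$result = pcntl_waitpid(-1, $status);

if ($result === -1) {
    // pcntl_get_last_error() reports the error number left by the failed call.
    $reason = pcntl_strerror(pcntl_get_last_error());
    print "No children to wait for ($reason)\n";
}
```

Because the call returns straight away rather than blocking, a return value of -1 is a convenient way to end a loop that collects children.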

The first parameter to the function must be one of the following:

< -1

Wait for any child process that has the process group ID equal to the absolute value of this number. For example, if you pass in -1802, pcntl_waitpid() will wait for any child that has the process group ID of 1802.

-1

Wait for any child process at all.

0

Wait for any child process whose process group ID is equal to that of the calling process. This is the most commonly used value.

> 0

Wait for the child whose process ID is equal to the value of this number. That is, if you pass in 1802, pcntl_waitpid() will wait for child process 1802 to terminate.
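As a rough illustration of the difference between these values, the sketch below forks two children and then waits for one specific PID before collecting the other with -1. The array $pids, the names "first" and "second", and the exit codes are all made up for the example.

```php
<?php
// Illustrative sketch: fork two children, then wait for one specific PID.
$pids = [];

foreach (["first" => 11, "second" => 22] as $name => $exitCode) {
    $pid = pcntl_fork();

    if ($pid === -1) {
        fwrite(STDERR, "Could not fork!\n");
        exit(1);
    }

    if ($pid === 0) {
        exit($exitCode); // child: finish straight away
    }

    $pids[$name] = $pid; // parent: remember which PID belongs to which child
}

// A value > 0 means "wait for exactly this child", whichever one finishes first.
$reaped = pcntl_waitpid($pids["second"], $status);
print "Waited for PID $reaped (the second child)\n";

// -1 means "any child at all", so this collects the one that remains.
$other = pcntl_waitpid(-1, $status);
print "Then collected PID $other (the first child)\n";
```

Waiting on a specific PID is useful when the parent needs one particular result before it can continue, while the other children keep running.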

So, if 0 is passed into the function as the first parameter, pcntl_waitpid() will wait for any child process in the same process group as the parent to terminate, which by default means any child it has forked. When one does, pcntl_waitpid() returns the PID of the child process that terminated and fills the second parameter with information about how that child terminated. As we have several children, we need to keep calling pcntl_waitpid() until it returns -1, and each time it returns something else we should print out the return value from the child process.

Returning a value from our child processes is as simple as passing a parameter to exit() rather than just terminating. This gets back to the parent through the status variable passed as the second parameter to pcntl_waitpid(), which is filled with a status code. This status code does not directly evaluate to the exit value, as it contains two pieces of information: how the child terminated, and, if the child terminated itself, the exit code it sent back.

For now we will assume that children always terminate themselves, which means that the exit code will always be set inside the status code from pcntl_waitpid(). To extract the exit code from the status code, the pcntl_wexitstatus() function is used, which takes the status code as its only parameter and returns the exit code from the child process.
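If you would rather not rely on that assumption, the status code can be inspected before the exit code is read. The sketch below is one possible way to do it: describeStatus() is a hypothetical helper, and pcntl_wifexited(), pcntl_wifsignaled() and pcntl_wtermsig() are further functions from the same extension that this chapter does not otherwise use.

```php
<?php
// Illustrative sketch: check how the child ended before reading its exit code.
// describeStatus() is a hypothetical helper, not part of the pcntl extension.
function describeStatus(int $status): string
{
    if (pcntl_wifexited($status)) {
        // The child terminated itself, so an exit code is available.
        return "exited with code " . pcntl_wexitstatus($status);
    }

    if (pcntl_wifsignaled($status)) {
        // The child was terminated by a signal; there is no exit code to read.
        return "killed by signal " . pcntl_wtermsig($status);
    }

    return "ended in some other way";
}

$pid = pcntl_fork();

if ($pid === -1) {
    fwrite(STDERR, "Could not fork!\n");
    exit(1);
}

if ($pid === 0) {
    exit(7); // child: send 7 back to the parent
}

pcntl_waitpid($pid, $status);
$description = describeStatus($status);
print "Child $pid $description\n";
```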

This might all sound very complicated, but it should become clear once you look over the next item of code. This example shows everything we have discussed:

<?php
    for ($i = 1; $i <= 5; ++$i) {
        $pid = pcntl_fork();

        if (!$pid) {
            sleep(1);
            print "In child $i\n";
            exit($i);
        }
    }

    while (pcntl_waitpid(0, $status) != -1) {
        $status = pcntl_wexitstatus($status);
        echo "Child $status completed\n";
    }
?>

Note that by using exit($i);, each child returns the number it prints out on the screen as its exit code. The main while loop calls pcntl_waitpid() again and again until it returns -1 (no children left), and, for each child that terminates, it extracts the exit code using pcntl_wexitstatus() and prints it out. Note that the first parameter to pcntl_waitpid() is 0, which means it will wait for any child in the parent's process group, and that covers all five children here.

Running that script should stop the command prompt from appearing until all five children have terminated, which is ideal.
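Since pcntl_waitpid() returns the PID of the child that terminated, the parent can also keep track of which child was doing what. The following is a minimal sketch of that idea, assuming a made-up notion of numbered "jobs"; the arrays $jobs and $finished are illustrative names only.

```php
<?php
// Illustrative sketch: remember which PID was given which job number.
$jobs = [];     // hypothetical map of PID => job number
$finished = []; // job number => exit code, filled in as children terminate

for ($i = 1; $i <= 3; ++$i) {
    $pid = pcntl_fork();

    if ($pid === -1) {
        fwrite(STDERR, "Could not fork!\n");
        exit(1);
    }

    if ($pid === 0) {
        exit($i * 10); // child: report a result through its exit code
    }

    $jobs[$pid] = $i; // parent: note which job this PID is handling
}

// pcntl_waitpid() returns the PID of whichever child ended, or -1 when none are left.
while (($pid = pcntl_waitpid(0, $status)) != -1) {
    $job = $jobs[$pid];
    $finished[$job] = pcntl_wexitstatus($status);
    print "Job $job (PID $pid) returned {$finished[$job]}\n";
}

ksort($finished); // children may finish in any order, so sort by job number
```

Bear in mind that an exit code can only carry a small integer (0 to 255), so it is best suited to simple success or failure reporting.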

 
