Be wary of garbage collection, part 2

Getting a full understanding of how garbage collection and copy-on-write work in PHP is no easy task. However, if you ever plan to create interactive applications with PHP, such an understanding is essential because without it your program risks leaking memory.

I put together a simple test script that demonstrates how memory usage works in PHP. To run it you need to compile PHP by hand, with the configure switch "--enable-memory-limit". You'll also need to boost the memory limit up from 8MB in your php.ini. Set it to something high like 800MB temporarily, just to make sure. To get the memory usage, just call the memory_get_usage() function with no parameters - it returns the amount of memory being used by the current PHP process, in bytes. Here's the script:

<?php
  echo "Stage 1: Mem usage is: ", memory_get_usage(), "\n";

  // Fill an array with one million random integers
  $arr = array();

  for ($i = 0; $i < 1000000; ++$i) {
    $arr[] = rand();
  }

  echo "Stage 2: Mem usage is: ", memory_get_usage(), "\n";

  $foo = 1;
  $bar = 2;

  echo "Stage 3: Mem usage is: ", memory_get_usage(), "\n";

  // "Copy" the big array into two more variables
  $foo = $arr;
  $bar = $arr;

  echo "Stage 4: Mem usage is: ", memory_get_usage(), "\n";

  // Point $arr at a new, empty array
  $arr = array();

  echo "Stage 5: Mem usage is: ", memory_get_usage(), "\n";

  // Write to $bar, forcing it to take its own copy
  $bar[] = "hello, world";

  echo "Stage 6: Mem usage is: ", memory_get_usage(), "\n";

  // Drop the last reference to the original array
  $foo = array();

  echo "Stage 7: Mem usage is: ", memory_get_usage(), "\n";

For those of you without the will or the way to do that, I ran the script for you. Here's what I got:

Stage 1: Mem usage is: 37712
Stage 2: Mem usage is: 60232136
Stage 3: Mem usage is: 60232248
Stage 4: Mem usage is: 60232248
Stage 5: Mem usage is: 60232288
Stage 6: Mem usage is: 104426704
Stage 7: Mem usage is: 60242672

OK, so what does that tell us? Before the script has done anything, PHP is already using about 37KB of RAM. This is where the parsed script and other basic components live - there's nothing we can do about that. In stage 2, we have added 1,000,000 numbers to the array $arr, bringing memory usage up to about 57.4MB. Yes, PHP is reporting 60232136 bytes, but there are 1024 bytes in a kilobyte, and 1024 kilobytes in a megabyte, hence 57.4MB. By stage 3 we've also got the $foo and $bar variables set to integers, so there's a tiny increase in memory usage (112 bytes). So far, so good.
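
The byte-to-megabyte conversion above is easy to check in PHP itself. This is only a small sketch: bytes_to_mb() is a hypothetical helper name, not something from the original script, and it uses the same 1024-based units as the text.

```php
<?php
// Hypothetical helper (not part of the original script): convert a byte
// count to megabytes, using 1024 bytes per KB and 1024 KB per MB.
function bytes_to_mb($bytes)
{
    return $bytes / 1024 / 1024;
}

// The stage 2 figure from the run above.
printf("%.2f MB\n", bytes_to_mb(60232136)); // prints "57.44 MB"
```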

Now, the interesting part is stage 4: $arr is "copied" into $foo and $bar, giving us three variables that each appear to hold their own copy of the array. However, look at the memory usage - it's exactly the same as in stage 3. Why is this? Copy-on-write, of course! That is, $foo, $bar, and $arr are all pointing to the same internal array.
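
You can see the same effect in isolation with a much smaller script. This is a minimal sketch rather than a benchmark: the array size of 100,000 and the variable names are arbitrary choices, and the exact byte counts will vary with your PHP version and build.

```php
<?php
// Build a moderately large array (the size is an arbitrary choice).
$source = range(1, 100000);

$before = memory_get_usage();
$copy = $source;                 // no elements are duplicated at this point
$afterAssign = memory_get_usage();

// Expect a figure close to zero, far smaller than the array itself.
echo "Assignment cost: ", $afterAssign - $before, " bytes\n";
```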

This is illustrated in the next two stages. In stage 5 $arr is set to be an empty array, and yet the memory usage barely moves. It goes up a little (40 bytes) because a new, empty array structure is allocated for $arr, but it's basically negligible. In stage 6 we've added an array element to the $bar array, so PHP performs the copy-on-write operation - $bar takes its own copy of the array it was previously pointing to, then adds the new element. At this point, $foo and $bar are pointing to two different arrays, and $arr is pointing to an empty array.
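
The write that triggers the copy can be sketched the same way. Again, the array size is an arbitrary assumption and the byte count depends on your PHP version, but the element counts show that the two variables no longer share an array after the write.

```php
<?php
$foo = range(1, 100000);
$bar = $foo;                      // still sharing one internal array

$shared = memory_get_usage();
$bar[] = "hello, world";          // first write: $bar gets its own copy
$separated = memory_get_usage();

echo "Write cost: ", $separated - $shared, " bytes\n";

// $foo is untouched by the write to $bar.
echo count($foo), " ", count($bar), "\n"; // prints "100000 100001"
```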

In stage 7, $foo is also set to be an empty array, and suddenly there's a huge drop in the amount of memory used as $foo's array gets cleaned up. Note, however, that even though the $bar array is no longer referenced in the rest of the script, it is not garbage collected: PHP holds it in memory all the way until the script finishes.
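
The drop in stage 7 happens because the last variable pointing at the original array let go of it. A minimal sketch of the same thing using unset() follows; the array size is an arbitrary assumption and the number of bytes released will differ between PHP versions.

```php
<?php
$big = range(1, 100000);

$held = memory_get_usage();
unset($big);                      // last reference gone, so the array is freed
$released = memory_get_usage();

echo "Freed by unset(): ", $held - $released, " bytes\n";
```

Had another variable still been pointing at the same array, as $foo and $bar were in stage 5, the unset() call would have released next to nothing.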

So, what lessons can we learn from that?

  • If you want a global scope variable to release its memory, use the unset() function or set it to a different value. Otherwise, PHP will keep it floating around just in case.

  • Copy-on-write is your friend: assigning an array to another variable is as cheap as assigning an object, because no data is copied at that point. The only difference is that if you subsequently change the array through one of the variables, PHP first gives that variable its own copy.

  • The minute you unset() or re-assign a variable, PHP frees its memory, provided no other variable is still pointing to the same data (stage 5 showed what happens when one is). Freeing memory - particularly large amounts - isn't free in terms of processor time, which means that if you want your script to execute as fast as possible at the expense of RAM, you should avoid garbage collection on large variables while it's running, then let PHP do it en masse at the end of the script.
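
If you want to know whether the last point matters for your own script, time the cleanup rather than guess. The following is a rough sketch only: the element count and the use of string elements are arbitrary assumptions, and a single microtime() measurement is noisy, so treat the result as an indication and not a benchmark.

```php
<?php
// Fill an array with strings so there is plenty for PHP to clean up.
$big = array();
for ($i = 0; $i < 200000; ++$i) {
    $big[] = "item " . $i;
}

$start = microtime(true);
unset($big);                      // free the whole array now
$elapsed = microtime(true) - $start;

printf("unset() took %.6f seconds\n", $elapsed);
```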

Author's Note: The garbage collection mechanism is due to change in PHP 5.1, which has the potential to change this performance tweak quite drastically. If you intend to use PHP 5.1, please do grab it from the PHP site and test this thoroughly before committing to a plan of action.

 
