Compressing output

Probably the most popular reason for using output buffering is that it lets you compress the HTML you send to your visitors. Compression makes your site load faster for your users, and also lets you get more out of the bandwidth allocated to your server.

Whenever a visitor connects to your site, they send along information such as the last page they were at, the name of the web browser they are using, what content they accept, and what kind of encoding they accept. The encoding part is what we're interested in - if a browser supports compressed HTML, it sends word of this to the web server each time it requests a page. The web server can then send back compressed HTML if told to do so - this is important, because browsers that do not support compressed HTML will always get plain HTML back, so everyone's a winner.
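To make the negotiation concrete, here is a minimal sketch of how a script could check what the browser says it accepts. PHP exposes the request's Accept-Encoding header as $_SERVER['HTTP_ACCEPT_ENCODING']; the helper name client_accepts_gzip() is made up for this illustration and is not part of PHP, and the parsing is deliberately simple.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns true if an
// Accept-Encoding header value lists gzip and does not refuse it with q=0.
function client_accepts_gzip(string $acceptEncoding): bool
{
    // Header values look like "gzip, deflate" or "gzip;q=0.5, identity"
    foreach (explode(',', $acceptEncoding) as $item) {
        $parts  = explode(';', $item);
        $coding = strtolower(trim($parts[0]));
        if ($coding !== 'gzip') {
            continue;
        }

        // An optional "q=0" parameter means the client refuses this coding
        $q = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        return $q > 0;
    }

    return false;
}

// The header is missing when the client sent none (or when run from the command line)
$header = $_SERVER['HTTP_ACCEPT_ENCODING'] ?? '';
echo client_accepts_gzip($header) ? "gzip supported\n" : "plain HTML only\n";
```

In practice you rarely need to do this by hand, because PHP's compression handler performs the check for you; the sketch only shows what information the decision is based on.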

Compressed HTML is literally the zipped (gzip-compressed) version of the normal HTML a browser would otherwise have received - the client unzips it, then reads it as normal. As zipping requires that you know all the information before you compress it, output buffering is perfect - you send all your data to a buffer, zip the buffer, and send it off to your users.
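The buffer-then-zip idea can be sketched by hand as follows. This is only an illustration of the principle, not how PHP implements its own compression handler; it assumes the zlib extension is enabled (it provides gzencode() and gzdecode()), and the page content is invented for the example.

```php
<?php
// Collect all output in a buffer instead of sending it straight away
ob_start();
echo "<html><body>";
echo str_repeat("<p>Hello, world!</p>", 50);
echo "</body></html>";

// Take the complete page out of the buffer and discard the buffer
$html = ob_get_clean();

// Now that the whole page is known, compress it in one go (gzip format)
$compressed = gzencode($html);

// The client would reverse the process before rendering the page
$restored = gzdecode($compressed);

echo $restored === $html ? "Round trip OK\n" : "Round trip failed\n";
```

A real response would also need to send a "Content-Encoding: gzip" header alongside the compressed body, and only to clients that asked for it.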

As the tie between output buffering and output compression is so close, the code to make it work is equally close - you just pass a parameter to ob_start(), which is 'ob_gzhandler', and that will automatically check whether content compression is supported, and enable it if it is.

From the client's point of view, nothing will have changed, except the fact that the site might load a little quicker. If they click "View Source" from their web browser, they'll see normal HTML - the process is entirely transparent. However, if we take a look behind the scenes, gzip is clearly in operation.

Save this script in the root directory of your web server as ob.php:

<?php
    ob_start('ob_gzhandler');
    print "My content\n";
    ob_end_flush();
?>

It is quite a simple script, but enough to illustrate that content compression is working. Now, open up a command line window, and enter this command: telnet <your server> 80. This will connect to your web server on the HTTP port - you get to pretend to be a web browser. Once you are connected, enter GET /ob.php HTTP/1.0 and press enter twice - you should see a load of HTTP headers, followed by the content of the page.

Here is what I got:

[paul@carmen paul]$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
GET /ob.php HTTP/1.0

HTTP/1.1 200 OK
Date: Wed, 23 Jul 2003 00:25:07 GMT
Server: Apache-AdvancedExtranetServer/2.0.48 (Mandrake Linux/5.0mdk) mod_perl/1.99_08 Perl/v5.8.0 mod_ssl/2.0.45 OpenSSL/0.9.7a PHP/5.0.0
Accept-Ranges: bytes
X-Powered-By: PHP/5.0.0
Connection: close
Content-Type: text/html; charset=ISO-8859-1

My content
Connection closed by foreign host.

As you can see, "My content" is in there, but note that it is in plain text - if you are wondering why it is not compressed, remember that the server will only compress output if we tell it we support compression. Open the telnet connection again, but this time enter GET /ob.php HTTP/1.0, press enter, then type "ACCEPT-ENCODING: gzip" and press enter twice. Here is what I got this time around:

[paul@carmen paul]$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
GET /ob.php HTTP/1.0
ACCEPT-ENCODING: gzip

HTTP/1.1 200 OK
Date: Wed, 23 Jul 2003 00:30:11 GMT
Server: Apache-AdvancedExtranetServer/2.0.48 (Mandrake Linux/5.0mdk) mod_perl/1.99_08 Perl/v5.8.0 mod_ssl/2.0.45 OpenSSL/0.9.7a PHP/5.0.0
Accept-Ranges: bytes
X-Powered-By: PHP/5.0.0
Content-Encoding: gzip
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=ISO-8859-1

ò­THÃŽÃ+IÃ+áÿsÎ÷
Connection closed by foreign host.

There are two important things to note in there. Firstly, note that Apache now sends "Content-Encoding: gzip" back as an HTTP header so that the browser knows it needs to unzip the contents of the page. Secondly, note that our "My content" message has been scrambled - it is compressed now. Yes, it has actually grown in size, but that is simply because gzip adds a fixed amount of overhead, which makes compressing very small amounts of text pointless. As a rough guide, anything over about 100 characters is worth compressing.
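You can see the size effect for yourself with a short script. The sketch below assumes the zlib extension is available for gzencode(); the helper name gzip_sizes() is invented for this example, and the exact compressed sizes will vary with your zlib version and settings, so none are quoted here.

```php
<?php
// Hypothetical helper: report the size of a string before and after gzip
function gzip_sizes(string $text): array
{
    return [
        'plain' => strlen($text),
        'gzip'  => strlen(gzencode($text)),
    ];
}

// The tiny message from ob.php, and the same message repeated 100 times
$small = gzip_sizes("My content\n");
$large = gzip_sizes(str_repeat("My content\n", 100));

printf("Small: %d bytes plain, %d bytes gzipped\n", $small['plain'], $small['gzip']);
printf("Large: %d bytes plain, %d bytes gzipped\n", $large['plain'], $large['gzip']);
```

The small string comes out larger than it went in, while the repeated text shrinks considerably. Real pages are less repetitive than this test string, so expect more modest savings.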

Author's Note: if you are wondering why Apache does not send its headers compressed, the reason is that the headers are what tell the client that the content is compressed - if the headers themselves were compressed, how would the client know it hadn't just received garbage?

So, as you can see, the same PHP code will send two different HTML pages depending on what the client supports, which means there is little reason not to use content compression unless you are certain that the data you are sending does not compress well and it would be a waste of CPU time to try.

Note that content compression works only on the contents of the output buffer - it does not compress pictures, CSS files, or other attachments to your HTML.

Author's Note: be wary of using multiple buffers with compression. You're only allowed one compressed buffer with PHP because of the need to compress content all at once.

 
