How to set up a web server with Apache
Ask anyone to name a web server for Linux and they'll either mention Apache or be deliberately obtuse by picking something else. It's not that there aren't alternatives, but Apache is everywhere. The others have their advantages, often being lighter, but if you're ever going to transfer a site from your local server to a commercial one, the chances are that it'll run Apache and all your configurations will copy straight across.
Why would you want to install a web server? There are many reasons, but only one needs to apply for you to want to proceed.
- Develop and test a website before transferring to a live server.
- Make documents available over a local network or intranet.
- Run your own private site for family and friends only.
- Experiment with various web-based programs.
- Because you can!
Just about every distro has Apache in its repositories, so install it through your package manager (you may even find it was installed by default). Web servers are rarely run through inetd, but instead are started as separate processes when you boot.
Installing and configuring a server usually requires root access. Your distro's administration tools will prompt for a password when they need this, but you have to take care of it yourself when editing configuration files. Either use a root terminal to launch your editor or use sudo to launch it, for example:
sudo gedit /etc/apache2/httpd.conf
That command will work in Ubuntu and other distros that use sudo. In some distros you have to enter su and you'll then be prompted for your root password. Note that in root mode you have the ability to change critical system files, so always be careful! You can tell that you're logged in as root by the '$' switching to '#' in the prompt (as shown below).
Root access is usually made clear by replacing the usual $ prompt with a # prompt.
Check your distro's services manager to make sure that Apache is set to start at boot and is running, then fire up the nearest browser and point it at http://localhost. You should see either the Apache welcome page, which tells you that you have no content installed, or an error message because your content directory is empty.
Succinct but absolutely correct: the default Apache homepage for your site tells you what you want to know.
If you see a 'cannot connect' error, go back to your services manager and check that Apache is running. The location of the HTML that Apache serves is set in the config file and varies between distros. The standard location is /var/www/localhost/htdocs, but some distros use /srv/www/htdocs or just /var/www. Whatever the base directory for the Apache websites, a popular and sensible convention is to store the HTML files in hostname/htdocs.
This way you can have sites for more than one host stored on the same system. The htdocs subdirectory is there because there are some files that are specific to a site but that you don't want to be accessible from a web browser, such as password files, so these can go in the hostname directory, as Apache won't serve anything above htdocs. This will become more clear shortly.
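As a rough sketch of the layout just described, the commands below build a hostname/htdocs tree under a scratch directory, so they can be tried without root access. The WEB_ROOT and SITE variables and the placeholder page are assumptions made for illustration. On a real server, WEB_ROOT would be your distro's base directory, such as /var/www.

```shell
# Scratch base directory (an assumption); a real server would use something like /var/www.
WEB_ROOT="$(mktemp -d)"
SITE="localhost"

# Public files go in htdocs, which becomes the DocumentRoot.
mkdir -p "$WEB_ROOT/$SITE/htdocs"
printf '<html><body><h1>Hello from %s</h1></body></html>\n' "$SITE" \
    > "$WEB_ROOT/$SITE/htdocs/index.html"

# Private files sit one level up, outside anything Apache will serve.
touch "$WEB_ROOT/$SITE/.htpasswd"

# Show the resulting tree.
find "$WEB_ROOT" | sort
```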
Some distros don't have a separate directory for each site - Ubuntu puts the files directly in /var/www - so there's no secure location for private files. Let's fix that first.
The Apache configuration files are stored in /etc/apache2 but the specific locations within that directory are organised differently for each distro. The main file is httpd.conf, but this is rarely edited. It uses the Include directive to pull in per-site information from other files or directories.
For example, OpenSUSE has the default site setup in default-server.conf with other sites stored in the vhosts.d directory. Ubuntu keeps all the site configurations in the sites-available directory, with the default having the obvious name. These are then symlinked to the sites-enabled directory, so sites can be enabled and disabled by creating and removing the symlinks, without disturbing the actual configuration details.
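The symlink arrangement can be tried safely in a scratch directory. This sketch only mimics the sites-available and sites-enabled layout. CONF_DIR, the enable_site and disable_site helper names and example.conf are made up for illustration.

```shell
# Scratch copy of the layout (an assumption); the real one lives under /etc/apache2.
CONF_DIR="$(mktemp -d)"
mkdir -p "$CONF_DIR/sites-available" "$CONF_DIR/sites-enabled"
printf '# placeholder site configuration\n' > "$CONF_DIR/sites-available/example.conf"

# Enable a site by linking its config into sites-enabled.
enable_site() {
    ln -sf "../sites-available/$1" "$CONF_DIR/sites-enabled/$1"
}

# Disable a site by removing the link; the config itself is left alone.
disable_site() {
    rm -f "$CONF_DIR/sites-enabled/$1"
}

enable_site example.conf
ls -l "$CONF_DIR/sites-enabled"

disable_site example.conf
ls -l "$CONF_DIR/sites-available"
```

On Debian and Ubuntu, the a2ensite and a2dissite commands manage these links for you.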
Create the directory where you want to store the files, say, /var/www/localhost/htdocs, then load the default site into your preferred editor (even if this is Emacs) and look for the DocumentRoot directive. This is the base directory where Apache looks for files.
Change it to match your directory, then find the <Directory...>...</Directory> section that matches the original setting for DocumentRoot and change the path in the opening Directory tag to suit. You should then have a block that looks something like this, possibly with plenty of explanatory comments.
<Directory "/var/www/localhost/htdocs">
Options Indexes FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Let's go through this line by line. Apache's configuration is hierarchical in two ways: the various files and directories in /etc/apache2 make it easier to organise multiple sites, but this is only a convenience, because the use of Includes means that everything is effectively combined into one big file when passed to Apache.
Within that mass of settings, another hierarchy exists; settings can be global or included within specific sections. The Directory stanza is an example of that - these settings apply only to that directory and its subdirectories, and can be modified for those subdirectories by further, more specific, Directory stanzas.
This one sets two options: Indexes tells Apache to generate an HTML listing of a directory's contents if given a URL to a directory and that directory contains no index.html file. Without this, a directory with no index file will give an error message.
FollowSymLinks does pretty much what you would expect - it allows Apache to follow symbolic links within the DocumentRoot. Options lines are not additive by default: if the root of your site has Options Indexes and a subdirectory has Options FollowSymLinks, only FollowSymLinks applies to the subdirectory. To adjust the inherited settings instead of replacing them, prefix each option with + or -. If you want indexing in the parent directory only, for example, use Options -Indexes in the subdirectory.
The AllowOverride directive controls the use of .htaccess files, which are a further step in the configuration hierarchy. Configuration for a specific directory can be amended by putting options in a file called .htaccess in that directory.
While this gives flexibility for those administering websites on the servers of others, it should be avoided when you can edit the files in /etc/apache2. It's slightly less secure, but the main reason is performance: with AllowOverride enabled, every time Apache serves a page it first has to check for a .htaccess file in that page's directory and each of its parents all the way up to DocumentRoot. The last two lines are to do with access controls - more on that shortly.
This will get Apache serving static HTML files from a sensibly named directory, so copy your content there and see how it goes. Apache usually runs as a dedicated unprivileged user and group, such as apache:apache, or www-data:www-data on Debian and Ubuntu. You can check the User and Group settings in the configs to be sure, then make certain your files are readable by that user. There's a slew of settings for the files, but you may want to check a couple now.
ServerAdmin is the email address of the server's administrator. This is included in some server-generated content, such as error messages. When you give a directory instead of a page in the URL, such as www.linuxformat.co.uk, Apache will look for an index page, which is usually index.html. The DirectoryIndex directive gives the name of the index file. If more than one is specified, Apache will look for each in turn.
DirectoryIndex index.php index.html index.htm
...will look first for index.php, then each of the other two, using the first one it finds. If none are found it will either return a directory listing or an error, depending on the Indexes setting.
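The lookup order can be modelled in a few lines of shell. This is only a toy imitation of what Apache does internally, and the find_index function and DOCROOT scratch directory are names invented for the sketch.

```shell
# find_index DIR NAME...: print the first NAME that exists as a file in DIR.
find_index() {
    dir="$1"
    shift
    for name in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    # Nothing found: Apache would fall back to a listing or an error here.
    return 1
}

# Scratch directory holding two of the three candidate index files.
DOCROOT="$(mktemp -d)"
touch "$DOCROOT/index.html" "$DOCROOT/index.htm"

find_index "$DOCROOT" index.php index.html index.htm   # prints index.html
```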
Who goes there?
You can control access to your server in a number of ways. Listen tells Apache which IP address and port to listen on. The default is port 80 on any address, though you could use a line like this to run Apache on a different port:
Listen 8080
If you have two network cards, one connected to the internet and one to the LAN, the Listen directive can force Apache to respond only to requests from the LAN interface, assumed here to have the address 192.168.1.1:
Listen 192.168.1.1:80
...or a combination of the two:
Listen 192.168.1.1:8080
Access is also controlled with the Allow directive; we saw Allow From All in the code above, which is fairly self-explanatory. You could also have these:
Allow from 192.168.1
Allow from example.com
...to allow access from 192.168.1.* or *.example.com. More than one Allow directive can be given and access will be granted if at least one matches. The Deny directive works in exactly the same way to block access, and the Order directive specifies how these interact:
Order allow,deny
Order deny,allow
In the first case, the Allows are processed first, and access is rejected unless one matches; then the Denys are processed and access is rejected if any match. Any requests that don't match any of these are denied, so it must match at least one Allow and no Denys.
With the second setting the Denys are processed first, and any that match are denied, unless they then match an Allow. The main difference is that this method lets through requests that match both and requests that match neither, the opposite of the first method.
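The two orderings are easier to compare as a truth table. The function below is a toy model of the documented behaviour, not Apache's own code. The access_decision name and the yes/no flags, which stand for "at least one Allow matched" and "at least one Deny matched", are assumptions for illustration.

```shell
# access_decision ORDER ALLOW_MATCHED DENY_MATCHED: print "allowed" or "denied".
access_decision() {
    order_arg="$1"
    allow_hit="$2"
    deny_hit="$3"
    case "$order_arg" in
        allow,deny)
            # Default is deny: needs a matching Allow and no matching Deny.
            if [ "$allow_hit" = yes ] && [ "$deny_hit" = no ]; then
                echo allowed
            else
                echo denied
            fi
            ;;
        deny,allow)
            # Default is allow: rejected only by a Deny with no Allow to rescue it.
            if [ "$deny_hit" = yes ] && [ "$allow_hit" = no ]; then
                echo denied
            else
                echo allowed
            fi
            ;;
        *)
            echo "unknown order: $order_arg" >&2
            return 2
            ;;
    esac
}

# Print every combination for both orderings.
for o in allow,deny deny,allow; do
    for a in yes no; do
        for d in yes no; do
            printf 'Order %-10s Allow matched: %-3s Deny matched: %-3s -> %s\n' \
                "$o" "$a" "$d" "$(access_decision "$o" "$a" "$d")"
        done
    done
done
```

The two orderings agree when only an Allow or only a Deny matches. They differ when both match or when neither does.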
Passwords of wisdom
Apache can also control access with passwords - you can configure this by adding the following to the configuration file:
AuthType basic
AuthName "Registered users only"
AuthUserFile /var/www/hostname/.htpasswd
Require valid-user
The first line sets the type of authentication; the second is the text that appears when the browser requests the login. AuthUserFile is the full path to the password file (note that it's outside of the DocumentRoot) and finally, Require tells Apache not to allow access until a valid user has authenticated him/herself. This directive can also contain a list of user or group names:
Require user alice bob
Require group admin
The users are defined in AuthUserFile (groups need a separate file, named with the AuthGroupFile directive), so to prevent anyone downloading it, this file should not be left in your DocumentRoot. This is one reason for using /var/www/hostname/htdocs as DocumentRoot, because we can then put the password file a level up and still have it specific to the host. Create the file and add the first user with this:
htpasswd -c /var/www/hostname/.htpasswd alice
...and omit the -c for subsequent users. That option creates a new password file, overwriting any existing one of the same name. Each time it will ask for the password for the user, much like the system passwd command.
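For scripted setups, the same two steps might look like the sketch below. It writes to a scratch file, not /var/www/hostname/.htpasswd, and uses htpasswd's -b option to take the password from the command line in place of the interactive prompt. The PWFILE variable, the user bob and both passwords are placeholders invented for the example.

```shell
# Scratch password file (an assumption); a real one would sit above DocumentRoot.
PWFILE="$(mktemp -d)/.htpasswd"

if command -v htpasswd >/dev/null 2>&1; then
    # First user: -c creates the file, overwriting any existing one.
    # -b reads the password from the command line, where other local users
    # may be able to see it, so prefer the interactive prompt on shared systems.
    htpasswd -cb "$PWFILE" alice 'example-password-1'

    # Later users: no -c, so the entry is appended to the existing file.
    htpasswd -b "$PWFILE" bob 'example-password-2'

    # List the user names now stored in the file.
    cut -d: -f1 "$PWFILE"
else
    echo "htpasswd not found; it usually ships with Apache or its utilities package" >&2
fi
```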
What if you want to combine methods of access control? You may have an intranet site that you also want some users to be able to access from outside, but you don't want anyone else to get there. The Satisfy directive combines access control methods:
AuthType basic
AuthName "Registered users only"
AuthUserFile /var/www/hostname/.htpasswd
Require valid-user
Allow from 192.168.1
Satisfy Any
This means that meeting any one of the Allow or Require criteria will suffice, so local users can connect immediately, but they need to log in when they're out of the office. Satisfy All means that all criteria must be met, so only a valid user on the local network will be allowed in.
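The difference between the two Satisfy settings can be sketched as a tiny decision function. This is a simplified model: a real server answers a missing login with a password prompt where the sketch says "denied". The satisfy_decision name and the yes/no flags, which stand for "the host restriction passed" and "a valid user logged in", are assumptions for illustration.

```shell
# satisfy_decision MODE HOST_OK USER_OK: print "allowed" or "denied".
satisfy_decision() {
    mode="$1"
    host_ok="$2"
    user_ok="$3"
    case "$mode" in
        Any|any)
            # Either the host check or the login is enough.
            if [ "$host_ok" = yes ] || [ "$user_ok" = yes ]; then
                echo allowed
            else
                echo denied
            fi
            ;;
        All|all)
            # Both the host check and the login are required.
            if [ "$host_ok" = yes ] && [ "$user_ok" = yes ]; then
                echo allowed
            else
                echo denied
            fi
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 2
            ;;
    esac
}

# A LAN user who has not logged in:
satisfy_decision Any yes no
satisfy_decision All yes no

# A remote user with a valid login:
satisfy_decision Any no yes
satisfy_decision All no yes
```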
When a browser requests a page from a web server, it passes information to that server, including the hostname that it used to get to the server. Apache uses this to create virtual hosts, where a different site is served depending on the URL used.
This has a number of uses. Let's say you want to install a webmail client on your server so you can read your mail while away from home, but you also want a website on the same server. If you have the domain example.com and both www.example.com and mail.example.com are pointing to the same IP address, you can set up everything for your website as above, create another directory at /var/www/mail.example.com/htdocs and install your webmail files in here.
To tell Apache to use this directory for mail.example.com, you need only add a few lines to the configuration. Put this in a separate file - webmail.conf, for example - and place this in the directory your distro uses for virtual hosts, which will be vhosts.d in OpenSUSE or sites-available in Ubuntu with this content:
<VirtualHost *:80>
ServerName mail.example.com
DocumentRoot /var/www/mail.example.com/htdocs
<Directory "/var/www/mail.example.com/htdocs">
Options -Indexes
Order allow,deny
Allow from all
</Directory>
</VirtualHost>
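If you have several hosts to set up, a short script can stamp out one stanza per hostname. This is a minimal sketch, assuming a wildcard VirtualHost with a ServerName line and the same access-control lines used in this article. The print_vhost function, the VHOST_DIR scratch directory and wiki.example.com are hypothetical. On a real system the files would go in your distro's virtual hosts directory.

```shell
# print_vhost HOSTNAME: write a name-based virtual host stanza to stdout.
print_vhost() {
    vhost_name="$1"
    cat <<EOF
<VirtualHost *:80>
    ServerName $vhost_name
    DocumentRoot /var/www/$vhost_name/htdocs
    <Directory "/var/www/$vhost_name/htdocs">
        Options -Indexes
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
EOF
}

# Scratch output directory (an assumption), one file per host.
VHOST_DIR="$(mktemp -d)"
for h in mail.example.com wiki.example.com; do
    print_vhost "$h" > "$VHOST_DIR/$h.conf"
done

cat "$VHOST_DIR/mail.example.com.conf"
```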
Repeat this for any other virtual hosts you want, then tell Apache to use name-based virtual hosts by adding a NameVirtualHost line, whose address must match the one in your <VirtualHost> tags, for example:
NameVirtualHost *:80
...to your main configuration file. Now restart Apache to force it to re-read the configuration and go to mail.example.com in your browser. If you don't have a domain yet, don't worry: you can make one up and put it in /etc/hosts for testing. In fact, you'll need to do this even if you have a real domain if you use a router to access the internet.
This is because a request to mail.example.com will cause a DNS lookup to return your external IP address when you want the local address of the computer, so put something like this:
192.168.1.1 www.example.com mail.example.com
...in /etc/hosts and any request for either of these names will return 192.168.1.1, but the Apache server running at that address will return a different site for each domain. While we're on the subject of routers, if you have a NAT router (you will if you have a typical broadband internet connection) you'll need to set up port forwarding in the router to forward all requests to port 80 to the computer running Apache.
If you want your websites to be accessible from outside, you'll need a domain name and, preferably, a static IP address. Your ISP can take care of the latter, although it may charge you for the privilege. Then you can go to one of the hosting companies that advertise in any good magazine (such as Linux Format, natch), or just search the web for domain registration services and register a domain name with them.
Once this is done, have them point it to your IP address. Make sure that the domain registrar you use will actually set up a DNS entry for your domain and IP address. Some of the bargain basement services use some HTTP or even HTML trickery to pass your content to browsers while leaving your domain name pointing to their servers. A decent domain registration service for a .uk address can be bought for around £10 for two years.
If you don't have a static IP address, you can get a domain name from a dynamic DNS hosting site, such as dyndns.org. These give you a domain name that's a subdomain of one of their options and you run a program each time you connect that sends your IP address to their server. Some modems/routers have an option to work with these sites, automatically updating them when your IP address changes. This sort of setup isn't suitable for a professional site, but is adequate for a home-based site.
A dynamic DNS server can help if you don't have a static IP address, and many routers will keep this up to date for you.
Once you have Apache running, there are all sorts of things you can do with it. Some programs have web-based front-ends that you can use with it; install MythWeb and you can set up recordings with MythTV from anywhere with an internet connection, or use PhpMyAdmin to add an administration GUI for MySQL databases, while Gallery turns your Apache server into a fully-fledged photo gallery site.
Or you could try out wiki, blogging or content management packages without exposing anything to the big bad internet until you're sure of them. Many of these programs use PHP running on the server, so you may need to install some extra Apache-PHP packages, depending on your distro.
Once Apache is running, your options are limited only by your imagination. Here it's running a photo album, courtesy of Gallery.
- Daemon A program that runs in the background, waiting for connections. These are usually servers and often have a name ending in d, such as 'sshd' or 'ftpd'. Older versions of Apache used to run as 'httpd', but it's no longer shy about its name.
- Inetd This is a special daemon, sometimes referred to as the superdaemon, that listens for all sorts of connections, then passes them to the appropriate program. Some servers have an option to be called by inetd, or its successor, xinetd, instead of continually running in the background waiting for a connection. Apache is not run in this way.
- Directive This is what the Apache documentation, which you'll spend a considerable amount of time reading, calls a configuration item in any of its config files.
First published in Linux Format magazine