Diagnose and fix network problems yourself

A recent and typical case of Linux network failure was the friend who rang up to say his "network had stopped". As error reports go, this is on a par with the classic Apollo 13 line "Houston, we've had a problem", though a little less life-threatening. Luckily, Linux has a goodly collection of network tools to help us figure out exactly what has gone wrong. (To eliminate any stress-inducing suspense, let me reveal that we eventually discovered that he had been disconnected by his ISP as a result of forgetting to renew his subscription.)

So, follow along with us now as we review some of the network diagnostic tools in Linux and see how to use them to get answers to the question "what's wrong with my network?"

The most important thing, when you're troubleshooting something, is to have some idea how it's supposed to work in the first place. Does your machine have a static IP address, and if so, what should it be? Does it use DHCP, and if so, where is the DHCP server, and what range of IP addresses is it expected to allocate? Do you have a broadband modem directly connected to your machine, or do you have a separate broadband router to which you connect via ethernet or wireless?

Our methodology in this tutorial is to take a "bottom up" approach. We start by checking the really low-level stuff first, then gradually work our way up to higher levels. The sequence of tests we'll perform is summarised (approximately) in Figure 1, below. This is a good, systematic approach for network connections that have never worked. On the other hand, if it was working fine yesterday, it's generally faster to start at the top and work your way down.

Figure 1: summary of the testing sequence.
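The "first failure wins" idea behind this sequence can be sketched as a small shell function. This is only an illustration: the check names you pass in are hypothetical, and each one is assumed to be a command or shell function that returns 0 on success.

```shell
# Run each named check in order and stop at the first one that fails.
# The check names are supplied by the caller (hypothetical here); each
# must be a command or shell function that returns 0 on success.
run_checks() {
    for check in "$@"; do
        if ! "$check"; then
            echo "failed at: $check"
            return 1
        fi
    done
    echo "all checks passed"
}
```

For the bottom-up approach you would list the lowest-level check first; to work top-down on a connection that was fine yesterday, pass the same checks in the opposite order.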

Can Linux find your network card?

In this instance, the first question to ask is: "Is Linux seeing your network interfaces?" You may be able to answer this by looking through the boot-time messages from the kernel using the command dmesg:

# dmesg | grep eth 
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection 
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection 
e1000: eth0: e1000_watchdog: NIC Link is Up 10 Mbps Half Duplex 

Alternatively, try listing the devices on the bus with lspci:

# lspci | grep Ethernet 
01:01.0 Ethernet controller: Intel Corporation 82547EI
02:01.0 Ethernet controller: Intel Corporation 82540EM

Failure at this stage suggests faulty or unsupported hardware.
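Searching dmesg for "eth" only finds interfaces with that kind of name. As a rough complement (our own addition, not part of the test sequence above), the kernel lists every interface it has registered under /sys/class/net, whatever it is called. The sketch below simply prints those names; the directory argument is there only so that the function can be pointed at a scratch directory.

```shell
# List the network interfaces the kernel has registered, skipping loopback.
# The optional directory argument exists only for trying the function out
# against a test directory; by default it reads /sys/class/net.
list_interfaces() {
    sysdir="${1:-/sys/class/net}"
    for path in "$sysdir"/*; do
        if [ ! -e "$path" ]; then
            continue            # the directory was empty or missing
        fi
        name="${path##*/}"      # strip the leading directory
        if [ "$name" != "lo" ]; then
            echo "$name"
        fi
    done
}
```

Run as list_interfaces with no arguments; if nothing other than the loopback interface exists, the function prints nothing, which points back to a hardware or driver problem.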

Does it have an IP address assigned?

Assuming that the kernel knows your network card is there, the next question is: does it have an IP address assigned? The simplest command to use for this is ifconfig:

# ifconfig eth0
 eth0 Link encap:Ethernet  HWaddr 00:0C:F1:96:A3:F7  
     inet addr:192.168.0.3 Bcast:192.168.0.255 Mask:255.255.255.0 
     inet6 addr: fe80::20c:f1ff:fe96:a3f7/64 Scope:Link 
     UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1 
     RX packets:306 errors:0 dropped:0 overruns:0 frame:0 
     TX packets:261 errors:0 dropped:0 overruns:0 carrier:0 
     collisions:8 txqueuelen:10 
     RX bytes:43074 (42.0 KiB)  TX bytes:34480 (33.6 KiB) 
     Base address:0xac00 Memory:ff7e0000-ff800000 

The important line here is the second one, which shows an assigned IP address of 192.168.0.3. If you do not see such a line, then it follows that there is no assigned IP address. Even if there is an IP address assigned, give a moment's thought to whether it's a valid address for the network you're on.
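One way to make "is this address valid for my network?" concrete is to compare the network part of your address with that of a machine you know is on the right network, such as your router. The following is a minimal sketch in plain shell arithmetic; the function names are our own, and it assumes well-formed dotted-quad IPv4 addresses with no leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if two addresses fall in the same network under the given mask.
same_network() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

With the values from the ifconfig output above, same_network 192.168.0.3 192.168.0.1 255.255.255.0 succeeds, whereas an address handed out by a rogue DHCP server on some other range would fail the test.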

Several times, in operational environments, we have seen networks run into trouble after the introduction of a machine that turned out to be running an (unintentional) DHCP server configured with a pool of addresses that weren't valid for that network. Whenever a machine was rebooted, it had about a 50/50 chance of getting a valid IP address from the "real" DHCP server or a rogue address from the imposter.

If your network interface has no IP address, check the system configuration files: Is the interface configured to be started at boot time? If so, is it configured to use DHCP, or does it have a static IP address? The files you need to look in for this are distro-specific.

On Fedora and Red Hat the filename would be of the form /etc/sysconfig/network-scripts/ifcfg-eth*, on SUSE it would be /etc/sysconfig/network/ifcfg-eth*, and on Ubuntu it would be /etc/network/interfaces. (Aren't standards a marvellous thing; don't you just adore all these gratuitous distribution-specific differences?) Of course, all these distributions have graphical tools to allow you to inspect and edit these settings; for example, Figure 2, below, shows Fedora's system-config-network tool.

Figure 2: Fedora's system-config-network tool.
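If you're not sure which of those configuration files your distribution uses, a few lines of shell can simply report whichever ones exist. This is a sketch that checks only the three locations named above; the optional root prefix argument is our own addition, so that the function can be tried against a scratch directory.

```shell
# Print whichever of the distribution-specific interface configuration
# files exist. The optional argument is a root prefix, used only so the
# function can be tried against a scratch directory.
find_net_config() {
    root="${1:-}"
    for f in \
        "$root"/etc/sysconfig/network-scripts/ifcfg-eth* \
        "$root"/etc/sysconfig/network/ifcfg-eth* \
        "$root"/etc/network/interfaces
    do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

Called with no argument, it looks under the real /etc. No output means that none of the three locations is in use on your system.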

Normally, the initialisation of an interface is buried deep in a boot-time script, and the interaction with the DHCP server can be difficult to observe. However, you may be able to see the DHCP activity by running the script ifup directly, or by running dhclient, the program that handles the dialogue with the DHCP server and the setting of network parameters:

# dhclient 
Internet Systems Consortium DHCP Client V3.0.5-RedHat 
Copyright 2004-2006 Internet Systems Consortium. 
All rights reserved. 
For info, please visit http://www.isc.org/sw/dhcp/ 

Listening on LPF/eth1/00:0e:0c:01:d3:a0 
Sending on   LPF/eth1/00:0e:0c:01:d3:a0 
Listening on LPF/eth0/00:0c:f1:96:a3:f7 
Sending on   LPF/eth0/00:0c:f1:96:a3:f7 
Sending on   Socket/fallback 
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 7 
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4 
DHCPOFFER from 192.168.0.1 
DHCPREQUEST on eth0 to 255.255.255.255 port 67 
DHCPACK from 192.168.0.1 
bound to 192.168.0.3 -- renewal in 125868 seconds. 

This particular system has two network interfaces, eth0 and eth1. We see that eth0 obtained an IP address from the DHCP server at 192.168.0.1. The eth1 interface tried to do the same (it transmitted a DHCPDISCOVER) but didn't get a reply, which isn't surprising in this case as it wasn't actually connected to anything.
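If you'd rather not read the whole transcript, the two lines that matter (DHCPACK and "bound to") can be picked out with awk. The sketch below assumes the message format shown in the output above; other dhclient versions may word these lines differently, in which case the patterns would need adjusting. The function name is our own.

```shell
# Summarise a dhclient transcript read from standard input.
# Assumes the "DHCPACK from <server>" and "bound to <address>" message
# formats shown in the article's sample output.
dhcp_summary() {
    awk '
        /^DHCPACK from/ { server = $3 }
        /^bound to/     { addr = $3 }
        END {
            if (addr != "") print "lease " addr " from " server
            else            print "no lease obtained"
        }'
}
```

You would feed it the client's messages, for example with dhclient eth0 2>&1 | dhcp_summary. The 2>&1 is there on the assumption that your dhclient writes its messages to standard error.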

Can you ping your router?

If you've got a valid IP address, a good next step might be to test if you can ping one of the other machines on your network. A successful ping looks something like this:

# ping -c1 192.168.0.6 
PING 192.168.0.6 (192.168.0.6) 56(84) bytes of data. 
64 bytes from 192.168.0.6: icmp_seq=1 ttl=64 time=0.468 ms 

--- 192.168.0.6 ping statistics --- 
1 packets transmitted, 1 received, 0% packet loss, time 0ms 
rtt min/avg/max/mdev = 0.468/0.468/0.468/0.000 ms 

An unsuccessful one looks like this:

# ping -c 1 192.168.0.2 
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data. 
From 192.168.0.3 icmp_seq=1 Destination Host Unreachable 

--- 192.168.0.2 ping statistics --- 
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms 

The message "Destination Host Unreachable" usually means that the target machine (192.168.0.2 in this example) isn't connected or isn't running, and so failed to respond to your machine's ARP request to find its MAC address. It could also mean that your machine can't find a route to reach the local network; the most likely reason for this is that it has an IP address that's not actually part of the local network.

It's also possible that you have a more complex routing problem, but that's unlikely on a typical home network that has only one (default) route. If you don't have any other machines on your network, you can try pinging your router. (You do know the IP address of your router, right?)

If you can't ping your local router, then you obviously have a local problem. If you have a wired network, check the cabling and the little green lights on the network interfaces at each end.
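If you don't know your router's address, the routing table will usually tell you: on a typical home network it is the gateway of the default route. The sketch below picks that address out of the output of ip route show; it assumes the usual "default via ADDRESS dev INTERFACE" line format, and the function name is our own.

```shell
# Print the gateway address of the first default route found in
# "ip route show" output read from standard input.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}
```

Typical use would be ping -c1 "$(ip route show | default_gateway)". If the function prints nothing, there is no default route at all, which is a finding in itself.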

Is your firewall blocking the traffic?

At some point in your diagnosis, it's worth checking to see whether your firewall settings are screwed down too tight. A quick-and-dirty way to do this favoured by many sysadmins in a hurry is to flush all the firewall rules with the command

# iptables -F 

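and then try again. (Be aware that flushing the rules does not change a chain's default policy: if iptables -L shows a policy of DROP, flushing will leave everything blocked until you set that policy to ACCEPT.) If this solves the problem, then at least you know that the problem that's been causing you grief is firewall-related. At that point you should reboot (to re-establish the firewall) and investigate further. Do not be tempted to leave the firewall disabled: this is a Bad Idea!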
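Before flushing, it can be worth checking which chains would be left with a DROP policy once their rules are gone. The sketch below reads the output of iptables -S, which prints policies as lines of the form "-P CHAIN POLICY", and lists the chains whose policy is DROP; the function name is our own.

```shell
# Read "iptables -S" output on standard input and print the name of
# every built-in chain whose default policy is DROP.
drop_policy_chains() {
    awk '$1 == "-P" && $3 == "DROP" { print $2 }'
}
```

Run it as iptables -S | drop_policy_chains. Any chain it names will block all traffic after a flush unless you first set its policy to ACCEPT with iptables -P.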

Do you have an ADSL connection to your ISP?

But if you can ping your router, it's time to start widening your net, so to speak. There might be some more little green lights on your router (and if you can find the instruction book you may even be able to figure out what they mean!) that will allow you to determine if the ADSL modem in your router has successfully connected to your ISP.

Some broadband routers also provide various web-based administration screens that you can use to determine the status of your connection. Figure 3, below, shows one such example. The things to look for here are the Connection Status setting, and the IP address that the ISP has assigned to your outward-facing network connection. (You probably don't much care what that IP address actually is; you just want to confirm that there is one!)

Figure 3: web-based administration screens that you can use to determine the status of your connection.

Try disconnecting and re-connecting manually, and see if you can figure out at what point it fails. If you can't get a connection, you should obviously check the cabling from the router to the phone line (and it's worth plugging a phone handset into the line to check if you get a dial tone), but if this all looks OK, a call to your ISP's tech support line is probably in order. Make a flask of coffee and grab a good book first, though... those call queues can be long!

Can you ping the target system?

If you appear to have a good connection to your ISP, it's time to continue up the stack with the testing. Try pinging a known external machine using its IP address. For example, Linux Format's web server is at 80.244.178.151. (Of course, it's entirely possible that this will change before you get to read this, but it will serve for the purpose of this example.)

# ping -c1 80.244.178.151 
PING 80.244.178.151 (80.244.178.151) 56(84) bytes of data. 
64 bytes from 80.244.178.151: icmp_seq=1 ttl=56 time=24.3 ms 

--- 80.244.178.151 ping statistics --- 
1 packets transmitted, 1 received, 0% packet loss, time 0ms 
rtt min/avg/max/mdev = 24.367/24.367/24.367/0.000 ms 

If this works, your network connectivity is actually in quite good shape. As a final test, try pinging the machine using its DNS name:

# ping -c1 www.linuxformat.com 
PING www.linuxformat.com (80.244.178.151) 56(84) bytes of data. 
64 bytes from www.linuxformat.com (80.244.178.151): icmp_seq=1 ttl=56 time=24.2 ms 

--- www.linuxformat.com ping statistics --- 
1 packets transmitted, 1 received, 0% packet loss, time 0ms 
rtt min/avg/max/mdev = 24.249/24.249/24.249/0.000 ms 

DNS failures show up very quickly with this test; for example:

$ ping www.prophylactic.gov 
ping: unknown host www.prophylactic.gov 

If you can ping a machine using its IP address but not using its DNS name, it's high time that you investigated your DNS configuration. The best utility for this is dig. Here's a sample (and successful) run. Don't be intimidated by the level of detail that is shown in the output; the important thing to note is the A record returned in the ANSWER section:

# dig www.linuxformat.com 

; <<>> DiG 9.4.0 <<>> www.linuxformat.com 
;; global options:  printcmd 
;; Got answer: 
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23236 
;; flags: qr rd ra; QUERY:1, ANSWER:2, AUTHORITY:2, ADDITIONAL:2 
;; QUESTION SECTION: 
;www.linuxformat.com.		IN	A

;; ANSWER SECTION:
www.linuxformat.com.	3600	IN	A	80.244.178.151

;; AUTHORITY SECTION:
linuxformat.com.	300	IN	NS	ns0.future.net.uk.
linuxformat.com.	300	IN	NS	ns1.future.net.uk.

;; ADDITIONAL SECTION:
ns0.future.net.uk.	104	IN	A	89.167.142.1
ns1.future.net.uk.	104	IN	A	89.167.143.1

;; Query time: 323 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Thu Mar 26 21:42:40 2009
;; MSG SIZE  rcvd: 134

If the DNS lookup fails, you need to distinguish a couple of cases: The first case is when DNS can't find the machine you're looking for. Here's an example of an attempt to look up a machine that simply doesn't exist:

# dig prophylactic.gov 

; <<>> DiG 9.4.0 <<>> prophylactic.gov 
;; global options:  printcmd 
;; Got answer: 
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 13168 
;; flags: qr rd ra; QUERY:1, ANSWER:0, AUTHORITY:1, ADDITIONAL:0 

;; QUESTION SECTION: 
;prophylactic.gov.           IN   A 

;; AUTHORITY SECTION: 
gov.                    2560 IN   SOA     a.gov.zoneedit.com. govcontact.zoneedit.com. 1183644065 3600 900 1814400 86400 

Notice the NXDOMAIN report for the status of the enquiry, and the absence of an ANSWER section like the one we saw in the previous lookup. Assuming you've entered a valid machine name, this kind of failure is somebody else's problem.
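When you only want the verdict, the status field can be pulled out of the HEADER line with sed. This is a small sketch that relies on the "status: WORD," format shown in the dig output above; the function name is our own.

```shell
# Print the status word (for example NOERROR or NXDOMAIN) from dig
# output read on standard input. Prints nothing if there is no HEADER
# line, as happens when no server could be reached.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}
```

For example, dig prophylactic.gov | dig_status. A NOERROR means the name resolved, an NXDOMAIN means DNS is working but the name doesn't exist, and no output at all suggests that dig never got an answer from any server.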

Can you find your DNS server?

The second case of DNS failure is the situation where your machine can't find a DNS server. This indicates a problem that is potentially closer to home:

# dig www.linuxformat.co.uk 

; <<>> DiG 9.4.0 <<>> www.linuxformat.co.uk 
;; global options:  printcmd 
;; connection timed out; no servers could be reached 

If this happens, take a look at the file /etc/resolv.conf. This is where Linux records its idea of where its DNS servers are. If you use DHCP to configure your networking, the IP addresses of your DNS servers are supplied by your DHCP server. If you have a static setup, you probably used a graphical network configuration tool such as Fedora's system-config-network to specify the location of your DNS servers. In either case, the results are written into this file. Is there a valid nameserver IP address in this file? Can you ping it directly?
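To answer those last two questions quickly, you can list the nameserver entries and ping each in turn. The sketch below just extracts the addresses; the function name is our own, and the optional file argument is there only so that you can point it at a copy of the file.

```shell
# Print the address from each "nameserver" line of a resolver
# configuration file (default: /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Something like for ns in $(list_nameservers); do ping -c1 "$ns"; done then tests each server. Bear in mind that some servers don't answer pings, so a failed ping is a hint rather than proof.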

If all else fails in your diagnostic attempts, try looking at the network traffic with wireshark, a packet trace utility previously known as ethereal. We do tend to view wireshark as a "last resort" diagnostic tool: not because of any weakness in the program (wireshark is actually a great piece of software) but because debugging network problems by examining the detailed packet traffic requires a thorough knowledge of TCP/IP and the overlying application protocols. Also (depending on the problem) you may need a "third party" machine on the network in order to observe the traffic.

Figure 4, below, shows a simple example: a wireshark capture of the packet traffic resulting from the command

# ping 192.168.0.42 

run on the machine with the IP address 192.168.0.3. Take a look at the upper of wireshark's three display panels; it shows a one-line summary for each packet captured. The middle and bottom panels let us drill down into the contents of the individual packets, but for our present purposes we don't need to go there.

Figure 4: this screengrab shows a simple example of a Wireshark capture of the packet traffic resulting from the command 'ping 192.168.0.42'

The message is clear and simple: the machine 192.168.0.3 is trying to use ARP to discover the MAC address of the machine it's trying to ping. It tries three times, at one-second intervals, but gets no reply.

So we can conclude that there's nothing wrong with 192.168.0.3 - it's able to get packets out onto the network with the correct source IP address - but that the machine 192.168.0.42 simply isn't there.
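For this particular kind of failure there is a lighter-weight cross-check than a packet capture, which we offer as an aside rather than as part of the method above: the kernel's neighbour (ARP) table, shown by ip neigh show, records whether address resolution succeeded for each host you have recently tried to reach. The sketch below prints the state recorded for one address; it assumes the state is the last word on the line, and the function name is our own.

```shell
# Read "ip neigh show" output on standard input and print the state
# (the last field, e.g. REACHABLE, STALE, INCOMPLETE or FAILED) recorded
# for the address given as the first argument.
neighbour_state() {
    awk -v ip="$1" '$1 == ip { print $NF }'
}
```

After a failed ping, ip neigh show | neighbour_state 192.168.0.42 would be expected to report INCOMPLETE or FAILED, telling the same story as the unanswered ARP requests in the capture.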

First published in Linux Format

Your comments

no proxy-related content :(

It's a good article, but unfortunately it doesn't cover proxy-related things at all. I'd like to see more information regarding how to diagnose proxy-related problems (especially with ISA server)

At the moment I've got a bunch of unresolved issues:
my linux box doesn't ping external ip & fqdn, it's not able to connect to external ftp & ssh servers, but the same things work perfectly on windoze...

It'll be really helpful to get an article covering these proxy-related issues...

GNUCASH .. the most VALUABLE network repair tool

Sounds about right these days ...

good article, but...

Don't you think it would be a little more efficient to start with a ping to a target system on the internet, followed by a ping to the router if the first one doesn't reply?

I appreciate the effort to show the different possibilities for troubleshooting, which does justify your approach. But I do think in a real life situation you would have to start by locating the problem. If you can ping a remote system, or your local router, you can skip the rest in between.

Useful, but doesn't cover USB network connection.

Well done for a very useful article.

On my home PC, I connect to my ADSL router, using a USB port. As a result, my network connection is known as USB0. This wouldn't have been picked-up by grepping dmesg for eth, but I guess my set-up is not that widely used.

As I said though, good article.

chris_debian

Some bullshit here

One question what do you do if you have:
iptables -P INPUT DROP
iptables -P OUTPUT DROP

and you do iptables -F ???
Answer end up blocked... Probably wanted to say:
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -F
iptables -X

also nat and mangle tables should be checked.

WiFi

In stage 1, you missed how to scan for WiFi cards. This is probably the most important as ethernet hardware tends to be detected a lot more reliably than WiFi hardware. Moreover, you should add some stuff about seeing if it's connected to the correct SSID, with the correct security, etc.

not noob friendly

doesn't specify WHERE to run these commands... in root? home? as superuser???

ifconfig command not found

dhclient command not found

this was written in '09 maybe needs updating and a little more clarity for noobs... ALSO might want to include info for those looking to setup WIFI

something more missing

What if I can assign an IP to the network device manually but cannot ping anything but 127.0.0.1? Dhclient doesn't get any offers, too.
Yes, the box is physically connected to the network.
