What To Do When Your Website Goes Down

Have you ever heard a colleague answer the phone like this: “Good afterno… Yes… What? Completely?… When did it go down?… Really, that long?… We’ll look into it right away… Yes, I understand… Of course… Okay, speak to you soon… Bye.” The call may have been followed by some cheesy ’80s rock ballad coming from the speaker phone, interrupted by “Thank you for holding. You are now caller number 126 in the queue.” That’s your boss calling the hosting company’s 24 hour “technical support” line.

An important website has gone down, and sooner or later, heads will turn to the Web development corner of the office, where you are sitting quietly, minding your own business, regretting that you ever mentioned “Linux” on your CV. You need to take action. Your company needs you. Your client needs you. Here’s what to do.

1. Check That It Has Actually Gone Down

Don’t take your client’s word for it. Visit the website yourself, and press Shift + Refresh to make sure you’re not seeing a cached version (hold down Shift while reloading or refreshing the page). If the website displays fine, then the problem is probably related to your client’s computer or broadband connection.

If it fails, then visit a robust website, such as google.com or bbc.co.uk. If they fail too, then there is at least an issue with your own broadband connection (or your broadband company’s DNS servers). Chances are that you and your client are located in the same building and the whole building has lost connectivity, or perhaps you have the same broadband company and its engineers have taken the day off. You will need to check the website on your mobile phone or phone a friend. To be doubly sure, ask your friend to check Where’s It Up? or Down for Everyone or Just Me?, which will confirm whether your website is down just for you or for everyone.

If the website is definitely down, then frown confusedly and keep reading. A soft yet audible sigh would also be appropriate. You might want to locate the documents or emails that your Internet hosting service sent you when you first signed up with it. They should contain useful details such as your IP address, control panel location, log-in details and admin and root passwords; these will come in handy.

2. Figure Out What Has Gone Down

A website can appear to have gone down mainly for one of the following reasons:

  • A programming error on the website,
  • A DNS problem, or an expired domain,
  • A networking problem,
  • Something on the server has crashed,
  • The whole server has crashed.

To see whether it’s a programming error, visit the website and check the status bar at the bottom of your browser. If it says “Done” or “Loaded,” rather than “Waiting…” or “Connecting…,” then the server and its software are performing correctly, but there is a programming error or misconfiguration. Check the Apache error log for clues.
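If your browser does not show a status bar, the HTTP status code can make the same distinction. The following is only a sketch, assuming curl is installed; classify_status and probe are made-up helper names, and the labels are just one way of grouping the codes. A 5xx code suggests that the server is alive but the website is broken, while curl’s 000 means that nothing answered at all.

```shell
# Group an HTTP status code into a rough diagnosis (labels are illustrative).
classify_status() {
    case "$1" in
        000)      echo "no-response" ;;   # curl prints 000 when it got no HTTP reply
        2??|3??)  echo "serving" ;;       # the Web server software is answering
        4??)      echo "client-error" ;;  # e.g. 404: wrong address or missing page
        5??)      echo "server-error" ;;  # programming error or misconfiguration
        *)        echo "unknown" ;;
    esac
}

# Fetch only the status code for a URL and classify it.
probe() {
    classify_status "$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1")"
}

# Usage (needs network access):
#   probe http://www.stockashop.co.uk/
```

A result of server-error points you towards the Apache error log; no-response means you should carry on with the commands below.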

Otherwise, you’ll need to run some commands to determine the cause. On a Mac with OS X, go to Applications → Utilities and run Terminal. On a PC with Windows, go to Start → All Programs → Accessories and choose “Command Prompt.” If you use Linux, you probably already know about the terminal; but just in case, on Ubuntu, it’s under Applications → Accessories.

The first command is ping, which sends a quick message to a server to check that it’s okay. Type the following, replacing the Web address with something meaningful to you, and press “Enter.” For all of the commands in this article, just type the stuff in the grey monospaced font. The preceding characters are the command prompt and are just there to let you know who and where you are.

C:\> ping www.stockashop.co.uk

If the server is alive and reachable, then the result will be something like this:

Reply from 92.52.106.33: bytes=32 time=12ms TTL=53

Ping command from a Windows computer.

On Windows, it will repeat four times, as above. On Linux and Mac, each line will start with 64 bytes from and it will repeat indefinitely, and you’ll need to press Control + C to stop it.

The four-part number in the example above is your server’s IP address. Every computer on the Internet has one. At this stage, you can double-check that it is the correct one. You’ll need to have a very good memory, or refer to the documentation that your hosting company sent you when you first signed up with it. (This article does not deal with the newish eight-part IPv6 addresses.)

For instance, my broadband company is sneaky and tries to intercept all bad requests so that it can advertise to me when I misspell a domain name in the Web browser. In this case, the ping looks successful but the IP address is wrong:

64 bytes from advancedsearch.virginmedia.com (81.200.64.50): icmp_seq=1 ttl=55 time=26.4 ms

Note that ping might also show the server name in front of the IP address (advancedsearch.virginmedia.com in this case). Don’t worry too much if it doesn’t match the website you are pinging — a server can have many names. The IP address is more important.
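If you would rather not compare IP addresses by eye, a small filter can pull the address out of ping’s output so that you can check it against your hosting documentation. This is a rough sketch; ip_from_ping is a made-up name, and it simply grabs the first thing that looks like a four-part address.

```shell
# Print the first IPv4-looking address found on standard input.
ip_from_ping() {
    grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' | head -n 1
}

# Usage on Linux or Mac (-c 1 sends a single ping; Windows uses -n 1 instead):
#   ping -c 1 www.stockashop.co.uk | ip_from_ping
```

If the address printed is not the one in your documentation, suspect the domain name or its DNS settings rather than the server.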

Assuming you’ve typed the domain name correctly, a bad IP address indicates that the domain name could have expired or that somebody has made a mistake with its DNS settings. If you receive something like unknown host, then it’s definitely a domain name issue:

ping: unknown host www.nosuchwebsite.fr

In this case, use a website such as Who.is to verify the domain registration details, or run the whois command from Linux or Mac. It will at least tell you when the domain expires (or expired), who owns it and where it is registered. The Linux and Mac commands host and nslookup are also useful for finding information about a domain. The nslookup command in particular has many different options for querying different aspects of a domain name:

paul@MyUbuntu:~$ whois stockashop.co.uk
paul@MyUbuntu:~$ host stockashop.co.uk
paul@MyUbuntu:~$ nslookup stockashop.co.uk
paul@MyUbuntu:~$ nslookup -type=soa stockashop.co.uk

If nothing happens when you ping, or you get something like request timed out, then you can deepen your frown and move on to step three.

What a non-responding server looks like in a Linux terminal.

Alternatively, if your server replied with the correct IP address, then you can exhale in relief and move on to step five.

Note that there are plenty of websites such as Network-Tools.com that allow you to ping websites. However, using the command line will impress your colleagues more, and it is good practice for the methods in the rest of this article.

3. How Bad Is It?

If your ping command has timed out, then chances are your whole server has crashed, or the network has broken down between you and the server.

If you enjoy grabbing at straws, then there is a small chance that the server is still alive and has blocked the ping request for security reasons — namely, to prevent hackers from finding out it exists. So, you can still proceed to the next step after running the commands below, but don’t hold your breath.

To find out if it is a networking issue, use traceroute on Mac or Linux and tracert on a PC, or use the trace option on a website such as Network-Tools.com. On Mac and Linux type:

paul@MyUbuntu:~$ traceroute www.stockashop.co.uk

On Windows:

C:\> tracert www.stockashop.co.uk

Traceroute traces a route across the Internet from your computer to your server, pinging each bit of networking equipment that it finds along the way. It should take 8 to 20 steps (technically known as “hops”) and then time out or show a few asterisks (*). The number of steps depends on how far away the server is and where the network has broken down.

The first couple of steps happen in your office or building (indicated by IP addresses starting with 192.168 or 10). The next few belong to your broadband provider or a big telecommunications company (you should be able to tell by the long name in front of the IP address). The last few belong to your hosting company. If your server is alive and well, then the very last step would be your server responding happily and healthily.

Traceroute on a Mac, through the broadband company and host to an unresponsive server.

Barring a major networking problem, like a city-wide power outage, traceroute will reach your hosting company. Now, you just need to determine whether only your server is ill or a whole rack or room has gone down.

You can’t tell this just from traceroute, but chances are the servers physically next to yours have similar IP addresses. So, you could vary the last number of your server’s IP address and check for any response. If your server’s IP address is 123.123.123.123, you could try:

C:\> ping 123.123.123.121
C:\> ping 123.123.123.122
C:\> ping 123.123.123.124
C:\> ping 123.123.123.125

If you discover that the server is in the middle of a range of 10 to 20 IP addresses that are all broken, then it could well indicate a wider networking issue deep within the air-conditioned, fireproof bunker that your server calls home. It is unlikely that the hosting company would leave so many IP addresses unused or that the addresses would have all crashed at the same time for different reasons. It is likely, though not definitive, that a whole rack or room has been disconnected or lost power… or burned down.

Alternatively, if nearby IP addresses do reply, then only your server is down. You can proceed to the next step anyway and hope that the cause is that your server is very secure and is blocking ping requests. Perhaps upgrade that deep frown to a pronounced grimace.
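The neighbour check described above can be sketched as a small loop on Linux or Mac. The neighbours helper is hypothetical, and it assumes that the servers next to yours really do sit on adjacent addresses, which your hosting company has not promised.

```shell
# List the addresses just below and just above the given IPv4 address,
# varying only the last number. The second argument is how far to go (default 2).
neighbours() {
    local ip="$1" span="${2:-2}"
    local prefix="${ip%.*}" last="${ip##*.}"
    local n=$((last - span))
    while [ "$n" -le $((last + span)) ]; do
        # Skip the server itself and anything outside 1..254
        if [ "$n" -ne "$last" ] && [ "$n" -ge 1 ] && [ "$n" -le 254 ]; then
            echo "$prefix.$n"
        fi
        n=$((n + 1))
    done
}

# Usage (needs network access; -c 1 sends one ping per address):
#   for addr in $(neighbours 123.123.123.123); do
#       if ping -c 1 "$addr" >/dev/null 2>&1; then
#           echo "$addr replies"
#       else
#           echo "$addr is silent"
#       fi
#   done
```

A long run of silent addresses hints at a rack or room problem; a mix of replies and silence points back at your own server.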

Otherwise, you’ll have to keep listening to Foreigner until your hosting company answers the phone. It is the only one that can fix the network and/or restart the server. But at least you now have someone else to blame. And if you are number 126 in the queue, it’s probably because 125 other companies think their websites have suddenly gone down, too.

4. Check Your Web Server Software

If the server is alive but just not serving up websites, then you can make one more check before logging onto the server. Just as your office computer has a lot of software for performing various tasks (Photoshop, Firefox, Mac Mail, Microsoft Excel, etc.), so does your server. Arguably its most important bit of software is the Web server, which is usually Apache on Linux servers and IIS on Windows servers. (From here on in, I will refer to it as “Web server software,” because “Web server” is sometimes used to refer — confusingly — to the entire server.)

When you visit a website, your Web browser communicates with the Web server software behind the scenes, sharing caching information, sending and receiving cookies, encrypting and decrypting, unzipping and generally managing your browsing experience.

You can bypass all of this and talk directly to the Web server software by using the telnet command, available on Windows, Linux and Mac. It will tell you conclusively whether your Web server software is alive. The command ends with the port, which is almost always 80:

paul@MyUbuntu:~$ telnet www.stockashop.co.uk 80

If all were well, then your Web server software would respond with a couple of lines indicating that it is connected and then wait for you to tell it what to do. Type something like this, followed by a blank line (that is, press “Enter” twice after the Host line):

GET / HTTP/1.1
Host: www.stockashop.co.uk

The first / tells it to get your home page; you could also say GET /products/index.html or something similar. The Host line tells it which website to return, because your server might hold many different websites. If your website were working, then your Web server software would reply with some headers (expiry, cookies, cache, content type, etc.) and then the HTML, like this:

Checking the web server software with telnet.

But because there is a problem, telnet will either not connect (indicating that your Web server software has crashed) or not respond (indicating that it is misconfigured). Either way, you’ll need to keep reading.
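If typing the request by hand into telnet feels fiddly, the same conversation can be scripted. This is a sketch only: build_request is a made-up helper, the Connection: close line is an addition so that the server hangs up after replying, and piping into nc assumes that netcat is installed on your computer.

```shell
# Print a minimal HTTP/1.1 request for the given host and path.
# HTTP wants each line to end with CR LF and the headers to end with an empty line.
build_request() {
    local host="$1" path="${2:-/}"
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

# Usage (needs network access and nc):
#   build_request www.stockashop.co.uk / | nc www.stockashop.co.uk 80
```

A healthy Web server replies with its headers and then the HTML, just as in the telnet session; silence or a refused connection means the same as it does with telnet.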

5. Logging Into Your Server

The remote investigations are now over, and it’s time to get up close and personal with your errant server.

First, check your server’s documentation to see whether the server has a control panel, such as Plesk or cPanel. If you’re lucky, it will still be working and will tell you what is wrong and offer to restart it for you (in Plesk, click Server → Service Management).

If your server has a control panel such as Plesk, try logging in to make sure the Web server is running.

If not, then the following commands apply to dedicated Linux servers. You could try them in shared hosting environments, but they probably won’t work. Windows servers are a different kettle of fish and won’t be addressed in this article.

To log in and run commands on the server, you will need the administrative user name and password and the root password, as provided by your host. For shared hosting environments, an FTP user name and password might work.

On Linux and Mac, the command to run is ssh, which stands for “secure shell” and which allows you to securely connect to and run commands on your server. You will need to add your administrative user name to the command after -l, which stands for “login”:

paul@MyUbuntu:~$ ssh -l admin www.stockashop.co.uk

Windows doesn’t come with ssh, but you can easily download a Windows SSH client such as PuTTY. Download putty.exe, save it somewhere and run it. Type your website as the host name and click “Open.” It will ask you who to log in as and then ask for your password.

Using Putty to SSH from a Windows computer.

Once you have successfully logged in, you should see something like admin@server$, followed by a flashing or solid cursor. This is the Linux command line, very similar to the Terminal or command prompt used above, except now you are actually on the server; you are a virtual you, floating around in the hard drive of your troubled server.

If ssh didn’t even connect, then it might be blocked by a firewall or turned off on the server. If it said Permission denied, then you’ve probably mistyped the user name or password. If it immediately said Connection to www.stockashop.co.uk closed, then you are trying to log in with a user name that is not allowed to run commands; make sure you’re logging in as the administrative user and not an FTP user.

6. Has It Run Out Of Space?

Your server has likely not run out of hard disk space, but I’m putting this first because it’s a fairly easy problem to deal with. The command is df, but you can add -h to show the results in megabytes and gigabytes. Type this on the command line:

admin@server$ df -h

The results will list each file system (i.e. hard drive or partition) and show the percentage of each that has been used.

Checking hard disk usage on a Linux server.

If any of them show 100% usage, then the command probably took eons to type, and you will need to free up some space fast.
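Rather than scanning the percentages by eye, you could filter the list for anything that is nearly full. The helper below is a sketch with a made-up name (full_filesystems); it reads the output of df -P, the portable one-line-per-file-system format, and assumes that mount points contain no spaces.

```shell
# Read `df -P` output on standard input and print the mount point and usage
# of every file system at or above the given percentage (default 90).
full_filesystems() {
    local threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5                  # fifth column is the capacity, e.g. 97%
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, use "%"
        }'
}

# Usage on the server:
#   df -P | full_filesystems 90
```

No output means nothing has reached the threshold, and you can move on to memory.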

Quick Fix

You should still be able to FTP to the server and remove massive files that way. A good place to start is the log files and any back-up directories you have.

You could also try running the find command to search for and remove huge files. This command finds files bigger than 10 MB and lets you scroll through the results one page at a time. You might need to run it as root to avoid a lot of permission denied messages (see below for how to do this). It might also take a long time to run.

root@server# find / -size +10000000c | more

You could also restrict the search to the full partition or to just your websites, if you know where they are:

root@server# find /var/www/vhosts/ -size +10000000c | more

If you want to know just how big those files are, you can add a formatting sequence to the command:

root@server# find /var/www/vhosts -size +10000000c -printf "%15s %p\n"
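To see the worst offenders first, the sizes can be sorted. The sketch below wraps this in a hypothetical helper called biggest_files; like the command above, it relies on -printf, which is a feature of GNU find (the usual find on Linux servers).

```shell
# List the largest regular files under a directory, biggest first.
#   $1 = directory, $2 = minimum size in KB (default 10240, roughly 10 MB),
#   $3 = how many results to show (default 10)
biggest_files() {
    local dir="$1" min_kb="${2:-10240}" count="${3:-10}"
    find "$dir" -type f -size +"${min_kb}k" -printf '%s %p\n' 2>/dev/null |
        sort -rn |
        head -n "$count"
}

# Usage on the server:
#   biggest_files /var/www/vhosts
```

Each line shows the size in bytes followed by the path. Permission errors are thrown away here, so run it as root if you want the full picture.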

When you’ve found an unnecessarily big file, you can remove it with rm:

root@server# rm /var/www/vhosts/badwebsite.com/backups/really-big-and-old-backup-file.tgz

Permanent Fix

Clearing out back-ups, old websites and log files will free up a lot of space. You should also identify any scripts and programs that are creating large back-up files. You could ask your host for another hard drive.

7. Has It Run Out Of Memory?

Your server might just be running really, really slowly. The free command will let you know how much memory it is using. Add -m to show the results in megabytes.

admin@server$ free -m

The results will show how much of your memory is in use.

Checking memory usage on a Linux server.

The results above say that the server has 3550 MB, or 3.5 GB, of total memory. Linux likes to use as much as possible, so the 67 MB free is not a problem. Focus on the buffers/cache line instead. If most of this is used, then your server may have run out of workable memory, especially if the swap space (a bit of the hard drive that the server uses for extra memory) is full, too.
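The swap check can also be reduced to a single number. This is a sketch with a made-up helper name (swap_used_percent); it assumes only that the output of free contains a line starting with Swap: followed by the total and used figures.

```shell
# Read `free -m` output on standard input and print the percentage of
# swap space in use (0 if the server has no swap at all).
swap_used_percent() {
    awk '/^Swap:/ {
        if ($2 > 0) printf "%d\n", ($3 * 100) / $2
        else print 0
    }'
}

# Usage on the server:
#   free -m | swap_used_percent
```

A figure creeping towards 100 alongside a nearly full buffers/cache line is a strong hint that memory is the problem.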

If your server has run out of memory, then the top command will identify which bit of software is being greedy.

admin@server$ top

Every few seconds, this gives a snapshot of which bits of software are running, which user started them and how much of your memory and CPU each is using. Unfortunately, this will run very slowly if memory is low. You can press “Q” or Control + C to exit the command.

The Linux top command shows what is running.

Each of the bits of software above is known as a “process.” Big pieces of software such as Apache and MySQL will often have a parent process with a lot of child processes and so could appear more than once in the list. In this benign example, a child process of the Apache Web server is currently the greediest software, using 7.6% of the CPU and 1.6% of the memory. The view will refresh every three seconds. Check the %MEM column to see whether anything is consistently eating up a large portion of the memory.

Quick Fix

The quickest solution is to kill the memory hog. You will need to be root to do this (unless the process is owned by you — see below). First of all, though, search on Google to find out what exactly you are about to kill. If you kill a core program (such as the SSH server), you’ll be back to telephone support. If you kill your biggest client’s data amalgamation program, which has been running for four days and is just about to finish, then the client could get annoyed, despite your effort to sweeten it with “But your website is okay now!”

If the culprit is HTTPD or Apache or MySQLd, then skip to the next section, because those can be restarted more gracefully. In fact, most things can be restarted more gracefully, but this is a quick ignore-the-consequences type of fix.

Find the process ID in the PID column of the command above, and type kill -9, followed by the number. For example:

root@server# kill -9 23421

The -9 tells it to stop completely and absolutely. You can now run top again to see whether it has made a difference. If some other similar process has jumped to the memory-eating position instead, then you’ve probably only stopped a child process, and you will need to find the parent process that spawned all the greedy children in the first place, because stopping the parent will stop all the children, too. Use the process ID again in this command:

root@server# ps -o ppid,user,command 23421

This asks Linux to show you the parent process ID, user and command for the process number 23421. The results will look like this:

PPID  USER     COMMAND
31701 apache   /usr/sbin/httpd

The PPID is the parent process ID. Now try killing this one:

root@server# kill -9 31701

Run top again. Hopefully, the memory usage has now returned to normal. If the parent process ID was 0, then some other process entirely is consuming memory, so run top again.
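If you have a few seconds to spare, a slightly kinder variation is to ask the process to stop before forcing it. The sketch below is not part of the procedure above; stop_gently is a made-up name. It sends the default TERM signal, waits a few seconds, and falls back to kill -9 only if the process is still there.

```shell
# Ask a process to exit, then force it if it has not gone after a grace period.
#   $1 = process ID, $2 = seconds to wait before using -9 (default 5)
stop_gently() {
    local pid="$1" grace="${2:-5}" waited=0
    # TERM is the polite request; fail if the PID is missing or not ours to signal
    kill -TERM "$pid" 2>/dev/null || return 1
    while [ "$waited" -lt "$grace" ]; do
        # kill -0 sends nothing; it only checks that the PID still exists
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    # Still there after the grace period: stop it completely and absolutely
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Usage as root on the server:
#   stop_gently 23421
```

The same warnings apply as for kill -9: find out what the process is before you stop it.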

Permanent Fix

You will probably have to restart the offending software at some point because you may have just disabled your server’s SPAM filter or something else important. If the problem was with Apache or MySQL, you might have an errant bit of memory-eating programming somewhere, or Apache, MySQL or PHP might have non-optimal memory limits. There’s a slim chance that you have been hacked and that your server is slow because it’s sending out millions of emails. Sometimes, though, a server has reached capacity and simply needs more RAM to deal with the afternoon rush.

To find out what went wrong in the first place, check the web logs and/or the log files in /var/log/. When your hosting company has finally answered the phone, you can ask it to also take a look. Figuring out what happened is important because it could well happen again, especially if it’s a security issue. If the hosting company is not responsive or convincing enough, seek other help.

8. Has Something Crashed?

Most Linux servers use Apache for the Web server software and MySQL for the database. It is easy to see whether these are still running (and to restart them if they’re not) or are using up way too much memory. To see all processes running on your server right now, run this command:

admin@server$ ps aux | more

Scroll through the list and look for signs of apache (or its older name httpd) and mysqld (the “d” stands for daemon and is related to the way the programs are run). You are looking for something like this:

USER       PID %CPU %MEM   VSZ   RSS TTY     STAT START   TIME COMMAND
apache   29495  0.5  1.4 90972 53772 ?       S    14:00   0:02 /usr/sbin/httpd
apache   29683  0.3  1.4 88644 52420 ?       S    14:03   0:00 /usr/sbin/httpd
apache   29737  0.3  1.4 88640 52520 ?       S    14:04   0:00 /usr/sbin/httpd

Or you can use the grep command to filter results:

admin@server$ ps aux | grep http
admin@server$ ps aux | grep mysql

If either Apache or MySQL is not running, then this is the source of the problem.

This listing shows that Apache is indeed running.

Quick Fix

If Apache or MySQL is not running, then you’ll need to run the commands below as root (see below). Linux usually has a set of scripts for stopping and starting its major bits of software. You first need to find these scripts. Use the ls command to check the couple of places where these scripts usually are:

root@server# ls /etc/init.d/

If the results include a lot of impressive-looking words like crond, httpd, mailman, mysqld and xinetd, then you’ve found the place. If not, try somewhere else:

root@server# ls /etc/rc.d/init.d/

Or use find to look for them:

root@server# find /etc -name mysqld

Once it is located, you can run a command to restart the software. Note that the scripts might have slightly different names, like apache, apache2 or mysql.

root@server# /etc/init.d/httpd restart
root@server# /etc/init.d/mysqld restart

Hopefully, it will say something like Stopping… Starting… Started. Your websites will start behaving normally again!
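The hunt for the right script can be sketched as a loop over the usual directories and names. Everything here is illustrative: find_init_script and the INIT_DIRS variable are made-up names, and the sketch does not cover newer distributions that manage services with systemd instead of these scripts.

```shell
# Directories to search, separated by spaces (override INIT_DIRS to change them).
INIT_DIRS="${INIT_DIRS:-/etc/init.d /etc/rc.d/init.d}"

# Print the path of the first executable script matching any of the given names.
find_init_script() {
    local dir name
    for dir in $INIT_DIRS; do
        for name in "$@"; do
            if [ -x "$dir/$name" ]; then
                echo "$dir/$name"
                return 0
            fi
        done
    done
    return 1   # nothing found in any of the directories
}

# Usage as root on the server:
#   script=$(find_init_script httpd apache apache2) && "$script" restart
#   script=$(find_init_script mysqld mysql) && "$script" restart
```

If nothing is found, fall back to the find command shown earlier in this step.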

Permanent Fix

As above, check the log files, especially the Apache error logs. Sometimes these are all in one place, but usually each website on the server has its own error log. You could look through the ones that were busiest around the time of the crash. The cause could be a misconfiguration, a programming bug or a security breach, so the crash could well happen again until you identify and address it.
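Error logs can be long, so it helps to filter them down to the serious entries. The helper below is a sketch with a made-up name (recent_errors); it assumes Apache’s usual habit of writing the severity in square brackets, such as [error] or [core:error], and the log paths in the usage comment are common defaults that may not match your server.

```shell
# Print the last few error-level (or worse) lines from an Apache error log.
#   $1 = log file, $2 = how many lines to show (default 20)
recent_errors() {
    local file="$1" count="${2:-20}"
    grep -E '\[([a-z0-9_]+:)?(error|crit|alert|emerg)\]' "$file" | tail -n "$count"
}

# Usage on the server (the path varies between distributions and hosts):
#   recent_errors /var/log/httpd/error_log
#   recent_errors /var/log/apache2/error.log 50
```

Compare the timestamps of the last few entries with the time the website went down.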

Becoming a Super-User

Most of the fixes above require special permissions. For example, you (i.e. the user you have logged in as) will be able to kill or restart processes only if you started them. This can happen on shared servers but is unlikely on dedicated servers, where you will see a lot of permission denied messages. So, to run those commands, you will need to become the server’s super-user, usually known as “root.” I’ve left this for last because it’s dangerous. You can do a lot of irreversible damage as root. Please don’t remove or restart anything unless you’re sure about it, and don’t leave your computer unattended.

There are two ways to run a command as root. You can prefix each command with sudo, or you can become root once and for all by typing su. Different servers place different restrictions on these commands, but one of them should work. The sudo command can be the more restrictive of the two: depending on how the server is configured, it may make you a lesser, non-root super-user who is able to run some commands but not others. Both commands will ask for an extra password. For example:

admin@server$ sudo /etc/init.d/httpd restart

When you run su successfully, the prompt will change from a $ to a #, like this:

admin@server$ su
Password:
admin@server#

It might say admin@server or root@server. Either way, the # means that you are powerful and dangerous — and that you assume full liability for your actions.

Conclusion

This article has provided a few tips for recognizing and solving some of the most common causes of a website going down. The commands require some technical knowledge — or at least courage — but are hopefully not too daunting. However, they cover only a small subset of all the things that can go wrong with a website. You will have to rely on your hosting company if it is a networking issue, hardware malfunction or more complicated software problem.

Personally, I don’t mind the ’80s music that plays while I’m on hold with my hosting company. It’s better than complete silence or a marketing message. But it would be even better if the support rep picked up the phone within a few seconds and was ready to help. That is ultimately the difference between paying $40 per month for a dedicated server versus $400.

When the dust has settled, this might be a conversation worth having with your boss — the one still sitting glumly by the phone, eyeing your frown, and waiting for Bono to stop warbling.

(al)

Paul Tero is an experienced PHP programmer and server administrator. He developed the Stockashop ecommerce system in 2005 for Sensable Media. He now works part-time maintaining and developing Stockashop, and the rest of the time freelancing from a corner of his living room, and sleeping, eating, having fun, etc. He has also written numerous other open sourcish scripts and programs.

  1. 1

    Useful article… Thank you for taking the time to write it. I have bookmarked it for future reference.

    9 times out of 10, when my client’s sites go down, it’s an issue caused by their hosting provider, and yet they still automatically assume it’s my fault!

    19
  2. 2

    Nice post now lots of people are anxious about the DDoS attacks regarding Wikileaks. :-) How current.
    It’s always great to have SSH access.
    (I prefer using dig instead of NSLookup)

    0
  3. 3

    Nice Article :D

    0
  4. 4

    Very thorough. And bookmarked!

    3
  5. 5

    I use Nagios from a remote server to ping my production servers every 90 sec. I have used one of the pinging services, but Nagios gives much more control. More importantly, I pay the premium for managed servers at Singlehop, so I get 24/7 surveillance; after 10 minutes they jump in and start diagnosing the issue. Peace of mind worth the cost.

    I would like to see an article on ways to create a mirror server with a some sort of failover or switchover. If the hard drive crashes, it could take hours to get all the software back (much of it must be installed and not just copied over from the backups) and all the data backups moved over.

    All I know is that everytime I get a ping error notification, my heart stopped. 99.9% of the the time it’s a temporary connection issue that goes away, but one never knows.

    1
  6. 6

    Hehe thanks,

    You know the past few months things have happened on my website and changes had to be made. Everytime I check this website strangely enough everytime an article appears on subjects I actually need.

    What a coincidence :D

    I’ve been a visitor for almost a year now, and I want to thank you guys for these wonderful articles.

    Regards

    1
  7. 7

    Nice article, but I think that everyone that has only one site big enough already knows how to do all of the above, or more.

    It’s funny, there’s no SYNFLOOD or DDOS information, how to track such attacks, how to handle them and so on. That would be helpful!

    0
  8. 8

    This is a very useful article for anyone who deals with or helps diagnose problems for clients. Anything you can do to help the account folks solve server problems is certainly appreciated, especially if the client’s income is being affected. Being able to check and see if a server is responding at all, and if it is being able to provide extra feedback when dealing with the host support can speed up restoration of service significantly. Thanks for this.

    1
  9. 9

    I agree with Bogdan (sta ima) if you own dedicated server you better know all of that above, but I think it’s a good tutorial for web owners that are not that much into tech like Gawker :) . Leaving clear txt passwords on server, who does that?

    Anyway, regarding the $400 vs $40 I never pay more than $100 for dedicated server and my site gets over 600K pageviews. How?

    Use Cloud Computing… Amazon, Rackspace….

    Dedicated servers are just waist of money, why would I spend $800-$900 in my case, to have half managed server with some pretty console? I rather get book and learn about web servers and security than buying half managed dedicated servers on any host.

    On cloud you get same thing, and it’s 50% cheaper.

    0
  10. 10

    I would add, “Check Twitter”. A lot of hosts nowadays have a feed which publishes the statuses of their servers. It could save you some trouble during the early stages at least.

    0
  11. 11

    To be honest, this is one of the more useless articles I have read on Smashing. No offence Paul, because I understand where you were trying to go with this. I personally manage my own servers and I know firsthand that when a client’s site goes down it is one of the most terrifying experiences as a developer. But you missed the mark. There are just too many scenarios, end of story. Diving into the ethics of what to do, or possibly applying some business theories to this would have made for a better article.

    -5
  12. 12

    Awesome article – I really love when Smashing does “slightly” more nitty gritty articles like this one. Still have my fingers crossed for an article on best practices for automating your webserver deployment!

    5
  13. 13

    You missed out a couple of incredibly important steps:

    1) Don’t Panic
    2) Ensure you have a towel on standby

    6
  14. 14

    Great article, thanks. I will be trying out some of your suggestions.

    Since this type of thing has happened to me quite often I have my own quick checklist that you might want to add:

    1. See if all of your websites are down on that server or just the one.
    2. Check to see if your host’s website is down as well! (if it is you’re f*$%d)
    3. I think you mentioned it but see where/if you can login, ftp, ssh, web, etc.
    4. Don’t freak out too bad, especially on the support people at the hosting company.
    5. BACKUP EVERYTHING

    One of our hosts had a major outage where a fire marshal flipped the wrong switch during a test and sent a spray of water down on a roomful of active servers including mine. They were all destroyed. Luckily I pay for nightly backups but it was a nightmare getting everything up and running again.

    A couple of years ago it was a huge fire that burned up a roomful a servers including mine.
    If you have a website it will happen to you sooner or later so backup your sites locally.

    Regards,
    Mark Lewis
    Partners In Rhyme Inc
    partnersinrhyme.com

    0
  15. 15

    I just usually cry and avoid the phone until it comes back ;)

    10
  16. 16

    @James: nice one :-)

    The last time I had a major issue was when the hosting supplier had a major power issue, and there was nothing I could do about it. Fortunately, I can RDC into my servers and do everything there; when I can’t get in, then panic does set in. I’d second an article for mirror servers – I’d like to get mine to mirror each other.

    1
  17. 17

    Rizqi Djamaluddin

    December 13, 2010 7:10 am

    I think this is a really good starting point (though perhaps a tad too technical, but that can’t hurt either). Even developers who never touch their client’s (maybe self-hosted) site should have some knowledge of troubleshooting websites that went down — if only “ah, it’s from your host, try calling them?”, it’s better than saying no outright.

    A suggestion for the author: Maybe more references in each scenario and stage could be useful, for those of us running different setups or when the first aid situations don’t help!

    Overall, still very useful and worth a read.

    0
  18. 18

    Check out http://UptimeMonitor.net if you want an easy way to know when your site goes down

    1
  19. 19

    @breadwild
    If you’re very concerned about having your site online 100% of the time there is a solution that can do this with a dedicated server. It’s basically a system that magically resumes service after a few seconds if Server A goes down. All the files are synced between 2 physical systems on the fly. It will be a bit more expensive than a non-failover system but you will sleep better. It also doesn’t protect against everything, such as network outages. It will protect against system A going down for any reason, such as a kernel reboot, hardware maintenance or general crash.

    I’m actually in the works of setting up systems like this for some of my clients. Feel free to give me a shout on twitter. @leggettsteven

    Steve

    0
  20. 20

    Thanks for all the great comments – and I’m glad most people found the article useful. Coincidentally, one of the servers I maintain became unresponsive on Monday, and I logged in and ran “top” and found that MySQL was using all the memory. So although this article does only brush the surface of what can go wrong with a server, it will help in some cases. (Actually, not that much of a coincidence as I caused the problem in the first place – but not on purpose – and I was pleased to be able to take my own advice.) And although anybody who maintains a server should already know this stuff, I learned it through frantic research and practice with the phone ringing in the background. Sorry I forgot to mention towels.

    4
  21. 21

    Thanks Paul, this is a useful article. It is always worth investing in some site monitoring software like Are My Sites Up (http://aremysitesup.com/). This can email and SMS you to tell you when one of your sites goes down.

    This way you should know that one of your sites isn’t working and possibly fix it before your client notices.

    0
  22. 22

    Nice article Paul. I would add that if the server is down and you’re using any PPC campaigns, it would be good to pause them till the server recovers. You don’t want to pay for a terrible user experience.

    0
  23. 23

    Had a downed server today, or at least it looked that way…
    - couldn’t access the websites
    - could access FTP, but couldn’t upload anything (read-only)
    - could access SSH, but no write access (even as root)
    - and everything I did gave me an error (BUS error)

    And even though I have some knowledge about linux and hardware, I had no idea how to fix this (or even check out what was wrong).

    So, all you need to do when *it* hits the fan: call the support center of your server (even dedicated servers have support) and let them handle the problem. Don’t try to “fix” it yourself if you don’t know what you’re doing.

    Conclusion: Always have support numbers ready, just in case.

    PS the problem was a defective HD in the RAID array, so no way to fix this from home/office.

    1
  24. 24

    Fantastic article. Thank you

    2
  25. 25

    I don’t quite understand the point of this article. If it’s targeted at server administrators, they should be replaced instantly if they don’t know these basic tasks and tools. If it’s targeted at designers, developers and site maintainers who don’t have advanced server admin skills (like me), then usually your host, whoever it is, including shared hosting companies, has a support department that takes over if it’s in fact down for everyone. I can’t imagine that too many people who are savvy enough to run their OWN server don’t know these things, and companies should have an IT team or person in charge of this stuff.

    3
  26. 26

    Looks like there isn’t even a mention of services like “Are My Sites Up?”, that email/text you within a few minutes of your site actually going down.

    So: Get a service (or make your own) to check on your sites regularly, so you can find out early when they’re down.
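
    A minimal sketch of the “make your own” option, assuming curl is installed. The function names `is_up_status` and `check_site` are made up for illustration, and treating any 2xx or 3xx response as “up” is an assumption you may want to tighten.

    ```shell
    # Hypothetical do-it-yourself check; assumes curl is installed.
    # is_up_status CODE: succeeds for 2xx and 3xx HTTP status codes.
    is_up_status() {
        case $1 in
            2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
            *) return 1 ;;
        esac
    }

    # check_site URL: prints "UP" or "DOWN" with the status code curl saw.
    # curl reports 000 when it could not get any HTTP response at all.
    check_site() {
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
        if is_up_status "$code"; then
            echo "UP $code $1"
        else
            echo "DOWN $code $1"
        fi
    }
    ```

    Run something like `check_site http://www.example.com/` from cron every few minutes and mail yourself any line that starts with DOWN. Run it from a machine other than the web server, or it goes down together with the site.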

    0
  27. 27

    Great post! It contains exactly everything I need to control my websites.
    Regards!

    2
  28. 28

    Great article: I like the way it is written with a gentle sense of humor and it covers all the basics. It’s nice for people to have a clear plan of action to be able to follow.

    1
  29. 29

    Houser – Don’t be a douche!

    If you bother to read the other comments, you can easily see that this article is indeed both useful and appreciated. Next time try to make suggestions that are intended to be constructive criticism from a fellow developer, instead of extolling your elevated thinking process as an arrogant elitist that is above such mundane topics. Basically, try to be helpful without having to show everyone how smart you are.

    1
  30. 30

    I totally disagree on step 1’s hitting shift+F5 to get the uncached version of your site. If that is indeed the solution to your problem, then only YOU have solved the problem. Your USERS may still be getting the cached version. Don’t expect the average user to clear their cache when something goes wrong. They’d rather just assume your site is broken and leave.

    The correct way to serve uncached files is to version all your images, JavaScript, and CSS whenever releasing a new version of your site. For example: style_20101213.css and sprite_20101213.png. If you edit the CSS that manipulates your sprite PNG without versioning, your site will appear broken to your users. You may not realize it because during development, you’ve been constantly hitting F5 to refresh your cache. Don’t expect the average user to do the same. They usually get to your site through a bookmark or link, not through refreshing, and thus will get the cached version from their hard drive.
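
    A minimal sketch of that date-stamped naming scheme in shell. The helper name `versioned_name` is hypothetical, and it assumes the filename has an extension. It only builds the new name, so copying the file and updating the references in your HTML is still up to you.

    ```shell
    # Hypothetical helper: build a date-stamped filename such as style_20101213.css.
    # Usage: versioned_name FILE [STAMP]   (STAMP defaults to today's date, YYYYMMDD)
    # Assumes FILE contains a dot followed by an extension.
    versioned_name() {
        file=$1
        stamp=${2:-$(date +%Y%m%d)}
        base=${file%.*}      # everything before the last dot
        ext=${file##*.}      # everything after the last dot
        printf '%s_%s.%s\n' "$base" "$stamp" "$ext"
    }

    versioned_name style.css 20101213     # prints style_20101213.css
    versioned_name sprite.png 20101213    # prints sprite_20101213.png
    ```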

    -3
  31. 31
  32. 32

    This article sends the wrong message.

    It should read:

    If you don’t know how to troubleshoot a webserver,
    you should get *managed* hosting.

    0
  33. 33

    And of course, make sure you’re using our product, Are My Sites Up ( http://aremysitesup.com ) and you’ll know exactly when/if your site goes down, so you can hold your hosting company (or yourself) responsible! :)

    0
  34. 34

    Great article, thanks for sharing this with us.

    I have one point though: I am not sure how genuine “Down for Everyone or Just Me?” is, as I checked a perfectly fine website (www.univertal.com) and their result was that it was “also” down for them. With, of course, an advertisement that you should switch to their hosting partner.

    0
  35. 35

    the site will get back up anyway, so why stress yourself?

    -2
  36. 36

    I think you missed something while “reading” this post. The author never said refreshing the cache was the solution. He was using it as a way to determine the problem. When troubleshooting, it is always important that you are seeing the current state of the issue. If you load the site and it looks correct, but the client says it’s down, you should most definitely refresh your cache to make sure the server is indeed serving up the site and you’re not just viewing the cached state.

    Once you have determined whether or not that is the problem, then you can address it as needed. (including versioning.)

    please read and understand the post before adding your ‘informed’ opinions.

    2
  37. 37

    unless the problem was caused by you or your client. Or if for some reason your hosting company is unaware of your specific issue, especially if it is only affecting your machine or your account. Or programming errors. Or TOS violation. Or if you went over a usage quota. Or if your domain didn’t get renewed…. etc. etc. etc.

    there are a multitude of things that could happen that would cause you to take action first or the site will not get back up. It’s your responsibility to take action, as a reputable business person and out of respect for your clients.

    1
  38. 38

    Rizqi Djamaluddin

    December 13, 2010 7:05 pm

    All of this, plus it’s simply about showing a good business impression. Even if we have no idea where it’s hosted, who manages it, or whatever, clients will appreciate it when we know a bit of what’s going on.

    It’s like passing by your dentist friend and asking what medicine to take for his headache. It’d be much more tactful for him to give an offhand suggestion, as a friendly person, rather than shrug and say “not my thing, go ask a doctor or wait it out.”

    It’s generally just showing that you have the technical merits, even if it’s not your responsibility. (Of course, you shouldn’t spend too long working on it; just quick troubleshooting to get a handle on the problem will do.)

    1
  39. 39

    Nice article Paul. Great work. Details are explained very nicely.
    Please don’t pay too much for hosting if there is no need for it.

    1
  40. 40

    nice read, i may be in need of an admin from time to time , paul where would i find your contact info ?

    1
  41. 41

    @ORyan Indeed, he’s just one of those people who jump in recklessly to smash people’s articles but end up humiliating themselves…

    0
  42. 42

    Thanks for this article. Any advice if a site has been hacked?

    0
  43. 43

    Thanks – There’s a link to the company I work for at the end of the article.

    0
  44. 44

    I think it’s good for all developers to have a few basic skills, so they can give their clients a bit of information – as several people have said in the comments. But also, many small web companies may end up with a client with a dedicated server – if the client is taken over from a previous company, or the client insists – and the hosting company’s phone support department may not be quick enough. And hiring somebody with those server administration skills may not be feasible. I worked for a company with a dedicated server from 1and1, which was fine for a couple of years, but one time it went down and stayed down for over a day, and their support number had long queues and wasn’t helpful. I needed to figure out what to do myself. After that we switched to Rackspace. Thanks for the comment, Paul

    1
  45. 45

    I’m not sure – I don’t know much about the security side. I suppose you should figure out the type of hack that was used, remove it completely, and make sure the server is patched so it won’t happen again. Searching the server/website for all files modified around the time of the hack, and looking in log files, is a good place to start.

    0
  46. 46

    Good idea. Some hosting companies also provide this service for you, and will follow a set of instructions you have provided, including some minimal fixing, and then ring you.

    0
  47. 47

    i found this website useful for checking if the problem was a local network issue – if the site is down for everyone or just you : http://downforeveryoneorjustme.com

    1
  48. 48

    “regretting that you ever mentioned “Linux” on your CV.”

    Haha I made this mistake. But thank god Smashing Magazine was there to the rescue. Thanks Paul!

    1
  49. 49

    Thanks Paul! I think I have to get some reading material about this!

    1
  50. 50

    A few helpful / important thingies…

    To find out what’s taking so much space use the “du -sh *” command in a folder; it will list the disk usage of the sub folders (note: this is sometimes a really slow task if you have a lot of small files). If you don’t have really busy sites with uploading enabled, the first place to check is the /var/log folder.

    When you stop running processes, try “kill 12345” first, and the process will quit nicely if it responds; otherwise use the mentioned “kill -9 12345”.
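
    The “ask nicely first, force only if needed” advice can be sketched as a small shell function. The name `stop_process` and the 10-second default grace period are assumptions for illustration, not something from the article.

    ```shell
    # Hypothetical helper: ask a process to quit, and force it only if it is
    # still running after a grace period.
    # Usage: stop_process PID [SECONDS]   (grace period defaults to 10 seconds)
    stop_process() {
        pid=$1
        grace=${2:-10}
        # Plain kill sends SIGTERM, which a process can catch to clean up.
        kill "$pid" 2>/dev/null || return 0          # already gone, or not ours to signal
        while [ "$grace" -gt 0 ]; do
            kill -0 "$pid" 2>/dev/null || return 0   # signal 0 only tests for existence
            sleep 1
            grace=$((grace - 1))
        done
        # Last resort: SIGKILL cannot be caught or ignored.
        kill -9 "$pid" 2>/dev/null || true
    }
    ```

    For example, `stop_process 12345 30` gives process 12345 half a minute to exit cleanly before it is killed outright.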

    There are a few ways to become a super user, as mentioned in the article. On Ubuntu use “sudo -i”. On other, not sudo-oriented, systems use “su -l” (“true” login as root user).

    After becoming root you can change to other users using su again, for example “su otherusername” or “su -l otherusername” (the difference with using -l parameter is that the environment variables are set correctly, sometimes it _might_ cause trouble if you are not using -l).

    I have played with servers for a while and found these commands / practices really useful.

    Cheers!

    0
  51. 51

    To mitigate the network issue, we created a mirror on a European server. But even that doesn’t work if WWIII breaks out.

    0
  52. 52

    What to do when your website goes down (and is on a shared reseller) – go for a beer.. there is not much else to do :)

    0
  53. 53

    Mehmet Orkun Alabaz

    December 15, 2010 7:57 am

    Also, for bigger projects I think having multiple hosts (mirrors) and having an A record switch from your DNS provider is great.

    0
  54. 54

    Fantastic! I had an issue with a Joomla! website crashing a server and a week later this post comes out providing a great guideline. I was able to find a PHP fatal error among other things that really helped.

    Thanks!

    0
  55. 55

    maybe useful:
    Normally you don’t need a lot of changes in MySQL’s memory settings for common CMS and blog systems.

    I like to work with tools like tuning-primer – it gives you recommendations for your MySQL settings. But don’t just do everything some script tells you – always think twice before making any changes to any settings.
    This example excerpt of the tuning-primer’s output might have helped you avoid getting into trouble with wrong MySQL memory settings.
    . . .
    MEMORY USAGE
    Max Memory Ever Allocated : 425 M
    Configured Max Per-thread Buffers : 1 G
    Configured Max Global Buffers : 338 M
    Configured Max Memory Limit : 1 G
    Physical Memory : 1.97 G
    Max memory limit seem to be within acceptable norms
    . . .
    it shows the amount of memory that mysql would need if all connections would be in use with the current settings.

    0
  56. 56

    It is always good advice NOT to take the < 5$ / month unlimited bandwidth plan if you do SERIOUS business.
    Websites running on such plans normally have a much slower response, on top of the poor support, because they share the same hardware with thousands of other web domains.

    0
  57. 57

    It is also a good idea to check the server’s log files. Keep in mind that if your server runs out of disk space, you won’t be able to log in with ssh.
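
    A small sketch for catching a full disk before it locks you out. It assumes a POSIX `df`; the function name `disk_usage_percent` and the 90% threshold are illustrative choices, not from the article.

    ```shell
    # Hypothetical helper: print how full the filesystem holding PATH is, as a bare number.
    disk_usage_percent() {
        # -P forces the one-line-per-filesystem POSIX layout; column 5 is "Capacity", e.g. 42%.
        df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
    }

    # Warn when the root filesystem passes an (arbitrary) 90% threshold.
    if [ "$(disk_usage_percent /)" -ge 90 ]; then
        echo "WARNING: root filesystem is nearly full"
    fi
    ```

    Pointing it at the filesystem that holds /var/log is a reasonable first check, since runaway log files are a common way to fill a disk.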

    0
  58. 58

    Wow, useful! But I hope I never have to do this.

    0
  59. 59

    The last couple of times we were down it was DNS. You’d ping the address and it didn’t resolve to a number. Our IT department decided to move our registrar without telling anyone. There were 5 DNS servers, but they only changed the first two since, “We can hit it later, there’s no chance anyone is going beyond the first two.” It caused us to be offline in large population centers, but not in smaller markets. After we got IT to admit they had actually changed something without approval, we explained how they lost us a minimum of $15,000 in sales. Last time they blocked DNS from the web server, so the server itself couldn’t resolve its own URL.

    0
  60. 60

    Well, I use a pretty simple approach: access SSH via PuTTY, run top, and if the load average is higher than 8.0, run service httpd restart, service cpanel restart, service sql restart. Sometimes it requires killing a process, but not every time.
    It helps me every time in traffic spikes.

    1
  61. 61

    Great Article :)

    0
  62. 62

    We recently had a similar issue at work.
    Our corporate site went down, we could ping it, but not ssh into it. It turned out that MySQL had crashed and was using 100% of the CPU and it required the hosting company to physically reboot the server.

    0
  63. 63

    I work at a medium-sized webhosting company here in the UK; here are a few tips:

    Use the following command to watch PHP scripts that are running in real time as well as monitor server load and MySQL processes (much more effective than staring at “top” in despair):

    watch -n1 'uptime ; echo -e "\r" ; ps aux | grep php | grep -v watch* | grep -v grep* ; echo -e "\r" ; mysqladmin pr'

    Use the following to kill all PHP scripts running immediately:

    pkill -9 php

    If a script is constantly hammering the server and you want to disable it, use the following:

    chmod 0 /home/username/path/to/php/script.php

    List the number of times each IP has accessed a site in the past 24 hours:

    cat /usr/local/apache/domlogs/DOMAIN | awk '{print $1}' | sort | uniq -c | sort -n

    List the number of times each page on a site has been accessed in the past 24 hours:

    cat /usr/local/apache/domlogs/DOMAIN | awk '{print $7}' | sort | uniq -c | sort -n
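
    Both one-liners are the same counting pipeline with a different field number, so they can be wrapped in one function. This is a sketch only: the name `count_field` is made up, and it assumes Apache’s common or combined log format, where field 1 is the client IP and field 7 is the requested path.

    ```shell
    # Hypothetical wrapper around the two counting pipelines.
    # count_field FIELD LOGFILE: count how often each value of a whitespace-separated
    # field occurs in LOGFILE, with the most frequent value printed last.
    count_field() {
        awk -v f="$1" '{ print $f }' "$2" | sort | uniq -c | sort -n
    }

    # Example calls (the log path is the one from the comment above):
    #   count_field 1 /usr/local/apache/domlogs/DOMAIN | tail -n 20   # busiest IPs
    #   count_field 7 /usr/local/apache/domlogs/DOMAIN | tail -n 20   # busiest pages
    ```

    Piping through `tail` keeps only the heaviest hitters, which is usually what you want when hunting for an abusive client.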

    When running “top” to check server load, run the following to display the full commands being run and to refresh every 1 second:

    top -cd1

    Check the mail queue size (for possible spam detection):

    exim -bpc

    For DNS issues, use the following sites:

    intodns.com – Diagnoses DNS issues in detail (awesome)
    mxtoolbox.com/blacklists.aspx – The best blacklist/RBL checker
    just-dnslookup.com – Check the IP your domain is pointing to from various locations
    whois.sc – Check a domain’s registration/expiration/etc…

    4
  64. 64

    This was really helpful! Thanks for sharing.

    0
  65. 65

    And never, repeat never, use telnet to connect to a web server. This is as bad an idea as it can get, given that it is so easy for a hacker to listen to an unencrypted exchange. Use ssh. And no ftp either. That’s what sftp is for.

    0
  66. 66

    Must be nice to have that kind of $. Unless we are talking large e-commerce, their cost is overkill for 90% of sites.

    1
  67. 67

    ideally….but web is not an ideal world.

    0
  68. 68

    @Shandon really? THIS article helped you find a PHP error hahahaha

    -1
  69. 69

    what is ‘serious business’?
    spoken like a true overpriced host.
    You can do PLENTY of ‘business’ on a shared host….having a ‘portfolio’ site is serious business that does not require lots of server power.

    0
  70. 70

    I love how you ‘figured it out’ by switching to Rackspace LOL.
    But true…you never know the skill level of the ‘tech support’ at the hosting company, and being able to do some troubleshooting is HUGE. Even if you do not actually DO anything for the client on the server side, it will help you with your communication to the client, even if it is to give some bad news.

    1
  71. 71

    My organization’s site is being hosted by a company that has been experiencing a lot of DDoS problems. We will be moving to another host that guarantees 100% uptime. Do you think 100% is really possible?

    0
  72. 72

    I think that rather than using kill -9 directly, I would suggest using:
    kill -3
    sleep 60
    kill -9
    Nevertheless, great article for starters, and it can be useful for cracking interviews. :)

    Thumbs up, bro. :)

    0
  73. 73

    I host my clients’ websites on a managed dedicated server. If a site is down, I usually know before the client. I just email my server admin team, and they have it back in no time. Things do go down, and if you want some peace of mind in your life, you let professionals handle it. Maybe you can do it, but are you available 100% of the day? What do you do when you’re on vacation? (if you get one). I did my own hosting in a co-location datacenter and it was a total waste of my time.

    0
  74. 74

    I love the article. Guided me when we had an outage and helped. we have since used a few monitoring services to tell us when we are down instead of a customer! Now we have a secondary server and use a DNS failover service to tell us when we go down and that tzo service instantly moves our DNS to point to our secondary server on an alternate host. Thanks for the great article!

    0
  75. 75

    You’re the man … handy reference. You have been added to my colossal bookmarks folder. :D

    @WordVerb … curiosity grabs me. Who are you hosting with now ? Just my view on it ( I’m by no means a networking/admin guru ) Yes, 100% uptime is totally poss.

    Have seen monitoring where shared web hosts had 100% uptime for a given month. At least for the server(s)/websites that were being monitored.

    0
  76. 76

    I have lost access to my site netcashforum.com. My hosting company is also down, so I guess it is a server problem. However, I called them on the phone and was told that the site encountered a security threat from hackers, which made them shut down the site to prevent loss of websites. They promised to be back in 30 minutes, but it has been 7 hours now and the problem has not been resolved. I get this message when I try to enter my site: Index of /

    cgi-bin/
    community/
    I am confused. Please what do I do?

    0
  77. 77

    I think this article was fab, it provides good info to beginners as well a reminder for users with a bit more experience under their belt!

    Zak,

    1
  78. 78

    There’s another free DNS tools site at http://viewdns.info/ for those that are looking for alternatives.

    0
  79. 79

    Terrific work! That is the type of information that are supposed to be shared across the net. Disgrace on Google for not positioning this post upper! Come on over and talk over with my site . Thanks =)

    0
  80. 80

    This information is great because I experienced this yesterday. I’m relatively new at web coding and was editing code in the header on WordPress when the site went down. In the control panel I saw “Expired” but I didn’t know what that meant. I didn’t know if that meant the site was down or that the domain had expired and needed to be renewed. From what I’ve read and seen I think that my client had forgotten to renew their domain. I had busily been blaming myself of course, as is my nature. Thanks for this great blog of useful information. ;)

    0
  81. 81

    For me, as a novice in this area, this article was a lot of fun to read and very interesting. I have learned from it and really appreciate the effort from the writer.

    0
  82. 82

    Finding where the issue is has always been a pain for me, so we recently made an algorithm that troubleshoots the cause of downtime.

    You can plug in your URL to http://whysitedown.com and it will run the tests for you.

    0
  83. 83

    Or you could just sign up for letsmonitor.com and get a text when your site goes down :)

    0
  84. 84

    This article was great in helping me try & work out why my site is down… thank you for taking the time to write it. I’m sure I speak for all tech-phobes when I say just how helpful this kind of thing really is!
    I’m having some trouble still however. My server is responding to the ‘ping’ but when I try to log in to the cPanel the page isn’t responding, though it isn’t showing me a connection error screen either. It just keeps reloading & wiping my login info. I use ‘Web Hosting Hub’ as my server. Is anyone else having the same problems & is there anything more I can do to try & fix this?
    Thank you,
    Ria.

    0
  85. 85

    Helpful article……Thank you

    0
  86. 86

    What a hoot – you made my day. I’m no techie, but the “Boss losing the will” when the website crashes.

    After a long day dealing with the type of issue you have mentioned, after reading your article I was rolling on the floor laughing.

    You’re a funny guy…like your style – ever thought about stand up?

    Thank you for sharing your advice in such an entertaining way.

    0
  87. 87

    This was very helpful and ensured my sanity. I guess I should have thought of what I would do if my site crashed before I launched it, but it never occurred to me; the guide was super-helpful. Thanks!!

    0
