Part Two: How To Fix The Web: Obscure Back-End Techniques And Terminal Secrets

Editor’s Note: Today we are happy to present to you the second part of the sample chapter from the upcoming printed Smashing Book #4: New Perspectives on Coding, written by Paul Tero. You might want to read the first part of this chapter beforehand, if you haven’t already. Also, feel free to download the full chapter from the Smashing eBook Library.

Just a little reminder before you start: In part 1 we explored the infrastructure of the Internet and the make-up of a Web server. We left off at the stage where our Web server software is up and running again, and we’ve just double-checked this by telnetting an HTTP request and received the successful response code. It’s now time for…

Finding Your Website

The 200 code means that your home page is okay, and you should be able to visit it in your browser. However, it may not show what you expected, and your fabulous Widget 3000 page may still be absent.

Virtual Hosts and Streams

As mentioned above, many servers host multiple websites. One of these is the default website. It is the website you get when you visit the server by IP address http://80.72.139.101/ instead of by name, or when you leave off the Host: line in the HTTP request while telnetting. The rest of the websites are known as virtual hosts. Every one of these websites has a physical location on the server known as its document root. To further investigate your website woes, you need to discover its document root.

Fortunately and sensibly, most server management packages like Plesk store their virtually hosted websites according to their domain name, so you can usually just run find on the domain name. The / in the command below tells find to search the whole file system, the -type d looks only for directories, and the -name part searches for any directories containing “smashingmagazine”. The asterisks are wild cards. You’ll need to either escape them with backslashes (\*smashingmagazine\*) or put them in quotes ("*smashingmagazine*"):

$ find / -type d -name "*smashingmagazine*"
find: '/var/run/cups/certs': Permission denied
find: '/var/run/PolicyKit': Permission denied
/var/www/vhosts/smashingmagazine.com
/var/www/vhosts/smashingmagazine.com/httpdocs...

If you run this command as a normal unprivileged user, you will probably see lots of “Permission denied” as find tries to explore forbidden places. You are actually seeing two types of output here: stdout for “standard output” and stderr for “standard error”. They are called output streams and are confusingly mixed together.

You have already encountered the pipe symbol | for piping the output stream (stdout) of one command into the input stream (stdin) of another. The symbol > can redirect that output into a file. Try this command to send all the matches into a file called matches.txt:

$ find / -type d -name "*smashingmagazine*" > matches.txt
find: '/var/run/cups/certs': Permission denied
find: '/var/run/PolicyKit': Permission denied...

In this case, all the stdout is redirected into the file matches.txt and only the error output stream stderr is displayed on the screen. By adding the number 2 you can instead redirect stderr into a file and just display stdout:

$ find / -type d -name "*smashingmagazine*" 2> matcherrors.txt
/var/www/vhosts/smashingmagazine.com
/var/www/vhosts/smashingmagazine.com/httpdocs...

There is a special file on Linux, UNIX and Mac computers which is basically a black hole where stuff gets sent and disappears. It’s called /dev/null, so to only see stdout and ignore all errors:

$ find / -type d -name "*smashingmagazine*" 2> /dev/null
/var/www/vhosts/smashingmagazine.com
/var/www/vhosts/smashingmagazine.com/httpdocs...
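
The three redirections above can be tried safely without scanning a real server. The following is a minimal sketch, assuming a made-up function called emit that stands in for find by writing one line to each stream. The temporary directory and the example.test path are likewise only for illustration.

```shell
# Hypothetical stand-in for find: one line on stdout, one on stderr.
emit() {
  echo "/var/www/vhosts/example.test"
  echo "find: Permission denied" >&2
}

workdir=$(mktemp -d)    # scratch directory for this demonstration

# Send each stream to its own file.
emit > "$workdir/matches.txt" 2> "$workdir/matcherrors.txt"

# Capture stdout in a variable and send the errors to /dev/null.
quiet=$(emit 2> /dev/null)

echo "stdout file: $(cat "$workdir/matches.txt")"
echo "stderr file: $(cat "$workdir/matcherrors.txt")"
echo "quiet run:   $quiet"
```

Because the two streams end up in separate places, the matches can be inspected on their own while the errors are kept for later or discarded.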


The end result is that this find command tells you roughly where your document root is. In Plesk, all the virtual hosts are generally stored within the /var/www/vhosts directory, with the document roots in /var/www/vhosts/domain.com/httpdocs.

The Long Way

You can find the document root more accurately by looking through the configuration files. For Apache servers, you can find the default website’s document root by looking through the main configuration file which is usually /etc/apache2/apache2.conf or /etc/httpd/conf/httpd.conf.

$ grep DocumentRoot /etc/httpd/conf/httpd.conf
DocumentRoot "/var/www/html"

Somewhere inside this conf file will also be an Include line which references other conf files, which may themselves include further conf files. To find the DocumentRoot for your virtual host, you’ll need to search through them all. You can do this using grep and find but it’s a long command, so we will build it up gradually.

First, we will find all the files on the whole server (the /) whose names end in “conf” or “include”. The -type f finds only files and the -o lets us look for names ending in either “conf” or “include”, grouped by parentheses. The parentheses and asterisks are escaped with backslashes so that the shell passes them to find untouched. As above, the errors are banished into the ether:

$ find / -type f \( -name \*conf -o -name \*include \) 2> /dev/null
/var/spool/postfix/etc/resolv.conf
/var/some file with spaces.conf
/var/www/vhosts/myserv.com/conf/last_httpd.include...

This is not quite complete as any files with spaces will confuse the grep command we are about to attempt. To fix that you can pipe the output of the find command through the sed command which allows you to specify a regular expression. Regular expressions are a huge topic in their own right. In the command below, the s/ /\\ /g will replace all spaces with a backslash followed by a space:

$ find / -type f \( -name \*conf -o -name \*include \) 2>/dev/null | sed 's/ /\\ /g'
/var/spool/postfix/etc/resolv.conf
/var/some\ file\ with\ spaces.conf
/var/www/vhosts/myserv.com/conf/last_httpd.include...


Now you can use a backtick to embed the results of that find command into a grep command. Using ` is different from | as it actually helps to build a command, rather than just manipulating its input. The -H option to grep tells it to show file names as well. So, now we will look for any reference to “smashingmagazine” in any conf files.

$ grep -H smashingmagazine `find / -type f \( -name \*conf -o -name \*include \) 2> /dev/null | sed 's/ /\\ /g'`
/var/www/vhosts/smashingmagazine.com/conf/last_httpd.include: ServerName "smashingmagazine.com"...

This may take a few seconds to run. It is finding every conf file on the server and searching through all of them for “smashingmagazine”. It may reveal the DocumentRoot directly. If not, it will at least reveal the file where the ServerName or VirtualHost is defined. You can then use grep or less to look through that file for the DocumentRoot.

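You can also use the xargs command to do the same thing. It also allows the output from one command to be embedded into another. Unlike backticks, xargs interprets the backslash-escaped spaces itself, so it is the safer choice when file names contain spaces:

$ find / -type f \( -name \*conf -o -name \*include \) 2> /dev/null | sed 's/ /\\ /g' | xargs grep -H smashingmagazine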
/var/www/vhosts/smashingmagazine.com/conf/last_httpd.include: ServerName "smashingmagazine.com"...
$ grep DocumentRoot /var/www/vhosts/smashingmagazine.com/conf/last_httpd.include
DocumentRoot "/var/www/vhosts/smashingmagazine.com/httpdocs"

The end result, hopefully, is that you’ve definitively found the document root for your website.
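
Escaping spaces with sed is one option. As an alternative sketch (not the method used in this chapter), the GNU and BSD versions of find and xargs support -print0 and -0, which separate file names with a null character so that spaces need no escaping at all. The directory layout, the example.test domain and the file contents below are invented for the demonstration.

```shell
workdir=$(mktemp -d)    # scratch directory standing in for the server's file system

# Build a tiny fake configuration tree, including a file name with a space in it.
mkdir -p "$workdir/vhosts/example.test/conf"
printf 'ServerName "example.test"\nDocumentRoot "/srv/example/httpdocs"\n' \
  > "$workdir/vhosts/example.test/conf/last httpd.include"
printf 'nameserver 192.0.2.1\n' > "$workdir/resolv.conf"

# -print0 ends each file name with a null byte; xargs -0 splits on that byte.
matches=$(find "$workdir" -type f \( -name '*conf' -o -name '*include' \) -print0 2> /dev/null \
  | xargs -0 grep -H 'DocumentRoot')

echo "$matches"
```

The single-quoted patterns do the same job as the backslash-escaped asterisks: both stop the shell from expanding the wild cards before find sees them.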

You can use a similar technique for nginx. It also has a main conf file, usually /etc/nginx/nginx.conf, which can include other conf files. However, its document root directive is simply called “root”.

Apache Control Interface

With Apache, there is yet another way to find the right conf file, using the apachectl or newer apache2ctl command with the -S option.

$ apachectl -S
VirtualHost configuration:
80.72.139.101:80 is a NameVirtualHost
default server default (/usr/local/psa/admin/conf/generated/13656495120.10089200_server.include:87)
port 80 namevhost default (/usr/local/psa/admin/conf/generated/13656495120.10089200_server.include:87)
port 80 namevhost www.smashingmagazine.com (/var/www/vhosts/smashingmagazine.com/conf/last_httpd.include:10)...

If this whizzes by too fast, you can try piping the results through grep. It won’t work, however, because grep only operates on stdout and for some reason apachectl outputs its information to stderr. So, you have to first direct stderr into stdout and then send it through grep. This is done by redirecting the error stream 2 into the output stream 1 with 2>&1, like this:

$ apachectl -S 2>&1 | grep smashingmagazine
port 80 namevhost smashingmagazine.com (/var/www/vhosts/smashingmagazine.com/conf/13656495330.08077300_httpd.include:10)

This also reveals the conf file which contains the DocumentRoot for this website. As above, a further grep or less on that file will reveal the DocumentRoot.
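
The effect of 2>&1 is easy to check with a command you control. Here is a minimal sketch, assuming a made-up function called report that, like apachectl -S in the example above, writes everything to stderr. The host names it prints are placeholders.

```shell
# Hypothetical stand-in for a command that reports on stderr only.
report() {
  echo "port 80 namevhost default" >&2
  echo "port 80 namevhost www.example.test" >&2
}

# Without merging, nothing reaches grep through the pipe, so it counts 0 matches.
# (stderr is sent to /dev/null here only to keep the demonstration quiet, and
# "|| true" hides grep's non-zero exit status when it finds nothing.)
unmerged=$(report 2> /dev/null | grep -c example || true)

# With 2>&1, stderr is folded into stdout before the pipe, so grep sees it.
merged=$(report 2>&1 | grep -c example)

echo "matches without 2>&1: $unmerged"
echo "matches with 2>&1:    $merged"
```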

Checking the Document Root

Now that you’ve found the document root, you can snoop around to make sure it’s alright. Change to the directory with cd:

$ cd /var/www/vhosts/smashingmagazine.com/httpdocs
bash: cd: /var/www/vhosts/smashingmagazine.com/httpdocs: No such file or directory


If you get the error message “No such file or directory”, that is bad news. Either the DocumentRoot has been incorrectly set or your whole website has been deleted. If it is there, you can list the files with ls. The -a also shows hidden files which start with a dot, and -l displays them in long format with permissions and dates:

$ ls -al
drwxrwxrwx  8 nobody  nogroup  4096 May  9 14:03 .
drwxr-xr-x 14 root    root     4096 Oct 13  2012 ..


Every folder will at least show these two entries. The single “.” is for the current directory and “..” is for the parent directory. If that’s all there is, then the directory is empty.

While you’re there, you can double-check you are in the correct place. Create a new file using echo and again using the > symbol to send the output to a file.

$ echo "<h1>My test file</h1>" > testfile.html

This will create a file called testfile.html containing a bit of HTML. You can use your browser or telnet or curl or wget to see if the file is where it should be.

$ curl http://www.smashingmagazine.com/testfile.html
<h1>My test file</h1>


If that worked, then well done, you have found your website! Remove that test file to clean up after yourself with rm testfile.html and keep going.

Back up and Restore

The tar and zip commands can be used to back up and restore. If your website is missing, then restore won’t help you much unless you have previously backed up. So go back in time and back up your data with one of the commands below. To go back a whole day:

$ gobackintime 86400
It is now Sat May 10 20:30:57 BST 2013

Just kidding — but it would be nice! The tar command stands for tape archive and comes from the days when data was backed up on magnetic tapes. To create an archive of a directory, pass the cfz options to tar which will create a new archive in a file and then zip it in the gzip format.

$ tar cfz backupfile.tgz /var/www/vhosts/smashingmagazine.com/httpdocs
tar: Removing leading `/' from member names

All Mac and Linux computers support the tar command and most also have zip. To do the same with zip:

$ zip -r backupfile.zip /directory/to/backup

To see what an archive contains, run:

$ tar tfz backupfile.tgz
var/www/vhosts/smashingmagazine.com/httpdocs/
var/www/vhosts/smashingmagazine.com/httpdocs/.htaccess...

Or for zip format:

$ unzip -l backupfile.zip
Archive:  backupfile.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2012-05-28 00:33   var/www/vhosts/smashingmagazine.com/httpdocs
      234  2012-05-28 00:33   var/www/vhosts/smashingmagazine.com/httpdocs/.htaccess...

Both tar and zip strip the leading slashes when they back up. So when you restore the files, they will be restored within the current directory. To restore them in the same location they were backed up from, first cd to /.

$ tar xfzv backupfile.tgz
var/www/vhosts/smashingmagazine.com/httpdocs/...

The “v” above stands for verbose and causes tar to show what it’s doing. unzip also has a -v option, but used on its own it lists the archive’s contents in detail rather than extracting them:

$ unzip -v backupfile.zip
Archive:  backupfile.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       0  Stored        0   0% 2012-05-28 00:33 00000000  var/www/vhosts/smashingmagazine.com/httpdocs/...
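
To rehearse a backup and restore without touching a live site, the whole cycle can be run inside a scratch directory. This is a minimal sketch under stated assumptions: the directory names are invented, and it uses tar's -C option (available in GNU and BSD tar) to change directory before archiving or extracting, which keeps the member names relative.

```shell
workdir=$(mktemp -d)    # scratch directory for this demonstration

# A pretend document root with a single file in it.
mkdir -p "$workdir/site/httpdocs"
echo "<h1>My test file</h1>" > "$workdir/site/httpdocs/index.html"

# Create a gzipped archive; -C makes the member names start at "httpdocs/".
tar czf "$workdir/backupfile.tgz" -C "$workdir/site" httpdocs

# List the archive without extracting it.
listing=$(tar tzf "$workdir/backupfile.tgz")

# Restore into a separate directory and read the file back.
mkdir "$workdir/restore"
tar xzf "$workdir/backupfile.tgz" -C "$workdir/restore"
restored=$(cat "$workdir/restore/httpdocs/index.html")

echo "$listing"
echo "$restored"
```

Restoring into a separate directory first lets you compare the backup with the live files before overwriting anything.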

Website Errors

Let’s assume your website hasn’t actually disappeared. The next place to look is the error log file.

Finding the Log File

When using a server management package like Plesk, each website probably has its own log file. You can find it by grepping for the word “log” in the conf file you identified above. The -i means case-insensitive.

$ grep -i log /var/www/vhosts/smashingmagazine.com/conf/last_httpd.include
CustomLog /var/www/vhosts/smashingmagazine.com/statistics/logs/access_log plesklog
ErrorLog  "/var/www/vhosts/smashingmagazine.com/statistics/logs/error_log"...


There is also a server-wide log where any non-website-specific errors go. You can find this in the main conf file:

$ grep -i log /etc/apache2/apache2.conf
ErrorLog /var/log/apache2/error.log...

Htaccess Errors

It is very easy to screw up a website. You can quite readily bring down a very big website by removing a single character from the .htaccess file. Apache uses the file .htaccess to provide last-minute configuration options for a website. It is most often used for URL rewriting rules. They look like this:

RewriteRule   ^products/.*/([0-9]+)$   products/view.php?id=$1   [L,QSA]

This rule says to rewrite any URL in the form “products/widget-3000/123” to the actual URL “products/view.php?id=123”. The L means that this is the last rule to be applied and QSA means that Apache should attach any query string to the new URL. URL rewriting is often used for search engine optimization so that Web managers can get the name of the product into the URL without actually having to create a directory called “widget-3000”.

However, make a single typo and your whole website will give a 500 Internal Server Error.

The tail command will display the last 10 lines of a log file. Give it a -1 to display the single last line instead. An .htaccess problem will look like this:

$ tail -1 /var/www/vhosts/smashingmagazine.com/statistics/logs/error_log
[Thu May 06 11:04:00 2013] [alert] [client 81.106.118.59] /var/www/vhosts/smashingmagazine.com/httpdocs/.htaccess: Invalid command 'RewiteRule', perhaps misspelled or defined by a module not included in the server configuration

You can grep for all of these types of errors:

$ grep alert /var/www/vhosts/smashingmagazine.com/statistics/logs/error_log
[Thu May 06 11:04:00 2013] [alert] [client 81.106.118.59]...

PHP Parse and Runtime Errors

Many websites use the LAMP combination: Linux, Apache, MySQL and PHP. A common reason for Web pages not showing up is that they contain a PHP error. Fortunately, these are quite easy to discover and pinpoint.

There are two broad classes of PHP errors: parse errors and runtime errors. Parse errors are syntax errors and include leaving off a semicolon or forgetting the $ in front of a variable name. Runtime errors include undefined functions or referencing objects which don’t exist.

Like .htaccess errors, parse errors will cause an HTTP response code 500 for Internal Server Error, often with a completely blank HTML page. Runtime errors will give a successful HTTP response of 200 and will show as much HTML as they have processed (and flushed) before the error happened. You can use telnet or wget -S or curl -i to see the response headers from a URL. So now, copy and paste your erroneous page into a command:

$ curl -i http://www.smashingmagazine.com/products/widget-3000/123
HTTP/1.0 500 Internal Server Error
Date: Sun, 12 May 2013 17:44:49 GMT
Server: Apache
Vary: Accept-Encoding
Content-Length: 0
Connection: close
Content-Type: text/html

PHP Error Settings

To find the exact error, you need to make sure errors are being reported in the log file.

There are several PHP settings which cover errors. display_errors determines if errors are shown to the website visitor or not, and log_errors says whether they will appear in log files. error_reporting specifies the types of errors that are reported: only fatal errors, for example, or warnings and notices as well. All of these can be set in a configuration file, in .htaccess or within the PHP script itself.

You can find out your current settings by running the PHP function phpinfo. Create a PHP file which calls the function and visit it in your browser:

$ echo "<?php phpinfo()?>" > /var/www/vhosts/smashingmagazine.com/httpdocs/phpinfo.php

[Figure: the phpinfo function showing configuration settings.]

The two columns show the website and server-wide settings. This shows that display_errors is off, which is good, because it should be off on live websites. It means that no PHP errors will ever be seen by the casual visitor. log_errors on the other hand should be on. It is very handy for debugging PHP issues.

The error_reporting value is 30719. This number represents bit flags or bit fields. This is a technique for storing multiple yes/no values in a single number. In PHP there are a series of constants representing different types of errors. For example, the constant E_ERROR is for fatal errors and has the value 1; E_WARNING is for warnings and equals 2; E_PARSE is for parsing or syntax errors and has the value 4. These values are all powers of two and can be safely added together. So the number 7 means that all three types of errors should be reported, as E_ERROR + E_WARNING + E_PARSE = 7. A value of 5 will only report E_ERROR + E_PARSE.

In reality, there are 15 types of errors from 1 for E_ERROR to 16384 for E_USER_DEPRECATED. You can type “30719 in binary” into Google and it will give you the binary equivalent: 0b111011111111111. This means that all errors are switched on except the twelfth, which is E_STRICT. This particular setup has also been given a constant E_ALL = E_ERROR + E_WARNING + E_PARSE + etc = 30719. From PHP version 5.4.0, E_ALL is actually 32767, which includes all the errors including E_STRICT.
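
The same decomposition can be done in the terminal with shell arithmetic instead of a search engine. The sketch below is only an illustration: it walks through PHP's error constants in order of value (1, 2, 4 and so on up to 16384) and uses the bit AND operator to report which ones a given error_reporting value leaves switched off. The variable names are arbitrary.

```shell
value=30719    # the error_reporting number to inspect
flag=1         # value of the current constant, doubled on each pass
missing=""     # names of the error types that are switched off

for name in E_ERROR E_WARNING E_PARSE E_NOTICE E_CORE_ERROR E_CORE_WARNING \
            E_COMPILE_ERROR E_COMPILE_WARNING E_USER_ERROR E_USER_WARNING \
            E_USER_NOTICE E_STRICT E_RECOVERABLE_ERROR E_DEPRECATED E_USER_DEPRECATED; do
  # A result of 0 from the bit AND means this flag is not set in the value.
  if [ $(( value & flag )) -eq 0 ]; then
    missing="${missing:+$missing }$name"
  fi
  flag=$(( flag * 2 ))
done

echo "switched off: $missing"
```

Changing value to 5 would list every constant except E_ERROR and E_PARSE, matching the arithmetic described above.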

If your error_reporting setting is 0, then no errors will show up in the log file. You can change this setting in the file php.ini, but then you have to restart Apache to make it have an effect. An easier way to change this setting in Apache is to add a line in a file called .htaccess in your document root: php_value error_reporting 30719.

Or you can do that on the command line, using the double arrow which appends to an existing file or creates the file if it doesn’t exist:

$ echo "php_value error_reporting 30719" >> .htaccess
$ echo "php_value log_errors On" >> .htaccess
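
One thing to watch with >> is that running the same command twice appends the same line twice. A small sketch of one way around that, assuming a hypothetical helper called add_once and a scratch copy of .htaccess rather than the live one:

```shell
# Hypothetical helper: append a line to a file only if that exact line is not there yet.
add_once() {
  # $1 = file, $2 = line; -x matches whole lines, -F treats the text literally
  grep -qxF "$2" "$1" 2> /dev/null || echo "$2" >> "$1"
}

workdir=$(mktemp -d)              # scratch directory for this demonstration
htaccess="$workdir/.htaccess"     # stand-in for the real file in your document root

add_once "$htaccess" "php_value error_reporting 30719"
add_once "$htaccess" "php_value log_errors On"
add_once "$htaccess" "php_value log_errors On"    # second call changes nothing

lines=$(wc -l < "$htaccess" | tr -d ' ')
echo "$lines lines in $htaccess"
```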

Refresh your erroneous Web page. If there is a PHP error in your page it should now show up in the error log. You can grep the log for all PHP errors:

$ grep PHP /var/www/vhosts/smashingmagazine.com/statistics/logs/error_log
[Sun May 12 18:19:09 2013] [error] [client 81.106.118.59] PHP Notice:  Undefined variable: total in /var/www/vhosts/smashingmagazine.com/httpdocs/products/view.php on line 10...

If you have referenced variables or array indices before assigning them values, you may see thousands of PHP notices like the one above. It happens when you do things like <? $total = $total + 1 ?> without initially setting $total to 0. They are useful for finding potential bugs, but they are not show stoppers. Your website should work anyway.

You may have so many notices and warnings like this that the real errors get lost. You can change your error_reporting to 5 to only show E_ERROR and E_PARSE or you can grep specifically for those types of errors. It is very common to chain grep commands together like this when you want to filter by multiple things. The -E option below tells the second grep to use an extended regular expression, in which | means “or”. This command finds all log entries containing “PHP” and either “Parse” or “Fatal”.

$ grep PHP /var/www/vhosts/smashingmagazine.com/statistics/logs/error_log | grep -E "Parse|Fatal"
[Thu Jul 19 12:26:23 2012] [error] [client 81.106.118.59] PHP Fatal error:  Class 'Product' not found in /var/www/vhosts/smashingmagazine.com/httpdocs/library/class.product.php on line 698
[Sun May 12 18:16:21 2013] [error] [client 81.106.118.59] PHP Parse error:  syntax error, unexpected T_STRING in /var/www/vhosts/smashingmagazine.com/httpdocs/products/view.php on line 100...
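
The chained filter can be tested against a throwaway log before running it on a large real one. In this sketch the log lines are made-up sample data in the same shape as the entries above, written to a temporary file. It uses grep -E (extended regular expressions) so that | acts as “or”.

```shell
log=$(mktemp)    # temporary file standing in for error_log

# Made-up sample entries: one notice, one parse error, one non-PHP error, one fatal error.
cat > "$log" <<'EOF'
[Sun May 12 18:19:09 2013] [error] [client 192.0.2.10] PHP Notice:  Undefined variable: total in /srv/example/view.php on line 10
[Sun May 12 18:16:21 2013] [error] [client 192.0.2.10] PHP Parse error:  syntax error, unexpected T_STRING in /srv/example/view.php on line 100
[Sun May 12 18:17:02 2013] [error] [client 192.0.2.10] File does not exist: /srv/example/favicon.ico
[Sun May 12 18:18:44 2013] [error] [client 192.0.2.10] PHP Fatal error:  Class 'Product' not found in /srv/example/class.product.php on line 698
EOF

# First keep only PHP entries, then keep only the parse and fatal errors among them.
serious=$(grep PHP "$log" | grep -E "Parse|Fatal")
count=$(grep PHP "$log" | grep -cE "Parse|Fatal")

echo "$serious"
echo "$count serious PHP errors"
```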

Seeing Errors in the Browser

If you are tracing a runtime error rather than a parse error, you can also change the error_reporting setting directly in PHP. And you can quickly turn display_errors on, so you will see the error directly in your browser. This makes debugging quicker, but means everyone else can see the error too. Add this line to the top of your PHP page:

<? ini_set ('display_errors', 1); error_reporting (E_ERROR | E_WARNING); ?>

These two functions change the two PHP settings. The | in the error_reporting call is a bit OR operator. It effectively does the same as the + above but operates on bits, so is the correct operator to use with bit flags.

Any fatal errors or warnings later in the PHP page will now be shown directly in the browser. This technique won’t work for parse errors as none of the page will run if there’s a parse error.

Bit Flags

Using bit flags for error_reporting avoids having 15 separate arguments to the function for each type of error. Bit flags can also be useful in your own code. To use them, you need to define some constants, use the bit OR operator | when calling the function and the bit AND operator & within the function.

Here’s a simple PHP example using bit flags to tell a function called showproduct which product properties to display:

<?
define ('PRODUCT_NAME', 1);
define ('PRODUCT_PRICE', 2);
function showproduct ($product, $flags) {
  if ($flags & PRODUCT_NAME) echo $product['name'];
  if ($flags & PRODUCT_PRICE) echo ': $' . $product['price'];
}
$product = array ('name'=>'Widget 3000', 'price'=>10);
showproduct ($product, PRODUCT_NAME | PRODUCT_PRICE);
?>

This will display “Widget 3000: $10” in the browser.

Infinite Loops

PHP’s error reporting may struggle with one class of error: an infinite loop. A loop may just keep executing until it hits PHP’s time limit, which is usually 30 seconds (PHP’s max_execution_time setting), causing a fatal error. Or if the loop allocates new variables or calls functions, it may keep going until PHP runs out of workable memory (PHP’s memory_limit setting).

It may, however, cause the Apache child process to crash, which means nothing will get reported, and you’ll just see a blank or partial page. This type of error is increasingly rare, as PHP and Apache are now very mature and can detect and handle runaway problems like this. But if you are about to bang your head against the wall in frustration because none of the above has worked, then give it some consideration. Deep within your code, you may have a function which calls some other function, which calls the original function in an infinite recursion.

Debuggers

If you’ve gotten this far, and your page is still not showing up, then you’re entering more difficult territory. Your PHP may be executing validly and doing everything it should, but there’s some logical error in your programming. For quick debugging you can var_dump variables to the browser, perhaps wrapping them in an if statement so that only your IP address sees them:

<? if ($_SERVER['REMOTE_ADDR'] == '85.106.118.199') var_dump ($product); ?>

This method will narrow down an error but it is ungraceful and error-prone, so you might consider a debugging tool such as Xdebug or FirePHP. They can provide masses of information, and can also run invisibly to the user, saving their output to a log file. Xdebug can be used like this:

<?
ini_set ('xdebug.collect_params', 1);
xdebug_start_trace ('/tmp/xdebugtrace');
echo "This will get traced.";
xdebug_stop_trace();
?>

This bit of code logs all function calls and arguments to the file /tmp/xdebugtrace.xt (Xdebug adds the .xt extension itself). It displays even more information when there is a PHP notice or error. However, the overhead may not be suitable for a live environment, and it needs to be installed on the server, so it’s probably not available in most hosting environments.

FirePHP, on the other hand, is a PHP library that interacts with an add-on to Firebug, a plugin for Firefox. You can output debugging information and stack traces from PHP to the Firebug console.

Security Issues

By this point, you should have some HTML reaching your browser. If it’s not what you expect, then there’s a chance that your website has been compromised. Don’t take it personally (at first). There are many types of hacks and most of them are automated. Someone clever but unscrupulous has written a program which detects vulnerabilities and exploits them. The purpose of the exploit may simply be to send spam, or to use your server as part of a larger attack on a more specific target (a DDoS).

Server Hacks

Operating systems are very complex pieces of software. They may be built from millions of lines of programming code. They are quite likely to have loopholes where sending the wrong message at just the wrong time will cause some kind of blip which allows someone or something to gain entry. That’s why Microsoft, Apple, Ubuntu and others are constantly releasing updates.

Similarly, Apache, nginx, IIS and all the other software on a typical server is complicated. The best thing you can do is keep it up to date with the latest patches. Most good hosts will do this for you.

A hacker can use these flaws to log in to your server and engineer themselves a terminal session. They may initially gain access as an unprivileged user and then try a further hack to become the root user. You should make this as hard as possible by using good passwords, restrictive permissions, and being careful to run software (like Apache) as an unprivileged user.

If someone does gain access, they may leave behind a bit of software which they can later use to take control of your server. This may be detectable by an antivirus scanner or something like the Rootkit Hunter, which looks for anomalies like unexpected hidden files. But there are also a few things you can do if you suspect an intrusion.

The w command shows who is currently logged in to a server and what they are doing:

$ w
 20:44:32 up 44 days,  7:51,  2 users,  load average: 0.07, 0.03, 0.05
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    cpc1-brig17-2-0- 17:54    1:02m  0.15s  0.13s -bash
root     pts/1    cpc1-brig17-2-0- 20:44    0.00s  0.02s  0.00s w...

The last command shows who has logged in recently in date order. Pipe it through head to show only the first 10 lines.

$ last
paul     pts/0        :0.0             Sun May 12 17:21   still logged in
paul     tty7         :0               Sun May 12 17:20   still logged in
reboot   system boot  2.6.32-41-386    Sun May 12 17:18 - 20:48  (03:29)
fred     tty7         :0               Sat May 11 10:10 - down   (01:12)...

It tells you who has logged in and for how long, plus any terminal session they have open. down means until the server shut down. Look for unexpected entries and consult your host or a security expert if you are in doubt.

PHP Hacks

More common are hackers who gain entry through vulnerabilities in PHP scripts, especially popular content management systems like WordPress. Anybody can write a plugin for WordPress and, if it’s useful, people will install it. When writing a plugin, most developers think primarily about the functionality and little about security. And because WordPress allows file uploading, hackers who find vulnerabilities can use them to upload their own PHP scripts and later take control of a computer.

These PHP scripts can use the PHP mail function to send out spam on demand, but they can also try to execute commands in much the same way as you can via a terminal session. PHP can execute commands with its exec or system functions. If you do not need to use these functions, it is advisable to disable them. You can do this by adding the disable_functions directive to your server’s php.ini file (or php5.ini for PHP 5) or to the file php.ini within your document root. If you search for “php disable functions” in Google, you will find a whole list of functions which should be disabled in this way:

disable_functions=fpassthru,crack_check,crack_close...

A quick check you can make for this type of hack is to look for all PHP files modified recently and make sure there are no anomalies. The -mtime -1 option tells find to only consider files modified within the last day. There’s also -mmin for minutes. This command searches all websites within /var/www/vhosts for recently modified files ending in “php” or “inc”:

$ find /var/www/vhosts -mtime -1 \( -name \*php -o -name \*inc \) -printf "%t %h/%f\n"
Sun May 12 21:20:17.0000000000 2013 /var/www/vhosts/smashingmagazine.com/httpdocs/products/view.php
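
To see how -mtime -1 behaves before trusting it on a real server, you can stage a few files of known age. This is a minimal sketch with invented file names. It uses touch -t to backdate one file and plain path output instead of -printf, since -printf is specific to GNU find.

```shell
workdir=$(mktemp -d)    # scratch directory standing in for /var/www/vhosts

echo '<?php echo "old"; ?>' > "$workdir/old.php"
echo '<?php echo "new"; ?>' > "$workdir/new.php"
echo 'just some notes'      > "$workdir/new.txt"

# Pretend old.php was last modified on 1 January 2013.
touch -t 201301010000 "$workdir/old.php"

# Only files changed within the last day whose names end in php or inc.
recent=$(find "$workdir" -type f -mtime -1 \( -name '*php' -o -name '*inc' \))

echo "$recent"
```

Only new.php is listed: old.php is too old and new.txt has the wrong ending.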


PHP hacks are difficult to detect because they are designed to not stick out. One method hackers use is to gzip their PHP and then encode it as base64. In that case, you may have a PHP file on your system with something like this in it:

eval(gzinflate(base64_decode('HJ3HkqNQEkU/ZzqCBd4t8V4YAQI2E3jvPV8...


Another method is to encode text within variables and then combine them and evaluate them:

$unywlbxc = " uwzsebpgi840hk2a jf";
$hivjytmne = "  jqs9m4y 1znp0  ";
eval ( "m"."i". "croti"...

Both these methods use the PHP eval function, so you can use grep to look for eval. Using a regular expression with \beval\b means that the word “eval” must have a word boundary before and after it, which prevents it being found in the middle of words. You can combine this with the find command above and pipe through less for easy reading:

$ find /var/www/vhosts -mtime -1 \( -name \*php -o -name \*inc \) | sed 's/ /\\ /g' | xargs grep -H -e "\beval\b" | less
/var/www/vhosts/config.php:eval(gzinflate(base64_decode('HJ3HkqNQE...
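
The word-boundary idea can be checked on harmless test files. The sketch below is an illustration with invented file names and contents. Instead of \b it uses grep's -w option, which GNU and BSD grep provide for whole-word matching, and -l to print only the names of matching files.

```shell
workdir=$(mktemp -d)    # scratch directory for this demonstration

# One file that really calls eval, and one that merely contains the letters "eval".
echo '<?php eval($payload); ?>'              > "$workdir/config.php"
echo '<?php echo "medieval retrieval"; ?>'   > "$workdir/article.php"

# -w matches "eval" only as a whole word; -l lists matching file names.
suspects=$(find "$workdir" -type f -name '*php' -print0 | xargs -0 grep -l -w 'eval')

echo "$suspects"
```

Only config.php is reported, because “medieval” and “retrieval” contain the letters but not the standalone word.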


If you do find this type of hack in your website, try to discover how they got in before completely removing all the tainted files.

Access Logs

Along with error logs, Apache also keeps access logs. You can browse these for suspicious activity. For example, if you found a PHP hack inside an innocuous looking file called test.php, you can look for all activity related to that file. The access log usually sits alongside the error log and is specified with the CustomLog directive in Apache configuration files. It contains the IP address, date and file requested. Search through it with grep:

$ grep -E "(GET|POST) /test.php" /var/www/vhosts/smashingmagazine.com/statistics/logs/access_log
70.1.5.12 - - [12/May/2013:20:10:49 +0100] "GET /test.php HTTP/1.1" 200 1707 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686;...

This looks for GET and POST requests for the file test.php. It provides you with an IP address, so you can now look for all other access by this address, and also look for a specific date:

$ grep 70.1.5.12 /var/www/vhosts/smashingmagazine.com/statistics/logs/access_log | grep "12/May/2013"
70.1.5.12 - - [12/May/2013:20:10:49 +0100] "GET /products/view.php?something HTTP/1.1" 200 1707 "-"...
70.1.5.12 - - [12/May/2013:20:10:49 +0100] "GET /test.php HTTP/1.1" 200 1707 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686;...
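
The two steps (find who requested the suspicious file, then list everything else they did) can be joined into one small script. This is a sketch under stated assumptions: the access log entries are made-up samples using documentation IP addresses, and it assumes a single visitor requested test.php.

```shell
log=$(mktemp)    # temporary file standing in for access_log

# Made-up sample entries in the usual access log layout.
cat > "$log" <<'EOF'
192.0.2.10 - - [12/May/2013:20:10:41 +0100] "GET /products/view.php?id=1 HTTP/1.1" 200 1707 "-" "ExampleAgent"
192.0.2.10 - - [12/May/2013:20:10:49 +0100] "POST /test.php HTTP/1.1" 200 512 "-" "ExampleAgent"
198.51.100.7 - - [12/May/2013:20:11:03 +0100] "GET /index.php HTTP/1.1" 200 4096 "-" "ExampleAgent"
EOF

# Step 1: the IP address is the first space-separated field of any matching line.
ip=$(grep -E '"(GET|POST) /test\.php' "$log" | cut -d ' ' -f 1 | sort -u)

# Step 2: count every request from that address on the day in question.
# -F treats the address literally, so its dots do not act as regex wild cards.
hits=$(grep -F "$ip " "$log" | grep -c "12/May/2013")

echo "$ip made $hits requests on 12 May 2013"
```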

This kind of debugging can be very useful for normal website errors too. If you have a feedback form on your website, add the user’s IP address to the message. If someone reports an error, you can later look through the logs to see what they have been up to. This is far better than relying on vague second-hand information about reported problems.

It can also be useful for detecting SQL injection attacks, whereby hackers try to extract details from your database by fooling your database retrieval functions. This often involves a lot of trial and error. You could send yourself an email whenever a database query goes wrong and include the user’s IP address. You can then cross-reference with the logs to see what else they have tried.

Last Resorts

William Edward Hickson is credited with popularizing the saying:

“If at first you don’t succeed, try, try, try again.”

Hickson was a British educational writer living in early Victorian times. His advice is not appropriate for the modern Web developer, lying in bed on a Saturday morning, drowning in frustration, staring at a blank Web page, preparing to chuck an expensive laptop against a brick wall.

You’ve now been through all the advice above. You’ve checked that the world hasn’t ended, verified your broadband box, tested the Internet and reached your server. You’ve looked for hardware problems and software problems, and delved into the PHP code. But somehow or other, your Widget 3000 is still not there. The next thing to do is…

Have Breakfast

Get out of bed and take your mind off the problem for a little while. Have some toast, a bowl of cereal, something to drink. Maybe even indulge in a shower. Try that new lavender and citrus shampoo you bought by mistake. While you’re doing all this, your subconscious is busily working on the website issue, and may unexpectedly pop a solution into your thoughts. If so, give it a try. If not…

Ask for Help

Check the level of support that you are entitled to by your hosting company. If you are paying $10 per month, it’s probably not much. You may be able to get them to cast a vague glance in your direction within the next 72 hours. If it’s substantially more, they may log in and have a look within the next few minutes. They should be able to help with hardware or software issues. They won’t help with Web programming issues. Alternatively, ring a colleague or freelancer. If you are still stuck…

Prepare

…to release some nervous energy. Find one of those squidgy balls that you can squeeze mercilessly in your hands, or a couple of pencils to use as drumsticks, or a pack of cigarettes and a pot full of coffee. And then try the last resort to any computing problem…

Reboot

When your laptop or desktop goes wrong, a common solution is to reboot it. You can try the same trick on your Web server. This is quite risky. Firstly, it may not solve the problem. If it’s a PHP error, then nothing will change. If, however, your issue is caused by some obscure piece of software becoming unresponsive, then it may well help, though it may not fix the problem permanently. The same thing may happen next week.

Secondly, if the reboot fails then you will be really stuck. If the server shuts down but fails to start back up again, then someone may have to go and press the power button on the physical machine. That someone is an employee of your hosting company, and they may be enjoying their breakfast too, in a nice comfortable office somewhere. They may have left their jumper at home. They may not want to enter the air-conditioned bunker where all the servers are kept. You will be thoroughly dependent on their response time.

Given all the risks, the command is:

$ sudo /sbin/reboot
Broadcast message from admin@thisserver.com (/dev/pts/1) at 13:21 ...
The system is going down for reboot now.

On many Linux systems, the reboot command is really just a wrapper for /sbin/shutdown -r now. It causes the server to shut down and then restart. That may take a few minutes. Soon after issuing the command above, your SSH session will come to an abrupt end. You will then be left for a few nervous minutes wondering if it will come back up again. Use the tools you prepared above.

While you are waiting, you can issue a ping to see if and when your server comes back. On Windows use ping -t for an indefinite ping:

$ ping www.smashingmagazine.com
PING www.smashingmagazine.com (80.72.139.101) 56(84) bytes of data.
Request timeout for icmp_seq 0
Request timeout for icmp_seq 0
Request timeout for icmp_seq 0...
64 bytes from www.smashingmagazine.com (80.72.139.101): icmp_seq=1 ttl=52 time=39.4 ms
64 bytes from www.smashingmagazine.com (80.72.139.101): icmp_seq=1 ttl=52 time=32.4 ms...

You can breathe a sigh of relief when ping finally responds. Wait a couple more minutes and you’ll be able to use ssh again and then try to view the Widget 3000 in your Web browser.
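
If you would rather not watch the ping output scroll by, the waiting can be wrapped in a small loop. This is a minimal sketch: wait_for is a hypothetical helper that reruns whatever probe command you give it until the command succeeds or the attempts run out. The attempt count and delay in the example are arbitrary choices.

```shell
# Hypothetical helper: run a probe command repeatedly until it succeeds,
# giving up after a fixed number of attempts.
wait_for() {
    tries="$1"      # maximum number of attempts
    delay="$2"      # seconds to sleep between attempts
    shift 2         # the remaining arguments are the probe command
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@" > /dev/null 2>&1; then
            echo "responded after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no response after $tries attempts"
    return 1
}

# Example: send a single ping every 10 seconds, for up to 30 attempts
# wait_for 30 10 ping -c 1 www.smashingmagazine.com
```

A successful ping only shows that the machine is back on the network. As noted above, SSH and the Web server may need a little longer to start.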

Conclusion

This has been an epic journey, from the end of the world to a single misplaced character in a file. Hopefully, it will help you through the initial few minutes of panic when you wake up one morning and the beautiful product page you created last night is gone.

Some of the reasons and solutions above are very rare. The most likely cause is simply a slight malfunction in your broadband box. Running out of disk space or getting hacked are the only other things that are in any way likely to happen in the middle of the night when nobody else is working on the website. But throw in other developers, server administrators and enthusiastic clients — and anything is possible. Good luck!

Footnotes

1. Oxford Dictionary of Quotations (3rd edition), Oxford University Press, 1979

Paul Tero’s chapter has been reviewed by Ben Dowling and Sergey Chikuyonok.


Paul Tero is an experienced PHP programmer and server administrator. He developed the Stockashop ecommerce system in 2005 for Sensable Media. He now works part-time maintaining and developing Stockashop, and the rest of the time freelancing from a corner of his living room, and sleeping, eating, having fun, etc. He has also written numerous other open sourcish scripts and programs.

  1. 1

    WHY WOULD YOU EVEN…?!

    What’s next? assembly language tutorials?

  2. 2

    Edgars Zagorskis

    July 30, 2013 1:42 pm

    Oh my, this is a really surprisingly large and pleasantly useful article. It’s very nice to see that people still care about console(terminal/prompt).
    I could just add that uncaught errors may ruin the website reputation so people should not forget about error pages or at least install some nifty error handler like “Whoops”.

  3. 3

    I find this article a bit confusing. First off – who runs a web server but doesn’t know where the document root is? I was perplexed that someone, somewhere, has had this issue? If you’ve got as far as setting up a LAMP box, surely you can sort ‘finding a folder’ out. Also, why no mention of locate? If you keep the database updated, or you’re not searching for new files, it’s faster than find.

    There’s mention of ‘whizzing by too quickly’ but no written example of how to pipe to less or similar?

    Backing up web content is fine, it wouldn’t have hurt to include how to do a basic MySQL dump too.

    A lot of people don’t know about live tailing (tail -f) which is great for debugging – would have been nice to see that mentioned – even by grepping a log to tail.

    “A common reason for Web pages not showing up is that they contain a PHP error.” – So are MySQL errors, bad htaccess files, problems with Apache and so on – your text makes it sound like PHP is the ‘usual culprit’ when it’s really not. Also you interchange ‘runtime’ and ‘running’ errors. The PHP error section seems to lack simple practical advice such as an example of the command you can pop in to someone else’s code to get errors displaying: error_reporting(E_ALL). Again – why not grep for the desired error types and pipe to tail -f? Also, bit flags could have been explained a bit better.

    Echoing multiple lines to a single file always seems pointless to me – why not teach people about vi, vim or nano?

    “hackers who find vulnerabilities can use them to upload their own PHP scripts and later take control of a computer.” – Ugh, really?

    ‘Turn it off and back on again’ – why don’t you detail things such as restarting services? Why go straight to rebooting the entire server?

    I came to this article excited to see something on Smashing Magazine about coding, settled down on the sofa and read from start to end. I’m never normally this critical but this article bugged me – it seems to have no real purpose and makes huge conclusions. Website broken? Find the document root (an enormous task, apparently), turn all your error logs on – nothing? You’ve been hacked. Probably PHP. Your computer will shortly be taken over.

    This article had a lot of potential but it seems to have been hastily written, compiled from brief snippets that came from a Google search for ‘Common Website Problems’.

    C’mon SM – This is far below standard. I was excited about the book, until this.

    • 4

      I was excited about this topic too and would like to see more articles like this. You have to start somewhere and this is a good overview of some foundational commands. Dan made some good points and they would be a great outline for a follow up article. Testing and restarting individual services would be useful as that has largely been the issue in my experience.

    • 5

      Actually, it took a very long time to arrange and write and has been through a thorough review process. It is an ambitious article trying to cover a bit of everything, and already well over the maximum number of words it was supposed to be. It also isn’t really aimed at experts – more at beginners to intermediates, to give them a taste of all the sorts of things that can go wrong. It’s impossible to write an article which gives the right amount of depth to all levels of readers.

      A long time was spent on finding the document root, because I wanted the whole article to be conceivably manageable by someone who had never seen the command line before. I imagine that most people who read this article are not setting up a LAMP box themselves, but instead have a virtual or dedicated server where everything is already in place. They may not know the simple stuff. This section also allowed me to introduce a lot of other command line techniques. So the section was about more than just finding the document root.

      Due to length I had to pick and choose what to put in. I did include parts about piping and less and tail. I didn’t think of the tail -f command at the time – that’s a good suggestion. I included bits about finding and killing errant processes. Going into more detail and describing how to restart individual processes (nginx, apache, mysql, etc) would have been a good idea, but added a lot to the length and research time.

      I’m sorry you don’t like the general theme or the bit about hackers. I’ve helped to fix at least one Drupal and one WordPress website where malicious scripts were uploaded, precisely to provide a back door for later, so someone could use the computer to send SPAM or gain access to other computers. So I don’t think that’s an unrealistic statement.

      Anyway – thanks for the comments, even though they were very critical, there are some good suggestions which we may be able to incorporate before publishing.

    • 6

      I thought this article was excellent. From what I can tell it was in no way meant to cover every possible aspect of the subject, especially since there are a million ways to do one thing on a computer/server. It uses situations that may not be real world, but give the author the chance to show different commands in action.

      As for the less command, that’s explained in part one and then just referenced from there on. Did you read both parts of this 2 part article before leaving an (in my opinion) overly critical review?

      From what I can see from your comment you seem upset that the author didn’t use commands that you yourself are used to using. Again, there are many, many ways to accomplish tasks on a server and no one way is more correct than another, it’s just preference at that point.

      As for vi or vim, I’m pretty sure someone could write (and maybe has) a small book on that alone…

      I usually don’t respond to articles or comments, but after reading this article and having my interest on the subject piqued so intensely I couldn’t help but respond to your comment. I think the problem is that this article was written to give people a taste of the subject and you were looking for a whole meal with enough left over to eat the next day. I know I’ll be using this information as a starting point to learn more on the subject. Thank you very much to the author, awesome article, very informative and well written.

  4. 7

    Window graphics

    July 31, 2013 6:29 pm

    This could be helpful and a developer like me is very thankful to have this article. I can use it for debugging and fix some error…

    Cheers!

  5. 8

    Bookmarked!

  6. 9

    Skye Media Group

    August 2, 2013 9:22 am

    Very helpful and informative.

  7. 10

    I see as only 30% of this post is Useful to less than 10% of Smashing’s magazine TOTAL audience (programmers, sys admins?)

    Like posting a “House foundations for home construction tutorial” on a feng-shui forum
