WebCron Site Backup Documentation

[NOTICE: This WebCron module never really reached a release state. It was a first-pass attempt at writing a backup system for web hosts and only provided half of a complete solution (it backed stuff up but restoring things from the backup became too complicated). A much better backup solution was developed later.]

The WebCron Site Backup module is an add-on module for WebCron that lets users perform regular incremental backups of the files and/or MySQL databases of a web server and generate and send reports via e-mail. It also offers a basic host-based intrusion detection system (IDS) that can provide early warning that a website has been compromised or broken, and it offers peace of mind. WebCron Site Backup is a separate product from Barebones CMS and can be used to back up websites on just about any web host that has at least PHP 5.2.x available. You do not need to use Barebones CMS to use this tool! This is an officially recognized module and therefore some kind soul in the support forums will help you if you need it.

NOTE: Currently there is only support for backing up files and MySQL databases. This module does not have a counterpart (yet) to completely restore a website from a backup. Restoring files is fairly simple but restoring MySQL data is a bit more complex since this tool does not use 'mysqldump' to extract the data. This tool also doesn't have a roll up feature (yet), which would help with reducing the amount of time required to restore a backup.

The WebCron Site Backup module is built to be fast and robust, even for large websites where databases can easily exceed millions of rows and file counts run into the hundreds of thousands, while conserving bandwidth and CPU usage. It backs up small and large websites alike using compressed and encrypted data streams.

If I had to compare this module to something, it would be the love child of rsync, mysqldump, an e-mail report generator, and a PHP enabled web server.

This documentation covers everything you need to know about backing up a website using the WebCron Site Backup module.

License

Like Barebones CMS and WebCron, the WebCron Site Backup module is dual-licensed under an MIT or LGPL license - your choice. The license and restrictions are identical to the Barebones CMS License.

If you find the WebCron Site Backup module useful, financial donations are sincerely appreciated and go towards future development efforts.

Installation

Installation of the Site Backup module is fairly straightforward. The installation procedure is as follows:

Okay, so it isn't as simple as a Barebones CMS installation. The WebCron Site Backup module is actually a fairly versatile WebCron client/server pair. With versatility comes complexity and, to fully leverage this module, you are going to have to either write some PHP code or settle for ridiculously long command-lines.

The rest of this documentation covers both methods as well as tips on improving overall performance.

A word of warning: It is easy to forget that a WebCron client only works with the matching server pair. This does, unfortunately, require having multiple instances of the WebCron client if you are backing up multiple websites.

Using The Command Line

The WebCron Site Backup module (wc_backup) can be run via standard WebCron command-line options. However, backing up a website requires usernames and passwords for various resources as well as source and destination paths, which typically creates ridiculously long command-lines. Such command-lines may not work under some OSes (e.g. Windows) and, if you move to another computer, setting up the command-line again may be difficult. In addition, the command-line route has slightly fewer features available than the configuration file route.

However, even if you go the configuration file route, you should familiarize yourself with the command-line options that are available. There are similarities between both methods because the command-line builds an internal configuration and then executes it the same way as the configuration file route.

Without further ado, here are the command-line options for the 'wc_backup' module:

Let's be honest: When I started building this tool, I didn't expect to have so many options. If you want to type all those options out, have fun. It works but I don't recommend doing it. The only option you probably even care about is 'wc_backup_cfg', which says to use a configuration file.

Using A Configuration File

The configuration file route has several advantages over the command-line route: The command-line to execute the configuration file is shorter, backing up a configuration file is a lot easier to do, and there are more features available. The downside is you have to write some PHP code to build a configuration file. However, you can copy and paste from the examples and be up and running in very little time.

Example:

<?php
	$backup_info["www.website.com"] = array(
		"files" => array(
			"/path/from/webcron/admin/" => "D:/Backup/website.com/",
			"/another/path/" => array(
				"destdir" => "D:/Backup/website.com-extra/",
				"backup" => true,
				"single" => true
			)
		),
		"mysql" => array(
			"root" => array(
				"server" => "localhost",
				"username" => "root",
				"password" => "*******",
				"destdir" => "D:/Backup/website.com/",
				"backup" => true,
				"single" => false,
				"compress" => true,
				"info" => array(
					"dbname.readonly_dbtable" => "once",
					"dbname.huge_dbtable" => "incremental",
					"dbname.dontcare_dbtable" => "never"
				)
			)
		),
		"report" => array(
			"complete" => array(
				"smtpserver" => "smtp.gmail.com",
				"smtpsecure" => true,
				"pop3server" => "pop.gmail.com",
				"pop3secure" => true,
				"username" => "myemail@gmail.com",
				"password" => "********",
				"from" => "myemail@gmail.com",
				"to" => "webmaster@website.com",
				"always" => false,
				"fileexts" => array(
					"*" => true,
					"jpg" => false,
					"gif" => false,
					"png" => false
				),
				"mysqltypes" => array(
					"*" => true,
					"events" => false
				)
			)
		)
	);
?>

The example above demonstrates all of the available options. Each option has a command-line equivalent. See the previous section for details on each option.

Example site backup using a configuration file.

Once a configuration file has been created, it can be run by executing a command like the following from within the location where the WebCron client resides:

php index.php -v -m=wc_backup -wc_backup_cfg=backup_info.php

Assuming the example configuration and command, this tells the WebCron client to run the 'wc_backup' (Site Backup) module with verbose mode enabled. The 'wc_backup_cfg' option tells the module to load 'backup_info.php' from the same directory as the WebCron client. The module then uses the settings from 'backup_info.php' to back up files and directories and MySQL databases, and to generate and e-mail a report. The report lists all files changed since the last backup except images, and all MySQL database changes except events.

In the image above, approximately 2.4GB of data was transferred across 41 different requests even though only 350MB of bandwidth was used. The difference is due to compression of the transferred data. The WebCron client only reports actual bandwidth used. I have a test database with millions of rows and significant file sets to test against, so this is a fairly realistic test result for a medium-sized website performing an initial backup. Future backups are typically significantly smaller because they only download changed files and table data. In essence, each one is an incremental backup.

Example e-mail report generated by Site Backup.

Note that e-mail reports are sent from the client, not the server, so that the server module doesn't become an open e-mail relay. This does introduce some difficulties: some ISPs block port 25, the standard port on which most SMTP servers listen, in an attempt to curb spam going out from their network. Can't really blame them - I've heard that at least 25% of all personal Windows PCs are zombies, that zombie PCs account for at least half of the spam out there, and I've seen estimates that up to 80% of spam originates from zombie PCs. Sending e-mail through a mail server that is behind a firewall can cause similar problems. It may be prudent to use a third-party e-mail provider (e.g. Gmail) in such cases, but doing so exposes part of a website's structure to the third party. Reporting, however, is really important for detecting issues early and being able to actively diagnose and squash them before they become serious problems.

Once the configuration file is built, use the built-in scheduler in your OS to schedule the backup to run regularly (e.g. nightly). I like to back up to a drive that is also backed up to an external hard drive so that I've got two backups on hand just in case.

Database Optimizations

The MySQL portion of a Site Backup configuration file can describe how to download the table data. Determining if a file has changed is pretty easy but how does one go about detecting changes to a MySQL table and only download the parts that have changed? It is a lot more difficult than you might think. As a result, there are five different table synchronization modes that are supported by the Site Backup module:

The default mode can be changed for either all tables in a single database or all databases and tables by using "dbname.*" or "*.*" respectively. The default mode is good enough for backing up most small websites.
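As a rough sketch of how the wildcard keys might look in a configuration file, the fragment below sets the "info" array for the "root" MySQL entry from the earlier example. The mode names ("once", "never", "incremental") are simply the ones that appear in that example, and the assumption that an explicit per-table entry takes precedence over a wildcard entry has not been verified against the module.

```php
<?php
	// Sketch only: the site key, "root" entry, and mode names are reused from the earlier example.
	$backup_info["www.website.com"]["mysql"]["root"]["info"] = array(
		// Default mode for all tables in all databases.
		"*.*" => "never",

		// Default mode for all tables in the "dbname" database.
		"dbname.*" => "once",

		// Explicit per-table entry (assumed to override the wildcard entries above).
		"dbname.huge_dbtable" => "incremental"
	);
?>
```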

However, for large tables that change frequently, downloading all the table data is going to be a burden on bandwidth, CPU, and hard drive space. This is where the "incremental" option comes in handy but it does take a bit of doing to set it up. While the MySQL binary log would perhaps be the best way to do incremental updates, it is not guaranteed to be enabled on the host and there is no way to query it from within MySQL itself (i.e. generally requires root privileges on the system). So, I opted for the next best thing, which is probably the correct way to do things anyway. To use "incremental" mode, the host must be running some relatively recent version of MySQL (at least 5.1.6 is required) that supports what are known as "database triggers". You will also have to modify the existing table and create a new table. The MySQL user account needs the "TRIGGER" privilege and, depending on the MySQL server settings, possibly the "SUPER" privilege. There are a few restrictions on the table being considered for use with "incremental" mode. The existing table:

Assuming you have a viable table for "incremental" mode, run SQL statements similar to the following:

/* Add a timestamp for inserted and updated rows. */
ALTER TABLE `tablename` ADD COLUMN `wc_backup_ts` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

/* Create a table to store deleted row information. */
CREATE TABLE `tablename_deleted` (id int(11) NOT NULL, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP, KEY ts (ts));

/* Create a trigger that records each deleted row's primary key in the deletion table. */
/* Also deletes old rows in the deletion table. */
DELIMITER |
CREATE TRIGGER `tablename_deleted_trigger` BEFORE DELETE ON `tablename`
FOR EACH ROW
BEGIN
	INSERT INTO `tablename_deleted` SET id = OLD.id;
	DELETE FROM `tablename_deleted` WHERE ts < DATE_ADD(NOW(), INTERVAL -30 DAY);
END;
|
DELIMITER ;

Here, "tablename" is the name of the table and "id" is the primary key of the table, so adjust accordingly. This approach is only slightly invasive into the primary table and shouldn't affect most applications. These statements allow all INSERT, UPDATE, and DELETE statements against the table to be accurately tracked entirely within the MySQL server.

Once all these hurdles have been overcome and the table set up for "incremental" mode, then the table can be set to "incremental" in the configuration for the backup. Note that the incremental MySQL backup automatically ignores the "tablename_deleted" table (automatically set to "never") while downloading content.

In the event that triggers are not available or not desired for whatever reason, the application that uses the table will need to be modified to accommodate the change. Any row that gets deleted will need to have the primary key inserted into the deleted table.
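A minimal sketch of what that application change could look like is shown below. It is not part of the Site Backup module: the function name delete_tracked_row() is hypothetical, the code assumes the application talks to the database through PDO with exception error mode enabled, and it assumes the "tablename" and "tablename_deleted" tables described above live on a storage engine that supports transactions.

```php
<?php
	// Hypothetical helper: deletes a row from `tablename` and records its
	// primary key in `tablename_deleted` so the backup can see the deletion.
	// Assumes $db was created with PDO::ERRMODE_EXCEPTION enabled.
	function delete_tracked_row(PDO $db, $id)
	{
		$db->beginTransaction();

		try {
			// Record the primary key first, mirroring the BEFORE DELETE trigger.
			$stmt = $db->prepare("INSERT INTO tablename_deleted (id) VALUES (?)");
			$stmt->execute(array($id));

			// Then remove the row itself.
			$stmt = $db->prepare("DELETE FROM tablename WHERE id = ?");
			$stmt->execute(array($id));

			$db->commit();
		} catch (Exception $e) {
			// Undo both statements if either one fails.
			$db->rollBack();

			throw $e;
		}
	}
```

Wrapping both statements in one transaction keeps the deletion table and the primary table consistent with each other. Unlike the trigger, this sketch does not prune old rows from the deletion table, so that cleanup would need to be done separately.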

For further information about how to use this tool, visit the support forums, where questions, tips, and consternations are welcome.

© CubicleSoft