Cloud Backup

This is the third attempt at developing a complete backup solution that works for every major platform that matters. Over the years - yes, years - a lot of hard lessons have been learned.

Welcome to Cloud Backup.

What To Look For In Backup

From experience, here is what you should look for as the baseline of a legitimate, modern backup solution:

The Cloud Backup software that you can download from this site does the above. If your current backup solution doesn't do everything in the list above, then it's time to try something else that does. I don't care what you use as long as it offers all of the above.

Contributing

Want to help out? Check out the official GitHub repository.

Donating financially helps keep this project alive.

Getting Started

To get started, download the latest Cloud Backup release:

Download cloud-backup-1.0rc3.zip

Extract the files to a new directory on your computer.

If you do not have PHP installed, then download and install the command-line (CLI) version for your OS. Cloud Backup is written in PHP and therefore requires PHP to be somewhere on the system to function. Windows users may find this GitHub repository useful.

From a command-line, run:

php configure.php

You will be asked a series of questions that will configure the backup. The configuration tool may be re-run at any time - although some options such as service selection can't be changed. Be sure to take advantage of the e-mail notification and file monitoring features.

Cloud Backup currently supports the following services:

After the backup has been configured, run it:

php backup.php

If you encounter any problems, you can test e-mail notifications and service connectivity respectively with these two commands:

php test_notifications.php
php test_service.php

Once the first backup completes, be sure to verify that it is functioning properly by running:

php verify.php

Once everything about the backup looks good, which might take several days of running manual backups and verifications, use your system's built-in task scheduler to run 'backup.php' on a regular basis. Under Windows, use Task Scheduler. Under most other OSes, use cron.
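
For example, a nightly run under cron might use a crontab entry like the following (the PHP binary path and installation directory shown here are placeholders - adjust them to match your system):

0 2 * * * /usr/bin/php /path/to/cloud-backup/backup.php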

Repeat the whole process above for a second backup location. You should have one installation for an on-site backup (e.g. an attached hard drive) and one installation that uses an off-site cloud backup service. If the directories containing the backup tools fall inside the backup path, be sure to exclude each installation or else the two backups will constantly back up each other's cached files.

Go into the directories where you extracted each backup. Locate the file called 'config.dat'. This is a plain text JSON file containing your backup configuration, but, more importantly, it also contains your encryption keys. Without the file, the backup data is useless. Copy the files to a couple of external thumbdrives and put those thumbdrives somewhere safe. A safe-deposit box at a bank and a decent hiding place at home/work can do wonders here. If your computer is ever infected with malware or your residence burns down or floods or it floats off into space (use your imagination for that one), you can still recover your digital life.

Hooray! Your brand new Cloud Backup installations are set up. But before you do your happy dance, add a reminder to your calendar to verify your backups on a monthly basis. Once a month, manually run 'php verify.php' to make sure each backup is still working properly. It takes just a couple of minutes but will save you massive headaches down the road should you ever need to use the backup to recover your data.

Restoring Data

In the event that data needs to be restored from the backup, I recommend performing this two-step process:

php verify.php
php restore.php

The verification spot-checks the backup one last time prior to use and displays vital statistics about the files database, which tracks details about the directories and files in the backup. If you follow the recommended monthly checkup, this is simply one extra confirmation that all is still well.

'restore.php' asks which backup to load and, after retrieving the information for that backup, presents a shell-like command-line interface to it. This interface is extensible, but the initial commands available are:

Depending on how much data is being restored, the process can, of course, take a while.

Backup Scalability and Performance

One of the things most people want to know is, "How fast is it?" When it comes to moving large quantities of data, performance becomes important. The #1 thing to keep in mind is that this backup system does both transparent compression and two rounds of encryption of the data being backed up. In PHP. So let's take a look at peak performance with my own personal setup:

The worst-performing component in that mix is the external hard drive to which data was written. The measured write speed of the drive varies wildly. One moment it will be plugging along at 25MB/sec and the next it will inexplicably plummet to 5MB/sec. I'm not all that surprised given that it is a hard drive I bought at a bargain basement price and its primary purpose is longer-term storage rather than heavy-duty use.

An initial backup using the 'local' option during 'configure.php' resulted in the following useful stats:

The second run, performed 24 hours later, resulted in the following useful stats for the first incremental:

All in all, this is a very solid showing for a backup system with anti-malware software fully enabled and no exclusions applied to PHP. Obviously, the cloud service portions of this tool have much longer, slower run times - taking days to move the same amount of data over a network that might also have monthly data caps applied.

One thing that could have sped up performing the initial backup would have been to temporarily disable anti-malware software or exclude 'php.exe' from file scans. Since I didn't do that, every file that was opened had to be checked by the anti-malware software, which took significant time away from the backup itself. However, it was a more realistic test as a result.

Defragmenting

Unless you have lots of small files being backed up that experience dramatic changes daily, you should only occasionally defragment the backup. A good rule of thumb is to defragment the backup once a year. Defragmentation only affects shared blocks. Non-shared blocks are self-defragmenting.

To defragment a backup, manually run:

php backup.php -d

The How It Works section below has more details on how the backup system handles shared blocks. As smaller files are added, removed, and changed, the shared block numbers they point to also change. Over time, this implicitly fragments shared blocks that were created earlier. Each shared block still contains the original data, but fewer and fewer references to the shared block will exist.

The defragmentation procedure determines whether a shared block has available space greater than two times the small file limit (default 2MB) and, if so, both schedules the shared block for deletion and removes the associated files from the database. The rest of the backup then proceeds normally, perceives the deleted database entries as new files, and places those files into new shared blocks. The end result is an incremental that eventually makes fairly significant changes once it merges into the base. How long that takes, of course, depends on both how many incrementals are kept around and the frequency of backups.
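
As a rough illustration of that rule (a sketch only - the function and variable names are made up, this is not the actual Cloud Backup code, and "available space" is assumed to mean bytes in the block no longer referenced by any file in the database):

<?php
// Sketch of the defragmentation rule described above.
// $blocksize  = total bytes stored in the shared block
// $usedbytes  = bytes still referenced by files in the database
// $smalllimit = the small file limit (default 2MB)
function ShouldDefragSharedBlock($blocksize, $usedbytes, $smalllimit = 2097152)
{
	// Schedule the block for deletion when the reclaimable space
	// exceeds twice the small file limit.
	return ($blocksize - $usedbytes > 2 * $smalllimit);
}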

How It Works

This section gets into the technical details of how Cloud Backup functions behind the scenes. It is generally a good idea to have a basic understanding of how any system works under the hood should anything ever go wrong. Backup systems are not perfect - some are better than others depending on the type(s) of data being backed up. This backup system is quite different due to its target objective: Backup data to remote, possibly untrusted hosts over the Internet (i.e. the cloud).

Let's talk about files for a bit. Directory names and symbolic links are extremely minor bits of information to back up. They are important, sure, for maintaining structure, but they occupy little space and are relatively unimportant. Files, on the other hand, are where data is stored. That data is what is important to people like you, and you expect a backup system to take good care of that data.

Files mostly come in two main types: Plain text and binary. When I'm talking plain text, I mean a file you can open up in Notepad or another text editor. However, text files are really just a special case of a binary file and, from a backup system perspective, all files should be treated as binary, opaque data.

Files come in all sizes. You've got small files, big files, zero-byte files, and everything in-between. A backup system should handle all sizes of files. The most challenging file sizes are those over 2GB due to 32-bit limitations and...thousands of tiny files.

Ever try transferring 1,000 files to another computer across a network, especially over (S)FTP? It's pretty slow. Even more baffling to some people: transfer a single file that exceeds the total size of those 1,000 separate files over the same network and it will complete in a fraction of the time. This is a repeatable problem. The issue is per-file overhead, and the answer is coalescence. This brings us back to Cloud Backup. Suffice it to say, sending a zillion tiny files to a cloud storage provider would take forever. To solve this and other problems with sending data over a network, Cloud Backup uses a block-based strategy.

Cloud Backup has two types of blocks: Shared and non-shared. During a backup, the following logic is used:

If you look at the scalability and performance section, you can see the impact that this has: About 275,000 fewer network requests are made! Gathering the smaller files first into larger files makes a massive difference.
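
As a simplified sketch of that coalescing strategy (illustrative only - the function name is made up and it assumes the 2MB small file limit mentioned in the Defragmenting section):

<?php
// Illustrative sketch:  decide where a file's data goes during a backup.
// Small files are appended to the current shared block so that thousands of
// tiny files reach the storage target as a handful of larger uploads, while
// larger files get their own non-shared block(s).
function ChooseBlockType($filesize, $smalllimit = 2097152)
{
	return ($filesize < $smalllimit ? "shared" : "non-shared");
}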

Cloud Backup uses the standard CubicleSoft two-step encryption method to extend the block size to a minimum of 1MB. Anyone who wants to reverse-engineer the dual encryption keys has to repeatedly decrypt 1MB of data twice, which is many times more difficult than baseline AES encryption. Even if AES is ever fully broken, your data is still probably safe and secure from prying eyes. The data being encrypted is surrounded with random bytes - so that even the same input data results in completely different output - and includes a size and a hash for verification purposes. The data is also padded with random bytes out to the nearest 4096 byte boundary (4K increments). This helps make it that much more difficult for an attacker to guess what a file might contain.
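
For example, the final 4K padding step alone might look something like this (a simplified sketch, not the actual two-step encryption code - the real packaging also prepends random bytes and embeds the data size and a hash before the two rounds of encryption):

<?php
// Simplified sketch:  pad data out to the nearest 4096 byte boundary
// with random bytes.
function PadTo4K($data)
{
	$remainder = strlen($data) % 4096;
	if ($remainder > 0)  $data .= random_bytes(4096 - $remainder);

	return $data;
}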

Cloud Backup names files in a mostly opaque manner. However, there are reserved blocks, and blocks are stored in the target in specific ways. For example, '0_0.dat' is the encrypted, compressed 'files.db' SQLite database file. '0_0.dat' is read as block 0, part 0. Most blocks will have just one part, and counting starts at 0.
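
Following that pattern, a name such as '3_1.dat' would be block 3, part 1. A couple of illustrative helpers (not part of Cloud Backup) for generating and parsing such names:

<?php
// Illustrative helpers for the 'blocknum_partnum.dat' naming scheme.
function MakeBlockFilename($blocknum, $partnum)
{
	return $blocknum . "_" . $partnum . ".dat";
}

function ParseBlockFilename($filename)
{
	// e.g. "0_0.dat" => array(0, 0)
	if (!preg_match('/^(\d+)_(\d+)\.dat$/', $filename, $matches))  return false;

	return array((int)$matches[1], (int)$matches[2]);
}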

The default upper limit on the size of a block part in Cloud Backup is 10MB. This limit exists for a number of reasons, but mostly to keep RAM and network usage down. In order to decrypt a block part, it has to be loaded completely into RAM. Due to how PHP works, there might be 2-3 copies of the block part in memory at any given point while it is being read or written, which translates to about 30MB of RAM. Throw in not wanting to waste transfer limits on failed uploads, and 10MB is a decent default limit. The configuration file can be modified to change the limits if your needs are different but, generally speaking, the default setting is a good enough starting point for most people.
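
To make the limit concrete, here is a sketch (again, illustrative only, not the actual Cloud Backup code) of splitting a block into parts at the default 10MB boundary:

<?php
// Sketch only:  split a block into parts no larger than the default 10MB
// limit.  Each part becomes its own 'blocknum_partnum.dat' file on the
// storage target and has to fit entirely in RAM when it is read back.
function SplitBlockIntoParts($data, $partlimit = 10485760)
{
	$parts = array();
	for ($x = 0; $x < strlen($data); $x += $partlimit)  $parts[] = substr($data, $x, $partlimit);

	return $parts;
}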

© CubicleSoft