This article describes, in theory, one method of automating the copying of data and configuration files from one server to another. If you have enough knowledge of Linux systems, deploying this scenario should take only a few hours.
In its simplest form, synchronizing the data on two (or more) servers is just a matter of copying files from one server to another. One server acts as the primary repository for the data, and changes to the data can only be made on this server (in a high-availability configuration, only one server owns a resource at any given point in time). A regularly scheduled copy utility then sends the data, after it has been changed on the primary server, to the backup server so that the backup server is ready to take ownership of the resource if the primary server crashes.
In a cluster configuration, all nodes need to access and modify shared data (all cluster nodes offer the same services), so you will probably not use this method of data synchronization on the nodes inside the cluster. You can, however, use the method described in this article on highly available server pairs to copy data and configuration files that change infrequently.
The open source software package rsync, which ships with most Linux distributions, allows you to copy data from one server to another over a normal network connection. rsync is more than a replacement for a simple file copy utility, however, because it adds the following features:
- Examines the source files and copies only the blocks of data that have changed. In fact, rsync reduces the time and network load required to synchronize data even further, because it creates a signature for each block and passes only these block signatures back and forth to decide which data needs to be copied.
- (Optionally) works with the secure shell to encrypt data before it passes over the network.
- Allows you to compress the data before sending it over the network.
- (Optionally) removes files on the destination system when they are removed from the source system.
- Allows you to throttle the data transfer speed for WAN connections.
- Has the ability to copy device files (which is one of the capabilities that enables system cloning as described in the next chapter).
Online documentation for the rsync project is available at http://samba.anu.edu.au/rsync (and on your system by typing man rsync).
rsync can either push files to or pull files from the other computer (both computers must have the rsync software installed).
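To make the features above more concrete, here is a minimal sketch in Python that assembles an rsync command line from them. The function name `build_rsync_command`, its parameters, the host names, and the `/www/` path are all hypothetical and chosen only for illustration; the rsync options themselves (`-a`, `-z`, `--delete`, `--bwlimit`, `-e`, `--dry-run`) are standard ones documented in man rsync. The sketch only builds the command; it does not run it.

```python
import shlex


def build_rsync_command(src, dest, *, compress=True, delete=False,
                        bwlimit_kbps=None, use_ssh=True, dry_run=False):
    """Return an rsync argument list that copies src to dest.

    All parameter names are illustrative assumptions, not part of rsync.
    """
    # -a: archive mode (recurse, preserve permissions, times, symlinks,
    # and, when run as root, device files)
    cmd = ["rsync", "-a"]
    if compress:
        cmd.append("-z")                      # compress data in transit
    if delete:
        cmd.append("--delete")                # remove files missing on the source
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")  # throttle the transfer rate
    if use_ssh:
        cmd += ["-e", "ssh"]                  # tunnel the transfer through SSH
    if dry_run:
        cmd.append("--dry-run")               # show what would change, copy nothing
    return cmd + [src, dest]


if __name__ == "__main__":
    # Push: run on the primary server, remote host is the destination.
    push = build_rsync_command("/www/", "backup.example.com:/www/",
                               delete=True, bwlimit_kbps=500)
    # Pull: run on the backup server, remote host is the source.
    pull = build_rsync_command("primary.example.com:/www/", "/www/")
    print(shlex.join(push))
    print(shlex.join(pull))
```

The only difference between a push and a pull is which side of the command carries the `host:` prefix. Adding `dry_run=True` while experimenting is a cautious way to see what `--delete` would remove before anything is actually changed.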
Note: The Unison software package can synchronize files on two hosts even when changes are made on both hosts. See the Unison project at http://www.cis.upenn.edu/~bcpierce/unison, and also see the Csync2 project for synchronizing multiple servers at http://oss.linbit.com.
Because we want to automate the process of sending data (using a regularly scheduled cron job) we also need a secure method of pushing files to the backup server without typing in a password each time we run rsync. Fortunately, we can accomplish this using the features of the Secure Shell (SSH) program.
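As a rough illustration of what such an unattended job might look like, the sketch below wraps one rsync push in a small Python function that a cron job could call. The function name `sync_once`, the injectable `runner` parameter, the host name, and the paths are hypothetical. The sketch assumes that key-based SSH authentication has already been set up; the `BatchMode=yes` option is a standard ssh setting that makes ssh fail rather than stop and wait for a password.

```python
import subprocess
import sys


def sync_once(src, dest, runner=subprocess.run):
    """Run one unattended rsync push over SSH; return True on success.

    `runner` defaults to subprocess.run and can be replaced in tests.
    """
    cmd = [
        "rsync", "-a",
        # BatchMode=yes: never prompt for a password. With no one at the
        # keyboard, a prompt would simply hang the scheduled job.
        "-e", "ssh -o BatchMode=yes",
        src, dest,
    ]
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # cron typically mails anything written to stderr to the job owner.
        print(f"rsync failed ({result.returncode}): {result.stderr.strip()}",
              file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    # Hypothetical crontab entry that runs this script every 30 minutes:
    #   */30 * * * * /usr/local/bin/sync_to_backup.py
    ok = sync_once("/www/", "backup.example.com:/www/")
    sys.exit(0 if ok else 1)
```

Returning a non-zero exit status on failure means the scheduler, or whatever monitoring sits on top of it, can notice that a synchronization run did not complete.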
OpenSSH 2 and rsync
The rsync configuration described in this article sends all data through an Open Secure Shell 2 (OpenSSH, henceforth referred to as SSH) connection to the backup server. This allows us to encrypt all data before it is sent over the network and to use an encryption key to authenticate the remote request (instead of relying on the password of a user account).
A diagram of this configuration is shown below.
The arrows in the figure above depict the path the data takes as it moves from the disk drive on the primary server to the disk drive on the backup server. When rsync runs on the primary server and sends a list of files that are candidates for synchronization, the two servers also exchange data block signatures so that they can work out exactly which data has changed, and therefore which data must be sent over the network.
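The idea of comparing block signatures instead of whole files can be sketched in a few lines of Python. This is a deliberately simplified illustration, not rsync's actual algorithm: it compares fixed, aligned blocks using SHA-256 (my choice for the example), whereas rsync combines a cheap rolling checksum with a stronger hash so that it can also recognize blocks that have shifted position inside a file. The function names and the tiny block size are assumptions made for readability.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def block_signatures(data, block_size=BLOCK_SIZE):
    """Return one hash per fixed-size block of `data` (bytes)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(source, dest_signatures, block_size=BLOCK_SIZE):
    """Return {block_index: bytes} for source blocks the destination lacks.

    Only the signatures of the destination copy are needed, not its data.
    Blocks that exist only on the destination are not handled here.
    """
    changed = {}
    for index, sig in enumerate(block_signatures(source, block_size)):
        if index >= len(dest_signatures) or dest_signatures[index] != sig:
            start = index * block_size
            changed[index] = source[start:start + block_size]
    return changed


if __name__ == "__main__":
    backup_copy = b"AAAABBBBCCCC"       # what the backup server already has
    primary_copy = b"AAAAXXXXCCCCDD"    # the current file on the primary

    # The backup describes its copy with signatures; the primary then
    # selects only the blocks whose signatures do not match.
    to_send = changed_blocks(primary_copy, block_signatures(backup_copy))
    print(to_send)  # {1: b'XXXX', 3: b'DD'}
```

In this example only 6 of the 14 bytes would have to cross the network, because the first and third blocks already match on the backup server.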
In the next article we will see how to secure the data transport, set up SSH encryption keys, and establish the two-way trust relationship.