Published 2004-05-29

Using Unison for remote backups

Introduction

Unison is a file synchronizer, i.e. it can efficiently synchronize (or copy) files and directories between remote hosts (it works locally too). You might have heard about rsync, which pioneered the algorithm unison uses to efficiently synchronize the contents of two files. Without going into technical details, both rsync and unison try to send only the differences between files. In many cases (especially for incremental backups) this saves a lot of network traffic and speeds up the operation. Unison offers two main advantages over rsync:

  • it runs natively on Windows, while rsync needs cygwin and doesn't work well there
  • it can synchronize in both directions, while rsync only copies one way

Using Unison for maintaining web sites.

Now that you know what unison does, what problems can it solve? This website is hosted on a Linux server I rent from a hosting company. I develop it on my local Windows machine. What I need is an easy and efficient way of deploying my changes. In this case deploying means copying all the files from my hard drive to a location on my server.

I could just zip the directory and copy it using ftp or sftp (the SSH File Transfer Protocol: like ftp, except that the traffic is encrypted by ssh). That's easy but not efficient: every deployment re-sends the whole site, even if only one file changed.

In the past I kept the files under a version control system (cvs). I would make the changes locally, check them into the cvs repository and update the server copy. That is more efficient, but I quickly found the overhead of managing cvs very annoying. It wasn't easy.

Unison is as simple to use as copying files and even more efficient than cvs. Let's see how to set it up.

Locally my files are stored in the c:\web\blog folder, and after I'm done with changes I want to copy them to the /var/www/blog folder on my server. I could just use unison to sync those two folders, but I'm a bit paranoid, so I want a setup that lets me quickly restore the previous version of the website on the off-chance that unison messes something up. My deployment procedure is thus:

  • use unison to sync c:\web\blog with a working copy, /var/www/blog-working, on the server
  • archive the live /var/www/blog folder into a blog-$date.tar.bz2 file
  • replace /var/www/blog with a copy of /var/www/blog-working

Of course I'm not going to do those steps manually: a simple script turns all of that into a one-step deployment. If something goes wrong with syncing, I can quickly restore the last version from the blog-$date.tar.bz2 archive ($date is the current date/time, which makes the archive name unique).
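Restoring is just a matter of unpacking the newest archive over the broken site. The sketch below is a hypothetical Python helper (not part of the author's setup) meant to run on the server. It assumes the archives sit next to the live folder in /var/www, are named blog-YY-MM-DD_HH-MM.tar.bz2 as the deployment script creates them, and contain a top-level blog/ directory because tar was run from /var/www.

```python
import shutil
import tarfile
from pathlib import Path


def latest_archive(parent: Path, name: str = "blog") -> Path:
    """Return the newest <name>-YY-MM-DD_HH-MM.tar.bz2 file in parent.

    The %y-%m-%d_%H-%M stamp sorts chronologically as plain text,
    so the lexicographically largest name is the most recent archive.
    """
    archives = sorted(parent.glob(f"{name}-*.tar.bz2"))
    if not archives:
        raise FileNotFoundError(f"no {name}-*.tar.bz2 archives in {parent}")
    return archives[-1]


def restore_latest(parent: Path, name: str = "blog") -> Path:
    """Replace parent/<name> with the contents of the newest archive."""
    archive = latest_archive(parent, name)
    live = parent / name
    if live.exists():
        shutil.rmtree(live)  # throw away the broken live copy
    # The archive was made from parent with `tar cjf ... blog`,
    # so its members already start with "blog/".
    with tarfile.open(archive, "r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):  # Pythons with extraction filters
            tar.extractall(parent, filter="data")
        else:
            tar.extractall(parent)
    return archive
```

On the server this would be called as restore_latest(Path("/var/www")).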

First you need to install unison on both computers involved in the synchronization. I use the latest stable version (at the time of writing), 2.9.1. On Windows I put the unison.exe binary in a folder listed in the %PATH% variable (I use c:\tools for such programs) so that I can run it from the command line. On Linux it also must be installed in an appropriate location (/usr/local/bin in my case).

Unison can use the secure ssh protocol for transfers, but that requires an external ssh client and I couldn't make it work with any client I tried (putty and a few Windows ports of ssh), so instead I use the less secure, but working, socket mode. Be aware that socket mode neither encrypts nor authenticates anything: file contents cross the network in the clear, and anyone who can reach the port can connect to the server process. Before starting the sync process, unison must be started on the destination server as unison -socket port, where port is the TCP port it should listen on.

The first synchronization must be done manually because unison asks a few questions. Create the destination folder (/var/www/blog-working) and run:

unison c:\web\blog socket://server.com:port//var/www/blog-working
Unison should print a message saying that this is the first time these folders are being synchronized; press Enter to acknowledge it. Then it should show:
local server.com dir ----> www [f]
The arrow shows the direction of change propagation (since the files are only present locally, they'll be propagated from the local directory to the directory on the server). Unison waits for confirmation; the option in square brackets ([f]) is the default action and means "follow unison's recommendation", so just press Enter. Then unison asks:
Proceed with propagating updates? []
There's no default action, so type 'y', press Enter and watch the files being copied. Now test that synchronization works: make a small change locally (e.g. add, delete or modify a file) and execute:
unison -batch c:\web\blog socket://server.com:port//var/www/blog-working -force c:\web\blog
The -batch option runs unison in batch mode, i.e. without prompting the user, which is necessary when the command runs from a script. The -force option makes the sync one-way (as opposed to bidirectional): changes are propagated from the local folder c:\web\blog to the /var/www/blog-working folder on the server, and not the other way around.
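If you would rather drive the sync from a script than from a .bat file, the command line is easy to assemble programmatically. This is a minimal Python sketch, assuming unison is on PATH and the server side is already listening (unison -socket port); socket_root, one_way_sync_args and one_way_sync are hypothetical helper names, not part of unison.

```python
import shutil
import subprocess


def socket_root(host: str, port: int, path: str) -> str:
    """Build a unison socket root such as socket://server.com:PORT//var/www/x.

    An absolute remote path keeps its leading slash, which is why the
    result has a double slash right after the port.
    """
    return f"socket://{host}:{port}/{path}"


def one_way_sync_args(local: str, remote: str) -> list[str]:
    """Arguments for a non-interactive sync forced from the local root."""
    return ["unison", "-batch", local, remote, "-force", local]


def one_way_sync(local: str, remote: str) -> None:
    """Run unison; raises CalledProcessError on a non-zero exit status."""
    if shutil.which("unison") is None:
        raise FileNotFoundError("unison is not on PATH")
    subprocess.run(one_way_sync_args(local, remote), check=True)
```

A call would look like one_way_sync(r"c:\web\blog", socket_root("server.com", 5555, "/var/www/blog-working")), where 5555 stands in for whatever port the server was started with.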

After syncing the working copy of the blog files we need to safely copy them to the final destination. For that we need plink.exe (part of putty, a free Windows telnet/ssh client). The magic incantation is:

plink -pw password user@server.com "cd /var/www && tar cjvf blog-`date +%y-%m-%d_%H-%M`.tar.bz2 blog && cp -R blog-working blog-tmp && rm -rf blog && mv blog-tmp blog"
This executes on the server all the commands needed to archive the live directory into a compressed archive (tar's -j option selects bzip2 compression) and to replace the live site with the working copy synchronized via unison.
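For readers who find the shell one-liner hard to follow, here is a hypothetical Python equivalent of the same four steps (archive, copy, remove, rename), meant to run on the server in place of the shell commands. It is only a sketch, not what the article uses, and it assumes /var/www/blog and /var/www/blog-working both exist.

```python
import shutil
import tarfile
import time
from pathlib import Path


def archive_and_swap(parent: Path, name: str = "blog") -> Path:
    """Archive parent/<name>, then replace it with parent/<name>-working.

    Same steps as the shell commands:
      tar cjf <name>-STAMP.tar.bz2 <name>; cp -R <name>-working <name>-tmp;
      rm -rf <name>; mv <name>-tmp <name>
    Returns the path of the new archive.
    """
    live = parent / name
    working = parent / f"{name}-working"
    tmp = parent / f"{name}-tmp"
    stamp = time.strftime("%y-%m-%d_%H-%M")  # same format as date +%y-%m-%d_%H-%M
    archive = parent / f"{name}-{stamp}.tar.bz2"

    # 1. Back up the live site; members are stored under "<name>/".
    with tarfile.open(archive, "w:bz2") as tar:
        tar.add(live, arcname=name)
    # 2. Copy the working copy next to the live site.
    if tmp.exists():
        shutil.rmtree(tmp)  # leftover from an interrupted earlier run
    shutil.copytree(working, tmp)
    # 3. Swap: delete the old site and move the new copy into place.
    shutil.rmtree(live)
    tmp.rename(live)
    return archive
```

Building the new copy under a temporary name first keeps the window in which the site is missing down to one delete and one rename.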

Tip for Windows .bat files: inside a .bat file '%' marks variable references, so to pass a literal '%' to a command (as required by the date format string) you need to write it as '%%'.
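The rule is mechanical enough to express as a one-line transformation. bat_escape is a hypothetical helper name, and the sample string is the date format used in the plink command.

```python
def bat_escape(line: str) -> str:
    """Double every '%' so a .bat file hands it to the command literally.

    Inside batch files cmd.exe reads %NAME% as a variable reference,
    so a literal percent sign must be written as %%.
    """
    return line.replace("%", "%%")


print(bat_escape("date +%y-%m-%d_%H-%M"))  # date +%%y-%%m-%%d_%%H-%%M
```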

Put those two commands (the unison -batch line and the plink line) in a batch file, e.g. deploy_blog.bat, and you have a one-step, easy and efficient way to maintain a web site. From time to time you should probably delete the old blog-*.tar.bz2 backup archives.
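Cleaning up old archives can be automated as well. This hypothetical Python helper keeps only the newest few archives; the default of 5 is an arbitrary example, and the code assumes the blog-YY-MM-DD_HH-MM.tar.bz2 naming produced by the deployment script.

```python
from pathlib import Path


def prune_archives(parent: Path, name: str = "blog", keep: int = 5) -> list[Path]:
    """Delete all but the `keep` newest <name>-*.tar.bz2 archives in parent.

    Relies on the %y-%m-%d_%H-%M stamp sorting chronologically as text.
    Returns the deleted paths, oldest first.
    """
    archives = sorted(parent.glob(f"{name}-*.tar.bz2"))
    # archives[:-0] would be empty, so keep <= 0 is handled explicitly.
    doomed = archives[:-keep] if keep > 0 else archives
    for path in doomed:
        path.unlink()
    return doomed
```

Running prune_archives(Path("/var/www")) right after each deployment keeps the backup folder from growing without bound.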

Using Unison for incremental backups.

Unison can just as well be used for incremental backups. Let's assume a simple scenario: backing up an existing c:\backup folder to an f:\backup folder. As before, first create the destination folder f:\backup. The first time, execute:

unison c:\backup f:\backup

and, as before, press Enter, 'y', Enter. After that, periodically execute:

unison -batch c:\backup f:\backup -force c:\backup
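To make clear what the forced one-way sync leaves in f:\backup, here is a plain-Python sketch of one-way mirroring. It is not how unison works internally (unison keeps its own archive of file state); it only illustrates the end result: new and changed files are copied, and files deleted from the source disappear from the backup too. The change test (size or modification time differs) is an assumption chosen for simplicity.

```python
import os
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> list[str]:
    """Make dst an exact copy of src, copying only new or changed files.

    A file counts as changed when its size or modification time differs.
    Files and folders missing from src are deleted from dst.
    Returns the relative paths that were copied.
    """
    copied = []
    dst.mkdir(parents=True, exist_ok=True)
    for root, _dirs, files in os.walk(src):
        rel = Path(root).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            s, d = Path(root) / name, dst / rel / name
            s_stat = s.stat()
            if d.exists():
                d_stat = d.stat()
                if (d_stat.st_size, d_stat.st_mtime_ns) == (s_stat.st_size, s_stat.st_mtime_ns):
                    continue  # unchanged since the last run
            shutil.copy2(s, d)  # copy2 preserves mtime, so the next run can skip it
            copied.append((rel / name).as_posix())
    # Bottom-up pass: remove whatever no longer exists in src.
    for root, dirs, files in os.walk(dst, topdown=False):
        rel = Path(root).relative_to(dst)
        for name in files:
            if not (src / rel / name).is_file():
                (Path(root) / name).unlink()
        for name in dirs:
            if not (src / rel / name).is_dir():
                shutil.rmtree(Path(root) / name)
    return copied
```

Note that propagating deletions means a file removed by mistake from c:\backup is also gone from f:\backup after the next run.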

Links

  • Unison - software being described
  • putty - telnet/ssh client for Windows. Includes plink.exe, needed if we want to fully automate some scenarios using unison.
  • rsync - similar software, but with fewer features and not working well on Windows
  • cygwin - required to run rsync