Unison vs. Dropbox

A mature project by the name of Unison purports to synchronise directories between a number of computers. It’s cross-platform (at least Windows, Mac and Linux) so it seemed suitable for a test run. Given my adventures with Dropbox, then SpiderOak, this looked promising.

Unison differs in some important respects from both SpiderOak and Dropbox. Firstly, there’s no remote backup (a.k.a. storage in the “cloud”): you can synchronise directories across many machines but you have to have at least some form of access to them (usually SSH). Secondly, Unison doesn’t run as a daemon like SpiderOak and Dropbox do. Those two launch transfers based on input/output (I/O) activity (i.e. you’ve added or removed a file in the synced directories); Unison doesn’t do this on its own. Thirdly, Unison doesn’t do versioning, so you can’t view the changes to a file over a given time. In SpiderOak’s case, this versioning goes back forever whilst Dropbox does so only for the last thirty days. These limitations can be overcome through the use of additional tools (see here for more info), notably file monitoring tools.

Instead of this approach, however, I decided a more straightforward approach would suffice. I have essentially three machines on which I would like to synchronise a directory (and its contents). I decided that a star topology would work best, which is to say one of the machines would act as the master copy and the two other clients would connect to it to upload and download files. The advantage of this approach is that I need only run an SSH server on one machine; the clients need only have SSH clients on them. Since one of these machines is a laptop and the other is behind a corporate firewall, this made set up a lot easier.

The first thing to note is that in order for this to act as Dropbox-like as possible, key-based SSH logins are necessary. Once I’d successfully set up key-based logins for each client machine to the master, then setting up Unison was pretty straightforwards. Their documentation is actually pretty good, and I was able to set up the profiles on the Laptop and the Desktop machine with little hassle.

One point worth making is that Unison versions on the clients in the star network must be close to (or exactly the same as) that on the master in the network. Apparently there’s some differences in the way Unison checks for new files between versions. I’ve used versions 2.40.63 (Linux), 2.40.61 (Mac and Windows), and I haven’t received any error messages about conflicts. On the Windows machine, it was easiest to use the Cygwin version of Unison with Cygwin’s version of OpenSSH too. I didn’t have much luck with the GUI tool on any architecture. In fact, it was much easier to use the command-line and text files.

To set up a profile, Unison expected a .prf file in $HOME/.unison with the following format:

# Some comments, if you want
root = C:UsersslacksetUnison
root = ssh://user@remote-master.com:2222//home/slackset/Unison
sshargs = -C

As you can see, the syntax is pretty simple. Note the // after the remote-master.com which specifies the SSH port number. Omit the :2222 if using the default (22). This will synchronise the contents of C:UsersslacksetUnison on a Windows machine with a target of /home/slackset/Unison on the master. The process is the same on a Mac, but the .prf files live in $HOME/Library/Application Data/Unison. You can create as many profiles as you want, something more akin to functionality in SpiderOak, but missing in Dropbox (which can synchronise only one directory).

There’s a number of options for the command-line invocation of Unison to run this in a daemon-like manner:

/usr/bin/unison ProfileName -contactquietly -batch -repeat 180 -times -ui text -logfile /tmp/unison-“$(date +%Y-%m-%d).log”

The important flags are -batch and -repeat n which forces the syncronisation to occur without prompts and which will repeat every n seconds (in my case, 180, or three minutes). If you omit -logfile with some target, a unison.log will be left in your home directory (which is annoying). I put this in the crontab with the @reboot keyword (see man 5 crontab) on the Windows (through Cygwin) and Mac so every three minutes my files are synchronised. That’s not quite instantaneous, but if I’m feeling impatient, I created an alias to run that command without the -repeat 180:

alias syncme=’/usr/bin/unison ProfileName -contactquietly -batch -times -ui text -logfile /tmp/unison-“$(date +%Y-%m-%d).log”‘

It spits out a list of the files which will be updated (either uploaded or downloaded) to standard output. I could bind this to a keyboard shortcut (with AutoHotKey on Windows, for example) or as an icon somewhere, but since I have a terminal open all the time, it seems easier to just type syncme when I’m impatient.

So far, this is working pretty well over Dropbox, but I do miss the fallback of having versioning. I may eventually get around to setting up git on the master in the star network, which would give me good versioning, I think. Something for a rainy day, perhaps.

Dropbox vs. SpiderOak

There was a hoohah recently on the internet with regards Dropbox and its Terms of Service and the manner in which they encrypt your data on their servers. As a result, I heard talk of alternative systems for syncing files across a few systems. The prerequisites were it needed to be cross-platform (Linux, Windows and Mac), preferably at least the same or better encryption that Dropbox has and it needed to be relatively easy to use.

Top of the list of alternatives was SpiderOak, a service principally aimed at online backup (more often known as storing your data in the cloud these days) but which also provides functionality to sync backed up folders across several machines.

First problem came in the installation of SpiderOak on Slackware. Their website provides a package for Slackware 12.1. At the time, Slackware was 32 bit only, so there’s no 64 bit package available. There are 64 bit packages available for a number of other distros, including Debian, Fedora, CentOS, openSUSE and Ubuntu. I decided to avoid the Slackware package since I wasn’t sure it would play nicely on a more recent (13.37) version of Slackware. Instead, I set about writing a SlackBuild script to repackage one of the other distro’s packages. In the end, I settled on the Ubuntu .debs.

I modified the autotools SlackBuilds.org template to unpack the .deb files instead of the more usual tarballs. Running the SlackBuild produced a package, but after I installed it I received the following error message:

Traceback (most recent call last):
File “<string>”, line 5, in <module>
zipimport.ZipImportError: not a Zip file: ‘/usr/lib/SpiderOak/library.zip’

I found the relevant file in /usr/lib and the file utility confirmed it was not a zip file. After much head scratching, it turned out that the strip lines in the standard SlackBuild significantly alter a number of files in the SpiderOak package. After removing those lines, the package built, installed and ran as expected. edit: If you’re looking for a copy of the SpiderOak SlackBuild, I’ve uploaded it here, with the doinst.sh, spideroak.info and slack-desc too.

The next step was migrating all my data from Dropbox to SpiderOak. SpiderOak allows you to define a number of folders to back up, unlike Dropbox where only a single folder can be synced. I created a new diretory (~/SpiderOak seemed fitting) and copied the contents of my Dropbox folder over to the new SpiderOak folder. I added this folder as a folder to back up in SpiderOak and let it upload the files.

I changed all the relevant scripts which used to dump data into my Dropbox folder to do so into the new SpiderOak folder instead. After setting up similar folders on my desktop (Windows) and my Mac (laptop), I was able to create a sync whereby all three versions would be synced with one another, effectively emulating the Dropbox functionality.

After a few days of dumping data into the folder, all seemed well. A few days after that, however, I started to get worried that my 2GBs of free storage on the SpiderOak servers was filling up rapidly. After some investigation, it became apparent that the manner in which SpiderOak works is slightly, though in my case, crucially different, and it relates to the way in which overwritten or deleted files are handled.

Dropbox stores old versions of files for up to 30 days after they have been overwritten or deleted. This allowed my frequent writes to not fill up all my space. Conversely, SpiderOak do not ever discard overwritten or deleted files. This is a nice feature if you accidentally delete an important file. However, for my purposes, it merely served to swallow my free 2GBs of storage in pretty short order.

It is possible to empty the deleted items for a given profile (i.e. each machine currently being backed up to the SpiderOak servers). Irritatingly, however, it is not possible to empty the deleted items for one of the machines from a different machine; you can view the deleted files and how much space they’re taking, but you can only delete them from the machine from which they were deleted. Since most of my files are created and deleted on my headless server, using VNC to open the SpiderOak GUI and emptying the files every couple of days quickly lost its appeal. I searched through the SpiderOak FAQs and documentation (such that I could find), and only found one option to do this automatically from a machine on the command line. The command is SpiderOak –empty-garbage-bin, though it is accompanied by this dire warning: 

Dangerous/Support Commands: 

Caution: Do not use these commands unless advised by SpiderOak support. They can damage your installation if used improperly.

So, for my purposes, the Dropbox approach of removing anything you’ve deleted or modified after 30 days is much more usable. Since I don’t keep anything particularly sensitive nor critical on there, the hoohah about the privacy and encryption aren’t much of a concern to me. After a month of SpiderOak, I was fed-up of the requirement for me to delete all the deleted files over VNC, and so I moved everything back to Dropbox. I suppose the lesson there is “if it ain’t broke, don’t fix it”.