Automatic upload speed throttling

As I’ve discussed before, the broadband service we receive from Virgin is subject to caps on the traffic on the line during certain times of day (see this post I wrote). Virgin have a relatively complex system for throttling your speed based on the amount you download and upload within certain times. I have a script which performs a backup of my data from my server at home to my work computer using rsync over SSH, which means I sometimes have a large amount of data to upload during the day. Virgin’s upload restrictions apply between 1500 and 2000, during which time you are permitted to upload a maximum of 1.5GB. This means that for those five hours the average transfer rate must not exceed 300MB per hour, which equates to a transfer speed of 83.3kB/s. I searched for a way to limit my server’s upload speed and came across tc, which relies on traffic control support in the kernel (so it must be enabled in your .config). Slackware’s kernel config includes the necessary parts as modules, so when you use the tc command, they are loaded automatically.
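For the record, the arithmetic behind that figure is just the cap spread evenly over the window:

# 1.5GB over the 5-hour window (1500-2000), taking 1.5GB as 1,500,000kB
echo "scale=1; 1500000 / (5 * 3600)" | bc
# => 83.3 (kB/s)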

I found a good example which described what I wanted to achieve (limit my uploads to 83.3 kB/s) here. Using the examples on that page, I wrote a little script to allow me to start and stop the limited uploads easily:

#!/bin/bash

# Script to throttle uploads during the times when Virgin
# won’t allow unlimited uploads.
#
# Between 1500 and 2000 no more than 1.5GB may be
# uploaded, so the upload speed needs to be capped
# at 83.3kB/s.
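# Note: in tc's units, "kbps" means kilobytes per second ("kbit"
# would be kilobits), so maxRate=83.3kbps matches that target.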

maxRate=83.3kbps
burstRate=100000kbps
interface=eth0

clearRule(){
        # First things first, clear any old rule (tc will complain
        # harmlessly if there isn't one)
        tc qdisc del dev $interface root
}

makeRule(){
        # Now add the throttling rule
        tc qdisc add dev $interface root handle 1:0 htb default 10
        tc class add dev $interface parent 1:0 classid 1:10 htb rate $maxRate ceil $burstRate prio 0
}

listRules(){
        tc -s qdisc ls dev $interface
}

case "$1" in
        'start')
                clearRule
                makeRule
        ;;
        'stop')
                clearRule
        ;;
        'status')
                listRules
        ;;
        *)
                echo "Usage: $0 {start|stop|status}"
        ;;
esac
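Run by hand, usage looks like this (assuming the script is saved as /root/throttle_uploads.sh and made executable):

/root/throttle_uploads.sh start    # apply the cap
/root/throttle_uploads.sh status   # show the current qdisc and its statistics
/root/throttle_uploads.sh stop     # remove the cap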

You’ll note I’ve set a $burstRate value of 100MB/s. This is probably not necessary, but with $burstRate set to the same value as $maxRate, I was seeing a significant slowdown in the responsiveness of the remote session; I hope this high burst rate will alleviate that.

I saved this script somewhere only root could read it and then added the following cronjobs to root’s crontab:

# Add networking throttling between 1500 and 2000
00 15 * * * /root/throttle_uploads.sh start
00 20 * * * /root/throttle_uploads.sh stop

So far, the process appears to be working, insofar as my daily uploads from home to work have slowed to approximately 80kB/s during the times when Virgin monitor uploads.

Free space with freedup

A useful tool I came across recently is freedup. If you’re like me, you have potentially many copies of the same file in a number of locations on your disk (this is particularly true of me because I have version-controlled backups of my most important files). Whilst multiple copies of the same file make restoring from older copies easy, they also make chewing through disk space easy. To solve this, it’s possible to link two identical files together so they share the same data on the disk. In essence, a file name is really just a pointer to where the data lives on the disk, so it’s easy to have two file names point at the same data rather than keeping two copies of it. These types of links are called hard links.
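As a quick illustration (the file names here are just examples), linking by hand and then checking the inode numbers shows that both names refer to the same data on disk:

# Create a file and a hard link to it
echo "some data" > original.txt
ln original.txt copy.txt

# Both names now share one inode (and one copy of the data);
# the link count in the second column reads 2
ls -li original.txt copy.txt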

Whilst it’s possible to find all duplicates manually and link them together (through the ln command), it’s tedious if you have hundreds or thousands of duplicates. That’s where freedup comes in handy. It can search through specified paths, find all duplicates and hard link them together for you, telling you how much space you’re saving in the process. A typical command might look like:

freedup -c -d -t -0 -n ./

where -c counts the disk space saved by each link, -d forces modification times to be identical, and -t and -0 disable the use of the external hash functions. Most importantly at this stage, -n forces freedup to perform a dummy run. Piping the output into a file or a pager (like less) means you can verify it’s looking in the right places and linking what you expected. Remove the -n, rerun the command, and it’ll link the files identified in the dummy run. In my case this saved several gigabytes on my external disk, which is not to be sniffed at.
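In other words, the workflow is a dummy run first, then the real thing:

freedup -c -d -t -0 -n ./ | less   # dummy run: check what would be linked
freedup -c -d -t -0 ./             # real run: actually create the hard links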

Unison vs. Dropbox

A mature project by the name of Unison purports to synchronise directories between a number of computers. It’s cross-platform (at least Windows, Mac and Linux) so it seemed suitable for a test run. Given my adventures with Dropbox, then SpiderOak, this looked promising.

Unison differs in some important respects from both SpiderOak and Dropbox. Firstly, there’s no remote backup (a.k.a. storage in the “cloud”): you can synchronise directories across many machines but you have to have at least some form of access to them (usually SSH). Secondly, Unison doesn’t run as a daemon like SpiderOak and Dropbox do. Those two launch transfers based on input/output (I/O) activity (i.e. you’ve added or removed a file in the synced directories); Unison doesn’t do this on its own. Thirdly, Unison doesn’t do versioning, so you can’t view the changes to a file over a given time. In SpiderOak’s case, this versioning goes back forever whilst Dropbox does so only for the last thirty days. These limitations can be overcome through the use of additional tools (see here for more info), notably file monitoring tools.
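For what it’s worth, such a wrapper needn’t be complicated; a minimal sketch using inotifywait from inotify-tools (the profile name and path are placeholders, and I haven’t gone down this route myself):

#!/bin/bash
# Re-run Unison whenever something changes under the synced directory
while inotifywait -r -e modify,create,delete,move "$HOME/Unison"; do
        unison ProfileName -batch -ui text
done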

Instead of this approach, however, I decided something more straightforward would suffice. I have essentially three machines on which I would like to synchronise a directory (and its contents). I decided that a star topology would work best, which is to say one of the machines acts as the master copy and the other two clients connect to it to upload and download files. The advantage of this approach is that I need only run an SSH server on one machine; the others need only SSH clients. Since one of those machines is a laptop and the other is behind a corporate firewall, this made setting things up a lot easier.

The first thing to note is that in order for this to behave as much like Dropbox as possible, key-based SSH logins are necessary. Once I’d successfully set up key-based logins from each client machine to the master, setting up Unison was pretty straightforward. Their documentation is actually pretty good, and I was able to set up the profiles on the laptop and the desktop machine with little hassle.
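Setting up the keys boils down to something like the following (user, host and port are placeholders; on Windows this is run from a Cygwin shell):

# Generate a key pair on the client (an empty passphrase for unattended
# syncs, or use ssh-agent if you prefer)
ssh-keygen -t rsa

# Append the public key to the master's authorized_keys
cat ~/.ssh/id_rsa.pub | ssh -p 2222 user@remote-master.com \
        'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'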

One point worth making is that the Unison versions on the clients in the star network must be close to (or exactly the same as) the version on the master. Apparently there are some differences in the way Unison checks for new files between versions. I’ve used versions 2.40.63 (Linux) and 2.40.61 (Mac and Windows) and haven’t received any error messages about conflicts. On the Windows machine, it was easiest to use the Cygwin version of Unison along with Cygwin’s OpenSSH. I didn’t have much luck with the GUI tool on any platform; in fact, it was much easier to use the command line and text files.

To set up a profile, Unison expects a .prf file in $HOME/.unison with the following format:

# Some comments, if you want
root = C:\Users\slackset\Unison
root = ssh://user@remote-master.com:2222//home/slackset/Unison
sshargs = -C

As you can see, the syntax is pretty simple. The :2222 after remote-master.com specifies the SSH port number; omit it if using the default (22). The // before the path makes it an absolute path on the master. This profile will synchronise the contents of C:\Users\slackset\Unison on a Windows machine with /home/slackset/Unison on the master. The process is the same on a Mac, but the .prf files live in $HOME/Library/Application Data/Unison. You can create as many profiles as you want, something more akin to the functionality in SpiderOak, but missing in Dropbox (which can synchronise only one directory).
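For comparison, the matching profile on the Mac laptop is the same idea with a local path (the profile name sync.prf and the paths are only examples):

# ~/Library/Application Data/Unison/sync.prf
root = /Users/slackset/Unison
root = ssh://user@remote-master.com:2222//home/slackset/Unison
sshargs = -C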

There are a number of options for the command-line invocation of Unison that let it run in a daemon-like manner:

/usr/bin/unison ProfileName -contactquietly -batch -repeat 180 -times -ui text -logfile /tmp/unison-"$(date +%Y-%m-%d).log"

The important flags are -batch and -repeat n, which force the synchronisation to occur without prompts and repeat it every n seconds (in my case 180, or three minutes). If you omit -logfile and a target, a unison.log file will be left in your home directory (which is annoying). I put this in the crontab with the @reboot keyword (see man 5 crontab) on the Windows (through Cygwin) and Mac machines, so every three minutes my files are synchronised; the entry itself is shown below, after the alias. That’s not quite instantaneous, so for the times when I’m feeling impatient I created an alias to run the same command without the -repeat 180:

alias syncme='/usr/bin/unison ProfileName -contactquietly -batch -times -ui text -logfile /tmp/unison-"$(date +%Y-%m-%d).log"'

It spits out a list of the files which will be updated (either uploaded or downloaded) to standard output. I could bind this to a keyboard shortcut (with AutoHotKey on Windows, for example) or as an icon somewhere, but since I have a terminal open all the time, it seems easier to just type syncme when I’m impatient.
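For completeness, the crontab entry itself is just the repeating command behind the @reboot keyword; note that percent signs are special in crontab and must be escaped:

# Start the repeating sync once per boot
@reboot /usr/bin/unison ProfileName -contactquietly -batch -repeat 180 -times -ui text -logfile /tmp/unison-"$(date +\%Y-\%m-\%d).log"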

So far, this is working pretty well as a replacement for Dropbox, but I do miss the fallback of having versioning. I may eventually get around to setting up git on the master in the star network, which would give me good versioning, I think. Something for a rainy day, perhaps.

Dropbox vs. SpiderOak

There was a hoohah on the internet recently with regard to Dropbox, its Terms of Service and the manner in which they encrypt your data on their servers. As a result, I heard talk of alternative systems for syncing files across a few machines. The prerequisites were that it needed to be cross-platform (Linux, Windows and Mac), to offer at least the same or better encryption than Dropbox has, and to be relatively easy to use.

Top of the list of alternatives was SpiderOak, a service principally aimed at online backup (more often known as storing your data in the cloud these days) but which also provides functionality to sync backed up folders across several machines.

The first problem came with installing SpiderOak on Slackware. Their website provides a package for Slackware 12.1; at the time, Slackware was 32 bit only, so there is no 64 bit package available. There are 64 bit packages for a number of other distros, including Debian, Fedora, CentOS, openSUSE and Ubuntu. I decided to avoid the Slackware package since I wasn’t sure it would play nicely with a more recent (13.37) version of Slackware. Instead, I set about writing a SlackBuild script to repackage one of the other distros’ packages. In the end, I settled on the Ubuntu .debs.
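Unpacking a .deb inside a SlackBuild is simple enough, since a .deb is just an ar archive; a minimal sketch (the filename is a placeholder, and newer .debs ship data.tar.xz rather than data.tar.gz):

# Extract the payload of the .deb straight into the package tree $PKG
ar p spideroak_*.deb data.tar.gz | tar xzvf - -C $PKG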

I modified the autotools SlackBuilds.org template to unpack the .deb files (along the lines of the snippet above) instead of the more usual tarballs. Running the SlackBuild produced a package, but after I installed it I received the following error message:

Traceback (most recent call last):
  File "<string>", line 5, in <module>
zipimport.ZipImportError: not a Zip file: '/usr/lib/SpiderOak/library.zip'

I found the relevant file in /usr/lib and the file utility confirmed it was not a zip file. After much head scratching, it turned out that the strip lines in the standard SlackBuild significantly alter a number of files in the SpiderOak package. After removing those lines, the package built, installed and ran as expected. edit: If you’re looking for a copy of the SpiderOak SlackBuild, I’ve uploaded it here, with the doinst.sh, spideroak.info and slack-desc too.
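For reference, the strip block in SlackBuild templates of that era looks something like the following (the exact wording varies between templates); this is the sort of thing I removed:

# Strip binaries and shared objects in the package tree
( cd $PKG
  find . | xargs file | grep "executable" | grep ELF | cut -f 1 -d : | \
    xargs strip --strip-unneeded 2> /dev/null
  find . | xargs file | grep "shared object" | grep ELF | cut -f 1 -d : | \
    xargs strip --strip-unneeded 2> /dev/null
)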

The next step was migrating all my data from Dropbox to SpiderOak. SpiderOak allows you to define a number of folders to back up, unlike Dropbox where only a single folder can be synced. I created a new directory (~/SpiderOak seemed fitting) and copied the contents of my Dropbox folder over to it. I added this as a folder to back up in SpiderOak and let it upload the files.

I changed all the relevant scripts which used to dump data into my Dropbox folder to do so into the new SpiderOak folder instead. After setting up similar folders on my desktop (Windows) and my Mac (laptop), I was able to create a sync whereby all three versions would be synced with one another, effectively emulating the Dropbox functionality.

After a few days of dumping data into the folder, all seemed well. A few days after that, however, I started to get worried that my 2GB of free storage on the SpiderOak servers was filling up rapidly. After some investigation, it became apparent that the manner in which SpiderOak works is slightly, though in my case crucially, different, and the difference lies in the way overwritten or deleted files are handled.

Dropbox stores old versions of files for up to 30 days after they have been overwritten or deleted, which meant my frequent writes didn’t fill up all my space. Conversely, SpiderOak never discards overwritten or deleted files. This is a nice feature if you accidentally delete an important file; for my purposes, however, it merely served to swallow my free 2GB of storage in pretty short order.

It is possible to empty the deleted items for a given profile (i.e. each machine currently being backed up to the SpiderOak servers). Irritatingly, however, it is not possible to empty the deleted items for one machine from a different machine; you can view the deleted files and how much space they’re taking up, but you can only purge them from the machine on which they were deleted. Since most of my files are created and deleted on my headless server, using VNC to open the SpiderOak GUI and empty the files every couple of days quickly lost its appeal. I searched through the SpiderOak FAQs and documentation (such as I could find) and only found one option to do this automatically from the command line. The command is SpiderOak --empty-garbage-bin, though it is accompanied by this dire warning:

Dangerous/Support Commands: 

Caution: Do not use these commands unless advised by SpiderOak support. They can damage your installation if used improperly.

So, for my purposes, the Dropbox approach of removing anything you’ve deleted or modified after 30 days is much more usable. Since I don’t keep anything particularly sensitive or critical on there, the hoohah about privacy and encryption isn’t much of a concern to me. After a month of SpiderOak, I was fed up with having to purge the deleted files over VNC, and so I moved everything back to Dropbox. I suppose the lesson is “if it ain’t broke, don’t fix it”.