rdiff-backup through gateways using ssh tunnels and remote-schema


Backing up your workstation to an off-site backup server is generally a good idea. However, when both the source and the target are behind firewalls, it can be tricky to set up, assuming that you do not want to reconfigure the firewalls to forward ports at either end.

In this article I demonstrate a way to achieve this using rdiff-backup, assuming that both machines (the workstation and the backup server) can be reached via ssh through their gateways. In this case both source and destination have an ssh server installed, but neither is directly accessible from the outside world. You can, however, reach each of them if you first ssh to its gateway and then ssh to the final destination.


(Note: The easy solution would be to set up some sort of port forwarding on one of the firewalls, but if the IT support is too incompetent to do so, I am afraid that is not an option.)

rdiff-backup is an excellent tool for performing incremental backups of your valuable data. It first copies all the data from a source directory to the target; on subsequent runs it records only the differences (diffs) for the files and directories that have changed and creates a new snapshot. Its advantage over plain synchronisation with a tool such as rsync is that it does not merely sync the two directories but also keeps the history of the changed files, in case you want to retrieve an older snapshot of your data. Finally, the tool would not be so useful if it could not operate over the network.

A typical rdiff-backup command would look like this:

rdiff-backup source destination

The source and destination can each be either a local directory (/path/to/directory) or a remote directory (hostname::/path/to/directory). The arguments resemble those used by scp, but you must use a double colon to separate the host name from the path when defining remote directories.
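To make the double-colon notation concrete, here is a small sketch that splits a location into its host and path parts. The function name split_location is my own invention for illustration; it only mimics the notation and is not how rdiff-backup itself parses its arguments.

```shell
# split_location SPEC: print the host and path parts of an
# rdiff-backup-style location. Illustration only, not rdiff-backup's parser.
split_location() {
    case $1 in
        # Remote form: everything before the first "::" is the host.
        *::*) printf 'host=%s\npath=%s\n' "${1%%::*}" "${1#*::}" ;;
        # No "::" at all: a plain local directory.
        *)    printf 'host=(local)\npath=%s\n' "$1" ;;
    esac
}

split_location workstation::/path/to/original/data
split_location /path/to/backup
```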

For the purposes of this discussion, let’s assume that the backup server initiates the backup, retrieves the snapshot and exits. The host name of the workstation holding the original data is workstation, and that of the remote gateway is remoteGateway. A command like this will fail because the workstation cannot be accessed directly with ssh.

rdiff-backup workstation::/path/to/original/data /path/to/backup

The trick is to use ssh tunnels to reach the workstation. On the backup server, open a terminal and run the command

ssh remoteGateway -L localhost:8000:workstation:22

This creates an endpoint for the workstation’s ssh server on the backup server (localhost), listening on port 8000 instead of 22, where ssh normally runs. Any traffic sent to localhost port 8000 is forwarded through the ssh tunnel between the backup server and remoteGateway, and from there on to the workstation. Hence, if you open another terminal and issue the following command, you will reach the workstation’s ssh server.

ssh -p8000 localhost
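The -L argument used for the tunnel packs four fields into one colon-separated string, which is easy to misread. The sketch below spells them out one by one. The variable names are mine, chosen only for illustration, and the command is printed rather than executed.

```shell
# Spell out the four colon-separated fields of the -L argument.
bind_address=localhost   # address the local end of the tunnel listens on
local_port=8000          # port to connect to on the backup server
target_host=workstation  # host name as seen from remoteGateway
target_port=22           # the workstation's ssh port

forward_spec="$bind_address:$local_port:$target_host:$target_port"

# Print the resulting command instead of running it.
echo "ssh remoteGateway -L $forward_spec"
```

Note that the target host name is looked up on the gateway's side of the tunnel, which is why a name that is unreachable from the backup server still works here.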

So now you can run rdiff-backup against localhost port 8000, which is actually the workstation. But you need one more trick to make it work: you have to tell rdiff-backup that the ssh server runs on port 8000. This is done with the --remote-schema switch. A command similar to this one will start the backup process.

rdiff-backup -v6 --print-statistics --remote-schema 'ssh -p8000 %s rdiff-backup --server' localhost::/path/to/original/data /path/to/backup
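The %s in the schema is a placeholder that rdiff-backup replaces with the host part of the remote location (here, localhost). A quick way to preview the command that will end up being run is to let printf do the same substitution. This is only a sketch of the idea, and the variable names are mine.

```shell
# The schema string passed to the remote-schema switch.
schema='ssh -p8000 %s rdiff-backup --server'
# The host part of localhost::/path/to/original/data.
host=localhost

# printf fills in %s much as rdiff-backup does with the host name.
printf "$schema\n" "$host"
```

The printed line is the command that connects through the tunnel and starts rdiff-backup in server mode on the workstation.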

To automate the process you can create a bash script that opens the tunnel for you and then runs the rdiff-backup command. You can put it in your crontab to schedule daily backups.

ssh remoteGateway -L localhost:8000:workstation:22 -N &
sshpid=$! # Remember the PID of the background tunnel so it can be killed later.
sleep 5 # Allow some time for the tunnel to get established.

rdiff-backup -v6 --print-statistics --remote-schema 'ssh -p8000 %s rdiff-backup --server' localhost::/path/to/original/data /path/to/backup

sleep 5 # Allow some time for rdiff-backup to exit.
kill "$sshpid"
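The fixed five-second sleep before the backup is only a guess at how long the tunnel takes to come up. As a possible refinement, the sketch below polls the local port until it accepts a connection. It assumes the script runs under bash, since it relies on bash's /dev/tcp redirection, and wait_for_port is a hypothetical helper of my own, not part of ssh or rdiff-backup.

```shell
#!/bin/bash
# wait_for_port HOST PORT TRIES: try once per second to open a TCP
# connection to HOST:PORT. Returns 0 as soon as one attempt succeeds,
# 1 if all TRIES attempts fail.
wait_for_port() {
    local host=$1 port=$2 tries=$3 i=0
    while [ "$i" -lt "$tries" ]; do
        # /dev/tcp/HOST/PORT is handled by bash itself, not a real file.
        # The subshell closes the connection again as soon as it exits.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}
```

In the script, a line such as wait_for_port localhost 8000 10 || exit 1 could then take the place of the first sleep, so the backup starts as soon as the tunnel is ready and is skipped if it never comes up.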

Posted in Computing, How To, Linux, Software by Christos