Tsync HOWTO

James W. Anderson

Revision History
Revision 0.8    2006-01-23    Revised by: jwa
updated to reflect new TLS authentication and cert generation
Revision 0.7    2005-08-29    Revised by: jwa
preliminary draft

WARNING: Tsync is still beta software. While it has never lost or corrupted any of our data in tests, the possibility exists that it could. We suggest that you do not trust the only copy of valuable data to Tsync until you have gained confidence in and understand the system.

Tsync is a user-level daemon that provides transparent synchronization for one or more data volumes (directory trees) amongst a set of computers. Tsync uses a peer-to-peer architecture for scalability, efficiency, and robustness, which ensures that each node remains connected with all other connected nodes. The overlay network also provides a scalable means by which a Tsync node can learn about other hosts, besides the bootstrap host with which it was configured. Tsync uses TLS for authentication and encryption.


Table of Contents
1. Preamble
1.1. Copyrights
1.2. Disclaimer
1.3. Acknowledgments
1.4. Feedback
2. Introduction
3. Downloading and Installing Tsync
3.1. Prerequisites
3.2. Downloading Tsync
3.3. Compiling Tsync
4. Configuring tsyncd
4.1. Synchronized Clocks
4.2. TLS RSA keys
4.3. tsyncd.conf
4.4. tsyncd.exports
5. Running Tsync
5.1. tsyncd
5.2. tsync

1. Preamble

1.1. Copyrights

This document is copyright (c) 2005, James W. Anderson. All rights reserved. You may redistribute this document under the terms of the GNU General Public License (GPL).


1.2. Disclaimer

THIS DOCUMENT AND THE SOFTWARE THAT IT DESCRIBES IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENT OR SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


1.3. Acknowledgments

I gratefully thank Google, Inc., for sponsoring part of the development of Tsync as a Google Summer of Code project. I would also like to thank Michael Moss, of Google, for his feedback about this document and aspects of Tsync.


1.4. Feedback

Please send feedback and corrections to James W. Anderson (j_w_anderson@users.sourceforge.net). We will be grateful for corrections and will incorporate them as soon as possible.


2. Introduction

This document describes Tsync (pronounced "sink"), which provides transparent synchronization across a set of machines for existing files and directories. A transparent synchronization system makes keeping a set of files consistent across many machines---possibly with differing degrees of connectivity and availability---as simple as possible while requiring minimal effort from the user and maintaining security, robustness to failure, and fast performance.

Traditional synchronization tools, such as the popular Rsync and Unison, require that the user manually synchronize her files after changing them. Moreover, these tools are designed to only synchronize a pair of hosts: if the user wishes to synchronize N machines, then she must run the tool N-1 times. Not only is it inefficient to unicast the same data N-1 times, but the user is also burdened with remembering to restart synchronizations that are interrupted and manually recovering failed hosts.

Tsync solves the problem of providing transparent synchronization under the assumption of optimistic consistency. Optimistic consistency assumes that the same file is not modified on two hosts at the same time. In the Tsync usage model, the user writes a simple configuration file, similar to /etc/exports, describing which directories should be synchronized, and listing one or more other hosts that are part of the Tsync group (this list does not have to contain all the hosts in the group). The user runs the Tsync daemon, tsyncd, on each machine in the group. Then, when the user creates, modifies, or deletes files on one machine, those changes are automatically propagated to all the others. For example, if the user adds a bookmark on her machine at the university, it is reflected on her desktops at home. If some of the computers are not connected at the time (for example, if her laptop is powered off), then the next time a disconnected machine regains connectivity, it automatically learns about the change and updates itself.

A synchronization system for widely distributed hosts faces scalability and reliability challenges. The system must gracefully scale to accommodate tens or even hundreds of hosts. Of course, to make managing the system simple, the user cannot be required to manually configure each host with every other host. Hosts must have a way of learning about other hosts, as well as efficiently distributing control messages and data to all other hosts. Furthermore, the system must automatically adapt as hosts are powered off, lose connectivity, or crash, and must rapidly re-synchronize these computers when they re-join. Similarly, adding new hosts should be a simple process, and they should rapidly be brought up-to-date.

The design of Tsync uses peer-to-peer and overlay techniques to provide scalable and efficient mechanisms for transparently synchronizing many hosts. Tsync organizes a user's machines into an overlay network with a tree topology. The overlay network, through probing and a root fail-over protocol, ensures that each node remains connected with all other connected nodes. The overlay network also provides a scalable means by which a Tsync node can learn about other hosts, besides the bootstrap host with which it was configured. The tree topology allows any Tsync host to efficiently multicast a message to all the other hosts. The overlay also handles authentication and encryption: hosts authenticate each other using RSA keys, and all data is encrypted using TLS.
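
To illustrate why a tree topology lets any host reach all the others with a single copy of the message per link, here is a minimal Python sketch. It is an illustration only, not the protocol that Tsync or Mace actually implements; the function multicast and the five host names are hypothetical.

```python
from collections import deque


def multicast(tree, origin):
    """Return the order in which hosts receive a message sent from origin.

    tree maps each host to the set of its neighbours (parent and children)
    in the overlay. Each host forwards the message on every tree edge except
    the one it arrived on, so every host receives it exactly once.
    """
    delivered = [origin]
    queue = deque([(origin, None)])  # (host, neighbour the message came from)
    while queue:
        host, came_from = queue.popleft()
        for neighbour in sorted(tree[host]):
            if neighbour != came_from:
                delivered.append(neighbour)
                queue.append((neighbour, host))
    return delivered


# Hypothetical five-host overlay: "a" is the root, "d" and "e" are leaves.
overlay = {
    "a": {"b", "c"},
    "b": {"a", "d", "e"},
    "c": {"a"},
    "d": {"b"},
    "e": {"b"},
}
```

Calling multicast(overlay, "d") delivers the message to all five hosts using four transmissions, one per tree edge, regardless of which host originates it.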


3. Downloading and Installing Tsync

3.1. Prerequisites

3.1.1. Running Tsync

To run Tsync, you need the following software and libraries:

  • Perl 5.6.0 or greater

  • Perl Frontier::Client module (part of the Frontier-RPC package)

  • libboost

  • libpthread

  • libm

  • libstdc++

  • libcrypto

  • OpenSSH

  • sendmail (or equivalent replacement)


3.1.2. Compiling Tsync

Compiling the Tsync source code requires the following software packages:

  • Perl 5.6.0 or greater

  • boost development headers

  • OpenSSL development headers

  • gcc/g++ version 3.4 or greater

  • lex/yacc


3.2. Downloading Tsync

Tsync is distributed under the GNU General Public License (GPL). The Tsync source code and binary packages are available from the Tsync SourceForge project pages:

http://sourceforge.net/projects/tsyncd/


3.3. Compiling Tsync

To compile Tsync from the source code package or from a CVS checkout, simply run make from the root directory. This will build the tsyncd executable.


4. Configuring tsyncd

The tsyncd daemon uses two configuration files: a daemon configuration file (which also includes any Mace options) and an exports file. By default, tsyncd looks for the configuration file named tsyncd.conf and the exports file named tsyncd.exports.

Tsync is built using Mace (http://mace.ucsd.edu), a language and toolkit for building distributed systems. The Tsync distribution ships with all the Mace components Tsync needs, so it is not necessary to download and install Mace separately. Several of the options, summarized here, are Mace related and are fully documented elsewhere.


4.1. Synchronized Clocks

Tsync requires that the clocks on all hosts participating in the Tsync group be synchronized to within 1 second of each other. We recommend running the Network Time Protocol Daemon (ntpd) to accomplish this. On Red Hat-based Linux distributions, NTP can be enabled by running system-config-time.
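
If you have measured each host's clock offset against a common reference (for example, from an NTP query), the 1 second requirement can be checked with a few lines of Python. This is a sketch under that assumption; the function names and the offsets dictionary are hypothetical and are not part of Tsync.

```python
def max_skew(offsets):
    """Largest pairwise clock difference, in seconds.

    offsets maps hostname -> that host's clock offset in seconds relative
    to a common reference clock.
    """
    values = list(offsets.values())
    return max(values) - min(values) if values else 0.0


def clocks_ok(offsets, tolerance=1.0):
    """True if every pair of hosts is within tolerance seconds of each other."""
    return max_skew(offsets) <= tolerance
```

For instance, clocks_ok({"host1.foo.com": 0.25, "host2.bar.com": -0.5}) is true, because the two clocks differ by 0.75 seconds.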


4.2. TLS RSA keys

Tsync uses RSA public key cryptography to authenticate hosts and establish encryption keys. Each host needs its own certificate (public and private key pair) that is signed by a common certificate authority (CA) certificate. The Tsync distribution contains a utility gencert found in src/mace/application/ that will create a CA certificate and host certificates. First, create a CA certificate.

./gencert cacert > ca.pem

Next, create a certificate (signed by your CA certificate) for each host on which you will be running Tsync. For hosts with fixed hostnames, the hostname argument must exactly match the DNS name associated with the IP address for each host. Otherwise, for mobile hosts, hosts behind a NAT, or hosts with dynamic IP addresses, the hostname is used as a unique identifier for that host and will need to be included in the MACE_NO_VERIFY_HOSTNAMES parameter, described in Section 4.3.2.

./gencert signedcert ca.pem host1.foo.com > host1.pem
./gencert signedcert ca.pem host2.bar.com > host2.pem
...

Finally, copy the CA certificate and the respective host certificate to each of your computers.

scp ca.pem host1.pem host1.foo.com:~/.tsync_certs
scp ca.pem host2.pem host2.bar.com:~/.tsync_certs
...
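
With more than a few hosts, it can be convenient to generate these command lines rather than type them. The following Python sketch only builds the commands shown above as strings; it does not run them. The function name cert_commands is hypothetical, and naming each certificate after the first label of the hostname (host1.foo.com becomes host1.pem) is simply the convention used in the examples above. Hostnames are assumed to contain no shell metacharacters.

```python
def cert_commands(hosts, ca_file="ca.pem", cert_dir="~/.tsync_certs"):
    """Return the shell commands that create and distribute certificates."""
    commands = [f"./gencert cacert > {ca_file}"]
    certs = {}
    for host in hosts:
        # Name the certificate after the first label of the hostname.
        certs[host] = host.split(".")[0] + ".pem"
        commands.append(
            f"./gencert signedcert {ca_file} {host} > {certs[host]}"
        )
    for host in hosts:
        # Each host receives the CA certificate and its own certificate.
        commands.append(f"scp {ca_file} {certs[host]} {host}:{cert_dir}")
    return commands
```

Review the generated commands before running them from src/mace/application/, since the host certificate files contain private keys.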

4.3. tsyncd.conf

The daemon configuration file syntax consists of name = value pairs, with one entry per line. Any text following the comment symbol # will be ignored.
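
The syntax described above is simple enough to check with a short script before starting the daemon. The following Python sketch reads name = value pairs and strips # comments. It is not the parser that tsyncd uses; the function name parse_conf is hypothetical, and details such as how tsyncd treats duplicate names are not covered here.

```python
def parse_conf(text):
    """Parse tsyncd.conf-style text into a dict of name -> value strings."""
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comment and whitespace
        if not line:
            continue  # blank line or comment-only line
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected 'name = value'")
        settings[name.strip()] = value.strip()  # a later entry overrides
    return settings
```

Values are kept as strings, so a space separated list such as the servers parameter can be split afterwards with value.split().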

The syntax for specifying hosts is as follows: local-hostname-or-ip[:local-port][/proxy-hostname-or-ip[:proxy-port]]. In the simplest form, a host is specified as a hostname (foo.bar.com) or an IP address (1.2.3.4). If the host's base port (MACE_PORT) is different from the local host's base port, then the alternative port can be specified with a :port suffix. For instance, if the host foo.bar.com is listening on port 8080, but the local host is not, then the host should be specified as foo.bar.com:8080. Finally, Tsync can operate on hosts that do not have a public IP address, but for which port-forwarding can be set up from a device (the proxy) with a public IP address. If we have a host with the private IP address 192.168.10.20 listening on port 8080, and a router with a hostname of foo.bar.com that forwards traffic on port 8000 to 192.168.10.20 on port 8080, then we would specify this host as: 192.168.10.20:8080/foo.bar.com:8000. If the host and/or the proxy are listening on the default port, then the port can be omitted.
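
To make the host syntax concrete, the following Python sketch splits a host specification into its four parts. It is an illustration rather than the code Tsync uses; the names HostSpec and parse_host are hypothetical, and an omitted port is reported as None, meaning the default port. IPv6 literals are not handled.

```python
from typing import NamedTuple, Optional


class HostSpec(NamedTuple):
    host: str
    port: Optional[int]        # None means "use the default port"
    proxy_host: Optional[str]  # None means "no proxy"
    proxy_port: Optional[int]


def _split_port(part):
    """Split 'name[:port]' into (name, port-or-None)."""
    name, sep, port = part.partition(":")
    if not name:
        raise ValueError(f"missing hostname in {part!r}")
    if sep and not port.isdigit():
        raise ValueError(f"invalid port in {part!r}")
    return name, (int(port) if sep else None)


def parse_host(spec):
    """Parse local[:port][/proxy[:port]] into a HostSpec."""
    local, sep, proxy = spec.partition("/")
    host, port = _split_port(local)
    if sep:
        proxy_host, proxy_port = _split_port(proxy)
    else:
        proxy_host, proxy_port = None, None
    return HostSpec(host, port, proxy_host, proxy_port)
```

For the port-forwarding example above, parse_host("192.168.10.20:8080/foo.bar.com:8000") yields the host 192.168.10.20 on port 8080, reached through the proxy foo.bar.com on port 8000.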


4.3.1. Required parameters

servers

A space separated list of hosts. These are the other peers that this daemon will contact to form the Tsync overlay. Note that this list does not have to be exhaustive. If the server's hostname is specified, it will be ignored. At least one host other than the server must be specified.

xmlrpcPort

An integer specifying the port on which the XML-RPC server should listen. Currently, all Tsync hosts must be able to use the same XML-RPC port. Note that this port is not the same as the base port, set with the MACE_PORT option. Configure your firewall, if you are running one, to allow TCP traffic to enter on this port.

email

The email address to which the daemon should send notifications.

MACE_CA_FILE

The filename of your CA certificate file, optionally preceded by a path. The path can be specified as either a relative or absolute path.

MACE_CERT_FILE

The filename of your host certificate file, optionally preceded by a path. The path can be specified as either a relative or absolute path.

MACE_PRIVATE_KEY_FILE

The path and filename of your host private key. If you generated your certificate using the gencert commands in Section 4.2, then this will point to your certificate file, which contains both the public certificate and private key.


4.3.2. Optional parameters

primary

The hostname of a fixed primary. This is useful for debugging. If the primary receives an update, it will abort. Do not use this option if you will be modifying files in exported volumes on any host other than the specified primary.

MACE_NO_VERIFY_HOSTNAMES

A space separated list specifying hostnames that should always be accepted for TLS connections. Normally, Tsync will verify that the hostname presented in the certificate matches the hostname of the machine originating the connection. However, if the certificate hostname is in the MACE_NO_VERIFY_HOSTNAMES list, then the connection will always be allowed. Use this parameter with caution, because if an adversary obtains the certificate and private key for one of these hostnames, then they will be able to connect to your Tsync overlay from any host.

MACE_ALL_HOSTS_REACHABLE

Set this to 1 if you will be running Tsync with hosts that are not directly reachable from all other hosts, that is, hosts that are connected via port-forwarding from a router.

MACE_LOCAL_ADDRESS

This must be set if the host is behind a firewall or if it has multiple IP addresses and you wish to specify (in conjunction with MACE_BIND_LOCAL_ADDRESS) that only one of them should be bound. As a rule of thumb, if running the command hostname returns "localhost.localdomain" or some value other than the hostname corresponding to your actual IP address, then you need to set MACE_LOCAL_ADDRESS.

MACE_BIND_LOCAL_ADDRESS

Set this to 1 if you want Mace to only bind the IP address specified in MACE_LOCAL_ADDRESS. Otherwise, Mace will bind to all possible addresses. MACE_BIND_LOCAL_ADDRESS should be set whenever the command hostname returns "localhost.localdomain" or some value other than the hostname corresponding to your actual IP address.

MACE_PORT

If set, this specifies the base port on which Mace services should listen. Note that Tsync currently requires two (2) consecutive ports for Mace services (that is, MACE_PORT and MACE_PORT + 1). Configure your firewall, if you are running one, to allow TCP traffic to enter on both ports.

MACE_LOG_AUTO_SELECTORS

A space separated list of selectors that should be printed in the log. Used for debugging.

MACE_LOG_FILE

The path to a file to which log output should be written. If not specified, log output will be printed to standard output.

MACE_LOG_TIMESTAMP_HUMAN

Set to 1 if log timestamps should be printed in a human-readable format.

MACE_LOG_LEVEL

Set to 1, 2, or 3 to see verbose logging.

MACE_PRINT_HOSTNAME_ONLY

Set to 1 to only print hostnames in logs, as opposed to hostname and IP address.
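
The acceptance rule described under MACE_NO_VERIFY_HOSTNAMES above can be summarized in a few lines of Python. This sketch covers only the hostname comparison, and assumes the certificate has already been validated against the CA; it is not the Mace implementation, and the function name accept_peer is hypothetical. Comparing names case-insensitively is our assumption, based on DNS names being case-insensitive.

```python
def accept_peer(cert_hostname, peer_hostname, no_verify_hostnames=()):
    """Decide whether a TLS peer passes the hostname check.

    cert_hostname is the name presented in the peer's certificate and
    peer_hostname is the name of the machine originating the connection.
    """
    cert = cert_hostname.lower()
    exempt = {name.lower() for name in no_verify_hostnames}
    if cert in exempt:
        return True  # listed hosts are accepted from any address
    return cert == peer_hostname.lower()
```

The first branch is the reason for the caution above: a certificate for a listed hostname is accepted no matter where the connection comes from.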


4.3.3. Sample tsyncd.conf

email = user@foo.com
servers = host1.foo.com host2.bar.com
xmlrpcPort = 6666
MACE_CA_FILE = /home/foo/.tsync_certs/ca.pem
MACE_CERT_FILE = /home/foo/.tsync_certs/host1.pem
MACE_PRIVATE_KEY_FILE = /home/foo/.tsync_certs/host1.pem

# mace configuration options
MACE_NO_VERIFY_HOSTNAMES = host2.bar.com host3.quux.com
MACE_PORT = 6664
MACE_LOG_AUTO_SELECTORS = ERROR
# comment out the following line to log to standard out instead of a file
MACE_LOG_FILE = /tmp/tsync-server.log
MACE_LOG_TIMESTAMP_HUMAN = 1
MACE_PRINT_HOSTNAME_ONLY = 1
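
Before starting the daemon, a configuration like the sample above can be sanity-checked with a short script. The Python sketch below takes the settings as a dictionary of strings and reports missing required parameters. It also reports an xmlrpcPort that collides with MACE_PORT or MACE_PORT + 1; that second check is our own inference from the port descriptions above, not a documented tsyncd check, and the function name check_conf is hypothetical.

```python
REQUIRED = (
    "servers",
    "xmlrpcPort",
    "email",
    "MACE_CA_FILE",
    "MACE_CERT_FILE",
    "MACE_PRIVATE_KEY_FILE",
)


def check_conf(settings):
    """Return a list of problems found in a dict of name -> value strings."""
    problems = [
        f"missing required parameter: {name}"
        for name in REQUIRED
        if not settings.get(name)
    ]
    xmlrpc = settings.get("xmlrpcPort")
    base = settings.get("MACE_PORT")
    if xmlrpc and base:
        try:
            xmlrpc_port, base_port = int(xmlrpc), int(base)
        except ValueError:
            problems.append("xmlrpcPort and MACE_PORT must be integers")
        else:
            # Mace uses two consecutive ports starting at MACE_PORT.
            if xmlrpc_port in (base_port, base_port + 1):
                problems.append("xmlrpcPort collides with a Mace port")
    return problems
```

An empty list means that no problems were found. With the sample configuration, Mace uses ports 6664 and 6665 and the XML-RPC server uses port 6666, so the ports do not collide.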

4.4. tsyncd.exports

The exports file specifies the volumes, which are directory trees, that Tsync should synchronize. The syntax is:

volume1 /path1/to/volume1

One volume should be specified per line. The volume names must be identical on all hosts synchronizing that volume. A volume does not need to reside in the same path on each host that exports it.
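
As an illustration of this format, the following Python sketch reads an exports file into a mapping from volume name to path. It is not the parser that tsyncd uses; the function name parse_exports is hypothetical, and treating everything after the first run of whitespace as the path, as well as rejecting duplicate volume names, are assumptions.

```python
def parse_exports(text):
    """Parse tsyncd.exports-style text into a dict of volume name -> path."""
    volumes = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # volume name, then the rest as the path
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'volume /path'")
        name, path = parts
        if name in volumes:
            raise ValueError(f"line {lineno}: duplicate volume {name!r}")
        volumes[name] = path
    return volumes
```

Because volume names must be identical on all hosts while paths may differ, comparing the keys of this mapping across hosts is a quick way to spot a misspelled volume name.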


5. Running Tsync

The Tsync system has two components:

  1. tsyncd: the system daemon

  2. tsync: the command-line client, used for querying and interfacing with the daemon


5.1. tsyncd

tsyncd recognizes the following command line options (also printed when -h is specified):

-e file

specify alternate exports file

-c file

specify alternate configuration file

-b

enable backup mode: only receive synchronizations (do not send updates)

-i

clear history and resynchronize. This will cause tsyncd to not multicast any updates from the first poll interval as it rebuilds its version tree.

-r

reset history and versions. This will cause tsyncd to re-multicast all updates as if all the files were newly created.

Note that Tsync will postpone synchronizing a file that is constantly changing until its contents remain unchanged for a brief period of time. For instance, if you are synchronizing log files for a busy server, the logs might never be stable long enough for Tsync to synchronize them. To avoid this problem, you could set up a cron job to periodically rotate your logs, so that Tsync can synchronize the logs that are no longer active.
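
To see whether a file is likely to be postponed, you can check how long ago it was last modified. The Python sketch below uses the modification time as a stand-in for "contents remain unchanged"; how Tsync actually detects stability, and the length of its quiet period, are not specified here, so the 5 second default and the function names are hypothetical.

```python
import os
import time


def stable_since(mtime, now, quiet_seconds=5.0):
    """True if a file last modified at mtime has been quiet long enough."""
    return now - mtime >= quiet_seconds


def is_stable(path, quiet_seconds=5.0):
    """True if path has not been modified for at least quiet_seconds."""
    return stable_since(os.stat(path).st_mtime, time.time(), quiet_seconds)
```

A log file that is appended to every second would never pass this check, whereas a rotated log that is no longer written to would pass it shortly after rotation.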


5.2. tsync

Run tsync -h for a list of options accepted by the tsync client.