Chirp User's Manual

9 September 2004

Chirp is Copyright (C) 2004 Douglas Thain and the University of Notre Dame. This software is distributed under a BSD-style license. See the file COPYING for details.

Overview

Chirp is a system for performing input and output across the Internet. Using Chirp, an ordinary user can share storage space and data with friends and colleagues without requiring any sort of administrator privileges anywhere.

Chirp is like a distributed filesystem (such as NFS) except that it can be run over wide area networks and requires no special privileges on either the client or the server end. Chirp allows the end user to set up fine-grained access control so that data can be shared (or not shared) with the right people.

Chirp is also like a file transfer system (such as FTP) that provides streaming point-to-point data transfer over the Internet. However, Chirp also provides fine-grained Unix-like data access suitable for direct access by ordinary programs.

Begin by installing the cctools on your system. When you are ready, proceed below.

Running a Chirp Server

Running a Chirp server is easy. You may run a Chirp server as any ordinary user, and you do not need to install the software or even run the programs as root. To run a Chirp server, you must do three things: pick a storage directory, create an authorization file, and run the server. Here's a simple example:

  1. Pick a storage directory. The Chirp server will only allow access to the directory that you choose. It could be a scratch directory, your home directory, or even your filesystem root. For now, let's store everything in a temporary directory:
    /tmp/mydata
    

  2. Create an authorization file. The Chirp server needs to know who will be allowed to access this space. There are a variety of ways to control access to a Chirp server, described below. For now, let's restrict access to machines in your domain. Create a file called /tmp/authfile and put this into it:
    hostname:*.cs.somewhere.edu:*:myname
    
    Of course, you should substitute your local domain name for cs.somewhere.edu and your username for myname. If you want to access the server from several specific machines or domains, you may put several hostname lines into the authfile. This authfile is fine for now, but we'll refine it later.

  3. Run the server. Simply run chirp_server and direct it to your storage directory and authorization file:
    % chirp_server -r /tmp/mydata -a /tmp/authfile
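
The three steps above can also be scripted. The following Python sketch prepares the storage directory and the authfile, then returns the command line from step 3 without launching it. The helper name prepare_server and its parameters are hypothetical; only the -r and -a options and the authfile line come from this manual.

```python
import os


def prepare_server(storage_dir, authfile, domain, username):
    """Create the storage directory and a one-line authfile.

    Returns the chirp_server command line as a list, ready to be run.
    """
    # Step 1: make sure the storage directory exists.
    os.makedirs(storage_dir, exist_ok=True)
    # Step 2: allow any machine in the given domain, mapped to the local user.
    with open(authfile, "w") as f:
        f.write(f"hostname:*.{domain}:*:{username}\n")
    # Step 3: the command that starts the server.
    return ["chirp_server", "-r", storage_dir, "-a", authfile]
```

Calling prepare_server("/tmp/mydata", "/tmp/authfile", "cs.somewhere.edu", "myname") reproduces the example above; pass the returned list to subprocess.run to start the server.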
    
Now that you have a server running on one machine, let's use some tools to move data to and from your server.

Accessing Chirp Servers

The easiest way to access Chirp servers is by using a tool called Parrot. Parrot is a personal virtual filesystem: it "speaks" remote I/O operations on behalf of ordinary programs. For example, you can use Parrot with your regular shell to access Chirp servers like so:

 % parrot tcsh
 ...
 % cd /chirp/myhost.somewhere.edu
 % cp /tmp/bigfile .
 % ls -la
total 804
drwx------    2 condor   dip          4096 Sep 10 12:40 .
drwx------    2 condor   dip          4096 Sep 10 12:40 ..
-rw-r--r--    1 condor   dip      104857600 Sep 10 12:57 bigfile
-rw-r--r--    1 condor   dip           147 Sep 10 12:39 hosts
 % cp /http/www.cse.nd.edu temp.html
 % vi temp.html
 %

(If you are having difficulty accessing your server, have a look at "Debugging Advice" below.)

Parrot is certainly the most convenient way to access storage, but it has some limitations: it only works on Linux 2.4, and imposes a performance penalty. (This is because Parrot makes an extra data copy in the process of handling a program's system calls.)

For more portable, explicit control of a Chirp server, use the Chirp command line tool. This allows you to connect to a server, copy files, and manage directories, much like an FTP client:

 % chirp
 ...
 chirp::> open myhost.somewhere.edu
 chirp:myhost.somewhere.edu:/> put /tmp/bigfile
file /tmp/bigfile -> /bigfile (11.01 MB/s)
 chirp:myhost.somewhere.edu:/> ls -la
dir      4096 .                                        Fri Sep 10 12:40:27 2004
dir      4096 ..                                       Fri Sep 10 12:40:27 2004
file      147 hosts                                    Fri Sep 10 12:39:54 2004
file 104857600 bigfile                                 Fri Sep 10 12:53:21 2004
 chirp:myhost.somewhere.edu:/>

In scripts, you may find it easier to use the standalone commands chirp_get and chirp_put, which move single files to and from a Chirp server. These commands also allow for streaming data, which can be helpful in a shell pipeline. Also, the -f option to both commands allows you to follow a file, much like the Unix tail command:

 % tar cvzf archive.tar.gz ~/mydata
 % chirp_put archive.tar.gz myhost.somewhere.edu archive.tar.gz
 % ...
 % chirp_get myhost.somewhere.edu archive.tar.gz - | tar xvzf -
 % ...
 % chirp_get -f myhost.somewhere.edu logfile - |& less
 %
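
When the calling script is written in Python rather than shell, the same commands can be assembled as argument lists. This is a minimal sketch that follows the argument order shown in the transcript above; the helper names are hypothetical, and the assumption that the tools exit with a non-zero status on failure is not confirmed by this manual.

```python
import subprocess


def chirp_put_argv(local_path, host, remote_path):
    """Command line for: chirp_put <local file> <host> <remote file>."""
    return ["chirp_put", local_path, host, remote_path]


def chirp_get_argv(host, remote_path, local_path="-", follow=False):
    """Command line for: chirp_get [-f] <host> <remote file> <local file>.

    A local path of "-" streams the data to standard output.
    """
    argv = ["chirp_get"]
    if follow:
        argv.append("-f")  # follow the file, much like tail -f
    return argv + [host, remote_path, local_path]


def run(argv):
    """Run one command; raises CalledProcessError on a non-zero exit (assumed)."""
    subprocess.run(argv, check=True)
```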

The fourth way to access the storage pool is to write your own programs that use the Chirp C interface. Compile and link against the following files in the ordinary way:

INSTALL_DIR/include/chirp_client.h
INSTALL_DIR/include/chirp_reli.h
INSTALL_DIR/lib/libchirp.a

The chirp_client.h interface allows you to explicitly connect to a server and open, close, read, and write files, much as in a traditional Unix interface. This interface is unreliable in the sense that a broken connection will cause all further operations to fail. To recover, you must explicitly re-connect to the server.

The chirp_reli.h interface is a reliable version of the chirp_client interface. The programmer need not explicitly connect or disconnect to servers, but simply names the host and file to access. The library transparently handles connection as well as recovery from temporary failures.
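
The difference between the two interfaces can be pictured as a reconnect-and-retry wrapper around an unreliable connection. The Python sketch below is a generic illustration of that idea, not the chirp_reli implementation and not its API; ConnectionLost, with_reconnect, and their parameters are all hypothetical names.

```python
import time


class ConnectionLost(Exception):
    """Raised by an operation when its connection has broken (hypothetical)."""


def with_reconnect(connect, operation, retries=3, delay=0.0):
    """Run operation(conn), reconnecting and retrying if the connection breaks.

    connect   -- callable returning a fresh connection
    operation -- callable taking a connection and returning a result
    retries   -- how many times to reconnect before giving up
    delay     -- seconds to wait before each reconnect
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return operation(conn)
        except ConnectionLost:
            if attempt == retries:
                raise  # out of retries: report the failure to the caller
            time.sleep(delay)
            conn = connect()  # a broken connection cannot be reused
```

With an interface in the style of chirp_client.h, the caller would write this loop by hand; a reliable interface hides it behind each call.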

Finding Chirp Servers

Now that you know how to run and use Chirp servers, you will need a way to keep track of all of the servers that are available for use. For this purpose, consult the Chirp storage catalog. This web page is a list of all known Chirp servers and their locations.

The storage catalog is highly dynamic. By default, each Chirp server makes itself known to the storage catalog every five minutes. The catalog server records and reports all Chirp servers that it knows about, but will discard servers that have not reported for fifteen minutes.
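
The reporting and expiry rule can be sketched in a few lines of Python. This only illustrates the timing described above and is not the catalog server's actual code; the names REPORT_INTERVAL, EXPIRY, and live_servers are hypothetical, and treating a server as discarded at exactly fifteen minutes is an assumption.

```python
# Hypothetical model of the catalog's expiry rule; not the real catalog_server code.
REPORT_INTERVAL = 5 * 60   # seconds between reports from each Chirp server
EXPIRY = 15 * 60           # seconds of silence after which a server is discarded


def live_servers(last_report, now):
    """Return the sorted names of servers whose last report is under EXPIRY seconds old.

    last_report maps a server name to the time (in seconds) of its latest report.
    """
    return sorted(name for name, seen in last_report.items() if now - seen < EXPIRY)
```

Under these numbers a server can miss two consecutive reports and still be listed, which gives the catalog some tolerance for brief network outages.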

If you do not want your servers to report to a catalog, then run them with this option:

% chirp_server -u -

Alternatively, you may establish your own catalog server. This can be useful for keeping your systems logically distinct from the main storage pool, but can also help performance and availability if your catalog is close to your Chirp servers. The catalog server is installed in the same place as the Chirp server. Simply run it on any machine that you like and then direct your Chirp servers to update the new catalog with the -u option. The catalog will be published via HTTP on port 9097 of the catalog machine.

For example, suppose that you wish to run a catalog server on a machine named dopey and a Chirp server on a machine named sneezy:

dopey% catalog_server
...
sneezy% chirp_server -u dopey [more options]

Finally, point your web browser to:

http://dopey:9097

and you will see the list of Chirp servers that report to your catalog.

Authentication - The Full Story

ACLs - Access Control Lists

In the near future, Chirp will support per-directory access control lists. In the meantime, you must use the old-style authentication file...

Old-Style Authentication File

Naturally, one should be concerned about the security of a storage service. The Chirp server has a flexible security policy which allows you to accept or deny users via one of several authentication schemes. You may construct a simple policy based on hostnames and addresses, or you may connect to existing security systems. It's up to you.

Here is a summary of the authentication schemes:

Type        Summary                                             Personal? (non-root)  Multi-User? (root)
kerberos    Centralized private key system                      no                    yes (host cert)
globus      Distributed public key system                       yes (user cert)       yes (user cert)
filesystem  Authenticate via a local or distributed filesystem  yes                   yes
hostname    Reverse DNS lookup                                  yes                   yes
address     Identify by IP address                              yes                   yes

The Chirp tools will attempt all of the authentication types they know until one successfully connects to a Chirp server. You must explicitly specify the security policy for the Chirp server in an authfile, passed on the command line. An example authfile is distributed with Chirp in INSTALL_DIR/etc/chirp.authfile.example.

Here's how it works. Each line in the file has four fields separated by colons: the authentication type, the permitted hostnames, the permitted remote users, and the corresponding local users. Asterisks may be used in the first three fields as wildcards. The fourth field must be either a valid local username or an asterisk, indicating that the local username is chosen by the authentication type. Each line in the file is compared against the calling user in order. If one matches, the user is accepted and assigned the username in the fourth field.
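
The first-match rule described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the server's actual matching code: it assumes asterisks behave like shell-style wildcards (via fnmatchcase), and it assumes that an asterisk in the fourth field yields the user name supplied by the authentication method. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_authfile(text):
    """Split each non-blank line into (type, hosts, remote users, local user)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) != 4:
            raise ValueError(f"expected 4 colon-separated fields: {line!r}")
        rules.append(tuple(fields))
    return rules


def match_user(rules, auth_type, host, remote_user):
    """Return the local user from the first matching line, or None if none match."""
    for rule_type, host_pat, user_pat, local_user in rules:
        if (fnmatchcase(auth_type, rule_type)
                and fnmatchcase(host, host_pat)
                and fnmatchcase(remote_user, user_pat)):
            # Assumption: "*" means the authentication method picks the name.
            return remote_user if local_user == "*" else local_user
    return None
```

Because lines are tried in order, a specific line should be placed before a more general wildcard line that would otherwise match first.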

Here are some examples. Suppose that I wish to run a personal server as an ordinary user thain, and I am willing to trust any user calling from two different hosts called red and blue, as well as any hosts that can authenticate with my Globus identity:

    hostname:red.cs.wisc.edu:*:thain
    hostname:blue.cs.wisc.edu:*:thain
    globus:*:/C=US/O=National Computational Science Alliance/CN=Douglas Thain:thain

(The hostname method ignores the third field.)

Or, suppose that I am running a server as root, and I am willing to trust any user that can authenticate via Kerberos or via the local filesystem if on the same host. In addition, I consider any user on the host operator.cs.wisc.edu to be equivalent to the user named sysop:

    kerberos:*:*:*
    filesystem:bird.cs.wisc.edu:*:*
    hostname:operator.cs.wisc.edu:*:sysop

A Chirp server creates a new process for every incoming client. If the server is run as the superuser, the process will setuid to the id of the authenticated user. If the server is run as an ordinary user, it will check that the authenticated user matches the user that owns the server; otherwise, the connection is declined.

Each of the authentication types has a few things you should know:

Kerberos: The server will attempt to use the Kerberos identity of the host it is run on (e.g., host/coral.cs.wisc.edu@CS.WISC.EDU). Thus, it must be run as the superuser in order to access its certificates.

Globus: The server and client will attempt to perform peer-to-peer authentication using the Grid Security Infrastructure. Both sides must have access to a proxy certificate by running grid-proxy-init.

Filesystem: This method makes use of an existing filesystem (local or distributed) to establish the client's identity. It assumes that both machines share the same conception of the user database and have a common directory which they can read and write. By default, the server will pick a filename in /tmp, and challenge the client to create that file. If it can, then the server will examine the owner of the file to determine the client's username. Naturally, /tmp will only be available to clients on the same machine. However, if a shared filesystem directory is available, give that to the Chirp server via the -c option. Then, any authorized client of the filesystem can authenticate to the server. For example, at Notre Dame, we use -c /afs/nd.edu/user37/ccl/software/rendezvous to authenticate via our AFS distributed file system.
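
The filesystem challenge can be illustrated with a short Python sketch. It models the idea only (the server names a file, the client creates it, the server reads the file's owner) and is not Chirp's actual protocol or code; the function names, the random file name, and the cleanup step are assumptions.

```python
import os
import pwd
import secrets


def make_challenge(directory):
    """Server side: pick an unpredictable file name in the shared directory."""
    return os.path.join(directory, "challenge." + secrets.token_hex(8))


def answer_challenge(path):
    """Client side: create the requested file; fails if it already exists."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def verify_challenge(path):
    """Server side: return the name of the file's owner, or None if absent."""
    try:
        uid = os.lstat(path).st_uid
    except FileNotFoundError:
        return None
    os.unlink(path)  # remove the challenge file once it has been examined
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # no entry in the user database; fall back to the uid
```

The scheme is only as trustworthy as the filesystem: both sides must agree on the user database, as noted above.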

Hostname: The server will rely on a reverse DNS lookup to establish the fully-qualified hostname of the calling client. The second field gives the hostname to be accepted. It may contain an asterisk as a wildcard. The third field is ignored. The fourth field is then used to select an appropriate local username.

Address: Like "hostname" authentication, except the server simply looks at the client's IP address.

By default, Chirp and Parrot will attempt every authentication type they know until one succeeds. If you wish to restrict or re-order the authentication types used, give one or more -a options to the client, naming the authentication types to be used, in order. For example, to attempt only hostname and kerberos authentication, in that order:

   % chirp -a hostname -a kerberos

Debugging Advice

Debugging a distributed system can be quite difficult because of the sheer number of hosts involved and the mass of information to be collected. If you are having difficulty with Chirp, we recommend that you make good use of the debugging traces.

In all of the Chirp and Parrot tools, the -d option allows you to turn on selected debugging messages. The simplest option is -d all which will show every event that occurs in the system.

To best debug a problem, we recommend that you turn on the debugging options on both the client and server that you are operating. For example, if you are having trouble getting Parrot to connect to a Chirp server, then run both as follows:

% chirp_server -d all [more options] ...
% parrot -d all tcsh

Of course, this is likely to show far more information than you will be able to process. Instead, turn on debugging flags selectively. For example, if you are having a problem with authentication, just show those messages with -d auth on both sides.

There are a large number of debugging flags. Currently, the choices are: syscall notice channel process resolve libcall tcp dns auth local http ftp nest chirp dcap rfio cache poll remote summary debug time pid all. When debugging problems with Chirp and Parrot, we recommend selectively using -d chirp, -d tcp, -d auth, and -d libcall as needed.