Chirp User's Manual

24 September 2004

Chirp is Copyright (C) 2004 Douglas Thain and the University of Notre Dame. This software is distributed under a BSD-style license. See the file COPYING for details.

Overview

Chirp is a system for performing input and output across the Internet. Using Chirp, an ordinary user can share storage space and data with friends and colleagues without requiring any sort of administrator privileges anywhere.

Chirp is like a distributed filesystem (such as NFS) except that it can be run over wide area networks and requires no special privileges on either the client or the server end. Chirp allows the end user to set up fine-grained access control so that data can be shared (or not shared) with the right people.

Chirp is also like a file transfer system (such as FTP) that provides streaming point-to-point data transfer over the Internet. However, Chirp also provides fine-grained Unix-like data access suitable for direct access by ordinary programs.

Begin by installing the cctools on your system. When you are ready, proceed below.

Running a Chirp Server

Running a Chirp server is easy. You may run a Chirp server as any ordinary user, and you do not need to install the software or even run the programs as root. To run a Chirp server, you must do three things: pick a storage directory, run the server, and then adjust the access control.

  1. Pick a storage directory. The Chirp server will only allow access to the directory that you choose. It could be a scratch directory, your home directory, or even your filesystem root. For now, let's store everything in a temporary directory:
    /tmp/mydata
    

  2. Run the server. Simply run chirp_server and direct it to your storage directory and authorization file.
    % chirp_server -r /tmp/mydata &
    

  3. Adjust the access control. When first started, the Chirp server will allow access only to YOU from the same host. You will probably want to change this to allow access to other people and hosts. To adjust the access control, use the chirp tool and the setacl command to set the access control list. For example, to also allow other hosts in your domain to read and write the server:
    % chirp localhost
     chirp:localhost:/> setacl . hostname:*.mydomain.edu write
    
Now that you have a server running on one machine, let's use some tools to move data to and from your server.

Accessing Chirp Servers

The easiest way to access Chirp servers is by using a tool called Parrot. Parrot is a personal virtual filesystem: it "speaks" remote I/O operations on behalf of ordinary programs. For example, you can use Parrot with your regular shell to access Chirp servers like so:

 % parrot tcsh
 ...
 % cd /chirp/myhost.somewhere.edu
 % cp /tmp/bigfile .
 % ls -la
total 804
drwx------    2 condor   dip          4096 Sep 10 12:40 .
drwx------    2 condor   dip          4096 Sep 10 12:40 ..
-rw-r--r--    1 condor   dip      104857600 Sep 10 12:57 bigfile
-rw-r--r--    1 condor   dip           147 Sep 10 12:39 hosts
 % cp /http/www.cse.nd.edu temp.html
 % vi temp.html
 %

(If you are having difficulty accessing your server, see "Debugging Advice" below.)

Parrot is certainly the most convenient way to access storage, but it has some limitations: it only works on Linux 2.4, and imposes a performance penalty. (This is because Parrot makes an extra data copy in the process of handling a program's system calls.)

For more portable, explicit control of a Chirp server, use the Chirp command line tool. This allows you to connect to a server, copy files, and manage directories, much like an FTP client:

 % chirp
 ...
 chirp::> open myhost.somewhere.edu
 chirp:myhost.somewhere.edu:/> put /tmp/bigfile
file /tmp/bigfile -> /bigfile (11.01 MB/s)
 chirp:myhost.somewhere.edu:/> ls -la
dir      4096 .                                        Fri Sep 10 12:40:27 2004
dir      4096 ..                                       Fri Sep 10 12:40:27 2004
file      147 hosts                                    Fri Sep 10 12:39:54 2004
file 104857600 bigfile                                 Fri Sep 10 12:53:21 2004
 chirp:myhost.somewhere.edu:/>

In scripts, you may find it easier to use the standalone commands chirp_get and chirp_put, which move single files to and from a Chirp server. These commands also allow for streaming data, which can be helpful in a shell pipeline. Also, the -f option to both commands allows you to follow a file, much like the Unix tail command:

 % tar cvzf archive.tar.gz ~/mydata
 % chirp_put archive.tar.gz myhost.somewhere.edu archive.tar.gz
 % ...
 % chirp_get myhost.somewhere.edu archive.tar.gz - | tar xvzf -
 % ...
 % chirp_get -f myhost.somewhere.edu logfile - |& less
 %

The fourth way to access the storage pool is to write your own programs that use the Chirp C interface. You must compile and link against the following files in the ordinary way:

    INSTALL_DIR/include/chirp_client.h
    INSTALL_DIR/include/chirp_reli.h
    INSTALL_DIR/lib/libchirp.a

The chirp_client.h interface allows you to explicitly connect to a server and open, close, read, and write files, much as in a traditional Unix interface. This interface is unreliable in the sense that a broken connection will cause all further operations to fail. To recover, you must explicitly re-connect to the server.

The chirp_reli.h interface is a reliable version of the chirp_client interface. The programmer need not explicitly connect or disconnect to servers, but simply names the host and file to access. The library transparently handles connection as well as recovery from temporary failures.
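
The difference between the two interfaces can be illustrated with a small model. The sketch below is written in Python rather than C and does not use the real Chirp library: ToyServer, ReliableClient, and ConnectionLost are hypothetical names invented for this illustration. It only shows the general idea of naming the host on every call and reconnecting behind the scenes when a connection breaks.

```python
class ConnectionLost(Exception):
    """Raised by the toy server when a connection breaks."""


class ToyServer:
    """Hypothetical stand-in for a remote server; fails on chosen call numbers."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.calls = 0
        self.connects = 0

    def connect(self):
        self.connects += 1
        return self

    def read(self, path):
        self.calls += 1
        if self.calls in self.fail_on:
            raise ConnectionLost(path)
        return f"contents of {path}"


class ReliableClient:
    """Names host and path on every call; connects and reconnects on demand."""

    def __init__(self, servers, retries=3):
        self.servers = servers  # host name -> ToyServer
        self.conns = {}         # cached open connections
        self.retries = retries

    def read(self, host, path):
        for _ in range(self.retries):
            try:
                if host not in self.conns:
                    self.conns[host] = self.servers[host].connect()
                return self.conns[host].read(path)
            except ConnectionLost:
                # Forget the broken connection; the next pass reconnects.
                self.conns.pop(host, None)
        raise ConnectionLost(f"{host}:{path}")


server = ToyServer(fail_on=[1])  # the first read breaks the connection
client = ReliableClient({"myhost.somewhere.edu": server})
result = client.read("myhost.somewhere.edu", "/bigfile")
```

In this model the caller never sees the first failure: the client reconnects once and the second attempt succeeds. With an unreliable interface, the same failure would surface to the caller, who would have to reconnect explicitly.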

Finding Chirp Servers

Now that you know how to run and use Chirp servers, you will need a way to keep track of all of the servers that are available for use. For this purpose, consult the Chirp storage catalog. This web page is a list of all known Chirp servers and their locations.

The storage catalog is highly dynamic. By default, each Chirp server makes itself known to the storage catalog every five minutes. The catalog server records and reports all Chirp servers that it knows about, but will discard servers that have not reported for fifteen minutes.
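
The report-and-expire behavior described above can be modeled in a few lines. This is only a sketch of the bookkeeping, not the actual catalog_server: the Catalog class and its methods are hypothetical, and times are plain seconds supplied by the caller so that the example is deterministic.

```python
REPORT_INTERVAL = 5 * 60   # servers report every five minutes (from the text)
EXPIRY = 15 * 60           # servers silent for fifteen minutes are discarded


class Catalog:
    """Toy in-memory model of the catalog's bookkeeping."""

    def __init__(self):
        self.last_seen = {}  # server name -> time of last report, in seconds

    def report(self, server, now):
        self.last_seen[server] = now

    def listing(self, now):
        # Keep only the servers that have reported recently enough.
        self.last_seen = {
            name: seen
            for name, seen in self.last_seen.items()
            if now - seen < EXPIRY
        }
        return sorted(self.last_seen)


catalog = Catalog()
catalog.report("sneezy", now=0)
catalog.report("grumpy", now=0)
catalog.report("sneezy", now=REPORT_INTERVAL)  # sneezy keeps reporting
early = catalog.listing(now=10 * 60)           # both servers still listed
late = catalog.listing(now=16 * 60)            # grumpy has been silent too long
```

Here grumpy reports once and then goes quiet, so it disappears from the listing after fifteen minutes, while sneezy remains because its last report is more recent.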

If you do not want your servers to report to a catalog, then run them with this option:

% chirp_server -u -

Alternatively, you may establish your own catalog server. This can be useful for keeping your systems logically distinct from the main storage pool, but can also help performance and availability if your catalog is close to your Chirp servers. The catalog server is installed in the same place as the Chirp server. Simply run it on any machine that you like and then direct your Chirp servers to update the new catalog with the -u option. The catalog will be published via HTTP on port 9097 of the catalog machine.

For example, suppose that you wish to run a catalog server on a machine named dopey and a Chirp server on a machine named sneezy:

    dopey% catalog_server
    ...
    sneezy% chirp_server -u dopey [more options]

Finally, point your web browser to:

    http://dopey:9097

and you will see the catalog's list of known Chirp servers.

Security

Now that you have an idea how Chirp can be used, let's discuss security in more detail. Different sites require different levels of security and different technological methods of enforcing security. For these reasons, Chirp has a very flexible security system that allows for a range of tools and policies from simple address checks to Kerberos authentication.

Security really has two aspects: authentication and authorization. Authentication deals with the question "Who are you?" Once your identity has been established, then authorization deals with the question "What are you allowed to do?" Let's deal with each in turn.

Authentication

Chirp supports the following authentication schemes:

Type      Summary                                 Regular User?    Root?
                                                  (non-root)       (root)
kerberos  Centralized private key system          no               yes (host cert)
globus    Distributed public key system           yes (user cert)  yes (host cert)
unix      Authenticate with local unix user ids.  yes              yes
hostname  Reverse DNS lookup                      yes              yes
address   Identify by IP address                  yes              yes

The Chirp tools will attempt all of the authentication types that are known and available in the order above until one works. For example, if you have Kerberos installed in your system, Chirp will try that first. If not, Chirp attempts the others.

Once an authentication scheme has succeeded, Chirp assigns the incoming user a subject that describes both the authentication method and the user name within that method. For example, a user that authenticates via Kerberos might have the subject:

    kerberos:dthain@nd.edu

A user authenticating with Globus credentials might be:

    globus:/O=Cooperative_Computing_Lab/CN=Douglas_L_Thain

While another user authenticating by local unix ids might be:

    unix:dthain

While a user authenticating by simple hostnames might be:

    hostname:pigwidgeon.cse.nd.edu

Take note that Chirp considers all of these subjects to be different identities, although some of them might correspond to the same person in varying circumstances.
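
The negotiation described above (try each method in order, then build a subject from the first one that works) can be sketched as follows. This is an illustrative Python model, not Chirp's implementation: the authenticate function and the `available` mapping are hypothetical, and a real client performs a network exchange for each method rather than a dictionary lookup.

```python
# Default order taken from the table of authentication schemes.
DEFAULT_ORDER = ["kerberos", "globus", "unix", "hostname", "address"]


def authenticate(available, order=DEFAULT_ORDER):
    """Return the subject for the first method in `order` that succeeds.

    `available` maps a method name to the user name it would yield; a
    method missing from the mapping is treated as having failed.
    """
    for method in order:
        name = available.get(method)
        if name is not None:
            return f"{method}:{name}"
    return None


# A client with no Kerberos or Globus credentials.
client = {"unix": "dthain", "hostname": "pigwidgeon.cse.nd.edu"}

default_subject = authenticate(client)
restricted_subject = authenticate(client, order=["hostname", "kerberos"])
```

The same person ends up with the subject unix:dthain under the default order and hostname:pigwidgeon.cse.nd.edu under the restricted order, which is why the two are treated as distinct identities in access control lists.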

Authorization

Once Chirp has authenticated your identity, you are logged into a server. However, when you attempt to read or manipulate files on a server, Chirp checks to see whether you are authorized to do so. This is determined by access control lists or ACLs.

Every directory in a Chirp server has an ACL, much like in some filesystems such as AFS or NTFS. To see the ACL for a directory, use the Chirp tool and the getacl command:

 chirp:localhost:/> getacl 
unix:dthain rwlva
hostname:*.mydomain.edu rwl
This ACL indicates that the subject unix:dthain has all five access rights, while the subject pattern hostname:*.mydomain.edu has only three access rights. The access rights are as follows:

r - The subject may read items in the directory.
w - The subject may write and delete items in the directory.
l - The subject may list the directory contents.
v - The subject may reserve a directory.
a - The subject may administer the directory, including changing the ACL.

Access rights often come in combinations, so there are a few aliases for your convenience:

read - alias for rl
write - alias for rwl
admin - alias for rwlva
reserve - alias for lv
none - delete the entry
To change an access control list on a directory, use the setacl command:

 chirp:localhost:/> setacl / kerberos:dthain@nd.edu write
 chirp:localhost:/> getacl 
unix:dthain rwlva
hostname:*.mydomain.edu rwl
kerberos:dthain@nd.edu rwl
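
The alias expansion and wildcard matching shown above can be modeled in a short sketch. The Python below is not Chirp's code: setacl and rights_for are hypothetical helpers, shell-style matching via fnmatchcase stands in for Chirp's wildcard rules, and taking the union of all matching entries is an assumption made for this illustration.

```python
from fnmatch import fnmatchcase

# Aliases from the list above; "none" is handled separately because it
# deletes the entry.
ALIASES = {"read": "rl", "write": "rwl", "admin": "rwlva", "reserve": "lv"}


def setacl(acl, subject, rights):
    """Return a copy of `acl` with `subject` set to `rights` (letters or alias)."""
    acl = dict(acl)
    if rights == "none":
        acl.pop(subject, None)
    else:
        acl[subject] = ALIASES.get(rights, rights)
    return acl


def rights_for(acl, subject):
    """Collect rights from every entry whose pattern matches `subject`.

    Assumption: rights from all matching entries are combined.
    """
    granted = set()
    for pattern, rights in acl.items():
        if fnmatchcase(subject, pattern):
            granted |= set(rights)
    return granted


acl = {"unix:dthain": "rwlva", "hostname:*.mydomain.edu": "rwl"}
acl = setacl(acl, "kerberos:dthain@nd.edu", "write")

inside = rights_for(acl, "hostname:ws1.mydomain.edu")
outside = rights_for(acl, "hostname:ws1.elsewhere.org")
```

After the setacl call the ACL holds the same three entries as the transcript above. A client from ws1.mydomain.edu matches the wildcard entry and receives rwl, while a client from another domain matches nothing and receives no rights.
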
The meaning of ACLs is fairly obvious, but there are a few subtleties you should know:

Rights are generally inherited. When a new directory is created, it automatically gets the ACL of its parent. Exception: read about the reserve right below.

Rights are generally not hierarchical. In order to access a directory, you only need the appropriate permissions on that directory. For example, if you have permission to write to /data/x/y/z, you do not need any other permissions on /data, /data/x and so forth. Of course, it may be difficult to discover a deep directory without rights on the parents, but you can still access it.

The delete right is absolute. If you have permission to delete a directory, then you are able to delete the entire subtree that it contains, regardless of any other ACLs underneath.
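
The first two subtleties can be made concrete with a toy model. The Python below is a sketch under stated assumptions, not Chirp's implementation: the acls dictionary, mkdir, and allowed are hypothetical, subjects are matched exactly (no wildcards), and the reserve right is ignored here.

```python
import posixpath

# Directory path -> ACL. Only the directories used in the example are listed.
acls = {
    "/": {"unix:fred": "rwlva"},
    "/data/x/y/z": {"unix:fred": "rwlva", "unix:betty": "rwl"},
}


def mkdir(path):
    """Inheritance: a new directory starts with a copy of its parent's ACL."""
    acls[path] = dict(acls[posixpath.dirname(path)])


def allowed(subject, path, right):
    """Non-hierarchical: only the ACL of `path` itself is consulted."""
    return right in acls[path].get(subject, "")


mkdir("/data/x/y/z/out")

can_write_deep = allowed("unix:betty", "/data/x/y/z/out", "w")
can_list_root = allowed("unix:betty", "/", "l")
```

Betty can write in the new directory because it inherited the ACL of /data/x/y/z, even though she has no rights at all on the root directory.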

Reservation

The v - reserve right is an important concept that deserves its own discussion.

A shared-storage environment such as Chirp aims to allow many people to read and write common storage space. Of course, with many people reading and writing, we need some mechanism to make sure that everybody does not step on each other's toes.

The reserve right allows a user to create what is essentially a fresh workspace for their own use. When a user creates a new directory and has the v right (but not the w right), Chirp will create a new directory with a fresh ACL giving the creating user all rights.

A good way to use the reserve right is with a wildcard at the top directory. Here's an example. Suppose that Fred creates a new Chirp server on the host bigwig. Initially, no-one except Fred can access the server. The first time it starts, the Chirp server initializes its root directory with the following ACL:

unix:fred rwlva
Now, Fred wants other users in his organization to be able to use this storage, but doesn't want them messing up his existing data. So, Fred uses the Chirp tool to give the reserve right to anyone calling from any machine in his organization:
 chirp:bigwig:> setacl / hostname:*.somewhere.edu reserve
 chirp:bigwig:> getacl /
unix:fred rwlva
hostname:*.somewhere.edu lv
Now, any user calling from anywhere in somewhere.edu can access this server. But all that any such user can do is issue a mkdir in the root directory. For example, suppose that Betty logs into this server from ws1.somewhere.edu. She cannot modify the root directory, but she can create her own directory:
 chirp:bigwig:> mkdir /mydata
And, in the new directory, ws1.somewhere.edu can do anything, including edit the access control. Here is the new ACL for /mydata:
 chirp:bigwig:> getacl /mydata
hostname:ws1.somewhere.edu rwlva
If Betty wants to authenticate with Globus credentials from here on, she can change the access control as follows:
 chirp:bigwig:> setacl /mydata globus:/O=Univ_of_Somewhere/CN=Betty admin
And, the new ACL will look as follows:
 chirp:bigwig:> getacl /mydata
hostname:ws1.somewhere.edu rwlva
globus:/O=Univ_of_Somewhere/CN=Betty rwlva
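
The mkdir rule behind this example can be sketched as follows. This Python model is illustrative only: mkdir and rights_for are hypothetical helpers, wildcard matching uses fnmatchcase as a stand-in for Chirp's own matching, and combining the rights of all matching entries is an assumption.

```python
from fnmatch import fnmatchcase


def rights_for(acl, subject):
    """Union of the rights of every ACL entry matching `subject` (assumption)."""
    granted = set()
    for pattern, rights in acl.items():
        if fnmatchcase(subject, pattern):
            granted |= set(rights)
    return granted


def mkdir(acls, parent, name, subject):
    """Create parent/name for `subject`, following the rules in the text."""
    rights = rights_for(acls[parent], subject)
    path = parent.rstrip("/") + "/" + name
    if "w" in rights:
        acls[path] = dict(acls[parent])    # ordinary mkdir: inherit parent's ACL
    elif "v" in rights:
        acls[path] = {subject: "rwlva"}    # reserve: fresh ACL, creator gets all rights
    else:
        raise PermissionError(f"{subject} may not create directories in {parent}")
    return path


# Fred's root directory after granting the reserve right to his organization.
acls = {"/": {"unix:fred": "rwlva", "hostname:*.somewhere.edu": "lv"}}

betty_dir = mkdir(acls, "/", "mydata", "hostname:ws1.somewhere.edu")
fred_dir = mkdir(acls, "/", "archive", "unix:fred")
```

Betty holds v but not w on the root, so /mydata receives a fresh ACL naming only her subject. Fred holds w, so /archive simply inherits the root ACL.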

Notes on Authentication

Each of the authentication types has a few things you should know:

Kerberos: The server will attempt to use the Kerberos identity of the host it is run on (e.g. host/coral.cs.wisc.edu@CS.WISC.EDU). Thus, it must be run as the superuser in order to access its certificates. Once authentication is complete, there is no need for the server to keep its root access, so it will change to any unprivileged user that you like. Use the -i option to select the userid.

Globus: The server and client will attempt to perform client authentication using the Grid Security Infrastructure (GSI). Both sides will load either user or host credentials, depending on what is available. If the server is running as an ordinary user, then you must give it a proxy certificate with grid-proxy-init. Or, the server can be run as root and will use host certificates in the usual place.

Unix: This method makes use of a challenge-response in the local Unix filesystem to determine the client's Unix identity. It assumes that both machines share the same conception of the user database and have a common directory which they can read and write. By default, the server will pick a filename in /tmp, and challenge the client to create that file. If it can, then the server will examine the owner of the file to determine the client's username. Naturally, /tmp will only be available to clients on the same machine. However, if a shared filesystem directory is available, give that to the chirp server via the -c option. Then, any authorized client of the filesystem can authenticate to the server. For example, at Notre Dame, we use -c /afs/nd.edu/user37/ccl/software/rendezvous to authenticate via our AFS distributed file system.
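
The filesystem challenge-response can be sketched in a few lines. This Python model only illustrates the idea and is not Chirp's protocol code: the three function names and the challenge file naming are hypothetical, both roles run in a single process, and a temporary directory stands in for /tmp or the shared directory given with -c.

```python
import os
import pwd
import secrets
import tempfile


def make_challenge(directory):
    """Server side: pick an unpredictable file name in the shared directory."""
    return os.path.join(directory, "challenge." + secrets.token_hex(8))


def answer_challenge(path):
    """Client side: prove access to the directory by creating the file."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def verify_challenge(path):
    """Server side: the owner of the file is taken as the client's identity."""
    try:
        uid = os.lstat(path).st_uid
    except FileNotFoundError:
        return None  # the client could not create the file
    os.unlink(path)
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        name = str(uid)  # no user database entry; fall back to the numeric id
    return "unix:" + name


with tempfile.TemporaryDirectory() as shared:
    challenge = make_challenge(shared)
    answer_challenge(challenge)
    subject = verify_challenge(challenge)
    unanswered = verify_challenge(make_challenge(shared))
```

The server never asks the client for its name. It reads the name from the file ownership recorded by the filesystem, which is why both sides must agree on the user database.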

Hostname: The server will rely on a reverse DNS lookup to establish the fully-qualified hostname of the calling client. The second field gives the hostname to be accepted. It may contain an asterisk as a wildcard. The third field is ignored. The fourth field is then used to select an appropriate local username.

Address: Like "hostname" authentication, except the server simply looks at the client's IP address.

By default, Chirp and/or Parrot will attempt every authentication type it knows until one succeeds. If you wish to restrict or re-order the authentication types used, give one or more -a options to the client, naming the authentication types to be used, in order. For example, to attempt only hostname and kerberos authentication, in that order:

   % chirp -a hostname -a kerberos

Debugging Advice

Debugging a distributed system can be quite difficult because of the sheer number of hosts involved and the mass of information to be collected. If you are having difficulty with Chirp, we recommend that you make good use of the debugging traces built into the tools.

In all of the Chirp and Parrot tools, the -d option allows you to turn on selected debugging messages. The simplest option is -d all which will show every event that occurs in the system.

To best debug a problem, we recommend that you turn on the debugging options on both the client and server that you are operating. For example, if you are having trouble getting Parrot to connect to a Chirp server, then run both as follows:

% chirp_server -d all [more options] ...
% parrot -d all tcsh
Of course, this is likely to show far more information than you will be able to process. Instead, turn on debugging flags selectively. For example, if you are having a problem with authentication, just show those messages with -d auth on both sides.

There are a large number of debugging flags. Currently, the choices are: syscall notice channel process resolve libcall tcp dns auth local http ftp nest chirp dcap rfio cache poll remote summary debug time pid all. When debugging problems with Chirp and Parrot, we recommend selectively using -d chirp, -d tcp, -d auth, and -d libcall as needed.