Chirp is Copyright (C) 2004 Douglas Thain and the University of Notre Dame. This software is distributed under a BSD-style license. See the file COPYING for details.
Chirp is like a distributed filesystem (such as NFS) except that it can be run over wide area networks and requires no special privileges on either the client or the server end. Chirp allows the end user to set up fine-grained access control so that data can be shared (or not shared) with the right people.
Chirp is also like a file transfer system (such as FTP) that provides streaming point-to-point data transfer over the Internet. However, Chirp also provides fine-grained Unix-like data access suitable for direct access by ordinary programs.
Begin by installing the cctools on your system. When you are ready, proceed below.
/tmp/mydata
hostname:*.cs.somewhere.edu:*:myname
Of course, you should substitute your local domain name for cs.somewhere.edu and your username for myname. If you want to access the server from several specific machines or domains, you may put several hostname lines into the authfile. This authfile is fine for now, but we'll refine it later.
% chirp_server -r /tmp/mydata -a /tmp/authfile
% parrot tcsh
...
% cd /chirp/myhost.somewhere.edu
% cp /tmp/bigfile .
% ls -la
total 804
drwx------ 2 condor dip 4096 Sep 10 12:40 .
drwx------ 2 condor dip 4096 Sep 10 12:40 ..
-rw-r--r-- 1 condor dip 104857600 Sep 10 12:57 bigfile
-rw-r--r-- 1 condor dip 147 Sep 10 12:39 hosts
% cp /http/www.cse.nd.edu temp.html
% vi temp.html
%
(If you are having difficulty accessing your server, have a look at "debugging hints" below.)
Parrot is certainly the most convenient way to access storage, but it has some limitations: it only works on Linux 2.4, and imposes a performance penalty. (This is because Parrot makes an extra data copy in the process of handling a program's system calls.)
For more portable, explicit control of a Chirp server, use the Chirp command line tool. This allows you to connect to a server, copy files, and manage directories, much like an FTP client:
% chirp
...
chirp::> open myhost.somewhere.edu
chirp:myhost.somewhere.edu:/> put /tmp/bigfile
file /tmp/bigfile -> /bigfile (11.01 MB/s)
chirp:myhost.somewhere.edu:/> ls -la
dir 4096 . Fri Sep 10 12:40:27 2004
dir 4096 .. Fri Sep 10 12:40:27 2004
file 147 hosts Fri Sep 10 12:39:54 2004
file 104857600 bigfile Fri Sep 10 12:53:21 2004
chirp:myhost.somewhere.edu:/>
In scripts, you may find it easier to use the standalone commands chirp_get and chirp_put, which move single files to and from a Chirp server. These commands also allow for streaming data, which can be helpful in a shell pipeline. Also, the -f option to both commands allows you to follow a file, much like the Unix tail command:
% tar cvzf archive.tar.gz ~/mydata
% chirp_put archive.tar.gz myhost.somewhere.edu archive.tar.gz
% ...
% chirp_get myhost.somewhere.edu archive.tar.gz - | tar xvzf -
% ...
% chirp_get -f myhost.somewhere.edu logfile - |& less
%
The fourth way to access the storage pool is to write your own programs that use the Chirp C interface. You must compile and link against the following files in the ordinary way:
INSTALL_DIR/include/chirp_client.h
INSTALL_DIR/include/chirp_reli.h
INSTALL_DIR/lib/libchirp.a
The chirp_client.h interface allows you to explicitly connect to a server and open, close, read, and write files, much as in a traditional Unix interface. This interface is unreliable in the sense that a broken connection will cause all further operations to fail. To recover, you must explicitly re-connect to the server.
The chirp_reli.h interface is a reliable version of the chirp_client interface. The programmer need not explicitly connect to or disconnect from servers, but simply names the host and file to access. The library transparently handles connection as well as recovery from temporary failures.
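The difference between the two interfaces can be illustrated with a small Python sketch. This is not the Chirp C API: FlakyConnection, ReliableClient, ConnectionBroken, and their methods are hypothetical names invented for illustration, and the retry policy (a fixed number of reconnect attempts) is an assumption rather than a description of what libchirp does.

```python
import time


class ConnectionBroken(Exception):
    """Raised by the hypothetical low-level connection when the link is lost."""


class FlakyConnection:
    """Stand-in for an explicit, unreliable connection (not Chirp's real API).

    After `fail_after` successful reads the connection breaks, and every
    later operation on it fails, as described for chirp_client.h.
    """

    def __init__(self, data, fail_after=None):
        self.data = data
        self.reads_left = fail_after  # None means the connection never breaks
        self.broken = False

    def pread(self, offset, length):
        if self.broken:
            raise ConnectionBroken("connection is broken")
        if self.reads_left is not None:
            if self.reads_left == 0:
                self.broken = True
                raise ConnectionBroken("connection lost")
            self.reads_left -= 1
        return self.data[offset:offset + length]


class ReliableClient:
    """Wrapper that reconnects and retries, in the spirit of chirp_reli.h."""

    def __init__(self, connect, retries=3, delay=0.0):
        self.connect = connect  # callable that returns a fresh connection
        self.retries = retries  # assumed policy: bounded number of retries
        self.delay = delay      # seconds to wait between attempts
        self.conn = None

    def pread(self, offset, length):
        last_error = None
        for _ in range(self.retries + 1):
            try:
                if self.conn is None:
                    self.conn = self.connect()  # connect lazily, on demand
                return self.conn.pread(offset, length)
            except ConnectionBroken as error:
                last_error = error
                self.conn = None                # force a reconnect next time
                time.sleep(self.delay)
        raise last_error
```

With the raw connection, the caller must notice the failure and reconnect; with the wrapper, the caller only names what to read and a temporary failure is absorbed by the retry loop.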
The storage catalog is highly dynamic. By default, each Chirp server makes itself known to the storage catalog every five minutes. The catalog server records and reports all Chirp servers that it knows about, but will discard servers that have not reported for fifteen minutes.
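The reporting and expiry behavior can be modeled in a few lines of Python. This is only a sketch of the policy described above, not the catalog server's implementation: the Catalog class and its method names are hypothetical, and treating a server as expired at exactly fifteen minutes of silence is an assumption about the boundary.

```python
REPORT_INTERVAL = 5 * 60   # seconds: default interval between server reports
EXPIRY_TIMEOUT = 15 * 60   # seconds: silence after which a server is discarded


class Catalog:
    """Toy model of a catalog that forgets servers that stop reporting."""

    def __init__(self, timeout=EXPIRY_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # server name -> time of most recent report

    def report(self, server, now):
        """Record that `server` reported at time `now` (in seconds)."""
        self.last_seen[server] = now

    def live_servers(self, now):
        """Discard expired servers, then list the rest in sorted order."""
        self.last_seen = {
            server: seen
            for server, seen in self.last_seen.items()
            if now - seen < self.timeout  # assumed: expired at exactly 15 min
        }
        return sorted(self.last_seen)
```

Because the timeout is three times the reporting interval, a server in this model can miss two consecutive reports and still be listed.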
If you do not want your servers to report to a catalog, then run them with this option:
% chirp_server -u -
Alternatively, you may establish your own catalog server. This can be useful for keeping your systems logically distinct from the main storage pool, but can also help performance and availability if your catalog is close to your Chirp servers. The catalog server is installed in the same place as the Chirp server. Simply run it on any machine that you like and then direct your Chirp servers to update the new catalog with the -u option. The catalog will be published via HTTP on port 9097 of the catalog machine.
For example, suppose that you wish to run a catalog server on a machine named dopey and a Chirp server on a machine named sneezy:
dopey% catalog_server
...
sneezy% chirp_server -u dopey [more options]
Finally, point your web browser to:
http://dopey:9097
and you will see a listing of the Chirp servers known to the catalog.
Here is a summary of the authentication schemes:
Type | Summary | Personal? (non-root) | Multi-User? (root)
kerberos | Centralized private key system | no | yes (host cert)
globus | Distributed public key system | yes (user cert) | yes (user cert)
filesystem | Authenticate via a local or distributed filesystem | yes | yes
hostname | Reverse DNS lookup | yes | yes
address | Identify by IP address | yes | yes
The Chirp tools will attempt all of the authentication types they know until one successfully connects to a Chirp server. You must explicitly specify the security policy for the Chirp server in an authfile, passed on the command line with the -a option. An example authfile is distributed with Chirp in INSTALL_DIR/etc/chirp.authfile.example.
Here's how it works. Each line in the file has four fields separated by colons: the authentication type, the permitted hostnames, the permitted remote users, and the corresponding local users. Asterisks may be used in the first three fields as wildcards. The fourth field must be either a valid local username or an asterisk, indicating that the local username is chosen by the authentication type. Each line in the file is compared against the calling user in order. If one matches, the user is accepted and assigned the username in the fourth field.
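The matching rule can be sketched in Python. This is an illustration of the four-field, first-match-wins rule described above, not the server's actual parser: parse_authfile and authorize are hypothetical names, shell-style matching via fnmatchcase (which also gives special meaning to ? and [ ]) is an assumption, and an asterisk in the fourth field is approximated here by reusing the remote username, whereas the real server lets the authentication type choose.

```python
from fnmatch import fnmatchcase


def parse_authfile(text):
    """Split each non-blank line into its four colon-separated fields."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) != 4:
            raise ValueError("expected four fields: " + line)
        rules.append(tuple(fields))
    return rules


def authorize(rules, auth_type, host, remote_user):
    """Return the local username from the first matching rule, or None."""
    for rule_type, rule_host, rule_user, local_user in rules:
        if (fnmatchcase(auth_type, rule_type)
                and fnmatchcase(host, rule_host)
                and fnmatchcase(remote_user, rule_user)):
            # Assumption: "*" in the fourth field maps to the remote username.
            return remote_user if local_user == "*" else local_user
    return None  # no rule matched, so the caller is rejected


if __name__ == "__main__":
    rules = parse_authfile("hostname:*.cs.somewhere.edu:*:myname\n")
    print(authorize(rules, "hostname", "lab1.cs.somewhere.edu", "anyone"))
```

Because rules are tried in order, a specific line placed before a wildcard line takes precedence over it.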
Here are some examples. Suppose that I wish to run a personal server as an ordinary user thain, and I am willing to trust any user calling from two different hosts called red and blue, as well as any hosts that can authenticate with my Globus identity:
hostname:red.cs.wisc.edu:*:thain
hostname:blue.cs.wisc.edu:*:thain
globus:*:/C=US/O=National Computational Science Alliance/CN=Douglas Thain:thain
(The hostname method ignores the third field.) Or, suppose that I am running a server as root, and I am willing to trust any user that can authenticate via Kerberos or via the local filesystem if on the same host. In addition, I consider any user on the host operator.cs.wisc.edu to be equivalent to the user named sysop:
kerberos:*:*:*
filesystem:bird.cs.wisc.edu:*:*
hostname:operator.cs.wisc.edu:*:sysop
A Chirp server creates a new process for every incoming client. If the server is run as the superuser, the process will setuid to the id of the authenticated user. If the server is run as an ordinary user, it will check that the authenticated user matches the user who owns the server; otherwise, the connection is declined.
Each of the authentication types has a few things you should know:
Kerberos: The server will attempt to use the Kerberos identity of the host it is run on (e.g. host/coral.cs.wisc.edu@CS.WISC.EDU). Thus, it must be run as the superuser in order to access its certificates.
Globus: The server and client will attempt to perform peer-to-peer authentication using the Grid Security Infrastructure. Both sides must first obtain a proxy certificate by running grid-proxy-init.
Filesystem: This method makes use of an existing filesystem (local or distributed) to establish the client's identity. It assumes that both machines share the same conception of the user database and have a common directory which they can read and write. By default, the server will pick a filename in /tmp, and challenge the client to create that file. If it can, then the server will examine the owner of the file to determine the client's username. Naturally, /tmp will only be available to clients on the same machine. However, if a shared filesystem directory is available, give that to the chirp server via the -c option. Then, any authorized client of the filesystem can authenticate to the server. For example, at Notre Dame, we use -c /afs/nd.edu/user37/ccl/software/rendezvous to authenticate via our AFS distributed file system.
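The filesystem challenge can be sketched in Python on a Unix system. This is a minimal illustration of the idea, not Chirp's wire protocol: the function names, the random "challenge." filename scheme, and the cleanup step are all assumptions made for the example. The server side returns the numeric owner of the file; username_for shows how that would be mapped to a name through the shared user database.

```python
import os
import pwd
import secrets
import tempfile


def issue_challenge(directory):
    """Server side: pick an unpredictable, unused filename in the directory."""
    return os.path.join(directory, "challenge." + secrets.token_hex(8))


def answer_challenge(path):
    """Client side: create the requested file.

    O_EXCL makes the call fail if the file already exists, so the client
    cannot be tricked into reusing a file made by someone else.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def verify_challenge(path):
    """Server side: return the uid that owns the file, or None if absent."""
    try:
        owner_uid = os.lstat(path).st_uid  # lstat: do not follow symlinks
    except FileNotFoundError:
        return None                        # the client could not create it
    os.unlink(path)                        # assumed cleanup of the challenge
    return owner_uid


def username_for(uid):
    """Map a uid to a username, falling back to the number if unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared:
        challenge = issue_challenge(shared)
        answer_challenge(challenge)
        print(username_for(verify_challenge(challenge)))
```

The scheme only proves identity if the client and server agree on who owns a file, which is why the text above requires a shared user database and a directory both sides can write.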
Hostname: The server will rely on a reverse DNS lookup to establish the fully-qualified hostname of the calling client. The second field gives the hostname to be accepted. It may contain an asterisk as a wildcard. The third field is ignored. The fourth field is then used to select an appropriate local username.
Address: Like "hostname" authentication, except the server simply looks at the client's IP address.
By default, Chirp and Parrot will attempt every authentication type they know until one succeeds. If you wish to restrict or re-order the authentication types used, give one or more -a options to the client, naming the authentication types to be used, in order. For example, to attempt only hostname and kerberos authentication, in that order:
% chirp -a hostname -a kerberos
In all of the Chirp and Parrot tools, the -d option allows you to turn on selected debugging messages. The simplest option is -d all which will show every event that occurs in the system.
To best debug a problem, we recommend that you turn on the debugging options on both the client and server that you are operating. For example, if you are having trouble getting Parrot to connect to a Chirp server, then run both as follows:
% chirp_server -d all [more options]
...
% parrot -d all tcsh
Of course, this is likely to show far more information than you will be able to process. Instead, turn on debugging flags selectively. For example, if you are having a problem with authentication, just show those messages with -d auth on both sides.
There are a large number of debugging flags. Currently, the choices are: syscall notice channel process resolve libcall tcp dns auth local http ftp nest chirp dcap rfio cache poll remote summary debug time pid all. When debugging problems with Chirp and Parrot, we recommend selectively using -d chirp, -d tcp, -d auth, and -d libcall as needed.
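When driving the tools from a script, it can help to check flag names before launching anything. The following Python helper is a small sketch: debug_args is a hypothetical name, the flag list is copied from the paragraph above, and it assumes that several flags are selected by repeating -d, as the recommendation above suggests.

```python
# Flag names copied from the list above; the tools themselves are the
# authority on which flags a given release accepts.
KNOWN_FLAGS = (
    "syscall notice channel process resolve libcall tcp dns auth local "
    "http ftp nest chirp dcap rfio cache poll remote summary debug time "
    "pid all"
).split()


def debug_args(*flags):
    """Build a '-d flag' argument list, rejecting unknown flag names."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError("unknown debugging flag(s): " + ", ".join(unknown))
    args = []
    for flag in flags:
        args += ["-d", flag]  # assumed: one -d option per selected flag
    return args
```

For example, ["chirp_server"] + debug_args("auth", "tcp") yields an argument list suitable for passing to subprocess.run, with any other server options appended by the caller.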