Chirp is Copyright (C) 2004 Douglas Thain and the University of Notre Dame. This software is distributed under a BSD-style license. See the file COPYING for details.
Chirp is like a distributed filesystem (such as NFS) except that it can be run over wide area networks and requires no special privileges on either the client or the server end. Chirp allows the end user to set up fine-grained access control so that data can be shared (or not shared) with the right people.
Chirp is also like a file transfer system (such as FTP) that provides streaming point-to-point data transfer over the Internet. However, Chirp also provides fine-grained Unix-like data access suitable for direct access by ordinary programs.
Begin by installing the cctools on your system. When you are ready, proceed below.
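Suppose you want to export the directory /tmp/mydata. Start a Chirp server with that directory as its root:
% chirp_server -r /tmp/mydata &
Then connect to the server with the Chirp tool and use setacl to give write access to every host in your domain:
% chirp localhost
chirp:localhost:/> setacl . hostname:*.mydomain.edu write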
% parrot tcsh
...
% cd /chirp/myhost.somewhere.edu
% cp /tmp/bigfile .
% ls -la
total 804
drwx------ 2 condor dip 4096 Sep 10 12:40 .
drwx------ 2 condor dip 4096 Sep 10 12:40 ..
-rw-r--r-- 1 condor dip 104857600 Sep 10 12:57 bigfile
-rw-r--r-- 1 condor dip 147 Sep 10 12:39 hosts
% cp /http/www.cse.nd.edu temp.html
% vi temp.html
%
(If you are having difficulty accessing your server, have a look at "debugging hints" below.)
Parrot is certainly the most convenient way to access storage, but it has some limitations: it only works on Linux 2.4, and imposes a performance penalty. (This is because Parrot makes an extra data copy in the process of handling a program's system calls.)
For more portable, explicit control of a Chirp server, use the Chirp command line tool. This allows you to connect to a server, copy files, and manage directories, much like an FTP client:
% chirp
...
chirp::> open myhost.somewhere.edu
chirp:myhost.somewhere.edu:/> put /tmp/bigfile
file /tmp/bigfile -> /bigfile (11.01 MB/s)
chirp:myhost.somewhere.edu:/> ls -la
dir 4096 . Fri Sep 10 12:40:27 2004
dir 4096 .. Fri Sep 10 12:40:27 2004
file 147 hosts Fri Sep 10 12:39:54 2004
file 104857600 bigfile Fri Sep 10 12:53:21 2004
chirp:myhost.somewhere.edu:/>
In scripts, you may find it easier to use the standalone commands chirp_get and chirp_put, which move single files to and from a Chirp server. These commands also allow for streaming data, which can be helpful in a shell pipeline. Also, the -f option to both commands allows you to follow a file, much like the Unix tail command:
% tar cvzf archive.tar.gz ~/mydata
% chirp_put archive.tar.gz myhost.somewhere.edu archive.tar.gz
% ...
% chirp_get myhost.somewhere.edu archive.tar.gz - | tar xvzf -
% ...
% chirp_get -f myhost.somewhere.edu logfile - |& less
%
The fourth way to access the storage pool is to write your own programs that use the Chirp C interface. You must compile and link against the following files in the ordinary way:
INSTALL_DIR/include/chirp_client.h
INSTALL_DIR/include/chirp_reli.h
INSTALL_DIR/lib/libchirp.a
The chirp_client.h interface allows you to explicitly connect to a server and open, close, read, and write files, much as in a traditional Unix interface. This interface is unreliable in the sense that a broken connection will cause all further operations to fail. To recover, you must explicitly re-connect to the server.
The chirp_reli.h interface is a reliable version of the chirp_client interface. The programmer need not explicitly connect to or disconnect from servers, but simply names the host and file to access. The library transparently handles connection as well as recovery from temporary failures.
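The difference between the two interfaces can be illustrated with a small model. The sketch below is written in Python rather than C and does not use the Chirp library at all; FlakyConnection and ReliableClient are hypothetical names invented for this illustration. It only shows the general idea of naming a host and file on every call and reconnecting behind the scenes after a failure.

```python
class FlakyConnection:
    """Hypothetical explicit connection: once it breaks, every later call fails."""

    def __init__(self, files, break_at=None):
        self.files = files
        self.break_at = break_at  # call number at which the link breaks, if any
        self.calls = 0
        self.broken = False

    def read(self, path):
        self.calls += 1
        if self.calls == self.break_at:
            self.broken = True
        if self.broken:
            raise ConnectionError("connection lost")
        return self.files[path]


class ReliableClient:
    """Takes a host and path on every call; reconnects after a failure."""

    def __init__(self, connect, attempts=3):
        self.connect = connect  # function mapping a host name to a connection
        self.attempts = attempts
        self.connections = {}

    def read(self, host, path):
        for _ in range(self.attempts):
            if host not in self.connections:
                self.connections[host] = self.connect(host)
            try:
                return self.connections[host].read(path)
            except ConnectionError:
                del self.connections[host]  # discard the broken link, then retry
        raise ConnectionError("giving up on " + host)
```

With the explicit connection, the caller must notice the failure and build a new connection. With the wrapper, the caller only sees an error once the retry budget (an assumed parameter here) is exhausted.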
The storage catalog is highly dynamic. By default, each Chirp server makes itself known to the storage catalog every five minutes. The catalog server records and reports all Chirp servers that it knows about, but will discard servers that have not reported for fifteen minutes.
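The reporting and expiry behaviour described above can be modelled in a few lines. This is only a sketch of the timing rule, not the catalog server's code; the Catalog class and its method names are hypothetical, and times are plain seconds supplied by the caller.

```python
REPORT_INTERVAL = 5 * 60   # default: each server reports every five minutes
EXPIRY = 15 * 60           # servers silent for fifteen minutes are discarded


class Catalog:
    """Toy model of a catalog that forgets servers which stop reporting."""

    def __init__(self):
        self.last_seen = {}

    def report(self, server, now):
        # Record the time of the most recent update from this server.
        self.last_seen[server] = now

    def listing(self, now):
        # Keep only servers that reported within the expiry window.
        self.last_seen = {
            server: seen
            for server, seen in self.last_seen.items()
            if now - seen < EXPIRY
        }
        return sorted(self.last_seen)
```

Because the expiry window is three times the reporting interval, a server can miss two consecutive updates and still remain listed.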
If you do not want your servers to report to a catalog, then run them with this option:
% chirp_server -u -
Alternatively, you may establish your own catalog server. This can be useful for keeping your systems logically distinct from the main storage pool, but can also help performance and availability if your catalog is close to your Chirp servers. The catalog server is installed in the same place as the Chirp server. Simply run it on any machine that you like and then direct your Chirp servers to update the new catalog with the -u option. The catalog will be published via HTTP on port 9097 of the catalog machine.
For example, suppose that you wish to run a catalog server on a machine named dopey and a Chirp server on a machine named sneezy:
dopey% catalog_server
...
sneezy% chirp_server -u dopey [more options]
Finally, point your web browser to:
http://dopey:9097
You will see the catalog's report of the Chirp servers that it knows about.
Security really has two aspects: authentication and authorization. Authentication deals with the question "Who are you?" Once your identity has been established, then authorization deals with the question "What are you allowed to do?" Let's deal with each in turn.
Type | Summary | Regular User (non-root)? | Root? |
kerberos | Centralized private key system | no | yes (host cert) |
globus | Distributed public key system | yes (user cert) | yes (host cert) |
unix | Authenticate with local unix user ids. | yes | yes |
hostname | Reverse DNS lookup | yes | yes |
address | Identify by IP address | yes | yes |
The Chirp tools will attempt all of the authentication types that are known and available in the order above until one works. For example, if you have Kerberos installed in your system, Chirp will try that first. If not, Chirp attempts the others.
Once an authentication scheme has succeeded, Chirp assigns the incoming user a subject that describes both the authentication method and the user name within that method. For example, a user that authenticates via Kerberos might have the subject:
kerberos:dthain@nd.edu
A user authenticating with Globus credentials might be:
globus:/O=Cooperative_Computing_Lab/CN=Douglas_L_Thain
While another user authenticating by local unix ids might be:
unix:dthain
While a user authenticating by simple hostnames might be:
hostname:pigwidgeon.cse.nd.edu
Take note that Chirp considers all of the subjects as different identities, although some of them might correspond to the same person in varying circumstances.
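Since a subject is simply an authentication method and a name joined by a colon, scripts that process ACL listings can split it at the first colon. The helper below is a hypothetical convenience written for this illustration, not part of the Chirp tools.

```python
def parse_subject(subject):
    """Split a subject such as 'unix:dthain' into (method, name)."""
    # Split at the first colon only: the name part may contain further colons.
    method, sep, name = subject.partition(":")
    if not sep or not method or not name:
        raise ValueError("expected method:name, got %r" % (subject,))
    return method, name
```

Two subjects compare equal only if both the method and the name match, which mirrors the rule that unix:dthain and kerberos:dthain@nd.edu are distinct identities.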
Every directory in a Chirp server has an ACL, much like in some filesystems such as AFS or NTFS. To see the ACL for a directory, use the Chirp tool and the getacl command:
chirp:localhost:/> getacl
unix:dthain rwlva
hostname:*.mydomain.edu rwl
This ACL indicates that the subject unix:dthain has all five access rights, while the subject pattern hostname:*.mydomain.edu has only three access rights. The access rights are as follows:
r | - The subject may read items in the directory. |
w | - The subject may write and delete items in the directory. |
l | - The subject may list the directory contents. |
v | - The subject may reserve a directory. |
a | - The subject may administer the directory, including changing the ACL. |
Access rights often come in combinations, so there are a few aliases for your convenience:
read | - alias for rl |
write | - alias for rwl |
admin | - alias for rwlva |
reserve | - alias for lv |
none | - delete the entry |
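The alias table above amounts to a simple lookup. The following sketch models an ACL as a Python dictionary from subject to rights string and shows how an alias could be expanded before the entry is stored. The function names are hypothetical, and the assumption that a raw string of rights letters is also accepted is ours.

```python
# Aliases exactly as listed in the table above; "none" removes the entry.
ALIASES = {"read": "rl", "write": "rwl", "admin": "rwlva", "reserve": "lv", "none": ""}


def expand_rights(spec):
    """Turn an alias or a string of rights letters into rights letters."""
    rights = ALIASES.get(spec, spec)
    unknown = set(rights) - set("rwlva")
    if unknown:
        raise ValueError("unknown rights: " + "".join(sorted(unknown)))
    return rights


def set_acl(acl, subject, spec):
    """Return a copy of acl with the entry for subject set or deleted."""
    rights = expand_rights(spec)
    updated = dict(acl)
    if rights:
        updated[subject] = rights
    else:
        updated.pop(subject, None)  # empty rights: delete the entry
    return updated
```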
To change an ACL, use the setacl command, then check the result with getacl:
chirp:localhost:/> setacl / kerberos:dthain@nd.edu write
chirp:localhost:/> getacl
unix:dthain rwlva
hostname:*.mydomain.edu rwl
kerberos:dthain@nd.edu rwl
The meaning of ACLs is fairly obvious, but there are a few subtleties you should know:
Rights are generally inherited. When a new directory is created, it automatically gets the ACL of its parent. Exception: read about the reserve right below.
Rights are generally not hierarchical. In order to access a directory, you only need the appropriate permissions on that directory. For example, if you have permission to write to /data/x/y/z, you do not need any other permissions on /data, /data/x and so forth. Of course, it may be difficult to discover a deep directory without rights on the parents, but you can still access it.
The delete right is absolute. If you have permission to delete a directory, then you are able to delete the entire subtree that it contains, regardless of any other ACLs underneath.
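The non-hierarchical rule can be made concrete with a small model. In the sketch below, each directory maps to its own ACL and a check consults only that directory, never its parents. Matching subject patterns with fnmatchcase and taking the union of all matching entries are assumptions made for this illustration; the real server's matching rules may differ.

```python
from fnmatch import fnmatchcase


def rights_for(acl, subject):
    """Collect the rights of every ACL entry whose pattern matches subject."""
    granted = set()
    for pattern, rights in acl.items():
        if fnmatchcase(subject, pattern):
            granted |= set(rights)
    return granted


def may(acls, directory, subject, right):
    """Check one right using only the ACL of the directory itself."""
    return right in rights_for(acls.get(directory, {}), subject)
```

For example, a subject with rwl on /data/x/y/z passes a write check there even if the ACL on /data grants it nothing.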
A shared-storage environment such as Chirp aims to allow many people to read and write common storage space. Of course, with many people reading and writing, we need some mechanism to make sure that people do not step on each other's toes.
The reserve right allows a user to create what is essentially a fresh workspace for their own use. When a user creates a new directory and has the v right (but not the w right), Chirp will create a new directory with a fresh ACL giving the creating user all rights.
A good way to use the reserve right is with a wildcard at the top directory. Here's an example. Suppose that Fred creates a new Chirp server on the host bigwig. Initially, no-one except Fred can access the server. The first time it starts, the Chirp server initializes its root directory with the following ACL:
unix:fred rwlva
Now, Fred wants other users in his organization to be able to use this storage, but doesn't want them messing up his existing data. So, Fred uses the Chirp tool to give the reserve right to anyone calling from any machine in his organization:
chirp:bigwig:> setacl / hostname:*.somewhere.edu reserve
chirp:bigwig:> getacl /
unix:fred rwlva
hostname:*.somewhere.edu lv
Now, any user calling from anywhere in somewhere.edu can access this server. But all that any such user can do is issue a mkdir in the root directory. For example, suppose that Betty logs into this server from ws1.somewhere.edu. She cannot modify the root directory, but she can create her own directory:
chirp:bigwig:> mkdir /mydata
And, in the new directory, ws1.somewhere.edu can do anything, including edit the access control. Here is the new ACL for /mydata:
chirp:bigwig:> getacl /mydata
hostname:ws1.somewhere.edu rwlva
If Betty wants to authenticate with Globus credentials from here on, she can change the access control as follows:
chirp:bigwig:> setacl /mydata globus:/O=Univ_of_Somewhere/CN=Betty admin
And the new ACL will look as follows:
chirp:bigwig:> getacl /mydata
hostname:ws1.somewhere.edu rwlva
globus:/O=Univ_of_Somewhere/CN=Betty rwlva
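The walkthrough above can be replayed against a toy model of mkdir. The sketch assumes ACLs stored as dictionaries, wildcard matching via fnmatchcase, and a hypothetical mkdir function; it illustrates the two rules stated earlier (inherit the parent's ACL with the w right, get a fresh ACL with only the v right) and is not the server's implementation.

```python
from fnmatch import fnmatchcase


def rights_for(acl, subject):
    """Union of rights from every ACL entry whose pattern matches subject."""
    granted = set()
    for pattern, rights in acl.items():
        if fnmatchcase(subject, pattern):
            granted |= set(rights)
    return granted


def mkdir(acls, parent, name, subject):
    """Create parent/name and give it an ACL according to the caller's rights."""
    rights = rights_for(acls[parent], subject)
    path = parent.rstrip("/") + "/" + name
    if "w" in rights:
        acls[path] = dict(acls[parent])   # ordinary case: inherit the parent's ACL
    elif "v" in rights:
        acls[path] = {subject: "rwlva"}   # reserve: fresh ACL, creator gets all rights
    else:
        raise PermissionError(subject + " may not create directories in " + parent)
    return path
```

Starting from Fred's root ACL with the reserve entry added, a mkdir by hostname:ws1.somewhere.edu yields a /mydata directory whose only ACL entry is that host with rwlva, matching the getacl output shown above.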
Kerberos: The server will attempt to use the Kerberos identity of the host it is run on (for example, host/coral.cs.wisc.edu@CS.WISC.EDU). Thus, it must be run as the superuser in order to access its certificates. Once authentication is complete, there is no need for the server to keep its root access, so it will change to any unprivileged user that you like. Use the -i option to select the userid.
Globus: The server and client will attempt to perform client authentication using the Grid Security Infrastructure (GSI). Both sides will load either user or host credentials, depending on what is available. If the server is running as an ordinary user, then you must give it a proxy certificate with grid-proxy-init. Or, the server can be run as root and will use host certificates in the usual place.
Unix: This method makes use of a challenge-response in the local Unix filesystem to determine the client's Unix identity. It assumes that both machines share the same conception of the user database and have a common directory which they can read and write. By default, the server will pick a filename in /tmp, and challenge the client to create that file. If it can, then the server will examine the owner of the file to determine the client's username. Naturally, /tmp will only be available to clients on the same machine. However, if a shared filesystem directory is available, give that to the chirp server via the -c option. Then, any authorized client of the filesystem can authenticate to the server. For example, at Notre Dame, we use -c /afs/nd.edu/user37/ccl/software/rendezvous to authenticate via our AFS distributed file system.
Hostname: The server will rely on a reverse DNS lookup to establish the fully-qualified hostname of the calling client. The second field gives the hostname to be accepted. It may contain an asterisk as a wildcard. The third field is ignored. The fourth field is then used to select an appropriate local username.
Address: Like "hostname" authentication, except the server simply looks at the client's IP address.
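The filesystem challenge-response used by the Unix method can be sketched as three steps: the server picks a file name, the client creates the file, and the server inspects the file's owner. The code below is a POSIX-only illustration with hypothetical function names; the file naming scheme and the use of O_EXCL are assumptions, and the real protocol exchanges the file name over the network connection.

```python
import os
import secrets


def issue_challenge(directory):
    """Server side: pick a fresh file name for the client to create."""
    return os.path.join(directory, "challenge." + secrets.token_hex(8))


def answer_challenge(path):
    """Client side: create the named file; fails if it already exists."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def verify_challenge(path):
    """Server side: return the numeric owner of the file, or None if absent."""
    try:
        info = os.lstat(path)
    except FileNotFoundError:
        return None
    os.unlink(path)      # clean up so the name cannot be reused
    return info.st_uid   # a server would map this uid to a user name
```

The owner reported by the filesystem is only meaningful if both sides agree on the user database, which is why the method requires a shared conception of user ids as well as a shared directory.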
By default, Chirp and/or Parrot will attempt every authentication type it knows until one succeeds. If you wish to restrict or re-order the authentication types used, give one or more -a options to the client, naming the authentication types to be used, in order. For example, to attempt only hostname and kerberos authentication, in that order:
% chirp -a hostname -a kerberos
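The selection logic behind the -a option is a first-success loop over an ordered list. The sketch below assumes the default order is the order of the table of authentication types given earlier; the authenticate function and the attempt callback are hypothetical names used only for this illustration.

```python
# Default order, taken from the table of authentication types.
DEFAULT_ORDER = ["kerberos", "globus", "unix", "hostname", "address"]


def authenticate(attempt, requested=None):
    """Try each method in order; return the first subject obtained, or None.

    attempt(method) returns a subject string on success and None on failure.
    requested plays the role of the list built from repeated -a options.
    """
    order = list(requested) if requested else DEFAULT_ORDER
    for method in order:
        subject = attempt(method)
        if subject is not None:
            return subject
    return None
```

Passing requested=["hostname", "kerberos"] corresponds to the command line above: hostname is tried first, and kerberos only if hostname fails.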
In all of the Chirp and Parrot tools, the -d option allows you to turn on selected debugging messages. The simplest option is -d all which will show every event that occurs in the system.
To best debug a problem, we recommend that you turn on the debugging options on both the client and server that you are operating. For example, if you are having trouble getting Parrot to connect to a Chirp server, then run both as follows:
% chirp_server -d all [more options]
...
% parrot -d all tcsh
Of course, this is likely to show far more information than you will be able to process. Instead, turn on debugging flags selectively. For example, if you are having a problem with authentication, just show those messages with -d auth on both sides.
There are a large number of debugging flags. Currently, the choices are: syscall notice channel process resolve libcall tcp dns auth local http ftp nest chirp dcap rfio cache poll remote summary debug time pid all. When debugging problems with Chirp and Parrot, we recommend selectively using -d chirp, -d tcp, -d auth, and -d libcall as needed.