Parrot is Copyright (C) 2003-2004 Douglas Thain and Copyright (C) 2005- The University of Notre Dame. All rights reserved. This software is distributed under a BSD-style license. See the file COPYING for details.
The Glite module of Parrot is Copyright (c) Members of the EGEE Collaboration. 2004. See http://eu-egee.org/partners/ for details on the copyright holders. For license conditions see the license file or http://eu-egee.org/license.html
Please do not cite this web page as a scientific publication!
Instead, cite one of the following:
vi like so:
% parrot vi /anonftp/ftp.cs.wisc.edu/RoadMap
Parrot is useful to users of distributed systems, because it frees them from rewriting code to work with new systems and relying on remote administrators to trust and install new software. Parrot is also useful to developers of distributed systems, because it allows rapid deployment of new code to real applications and real users that do not have the time, inclination, or permissions to build a kernel-level filesystem.
Parrot currently supports a variety of remote I/O systems, all detailed below. We welcome contributions of new remote I/O drivers from others. However, if you are working on a protocol driver, please drop us a note so that we can make sure work is not duplicated.
Almost any application, whether statically or dynamically linked, standard or commercial, command-line or GUI, should work with Parrot. There are a few exceptions. Because Parrot relies on the Linux ptrace interface, any program that itself relies on the ptrace interface cannot run under Parrot. This means Parrot cannot run a debugger, nor can it run itself recursively. In addition, Parrot cannot run setuid programs, as the operating system considers this a security risk.
Parrot also provides a new experimental feature called identity boxing. This feature allows you to securely run a visiting application within a protection domain without becoming root or creating a new account. Read below for more information on identity boxing.
Parrot currently runs only on the Linux operating system. It relies on some fairly low level details in order to implement system call trapping. Ports to other platforms that are similar to Linux may be possible in the future.
Like any software, Parrot is bound to have some bugs. Please check the known bugs page for the latest scoop.
Run the parrot command followed by any other Unix program. For example, to run a Parrot-enabled vi, execute this command:
% parrot vi /anonftp/ftp.cs.wisc.edu/RoadMap
Rather than typing parrot before every command you run, try starting a shell with Parrot already loaded:
% parrot tcsh
% acroread /http/www.cs.wisc.edu/condor/doc/usenix_1.92.pdf
% grep Yahoo /http/www.yahoo.com
% set autolist
% cat /anonftp/ftp.cs.wisc.edu/[Press TAB here]
Hint: You may find it useful to have some visual indication
of when Parrot is active, so we recommend that you modify
your shell startup scripts to change the prompt when Parrot is enabled.
If you use tcsh, you might add something like this to your .cshrc:
if ( $?PARROT_ENABLED ) then
    set prompt = " (Parrot) %n@%m%~%# "
else
    set prompt = " %n@%m%~%# "
endif
We have limited the examples so far to HTTP and anonymous FTP, as they are the only services we know that absolutely everyone is familiar with. There are a number of other more powerful and secure remote services that you may be less familiar with. Parrot supports them in the same form: The filename begins with the service type, then the host name, then the file name. Here are all the currently supported services:
example path | remote service | more info |
/http/www.yahoo.com/index.html | Hypertext Transfer Protocol | included |
/httpfs/www.yahoo.com/index.html | HTTP with filesystem extensions | included |
/ftp/ftp.cs.wisc.edu/RoadMap | File Transfer Protocol | included |
/anonftp/ftp.cs.wisc.edu/RoadMap | Anonymous File Transfer Protocol | included |
/chirp/target.cs.wisc.edu/path | Chirp Storage System | included + more info |
/gsiftp/ftp.globus.org/path | Globus Security + File Transfer Protocol | more info |
/nest/nest.cs.wisc.edu/path | Network Storage Technology | more info |
/rfio/host.cern.ch/path | Castor Remote File I/O | more info |
/dcap/dcap.cs.wisc.edu/pnfs/cs.wisc.edu/path | DCache Access Protocol | more info |
/glite/xxx/path | GLite Experimental Grid Protocol | more info |
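The naming form described above (service type, then host name, then file name) can be illustrated with a small Python sketch. This is not Parrot's own code; the names `ParrotPath` and `split_parrot_path` are hypothetical and exist only for this example. It also assumes every path carries a host component, which does not hold for /glite paths, whose host is configured separately.

```python
from typing import NamedTuple


class ParrotPath(NamedTuple):
    service: str  # e.g. "http", "anonftp", "chirp"
    host: str     # e.g. "ftp.cs.wisc.edu"
    path: str     # file name on the remote service, always starting with "/"


def split_parrot_path(name: str) -> ParrotPath:
    """Split /service/host/rest into its three components (illustrative only)."""
    parts = name.split("/", 3)  # ["", service, host, rest]
    if len(parts) < 3 or parts[0] != "" or not parts[1] or not parts[2]:
        raise ValueError(f"not of the form /service/host/path: {name!r}")
    # A name with nothing after the host refers to the service's root directory.
    rest = "/" + parts[3] if len(parts) == 4 else "/"
    return ParrotPath(parts[1], parts[2], rest)
```

For instance, `split_parrot_path("/anonftp/ftp.cs.wisc.edu/RoadMap")` separates the anonymous FTP service, the server ftp.cs.wisc.edu, and the remote file /RoadMap.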
Parrot keeps ls happy by producing a bogus directory entry:
% parrot ls -la /http/www.yahoo.com/
-r--r--r-- 1 thain thain 0 Jul 16 11:50 /http/www.yahoo.com
A less-drastic example is found in FTP. If you attempt to perform a directory listing of an FTP server, Parrot fills in the available information -- the file names and their sizes -- but again inserts bogus information to fill the rest out:
% parrot ls -la /anonftp/ftp.cs.wisc.edu
total 0
-rwxrwxrwx 1 thain thain 2629 Jul 16 11:53 RoadMap
-rwxrwxrwx 1 thain thain 1622222 Jul 16 11:53 ls-lR
-rwxrwxrwx 1 thain thain 367507 Jul 16 11:53 ls-lR.Z
-rwxrwxrwx 1 thain thain 212125 Jul 16 11:53 ls-lR.gz
If you would like to get a better idea of the underlying behavior of Parrot, try running it with the -d remote option, which will display all of the remote I/O operations that it performs on a program's behalf:
% parrot -d remote ls -la /anonftp/ftp.cs.wisc.edu
...
ftp.cs.wisc.edu <-- TYPE I
ftp.cs.wisc.edu --> 200 Type set to I.
ftp.cs.wisc.edu <-- PASV
ftp.cs.wisc.edu --> 227 Entering Passive Mode (128,105,2,28,194,103)
ftp.cs.wisc.edu <-- NLST /
ftp.cs.wisc.edu --> 150 Opening BINARY mode data connection for file list.
...
If your program is upset by the unusual semantics of such storage systems, then consider using the Chirp protocol and server:
To start a Chirp server, simply do the following:
% chirp_server -d all
The -d all option turns on debugging, which helps you to understand how it works initially. You may remove this option once everything is working.
Suppose the Chirp server is running on bird.cs.wisc.edu. Using Parrot, you may access all of the Unix features of that host from elsewhere:
% parrot tcsh
% cd /chirp/bird.cs.wisc.edu
% ls -la
% ...
In general, Parrot gives better performance and usability with Chirp than with other protocols. You can read extensively about the Chirp server and protocol in the Chirp manual.
In addition, Parrot provides several custom command line tools (parrot_getacl, parrot_setacl, parrot_lsalloc, and parrot_mkalloc) that can be used to manage the access control and space allocation features of Chirp from the Unix command line.
The simplest name resolver is the mountlist, given by the -m mountfile option. This file corresponds closely to /etc/fstab in Unix. A mountlist is simply a file with two columns. The first column gives a logical directory or file name, while the second gives the physical path that it must be connected to.
For example, if a database is stored at an FTP server under the path /anonftp/ftp.cs.wisc.edu/db, it may be spliced into the filesystem under /dbase with a mount list like this:
/dbase /anonftp/ftp.cs.wisc.edu/db
Instruct Parrot to use the mountlist as follows:
% parrot -m mountfile tcsh
% cd /dbase
% ls -la
A single mount entry may be given on the command line with the -M option as follows:
% parrot -M /dbase=/anonftp/ftp.cs.wisc.edu/db tcsh
A more sophisticated way to perform name binding is with an external resolver. This is a program executed whenever Parrot needs to locate a file or directory. The program accepts a logical file name and then returns the physical location where it can be found.
Suppose that you have a database service that locates the nearest copy of a file for you. If you run the command locate_file, it will print out the nearest copy of a file. For example:
% locate_file /1523.data
/chirp/server.nd.edu/mix/1523.data
To connect the program locate_file to Parrot, simply give a mount string that specifies the program as a resolver:
% parrot -M /dbase=resolver:/path/to/locate_file tcsh
Now, if you attempt to access files under /dbase, Parrot will execute locate_file and access the data stored there:
% cat /dbase/1523.data
(see contents of /chirp/server.nd.edu/mix/1523.data)
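To make the two kinds of name binding concrete, here is a minimal Python sketch of how a logical name might be mapped to a physical one. It is an illustration under stated assumptions, not Parrot's implementation: the functions `parse_mountlist` and `resolve` are hypothetical, the longest-matching-entry rule is an assumption, and the external program is replaced by a caller-supplied function that receives the program path and the part of the name below the mount point (as in the locate_file example).

```python
from typing import Callable, Dict, Optional

RESOLVER_PREFIX = "resolver:"


def parse_mountlist(text: str) -> Dict[str, str]:
    """Read two-column mountlist text: logical name, then physical target."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # blank or malformed lines are skipped
            mounts[fields[0]] = fields[1]
    return mounts


def resolve(name: str, mounts: Dict[str, str],
            run_resolver: Optional[Callable[[str, str], str]] = None) -> str:
    """Map a logical name to a physical one (assumed rule: longest entry wins)."""
    for logical in sorted(mounts, key=len, reverse=True):
        base = logical.rstrip("/")
        if name == logical or name.startswith(base + "/"):
            rest = name[len(base):]  # part of the name below the mount point
            target = mounts[logical]
            if target.startswith(RESOLVER_PREFIX):
                if run_resolver is None:
                    raise ValueError("no resolver runner supplied")
                # Stand-in for executing the external resolver program.
                return run_resolver(target[len(RESOLVER_PREFIX):], rest)
            return target + rest
    return name  # no entry applies: the name is used unchanged
```

With the mountlist entry shown earlier, `resolve("/dbase/1523.data", mounts)` yields a path under /anonftp/ftp.cs.wisc.edu/db, while names outside /dbase pass through untouched.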
% setenv HTTP_PROXY "proxy.nd.edu:8080"
Multiple proxy servers can be given, separated by a semicolon. This will cause Parrot to try each proxy in order until one succeeds. If DIRECT is given as the last name in the list, then Parrot will fall back on a direct connection to the target web server. For example:
% setenv HTTP_PROXY "proxy.nd.edu:8080;proxy.wisc.edu:1000;DIRECT"
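The try-each-in-order behavior described above can be sketched in Python as follows. This is only an illustration of the semicolon-separated format and the DIRECT fallback; `parse_http_proxy` and `first_working` are hypothetical names, and the `connect` callable stands in for whatever actually opens the connection.

```python
from typing import Callable, List, Optional, Tuple

# None stands for a direct connection to the target web server.
Proxy = Optional[Tuple[str, int]]


def parse_http_proxy(value: str) -> List[Proxy]:
    """Turn "host:port;host:port;DIRECT" into an ordered list of attempts."""
    attempts: List[Proxy] = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        if entry == "DIRECT":
            attempts.append(None)
            continue
        host, _, port = entry.rpartition(":")
        if not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        attempts.append((host, int(port)))
    return attempts


def first_working(attempts: List[Proxy], connect: Callable[[Proxy], object]):
    """Try each attempt in order and return the first successful connection."""
    for attempt in attempts:
        try:
            return connect(attempt)
        except OSError:
            continue  # this proxy failed; move on to the next one
    raise OSError("no proxy (or direct connection) succeeded")
```

Parsing the example value above gives two proxy attempts followed by a direct connection, tried in that order.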
To set up an HTTPFS filesystem, you must run make_httpfs on the web server machine with the name of the local storage directory as the argument. For example, suppose that the web server my.server.com stores pages for the URL http://my.server.com/~fred in the local directory /home/fred/www. In this case, you should run the following command:
% make_httpfs /home/fred/www
Now, others may perceive the web server as a file server under the /httpfs hierarchy. For example:
% parrot tcsh
% cd /httpfs/my.server.com/~fred
% ls -la
In addition, HTTPFS optionally allows you to generate SHA-1 checksums of data files so that integrity can be verified after transmission. To enable this, run make_httpfs with the -K option. (Computing checksums will cause this to be much slower than before.) Then, run Parrot with the -K option. Checksums will be computed on the fly. If an error is discovered, Parrot will abort with a checksum error.
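For readers unfamiliar with checksum verification, the following Python sketch shows the general idea using the standard hashlib module. It does not reproduce how make_httpfs stores checksums or how Parrot compares them, which this manual does not specify; `sha1_of_file` and `verify` are hypothetical helpers.

```python
import hashlib


def sha1_of_file(path: str, block_size: int = 65536) -> str:
    """Compute the SHA-1 digest of a file, reading it one block at a time."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Return True when the file's SHA-1 digest matches the expected value."""
    return sha1_of_file(path) == expected_hex.strip().lower()
```

Reading in blocks keeps memory use constant for large files, which is the same reason checksums can be computed on the fly during a transfer.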
This support is still experimental, so occasional hangs are to be expected. The io-client must be configured according to the README that comes with its RPM. That configuration is where the host of the remote service is specified; unlike the other protocols supported by Parrot, gLite file names do not contain a hostname or port.
The endpoint of the fireman catalog has to be given on the command line to parrot using the new -E option. The default mount point for the gLite I/O is /glite.
Written by the gLite data management cluster, http://cern.ch/egee-jra1-dm Contact: project-eu-egee-middleware-datamgt@cern.ch
License and copyright notice (it is compatible with the cctools license): Copyright (c) Members of the EGEE Collaboration. 2004. See http://eu-egee.org/partners/ for details on the copyright holders. For license conditions see the license file or http://eu-egee.org/license.html
For example, suppose that you wish to allow a friend to log into your private workstation. Instead of creating a new account, simply use a script supplied with Parrot to create an identity box:
% whoami
dthain
% parrot_identity_box MyFriend
% whoami
MyFriend
% touch ~dthain/private-data
touch: creating `~dthain/private-data': Permission denied
Note that the shell running within the identity box cannot change or modify any of the supervising user's data. In fact, the contained user can only access items that are world-readable or world-writable.
You can give the contained user access to other parts of the filesystem by creating access control lists (ACLs). An ACL is a list of users and the resources that they are allowed to access. Each directory has its own ACL in the file .__acl. This file does not appear in a directory listing, but you can read and write it just the same.
For example, MyFriend above can see his initial ACL as follows:
% cat .__acl
MyFriend rwlxa
This means that MyFriend can read, write, list, execute, and administer items in the current directory. Now, suppose that MyFriend wants to allow Freddy read access to the same directory. Simply edit the ACL file to read:
MyFriend rwlxa
Freddy rl
Identity boxing and ACLs are particularly useful when using distributed storage. You can read more about ACLs and identity boxing in the Chirp manual.
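The ACL format shown above (one user per line, followed by a string of rights letters) is simple enough to model in a few lines of Python. This sketch covers only what the examples here show; any further ACL features described in the Chirp manual are not modeled, and `parse_acl` and `allowed` are hypothetical names.

```python
from typing import Dict, Set

# Rights letters as described above.
RIGHTS = {"r": "read", "w": "write", "l": "list", "x": "execute", "a": "administer"}


def parse_acl(text: str) -> Dict[str, Set[str]]:
    """Parse lines of the form "<user> <rights letters>" into a lookup table."""
    acl: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # blank or malformed lines are skipped
            acl[fields[0]] = set(fields[1])
    return acl


def allowed(acl: Dict[str, Set[str]], user: str, right: str) -> bool:
    """Return True when the ACL grants the given rights letter to the user."""
    return right in acl.get(user, set())
```

Applied to the edited ACL above, Freddy is granted `r` and `l` but not `w`, while a user absent from the file is granted nothing.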
Option | Purpose | Environment Variable |
-a <list> | ||
-b <bytes> | Set the recommended remote I/O block size. | PARROT_LOCAL_BLOCK_SIZE |
-B <bytes> | Set the recommended local I/O block size. | PARROT_REMOTE_BLOCK_SIZE |
-C <MB> | Set the size of the I/O channel. | PARROT_CHANNEL_SIZE |
-d <system> | Enable debugging for this sub-system. | PARROT_DEBUG_FLAGS |
-h | Show this screen. | |
-m <file> | Use this file as a mountlist. | PARROT_MOUNT_FILE |
-M <local>=<remote> | Mount this remote file on this local directory. | |
-o <file> | Send debugging messages to this file. | PARROT_DEBUG_FILE |
-p <host:port> | Use this proxy for HTTP requests. | HTTP_PROXY |
-t <dir> | Where to store temporary files. | PARROT_TEMP_DIR |
-v | Display version number. | |
This list is probably out of date, so you should run parrot -h to see the most up-to-date list.
The flexible debugging flags can be a great help in both debugging and understanding Parrot. To turn on multiple debugging flags, you may either issue multiple -d options:
% parrot -d ftp -d chirp tcsh
Or, you may give a space separated list in the corresponding environment variable:
% setenv PARROT_DEBUG_FLAGS "ftp chirp"
% parrot tcsh
Here is the meaning of each of the debug flags.
syscall | This shows all of the system calls attempted by each program, even those that Parrot does not trap or modify. (To see arguments and return values, try -d libcall instead.) |
libcall | This shows only the I/O calls that are actually trapped and implemented by Parrot. The arguments and return codes are the logical values seen by the application, not the underlying operations. (To see the underlying operations try -d remote or -d local instead.) |
cache | This shows all of the shared segments that are loaded into the channel cache and shared by multiple programs. For most programs, this means all the shared libraries. |
process | This shows all process creations, deletions, signals, and process state changes. |
resolve | This shows every invocation of the name resolver. A plain file name indicates the name was not modified, while more detailed records show names that were changed or denied access. |
local | This shows all local I/O calls from the perspective of Parrot. Notice that the file descriptors and file names shown are internal to Parrot. (To see fds and names from the perspective of the job, try -d libcall.) |
remote | This shows all non-local file activity. |
http | This shows only HTTP operations. |
ftp | This shows only FTP operations. |
nest | This shows only NeST operations. |
chirp | This shows only Chirp operations. |
rfio | This shows only RFIO operations. |
poll | This shows all activity related to processes that block (explicitly or implicitly) waiting for I/O. |
time | This adds the current time to every debug message. |
pid | This adds the calling process id to every debug message. |
all | This shows all possible debugging messages. |
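When Parrot is launched from a script, the two equivalent ways of selecting debug flags can be generated programmatically. The Python sketch below only builds the argument list and the environment value; it does not run Parrot. The flag names are copied from the table above, which may be out of date, and `debug_flags_value` and `debug_argv` are hypothetical helpers.

```python
from typing import List, Sequence

# Flag names taken from the table above; parrot -h is the authoritative list.
KNOWN_FLAGS = [
    "syscall", "libcall", "cache", "process", "resolve", "local", "remote",
    "http", "ftp", "nest", "chirp", "rfio", "poll", "time", "pid", "all",
]


def debug_flags_value(flags: Sequence[str]) -> str:
    """Build the space-separated value for PARROT_DEBUG_FLAGS."""
    unknown = [f for f in flags if f not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown debug flags: {unknown}")
    return " ".join(flags)


def debug_argv(flags: Sequence[str], command: Sequence[str]) -> List[str]:
    """Build a parrot command line with one -d option per debug flag."""
    debug_flags_value(flags)  # reuse the validation above
    argv = ["parrot"]
    for flag in flags:
        argv += ["-d", flag]
    return argv + list(command)
```

The resulting list can be passed to `subprocess.run`, or the string from `debug_flags_value` can be placed in the environment under PARROT_DEBUG_FLAGS before starting Parrot.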