parrot_run_hdfs runs an application or a shell inside the Parrot virtual filesystem.
HDFS is the primary distributed filesystem used in the Hadoop project. Parrot supports read and write access to HDFS through the parrot_run_hdfs wrapper. The wrapper checks that the appropriate environment variables are defined and then calls parrot_run. See parrot_run(1).
In particular, you must ensure that the following environment variables are defined:
Based on these environment variables, parrot_run_hdfs will attempt to find the appropriate paths for libjvm.so and libhdfs.so. These paths are stored in the environment variables LIBJVM_PATH and LIBHDFS_PATH, which the HDFS Parrot module uses to load the necessary shared libraries at run time. To avoid the startup overhead of searching for these libraries, you may set the paths manually in your environment before calling parrot_run_hdfs, or you may edit the script directly.
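As a rough sketch of setting these paths by hand, the shell fragment below locates the two libraries once and exports LIBJVM_PATH and LIBHDFS_PATH before Parrot is started. The helper find_lib, the variables JAVA_ROOT and HADOOP_ROOT, and the default directories are hypothetical names chosen for illustration. This is not necessarily how parrot_run_hdfs itself performs its search.

```shell
#!/bin/sh
# Hypothetical helper: print one path named $2 found under directory $1.
# If several copies exist, which one is printed depends on find's traversal order.
find_lib() {
    find "$1" -name "$2" 2>/dev/null | head -n 1
}

# Assumed installation roots; replace them with the real locations on your system.
JAVA_ROOT=${JAVA_ROOT:-/usr/lib/jvm}
HADOOP_ROOT=${HADOOP_ROOT:-/opt/hadoop}

# Resolve the libraries once and export the results for the HDFS Parrot module.
LIBJVM_PATH=$(find_lib "$JAVA_ROOT" libjvm.so)
LIBHDFS_PATH=$(find_lib "$HADOOP_ROOT" libhdfs.so)
export LIBJVM_PATH LIBHDFS_PATH

# With both variables set, the wrapper has no need to search again, e.g.:
# parrot_run_hdfs cat /hdfs/server:port/foo
```

Placing the two export lines (with the resolved paths written out literally) in a shell profile would have the same effect without running find at every login.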
Note that while Parrot supports read access to HDFS, it provides only write-once support. This is because the current implementations of HDFS do not provide reliable append operations. Likewise, a file can be opened in either read mode (O_RDONLY) or write mode (O_WRONLY), but not both (O_RDWR).
For complete details with examples, see the Parrot User's Manual.
See parrot_run(1) for option listing.
% parrot_run_hdfs cat /hdfs/server:port/foo
You can also run an entire shell inside of Parrot, like this:
% parrot_run_hdfs bash
% cd /hdfs/server:port/
% ls -la
% cat foo