This week the readings will focus on awk, an additional filter we can use in shell scripts for slicing and dicing data.
The readings for Monday, February 29 are:
How To Use the AWK language to Manipulate Text in Linux
A gentle introduction to awk.
A quick overview of the basic features of awk.
Recommended Reading
Awk, Unix, and functional programming
We will discuss functional programming and how it relates to the [Unix Philosophy] later this semester when we learn [Python]. It's important to realize that different programming paradigms are not tied to any specific language, but are ways of thinking that can be used to solve a multitude of problems.
Optional Resources
Common threads: Awk by example, Part 1
Also the followup: Common threads: Awk by example, Part 2
An overview of what awk is and how it can be useful in the Real World.
Another overview of awk and an example of how it is used to solve a Real World problem.
This is the reference manual for the GNU implementation of awk.
Another reference for awk.
In your reading07 folder, write the following shell scripts:
head.sh: Use awk to implement your own version of the head Unix filter:
# Print usage
$ ./head.sh -h
usage: head.sh
-n N Display the first N lines
# Print first 10 lines
$ ./head.sh < /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/usr/bin/nologin
daemon:x:2:2:daemon:/:/usr/bin/nologin
mail:x:8:12:mail:/var/spool/mail:/usr/bin/nologin
ftp:x:14:11:ftp:/srv/ftp:/usr/bin/nologin
http:x:33:33:http:/srv/http:/usr/bin/nologin
uuidd:x:68:68:uuidd:/:/usr/bin/nologin
dbus:x:81:81:dbus:/:/usr/bin/nologin
nobody:x:99:99:nobody:/:/usr/bin/nologin
systemd-journal-gateway:x:191:191:systemd-journal-gateway:/:/usr/bin/nologin
# Print first 2 lines
$ ./head.sh -n 2 < /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/usr/bin/nologin
# Download test script
$ curl -O http://www3.nd.edu/~pbui/teaching/cse.20189.sp16/static/sh/test_head.sh
# Make script executable
$ chmod +x test_head.sh
# Run test script
$ ./test_head.sh
head.sh test succesful!
Use awk -v name=value to pass variables from the shell script to the awk program.
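For example, here is a minimal sketch of one possible head.sh, using a simple while/case loop for argument parsing and the awk -v hint above; the variable names and exact usage message are illustrative, not required:

#!/bin/sh
# head.sh: print the first N lines of standard input (default 10) using awk.

usage() {
    echo "usage: head.sh"
    echo "    -n N    Display the first N lines"
    exit "$1"
}

LINES=10

# Parse the -h help flag and the optional -n flag.
while [ $# -gt 0 ]; do
    case "$1" in
        -h) usage 0 ;;
        -n) LINES="$2"; shift ;;
         *) usage 1 ;;
    esac
    shift
done

# Pass the line count into awk with -v and stop reading once it is reached.
awk -v n="$LINES" 'NR <= n { print } NR > n { exit }'

The NR > n { exit } rule stops awk as soon as enough lines have been printed, which avoids reading the rest of a large input.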
catalog_summary.sh: Write a script that uses awk to parse the contents of the CCL Catalog Server:
$ curl -s http://catalog.cse.nd.edu:9097/query.text
The script should report the total number of cpus, the total number of unique machine names, and the most common service type, as shown below:
# Fetch and summarize data from default URL
$ ./catalog_summary.sh
Total CPUs: 12272
Total Machines: 853
Most Prolific Type: chirp
# Fetch and summarize data from testing URL
$ ./catalog_summary.sh http://www3.nd.edu/~pbui/teaching/cse.20189.sp16/static/txt/test_catalog_summary.txt
Total CPUs: 6330
Total Machines: 457
Most Prolific Type: bobbit
# Download test script
$ curl -O http://www3.nd.edu/~pbui/teaching/cse.20189.sp16/static/sh/test_catalog_summary.sh
# Make script executable
$ chmod +x test_catalog_summary.sh
# Run test script
$ ./test_catalog_summary.sh
catalog_summary.sh test succesful!
The script should take one optional argument, URL, which specifies where to fetch the catalog data. The default URL, http://catalog.cse.nd.edu:9097/query.text, returns live data, which means that subsequent runs of the catalog_summary.sh script may yield different output. To simplify testing, the testing URL, http://www3.nd.edu/~pbui/teaching/cse.20189.sp16/static/txt/test_catalog_summary.txt, is a fixed snapshot that always returns the same information.
Use parameter expansion to handle the optional argument.
Use pattern matching to handle the three different cases:
/pattern/ { ACTION }
For cpus, you simply need to add to a counter.
For machines and types, you need to use an associative array to track previous entries, and you will need a counter to track unique entries. Note that the expression key in array evaluates to 0 if key is not in the array.
Use an END block to print out the totals. You may need to do some processing with a loop to compute the most prolific type (one possible arrangement is sketched after these hints).
Look in the Lecture 10 slides for examples of awk.
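As a rough sketch of how these hints might fit together, assume the catalog data arrives as key value lines such as cpus 8, name host.nd.edu, and type chirp; verify this against the actual output of the query URL before relying on it, since everything below is illustrative rather than the required solution:

#!/bin/sh
# catalog_summary.sh: summarize CPUs, unique machines, and the most common type.

# Parameter expansion: fall back to the live catalog URL if no argument is given.
URL="${1:-http://catalog.cse.nd.edu:9097/query.text}"

curl -s "$URL" | awk '
/^cpus /  { cpus += $2 }                    # accumulate the total CPU count

/^name /  { if (!($2 in machines)) {        # count each machine name only once
                machines[$2] = 1
                total_machines++
            }
          }

/^type /  { types[$2]++ }                   # tally how often each type appears

END {
    # Loop over the tallies to find the most common type.
    for (t in types)
        if (types[t] > best) {
            best     = types[t]
            prolific = t
        }

    print "Total CPUs: " cpus
    print "Total Machines: " total_machines
    print "Most Prolific Type: " prolific
}'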
In the reading07 folder, write one summary page called awk.md that contains common uses of the command. Be sure to cover the following (a few example one-liners follow the list):
Printing specific fields.
Modifying FS to control the input field separator.
Using BEGIN and END.
Using pattern matching.
Using special variables such as NF and NR.
Using associative arrays.
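A few illustrative one-liners touching on these topics may be a useful starting point; file.txt is just a placeholder input file:

awk '{ print $1, $3 }' file.txt                 # print the first and third fields
awk -F: '{ print $1 }' /etc/passwd              # set the field separator on the command line
awk 'BEGIN { FS = ":" } END { print NR }' /etc/passwd
awk '/nologin/ { print $0 }' /etc/passwd        # act only on lines matching a pattern
awk '{ print NR, NF }' file.txt                 # record number and number of fields
awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' file.txt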
Note: your summaries should be in your own words and not simply copied and pasted from the manual pages. They should be short and concise and only include common use cases.
If you have any questions, comments, or concerns regarding the course, please provide your feedback at the end of your response.
To submit your assignment, please commit your work to the reading07 folder in your Assignments Bitbucket repository by the beginning of class on Monday, February 29.