This week the readings will focus on awk, an additional filter we can use in shell scripts for slicing and dicing data.
The readings for Monday, February 29 are:
How To Use the AWK language to Manipulate Text in Linux
A gentle introduction to awk.
A quick overview of the basic features of awk.
Recommended Reading
Awk, Unix, and functional programming
We will discuss functional programming and how it relates to the [Unix Philosophy] later this semester when we learn [Python]. It's important to realize that different programming paradigms are not tied to any specific language, but are ways of thinking that can be used to solve a multitude of problems.
Optional Resources
Common threads: Awk by example, Part 1
Also the followup: Common threads: Awk by example, Part 2
An overview of what awk is and how it can be useful in the Real World.
Another overview of awk and an example of how it is used to solve a Real World problem.
This is the reference manual for the GNU implementation of awk.
Another reference for awk.
In your reading07 folder, write the following shell scripts:
head.sh: Use awk to implement your own version of the head Unix filter:
# Print usage
$ ./head.sh -h
usage: head.sh
-n N Display the first N lines
# Print first 10 lines
$ ./head.sh < /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/usr/bin/nologin
daemon:x:2:2:daemon:/:/usr/bin/nologin
mail:x:8:12:mail:/var/spool/mail:/usr/bin/nologin
ftp:x:14:11:ftp:/srv/ftp:/usr/bin/nologin
http:x:33:33:http:/srv/http:/usr/bin/nologin
uuidd:x:68:68:uuidd:/:/usr/bin/nologin
dbus:x:81:81:dbus:/:/usr/bin/nologin
nobody:x:99:99:nobody:/:/usr/bin/nologin
systemd-journal-gateway:x:191:191:systemd-journal-gateway:/:/usr/bin/nologin
# Print first 2 lines
$ ./head.sh -n 2 < /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/usr/bin/nologin
# Download test script
$ curl -O http://www3.nd.edu/~pbui/teaching/cse.20189.sp16/static/sh/test_head.sh
# Make script executable
$ chmod +x test_head.sh
# Run test script
$ ./test_head.sh
head.sh test succesful!
Use awk -v name=value to pass variables from the shell script to the awk program.
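For example, here is a minimal sketch of one possible head.sh, using a simple while/case loop for argument parsing and the awk -v hint above; the variable names and exact usage message are illustrative, not required:

#!/bin/sh
# head.sh: print the first N lines of standard input (default 10) using awk.

usage() {
    echo "usage: head.sh"
    echo "    -n N    Display the first N lines"
    exit "$1"
}

LINES=10

# Parse the -h help flag and the optional -n flag.
while [ $# -gt 0 ]; do
    case "$1" in
        -h) usage 0 ;;
        -n) LINES="$2"; shift ;;
         *) usage 1 ;;
    esac
    shift
done

# Pass the line count into awk with -v and stop reading once it is reached.
awk -v n="$LINES" 'NR <= n { print } NR > n { exit }'

The NR > n { exit } rule stops awk as soon as enough lines have been printed, which avoids reading the rest of a large input.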
catalog_summary.sh: Write a script that uses awk to parse the contents of the CCL Catalog Server:
$ curl -s http://catalog.cse.nd.edu:9097/query.text
The script should report the total number of cpus, the total number of unique machine names, and the most common service type, as shown below:
# Fetch and summarize data from default URL
$ ./catalog_summary.sh
Total CPUs: 12272
Total Machines: 853
Most Prolific Type: chirp
# Fetch and summarize data from testing URL
$ ./catalog_summary.sh http://www3.nd.edu/~pbui/teaching/cse.20189.sp16/static/txt/test_catalog_summary.txt
Total CPUs: 6330
Total Machines: 457
Most Prolific Type: bobbit
# Download test script
$ curl -O http://www3.nd.edu/~pbui/teaching/cse.20189.sp16/static/sh/test_catalog_summary.sh
# Make script executable
$ chmod +x test_catalog_summary.sh
# Run test script
$ ./test_catalog_summary.sh
catalog_summary.sh test succesful!
The script should take one optional argument, URL, which specifies where to fetch the catalog data. The default URL, http://catalog.cse.nd.edu:9097/query.text, returns live data, which means that subsequent runs of the catalog_summary.sh script may yield different output. To simplify testing, the testing URL, http://www3.nd.edu/~pbui/teaching/cse.20189.sp16/static/txt/test_catalog_summary.txt, is a fixed snapshot that always returns the same information.
Use parameter expansion to handle the optional argument.
Use pattern matching to handle the three different cases:
/pattern/ { ACTION }
For cpus, you simply need to add to a counter.
For machines and types, you need to use an associative array to track previous entries, and you will need a counter to track unique entries. Note that the expression key in array evaluates to 0 if key is not in the array.
Use an END block to print out the totals. You may need to do some processing with a loop to compute the most prolific type (one possible arrangement is sketched after these hints).
Look in the Lecture 10 slides for examples of awk.
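As a rough sketch of how these hints might fit together, assume the catalog data arrives as key value lines such as cpus 8, name host.nd.edu, and type chirp; verify this against the actual output of the query URL before relying on it, since everything below is illustrative rather than the required solution:

#!/bin/sh
# catalog_summary.sh: summarize CPUs, unique machines, and the most common type.

# Parameter expansion: fall back to the live catalog URL if no argument is given.
URL="${1:-http://catalog.cse.nd.edu:9097/query.text}"

curl -s "$URL" | awk '
/^cpus /  { cpus += $2 }                    # accumulate the total CPU count

/^name /  { if (!($2 in machines)) {        # count each machine name only once
                machines[$2] = 1
                total_machines++
            }
          }

/^type /  { types[$2]++ }                   # tally how often each type appears

END {
    # Loop over the tallies to find the most common type.
    for (t in types)
        if (types[t] > best) {
            best     = types[t]
            prolific = t
        }

    print "Total CPUs: " cpus
    print "Total Machines: " total_machines
    print "Most Prolific Type: " prolific
}'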
In the reading07 folder, write one summary page called awk.md that contains common uses of the command. Be sure to cover the following (a few example one-liners follow the list):
Printing specific fields.
Modifying FS to control the input field separator.
Using BEGIN and END.
Using pattern matching.
Using special variables such as NF and NR.
Using associative arrays.
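A few illustrative one-liners touching on these topics may be a useful starting point; file.txt is just a placeholder input file:

awk '{ print $1, $3 }' file.txt                 # print the first and third fields
awk -F: '{ print $1 }' /etc/passwd              # set the field separator on the command line
awk 'BEGIN { FS = ":" } END { print NR }' /etc/passwd
awk '/nologin/ { print $0 }' /etc/passwd        # act only on lines matching a pattern
awk '{ print NR, NF }' file.txt                 # record number and number of fields
awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' file.txt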
Note: your summaries should be in your own words and not simply copied and pasted from the manual pages. They should be short and concise and only include common use cases.
If you have any questions, comments, or concerns regarding the course, please provide your feedback at the end of your response.
To submit your assignment, please commit your work to the reading07 folder in your Assignments Bitbucket repository by the beginning of class on Monday, February 29.