Assignment A3: Web Server and Multiplexer

In this assignment, you will write a simple web server, demonstrate its use with a standard web browser, and then build a lightweight HTTP-level multiplexer, such as one that might be used in a large industrial web site.

Introduction to HTTP

HTTP is a large, complex protocol. Fortunately, you only need to know a small part of it to build a web server. If you really want to know all of the details, then consult these pages:

1992 HTTP Specification

HTTP/1.0 Full Specification

HTTP Wikipedia Page

Let's briefly review how the web works. The web consists of servers and clients. A web server is a process that runs on a machine and makes data available to any client that may call it up and ask for it. A client that you are certainly familiar with is a web browser, but there are many other kinds of clients that interact with servers in quieter ways. For example, the wget command line tool can be used to request files from a web server without any of the graphical extras.

A server listens on a fixed port. (Port 80 is the standard, but your web server will have to choose a different port.) A client connects to that TCP port, and the server must accept the connection. The client sends an HTTP request stating what file it wishes to retrieve, along with the version of the protocol that it understands.

    GET /index.html HTTP/1.0

The server examines this request, and then sends a response header:

    HTTP/1.0 200 OK
    Date: Tue, 11 Jan 2005 21:31:45 GMT
    Server: Apache/1.3.27
    Connection: close
    Content-Type: text/html

...followed by an extra newline and the actual data of the file in question. After sending the file data, the server closes the connection. If you are curious, you can speak to web servers directly without an intervening browser by using the telnet tool. Try this to see the raw output of a web server:

    % telnet www.cse.nd.edu 80
    GET /index.html HTTP/1.0
    (type return one more time)

Most HTTP requests to the CSE web server are for static (unchanging) content stored in plain files. However, a URL can just as easily refer to dynamic (changing) content. The "file" portion of a URL can refer to a program that must be run to generate a web page on the fly. This would be common in a web server found at an online auction site. The web server might run a program that queries the auction database to determine the state of a sale and produce the appropriate web page. Most real-world web servers have a mix of static and dynamic content.

Note that there are several other ways in which a server can respond. If the client requests a file that does not exist, the server will respond:

    HTTP/1.0 404 Not Found

Or, if the client does not have access:

    HTTP/1.0 403 Forbidden

Or, if the server wants to redirect the client elsewhere, it says:

    HTTP/1.0 307 Temporary Redirect
    Location: http://other.server.edu/path
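
To make the response format concrete, here is a minimal sketch of a helper that assembles a status line and a short header into a buffer. The name format_header and its parameters are made up for illustration and are not part of the provided code. It ends each line with a carriage return and newline, as HTTP expects, and finishes with the blank line that separates the header from the data.

```c
#include <stdio.h>

/* Hypothetical helper: write a status line, two header fields, and the
   terminating blank line into buf. Returns the number of characters
   written, or -1 if the header does not fit in size bytes. */
int format_header(char *buf, size_t size, int code,
                  const char *reason, const char *content_type)
{
    int n = snprintf(buf, size,
                     "HTTP/1.0 %d %s\r\n"
                     "Connection: close\r\n"
                     "Content-Type: %s\r\n"
                     "\r\n",
                     code, reason, content_type);

    /* snprintf reports the length it wanted; treat truncation as an error */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

For example, format_header(buf, sizeof buf, 404, "Not Found", "text/html") would produce a header suitable for an error page.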

The Web Server

Your assignment is to create a web server and a web multiplexer. Your web server will be invoked as ./webserver MODE PORT, where MODE may be single or fork and PORT is the port number to listen on. If you run your web server on cclscratch02 on port 9643, then you can connect to it with any web browser at the URL http://cclscratch02.cse.nd.edu:9643/somefilename. Note: on the cclscratch machines, you must use a port between 9000 and 10000.

The main program must call tcp_listen to listen on a particular port number, and then in a loop, accept a connection with tcp_accept, handle the request, and then drop the connection with tcp_close. To handle each request, you must read the request line with tcp_readline, open the proper file, transmit it to the client, then close the file.
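
One way to pick apart the request line is with sscanf. The sketch below assumes the line has already been read into a string (the exact arguments of tcp_readline are documented in tcp.h and are not shown here). The function name parse_request_line and the buffer sizes are my own choices for illustration, not requirements.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split a line such as "GET /index.html HTTP/1.0"
   into its method and path. The caller must supply method[16] and
   path[256]. Returns 1 on success and 0 if the line is malformed. */
int parse_request_line(const char *line, char *method, char *path)
{
    char version[16];

    /* field widths are one less than the buffer sizes, to prevent overflow */
    if (sscanf(line, "%15s %255s %15s", method, path, version) != 3)
        return 0;

    if (strncmp(version, "HTTP/", 5) != 0)
        return 0;

    /* the path must be absolute and must not climb out of the directory */
    if (path[0] != '/' || strstr(path, "..") != NULL)
        return 0;

    return 1;
}
```

A line that fails this check is a good candidate for an error response rather than an attempt to open a file.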

In single mode, the server will simply handle one request at a time, then close the connection, and wait for another connection to be accepted. In fork mode, the server should call fork to create a new child process to handle the request; the parent should then immediately close its copy of the connection and attempt to accept a new one.
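
The fork pattern can be sketched as follows. This is only an illustration under stated assumptions: it treats the connection as a plain file descriptor and uses close, whereas your server would use whatever the tcp module provides. The names handle_in_child and say_hello are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy handler, for illustration only. A real handler would read the
   request and send a proper response. */
void say_hello(int fd)
{
    const char msg[] = "hello\n";
    if (write(fd, msg, sizeof msg - 1) < 0) {
        /* nothing useful to do about it in this toy handler */
    }
}

/* Hypothetical helper: run handler(fd) in a child process. Returns the
   child's pid in the parent, or -1 if fork failed. */
pid_t handle_in_child(int fd, void (*handler)(int))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* caller decides how to report this */

    if (pid == 0) {
        /* child: serve this one connection, then exit without
           returning to the accept loop */
        handler(fd);
        close(fd);
        _exit(0);
    }

    /* parent: the child holds its own copy of the connection */
    close(fd);

    /* collect any children that have already finished, without blocking */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
    }
    return pid;
}
```

The waitpid loop is there so that finished children do not pile up as zombie processes while the server keeps running.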

Note that different files need to be handled in different ways. Your server should accept requests for files ending in html or gif, and transmit them to the browser with a Content-Type of text/html or image/gif, respectively. If the user requests a file ending in cgi, then your server must instead execute the program and return its output with a Content-Type of text/plain. (This is easier than it sounds: look up the popen command.) If the web browser should request a file with any other extension, the web server should respond with a well-formed 403 Forbidden code. This is important, because it will prevent other people from using your web server to view your source code!

In a distributed system, it is absolutely vital that you detect and respond to errors. Your server must check the result of every operation that it attempts, and return an appropriate error message. For example, if the server cannot listen on the desired port, it should print a message to the standard output and exit. Or, if the server cannot provide the browser with the desired file, then it must return an appropriate error code to the web browser. There are other error conditions, and it is your job to identify and handle them all correctly!
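
As one small example of checking results, the sketch below opens a file and, on failure, translates errno into a status code for the browser. The function name open_for_request is hypothetical, and the choice of 500 for unexpected failures is my assumption (it is a standard HTTP/1.0 code, but this handout does not otherwise mention it).

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open path for reading. On success, returns the
   open file and sets *status to 200. On failure, returns NULL and sets
   *status to the HTTP code that best describes the problem. */
FILE *open_for_request(const char *path, int *status)
{
    FILE *f = fopen(path, "rb");
    if (f != NULL) {
        *status = 200;
        return f;
    }

    switch (errno) {
    case ENOENT:
    case ENOTDIR:
        *status = 404;          /* no such file */
        break;
    case EACCES:
        *status = 403;          /* file exists but may not be read */
        break;
    default:
        *status = 500;          /* assumption: anything else is a server error */
        break;
    }
    return NULL;
}
```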

The Multiplexer

Once you have written the web server, the multiplexer is easy. The multiplexer is just like a server running in single mode, except that it always responds with a 307 response that tells the client to fetch data from another server. Write your multiplexer so that it redirects the client to one of cclscratch01 through cclscratch03, chosen randomly. Test it by running the multiplexer on cclscratch00 and directing your web browser at it.
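
The heart of the multiplexer might look something like the sketch below. The function name format_redirect is hypothetical, the host names follow the cclscratch02.cse.nd.edu pattern used earlier, and the port argument assumes that you know which port your back-end web servers are listening on.

```c
#include <stdio.h>
#include <stdlib.h>

static const char *backends[] = {
    "cclscratch01.cse.nd.edu",
    "cclscratch02.cse.nd.edu",
    "cclscratch03.cse.nd.edu",
};

/* Hypothetical helper: build a 307 response that sends the client to a
   randomly chosen back end. Returns the response length, or -1 if it
   does not fit in size bytes. */
int format_redirect(char *buf, size_t size, int port, const char *path)
{
    size_t count = sizeof backends / sizeof backends[0];
    const char *host = backends[(size_t)rand() % count];

    /* the reason phrase is informational; clients act on the number */
    int n = snprintf(buf, size,
                     "HTTP/1.0 307 Temporary Redirect\r\n"
                     "Location: http://%s:%d%s\r\n"
                     "\r\n",
                     host, port, path);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Call srand once at startup (for example, seeded from the current time) so that the sequence of choices differs from run to run.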

Technical Requirements

Get started right away! This project will require some work to get right!

I will provide you with a module for managing tcp connections. Brief documentation is given in the header file tcp.h, and we will discuss more of the details in class. Download the following files to get started:

tcp.h

tcp.c

webserver.c

webmux.c

Makefile

You will need your favorite book on C programming to complete this assignment. My favorite book is the classic. Make sure that you look up the man page entries for these functions:

printf, sprintf, sscanf, fopen, fread, fclose, popen, pclose, atoi, rand, srand, fork, exit

I recommend that you use extensive logging to the console with printf, so that you can see exactly what the browser sends to the server. The logging has no effect on the web browser, so you can leave it in your server permanently.

Turn in three files (webserver.c, webmux.c, and Makefile) to the dropbox directory:

/afs/nd.edu/courses/cse/cse40771.01/dropbox/YOURNAME/a3

Your grade will be based on the following:

Correct functioning of the web server on normal file requests. (50 percent)

Correct functioning of the web server on error conditions. (20 percent)

Correct functioning of the web multiplexer. (20 percent)

Good coding style, such as sensibly chosen variable names, complex tasks broken down into simpler steps, and descriptive comments where appropriate. (10 percent)