Readings

This week we will be discussing the relational nature of SQL. In particular, we will focus on SQL.

The readings for Thursday, February 25 are:

  1. SQL Joins Explained

  2. A Visual Explanation of SQL Joins

Questions

Once you have completed these readings, please complete the following tasks:

  1. Download and extract the reading05.zip archive:

    # Download archive
    $ curl -O https://www3.nd.edu/~pbui/teaching/cdt.30020.sp16/static/zip/reading05.zip
    
    # Extract archive
    $ unzip reading05.zip
    
    # Go to extract folder
    $ cd reading05
    

    The archive contains the following files:

    1. movies.dat: This is the movies data from the MovieLens 1M dataset that we have been using in class to explore databases.

    2. movies.html, groups.html: These are the HTML templates for the web application. They utilize some basic Bootstrap elements for visual appeal. The theme is Paper from the Bootswatch collection.

    3. movielens.py: This is the Python web application. It utilizes both SQLite and Tornado.

    An example of the web application in action can be found at http://xavier.h4x0r.space:9999/.

  2. You are to complete the eight TODOs in the movielens.py script. These TODOs primarily require you to complete the SQL statements. You should not need to modify significant amounts of Python code or HTML templates, though you may if you wish.

    Here is a walk-through of what you need to complete:

    # TODO 1: Connect to database, create tables, and load tables
    

    For this TODO, you need to create a connection to the sqlite3 database and assign it to the conn instance variable of the Database class. Afterwards, you need to call the internal _create_tables and _load_tables() methods.

    # TODO 2: Complete movies_sql and genres_sql
    

    For this TODO, you write the SQL statements that create the Movies and Genres tables, which look like the following.

    Movies:
        Columns:      MovieID | Title | Year
        Primary Key:  MovieID
    
    Genres:
        Columns:      MovieID | Genre
        Primary Key:  (MovieID, Genre)
        Foreign Key:  MovieID
    

    The Genres table is different from the previous SQL tables we have created. This table has a composite key as its primary key. That is, each row in Genres is uniquely identified by multiple columns: MovieID and Genre.

    Additionally, the Genres has a foreign key that references another table in the database. In this case, Genres has MovieID which references the MovieID from the Movies table. For a reference on how to specify a foreign key, you can refer to Referential Integrity in SQLite or SQLite Foreign Key Support.

    Be sure to consider the case where the tables are already in the database.

    # TODO 3: Complete movie_sql and genre_sql
    

    For this TODO, you need to write the SQL statements to insert data into the Movies and Genres table respectively.

    Be sure to consider the case where the data is already in the database.

    # TODO 4: Complete movie_sql
    

    For this TODO, you need to write the SQL statements to select all the movies from the database that match (ie. LIKE) the specified title.

    This SQL statement will require us to perform a JOIN, that is mix data from two different tables. Because this is a bit tricky, here is a sketch of what this SQL statement should look like:

    SELECT    Movies.MovieID, Title, Year, GROUP_CONCAT(Genres.Genre) AS Genres
    FROM      ___________
    JOIN      ___________ ON ___________ = ___________
    WHERE     ___________
    GROUP BY  ___________
    LIMIT     ___________
    

    As can be seen from the above sketch, we want to return a collection of movies which consist of MovieID, Title, Year, and Genres.

    To compute Genres, we instruct SQLite to perform the GROUP_CONCAT aggregate function. In conjunction with the GROUP BY, the GROUP_CONCAT combines the column data from multiple related rows into a single column entry (in this case, we are combining multiple Genres.Genre values into a single Genres value for each movie.

    In order for us to associate each movie with its collection of Genres, we must perform a JOIN. When we perform this operation, we must tell the database what table to JOIN and ON which column. In this case, we know that that the MovieID from the Movies table is associated with the MovieID from the Genres table.

    # TODO 5: Complete genre_sql
    

    For this TODO, you need to write the SQL statements to select all the movies from the database that have the specified genre.

    This SQL statement will look like the previous movies_sql except the WHERE argument will be different and there should be no LIMIT.

    # TODO 6: Complete year_sql
    

    For this TODO, you need to write the SQL statements to select all the movies from the database that have the specified year.

    This SQL statement will look like the previous genre_sql except the WHERE argument will be different.

    # TODO 7: Complete genres_sql
    

    For this TODO, you need to write the SQL statements to select all the DISTINCT genres from the Genres table.

    We can use the DISTINCT keyword with the SELECT statement to only retrieve the unique elements from a column. Be sure to order the results appropriately.

    # TODO 8: Complete years_sql
    

    For this TODO, you need to write the SQL statements to select all the DISTINCT years from the Movies table.

    This SQL statement will look like the previous genres_sql.

When you are finished with these tasks, please submit the complete movielens.py:

$ git add movielens.py
$ git commit -m "reading 05: completed"
$ git push

Feedback

If you have any questions, comments, or concerns regarding the course, please provide your feedback at the end of your response.

Submission

To submit your response, please commit your work (responses and scripts) to the reading05 folder of your your Readings Bitbucket repository.