BIOS60576-02
Topics in Bioinformatics: Introduction to Perl and BioPerl


Fall Semester 2007


This seven-week course module is an introduction to bioinformatics programming and scripting using the Perl language and the BioPerl toolkit. A list of topics to be covered is given below. Perl is an open-source computer programming language. BioPerl is a toolkit of Perl modules useful in building bioinformatics solutions in Perl. BioPerl can be used to parse sequence data retrieved from local and remote databases, to transform the formats of sequence data and files, to manipulate individual sequences, to search for patterns in sequences, to assist with creating and manipulating sequence alignments, and to search for genes, transposons, and other structures in genomic data.

This course assumes no prior programming experience, although basic computer skills are expected. Weekly programming assignments will be given. Students are strongly encouraged to bring a laptop to class so that they can apply the material as it is introduced.


Class Meeting Times/Location:
322 Jordan Hall
MW 3:00 - 4:15, October 29 to December 10, 2007


Instructor:
Greg Madey, gmadey@nd.edu, (574)631-8752, 350 Fitzpatrick Hall

Office Hours:
By appointment (and whenever my office door is open!)

Teaching Assistant:
Matt Van Antwerp, mvanantw at cse.nd.edu, (574)631-7596, 206 Cushing

Textbook:

Beginning Perl for Bioinformatics
by James Tisdall
Paperback: 400 pages
Publisher: O'Reilly Media, Inc.; 1st edition (October 15, 2001)
Language: English
ISBN-10: 0596000804
ISBN-13: 978-0596000806

Course Goals:

In this course, students will learn about the following topics:

Getting started with Perl
  a. Accessing and installing Perl and BioPerl
  b. Running Perl programs
  c. Editors
  d. Finding help
  e. Using modules, like BioPerl
The Art of Programming
  a. The programming process
  b. Algorithms
Sequences and Strings
  a. Variables
  b. Arrays
  c. Files
Motifs and Loops
  a. Flow control
  b. String operators
  c. Writing files
Subroutines
  a. Scoping
  b. Arguments
  c. Command line arguments
  d. Passing data to subroutines
  e. Modules and libraries
  f. Debugging
Data Structures and Algorithms for Biology
  a. Hashes
  b. Translating DNA into proteins
  c. Working with the FASTA format
  d. Reading frames
Regular Expressions
  a. Restriction maps
  b. Restriction enzyme data
Topics
  a. Working with GenBank data
  b. Analyzing DNA
  c. Working with BLAST output
  d. BioPerl modules

Grading:

Programming assignments 40%
Class participation 40%
Final project 20%