

CSC2417 - Algorithms & Genome Analysis

Course Description

Instructor: Jared Simpson
Office: OICR 6-27
Office hours: By appointment
Lectures: Wednesdays 10AM-12PM in BL113

In this course we will explore the computational problems that have emerged from genome sequencing. The topics will include the string algorithms used for the classic sequence alignment and genome assembly problems, algorithms for comparing genomes, sequence classification using probabilistic models, and algorithms for analysing gene expression data. Throughout the course, special emphasis will be placed on efficient algorithms designed to meet the challenges of rapidly growing data sets in the current era of high-throughput genome sequencing. The course is intended for Computer Science graduate students, and all of the required biology will be explained in class. Students in biological and related sciences with a strong computational background are encouraged to participate.


This course will be assessed by three written assignments (60%) and a final project (40%). The final project will be in the form of a paper on a topic selected by the student relating to sequence analysis. This can be a review of recently published algorithms, an original idea, or an implementation of a significant algorithm. Detailed instructions will be posted later this month.


  1. Assignment 1 - Due November 2nd
  2. Assignment 2 - Due November 21st
  3. Assignment 3 - Due December 8th


  1. Sept 14 - Introduction to Genomes and Genome sequencing [slides]
  2. Sept 21 - No lecture
  3. Sept 28 - Strings - Introduction to strings and matching problems [slides] [SSAHA] [q-grams]
  4. Oct 5 - Sequence Alignment 1 - Alignment and dynamic programming [slides] [BLAST] [MAQ]
  5. Oct 12 - Sequence Alignment 2 - Indexing - suffix tries, trees and arrays [slides] [MUMmer] [MUMmer slides]
  6. Oct 19 - Sequence Alignment 3 - Indexing - the BWT and FM-index [slides] [bwa]
  7. Oct 26 - Genome Assembly 1 - Assembly Graphs and Overlap-based Assembly [slides]
  8. Nov 2 - Genome Assembly 2 - de Bruijn graph assembly and memory efficiency [slides]
  9. Nov 9 - Genome Analysis 1 - Basic sequence classification [slides]
  10. Nov 16 (11am) - Genome Analysis 2 - Genome Annotation Guest Lecture by Michael Hoffman
  11. Nov 23 - Reconstructing tumour lineages - Guest Lecture by Quaid Morris
  12. Nov 30 - Quantifying transcript abundance using RNA-Seq [slides]
  13. Dec 7 - Nanopore analysis [slides]

Teaching Materials

There is no required textbook for this course, but the classic reference for much of the material is Biological Sequence Analysis by Durbin, Eddy, Krogh and Mitchison. The lectures are designed to stand alone, but this book is well worth reading to supplement the course's content.
Many of the lectures reuse slides with permission from Ben Langmead's excellent teaching materials. In the lectures posted above, Ben's slides retain the Johns Hopkins University logo. The Week 12 (RNA-Seq) slides use material provided by Rob Patro and Pall Melsted.

Discussion Group

There is a Google group for the course where you can ask questions and receive help. Please sign up here, as it will also be used for course announcements.

Office Hours

The course instructor is located at OICR. I am available to talk to students most days of the week but you should email first to make sure I am free.