Course Details

In case you missed it, here are the highlights from last time, followed by the course details for this semester!

What will the students learn?

  • Server and Cluster usage in Linux
  • Basic Programming in R
  • Basic Statistics
  • Introduction to Next Gen Sequencing technologies
  • NGS Data Quality Control and Alignment
  • Downstream analysis of NGS data, including RNA-seq, ChIP-seq, Gene Set Enrichment Analysis, SNP calling, pathway analysis, and publication-quality heatmaps and graphics using R
  • Pipeline construction using shell scripts in Linux

Course timeline and details

  1. Introduction to Biology (Dr. Ting) {1 Lecture}
    • Intro to Next Gen Sequencing
  2. Introduction to Servers (Dr. Thapar) {2 Lectures}
    • Server accounts and Logging into “The Shell”
    • Linux command-line operations: programming concepts and practice
    • FTP and other services for Remote usage
    • Parallel Programming with Queues at MGH on Erisone
  3. Sequence Analysis Part 1 (Dr. Thapar) {3 Lectures}
    • Introduction to quality control analysis for NGS data: the FastQC tool, FASTQ and FASTA formats, sequence quality scores, and Phred scaling
    • Read alignment with TopHat, STAR, and other aligners; aligner comparison and specific use cases
    • File types and conversions, with an introduction to SAMtools, BEDTools, and Picard (bigWig, BAM, GTF, WIG, and BED file types explained)
  4. Introduction to R Programming (Dr. Thapar) {3 Lectures}
    • Introduction to the fundamental principles of object-oriented programming using the R language
    • Variables, Data Types, Data Structures, Expressions and Statements
    • Functions, Control Structures, Loops in R
    • Files and System commands in R
    • Introduction to Bioconductor Packages in R
  5. Introduction to Statistics (Dr. Aryee) {1 Lecture}
    • Distribution of data
    • Hypothesis testing and p-value
    • Normalization
    • Statistical Power
  6. Sequence Analysis Part 2 (Dr. Thapar) {4 Lectures}
    • Differential Expression analysis in R (DESeq, edgeR, Cuffdiff)
    • Visualizing your results via Graphs and Heatmaps in R
    • Peak Calling using MACS
    • Pathway Analysis using DAVID software and other tools in R
    • Gene Set Enrichment Analysis using Broad Software
    • Visualization of Data in Genome Browser (UCSC and IGV)
  7. Simultaneous Project Practical involving sequence data analysis (Dr. Thapar) {12 sessions}
    • Unix Commands in the Shell (1 session)
    • Quality control and alignment of data (1 session)
    • Programming in R (4 sessions)
    • RNA-Seq (Single Cell or Bulk Sequencing) (1 session)
    • ChIP-Seq (1 session)
    • GSEA and Pathway analysis (1 session)
    • SNP calling and variant calling (1 session)
    • Pipelines in shell for NGS analysis (0.5 session)
    • Final presentations (1.5 sessions)

Form of education

  • 14 lectures (1-hour lecture in the afternoon, one day a week)
  • 14 project practical sessions (1.5-hour session following the lectures, one day per week), fully supervised
  • Weekly office hours and tutoring (also available upon request)
  • Make-up session (2 hours; optional session to go over the practical in detail)
  • 6 homework assignments
  • 1 final project
  • Optional final examination for a grade

Please sign up for the upcoming info session, tentatively scheduled for February 24, and we will get in touch with you with further details.

The course is tentatively scheduled to begin in the second week of March 2016.