HarvardX Biomedical Data Science Open Online Training

In 2014 we received funding from the NIH BD2K initiative to develop MOOCs for biomedical data science. The courses are divided into the Data Analysis for the Life Sciences series, the Genomics Data Analysis series, and the Using Python for Research course.

This page links to the course material for all three.

For each lecture we include the video, an R markdown document to follow along when available, and a link to the course itself. Note that you must be logged in to EdX to access the course. Registration is free.

A book is available for the first series. You can download a free PDF, buy a hard copy, or access the R markdown files used to create the book.

Data Analysis for the Life Sciences Series

| Lecture Title | Time | Video | Material | Course |
|---|---|---|---|---|
| Week 1: Multiple Comparisons | | | | |
| An Example of High-throughput Data | 06:53 | Youtube | Chapter6 | EdX |
| The challenge of multiple testing | 08:27 | Youtube | Chapter6 | EdX |
| p-values Are Random Variables | 03:23 | Youtube | Chapter6 | EdX |
| Week 2: Error Rates | | | | |
| Error Rates and Procedures | 04:26 | Youtube | Chapter6 | EdX |
| Error Rates and Procedures Examples | 05:36 | Youtube | Chapter6 | EdX |
| Bonferroni Correction | 03:52 | Youtube | Chapter6 | EdX |
| False Discovery Rate and Benjamini–Hochberg procedure | 08:06 | Youtube | Chapter6 | EdX |
| q-values | 04:05 | Youtube | Chapter6 | EdX |
| Week 3: Statistical Models | | | | |
| Introduction to Statistical Models | 05:10 | Youtube | Chapter7 | EdX |
| Poisson Example from RNA-seq | 04:20 | Youtube | Chapter7 | EdX |
| Maximum Likelihood Estimate | 04:47 | Youtube | Chapter7 | EdX |
| Models for Variance | 03:50 | Youtube | Chapter7 | EdX |
| Week 4a: Introduction to Bayesian Analysis | | | | |
| Bayes’ Rule | 05:53 | Youtube | Chapter7 | EdX |
| Bayes’ Rule in Practice | 09:59 | Youtube | Chapter7 | EdX |
| Hierarchical Models in Practice | 04:56 | Youtube | Chapter7 | EdX |
| Week 4b: Data Visualization | | | | |
| Volcano plots and p-value histograms, boxplots, and MAplots | 08:58 | Youtube | Chapter7 | EdX |

| Lecture Title | Time | Video | Material | Course |
|---|---|---|---|---|
| Week 1: Distance | | | | |
| Introduction | 00:52 | Youtube | N/A | EdX |
| Distance | 03:51 | Youtube | Chapter8 | EdX |
| Distance (in practice) | 00:00 | Youtube | Chapter8 | EdX |
| Distance Reduction Motivation | 00:00 | Youtube | Chapter8 | EdX |
| Week 2: Principal Component Analysis | | | | |
| Projections | 06:02 | Youtube | Chapter8 | EdX |
| Rotations | 01:54 | Youtube | Chapter8 | EdX |
| SVD | 15:57 | Youtube | Chapter8 | EdX |
| MDS | 05:22 | Youtube | Chapter8 | EdX |
| PCA | 04:51 | Youtube | Chapter8 | EdX |
| Week 3: Clustering and Machine Learning | | | | |
| Clustering | 06:11 | Youtube | Chapter9 | EdX |
| How Randomness Affects Clustering | 02:45 | Youtube | Chapter9 | EdX |
| Hierarchical Clustering in R | 07:09 | Youtube | Chapter9 | EdX |
| K-Means | 04:09 | Youtube | Chapter9 | EdX |
| K-Means Clustering in R | 04:37 | Youtube | Chapter9 | EdX |
| Heat Maps in R | 07:18 | Youtube | Chapter9 | EdX |
| Gene Clustering | 06:30 | Youtube | Chapter9 | EdX |
| Conditional Expectations | 03:01 | Youtube | Chapter9 | EdX |
| Example: Linear Regression | 04:56 | Youtube | Chapter9 | EdX |
| Smoothing | 07:45 | Youtube | Chapter9 | EdX |
| K-Nearest Neighbors | 07:44 | Youtube | Chapter9 | EdX |
| Cross Validation | 09:20 | Youtube | Chapter9 | EdX |
| Week 4: Confounding and Batch Effects | | | | |
| Confounding | 06:46 | Youtube | Chapter10 | EdX |
| Confounding in Genomics | 03:34 | Youtube | Chapter10 | EdX |
| EDA with PCA | 05:12 | Youtube | Chapter10 | EdX |
| Modeling Batch Effects | 06:28 | Youtube | Chapter10 | EdX |
| ComBat | 03:12 | Youtube | Chapter10 | EdX |
| Factor Analysis | 06:17 | Youtube | Chapter10 | EdX |
| Motivating Factor Analysis | 02:21 | Youtube | Chapter10 | EdX |
| Surrogate Variable Analysis (SVA) | 06:04 | Youtube | Chapter10 | EdX |

Genomics Data Analysis Series

| Lecture Title | Time | Video | Material | Course |
|---|---|---|---|---|
| RNA-Seq | | | | |
| Introduction to RNA-seq | 05:25 | Youtube | RNA | EdX |
| Data Generation and Counts | 04:51 | Youtube | RNA | EdX |
| Model for Quantification | 04:18 | Youtube | RNA | EdX |
| Transcript Quantification | 06:54 | Youtube | RNA | EdX |
| Unstable Quantification | 04:51 | Youtube | RNA | EdX |
| Links for RNA-seq alignment | N/A | N/A | N/A | EdX |
| Downloading FASTQ files | 07:01 | Youtube | FASTQ | EdX |
| Quality control with FASTQC | 08:19 | Youtube | FASTQ | EdX |
| FASTQC notes | N/A | N/A | N/A | EdX |
| Genome alignment with STAR I | 06:56 | Youtube | STAR | EdX |
| Genome alignment with STAR II | 01:42 | Youtube | STAR | EdX |
| Integrative Genomics Viewer (IGV) | 06:31 | Youtube | RNA | EdX |
| Transcriptome alignment with RSEM I | 03:51 | Youtube | RSEM | EdX |
| Transcriptome alignment with RSEM II | 05:33 | Youtube | RSEM | EdX |
| Install R and Bioconductor | 02:51 | Youtube | Installing | EdX |
| BAM files and GTF file | 05:35 | Youtube | RNA | EdX |
| Building a count matrix | 10:02 | Youtube | RNA | EdX |
| Normalizing for sequencing depth | 09:43 | Youtube | RNA | EdX |
| Transformations and Variance | 08:55 | Youtube | RNA | EdX |
| RNA-seq and ratios of counts | 04:51 | Youtube | RNA | EdX |
| Modeling raw counts | 06:07 | Youtube | RNA | EdX |
| Negative binomial distribution | 03:37 | Youtube | RNA | EdX |
| Running DESeq2 | 07:13 | Youtube | RNA | EdX |
| Plotting results: MA-plot | 04:50 | Youtube | RNA | EdX |
| Plot counts for one gene | 06:01 | Youtube | RNA | EdX |
| Fast, pseudoaligners for RNA-seq | N/A | N/A | RNA | EdX |
| Isoform or exon-level expression | 04:07 | Youtube | Exon | EdX |
| Differential exon usage with DEXSeq | 07:51 | Youtube | Exon | EdX |
| Differential isoform expression with Cufflinks/cummeRbund | 06:08 | Youtube | Exploring | EdX |
| DNA Methylation | | | | |
| Epigenetics | 02:08 | Youtube | Methylation | EdX |
| DNA Methylation | 03:22 | Youtube | Methylation | EdX |
| CpG islands | 04:06 | Youtube | Methylation | EdX |
| Bisulfite Treatment | 01:31 | Youtube | 450K | EdX |
| Measuring Methylation with the 450K Array | 01:52 | Youtube | 450K | EdX |
| Statistical Considerations | 08:20 | Youtube | 450K | EdX |
| DNA Methylation Data Analysis in R | 11:30 | Youtube | 450K | EdX |
| Finding Differentially Methylated Regions in R | 13:08 | Youtube | 450K | EdX |
| Reading Raw 450K Array Data | 08:34 | Youtube | 450K | EdX |
| Downloading Data | N/A | N/A | N/A | EdX |
| InferenceForDNAmeth | 07:54 | Youtube | Inference | EdX |
| InferenceForDNAMethInR | 08:49 | Youtube | Inference | EdX |
| CpGIslandShores | 10:23 | Youtube | N/A | EdX |
| cellComposition | 08:24 | Youtube | N/A | EdX |
| blocks | 05:41 | Youtube | N/A | EdX |
| Measuring Methylation from Sequencing | 01:52 | Youtube | N/A | EdX |
| ChIP-Seq | | | | |
| Introduction to transcription regulation | 06:42 | Youtube | Chip | EdX |
| ChIP-seq technique | 06:50 | Youtube | Chip | EdX |
| ChIP-seq peak calling | 11:44 | Youtube | Chip | EdX |
| ChIP-seq Quality Control 1 | 14:32 | Youtube | Chip | EdX |
| ChIP-seq quality control 2 | 08:31 | Youtube | Chip | EdX |
| ChIP-seq Target Genes | 13:10 | Youtube | Chip | EdX |
| ChIP-seq Example | 09:45 | Youtube | Chip | EdX |
| CISTROME | 07:51 | Youtube | N/A | EdX |
| Cistrome Analysis Pipeline Hands-On | 09:53 | Youtube | N/A | EdX |
| BETA Software Suite | 13:09 | Youtube | N/A | EdX |

Using Python for Research

| Series Number | Lecture Title | Time | Video |
|---|---|---|---|
| Series 1.0 | | | |
| 0.1 | Why Program? Why Python? | 04:34 | Youtube |
| 1.1.1 | Python Basics | 04:30 | Youtube |
| 1.1.2 | Objects | 04:39 | Youtube |
| 1.1.3 | Modules and Methods | 07:29 | Youtube |
| 1.1.4 | Numbers and Basic Calculations | 04:25 | Youtube |
| 1.1.5 | Random Choice | 01:55 | Youtube |
| 1.1.6 | Expressions and Booleans | 05:52 | Youtube |
| 1.2.1 | Sequences | 03:21 | Youtube |
| 1.2.2 | Lists | 07:12 | Youtube |
| 1.2.3 | Tuples | 06:36 | Youtube |
| 1.2.4 | Ranges | 02:46 | Youtube |
| 1.2.5 | Strings | 08:38 | Youtube |
| 1.2.6 | Sets | 07:02 | Youtube |
| 1.2.7 | Dictionaries | 08:09 | Youtube |
| 1.3.1 | Dynamic Typing | 11:25 | Youtube |
| 1.3.2 | Copies | 01:58 | Youtube |
| 1.3.3 | Statements | 04:35 | Youtube |
| 1.3.4 | For and While Loops | 08:05 | Youtube |
| 1.3.5 | List Comprehensions | 02:37 | Youtube |
| 1.3.6 | Reading and Writing Files | 05:20 | Youtube |
| 1.3.7 | Introduction to Functions | 05:23 | Youtube |
| 1.3.8 | Writing Simple Functions | 09:36 | Youtube |
| 1.3.9 | Common Mistakes and Errors | 06:44 | Youtube |
| Series 2.0 | | | |
| 2.1.1 | Scope Rules | 08:29 | Youtube |
| 2.1.2 | Classes and Object-Oriented Programming | 07:34 | Youtube |
| 2.2.1 | Introduction to NumPy Arrays | 06:26 | Youtube |
| 2.2.2 | Slicing NumPy Arrays | 05:13 | Youtube |
| 2.2.3 | Indexing NumPy Arrays | 07:20 | Youtube |
| 2.2.4 | Building and Examining NumPy Arrays | 05:52 | Youtube |
| 2.3.1 | Introduction to Matplotlib and Pyplot | 08:21 | Youtube |
| 2.3.2 | Customizing Your Plots | 05:28 | Youtube |
| 2.3.3 | Plotting Using Logarithmic Axes | 05:07 | Youtube |
| 2.3.4 | Generating Histograms | 07:46 | Youtube |
| 2.4.1 | Simulating Randomness | 07:41 | Youtube |
| 2.4.2 | Examples Involving Randomness | 13:40 | Youtube |
| 2.4.3 | Using the NumPy Random Module | 11:36 | Youtube |
| 2.4.4 | Measuring Time | 03:42 | Youtube |
| 2.4.5 | Random Walks | 16:40 | Youtube |
| Series 3.0 | | | |
| 3.1.1 | Introduction to DNA Translation | 04:44 | Youtube |
| 3.1.2 | Downloading DNA Data | 04:22 | Youtube |
| 3.1.3 | Importing DNA Data Into Python | 04:57 | Youtube |
| 3.1.4 | Translating the DNA Sequence | 12:26 | Youtube |
| 3.1.5 | Comparing Your Translation | 08:19 | Youtube |
| 3.2.1 | Introduction to Language Processing | 02:16 | Youtube |
| 3.2.2 | Counting Words | 10:33 | Youtube |
| 3.2.3 | Reading in a Book | 03:55 | Youtube |
| 3.2.4 | Computing Word Frequency Statistics | 04:41 | Youtube |
| 3.2.5 | Reading Multiple Files | 11:47 | Youtube |
| 3.2.6 | Plotting Book Statistics | 06:03 | Youtube |
| 3.3.1 | Introduction to kNN Classification | 03:22 | Youtube |
| 3.3.2 | Finding the Distance Between Two Points | 05:22 | Youtube |
| 3.3.3 | Majority Vote | 11:24 | Youtube |
| 3.3.4 | Finding Nearest Neighbors | 14:02 | Youtube |
| 3.3.5 | Generating Synthetic Data | 07:14 | Youtube |
| 3.3.6 | Making a Prediction Grid | 10:44 | Youtube |
| 3.3.7 | Plotting the Prediction Grid | 04:14 | Youtube |
| 3.3.8 | Applying the kNN Method | 09:56 | Youtube |
| Series 4.0 | | | |
| 4.1.1 | Getting Started With Pandas | 11:02 | Youtube |
| 4.1.2 | Loading and Inspecting Data | 04:28 | Youtube |
| 4.1.3 | Exploring Correlations | 05:37 | Youtube |
| 4.1.4 | Clustering Whiskies by Flavor Profile | 06:54 | Youtube |
| 4.1.5 | Comparing Correlation Matrices | 03:59 | Youtube |
| 4.2.1 | Introduction to GPS Tracking of Birds | 02:54 | Youtube |
| 4.2.2 | Simple Data Visualizations | 05:19 | Youtube |
| 4.2.3 | Examining Flight Speed | 06:51 | Youtube |
| 4.2.4 | Using Datetime | 11:13 | Youtube |
| 4.2.5 | Calculating Daily Mean Speed | 07:49 | Youtube |
| 4.2.6 | Using the Cartopy Library | 05:22 | Youtube |
| 4.3.1 | Introduction to Network Analysis | 05:50 | Youtube |
| 4.3.2 | Basics of NetworkX | 05:47 | Youtube |
| 4.3.3 | Graph Visualization | 04:03 | Youtube |
| 4.3.4 | Random Graphs | 11:25 | Youtube |
| 4.3.5 | Plotting the Degree Distribution | 05:51 | Youtube |
| 4.3.6 | Descriptive Statistics of Empirical Social Networks | 07:17 | Youtube |
| 4.3.7 | Finding the Largest Connected Component | 10:00 | Youtube |
| Series 5.0 | | | |
| 5.1.1 | Introduction to Statistical Learning | 08:11 | Youtube |
| 5.1.2 | Generating example regression data | 05:03 | Youtube |
| 5.1.3 | Simple linear regression | 03:38 | Youtube |
| 5.1.4 | Least squares estimation in code | 05:02 | Youtube |
| 5.1.5 | Simple linear regression in code | 07:22 | Youtube |
| 5.1.6 | Multiple linear regression | 01:43 | Youtube |
| 5.1.7 | Scikit learn for Linear Regression | 09:15 | Youtube |
| 5.1.8 | Assessing Model Accuracy | 08:49 | Youtube |
| 5.2.1 | Generating Example Classification Data | 07:59 | Youtube |
| 5.2.2 | Logistic Regression | 05:10 | Youtube |
| 5.2.3 | Logistic Regression in Code | 10:50 | Youtube |
| 5.2.4 | Computing Predictive Probability Across the Grid | 09:29 | Youtube |
| 5.3.1 | Tree-Based Methods for Regression and Classification | 07:12 | Youtube |
| 5.3.2 | Random Forest Predictions | 04:24 | Youtube |

Contributors