Computational challenges and performance optimizations in NGS data analyses

This workshop was aimed at bioinformaticians who were actively involved in next-generation sequencing (NGS) data analysis projects and wanted to learn how to use high-performance computing (HPC) solutions to run their analytical pipelines in an efficient and reproducible manner.

The aim of this course was to familiarize the participants with HPC methodologies and to provide hands-on training on how to optimize an NGS analysis pipeline. DNA and RNA sequencing analysis workflows were used to explore bottlenecks and demonstrate solutions.

This event was jointly organized by Intel and the Francis Crick Institute. For more information on this event and the content of this page, feel free to email either Nicholas Luscombe (nicholas.luscombe@crick.ac.uk) or Gabriella Rustici (gabry@ebi.ac.uk).

Topics covered:

  • How to optimize NGS analysis workflows through HPC best practices
  • Optimal use of software tools for short read alignment, with emphasis on Bowtie2 and TopHat2
  • HPC concepts including parallelization, single/multi-process, shared/distributed memory, and CPU, memory and I/O constraints
  • Diagnostic tools for debugging and monitoring of parallel programs
  • Benchmarking of various technology and system architecture approaches
  • Cloud-based analytics
  • Scaling up a workflow to deal with a production scale environment and increasingly large datasets 

Prerequisites: The practicals require familiarity with the Linux/UNIX operating system and knowledge of the R programming language.

 

Agenda: Click on titles for links to presentations

Day 1 - Sep 3 | NGS data analysis workflows overview, bottlenecks and optimization solutions | Lead instructor
8:30 - 9:00 | Registration |
9:00 - 9:30 | Overview of workshop | Nick Luscombe
9:30 - 10:15 | Workload Characterization & Optimization Tradeoffs (View Lecture here) | Chris Dagdigian
10:15 - 10:45 | Coffee break |
10:45 - 11:30 | Introduction to Parallelism (View Lecture here) | Clay Breshears
11:30 - 13:00 | Lunch |
13:00 - 13:15 | Welcome by Jim Smith | Jim Smith
13:15 - 14:00 | Memory, I/O or CPU constraints (View Lecture here) | Clay Breshears
14:00 - 15:30 | Diagnostic Tools | Clay Breshears
15:30 - 15:45 | Coffee break |
15:45 - 16:45 | Mapping strategies overview (View Lecture here) | Ernest Turro
16:45 - 17:45 | System Architecture & Technology Options (View Lecture here) | Chris Dagdigian
17:45 - 18:15 | System Architecture Scavenger Hunt | Clay Breshears
18:15 - 18:30 | Q&A session | Intel/Crick
18:30 onwards | Drinks reception |

Day 2 - Sep 4 | Mapping | Lead instructor
9:00 - 10:00 | Introduction to RNA-seq analysis (View Lecture here) | Vincent Plagnol
10:00 - 10:15 | Coffee break |
10:15 - 11:15 | Thread and Process Level Optimizations (View Lecture here) | Clay Breshears
11:15 - 12:30 | Thread and Process Level Optimizations | Clay Breshears
12:30 - 14:00 | Lunch |
14:00 - 15:00 | Data Latency, Data Chunking & Placement (View Lecture here) | Clay Breshears
15:00 - 16:00 | Data chunking | Clay Breshears
16:00 - 16:15 | Coffee break |
16:15 - 17:30 | Data chunking (continued) | Clay Breshears
17:30 - 18:00 | Q&A session | Intel/Crick

Day 3 - Sep 5 | RNA-seq analysis | Lead instructor
9:00 - 10:00 | Debugging and Profiling in R (View Lecture here) | Robert Sugar
10:00 - 10:15 | Coffee break |
10:15 - 12:30 | R-based Optimization (View Lecture here) | Robert Sugar / Kathi Zarnack
12:30 - 14:00 | Lunch |
14:00 - 14:45 | SPRINT Overview and Case study (View Lecture here) | Eilidh Troup
14:45 - 15:15 | Coffee break |
15:15 - 16:30 | R-based Optimization and exercises 1, 2 and 3 | Eilidh Troup
16:30 - 17:00 | SGI UV2 with Xeon Phi | Simon Appleby
17:00 - 17:30 | Q&A session | Crick/Intel/SPRINT

Day 4 - Sep 6 | Benchmarking and scaling up | Lead instructor
9:00 - 10:00 | On the empirical evaluation of RNA-seq gene profiling pipelines | Nuno Fonseca
10:00 - 10:30 | Coffee break |
10:30 - 11:30 | Pipelines for large sequencing projects (e.g. cancer genomics, UK10K) (View Lecture here) | Steve Searle
11:30 - 13:00 | Lunch |
13:00 - 14:00 | Cloud-based Analytics & Map Reduce (View Lecture here) | Ketan Paranjape
14:00 - 16:00 | Cloud-based Analytics & Map Reduce | Ketan Paranjape
15:30 - 15:45 | Coffee break |
15:45 - 16:45 | Scaling up to Production (View Lecture here) | Ketan Paranjape
16:45 - 17:15 | Q&A and wrap up | Crick/Intel
