DRAGEN Somatic V2 Pipeline

Ultra-Rapid, Highly Accurate Analysis of Tumor Sequence Data

Overview

The DRAGEN Somatic V2 Pipeline enables ultra-rapid analysis of next-generation sequencing (NGS) data to identify cancer-associated mutations. DRAGEN calls SNPs and INDELs from both matched tumor/normal pairs and tumor-only samples. It produces results rapidly while achieving accuracy greater than that of top somatic variant callers.

How Does it Work?

For the tumor/normal pipeline, both samples are analyzed jointly so that germline variants are excluded, generating output specific to tumor mutations. The tumor-only pipeline produces a VCF file that can be further analyzed to identify tumor mutations. Neither pipeline makes ploidy assumptions, which enables detection of low-frequency alleles. New features of DRAGEN Somatic V2 include a sample-specific calibration algorithm that improves accuracy, as well as refined mapper and aligner algorithms.

DRAGEN Somatic V2 Pipeline

The DRAGEN Somatic V2 Pipeline offers flexible data analysis to suit different needs. It accepts FASTQ, BAM/CRAM, and BCL files and supports NGS input from whole-genome, whole-exome, and targeted cancer panel sequencing. The somatic pipeline calls SNPs and INDELs and reports allele frequency. It can be automated using the DRAGEN Workflow Management System for ease of use and bulk sample processing. It is available onsite, in the Cloud, or as a hybrid Cloud solution.

Tumor/Normal Pipeline

In the tumor/normal pipeline, both samples go through identical processing steps and are input into the variant caller, where germline variants are excluded to produce a VCF file specific for tumor mutations.

Tumor-Only Pipeline

The tumor-only pipeline lacks a matching normal sample and produces a VCF file containing both somatic and germline variants. Users have two options for refining the data to identify somatic variants: 1) Input a panel-of-normals dataset as a germline filter in the variant caller; 2) Compare the output VCF to publicly available databases of germline SNPs and INDELs to remove known germline variants.
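
The second option can be illustrated with a minimal Python sketch. It is not part of DRAGEN; the function names (variant_key, load_known_germline, remove_known_germline) are hypothetical, and it assumes that both files are uncompressed, tab-delimited VCF text and that variants can be matched exactly on chromosome, position, REF, and ALT with no normalization of INDEL representation. A record is dropped only when every ALT allele is found in the germline set.

```python
def variant_key(vcf_line):
    """Return (chrom, pos, ref, alt_field) for one VCF data line."""
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, alt


def load_known_germline(vcf_lines):
    """Build a set of (chrom, pos, ref, alt) keys from a germline VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, ref, alts = variant_key(line)
        for alt in alts.split(","):  # one key per ALT allele
            keys.add((chrom, pos, ref, alt))
    return keys


def remove_known_germline(tumor_vcf_lines, germline_keys):
    """Keep header lines and records not fully explained by germline keys."""
    kept = []
    for line in tumor_vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # preserve the VCF header
            continue
        if not line.strip():
            continue
        chrom, pos, ref, alts = variant_key(line)
        # Simplification: drop the record only if all ALT alleles are known.
        if all((chrom, pos, ref, alt) in germline_keys for alt in alts.split(",")):
            continue
        kept.append(line)
    return kept
```

In practice the inputs would be iterables of lines, for example open file handles for the tumor-only VCF and a public germline database VCF. A production filter would also normalize variant representation before matching.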

Improved Accuracy

The ICGC-TCGA DREAM Mutation Calling Challenge was a contest to find the most accurate tool for detecting variants in synthetically generated mutation datasets. The DRAGEN Somatic V2 Variant Caller was benchmarked against the top-performing variant callers from the DREAM Challenge and outperformed them for both SNP and INDEL accuracy. In the chart below, the winning submission of the DREAM Challenge for synthetic dataset 4 is highlighted in grey; DRAGEN Somatic V2 Pipeline is in blue.

A 2015 study (Alioto et al., Nature Communications) compared multiple somatic variant calling tools for accuracy in calling SNPs and INDELs from a medulloblastoma tumor sample. A curated gold set was used to benchmark performance. The DRAGEN Somatic V2 Variant Caller analyzed the same FASTQs as the other tools and produced better measures of accuracy for SNPs and INDELs. The top-performing submissions are highlighted in grey; DRAGEN V2 is in blue.

DRAGEN Somatic V2 Pipeline Steps

1

Input/Output File Formats

  • FASTQ or BCL to BAM/CRAM or VCF
  • BAM/CRAM to VCF
2

Compression/Decompression

  • Decompression of FASTQ, BCL, BAM/CRAM
  • Gzip and CRAM in and out
3

BCL Convert/Demultiplex

  • BCL conversion to FASTQ
  • BCL can also be processed directly
4

Mapping/Aligning

  • Single end or paired end reads
  • Supports read lengths from 26 bp to 10,000 bp
5

Position Sorting

  • Binning by reference range
  • Sorting of bins by reference position
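
The two bullets above describe a bin-then-sort approach, which can be sketched in Python as follows. This is an illustration under stated assumptions, not DRAGEN's implementation: the function name position_sort, the 1,000,000 bp default bin size, and the representation of a read as a tuple starting with (contig_index, position) are all hypothetical.

```python
from collections import defaultdict


def position_sort(reads, bin_size=1_000_000):
    """Sort reads by reference position using range bins.

    Each read is a tuple whose first two fields are an integer contig
    index and a reference position (assumed layout for this sketch).
    """
    bins = defaultdict(list)
    for read in reads:
        contig, pos = read[0], read[1]
        # Step 1: assign each read to a bin covering one reference range.
        bins[(contig, pos // bin_size)].append(read)

    ordered = []
    # Step 2: visit bins in reference order and sort within each bin.
    for key in sorted(bins):
        ordered.extend(sorted(bins[key], key=lambda r: r[1]))
    return ordered
```

Because every read in one bin precedes every read in the next, each bin can be sorted independently, which is what makes the approach easy to parallelize.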
6

Duplicate Marking

  • Based on starting position and CIGAR string
  • Highest-quality duplicate is reported
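
As a rough illustration of the duplicate-marking step, the Python sketch below groups reads that share a start position and CIGAR string and keeps the highest-quality read in each group. The Read type, the mark_duplicates function, and the use of a single mean_quality score per read are assumptions made for this example; the source does not specify how quality is scored or how ties are resolved.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    contig: str
    start: int           # reference start position of the alignment
    cigar: str           # CIGAR string of the alignment
    mean_quality: float  # assumed per-read quality score


def mark_duplicates(reads):
    """Return the names of reads considered duplicates."""
    groups = defaultdict(list)
    for read in reads:
        # Reads with the same contig, start, and CIGAR form one group.
        groups[(read.contig, read.start, read.cigar)].append(read)

    duplicates = set()
    for group in groups.values():
        # Keep the highest-quality read; on ties, the first one seen wins.
        best = max(group, key=lambda r: r.mean_quality)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```

For example, given two reads at chr1:100 with CIGAR 50M and mean qualities 30 and 35, the read with quality 30 is marked as the duplicate.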
7

Somatic Variant Calling

  • Somatic variant caller filters germline variants
  • Tumor/normal filtering is performed by comparing variants between matched pairs