- Can the DRAGEN variant caller be restricted to defined regions so that, for example, bait intervals can be masked out?
- Yes, you can provide a .bed file indicating the intervals where you want calling performed. Please refer to the DRAGEN User Guide.
- Is there an easy way to process multiple fastq files produced by the Illumina bcl2fastq command for the same subject?
- Yes, two command line options are available: combine-samples-by-name and fastq-list. Both are described in the User Guide. fastq-list is the preferred option; it lets you specify all FASTQ files for a sample along with their read group IDs.
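As an illustration of the fastq-list approach, the sketch below groups bcl2fastq output files by sample and lane and writes one CSV row per read group. The column names (RGID, RGSM, RGLB, Lane, Read1File, Read2File), the RGID scheme, and the `UnknownLibrary` placeholder are assumptions made for this example; please confirm the exact layout in the User Guide before using it.

```python
import csv
import io
import re

# bcl2fastq names files <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def build_fastq_list(filenames, library="UnknownLibrary"):
    """Return fastq-list CSV text with one row per (sample, lane)."""
    lanes = {}  # (sample, lane) -> {"1": R1 path, "2": R2 path}
    for name in filenames:
        match = FASTQ_NAME.match(name.rsplit("/", 1)[-1])
        if match is None:
            continue  # skip files that do not follow the naming scheme
        key = (match["sample"], int(match["lane"]))
        lanes.setdefault(key, {})[match["read"]] = name

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Assumed column layout; verify against the User Guide.
    writer.writerow(["RGID", "RGSM", "RGLB", "Lane", "Read1File", "Read2File"])
    for (sample, lane), reads in sorted(lanes.items()):
        rgid = f"{sample}.L{lane}"  # hypothetical scheme: one RGID per lane
        writer.writerow([rgid, sample, library, lane,
                         reads.get("1", ""), reads.get("2", "")])
    return out.getvalue()
```

Writing the returned text to a file gives a single list that covers every lane of the subject, each with its own read group ID.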
- Does the DRAGEN variant caller support population based calling, e.g., calling a cohort of samples?
- Yes, DRAGEN supports joint calling of trios, large cohorts, or populations with the available Combine gVCF and Joint Genotyping capabilities.
- Where are the output BAM and VCF files saved?
- The output BAM and VCF files are stored in the output directory specified on the command line. By default, the output directory also holds temporary/intermediate files. However, we recommend writing intermediate files to the /staging folder on on-site servers, or to ephemeral or EBS storage in the cloud. The command line option for this is intermediate-results-dir.
- Can the DRAGEN system handle concurrent users?
- The DRAGEN system can only be used by one user at a time. The best way to handle multiple users is to place a job queueing mechanism in front of DRAGEN that accepts jobs from users and queues them up. That mechanism then calls the DRAGEN application with one job at a time and notifies users when their job is done. DRAGEN works well with SLURM or LSF as a scheduler to manage multiple users.
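To make the queueing idea concrete, here is a minimal sketch of a front-end that accepts jobs from several users and runs them strictly one at a time. It is not a replacement for SLURM or LSF; the class name `SerialJobQueue` and its methods are hypothetical, and each job is modelled as a plain Python callable that, in practice, would wrap the DRAGEN command line.

```python
import queue
import threading


class SerialJobQueue:
    """Run submitted jobs one at a time, in submission order."""

    def __init__(self):
        self._jobs = queue.Queue()
        # A single worker thread guarantees that jobs never overlap.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, user, job):
        """Queue a zero-argument callable; the returned Event is set when it ends."""
        done = threading.Event()
        self._jobs.put((user, job, done))
        return done

    def _run(self):
        while True:
            user, job, done = self._jobs.get()
            try:
                job()  # in practice, e.g. subprocess.run() on the DRAGEN command
            except Exception as exc:
                print(f"job for {user} failed: {exc}")
            finally:
                done.set()  # notify the submitting user
                self._jobs.task_done()

    def wait_all(self):
        """Block until every queued job has finished."""
        self._jobs.join()
```

A user would call `submit()` and wait on the returned event; because there is only one worker, a second user's job starts only after the first has finished.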
- How do I process compressed/gzipped input files?
- DRAGEN supports gzip-compressed and CRAM input formats. DRAGEN also writes compressed outputs.
- Does DRAGEN output secondary and supplementary alignments by default?
- The maximum number of supplementary (chimeric) and secondary (suboptimal) alignments per read can be set from 0 to 30. The default ceilings are 3 supplementary and 0 secondary, so secondary alignments are not output by default. For secondary alignments, there are also filtering parameters to specify the maximum score difference or Phred-scale likelihood difference between the best alignment and a secondary.
- How do DRAGEN mapping quality scores compare to BWA-MEM?
- The MAPQ calculation methodology is similar to that of BWA-MEM. Alignment candidate pair scores are formed by summing mate alignment scores and subtracting a penalty representing the likelihood of the insert size vs. the empirical distribution. MAPQ is primarily proportional to the pair score difference between the best and second-best candidates, with corrections for clustered suboptimal scores. Some inconsistencies in BWA’s calculation are corrected, in particular that BWA applies a different scaling from alignment score differences to MAPQ in paired vs. unpaired cases.
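The toy sketch below only illustrates the shape of the calculation described above: a pair score formed from two mate scores minus an insert-size penalty, and a MAPQ that grows with the gap between the best and second-best pair scores. The function names, the `scale` factor, and the cap of 60 are assumptions for illustration, not DRAGEN's actual values, and the corrections for clustered suboptimal scores are omitted.

```python
def pair_score(score_mate1, score_mate2, insert_penalty):
    """Sum of mate alignment scores minus the insert-size penalty."""
    return score_mate1 + score_mate2 - insert_penalty


def toy_mapq(pair_scores, scale=1.0, cap=60):
    """Toy MAPQ proportional to the best-vs-second-best score difference.

    scale and cap are illustrative assumptions, not DRAGEN parameters.
    """
    if not pair_scores:
        raise ValueError("at least one candidate pair score is required")
    ranked = sorted(pair_scores, reverse=True)
    if len(ranked) == 1:
        return cap  # no competing candidate
    difference = ranked[0] - ranked[1]
    return min(cap, int(scale * difference))
```

For example, two candidates with pair scores 190 and 150 differ by 40, so with `scale=0.5` the toy function returns 20, while two equally good candidates give 0.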
- Is DRAGEN available in the Cloud?
- Yes, DRAGEN is available in the AWS Cloud today and uses the f1.2xlarge and f1.16xlarge instances to process data. DRAGEN can be accessed via AWS Marketplace as well as through other partners such as DNAnexus, Illumina BaseSpace, and Seven Bridges.
- Does DRAGEN support hard and soft clipping in the Aligner?
- DRAGEN primary alignments are soft-clipped, so all the bases and quals are present. By default, supplementary alignments are hard-clipped. However, you can change this with the hard-clips command line option described below.
This parameter, ranging from 0 to 7, is treated as a 3-bit field. Bit 0 is for primary alignments, bit 1 for supplementary alignments, and bit 2 for secondary alignments. Each bit determines whether local alignments of that type are reported with hard clipping (1) or soft clipping (0). The default is hard-clips = 6, meaning only primary alignments use soft clipping.
Flags for hard clipping: bit 0 = primary, bit 1 = supplementary, bit 2 = secondary
To disable all hard clipping, use: --Aligner.hard-clips=0
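The bit field can be decoded mechanically. The short sketch below (the function name `decode_hard_clips` is hypothetical) maps a hard-clips value to the clipping mode of each alignment type, following the bit assignment given above.

```python
ALIGNMENT_TYPES = ("primary", "supplementary", "secondary")  # bits 0, 1, 2


def decode_hard_clips(value):
    """Map a hard-clips value (0 to 7) to 'hard' or 'soft' per alignment type."""
    if not 0 <= value <= 7:
        raise ValueError("hard-clips must be between 0 and 7")
    return {
        name: "hard" if (value >> bit) & 1 else "soft"
        for bit, name in enumerate(ALIGNMENT_TYPES)
    }
```

For the default value 6 (binary 110) this yields soft clipping for primary alignments and hard clipping for supplementary and secondary alignments; for 0, every type is soft-clipped.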
- In somatic mode with BAM input for a tumor-normal sample, why do I get an error about identical read groups in both BAM files?
- If you are using the tumor-normal BAM input option and the read groups in your BAMs share an RGID, DRAGEN cannot tell which read group your reads belong to. Ideally, each read group should have a different RGID, but you can work around the problem by adding “--prepend-filename-to-rgid true” to the command line.
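If you want to check for this situation before launching a run, the sketch below compares the @RG lines of two SAM/BAM headers and reports any shared RGIDs. It assumes the header text has already been extracted (for example with `samtools view -H`); the function names are hypothetical.

```python
def read_group_ids(sam_header_text):
    """Collect the ID field of every @RG line in a SAM header."""
    ids = set()
    for line in sam_header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@RG":
            continue
        for field in fields[1:]:
            if field.startswith("ID:"):
                ids.add(field[3:])
    return ids


def shared_rgids(tumor_header, normal_header):
    """Return the sorted RGIDs that appear in both headers."""
    return sorted(read_group_ids(tumor_header) & read_group_ids(normal_header))
```

An empty result means the two BAMs already use distinct RGIDs, so the workaround should not be needed.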
- What network card does the DRAGEN On-site Server have?
- Does the DRAGEN On-site Server include redundant OS drives?
- Yes, it includes two 120 GB drives configured in RAID for redundancy.
- Does the DRAGEN On-site Server include redundant power supplies?
- Yes, the DRAGEN Server comes with dual redundant 750 W power supplies.
- With BAM input, if DRAGEN is used to re-map and re-align, would I lose any raw data?
- If DRAGEN is run with default settings, no read bases or base quals are lost. If the hard-clips option is set so that primary alignments are hard-clipped (for example, 7), data may be lost. The bases and base quals in unaligned reads are retained in the DRAGEN output BAM.
- We’d like to double check that the BAM file produced by DRAGEN will contain all the unaligned reads from the FASTQ file, so we can reconstitute the FASTQ files in the future. Is this the case? Are there any edge cases where the original FASTQ cannot be fully recovered from the BAM files produced by DRAGEN?
- Our general intention is to enable you to reconstruct FASTQs from DRAGEN output BAMs to the full extent you find necessary. Below is a pretty comprehensive list of present limitations and edge cases. If you find our current capabilities inadequate in any way, please let us know, so that we can expedite updates to meet your needs.
All FASTQ reads are transmitted to the output BAM, including unmapped reads
- Paired-end reads may be provided by dual synchronized FASTQs or one interleaved FASTQ; mate number is flagged in the output BAM
- However, original FASTQ order of reads is forgotten, unless it can be derived from the read names
All sequence bases are retained, provided the user does not enable hard clipping for primary alignments (which are soft-clipped by default)
Supported bases are [ACGT] and ‘N’ (but see base quality notes below)
FASTQ ‘@’ lines may contain:
- Sequence base name – transmitted to the output BAM unchanged
- Optional mate suffix (e.g. “/1”, “/2”; other delimiters supported) – stripped by default, but there is an option to retain these
- Optional description after first whitespace – currently dropped, not included in the output BAM
Any information on FASTQ ‘+’ lines is dropped (rarely seen, and if present should theoretically match the ‘@’ lines)
Base quality scores are generally retained, but there are edge cases / limitations, mainly for old or non-Illumina sources:
- Quality scores above Phred 63 are not supported, as 6-bit encoding is used internally
- Input ‘N’s are represented internally using quality score 0
- Input ‘N’s are transmitted to the output BAM, but always with a fixed base quality score (2 by default, matching Illumina)
- Input [ACGT] bases with quality score 0 (Illumina doesn’t do this) are changed to ‘N’s in the output BAM
FASTQ whitespace variations or deviations from standard FASTQ format are not captured in the output BAM
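To see how these rules affect a single read, the sketch below applies the default name and base-quality handling listed above to one FASTQ record. It is an illustration only: the function names are hypothetical, only “/1” and “/2” mate suffixes are handled, raising an error for qualities above 63 is an assumption (the list only says they are unsupported), and writing quality 2 for [ACGT] bases that become ‘N’ is inferred from the fixed ‘N’ quality rule.

```python
import re

N_OUTPUT_QUAL = 2  # fixed quality written for 'N' bases (default per the list)
MAX_QUAL = 63      # limit imposed by the 6-bit internal encoding


def bam_read_name(fastq_header_line, keep_mate_suffix=False):
    """Derive the BAM read name from a FASTQ '@' line."""
    # Drop the leading '@' and everything after the first whitespace.
    name = fastq_header_line[1:].split(None, 1)[0]
    if not keep_mate_suffix:
        name = re.sub(r"/[12]$", "", name)  # strip "/1" or "/2"
    return name


def bam_bases_and_quals(bases, quals):
    """Apply the 'N' and quality-0 rules to one read's bases and qualities."""
    out_bases, out_quals = [], []
    for base, qual in zip(bases, quals):
        if qual > MAX_QUAL:
            # Assumed behavior for this sketch; actual handling is not specified.
            raise ValueError(f"quality {qual} exceeds Phred {MAX_QUAL}")
        if base == "N" or qual == 0:
            # Quality 0 represents 'N' internally, so both cases emit 'N'.
            out_bases.append("N")
            out_quals.append(N_OUTPUT_QUAL)
        else:
            out_bases.append(base)
            out_quals.append(qual)
    return "".join(out_bases), out_quals
```

Comparing the input record with the values these functions return shows which parts of a FASTQ (description text, mate suffix, original ‘N’ qualities, quality-0 bases) cannot be recovered from the output BAM under default settings.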