DRAGEN Array Local Analysis

DRAGEN Array Local Overview

DRAGEN Array provides accurate, comprehensive, and efficient analysis of Infinium microarray data. The local command-line interface gives power users granular control and the flexibility to support large-scale microarray genomic studies.

Getting Started

DRAGEN Array Local uses a command-line interface, which allows full user control of software functionality and easy automation of tasks. The software is designed for power users and bioinformaticians. If you are new to the command-line interface, review Command-line interface Basics.

Computing Requirements

Before downloading and installing the software, ensure that the recommended specifications for CPU, memory, hard drive, and operating system are met for best performance. The recommended values are listed by category on this page.

Quota Specifications

The star-allele call command in DRAGEN Array Local requires quota to run. Quota is charged per sample analyzed and can be purchased on the Illumina Product Page. Quota is consumed for every sample analyzed, including re-analyzed and low-quality samples.

The credential provided in the activation email after purchase is passed to the star-allele call command through the --license-server-url option. During runtime, the logs record the remaining quota at the beginning and at the end of the analysis.

An internet connection is required to perform the software license check and to confirm that paid quota is available for all samples in the analysis batch. The software license check uses the following endpoint: license.edicogenome.com.

Installation

Please follow the steps below to install the software on your compute infrastructure:

  1. Click the DRAGEN Array v1.0 installation package for the platform of your choice. Installers for Windows and Linux are available on the Illumina Support Site. When the download completes, move the DRAGEN Array v1.0 installation package to the desired folder. Administrative permissions may be required for system folders, for example /usr/local/bin on Linux and C:\Program Files on Windows. Note: Throughout the remainder of this document, the examples assume Linux.

  2. Unzip and extract the package. The executable can be found in the dragena subfolder of the software download after extraction.

When the installation is successful, running the dragena version command displays the software version in the terminal window.

Run DRAGEN Array Local

PGx CNV analysis requires a minimum of 24 samples per run. For a successful analysis, at least 22 samples must pass QC, defined as log R dev < 0.2. With the standard hardware specification described in Computing Requirements, up to 500 GDA-ePGx samples can be processed per analysis batch.

Genotyping analysis has no minimum sample requirement.

To optimize performance of the targeted PGx CNV caller and minimize batch effect, it is recommended to:

  • Analyze samples that were processed together in one batch

  • Avoid combining sample batches processed on different reagent lots

  • Analyze batches of 96 samples or more

Quick Start

Use the following instructions to start the full PGx analysis, covering genotyping, PGx CNV calling, and PGx star allele calling. Refer to the Command Index for the parameters of all commands.

Review the DRAGEN Array Applications section for information on the input files to use, sample minimums per analysis type, and other best practices.

The command examples show analysis on a Linux system using folders instead of sample sheets. Windows users should adapt the file paths in the commands to Windows conventions, e.g., using a backslash (\) instead of a forward slash (/). A sample sheet can be used to select specific samples out of a folder.
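As a small illustration of the path substitution described above, the following Python sketch converts a Linux-style example path to Windows conventions. The function name to_windows_path and the default drive letter C: are assumptions made for this example; they are not part of DRAGEN Array.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_windows_path(posix_path, drive="C:"):
    """Convert a Linux-style path to a Windows-style path string.

    The drive letter is an assumption for this sketch; use the drive
    where the files are actually stored.
    """
    path = PurePosixPath(posix_path)
    if path.is_absolute():
        # Drop the leading "/" and anchor the remaining parts on the drive root.
        return str(PureWindowsPath(drive + "\\", *path.parts[1:]))
    # Relative paths keep their parts; only the separator changes.
    return str(PureWindowsPath(*path.parts))


print(to_windows_path("/user/productfiles/manifest.bpm"))
```

Pure path classes are used so that the conversion behaves the same on Linux and Windows, because they never touch the file system.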

  1. Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed. If the executable was added to the PATH environment variable, any desired directory can be used instead.

  2. Use the genotype call command to call genotypes and generate GTC files using IDAT files as input.
     dragena genotype call --bpm-manifest /user/productfiles/manifest.bpm --cluster-file /user/productfiles/clusterfile.egt --idat-folder /user/IDATs --output-folder /user/gtc

Command Index

Use the following syntax when using the command-line interface:

dragena [command] [required parameters] [optional parameters]

copy-number

The root command for actions that act on copy number variants.

copy-number call

The command used to call copy number variants. A batch of 24 or more samples is required for analysis. For a successful analysis, at least 22 samples must pass QC, defined as log R dev < 0.2.
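The batch requirements above can be checked before starting a run. The following Python sketch assumes that a log R dev value is already available for each sample, for example from an earlier QC report. The function name and the input dictionary are hypothetical and not part of DRAGEN Array.

```python
MIN_BATCH_SIZE = 24      # minimum samples per copy-number call batch
MIN_PASSING = 22         # samples that must pass QC
LOG_R_DEV_LIMIT = 0.2    # a sample passes when log R dev < 0.2


def check_cnv_batch(log_r_dev_by_sample):
    """Return (ready, passing_ids) for a candidate copy-number call batch.

    log_r_dev_by_sample maps a sample ID to its log R dev value.
    """
    passing = sorted(
        sample for sample, value in log_r_dev_by_sample.items()
        if value < LOG_R_DEV_LIMIT
    )
    ready = (
        len(log_r_dev_by_sample) >= MIN_BATCH_SIZE
        and len(passing) >= MIN_PASSING
    )
    return ready, passing


# Example: 24 samples, two of which fail the log R dev threshold.
batch = {f"S{i:02d}": 0.10 for i in range(22)}
batch.update({"S22": 0.35, "S23": 0.41})
print(check_cnv_batch(batch)[0])
```

The check only mirrors the stated thresholds; the software applies its own QC during analysis.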

copy-number help

Displays help information for a copy-number command.

copy-number train

Trains a copy number (CN) model for a set of samples. Generate a new CN model when using a customized cluster file (.egt) that has been optimized for a specific data set.

  • Execute the train command using the data sets that were used to optimize the cluster file.

  • To use a CN model generated by the train command, the mask file for the manifest must be saved in the same directory as the manifest.

  • A minimum of 96 samples is required to use the copy-number train command. For optimal performance, at least 150 is recommended.

See Optimizing cluster files and copy number models for further details.

copy-number version

Displays version information for the copy-number command.

genotype

The root command for genotype calling.

genotype call

Determines genotype calls (GTC) from IDAT files.

genotype gtc-to-bedgraph

Converts GTC to BedGraph files, producing BedGraph formatted visualization files from the log R ratio data contained in the GTC intermediate files.

genotype gtc-to-vcf

Converts GTC files to SNV VCF Files. The command applies only to Genotype Call Files produced by DRAGEN Array.

genotype help

Displays the help information for a genotype command.

genotype version

Displays current DRAGEN Array Local version.

help

Displays the help information.

version

Displays current DRAGEN Array Local version.

star-allele

The root command for PGx star allele calling.

star-allele help

Displays help information for a star-allele command.

star-allele version

Displays version information for star-allele.

star-allele call

Calls PGx star allele diplotypes. The SNV VCF files should be generated using the DRAGEN Array genotype gtc-to-vcf command with --unsquash-duplicates off (the default) and without --filter-loci.
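Because star-allele call reads *.snv.vcf.gz and *.cnv.vcf.gz files from one VCF folder, it can help to confirm that both file types are present before running the command. The Python sketch below assumes that the SNV and CNV files of one sample share the same file name prefix and sit at the top level of the folder. That naming assumption and the function name pair_vcfs are hypothetical and should be checked against the actual output.

```python
from pathlib import Path

SNV_SUFFIX = ".snv.vcf.gz"
CNV_SUFFIX = ".cnv.vcf.gz"


def pair_vcfs(vcf_folder):
    """Group VCF files in a folder by the prefix before the SNV/CNV suffix."""
    folder = Path(vcf_folder)
    snv = {p.name[: -len(SNV_SUFFIX)] for p in folder.glob("*" + SNV_SUFFIX)}
    cnv = {p.name[: -len(CNV_SUFFIX)] for p in folder.glob("*" + CNV_SUFFIX)}
    return {
        "paired": sorted(snv & cnv),      # prefixes with both file types
        "snv_only": sorted(snv - cnv),    # SNV VCF present, CNV VCF missing
        "cnv_only": sorted(cnv - snv),    # CNV VCF present, SNV VCF missing
    }
```

Calling pair_vcfs("/user/vcf") returns three sorted lists, so unpaired prefixes can be reviewed before quota is spent on the batch.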

star-allele annotate

Annotates and summarizes the star alleles with metabolizer statuses and writes the results to a consolidated JSON report. Metabolizer status is determined through direct lookup in the public PGx guidelines, CPIC or DPWG, as specified by the user.

Troubleshooting and Additional Support

Tips for using the Command-line interface


When using the command line, consider the following tips:

  • Unquoted spaces cannot be part of a file name in a command. If a file name contains spaces, enclose the file name in quotes.

  • To correct a typing error in a previously entered command, use the up arrow to repeat the previous command, then correct the error before re-entering it.

  • Double-check the command. Misspellings, extra or missing dashes, and similar errors make the command unrecognizable to the software.
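The quoting and dash tips above can be illustrated with a short Python sketch. It uses the standard-library shlex module to quote arguments for a Linux shell and flags arguments that contain an en dash or em dash, which word processors sometimes substitute for hyphens in copied commands. The helper names are hypothetical.

```python
import shlex

SUSPECT_DASHES = ("\u2013", "\u2014")  # en dash and em dash


def render_command(args):
    """Join arguments into one shell line, quoting any that contain spaces."""
    return shlex.join(args)


def args_with_suspect_dashes(args):
    """Return the arguments that contain a non-ASCII dash character."""
    return [a for a in args if any(d in a for d in SUSPECT_DASHES)]


cmd = ["dragena", "genotype", "call", "--idat-folder", "/user/my IDATs"]
print(render_command(cmd))
print(args_with_suspect_dashes(["\u2013-idat-folder", "--output-folder"]))
```

shlex follows POSIX shell quoting rules, which matches the Linux examples in this document. The Windows command prompt uses different quoting conventions.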

Optimizing cluster files and copy number models

A Cluster File (.egt) contains the cluster positions of every probe used for genotyping analysis. Illumina provides a standard cluster file for all commercial Infinium BeadChips. It may be desirable to create a custom cluster file if the one provided does not fit the data well, or if a semi-custom or custom BeadChip, which does not come with a cluster file, is used. GenomeStudio is the software used to create custom cluster files.

To facilitate the review and optimization of PGx variant GenTrain cluster positions, a GenomeStudio auxiliary file is provided for each PGx Array product through the DRAGEN Array Support Site and the array product files page, e.g. Infinium Global Diversity Array with Enhanced PGx Product Files. The auxiliary file is a tab-delimited text file that can be imported into GenomeStudio through Column Import. The file contains the Infinium Assay to PGx star allele mapping, covering the variants involved in DRAGEN Array PGx star allele calling.

When updating the cluster file for pharmacogenomic applications, understand the specifications for the copy number model file before beginning.

Before creating a custom cluster file, review the Infinium Genotyping Data Analysis Technical Note, the Infinium Arrays Support Webinar Video, and Custom cluster file creation for improved copy number analysis.

A Copy Number (CN) Model File (.dat) contains the data needed to make accurate copy number calls for pharmacogenomics. This file is used in the creation of CNV VCFs, which are inputs to the star allele calling command. Illumina provides a standard CN model file for all commercial PGx Infinium BeadChips. If the cluster file needs to be customized, the CN model file should also be updated using the copy-number train command, which is available with DRAGEN Array Local only. Review the Command Index for details of this command.

To retrain the CN model file, at least 96 samples must be used, with 90 of those samples passing QC, defined as Log R Dev less than or equal to 0.2. Training with at least 150 samples is recommended. A greater number of samples can be advantageous, but diminishing returns and longer computation times are seen beyond 3,000 samples.

It is recommended to manually QC the training samples and remove samples that have Log R Dev > 0.2, call rate < 0.99, or TGA Control probe < 1.0, so that only the highest-quality samples are used in training. The same samples used to create the new cluster file should be used to retrain the CN model. To minimize batch effects in the training sample set, the samples should be analyzed in as few batches as possible and should come from the same reagent lots.
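The manual QC described above can be sketched in Python as a filter over per-sample metrics. The SampleQC record and its field names are hypothetical, and the metrics are assumed to have been collected beforehand. As a conservative reading of the requirements, the sketch compares the number of samples remaining after filtering with the 96-sample minimum and the 150-sample recommendation.

```python
from dataclasses import dataclass

MIN_TRAINING_SAMPLES = 96
RECOMMENDED_TRAINING_SAMPLES = 150


@dataclass
class SampleQC:
    sample_id: str
    log_r_dev: float
    call_rate: float
    tga_control: float


def select_training_samples(samples):
    """Keep samples that are not excluded by any of the three QC criteria."""
    return [
        s for s in samples
        if s.log_r_dev <= 0.2 and s.call_rate >= 0.99 and s.tga_control >= 1.0
    ]


def training_set_status(samples):
    """Summarize whether the filtered set is large enough for training."""
    kept = select_training_samples(samples)
    return {
        "kept": len(kept),
        "meets_minimum": len(kept) >= MIN_TRAINING_SAMPLES,
        "meets_recommended": len(kept) >= RECOMMENDED_TRAINING_SAMPLES,
    }
```

The thresholds are the ones quoted in the text. A sample is dropped as soon as any one of them is violated.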

The copy-number train algorithm is designed with the assumption that the copy number distribution resembles the standard population distributions. This ensures the updated CN model file is representative of the normal populations in which it will be used to calculate copy number for key pharmacogenomic targets.

Pharmacogenomic analysis for semi-custom arrays

Semi-custom arrays add additional content or other pre-designed Infinium booster content to enhance the commercial array content. This additional content can be analyzed for genotyping applications to obtain information on SNV and indel calls.

For pharmacogenomic applications, PGx CNV and star allele calls are limited to content included on the commercial Infinium PGx arrays. Additional semi-custom content will not be included in the pharmacogenomic results.

When designing a semi-custom array using a commercial Infinium PGx array backbone, such as the Global Diversity Array with enhanced PGx, it is important to retain all backbone content in the design, as removing content could decrease the quality of results.

Pharmacogenomic analysis for semi-custom arrays should be run using DRAGEN Array Local. The genotype call, copy-number call, and star-allele call commands should all be run using the commercial Infinium PGx array product files.

Verifying the installation

To check that the DRAGEN Array installation was successful, follow these steps:

  • Open a command prompt (Windows) or terminal (Linux).

  • [Optional] Add /path/to/dragena/, e.g. /usr/local/bin/dragena-linux-x64-DAv1.0.0/dragena/, to your PATH so that the executable can be run from any directory.

  • Execute the following command: /path/to/dragena/dragena version. If the PATH environment variable is set, use: dragena version
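The installation check above can also be scripted. The Python sketch below looks for the executable on the PATH and, if it is found, runs the version command and returns whatever is printed. The function name is hypothetical, and no assumption is made about the format of the version output.

```python
import shutil
import subprocess


def dragena_version(executable="dragena"):
    """Return the output of '<executable> version', or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


version = dragena_version()
print(version if version is not None else "dragena was not found on the PATH")
```

A return value of None means that the folder containing the executable still needs to be added to the PATH, or that the full path to the executable must be used.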

Use the CN Model and PGx Database File provided as part of the standard product files.

Quick Start (continued from step 2)

  3. Use the genotype gtc-to-vcf command to create SNV VCF files from the GTC files generated by the genotype call command.
     dragena genotype gtc-to-vcf --bpm-manifest /user/productfiles/manifest.bpm --csv-manifest /user/productfiles/manifest.csv --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --output-folder /user/vcf

  4. Use the copy-number call command to call PGx CNVs from the GTC files and produce CNV VCF files. It is recommended to use the same output folder as for the SNV VCF files, because the star-allele call command accepts one VCF folder containing both SNV and CNV VCFs.
     dragena copy-number call --cn-model /user/productfiles/cnv_model.dat --gtc-folder /user/gtc --output-folder /user/vcf

  5. Use the star-allele call command to generate star allele calls from the SNV and CNV VCF files generated by the gtc-to-vcf and copy-number call commands.
     dragena star-allele call --vcf-folder /user/vcf --database /user/productfiles/GDA_ePGx_E2_DAv1.0.0.zip --output-folder /user/star-alleles --license-server-url https://username:[email protected]

  6. Use the star-allele annotate command to summarize the star alleles generated by the star-allele call command and add metabolizer statuses to them. The guidelines to use (CPIC or DPWG) can be specified.
     dragena star-allele annotate --star-alleles star_alleles.csv --guidelines CPIC --output-folder /user/metabolizer-statuses

  7. [Optional] Use the copy-number train command to retrain the copy number model.
     dragena copy-number train --bpm-manifest /user/productfiles/manifest.bpm --genome-fasta-file /user/productfiles/genome.fa --gtc-folder /user/gtc --platform LCG --output-folder /user/productfiles/cnmodelnew
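For automation, the Quick Start commands can be assembled programmatically. The Python sketch below only builds the argument lists and does not run anything. The function name, its parameters, and the folder layout under work_dir are assumptions for illustration. The option names and example file names are the ones used in the commands above, and the license URL and star alleles file path are supplied by the caller.

```python
from pathlib import PurePosixPath


def build_pgx_commands(product_dir, idat_dir, work_dir, license_url,
                       star_alleles_csv, exe="dragena"):
    """Return the Quick Start commands as argument lists, in run order."""
    product = PurePosixPath(product_dir)
    work = PurePosixPath(work_dir)
    gtc = str(work / "gtc")
    vcf = str(work / "vcf")  # one folder shared by SNV and CNV VCFs
    return [
        [exe, "genotype", "call",
         "--bpm-manifest", str(product / "manifest.bpm"),
         "--cluster-file", str(product / "clusterfile.egt"),
         "--idat-folder", str(idat_dir),
         "--output-folder", gtc],
        [exe, "genotype", "gtc-to-vcf",
         "--bpm-manifest", str(product / "manifest.bpm"),
         "--csv-manifest", str(product / "manifest.csv"),
         "--genome-fasta-file", str(product / "genome.fa"),
         "--gtc-folder", gtc,
         "--output-folder", vcf],
        [exe, "copy-number", "call",
         "--cn-model", str(product / "cnv_model.dat"),
         "--gtc-folder", gtc,
         "--output-folder", vcf],
        [exe, "star-allele", "call",
         "--vcf-folder", vcf,
         "--database", str(product / "GDA_ePGx_E2_DAv1.0.0.zip"),
         "--output-folder", str(work / "star-alleles"),
         "--license-server-url", license_url],
        [exe, "star-allele", "annotate",
         "--star-alleles", str(star_alleles_csv),
         "--guidelines", "CPIC",
         "--output-folder", str(work / "metabolizer-statuses")],
    ]
```

Each list can be passed to subprocess.run(cmd, check=True) in order, so that a failing step stops the pipeline before later steps consume quota.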

Additional command-line tips

  • When entering paths or long names, copy and paste the values to help avoid errors.

  • If using Windows, use a File Explorer window to navigate to the product file or folder that is needed by the DRAGEN Array Local command. While holding down the Shift key, right-click the file and select the 'Copy as Path' option. Then paste the copied path into the command prompt to use the file or folder.

  • To cancel a command while it is running, press Control + C on the keyboard.

Computing Requirements table

Category | Recommendation
CPU | 8 cores
Memory | 16 GB or more
Hard Drive | 30 GB or more of free disk space
Operating System | One of the following: Windows 10 or later (win10-x64); CentOS 7 or later, Ubuntu 20.04 or later (linux-x64)

copy-number commands

Command | Description
copy-number call | Determines copy number variants given genotypes (GTC to CNV VCF).
copy-number help | Displays help information for a copy-number command.
copy-number train | Trains copy number model for a set of samples (GTC to CN Model File).
copy-number version | Displays version information for copy-number.

copy-number call options

Option | Description
--cn-model | [Required] Specifies the path to the copy number model parameters file (.dat).
--gtc-folder | [Required] Specifies the path to the directory where all genotype files (.gtc) are located. Cannot be used with --gtc-sample-sheet. This path also includes the contents of all subdirectories.
--gtc-sample-sheet | [Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). The sample sheet can be in CSV or JSON format. Cannot be used with --gtc-folder.
--debug | Includes stack traces in logs. Default is false.
--help | Displays help information for the copy-number call command.
--json-log | Outputs logs in JSON format. Default is false.
--no-bgzip | VCFs are not bgzip compressed (.gz) and no tabix index files (.tbi) are output. Default is false.
--output-folder | [Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if the GTC folder is provided.
--version | Displays version information.

copy-number train options

Note: For best performance, validate the CN model using truth data before using it in CN calling.

Option | Description
--bpm-manifest | [Required] Specifies the path to the bead pool manifest in BPM format. Assumes mask file (.msk) is in the same directory.
--genome-fasta-file | [Required] Specifies the path to the genome FASTA file (.fa). Assumes FASTA index file (.fai) is in the same directory.
--gtc-folder | [Required] Specifies the path to the directory where all genotype files (.gtc) are located. Cannot be used with --gtc-sample-sheet. This path also includes the contents of all subdirectories.
--gtc-sample-sheet | [Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.
--platform | [Required] Specifies which microarray platform generated the data. Set to 'LCG' for Global Diversity Array with enhanced PGx.
--debug | Includes stack traces in logs. Default is false.
--disable-genome-cache | Disables the reference genome cache.
--help | Displays help information for the copy-number train command.
--json-log | Outputs logs in JSON format. Default is false.
--version | Displays version information.
--output-folder | [Optional] The location to output the CN model. By default, the output folder is the current working directory.

genotype commands

Command | Description
genotype call | Determines genotype calls (GTC) from IDAT files.
genotype gtc-to-bedgraph | Converts GTC to BedGraphs, producing BedGraph formatted visualization files from the log R ratio data contained in the GTC intermediate files.
genotype gtc-to-vcf | Converts GTC to VCF.
genotype help | Displays the help information for the genotype command.
genotype version | Displays version information for the genotype command.

genotype call options

Option | Description
--bpm-manifest | [Required] Specifies the path to the bead pool manifest in BPM format.
--cluster-file | [Required] Specifies the path to the EGT cluster file to use.
--idat-folder | [Required] Specifies the path to the directory where all intensity data IDATs (for the samples to be processed) are located. Must be in IDAT format. Cannot be used with --idat-sample-sheet. This path also includes the contents of all subdirectories.
--idat-sample-sheet | [Required] Specifies the path to a sample sheet containing paths to intensity data IDATs. Can be in CSV or JSON format. Cannot be used with --idat-folder.
--debug | Includes stack traces in logs. Default is false.
--gencall-cutoff | GenCall score cutoff to label a NoCall. Default is 0.15.
--help | Displays help information for the genotype call command.
--json-log | Outputs logs in JSON format. Default is false.
--num-threads | Number of parallel threads to run.
--output-folder | [Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the IDAT folder, if the IDAT folder is provided.
--version | Displays version information.

genotype gtc-to-bedgraph options

Option | Description
--bpm-manifest | [Required] Specifies the path to the bead pool manifest in BPM format.
--gtc-folder | [Required] Specifies the path to the directory where all genotype (.gtc) files are located. Cannot be used with --gtc-sample-sheet. This path also includes the contents of all subdirectories.
--gtc-sample-sheet | [Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.
--debug | Includes stack traces in logs. Default is false.
--help | Displays help information for the genotype gtc-to-bedgraph command.
--json-log | Outputs logs in JSON format. Default is false.
--output-folder | [Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if the GTC folder is provided.
--version | Displays version information.

genotype gtc-to-vcf options

Option | Description
--bpm-manifest | [Required] Specifies the path to the bead pool manifest in BPM format.
--csv-manifest | [Required] Specifies the path to the CSV manifest with SourceSeq column.
--genome-fasta-file | [Required] Specifies the path to the genome FASTA file (.fa). Assumes FASTA index file (.fai) is in the same directory.
--gtc-folder | [Required] Specifies the path to the directory where all genotype files (.gtc) are located. Cannot be used with --gtc-sample-sheet. This path also includes the contents of all subdirectories.
--gtc-sample-sheet | [Required] Specifies the path to a sample sheet containing paths to genotype files (.gtc). Can be in CSV or JSON format. Cannot be used with --gtc-folder.
--auxiliary-loci | Specifies the path to the VCF file with auxiliary definitions of loci, such as for multi-nucleotide variants.
--debug | Includes stack traces in logs. Default is false.
--disable-genome-cache | Disables the reference genome cache.
--filter-loci | Generates a text file containing a list of probe names to be filtered.
--unsquash-duplicates | Generates unique VCF records for duplicate assays. Default is false.
--help | Displays help information for the genotype gtc-to-vcf command.
--json-log | Outputs logs in JSON format. Default is false.
--no-bgzip | VCFs are not bgzip compressed (.gz) and no tabix index files (.tbi) are output. Default is false.
--output-folder | [Optional] Specifies the path to the folder where the output files are saved. The output directory structure matches the directory structure of the GTC folder, if the GTC folder is provided.
--version | Displays version information.

star-allele commands

Command | Description
star-allele call | Determines PGx star allele and variant genotypes.
star-allele annotate | Annotates PGx gene functions and produces a JSON report.
star-allele help | Displays help information for a star-allele command.
star-allele version | Displays version information for star-allele.

star-allele call options

Option | Description
--database | [Required] The PGx database file (.zip).
--license-server-url | [Required] The license server URL with credentials.
--vcf-folder | [Required] The directory containing *.snv.vcf.gz and *.cnv.vcf.gz files.
--debug | Includes stack traces in logs. Default is false.
--help | Displays help information for the star-allele call command.
--json-log | Outputs logs in JSON format. Default is false.
--output-folder | [Optional] Directory path to output files. Default is the current working directory.
--version | Displays version information.

star-allele annotate options

Option | Description
--star-alleles | [Required] Path to star alleles file (.csv) generated by the call subcommand.
--guidelines | PGx guidelines to use for annotation. Valid values are 'CPIC' and 'DPWG'. Default is 'CPIC'.
--debug | Includes stack traces in logs. Default is false.
--help | Displays help information for the star-allele annotate command.
--json-log | Outputs logs in JSON format. Default is false.
--output-folder | [Optional] Directory path to output files. Default is the current working directory.
--version | Displays version information.
