# Input Files

The following section describes the input files required by DRAGEN Array.\
Product files (anything other than the IDATs) can be found on the [support site](https://support.illumina.com/array/array_software/dragen-array-secondary-analysis/downloads.html).

## IDAT Files <a href="#section-idat" id="section-idat"></a>

For each sample, a pair of raw intensity files (.idat) is generated by the iScan System or, for select arrays, the NextSeq 550 System. The two files provide the red- and green-channel intensities for each probe on the Infinium array. For information on which arrays can be used with the NextSeq 550, see the [Illumina Knowledge page on NextSeq550](https://knowledge.illumina.com/microarray/general/microarray-general-faq-list/000003871).

An IDAT file is identified by the BeadChip Barcode (the 12-digit unique Sentrix ID, e.g., 123456789101), the BeadChip Position (the row and column of the sample, e.g., R01C01), and the channel, either Grn (green) or Red.
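As an illustration of this naming scheme, the sketch below splits an IDAT file name into its three parts and builds the expected pair of file names for one sample. It assumes files are named `<barcode>_<position>_<Grn|Red>.idat` with the 12-digit barcode described above; the helper names `parse_idat_name` and `idat_pair` are hypothetical and not part of DRAGEN Array.

```python
import re
from typing import NamedTuple

# Hypothetical helpers. Assumed pattern: <12-digit barcode>_<RnnCnn>_<Grn|Red>.idat
IDAT_NAME = re.compile(r"^(\d{12})_(R\d{2}C\d{2})_(Grn|Red)\.idat$")


class IdatKey(NamedTuple):
    barcode: str   # BeadChip Barcode (Sentrix ID)
    position: str  # BeadChip Position, e.g. R01C01
    channel: str   # "Grn" or "Red"


def parse_idat_name(filename: str) -> IdatKey:
    """Split an IDAT file name into barcode, position, and channel."""
    match = IDAT_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized IDAT file name: {filename}")
    return IdatKey(*match.groups())


def idat_pair(barcode: str, position: str) -> tuple:
    """Return the expected (green, red) IDAT file names for one sample."""
    stem = f"{barcode}_{position}"
    return (f"{stem}_Grn.idat", f"{stem}_Red.idat")
```

For example, `parse_idat_name("204753010023_R01C01_Grn.idat")` returns `IdatKey(barcode='204753010023', position='R01C01', channel='Grn')`.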

## Manifest Files <a href="#manifest_files" id="manifest_files"></a>

The CSV and BPM manifest files can be found on the Illumina Support Site for all commercial Infinium BeadChips or on [MyIllumina](http://my.illumina.com/) for custom and semi-custom designs. DRAGEN Array only supports manifest files from the Illumina Support site. For instructions on obtaining manifest files from MyIllumina, see Illumina Knowledge article, [How to access custom array product files (manifest and product definition files) in MyIllumina](https://knowledge.illumina.com/microarray/general/microarray-general-reference_material-list/000001531).

The CSV manifest file (.csv) provides data complementary to the BPM manifest file in a human-readable format. It is a required input to the `genotype gtc-to-vcf` command, where it enables VCF generation for insertion/deletion (indel) variants. `gtc-to-vcf` relies on accurate mapping information in the manifest and may produce inaccurate results if that information is incorrect. Mapping information follows the dbSNP convention:

* Positions are reported with 1-based indexing.
* Positions in the pseudoautosomal regions (PAR) are reported as positions on the X chromosome.
* For an insertion relative to the reference, the position of the base immediately 5' of the insertion (on the plus strand) is given.
* For a deletion relative to the reference, the position of the most 5' deleted base (on the plus strand) is given.
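To see how these rules relate to VCF, which represents an indel with an anchor base immediately before it, the following minimal sketch converts a manifest-style indel position to VCF `POS`, `REF`, and `ALT` values. It illustrates the convention under stated assumptions and is not the `gtc-to-vcf` implementation: the function name and arguments are hypothetical, the reference is a plain Python string for one chromosome, and indels in repetitive sequence are not left-normalized.

```python
# Hypothetical illustration of the dbSNP-style rules above; not DRAGEN Array code.
def indel_to_vcf(ref_seq, pos, kind, bases):
    """Convert a manifest indel (dbSNP-style position) to VCF (POS, REF, ALT).

    ref_seq: plus-strand sequence of one chromosome; ref_seq[0] is position 1.
    pos:     1-based position reported in the manifest.
    kind:    "ins" or "del", relative to the reference.
    bases:   inserted or deleted bases on the plus strand.
    """
    if kind == "ins":
        # pos is the base immediately 5' of the insertion; VCF anchors on that base.
        anchor = ref_seq[pos - 1]
        return pos, anchor, anchor + bases
    if kind == "del":
        if pos < 2:
            raise ValueError("This sketch needs an anchor base before the deletion")
        # pos is the most 5' deleted base; VCF anchors on the base before it.
        deleted = ref_seq[pos - 1 : pos - 1 + len(bases)]
        if deleted != bases:
            # A wrong manifest position shows up as a reference mismatch.
            raise ValueError(f"Reference has {deleted!r} at {pos}, not {bases!r}")
        anchor = ref_seq[pos - 2]
        return pos - 1, anchor + deleted, anchor
    raise ValueError(f"Unknown indel kind: {kind!r}")
```

With `ref = "ACGTACGT"`, an insertion of `TT` reported at position 2 becomes `(2, "C", "CTT")`, and a deletion of `GT` reported at position 3 becomes `(2, "CGT", "C")`. The reference check on deletions shows why accurate manifest mapping matters: a wrong position yields alleles that do not match the genome.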

## Cluster File <a href="#section-cluster-file" id="section-cluster-file"></a>

The cluster file (.egt) is a standard product file provided by Illumina for commercial genotyping products and is a required input to the `genotype call` command in DRAGEN Array. Custom cluster files may be needed for optimal genotyping performance. See section [Optimizing cluster files and copy number models](/product-guides/dragen-array-local-analysis.md#optimizing_cluster_files) for additional details.

## PGx CN Model File <a href="#cn_model_file" id="cn_model_file"></a>

The PGx CN (Copy Number) model file (.dat) is a required input to the `pgx copy-number call` command to enable accurate copy number calling for pharmacogenomics. Illumina provides a standard CN model file for each PGx array product. CN model files are named based on the manifest file revision (e.g., a CN model file trained from manifest revision B1 is paired with that B1 manifest) and must be used with their paired manifest file of the same revision. See section [Optimizing cluster files and copy number models](/product-guides/dragen-array-local-analysis.md#optimizing_cluster_files) for additional details.

## Cytogenetics Model File <a href="#cyto_model_file" id="cyto_model_file"></a>

The cytogenetics model file (.dat) is a required input to the `cyto call` command. Illumina provides a standard cyto model file for each supported array product. Cyto model files are named based on the manifest file revision and must be used with their paired manifest file of the same revision. For custom or other products, contact Illumina Technical Support to request a cyto model file, and include the product BPM manifest, EGT cluster file, and IDAT or GTC files for at least one sample. The cytogenetics model file primarily contains probe GC content information and is tied to the probe sequences in the manifest. As long as the manifest and DRAGEN Array version remain unchanged, retraining is generally not required. Unlike the PGx CN model file, the cytogenetics model file does not require retraining when a different cluster file is used.

**Note:** Both the PGx CN model file and the cytogenetics model file must be updated when the manifest is revised, because probes can be added or removed between manifest revisions. A mismatch between a model file and the manifest causes an error during `pgx copy-number call` or `cyto call`.
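Because probes can be added or removed between manifest revisions, one way to picture a model/manifest mismatch is as a difference between two probe sets. The sketch below compares them. How DRAGEN Array actually detects the mismatch is not described here; the function name and its inputs (plain lists of probe IDs) are hypothetical.

```python
# Hypothetical helper; DRAGEN Array's own mismatch check is not documented here.
def probe_set_mismatch(model_probes, manifest_probes):
    """Compare the probes a model was built for with the probes in a manifest.

    Returns (only_in_model, only_in_manifest); two empty lists mean the
    probe sets match.
    """
    model, manifest = set(model_probes), set(manifest_probes)
    return sorted(model - manifest), sorted(manifest - model)


# Hypothetical probe IDs: one probe removed and one added in a new revision.
removed, added = probe_set_mismatch(["rs1", "rs2", "rs3"], ["rs2", "rs3", "rs4"])
if removed or added:
    print(f"Model does not match manifest: {len(removed)} removed, {len(added)} added")
```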

## Mask File <a href="#mask_file" id="mask_file"></a>

The mask file (.msk) is a required input to the `pgx copy-number train` command and enables accurate PGx copy number training for pharmacogenomics. It is not passed as an explicit command-line argument; instead, it should reside in the same folder as the BPM manifest and have the same base name as the product's manifest. Illumina provides a mask file for each PGx array product; these can be found on the [product files support page](https://support.illumina.com/array/array_software/dragen-array-secondary-analysis/downloads.html).
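Since the mask file is located implicitly, a quick pre-flight check can confirm it sits where the rule above expects it: in the BPM manifest's folder, with the manifest's base name and a `.msk` extension. This is a minimal sketch; the helper names and the example path are hypothetical.

```python
from pathlib import Path


# Hypothetical helpers; not part of DRAGEN Array.
def expected_mask_path(bpm_manifest):
    """The mask file shares the BPM manifest's folder and base name, with a .msk extension."""
    return Path(bpm_manifest).with_suffix(".msk")


def check_mask_file(bpm_manifest):
    """Raise if the mask file is not next to the BPM manifest."""
    mask = expected_mask_path(bpm_manifest)
    if not mask.is_file():
        raise FileNotFoundError(f"Mask file not found: {mask}")
    return mask
```

For example, `expected_mask_path("/data/products/MyPGx_A1.bpm")` gives `/data/products/MyPGx_A1.msk`. `with_suffix` replaces only the final extension, so dots elsewhere in the manifest name are preserved.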

## PGx Database File <a href="#section-pgx-database-file" id="section-pgx-database-file"></a>

The PGx database file (.zip) contains the mapping from Infinium PGx array probes to PGx variants. Each line in this file maps a single probe ID to a variant's HGVS (Human Genome Variation Society) tag, so the mapping is many-to-one: several probes can map to the same variant. At runtime, DRAGEN Array cross-references this map with the SNV VCF IDs to perform star allele calling. A single database file works across all supported PGx products, even though their probes and variant coverage differ.
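The many-to-one probe-to-variant mapping can be sketched as a dictionary keyed by probe ID. The example below assumes the database lines have already been read as `(probe_id, hgvs)` pairs (the layout of the files inside the .zip is not described here) and that SNV VCF record IDs are probe IDs. The function names and all identifiers in the example are hypothetical.

```python
from collections import defaultdict


# Hypothetical helpers illustrating the mapping; not DRAGEN Array code.
def build_probe_map(rows):
    """Build probe ID -> HGVS tag from (probe_id, hgvs) pairs, one per database line."""
    probe_to_variant = {}
    for probe_id, hgvs in rows:
        previous = probe_to_variant.setdefault(probe_id, hgvs)
        if previous != hgvs:
            # Each probe maps to one variant; several probes may share a variant.
            raise ValueError(f"Probe {probe_id} maps to both {previous} and {hgvs}")
    return probe_to_variant


def group_by_variant(vcf_ids, probe_to_variant):
    """Group SNV VCF record IDs by the PGx variant (HGVS tag) they map to."""
    by_variant = defaultdict(list)
    for vcf_id in vcf_ids:
        hgvs = probe_to_variant.get(vcf_id)
        if hgvs is not None:  # probes absent from the database are skipped
            by_variant[hgvs].append(vcf_id)
    return dict(by_variant)


# Hypothetical probe IDs and HGVS tags.
probe_map = build_probe_map([
    ("probe_a", "NC_000001.11:g.100A>G"),
    ("probe_b", "NC_000001.11:g.100A>G"),
    ("probe_c", "NC_000001.11:g.250del"),
])
print(group_by_variant(["probe_a", "probe_b", "probe_x"], probe_map))
```

Skipping IDs with no database entry is one way a single database can serve products whose probe content differs.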

## Cytogenetics Database File <a href="#cyto_db_file" id="cyto_db_file"></a>

The cytogenetics database file (.zip) contains information from the Ensembl and RefSeq data sources used to generate the Cytogenetics Annotation JSON File. The same file can be used across products (BeadChip/manifest types and versions). It is only needed as an input for local analysis (i.e., `cyto annotate`), because it is already stored in the cloud for cloud analysis. It may be updated in the future to accommodate changes in the underlying Ensembl and RefSeq data sources.

## Genome FASTA Files <a href="#section-genome-fasta-files" id="section-genome-fasta-files"></a>

The genome FASTA file (.fa) is a text file containing the reference genome sequences. The FASTA index file (.fai) records the name, length, and byte offset of each sequence (contig) in the FASTA file, which lets tools retrieve a region without reading the whole file. DRAGEN Array PGx calling supports human genome builds 37 and 38. Illumina provides both the genome FASTA file and the FASTA index file for human, and the two files should be stored together in the same input folder.\
For custom reference genomes, the contig identifiers in the genome FASTA file must exactly match the chromosome identifiers in the manifest. For a standard human product manifest, this means the contig headers should read ">1" rather than ">chr1".

**Note:** The genome FASTA file is only required for the dragen-array-local-analysis workflow. If you are using dragen-array-cloud-analysis, you do not need to provide this file.
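A quick way to catch a `>chr1` versus `>1` mismatch before a run is to compare the FASTA contig identifiers with the chromosome names used in the manifest. The sketch below assumes you already have the manifest's chromosome names as a collection of strings; the function names are hypothetical. For a large genome, reading the first column of the `.fai` file gives the same contig names without scanning the sequence.

```python
# Hypothetical pre-flight check; not part of DRAGEN Array.
def fasta_contig_ids(lines):
    """Yield each FASTA header's identifier: text after '>' up to the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def missing_contigs(fasta_ids, manifest_chroms):
    """Return manifest chromosome names with no exactly matching FASTA contig."""
    return sorted(set(manifest_chroms) - set(fasta_ids))


# Example: a FASTA using "chr" prefixes against a manifest that uses bare numbers.
fasta_lines = [">chr1 assembled chromosome", "ACGT", ">chr2", "GGCC"]
print(missing_contigs(fasta_contig_ids(fasta_lines), ["1", "2"]))  # ['1', '2']
```

An empty result means every manifest chromosome has a matching contig.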

## Sample Sheet <a href="#section-sample-sheet" id="section-sample-sheet"></a>

The sample sheet is a CSV-formatted input file that uses two required fields for sample lookup (`SentrixBarcode_A` and `SentrixPosition_A` for local analysis, `beadChipName` and `sampleSectionName` for cloud analysis). Beyond sample lookup, it lets you add optional metadata and analyze a filtered list of the samples within a folder. It is intended to be flexible, and the local version should be backward compatible with most GenomeStudio samplesheets.

The root folder in which DRAGEN Array searches for the sample files can be set either with the `--idat-folder` or `--gtc-folder` option (where applicable) or with the `RootFolder` field in the `[Header]` section. `RootFolder` should be the full absolute path to the sample files, e.g.,

```
[Header]
RootFolder,/test/samples
[Data]
....
```

**Note:** In the case of conflict between `RootFolder` and the CLI options (`--idat-folder` or `--gtc-folder`), the CLI options take precedence.
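The section handling and the precedence rule above can be sketched as follows. This is a simplified, hypothetical parser, not DRAGEN Array's: it treats a sheet without section markers as a bare `[Data]` table, reads `[Header]` rows as key/value pairs, ignores other sections such as `[Manifests]`, and lets a CLI folder override `RootFolder`.

```python
import csv
import io


# Hypothetical sample sheet parser; not DRAGEN Array's implementation.
def parse_sample_sheet(text):
    """Return (header, rows) from a sample sheet with or without sections."""
    lines = text.splitlines()
    has_sections = any(line.strip().startswith("[") for line in lines)
    section = None if has_sections else "data"  # no markers: whole file is [Data]
    header, data_lines = {}, []
    for line in lines:
        marker = line.strip().rstrip(",")  # Excel may pad rows with trailing commas
        if marker.startswith("[") and marker.endswith("]"):
            section = marker[1:-1].lower()
        elif not marker:
            continue
        elif section == "header":
            key, _, value = marker.partition(",")
            header[key.strip()] = value.strip()
        elif section == "data":
            data_lines.append(line)
        # Rows in any other section (e.g. [Manifests]) are ignored.
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return header, rows


def resolve_root_folder(cli_folder, header):
    """--idat-folder / --gtc-folder take precedence over RootFolder."""
    return cli_folder if cli_folder else header.get("RootFolder")
```

With the high-complexity example below, `parse_sample_sheet` returns `RootFolder` as `/tests/samples`, while `resolve_root_folder("/data/idats", header)` returns the hypothetical CLI path `/data/idats` because the CLI option wins.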

The following are examples of valid samplesheets.

* Most basic (no sections, one sample)

```
SentrixBarcode_A,SentrixPosition_A
204753010023,R02C01
```

* Medium complexity (no sections, multiple samples, optional data)

```
SentrixBarcode_A,SentrixPosition_A,Sample_ID,Sample_Group,MetaData1
204753010023,R01C01,NA1231,Group1,F
204753010024,R01C01,NA1233,Group2,M
```

* High complexity (sections, multiple samples, optional data)

```
[Header]
RootFolder,/tests/samples
Date,1/1/2025
[Data]
SentrixBarcode_A,SentrixPosition_A,Sample_ID,Sample_Group,MetaData1
204753010023,R01C01,NA1231,Group1,F
204753010024,R01C01,NA1233,Group2,M
```

**Notes:**

* Column names are case-insensitive. For example, the columns `Sample_Name` and `sample_name` would be considered the same, and the software would produce an error like this: `Duplicate column sample_name found. Column names are case-insensitive. Please remove or rename the column from the samplesheet and re-process.`
* Because user-provided fields are output in the [Genotype Summary File](/product-guides/output-files.md#genotype_summary_files), column names cannot conflict with those fields. For example, if the user provides a column named `Sex Estimate` in their samplesheet, DRAGEN Array will produce the following error: `Sex Estimate is a reserved keyword. Please remove or rename the column from the samplesheet and re-process.`
* The optional fields (i.e., all columns other than `SentrixBarcode_A` and `SentrixPosition_A`) are output as-is in the [genotype summary files](/product-guides/output-files.md#genotype_summary_files) for the `genotype call` command.
* The `[Manifests]` section (used by GenomeStudio to delineate manifests in multi-manifest analyses) is currently ignored in DRAGEN Array.
* The samplesheet is validated to ensure that column names and field values do not exceed 500 characters.
* When editing samplesheets in Excel, format `SentrixBarcode_A` as a number with 0 decimal places. If stored as text with a decimal or in scientific notation, the software cannot determine the IDAT file names and the run will fail.
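The rules in these notes can be approximated with a small pre-flight validator, sketched below under several assumptions: column names and rows come from a parser such as `csv.DictReader`, the reserved-keyword set contains only `Sex Estimate` because that is the only reserved name given here, and the error strings are illustrative rather than DRAGEN Array's exact messages. The function name is hypothetical.

```python
import re

# Hypothetical validator; DRAGEN Array performs its own checks.
MAX_LEN = 500
REQUIRED = {"sentrixbarcode_a", "sentrixposition_a"}
# Partial list: "Sex Estimate" is the only reserved name shown in this guide.
RESERVED = {"sex estimate"}
DIGITS_ONLY = re.compile(r"\d+")


def validate_sample_sheet(columns, rows):
    """Return error messages for the sample sheet rules described above."""
    errors, seen = [], set()
    for col in columns:
        key = col.lower()  # column names are case-insensitive
        if key in seen:
            errors.append(f"Duplicate column {key} found. Column names are case-insensitive.")
        seen.add(key)
        if key in RESERVED:
            errors.append(f"{col} is a reserved keyword.")
        if len(col) > MAX_LEN:
            errors.append(f"Column name exceeds {MAX_LEN} characters: {col[:30]}...")
    for missing in sorted(REQUIRED - seen):
        errors.append(f"Missing required column {missing}.")

    barcode_col = next((c for c in columns if c.lower() == "sentrixbarcode_a"), None)
    for i, row in enumerate(rows, start=1):
        for col, value in row.items():
            if isinstance(value, str) and len(value) > MAX_LEN:
                errors.append(f"Row {i}, column {col}: value exceeds {MAX_LEN} characters.")
        barcode = row.get(barcode_col) or ""
        # Excel can rewrite barcodes as "2.04753E+11" or "204753010023.0".
        if barcode_col and not DIGITS_ONLY.fullmatch(barcode):
            errors.append(f"Row {i}: SentrixBarcode_A {barcode!r} is not a whole number.")
    return errors
```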

### Sample Name Determination

The sample name used in downstream outputs is determined by the following precedence rules:

1. **Samplesheet:** If a `Sample_Name` column is present, its value for each row (sample) is used. **Note:** If that column is present, it *must* be non-empty for every row, and it cannot conflict with the default Sample ID, `SentrixBarcode_Position` (e.g., `204753010023_R01C01`).
2. **IDAT or GTC metadata:** If no samplesheet value is available, the sample name embedded in the IDAT or GTC file metadata is used.
3. **SentrixBarcode\_Position fallback:** If neither (1) nor (2) provides a sample name, the sample name falls back to the default Sample ID, `SentrixBarcode_Position` (e.g., `204753010023_R01C01`).

When both the samplesheet and the file metadata define a sample name, the samplesheet value takes precedence. **Note:** The samplesheet value does not modify the sample name stored in the GTC file; it only affects downstream outputs such as the [Genotype Sample Summary](/product-guides/output-files.md#genotype_summary_files) and VCFs. If downstream commands are run with different samplesheets that specify different sample names, their outputs will reflect those different names. In short, the samplesheet passed to a given command is the source of truth for the sample name in that command's outputs.
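The precedence rules can be summarized in a few lines of code. The sketch below is hypothetical: the function name and arguments are illustrative, the samplesheet row is a dict of column name to value, and it does not model the rule that `Sample_Name` must not conflict with the default Sample ID.

```python
# Hypothetical sketch of the sample name precedence; not DRAGEN Array code.
def resolve_sample_name(barcode, position, sheet_row=None, file_sample_name=None):
    """Pick the sample name: samplesheet, then IDAT/GTC metadata, then barcode_position."""
    default_id = f"{barcode}_{position}"  # default Sample ID, e.g. 204753010023_R01C01
    if sheet_row is not None:
        for column, value in sheet_row.items():
            if column.lower() == "sample_name":  # column names are case-insensitive
                if not value:
                    raise ValueError(f"Sample_Name is empty for {default_id}")
                return value
    if file_sample_name:
        return file_sample_name
    return default_id
```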

### Methylation QC sample sheet

For DRAGEN Array Methylation QC on cloud, the samplesheet does not currently support sections such as `[Header]` and `[Data]`. Instead of the `SentrixBarcode_A` and `SentrixPosition_A` columns, it uses `beadChipName` and `sampleSectionName` as the sample keys. In addition, the optional sample sheet fields are used in the analysis.

Following Sample\_Group, any number of additional columns can be added to include metadata fields such as sex, sample type, plate and well information, etc. Columns added after the Sample\_Group column may use user-defined column headers. The Sample\_ID field and any additional metadata are replicated in the Sample QC Summary output files.

The Sample\_Group field will be used to populate the PCA Control Plot within the Sample QC Summary Plots file and the Principal Component Summary file. For the PCA Control Plot, each sample group will be assigned a unique color. Samples assigned to the same Sample\_Group value will be the same color in the PCA Control Plot. e.g.,

```
beadChipName,sampleSectionName,Sample_ID,Sample_Group,MetaData1
204753010023,R01C01,NA1231,Group1,F
204753010023,R02C01,NA1232,Group2,F
204753010024,R01C01,NA1233,Group2,M
204753010024,R02C01,NA1234,Group1,M
```
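To illustrate the one-color-per-group rule, the sketch below assigns each distinct `Sample_Group` a color in order of first appearance, spacing hues evenly around the color wheel. The palette and the function name are assumptions for illustration only; the colors DRAGEN Array uses in the PCA Control Plot are not specified here.

```python
import colorsys


# Hypothetical color assignment; DRAGEN Array's actual palette is not documented here.
def group_colors(sample_groups):
    """Assign each distinct Sample_Group one hex color, in order of first appearance."""
    groups = list(dict.fromkeys(sample_groups))  # unique groups, order preserved
    colors = {}
    for i, group in enumerate(groups):
        # Evenly spaced hues give each group a distinct hue.
        r, g, b = colorsys.hsv_to_rgb(i / len(groups), 0.75, 0.85)
        colors[group] = "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
    return colors
```

Applied to the `Sample_Group` column of the example above, `Group1` and `Group2` each receive one color, and every sample in a group shares it.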

### Cytogenetics analysis + Emedgene interpretation sample sheet

For Cytogenetics analysis + Emedgene interpretation on cloud, an additional `demographicSex` column is compared against the `Sex Estimate` output from the DRAGEN Array genotyping module and is displayed in Emedgene. The allowed values for this field are `M` (Male), `F` (Female), and `U` (Unknown).

Example:

```
SentrixBarcode_A,SentrixPosition_A,demographicSex
204753010023,R01C01,F
204753010023,R02C01,F
204753010024,R01C01,M
204753010024,R02C01,M
```
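A hypothetical pre-submission check for this column is sketched below: it rejects values other than `M`, `F`, or `U` and, when both values are known, reports whether `demographicSex` agrees with the sex estimate. It assumes the sex estimate is expressed with the same `M`/`F`/`U` codes, which may not match the actual `Sex Estimate` output format; the function name is illustrative.

```python
# Hypothetical check; assumes Sex Estimate uses the same M/F/U codes.
ALLOWED_SEX = {"M", "F", "U"}


def check_demographic_sex(demographic_sex, sex_estimate):
    """Validate demographicSex and report agreement with the sex estimate.

    Returns True or False when both values are known, or None when either is unknown.
    """
    if demographic_sex not in ALLOWED_SEX:
        raise ValueError(f"demographicSex must be M, F, or U, got {demographic_sex!r}")
    if demographic_sex == "U" or sex_estimate not in {"M", "F"}:
        return None
    return demographic_sex == sex_estimate
```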

## YAML Config File <a href="#section-yaml-config-file" id="section-yaml-config-file"></a>

The YAML config file (`.yaml` or `.yml`) is an optional user-provided input to the local `qc report` command and a preconfigured input file to the cloud [DRAGEN Array - Genotyping and QC](/product-guides/dragen-array-cloud-analysis/overview/dragen-array-genotyping.md) pipeline. Use it to override QC thresholds.

Set a threshold to a numeric value to enable that check, or set it to `null` to disable it.

If `--config` is omitted, `qc report` uses built-in defaults for `callRate` and `logRDev`. Other thresholds are unset by default.
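The enable/disable rule can be sketched as a filter over the parsed thresholds. This assumes the YAML has already been loaded (for example with a YAML library, where `null` becomes `None`) into a flat mapping of threshold name to value; the real file structure is described in [Configuration file (optional)](/product-guides/dragen-array-local-analysis/qc-report.md#configuration-file-optional), and the function name and example values are hypothetical.

```python
# Hypothetical sketch; assumes a flat {threshold_name: value} mapping.
def active_thresholds(config):
    """Return the enabled QC thresholds: numeric values enable a check, None disables it."""
    active = {}
    for name, value in config.items():
        if value is None:
            continue  # YAML null -> None -> check disabled
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"Threshold {name!r} must be a number or null, got {value!r}")
        active[name] = value
    return active
```

For example, `{"callRate": 0.97, "logRDev": None}` leaves only the `callRate` check active; the value 0.97 is illustrative, not a recommended threshold.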

For the cloud [DRAGEN Array - Genotyping and QC](/product-guides/dragen-array-cloud-analysis/overview/dragen-array-genotyping.md) pipeline, default product file configurations are provided for commercial array products, using chemistry-specific default YAML config files. To change the QC thresholds, create a custom configuration with an updated YAML config file.

For the required YAML structure, chemistry-specific template files, generic examples, and additional guidance, see [Configuration file (optional)](/product-guides/dragen-array-local-analysis/qc-report.md#configuration-file-optional).

## Input File Summary Table <a href="#section-input-file-summary-table" id="section-input-file-summary-table"></a>

In addition to the input files, there is a set of intermediate files, including GTC, SNV VCF, CNV VCF, and PGx CSV, which are outputs of some DRAGEN Array Local commands and inputs to other commands.

The table below summarizes the input and intermediate files, their sources, and the associated DRAGEN Array Local commands and options.

| Input File                                                                                                                                     | File Extension                 | Source                                                                | Command                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Option              |
| ---------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | --------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- |
| [IDAT](#section-idat)                                                                                                                          | .idat                          | User provided from scanning instrument                                | <p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-call">genotype call</a><br><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#qc-call">qc call</a></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | --idat-folder       |
| [CSV Manifest](#manifest_files)                                                                                                                | .csv                           | Product file from Illumina                                            | <p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-gtc-to-vcf">genotype gtc-to-vcf</a><br><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#qc-call">qc call</a></p>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | --csv-manifest      |
| [BPM Manifest](#manifest_files)                                                                                                                | .bpm                           | Product file from Illumina                                            | <p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-pgx-copy-number-train">pgx copy-number train</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-call">genotype call</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-gtc-to-bedgraph">genotype gtc-to-bedgraph</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-gtc-to-vcf">genotype gtc-to-vcf</a></p>                                                                                                                                                                                                                                            | --bpm-manifest      |
| [Cluster File](#section-cluster-file)                                                                                                          | .egt                           | Product file from Illumina or user created using GenomeStudio         | [genotype call](/product-guides/dragen-array-local-analysis.md#section-genotype-call)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | --cluster-file      |
| [PGx CN Model](#cn_model_file)                                                                                                                 | .dat                           | Product file from Illumina or user created using DRAGEN Array Local   | [pgx copy-number call](/product-guides/dragen-array-local-analysis.md#section-pgx-copy-number-call)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | --cn-model          |
| [Cytogenetics CN Model](#cyto_model_file)                                                                                                      | .dat                           | Product file from Illumina                                            | [cyto call](/product-guides/dragen-array-local-analysis.md#section-cyto-call)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | --cn-model          |
| [PGx Database](#section-pgx-database-file)                                                                                                     | .zip                           | Product file from Illumina                                            | [pgx star-allele call](/product-guides/dragen-array-local-analysis.md#section-pgx-star-allele-call)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | --database          |
| [Cytogenetics Database](#cyto_db_file)                                                                                                         | .zip                           | Product file from Illumina                                            | [cyto annotate](/product-guides/dragen-array-local-analysis.md#section-cyto-annotate)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | --annotation-db     |
| [Genome FASTA](#section-genome-fasta-files)                                                                                                    | .fa                            | Product file from Illumina                                            | <p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-gtc-to-vcf">genotype gtc-to-vcf</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-pgx-copy-number-train">pgx copy-number train</a></p>                                                                                                                                                                                                                                                                                                                                                                                                                                          | --genome-fasta-file |
| [YAML Config File](#section-yaml-config-file)                                                                                                  | <p>.yaml</p><p>.yml</p>        | Illumina template config file or user-provided config file            | [qc report](/product-guides/dragen-array-local-analysis/qc-report.md)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | --config            |
| [Sample Sheet](#section-sample-sheet)                                                                                                          | .csv                           | User provided                                                         | <p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-call">genotype call</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-gtc-to-bedgraph">genotype gtc-to-bedgraph</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-gtc-to-vcf">genotype gtc-to-vcf</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-pgx-copy-number-call">pgx copy-number call</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-pgx-copy-number-train">pgx copy-number train</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-cyto-call">cyto call</a><br><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#qc-call">qc call</a></p> | --sample-sheet      |
| [GTC](/product-guides/output-files.md#genotype_call_file)                                                                                      | .gtc                           | DRAGEN Array output from genotype call                                | <p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-gtc-to-bedgraph">genotype gtc-to-bedgraph</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-genotype-gtc-to-vcf">genotype gtc-to-vcf</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-pgx-copy-number-call">pgx copy-number call</a></p><p><a href="/pages/eAC7PEbpUOZ9VeXL4NXW#section-pgx-copy-number-train">pgx copy-number train</a></p>                                                                                                                                                                                                                              | --gtc-folder        |
| <p><a href="/pages/adC86Efp3WqTAMRZycJM#snv_vcf_file">SNV VCF</a></p><p><a href="/pages/adC86Efp3WqTAMRZycJM#cnv_vcf_file">PGx CNV VCF</a></p> | <p>.snv.vcf</p><p>.cnv.vcf</p> | DRAGEN Array output from genotype gtc-to-vcf and pgx copy-number call | [pgx star-allele call](/product-guides/dragen-array-local-analysis.md#section-pgx-star-allele-call)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | --vcf-folder        |
| [PGx DAT](/product-guides/output-files.md#star_allele_dat)                                                                                     | .dat                           | DRAGEN Array output from pgx star-allele call                         | [pgx star-allele annotate](/product-guides/dragen-array-local-analysis.md#section-pgx-star-allele-annotate)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | --star-alleles      |
| [Cytogenetics CNV VCF](/product-guides/output-files.md#cyto_vcf_file)                                                                          | .cnv.vcf                       | DRAGEN Array output from cyto call                                    | [cyto annotate](/product-guides/dragen-array-local-analysis.md#section-cyto-annotate)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | --vcf-folder        |


