# Input Files

The following section describes the input files required by DRAGEN Array.

## IDAT Files <a href="#idat" id="idat"></a>

For each sample, a pair of raw intensity files (.idat) is generated by the iScan System or the NextSeq 550 System (for non-methylation arrays). The files provide the intensities in the red and green channels for each probe on the Infinium array.

An IDAT file is identified by the BeadChip barcode (a unique 12-digit Sentrix ID, e.g. 123456789101), the BeadChip position (row and column of the sample, e.g. R01C01), and the channel, Grn (green) or Red.
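As an illustration, the three identifying parts can be pulled out of a file name with a short Python sketch. It assumes the parts are joined by underscores, as in `123456789101_R01C01_Grn.idat`; that naming pattern and the helper names `parse_idat_name` and `mate_name` are assumptions made for this example, not part of DRAGEN Array.

```python
import re
from typing import NamedTuple

# Assumed naming pattern: <12-digit barcode>_<RxxCyy position>_<Grn|Red>.idat
IDAT_NAME = re.compile(
    r"^(?P<barcode>\d{12})_(?P<position>R\d{2}C\d{2})_(?P<channel>Grn|Red)\.idat$"
)


class IdatName(NamedTuple):
    barcode: str   # BeadChip barcode (Sentrix ID)
    position: str  # BeadChip position, row and column
    channel: str   # "Grn" or "Red"


def parse_idat_name(filename: str) -> IdatName:
    """Split an IDAT file name into barcode, position, and channel."""
    match = IDAT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognized IDAT file name: {filename!r}")
    return IdatName(**match.groupdict())


def mate_name(filename: str) -> str:
    """Return the file name of the other channel for the same sample."""
    parsed = parse_idat_name(filename)
    other = "Red" if parsed.channel == "Grn" else "Grn"
    return f"{parsed.barcode}_{parsed.position}_{other}.idat"
```

A check like this can be useful for confirming that both channel files of a sample are present before starting an analysis.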

## Manifest Files <a href="#manifest_files" id="manifest_files"></a>

The CSV and BPM manifest files can be found on the Illumina Support Site for all commercial Infinium BeadChips or on [MyIllumina](http://my.illumina.com/) for custom and semi-custom designs. For instructions on obtaining manifest files from MyIllumina, see the Illumina Knowledge article [How to access custom array product files (manifest and product definition files) in MyIllumina](https://knowledge.illumina.com/microarray/general/microarray-general-reference_material-list/000001531).

The CSV manifest file (.csv) provides complementary data to the BPM manifest file in a human-readable format. It is a required input to the genotype gtc-to-vcf command to enable VCF generation for insertion/deletion variants.

## Cluster File <a href="#toc150786136" id="toc150786136"></a>

The cluster file (.egt) is a standard product file provided by Illumina for commercial genotyping products and it is a required input for the genotype call command in DRAGEN Array. Custom cluster files may be required for optimal genotyping performance. See section [Optimizing cluster files and copy number models](https://help.dragenarray.illumina.com/dragen-array-v1.0/dragen-array-local-analysis#optimizing_cluster_files) for additional details.

## CN Model File <a href="#cn_model_file" id="cn_model_file"></a>

The CN (Copy Number) model file (.dat) is a required input to the copy-number call command to enable accurate copy number calling for pharmacogenomics. Illumina provides a standard CN model file for each PGx array product. See section [Optimizing cluster files and copy number models](https://help.dragenarray.illumina.com/dragen-array-v1.0/dragen-array-local-analysis#optimizing_cluster_files) for additional details.

## PGx Database File <a href="#toc150786138" id="toc150786138"></a>

The PGx database file (.zip) contains the variant mapping information from Infinium PGx arrays to PGx variants. For each gene and each variant used in the star allele definitions of the gene, there is a mapping to the ID field in the SNV VCF file. Each line in the gene mapping file represents a single variant and contains the SNV VCF ID for that variant followed by the HGVS (Human Genome Variation Society) tag for the variant. The PGx database file is array specific and is one of the product files provided by Illumina for each PGx array product.

## Genome FASTA Files <a href="#toc150786139" id="toc150786139"></a>

The genome FASTA file (.fa) is a text file with the reference genome sequences. The FASTA index file (.fai) contains metadata about how the chromosome sequences are laid out within the FASTA file for a particular species. DRAGEN Array PGx calling supports human genome builds 37 and 38. The genome FASTA file and FASTA index file are both provided by Illumina for the human genome and should be stored together in the same input folder.
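A minimal Python sketch of a pre-flight check that the two files sit side by side is given here. It relies on the standard `.fai` layout (tab-separated columns for name, length, offset, bases per line, and bytes per line) and assumes the index is named after the FASTA file with `.fai` appended, e.g. `genome.fa.fai`. The function names are hypothetical.

```python
from pathlib import Path


def read_fai(fai_path):
    """Return {sequence name: length in bases} from a FASTA index (.fai) file."""
    lengths = {}
    for line in Path(fai_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH (tab-separated)
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


def check_reference_folder(fasta_path):
    """Confirm the FASTA and its index are stored together, then list the indexed sequences."""
    fasta = Path(fasta_path)
    # Assumption: the index is named <fasta file name>.fai, e.g. genome.fa.fai
    fai = fasta.with_name(fasta.name + ".fai")
    missing = [str(p) for p in (fasta, fai) if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing reference file(s): " + ", ".join(missing))
    return read_fai(fai)
```

The returned dictionary gives a quick view of which sequences the index describes without reading the full FASTA file.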

## IDAT Sample Sheet <a href="#toc150786140" id="toc150786140"></a>

For local analysis, the IDAT sample sheet can be a CSV or JSON formatted file with direct paths to sample IDAT files. It enables easy analysis of samples from different directories.

Example CSV format:

`Green IDAT Path,Red IDAT Path`

`/path/to/sample1_Grn.idat,/path/to/sample1_Red.idat`

`/path/to/sample2_Grn.idat,/path/to/sample2_Red.idat`

`/path/to/sample3_Grn.idat,/path/to/sample3_Red.idat`

Example JSON format:

`[`

`{`

`"Green IDAT Path": "/path/to/sample1_Grn.idat",`

`"Red IDAT Path": "/path/to/sample1_Red.idat"`

`},`

`{`

`"Green IDAT Path": "/path/to/sample2_Grn.idat",`

`"Red IDAT Path": "/path/to/sample2_Red.idat"`

`},`

`{`

`"Green IDAT Path": "/path/to/sample3_Grn.idat",`

`"Red IDAT Path": "/path/to/sample3_Red.idat"`

`}`

`]`
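Writing the sample sheet by hand invites quoting and comma mistakes, so it may be easier to generate it. The following Python sketch builds both formats from a list of (green, red) path pairs using the column names shown above. The helper names are hypothetical and the paths are placeholders.

```python
import csv
import json

# Column names as used in the sample sheet examples
HEADERS = ["Green IDAT Path", "Red IDAT Path"]


def sample_sheet_rows(pairs):
    """Turn (green_path, red_path) pairs into one dictionary per sample."""
    return [dict(zip(HEADERS, pair)) for pair in pairs]


def write_csv_sheet(rows, path):
    """Write the rows as a CSV sample sheet with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=HEADERS)
        writer.writeheader()
        writer.writerows(rows)


def write_json_sheet(rows, path):
    """Write the rows as a JSON array of objects."""
    with open(path, "w") as handle:
        json.dump(rows, handle, indent=2)


if __name__ == "__main__":
    rows = sample_sheet_rows([
        ("/path/to/sample1_Grn.idat", "/path/to/sample1_Red.idat"),
        ("/path/to/sample2_Grn.idat", "/path/to/sample2_Red.idat"),
    ])
    write_csv_sheet(rows, "idat_sample_sheet.csv")
    write_json_sheet(rows, "idat_sample_sheet.json")
```

Because the JSON is produced by `json.dump`, the output is always syntactically valid, which is not guaranteed when the file is edited manually.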

For cloud analysis, the IDAT sample sheet can be a CSV formatted file that lists the BeadChip barcode and sample section of each sample.

Example CSV format:

`beadChipName,sampleSectionName`

`Beadchip 1 barcode (204753010023), sample section (R01C01)`

`Beadchip 1 barcode (204753010023), sample section (R02C01)`

`Beadchip 2 barcode (204753010024), sample section (R01C01)`

`Beadchip 2 barcode (204753010024), sample section (R02C01)`

For DRAGEN Array Methylation QC on cloud, the optional sample sheet fields Sample\_ID and Sample\_Group are also available.

Following Sample\_Group, any number of additional columns can be added to include metadata fields such as sex, sample type, or plate and well information. Columns added after the Sample\_Group column may have user-defined column headers. The Sample\_ID field and any additional metadata will be replicated in the Sample QC Summary output files.

The Sample\_Group field will be used to populate the PCA Control Plot within the Sample QC Summary Plots file and the Principal Component Summary file. For the PCA Control Plot, each sample group will be assigned a unique color. Samples assigned to the same Sample\_Group value will be the same color in the PCA Control Plot.

`beadChipName,sampleSectionName,Sample_ID,Sample_Group,MetaData1`

`Beadchip 1 barcode (204753010023), sample section (R01C01),NA1231,Group1,F`

`Beadchip 1 barcode (204753010023), sample section (R02C01),NA1232,Group2,F`

`Beadchip 2 barcode (204753010024), sample section (R01C01),NA1233,Group2,M`

`Beadchip 2 barcode (204753010024), sample section (R02C01),NA1234,Group1,M`
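To preview which samples will share a color in the PCA Control Plot, the sheet can be grouped by Sample\_Group before upload. The Python sketch below does this with the standard `csv` module. It assumes that each data row holds the literal barcode and sample section values (for example `204753010023,R01C01`), and the `Sex` column is a hypothetical user-defined metadata field.

```python
import csv
import io
from collections import defaultdict


def samples_by_group(sheet_text):
    """Map each Sample_Group value to the Sample_ID values assigned to it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(sheet_text)):
        groups[row["Sample_Group"]].append(row["Sample_ID"])
    return dict(groups)


# Assumed literal form of the example rows; "Sex" is a user-defined column
SHEET = """beadChipName,sampleSectionName,Sample_ID,Sample_Group,Sex
204753010023,R01C01,NA1231,Group1,F
204753010023,R02C01,NA1232,Group2,F
204753010024,R01C01,NA1233,Group2,M
204753010024,R02C01,NA1234,Group1,M
"""

if __name__ == "__main__":
    for group, sample_ids in samples_by_group(SHEET).items():
        print(group, sample_ids)
```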

## GTC Sample Sheet <a href="#toc150786141" id="toc150786141"></a>

The GTC sample sheet is a CSV or JSON formatted file with direct paths to sample GTC files. It enables easy analysis of samples from different directories.

Example CSV format:

`GTC Path`

`/path/to/sample1.gtc`

`/path/to/sample2.gtc`

`/path/to/sample3.gtc`

Example JSON format:

`[`

`{`

`"GTC Path": "/path/to/sample1.gtc"`

`},`

`{`

`"GTC Path": "/path/to/sample2.gtc"`

`},`

`{`

`"GTC Path": "/path/to/sample3.gtc"`

`}`

`]`
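Since the purpose of the sheet is to combine samples from different directories, one way to build it is to scan those directories for `.gtc` files. This is a sketch only: the function name and the folder paths are placeholders, and the output uses the JSON layout shown above.

```python
import json
from pathlib import Path


def gtc_sample_sheet(folders):
    """Collect *.gtc files from several folders into sample sheet entries."""
    entries = []
    for folder in folders:
        # Sort within each folder so the output order is reproducible
        for path in sorted(Path(folder).glob("*.gtc")):
            entries.append({"GTC Path": str(path)})
    return entries


if __name__ == "__main__":
    # Placeholder folders; replace with the directories that hold the GTC files
    sheet = gtc_sample_sheet(["/path/to/run1", "/path/to/run2"])
    with open("gtc_sample_sheet.json", "w") as handle:
        json.dump(sheet, handle, indent=2)
```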

## Input File Summary Table <a href="#toc150786142" id="toc150786142"></a>

In addition to the input files, there is a set of intermediate files, including GTC, SNV VCF, CNV VCF, and PGx CSV, which are outputs of some DRAGEN Array Local commands and inputs to other commands.

The table below summarizes the input and intermediate files, their sources, and the associated DRAGEN Array Local commands and options.

| Input File        | Source                                                              | Command                                                                                                  | Option              |
| ----------------- | ------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- | ------------------- |
| IDAT              | User provided from scanning instrument                              | genotype call                                                                                            | --idat-folder       |
| CSV Manifest      | Product file from Illumina                                          | genotype gtc-to-vcf                                                                                      | --csv-manifest      |
| BPM Manifest      | Product file from Illumina                                          | <p>copy-number train</p><p>genotype call</p><p>genotype gtc-to-bedgraph</p><p>genotype gtc-to-vcf</p>    | --bpm-manifest      |
| Cluster File      | Product file from Illumina or user created using GenomeStudio       | genotype call                                                                                            | --cluster-file      |
| CN Model          | Product file from Illumina or user created using DRAGEN Array Local | copy-number call                                                                                         | --cn-model          |
| PGx Database      | Product file from Illumina                                          | star-allele call                                                                                         | --database          |
| Genome FASTA      | Product file from Illumina                                          | <p>genotype gtc-to-vcf</p><p>copy-number train</p>                                                       | --genome-fasta-file |
| IDAT Sample Sheet | User provided                                                       | genotype call                                                                                            | --idat-sample-sheet |
| GTC Sample Sheet  | User provided                                                       | <p>genotype gtc-to-bedgraph</p><p>genotype gtc-to-vcf</p><p>copy-number call</p><p>copy-number train</p> | --gtc-sample-sheet  |
| GTC               | DRAGEN Array output from genotype call                              | <p>genotype gtc-to-bedgraph</p><p>genotype gtc-to-vcf</p><p>copy-number call</p><p>copy-number train</p> | --gtc-folder        |
| SNV and CNV VCF   | DRAGEN Array output from genotype gtc-to-vcf and copy-number call   | star-allele call                                                                                         | --vcf-folder        |
| PGx CSV           | DRAGEN Array output from star-allele call                           | star-allele annotate                                                                                     | --star-alleles      |
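The table is organized by file, but it is often read the other way around: which input options does a given command accept? The Python sketch below transcribes the Option and Command columns into a dictionary and inverts the lookup. It only restates the table and does not indicate which options are required, optional, or alternatives to one another. The function name is hypothetical.

```python
# Option -> commands, transcribed from the summary table
OPTION_COMMANDS = {
    "--idat-folder": ["genotype call"],
    "--csv-manifest": ["genotype gtc-to-vcf"],
    "--bpm-manifest": ["copy-number train", "genotype call",
                       "genotype gtc-to-bedgraph", "genotype gtc-to-vcf"],
    "--cluster-file": ["genotype call"],
    "--cn-model": ["copy-number call"],
    "--database": ["star-allele call"],
    "--genome-fasta-file": ["genotype gtc-to-vcf", "copy-number train"],
    "--idat-sample-sheet": ["genotype call"],
    "--gtc-sample-sheet": ["genotype gtc-to-bedgraph", "genotype gtc-to-vcf",
                           "copy-number call", "copy-number train"],
    "--gtc-folder": ["genotype gtc-to-bedgraph", "genotype gtc-to-vcf",
                     "copy-number call", "copy-number train"],
    "--vcf-folder": ["star-allele call"],
    "--star-alleles": ["star-allele annotate"],
}


def options_for(command):
    """List, in alphabetical order, the input options the table associates with a command."""
    return sorted(
        option
        for option, commands in OPTION_COMMANDS.items()
        if command in commands
    )


if __name__ == "__main__":
    for command in ("genotype call", "copy-number call", "star-allele call"):
        print(command, options_for(command))
```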


