# DRAGEN Array QC Report

The DRAGEN Array QC Report is a **self-contained, interactive HTML dashboard** that helps you evaluate the quality of microarray datasets processed with the **DRAGEN Array** pipeline. It combines per-sample functional QC metrics, control-probe QC metrics (probe-level and summarized), and interactive visualizations to help you quickly:

* Inspect per-sample metrics such as **Autosomal Call Rate**, **Log R Ratio Standard Deviation (LogRDev)**, and **Sex estimate**
* Detect assay or instrument issues using control-probe intensity patterns
* Identify outliers, spatial artifacts, and batch effects using heatmaps and trend plots
* Apply automated QC thresholds and export results for downstream review
* Calculate project-wide average call rate and LogRDev, and monitor trends across multiple datasets

***

## Analysis Workflow

Use the following instructions to generate an interactive QC Report (HTML format) and QC Table (spreadsheet format). If you used a [Sample Sheet](/product-guides/input-files.md#section-sample-sheet) in the upstream workflow, user-defined metadata can be carried into the final QC report outputs. See [Command Index](/product-guides/dragen-array-local-analysis.md#command_index_1) for all command parameters.

Methylation and genotyping workflow differences are highlighted below.

| Workflow                | Upstream command        | Key inputs                                                                      | Dataset folder contents for `dragena qc report`                                  | When to use                                                                                                                                         |
| ----------------------- | ----------------------- | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Methylation             | `dragena qc call`       | CSV manifest (`--csv-manifest`) + IDAT folder                                   | `controls.raw_metrics.csv` + `controls.qc_metrics.csv`                           | Standard methylation QC-report workflow                                                                                                             |
| Genotyping, recommended | `dragena genotype call` | BPM manifest (`--bpm-manifest`) + cluster file (`--cluster-file`) + IDAT folder | `controls.raw_metrics.csv` + `controls.qc_metrics.csv` + `gt_sample_summary.csv` | Recommended when you want the richer HTML QC report experience, including functional QC, Autosomal Call Rate and LogRDev views, and sample heatmaps |
| Genotyping, limited     | `dragena qc call`       | CSV manifest (`--csv-manifest`) + `--array-type genotyping` + IDAT folder       | `controls.raw_metrics.csv` + `controls.qc_metrics.csv`                           | Use only for control-based genotyping QC inputs when `gt_sample_summary.csv` is not needed                                                          |

### Methylation workflow

```mermaid
---
config:
  look: handDrawn
  theme: redux
  layout: elk
---
flowchart TB
  classDef methylCmd fill:#CDEED8,stroke:#2F7D4A,color:#184A2B,stroke-width:2px;
  classDef input fill:#FFF0D9,stroke:#C56A1A,color:#7A3E00,stroke-width:1.5px;
  classDef option fill:#EFE7FF,stroke:#7B57C8,color:#4B2B88,stroke-width:1.5px;
  classDef output fill:#FFFFFF,stroke:#5E6B7A,color:#1F2937,stroke-width:1.25px;

  subgraph M_inputs["Inputs to dragena qc call"]
    direction TB
    M_csv["CSV manifest<br/>(--csv-manifest)"]:::input
    M_array["--array-type<br/>methylation"]:::option
    M_idat["IDAT folder<br/>(--idat-folder)"]:::input
    M_sheet["Sample sheet<br/>(--sample-sheet)<br/><i>Optional</i>"]:::input
  end

  M_qccall["dragena qc call"]:::methylCmd

  subgraph M_dataset_box["QC#8209;call&nbsp;output&nbsp;folder"]
    direction TB
    M_qcmetrics["controls.qc_metrics.csv"]:::output
    M_raw["controls.raw_metrics.csv"]:::output
  end

  M_data["--data"]:::option
  M_config["--config<br/><i>Optional</i>"]:::option
  M_outdir["--output-folder<br/>Default: Current working directory"]:::option
  M_report["dragena qc report"]:::methylCmd
  M_html["QC report<br/>(HTML)"]:::output
  M_table["QC table<br/>(CSV or XLSX)"]:::output

  M_inputs --> M_qccall
  M_qccall --> M_dataset_box
  M_dataset_box -.-> M_data
  M_outdir --> M_report
  M_data --> M_report
  M_config --> M_report
  M_report --> M_html
  M_report --> M_table

  style M_inputs fill:#FFF9EE,stroke:#D6A04E,stroke-width:1px
  style M_dataset_box fill:#F9FBFD,stroke:#94A3B8,stroke-width:1.5px
```

### Genotyping workflow

For genotyping, `dragena genotype call` is the recommended upstream path because it produces `gt_sample_summary.csv`, which enables functional QC metrics and richer QC-report visualizations such as Autosomal Call Rate and LogRDev views, plus the sample heatmaps.

#### Control-only path: `dragena qc call`

Use this path when you only need control-based genotyping QC inputs. It uses a CSV manifest and does not generate `gt_sample_summary.csv`, so the QC report will not include functional QC metrics.

If you have `gt_sample_summary.csv` files generated by DRAGEN Array versions earlier than v1.4.0, you can combine them with the outputs of `dragena qc call`: set the `--output-folder` option of `dragena qc call` to the folder that already contains those files, and then generate the full QC Report from that folder.
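As a rough sketch of that legacy scenario, the commands below write the control QC files into a folder that already holds `gt_sample_summary.csv` and then build the report from the same folder. The folder path `/user/legacy_gtc` and the guard around the commands are illustrative assumptions; the flags are the ones used elsewhere on this page.

```bash
# Hypothetical folder that already contains a legacy gt_sample_summary.csv.
legacy_dir=/user/legacy_gtc

# Run only when dragena is on PATH and the legacy summary file exists.
if command -v dragena >/dev/null 2>&1 && [ -f "$legacy_dir/gt_sample_summary.csv" ]; then
  # Write controls.raw_metrics.csv and controls.qc_metrics.csv next to the legacy file.
  dragena qc call \
    --array-type genotyping \
    --csv-manifest /user/productfiles/manifest.csv \
    --idat-folder /user/IDATs \
    --output-folder "$legacy_dir"

  # Use the combined folder as the dataset folder for the report.
  dragena qc report \
    --data "$legacy_dir" \
    --output-folder /user/qc_report
else
  echo "skipped: dragena or $legacy_dir/gt_sample_summary.csv not found"
fi
```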

```mermaid
---
config:
  look: handDrawn
  theme: redux
  layout: elk
---
flowchart TB
  classDef methylCmd fill:#CDEED8,stroke:#2F7D4A,color:#184A2B,stroke-width:2px;
  classDef input fill:#FFF0D9,stroke:#C56A1A,color:#7A3E00,stroke-width:1.5px;
  classDef option fill:#EFE7FF,stroke:#7B57C8,color:#4B2B88,stroke-width:1.5px;
  classDef output fill:#FFFFFF,stroke:#5E6B7A,color:#1F2937,stroke-width:1.25px;

  subgraph M_inputs["Inputs to dragena qc call"]
    direction TB
    M_csv["CSV manifest<br/>(--csv-manifest)"]:::input
    M_array["--array-type<br/>genotyping"]:::option
    M_idat["IDAT folder<br/>(--idat-folder)"]:::input
    M_sheet["Sample sheet<br/>(--sample-sheet)<br/><i>Optional</i>"]:::input
  end

  M_qccall["dragena qc call"]:::methylCmd

  subgraph M_dataset_box["QC#8209;call&nbsp;output&nbsp;folder"]
    direction TB
    M_qcmetrics["controls.qc_metrics.csv"]:::output
    M_raw["controls.raw_metrics.csv"]:::output
  end
  
  subgraph G_dataset_box["Legacy&nbsp;genotype#8209;call&nbsp;output"]
    direction TB
    G_gt["gt_sample_summary.csv"]:::output
  end


  M_outdir["--output-folder<br/>Default: Current working directory"]:::option
  M_config["--config<br/><i>Optional</i>"]:::option
  M_data["--data"]:::option
  M_report["dragena qc report"]:::methylCmd
  M_html["QC report<br/>(HTML)"]:::output
  M_table["QC table<br/>(CSV or XLSX)"]:::output

  M_inputs --> M_qccall
  M_qccall --> M_dataset_box

  M_dataset_box -.-> M_data
  G_dataset_box -.- M_dataset_box
  M_outdir --> M_report
  M_data --> M_report
  M_config --> M_report
  M_report --> M_html
  M_report --> M_table

  style M_inputs fill:#FFF9EE,stroke:#D6A04E,stroke-width:1px
  style M_dataset_box fill:#F9FBFD,stroke:#94A3B8,stroke-width:1.5px
```

#### Recommended path: `dragena genotype call`

Use this path for most genotyping datasets. It uses the BPM manifest and cluster file, and the output folder already contains the control QC files plus `gt_sample_summary.csv` for functional QC reporting.

```mermaid
---
config:
  look: handDrawn
  theme: redux
  layout: elk
---
flowchart TB
  classDef genotypeCmd fill:#D9E9FF,stroke:#2A62B8,color:#15386B,stroke-width:2px;
  classDef input fill:#FFF0D9,stroke:#C56A1A,color:#7A3E00,stroke-width:1.5px;
  classDef option fill:#EFE7FF,stroke:#7B57C8,color:#4B2B88,stroke-width:1.5px;
  classDef output fill:#FFFFFF,stroke:#5E6B7A,color:#1F2937,stroke-width:1.25px;

  subgraph G_inputs["Inputs to dragena genotype call"]
    direction TB
    G_bpm["BPM manifest<br/>(--bpm-manifest)"]:::input
    G_egt["Cluster file<br/>(--cluster-file)"]:::input
    G_idat["IDAT folder<br/>(--idat-folder)"]:::input
    G_sheet["Sample sheet<br/>(--sample-sheet)<br/><i>Optional</i><br/><b>Source of metadata</b>"]:::input
  end

  G_gencall["dragena genotype call"]:::genotypeCmd

  subgraph G_dataset_box["Genotype#8209;call&nbsp;output&nbsp;folder"]
    direction TB
    G_qcmetrics["controls.qc_metrics.csv"]:::output
    G_raw["controls.raw_metrics.csv"]:::output
    G_gt["gt_sample_summary.csv"]:::output
  end


  G_data["--data"]:::option
  G_config["--config<br/><i>Optional</i>"]:::option
  G_outdir["--output-folder<br/>Default: Current working directory"]:::option
  G_report["dragena qc report"]:::genotypeCmd
  G_html["QC report<br/>(HTML)"]:::output
  G_table["QC table<br/>(CSV or XLSX)"]:::output

  G_inputs --> G_gencall
  G_gencall --> G_dataset_box
  G_dataset_box -.-> G_data
  G_outdir --> G_report
  G_data --> G_report
  G_config --> G_report
  G_report --> G_html
  G_report --> G_table

  style G_inputs fill:#FFF9EE,stroke:#D6A04E,stroke-width:1px
  style G_dataset_box fill:#F9FBFD,stroke:#94A3B8,stroke-width:1.5px
```

{% hint style="info" %}
Use `dragena qc call` for methylation or for control-only genotyping inputs that rely on a CSV manifest. For genotyping, prefer `dragena genotype call` because it uses the BPM manifest plus cluster file and generates `gt_sample_summary.csv`, enabling functional QC metrics and the fuller QC-report feature set. In either workflow, the `--data` option of `dragena qc report` accepts a dataset folder, a parent folder, or a comma-separated list of dataset folders.
{% endhint %}

### Instructions

1. Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed. Alternatively, navigate to any working directory if the executable was added to your PATH.
2. Generate the QC input files using one of the following commands:

* For genotyping datasets, use `dragena genotype call` in most cases. This is the recommended path when you want both control-based QC and functional QC metrics such as **Autosomal Call Rate**, **LogRDev**, and **Sex estimate**.

```bash
dragena genotype call \
  --bpm-manifest /user/productfiles/manifest.bpm \
  --cluster-file /user/productfiles/clusterfile.egt \
  --idat-folder /user/IDATs \
  --output-folder /user/gtc
```

* Use `dragena qc call` if you want control-based QC inputs only, if you already have a `gt_sample_summary.csv` file for the input IDATs, or if you are preparing QC-report inputs for methylation datasets. For genotyping, this command uses a CSV manifest and does not produce `gt_sample_summary.csv`.

```bash
dragena qc call \
  --array-type <genotyping|methylation> \
  --csv-manifest /user/productfiles/manifest.csv \
  --idat-folder /user/IDATs \
  --output-folder /user/qc_metrics
```

3. Prepare a dataset folder for `dragena qc report`. The two control QC files are required; `gt_sample_summary.csv` is optional but highly recommended:
   * `controls.raw_metrics.csv` (required)
   * `controls.qc_metrics.csv` (required)
   * `gt_sample_summary.csv` (highly recommended)

If you ran `dragena genotype call` for the dataset you want to review, you can use the genotype output folder directly as the dataset folder because it already contains `controls.raw_metrics.csv`, `controls.qc_metrics.csv`, and `gt_sample_summary.csv`.

If you want to combine multiple datasets or use a parent folder that contains many dataset folders, see [Example input folder structures](#example-input-folder-structures) below.
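The file requirements from Step 3 can be checked before running the report. The helper below is a minimal sketch, not part of `dragena`; the function name `check_dataset_folder` is hypothetical, and it only tests whether the files exist, not whether their contents are valid.

```bash
# Hypothetical helper (not part of dragena): check a dataset folder
# before passing it to `dragena qc report --data`.
check_dataset_folder() (
  dir="$1"
  status=0

  # The two control QC files are required.
  for f in controls.raw_metrics.csv controls.qc_metrics.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "MISSING (required): $f"
      status=1
    fi
  done

  # Optional, but needed for functional QC metrics.
  if [ ! -f "$dir/gt_sample_summary.csv" ]; then
    echo "MISSING (recommended): gt_sample_summary.csv"
  fi

  # Non-zero only when a required file is missing.
  return "$status"
)

# Example usage:
# check_dataset_folder /user/gtc && echo "required files found"
```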

4. Run `dragena qc report` using the dataset folder from Step 3 as input. The `--config` file is optional. See [Configuration file (optional)](#configuration-file-optional) below for template links and default-threshold behavior.

```bash
dragena qc report \
  --data /user/dataset_folder \
  --config /user/config.yaml \
  --output-folder /user/qc_report
```

If you want to use the genotype output folder directly and do not need a custom config file, the command can be as simple as:

```bash
dragena qc report \
  --data /user/gtc \
  --output-folder /user/qc_report
```

If you want to combine multiple known dataset folders into one QC report, provide them as a comma-separated list after `--data`. If you also provide `--label`, use the same comma-separated style and supply one label per dataset. Do not add spaces between items; use commas only.

```bash
dragena qc report \
  --data /user/dataset_A,/user/dataset_B \
  --label batch_A,batch_B \
  --output-folder /user/qc_report
```
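When the dataset list is assembled by a script, it helps to build the comma-separated values programmatically and confirm that the label count matches the dataset count. The sketch below assumes two hypothetical helper functions, `join_csv` and `count_items`, and prints the resulting command for review instead of running it.

```bash
# Hypothetical helper (not part of dragena): join arguments with commas, no spaces.
join_csv() (
  IFS=,
  printf '%s\n' "$*"
)

# Hypothetical helper: count the items in a comma-separated string.
count_items() {
  printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

data=$(join_csv /user/dataset_A /user/dataset_B)
labels=$(join_csv batch_A batch_B)

if [ "$(count_items "$data")" -ne "$(count_items "$labels")" ]; then
  echo "error: --data and --label need the same number of items" >&2
else
  # Print the command so it can be reviewed before it is run.
  echo "dragena qc report --data $data --label $labels --output-folder /user/qc_report"
fi
```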

5. Open the HTML report in your web browser:
   * `DRAGENArray_QC_Report_YYYY_MM_DD.html`

{% hint style="info" %}
If you generated the QC report in **BaseSpace Sequence Hub (BSSH)**, download the HTML report to your computer and then double-click the file to open it in your web browser.
{% endhint %}

{% hint style="warning" %}
If the input data for `qc report` was generated while using a sample sheet, sample metadata such as sample names, processing information, and other sample-sheet values will be carried into the final QC report outputs when present in the input files. Review the QC report outputs before sharing them. If you need to remove some metadata before sharing, remove it from `gt_sample_summary.csv` and rerun `dragena qc report`.
{% endhint %}

{% hint style="info" %}
If you provide a parent folder (for example `--data data/parent_folder`), the tool can automatically discover and process multiple dataset folders. See **Example input folder structures** below.
{% endhint %}

{% hint style="info" %}
Local QC reports can combine multiple dataset folders into one report. This is the supported way to compare runs or batches in a single **Trend Analysis** view. Current cloud QC reports are generated for a single dataset, so their Trend Analysis view summarizes that dataset only rather than comparing multiple runs.
{% endhint %}

***

## Command-line options

The `dragena qc report` command supports the following options to control output location, format, and QC thresholds.

{% hint style="warning" %}
CLI options are case-sensitive and must be entered exactly as shown. In particular, the threshold flags use mixed case: `--threshold-callRate` and `--threshold-logRdev`.
{% endhint %}

### Output options

#### `--output-folder`

Directory path where output files are written.

* **Default:** Current working directory
* Applies to both the HTML report and QC tables

Example:

```bash
dragena qc report \
  --data project_folder \
  --output-folder output/qc_report
```

#### `--output-format <csv|xlsx>`

Use this option to choose whether the per-sample QC table is written as an `xlsx` workbook or a `csv` file. The default output format is `xlsx`.

When the QC table is written as `xlsx`, the workbook includes conditional formatting based on the thresholds that were applied when the report was generated, and it includes a **Thresholds** worksheet that records those threshold settings. Those applied thresholds can come from built-in defaults, a YAML file provided with [`--config`](#configuration-file-optional), or command-line overrides described in [QC threshold overrides (CLI)](#qc-threshold-overrides-cli).

When the QC table is written as `csv`, the output contains values only. It does not include workbook formatting or additional worksheets.

Example:

```bash
dragena qc report \
  --data project_folder \
  --output-format csv
```

### QC threshold overrides (CLI)

These options override functional QC thresholds directly from the command line. They take precedence over built-in defaults **and** any corresponding values provided via `--config`.

For example, if `--config` sets `callRate: 0.90` but you run with `--threshold-callRate 0.95`, the report uses **0.95**.
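The override order can be pictured with a small shell function. This is only an illustration of the precedence rule described here, not how `dragena` is implemented; the function name `effective_threshold` is hypothetical, and an empty string stands for "not set". It does not model a `null` config value that disables a check.

```bash
# Hypothetical helper (not part of dragena): pick the effective threshold.
# $1 = CLI value, $2 = config value, $3 = built-in default; pass "" when unset.
effective_threshold() {
  if [ -n "$1" ]; then
    echo "$1"        # 1. command-line flag wins
  elif [ -n "$2" ]; then
    echo "$2"        # 2. then the YAML config value
  else
    echo "$3"        # 3. otherwise the built-in default
  fi
}

effective_threshold 0.95 0.90 0.98   # prints 0.95
effective_threshold ""   0.90 0.98   # prints 0.90
effective_threshold ""   ""   0.98   # prints 0.98
```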

#### `--threshold-callRate <value>`

Override the Autosomal Call Rate threshold.

* **Default:** `0.98`
* Samples with Autosomal Call Rate **below** this value are marked as FAIL.

Example:

```bash
dragena qc report \
  --data project_folder \
  --threshold-callRate 0.95
```

#### `--threshold-logRdev <value>`

Override the Log R Ratio Standard Deviation (LogRDev) threshold.

* **Default:** `0.20`
* Samples with LogRDev **above** this value are marked as FAIL.

Example:

```bash
dragena qc report \
  --data project_folder \
  --threshold-logRdev 0.25
```

{% hint style="info" %}
For complex or reproducible QC configurations, use a YAML configuration file via `--config`. CLI threshold options are most useful for quick exploratory runs.
{% endhint %}

{% hint style="info" %}
**Precedence (highest to lowest):**

1. Command-line threshold flags (e.g., `--threshold-callRate`, `--threshold-logRdev`)
2. YAML config file values provided via `--config`
3. Built-in defaults
   {% endhint %}

***

## Input files

Each dataset folder must include the required QC metric files below. Optional files enable additional features (for example, functional genotyping QC metrics) or override defaults (configuration).

| File name                  | Description                                                                                                                              | Required                  |
| -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- | ------------------------- |
| `controls.raw_metrics.csv` | Probe-level control intensities used to compute control QC metrics.                                                                      | Yes                       |
| `controls.qc_metrics.csv`  | Per-sample summarized QC metrics derived from control probes.                                                                            | Yes                       |
| `gt_sample_summary.csv`    | Per-sample genotyping metrics (including Autosomal Call Rate, LogRDev, Sex estimate). Strongly recommended for functional QC evaluation. | No (strongly recommended) |
| `config.yaml`              | Overrides QC thresholds and report behavior when provided via `--config`.                                                                | No                        |

{% hint style="info" %}
See [**Metadata propagation (gt\_sample\_summary)**](#metadata-propagation-gt_sample_summary) for details on how user-defined columns are exposed in the report.
{% endhint %}

{% hint style="info" %}
Only a subset of control probe metrics from `controls.raw_metrics.csv` is propagated to the merged QC outputs.\
See [**Control probe propagation from `controls.raw_metrics.csv`**](#control-probe-propagation-from-controls.raw_metrics.csv) for details.
{% endhint %}

### Example input files

#### `controls.qc_metrics.csv`

```
SampleId,ImagingDate,SamplePlate,SampleWell,ScannerId,ScannerVersion,StainingGreenQC,StainingRedQC,ExtensionGreenQC,ExtensionRedQC,HybridizationHighMediumQC,HybridizationMediumLowQC,TargetRemovalIQC,StringencyQC,NonSpecificBindingGreenQC,NonSpecificBindingRedQC,NonPolymorphicGreenQC,NonPolymorphicRedQC
205930510001_R01C01,11/19/2021 12:39:31 PM,WG0587702-DNA,A06,N0782,4.3.0.934,18.989004,15.226524,17.069284,7.0317426,1.4189211,2.8047338,3.2466092,1.7800437,3.1761158,3.1761158,0.07639844,0.11531731
...
```

#### `controls.raw_metrics.csv`

```
SampleId,Category,Beadtype,Control,Color,GreenIntensity,RedIntensity
205930510001_R01C01,Extension,11603365,Extension (G),Blue,38679,4168
205930510001_R01C01,Extension,12613307,Extension (C),Green,39708,4505
...
```

#### `gt_sample_summary.csv`

```
Sample ID,Sample Name,Sample Folder,Autosomal Call Rate,Call Rate,Log R Ratio Std Dev,Sex Estimate,TGA_Ctrl_5716 Norm R,SentrixBarcode_A,SentrixPosition_A
205930510001_R01C01,205930510001_R01C01,GSAPGx/205930510001,0.95813435,0.9528165,0.23342112,U,3.1297944,205930510001,R01C01
...
```
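For a quick look at a `gt_sample_summary.csv` file outside the report, the two functional QC columns can be averaged with `awk`. This is a minimal sketch: the function name `summarize_gt` is hypothetical, the columns are located by the header names shown in the example above, and the parsing assumes plain CSV without quoted commas.

```bash
# Hypothetical helper (not part of dragena): average Autosomal Call Rate and
# Log R Ratio Std Dev across the samples in a gt_sample_summary.csv file.
summarize_gt() {
  awk -F',' '
    NR == 1 {
      # Locate the two columns by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == "Autosomal Call Rate") cr = i
        if ($i == "Log R Ratio Std Dev") lr = i
      }
      next
    }
    # Skip rows where either value is missing.
    cr && lr && $cr != "" && $lr != "" { sum_cr += $cr; sum_lr += $lr; n++ }
    END {
      if (n > 0)
        printf "samples=%d mean_call_rate=%.4f mean_logrdev=%.4f\n", n, sum_cr / n, sum_lr / n
    }
  ' "$1"
}

# Example usage:
# summarize_gt /user/gtc/gt_sample_summary.csv
```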

***

## Metadata propagation (gt\_sample\_summary) <a href="#metadata-propagation-gt_sample_summary" id="metadata-propagation-gt_sample_summary"></a>

When the upstream workflow uses a sample sheet, sample-sheet metadata columns are first propagated into `gt_sample_summary.csv`. During QC report generation, all columns present in `gt_sample_summary.csv` are then propagated into the report sample metadata and into the merged QC table outputs.

In the merged QC `xlsx` and `csv` outputs, propagated metadata columns from `gt_sample_summary.csv` are included and ordered alphabetically.

Not every propagated metadata field is offered in the report UI for **Color by** or **Facet by**. Those controls only include fields that behave like useful categorical groupings: fields with only one unique value, or with more than 20 unique values, are not offered for coloring or faceting.

Metadata fields that are not eligible for **Color by** or **Facet by** can still appear in hover text when space allows. However, some metadata may be omitted from hover tooltips when there is not enough room to display all fields.
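To estimate in advance which metadata columns could be offered for **Color by** or **Facet by**, you can count unique values per column. The sketch below applies only the two rules stated above (more than one and at most 20 unique values). The report may apply further criteria, so treat the output as a rough guide. The function name `metadata_grouping_candidates` is hypothetical, and the parsing assumes plain CSV without quoted commas.

```bash
# Hypothetical helper (not part of dragena): list columns of a CSV file
# that have between 2 and 20 unique values.
metadata_grouping_candidates() {
  awk -F',' '
    NR == 1 {
      # Remember the header names.
      for (i = 1; i <= NF; i++) name[i] = $i
      ncol = NF
      next
    }
    {
      # Count each distinct value once per column.
      for (i = 1; i <= ncol; i++)
        if (!((i, $i) in seen)) { seen[i, $i] = 1; uniq[i]++ }
    }
    END {
      for (i = 1; i <= ncol; i++)
        if (uniq[i] > 1 && uniq[i] <= 20) print name[i]
    }
  ' "$1"
}

# Example usage:
# metadata_grouping_candidates /user/gtc/gt_sample_summary.csv
```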

***

## Control probe propagation from `controls.raw_metrics.csv`

Not every raw control-probe row present in `controls.raw_metrics.csv` is carried forward into the QC report outputs.

Here, **not propagated** means that specific raw probe entries from `controls.raw_metrics.csv` are excluded from the merged downstream raw-probe data used by the report and QC table.

The report focuses on raw control probes that support actionable review and on summarized QC metrics used for downstream evaluation. As a result, some raw probe categories are intentionally excluded during merge and summarization.

### Included versus excluded raw probe names

The key distinction is between the **standard raw probe names used by the report** and the **additional raw probe names that may appear in some input files but are not propagated downstream**.

| Control category    | Raw probe names propagated downstream                                                   | Raw probe names excluded from downstream raw-probe outputs                                      |
| ------------------- | --------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| **NON-POLYMORPHIC** | Standard non-polymorphic probe rows such as `NP (G)` and `NP (T)`                       | Additional methylation-platform rows `NP (G) 1`, `NP (G) 2`, `NP (G) 3`, `NP (G) 4`, `NP (G) 5` |
| **STAINING**        | Standard staining probe rows `Biotin (Bkg)`, `Biotin (High)`, `DNP (Bkg)`, `DNP (High)` | Additional methylation-platform rows `Biotin(5K)` and `DNP(20K)`                                |

### Excluded raw control-probe entries

The following raw control-probe categories or probe names are excluded from the merged downstream raw-probe outputs:

* **NEGATIVE**
* **NORM**
* **Additional NON-POLYMORPHIC NP (G) probe rows found on some methylation platforms**
  * NP (G) 1
  * NP (G) 2
  * NP (G) 3
  * NP (G) 4
  * NP (G) 5
* **Additional STAINING probe rows found on some methylation platforms**
  * DNP(20K)
  * Biotin(5K)

These probe rows can still be present in the original `controls.raw_metrics.csv` input file for completeness, but they are not propagated as downstream raw-probe entries in the QC report outputs.

In other words, users may still see summarized **Staining** or **Non-Polymorphic** control metrics in the report, and may also see the standard raw probe rows used for those metrics, even though the extra raw probe rows listed above are excluded.
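The exclusion list above can be approximated with an `awk` filter if you want to preview which raw probe rows remain. This is a sketch under assumptions: the function name `filter_raw_probes` is hypothetical, the column order follows the `controls.raw_metrics.csv` example on this page (`Category` in column 2, `Control` in column 4), and category names are matched case-insensitively because the exact spelling in your files may differ.

```bash
# Hypothetical helper (not part of dragena): drop the raw control-probe rows
# that this page lists as excluded from downstream outputs.
filter_raw_probes() {
  awk -F',' '
    NR == 1 { print; next }                                     # keep the header
    toupper($2) == "NEGATIVE" || toupper($2) == "NORM" { next } # excluded categories
    $4 ~ /^NP \(G\) [1-5]$/ { next }                            # extra NP (G) rows
    $4 == "DNP(20K)" || $4 == "Biotin(5K)" { next }             # extra staining rows
    { print }
  ' "$1"
}

# Example usage:
# filter_raw_probes /user/gtc/controls.raw_metrics.csv > raw_probes_filtered.csv
```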

## Configuration file (optional) <a href="#configuration-file-optional" id="configuration-file-optional"></a>

Provide a YAML configuration file to override the report’s QC thresholds. You can start from one of the Illumina template config files below or supply your own YAML file. Set a value to `null` to **disable** a check, or set a numeric value to **enable** it (for example `tgaControl: 1.0`). The template config files below are suggested starting points and should be adjusted based on sample type, platform, and lab-specific performance.

Illumina template config files are available below:

{% file src="/files/wqHVYLDo1m2wx2zQ9d2h" %}

{% file src="/files/GBZegmeDYoPinNXmvoHx" %}

{% file src="/files/n2MNa2RrPWQbssTS1ol4" %}

{% file src="/files/Zp6CKGOL4NkdsfQl71Pq" %}

Suggested starting points by product and assay chemistry:

* Genotyping assays that use **Infinium non-EX** reagents/chemistry: start with `config_genotyping.yaml`.
* Genotyping assays that use **Infinium EX** reagents/chemistry: start with `config_genotyping_EX.yaml`. Example products include **GSA: Infinium Global Screening Array-48 v4.0 Kit**, **GSA-ePGx: Infinium Global Screening Array with Enhanced PGx-48 v4.0**, **GCRA: Infinium Global Clinical Research Array-24 v1.0 Kit**, **GCRA-ePGx: Infinium Global Clinical Research Array with Enhanced PGx-24 v1.0 Kit**, and other customized **Infinium EX** products.
* Methylation assays that use **MethylationEPIC** reagents/chemistry: start with `config_methylation_EPIC.yaml`.
* Methylation assays that use **Infinium EX** reagents/chemistry: start with `config_methylation_MSA.yaml`. Example products include **Infinium Methylation Screening Array-48 Kit** and **Infinium Methyl EX iSelect Custom BeadChip (24/48 formats)**.

If you are using a customized product and are unsure which assay chemistry it uses, contact [Illumina Technical Support](mailto:techsupport@illumina.com) before selecting a QC configuration file.

You can edit the QC config YAML using a plain‑text editor (for example, Notepad).

{% hint style="warning" %}
When saving a config file from Windows Notepad, verify that the file extension remains `.yaml` or `.yml` and was not changed to `.yaml.txt` or `.yml.txt`. If needed, use **Save As**, set **Save as type** to **All Files**, and enter a filename such as `config.yaml`.
{% endhint %}

For guidance on adjusting DNA methylation QC thresholds, see the following [Illumina documentation](https://help.dragenarray.illumina.com/product-guides/dragen-array-cloud-analysis/dragen-array-methylation-qc#methylation-qc-threshold-adjustment).

### Apply a configuration file

Use the `--config` CLI flag to apply the selected configuration file:

```bash
dragena qc report \
  --data data/parent_folder \
  --config <recommended_config>.yaml \
  --output-folder output/parent_folder_report
```

{% hint style="info" %}
If `--config` is omitted, the report uses built-in defaults.
{% endhint %}

### How suggested thresholds were derived

The suggested thresholds in the example configuration files were derived empirically from an internal review of more than 10 datasets spanning both expected good-quality samples and failed samples.

For each metric, Illumina evaluated the observed distribution of values, including the center and spread of the apparent null or background distribution, and used that information to choose practical starting thresholds for routine QC review.

These values are intended as **suggested starting points**, not universal acceptance criteria. Users should review and revise thresholds for their own assay, sample type, laboratory workflow, scanner settings, bisulfite conversion method, FFPE usage, and historical performance.

If a dataset repeatedly shows a consistent offset for one of these control metrics while other QC evidence remains acceptable, review the threshold in context rather than treating the default value as absolute.

**Note:** For `config_genotyping.yaml`, the suggested thresholds were evaluated using LCG and HTS datasets only.

{% hint style="info" %}
For control metrics, configured thresholds are primarily used to flag samples for review in the QC report. They are recommended operating cutoffs, not assay-independent pass/fail truths.
{% endhint %}

### Generic example YAML structure (illustrative only)

The two YAML blocks below are **generic illustrative examples**, not chemistry-specific recommended starting points.

* The genotyping example below is not specific to **Infinium EX** or **LCG/HTS** chemistry.
* The methylation example below is not specific to **Infinium EX/MSA** or **MethylationEPIC** chemistry.
* For chemistry-specific starting points, use the recommended template files listed above.

#### Generic genotyping YAML example

```yaml
QC:
  Report:
    # Core sample metrics
    callRate: 0.98                # Minimum acceptable call rate.
    logRDev: 0.15                 # Maximum acceptable Log R Ratio SD.

    # Shared control probe metrics
    stainingGreen: 5              # Minimum Biotin High / Background ratio.
    stainingRed: 5                # Minimum DNP High / Background ratio.
    extensionGreen: 5             # Minimum green-channel extension ratio.
    extensionRed: 5               # Minimum red-channel extension ratio.
    targetRemovalI: 5             # Minimum target removal ratio for Control I.
    hybGreenHighMed: 1.25         # Minimum hybridization High / Medium ratio.
    hybGreenMediumLow: 1.25       # Minimum hybridization Medium / Low ratio.
    npGreen: 5                    # Minimum non-polymorphic green ratio.
    npRed: 5                      # Minimum non-polymorphic red ratio.
    restoration: null             # Use 1 when the FFPE Restore kit is used.

    # Genotyping-specific metrics
    stringency: 1.5               # Minimum stringency ratio.
    nonSpecificBindingGreen: 5    # Minimum nonspecific binding green ratio.
    nonSpecificBindingRed: 5      # Minimum nonspecific binding red ratio.
    tgaControl: 1                 # Minimum TGA_Ctrl_5716 normalized R value.
```

#### Generic DNA methylation YAML example

```yaml
QC:
  Report:
    # Staining controls
    stainingGreen: 5             # Minimum Biotin High / Background ratio.
    stainingRed: 5               # Minimum DNP High / Background ratio.

    # Extension controls
    extensionGreen: 5            # Minimum green-channel extension ratio.
    extensionRed: 5              # Minimum red-channel extension ratio.

    # Hybridization controls
    hybGreenHighMed: 1           # Minimum hybridization High / Medium ratio.
    hybGreenMediumLow: 1         # Minimum hybridization Medium / Low ratio.

    # Restoration control
    restoration: null            # Use 1 when the FFPE Restore kit is used.

    # Target removal controls
    targetRemovalI: 1            # Minimum target removal ratio for Control I.
    targetRemovalII: 1           # Minimum target removal ratio for Control II.

    # Bisulfite conversion controls
    bisulfiteConversionIGreen: 1     # Minimum Type I green conversion ratio.
    bisulfiteConversionIGreenBg: 0.5 # Minimum Type I green background ratio.
    bisulfiteConversionIRed: 1       # Minimum Type I red conversion ratio.
    bisulfiteConversionIRedBg: 0.5   # Minimum Type I red background ratio.
    bisulfiteConversionII: 0.5       # Minimum Type II conversion ratio.
    bisulfiteConversionIIBg: 0.5     # Minimum Type II background ratio.

    # Specificity controls
    specificityIGreen: 1         # Minimum Type I green specificity ratio.
    specificityIRed: 1           # Minimum Type I red specificity ratio.
    specificityII: 1             # Minimum Type II specificity ratio.
    specificityIIBg: 1           # Minimum Type II background specificity ratio.

    # Non-polymorphic controls
    npGreen: 2.5                 # Minimum non-polymorphic green ratio.
    npRed: 3                     # Minimum non-polymorphic red ratio.
```

***

## Output files

| File name                                     | Description                                                                                                                                                                                                                                                                     |
| --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DRAGENArray_QC_Report_YYYY_MM_DD.html`       | Self-contained interactive HTML report intended to be distributable and viewable offline in a web browser. The report includes dashboards such as Control Dashboard, Automated QC, Sample QC Heatmaps, Trend Analysis, and a QC Metric Config menu for threshold customization. |
| `DRAGENArray_QC_table_YYYY_MM_DD.<xlsx\|csv>` | Per-sample QC table (one row per sample) containing functional metrics (when available), derived control metrics, and selected raw control-probe intensities. The file extension depends on `--output-format` (`xlsx` or `csv`).                                                |

***

## QC evaluation criteria <a href="#section-qc-evaluation-criteria" id="section-qc-evaluation-criteria"></a>

### Functional QC (per sample)

Functional QC evaluates the overall genotyping performance of each sample:

* **Autosomal Call Rate**\
  Fraction of autosomal probes successfully called for a sample. Higher values indicate better performance.
* **Log R Ratio Standard Deviation (LogRDev)**\
  Measures signal noise across probes. Lower values indicate more stable intensity measurements.

#### Functional QC status (PASS/FAIL)

A sample’s functional QC status is determined by comparing its metrics to configured thresholds:

* **PASS**
  * `Autosomal Call Rate` ≥ threshold
  * `LogRDev` ≤ threshold
* **FAIL**
  * One or both metrics fall outside thresholds
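
As a minimal sketch of the rule above (not the report's actual implementation), the decision for one sample can be written as a small shell function. The function name `functional_qc_status`, its argument order, and the threshold values in the usage lines are illustrative assumptions. An empty threshold argument stands in for a disabled check.

```bash
# Hypothetical helper, for illustration only.
# Usage: functional_qc_status <call_rate> <logrdev> <min_call_rate> <max_logrdev>
# Pass an empty string for a threshold to skip that check.
functional_qc_status() {
  awk -v cr="$1" -v lrd="$2" -v min_cr="$3" -v max_lrd="$4" 'BEGIN {
    cr_ok  = (min_cr  == "" || cr  + 0 >= min_cr  + 0)   # Autosomal Call Rate >= threshold
    lrd_ok = (max_lrd == "" || lrd + 0 <= max_lrd + 0)   # LogRDev <= threshold
    if (cr_ok && lrd_ok) print "PASS"; else print "FAIL"
  }'
}

# Metric values are made up; 0.99 and 0.2 are placeholder thresholds.
functional_qc_status 0.995 0.12 0.99 0.2   # prints PASS
functional_qc_status 0.995 0.35 0.99 0.2   # prints FAIL (LogRDev above threshold)
```

A sample passes only when every enabled check passes, so a single metric outside its threshold is enough to produce FAIL.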

{% hint style="info" %}
Functional QC requires a per-sample metric file (for example, `gt_sample_summary.csv`). If that file is not provided, functional QC is unavailable. Functional QC does not currently support methylation data. For methylation QC, see [DRAGEN Array - Methylation QC](/product-guides/dragen-array-cloud-analysis/overview/dragen-array-methylation-qc.md) in the cloud analysis guide.
{% endhint %}

***

### Control-based QC (per sample)

Control-based QC evaluates whether array chemistry and processing performed as expected. Control metrics originate from:

* `controls.raw_metrics.csv` — raw, probe-level control intensities
* `controls.qc_metrics.csv` — summarized per-sample control QC values

Common control categories include:

* Staining
* Extension
* Hybridization (High/Medium/Low)
* Non-polymorphic
* Non-specific binding
* Target removal
* Stringency
* Restoration (when applicable)
* Bisulfite conversion controls (methylation arrays)
* Specificity (methylation arrays)

For more background on interpreting Infinium controls, see: [Evaluation of Infinium Genotyping Assay Controls Training Guide](https://support.illumina.com/content/dam/illumina-support/courses/eval-inf-controls/story_content/external_files/Infinium_Controls_Training_Guide.pdf)

#### Control QC status & flags

Each control metric is compared to its configured threshold. When a value is outside the acceptable range, the sample receives a **flag** indicating the affected control and channel.
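
The comparison can be sketched as follows, assuming each threshold is a minimum (as in the threshold configuration shown earlier in this guide) and that `null` disables a check. The helper name `control_flags` and the three-column input layout are hypothetical, and the printed metric key only stands in for the flag text that the report actually displays.

```bash
# Hypothetical helper, for illustration only.
# Reads lines of "<metric> <value> <minimum>" on stdin and prints the metrics
# whose value is below the minimum. A minimum of "null" means the check is disabled.
control_flags() {
  awk '$3 != "null" && $3 != "" && ($2 + 0) < ($3 + 0) { print $1 }'
}

# Metric keys and minimums come from the example config; the values are made up.
printf '%s\n' \
  "stainingRed 3.2 5" \
  "extensionGreen 12.4 5" \
  "restoration 0.4 null" | control_flags
# prints: stainingRed
```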

***

### Interpreting combined QC results

The report shows both functional QC and control-based QC for every sample:

* A sample may **PASS functional QC** but still receive **control warnings**.
* Multiple or severe control failures may indicate assay-related issues that impact downstream results.

This combined view can help distinguish:

* **Biological failures** (for example, degraded DNA)
* **Technical failures** (for example, staining or hybridization issues)

***

## HTML QC report

The HTML report is organized into dashboards designed for routine QC review and deeper troubleshooting.

### Control dashboard

The Control Dashboard provides interactive visualizations of raw control-probe intensities to help identify instrument and assay issues. These plots help you review whether control signals such as staining, extension, hybridization, and non-polymorphic probes behave as expected across samples. You can:

* Select samples directly from plots to highlight those same samples across related plots in both the Control Dashboard and the Automated QC dashboard
* Hover to see details (for example, sample ID, barcode, position, autosomal call rate when available)
* Explore distributions and trends to detect outliers or systematic shifts
* Zoom and filter for focused investigation

To clear a plot-based selection, double-click in the plot area. This restores the full sample set in the Automated QC table and removes the cross-plot highlighting.

![Control dashboard](/files/jgm67BbOMGAbFvs1pGFs)

#### Chart toolbar reference

Each plot includes a toolbar for zooming, panning, selection, autoscaling, and exporting.

![Plotly toolbar controls](/files/Yda41qtaIQ1tEicIze5u)

***

### Automated QC

The Automated QC dashboard provides a consolidated, objective view of sample-level QC by applying QC rules and thresholds across all samples. Use it to:

* Quickly assess pass/fail status
* Identify samples and metrics outside thresholds
* Support decisions for sample inclusion, reprocessing, or follow-up analysis

#### Functional QC Status Histogram

This histogram shows the number and percentage of samples in the dataset that are classified as **PASS** or **FAIL** for functional QC.

Functional QC status is determined from the sample-level functional metrics, using the Autosomal Call Rate and LogRDev thresholds currently applied in the report. Those thresholds can come from the defaults, a YAML config file, command-line overrides, or the **QC Metric Config** settings described in [Updating thresholds in the HTML report](#updating-thresholds-in-the-html-report).

A sample is counted as **FAIL** if it fails any individual functional QC metric that is currently enabled. Otherwise, the sample is counted as **PASS**.

#### Control-Based QC Status Histogram

This histogram shows the number and percentage of samples in the dataset that are **FLAGGED** or **CLEAR** for control-based QC.

Control-based QC status is determined from the control QC metric thresholds currently applied in the report. A sample is counted as **FLAGGED** if any individual control QC metric is outside its applied threshold. Samples without any active control-based QC flags are counted as **CLEAR**.

#### Sample QC table

The Sample QC table provides a per-sample summary of the same QC decisions shown in the histograms, along with the underlying metrics used to support review. It includes:

* Functional QC status (PASS/FAIL) when functional metrics are available
* Control QC flags and annotations that highlight values outside thresholds
* Sorting and filtering to focus on failing samples or specific metrics

The table can be exported in either `xlsx` or `csv` format. The `xlsx` output preserves conditional coloring, while the `csv` output contains values only. For more detail about the table output formats, see [`--output-format <csv|xlsx>`](#output-format-csvxlsx).

![Automated QC table](/files/IkkMVti8hICaAFZDkoRc)

#### QC metric plots

Supporting plots in the Automated QC dashboard complement the table by showing **per-sample QC metric scatter plots**. In these plots, the x-axis represents samples in the current dataset view, and the y-axis represents one derived QC metric. You can sort the sample order by **Autosomal Call Rate**, **Log R Dev**, any propagated user-provided metadata field, or any derived QC metric available in the report. You can also color samples by eligible propagated metadata fields to help reveal group-specific patterns or batch effects.

For genotyping arrays, the initial point colors are based on **functional QC status**: **PASS** or **FAIL** determined from the applied **Autosomal Call Rate** and **Log R Dev** thresholds.

Plotted metrics can include control-derived metrics such as **Staining**, **Extension**, **Hybridization**, **Target Removal**, **Nonpolymorphic**, **Stringency**, **Specificity**, and **Bisulfite Conversion**. For example, **Staining Red** is calculated as `DNP High Red / DNP Bkg Red`, and **Stringency** is calculated as `Stringency PM (Red) / Stringency MM (Red)`.

Each Automated QC scatter plot also includes help text with the formula used to derive the selected metric.

Most control metrics are constructed as ratios that compare expected signal against background or against an opposing control signal. That makes them more stable for QC review than raw intensities alone because even if absolute intensities shift between scanners or runs, the signal-versus-background relationship is expected to remain relatively consistent.
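
A small worked example may make this concrete. The sketch below computes a signal-to-background ratio such as `DNP High Red / DNP Bkg Red`. The helper name `control_ratio` and the intensity values are made up for illustration.

```bash
# Hypothetical helper, for illustration only.
# Usage: control_ratio <signal_intensity> <background_intensity>
control_ratio() {
  awk -v s="$1" -v b="$2" 'BEGIN {
    if (b + 0 == 0) { print "NA"; exit }   # avoid division by zero
    printf "%.2f\n", s / b
  }'
}

control_ratio 20000 1000   # prints 20.00
control_ratio 10000 500    # prints 20.00 (both intensities halved, same ratio)
```

Halving both the signal and the background leaves the ratio unchanged, which is why a ratio threshold can remain meaningful when absolute intensities shift.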

When a QC threshold is set for a plotted metric, the scatter plot shows that cutoff as a dashed threshold reference line. These thresholds can come from built-in defaults, a YAML config file, command-line overrides, or values applied in the **QC Metric Config** menu. For more detail on how methylation QC control metrics and their recommended starting thresholds are defined, see [Methylation Sample QC Summary Files](https://github.com/illumina-swi/dragen-array-docs/blob/DAv1.4/docs/output-files.md#methyl_qc_report) and [Methylation QC Threshold Adjustment](/product-guides/dragen-array-cloud-analysis/overview/dragen-array-methylation-qc.md#section-methylation-qc-threshold-adjustment).

These plots also participate in linked selection. When you select samples in one scatter plot, those same samples are highlighted in the other scatter plots across the Automated QC and Control Dashboard views, and the Automated QC table is filtered to show only the selected samples. To clear the selection, double-click in the plot area.

These plots help you:

* See which individual samples fall outside a QC threshold
* Compare sample-to-sample variation for one metric at a time
* Relate noisy or failing samples to other QC views in the report

In the example figure below, the bottom plot shows a group of samples below the **Stringency** threshold line, and those points are also colored as functional QC **FAIL**. Other samples in the same figure are also colored as **FAIL** even though they do not fall below the Stringency cutoff, which suggests that other QC metrics may be contributing to the failure. Review the other QC metric plots and the table to identify the control probes or sample-level metrics associated with those samples.

![Automated QC scatter plots](/files/jEfzeHVz0sCiqXVpGmdX)

***

### Sample QC heatmaps

The Sample QC Heatmaps dashboard provides a spatial view of sample-level QC metrics across chips or plates to help detect spatial artifacts and localized issues.

#### Overview

* Each cell represents a sample
* Cell color reflects a selected QC metric (for example Autosomal Call Rate, LogRDev)
* Hover shows sample identifiers and metric values

Plate information may be derived from:

* IDAT metadata, or
* A user-provided sample sheet (columns: `Sample_Plate` and `Sample_Well`)

{% hint style="info" %}
If both sources provide plate information, the report uses the user-provided sample sheet values.
{% endhint %}

![Heatmap by chip](/files/95UJJOmvjXLLd2oafwL2)

![Heatmap by plate](/files/psxcUZkElx74VN8LCb0J)

***

### Trend analysis

The Trend Analysis dashboard provides a high-level view of QC trends across multiple datasets, runs, batches, or instruments.

For local analysis, this cross-dataset summary is available when you generate one report from multiple dataset folders. If you generate the report from a single dataset folder, the dashboard still appears but summarizes that one dataset only. Current cloud QC reports also operate on a single dataset at a time.

Use it to:

* Monitor changes over runs and scan dates
* Detect drift, batch effects, or instrument-specific anomalies
* Compare QC performance between datasets or groups of chips

#### Summary table

The summary table aggregates QC metrics at the dataset level (for example):

* Number of samples / chips
* Scan date range
* Autosomal Call Rate statistics (min/mean/standard deviation, counts above/below threshold)
* LogRDev statistics (mean/standard deviation, counts above threshold)
* TGA control statistics for PGx Genotyping datasets
* Sex prediction summary (number of males, females and unknowns)

![Trend analysis table](/files/y9L1DObJWACz7Bn12kkR)

#### Summary plots

Summary plots provide visual comparisons across datasets or barcodes, such as:

* Autosomal Call Rate distributions
* LogRDev box plots
* Sample count plots
* Sex estimate distributions

For example, box-and-whisker plots in Trend Analysis summarize how a metric is distributed within each dataset so you can compare center, spread, and outliers across runs. In a box plot, the box represents the middle 50% of the data distribution, the center line marks the median, and the whiskers extend to the smallest and largest non-outlier values shown for that dataset. When a threshold is configured for a plotted metric, the plot also shows the applied cutoff as a dashed reference line.

The following schematic shows the main parts of a box-and-whisker plot:

![Box-and-whisker plot anatomy: whiskers extend from the box to the smallest and largest non-outlier values. The box shows the interquartile range (Q1-Q3), which is the middle 50% of the data.](/files/nic1dz2xdAe1aQ3orbt2)

![Trend analysis plots](/files/qktSf4AQcYR5y98vpL7O)

***

## Updating thresholds in the HTML report

You can update QC threshold cutoffs directly in the HTML report:

1. Open **QC Metric Config** (⚙).
2. Enter new numeric values for metrics you want to enforce.
   * Leave a field blank (or set it to `null` in YAML) to **disable** that check.
3. Click **Apply Thresholds**.

### Buttons

* **Apply Thresholds**: Recalculates pass/fail and refreshes visuals, including threshold reference lines in plots, using the current values.
* **Download Thresholds**: Exports the currently applied thresholds as a YAML file.
* **Upload Thresholds**: Imports a YAML file and fills the threshold fields.

{% hint style="warning" %}
Changes made in the HTML report are **session-only** and apply only to the currently opened report file.

**Download Thresholds** exports the currently applied thresholds as a **YAML file**. To reuse the same thresholds in a future run, pass that YAML file to the `dragena qc report` command (for example, via `--config`) when generating a new report. **Upload Thresholds** loads a YAML file back into the UI fields for the current report.
{% endhint %}
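
Before reusing a downloaded thresholds file, a quick pre-flight check can catch the config problems listed under [Warning and error messages](#warning-and-error-messages). The sketch below is a rough text check, not a YAML parser. The function name `check_qc_config`, the file name `thresholds.yaml`, and the output folder in the usage comment are assumptions.

```bash
# Hypothetical helper, for illustration only.
# Returns 0 when the file exists, has a .yaml/.yml extension, and appears to
# contain a top-level QC object with a nested Report object.
check_qc_config() {
  cfg="$1"
  [ -f "$cfg" ] || { echo "config not found: $cfg" >&2; return 1; }
  case "$cfg" in
    *.yaml|*.yml) ;;
    *) echo "config must end in .yaml or .yml: $cfg" >&2; return 1 ;;
  esac
  if grep -Eq '^QC:' "$cfg" && grep -Eq '^[[:space:]]+Report:' "$cfg"; then
    return 0
  fi
  echo "expected a top-level QC object containing Report: $cfg" >&2
  return 1
}

# Example usage (paths are placeholders):
# check_qc_config thresholds.yaml && \
#   dragena qc report --data project_folder/ --config thresholds.yaml \
#     --output-folder output/rerun_with_saved_thresholds
```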

![QC thresholds config](/files/BL9RwKDWHBlNFGmitvs9)

***

## Interactive features

### Filter table samples using scatter plot selection

Selecting samples in a scatter plot filters the corresponding rows in the Automated QC table. The same selected samples are also highlighted in related scatter plots across the Automated QC and Control Dashboard views. To clear this plot-based filtering and cross-plot highlighting, return to the plot and double-click in the plot area to remove the selection.

![Sample selection filters table](/files/0P7IjMG1CUIaT8reZOW6)

### Highlight samples in scatter plots by selecting from the table

Selecting rows in the table highlights the corresponding points in scatter plots. To clear this table-based highlighting, click **Select None** in the bottom-right corner of the table.

![Table selection highlights samples](/files/IO3BlemNuQXyazEKzdpm)

When exporting the table, any active row selection is preserved. If you click **Excel** or **CSV** while rows are selected, only those selected rows are exported. To export all samples, click **Select None** before exporting.

### Table filtering and sorting

The interactive table supports:

* Showing/hiding columns
* Sorting by any metric
* Filtering by search or criteria
* Combining table filtering with plot selection for cross-linked exploration

***

## Performance & dataset size limits

For performance reasons, large datasets may use a tabbed layout to keep the browser responsive.

* **More than 1,000 samples in a single dataset**: control and automated-QC plots are arranged into tabs (interactive features remain available).
* **More than 12,000 samples in a single dataset**: scatter plots are disabled and replaced with a notice.
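
The two limits can be summarized as a simple lookup on the per-dataset sample count. This is only a sketch of the documented boundaries: the function name `report_layout` and the three layout labels are invented for illustration and do not appear in the report.

```bash
# Hypothetical helper, for illustration only.
# Usage: report_layout <samples_in_one_dataset>
report_layout() {
  n="$1"
  if [ "$n" -gt 12000 ]; then
    echo "scatter plots disabled"
  elif [ "$n" -gt 1000 ]; then
    echo "tabbed plots"
  else
    echo "standard layout"
  fi
}

report_layout 960     # prints standard layout
report_layout 4800    # prints tabbed plots
report_layout 15000   # prints scatter plots disabled
```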

Example tabbed layout:

![Tabbed Control dashboard](/files/kx9aeyZg4WxZxHKQWb2U)

![Tabbed Automated QC](/files/lbijzWbymP2vHRJMFMjZ)

### How to visualize very large projects

If your project exceeds the scatter-plot limit, consider:

* Filtering input by run date/batch/folder and generating separate reports
* Splitting input into logical batches (for example per-run or per-center)

When a QC report HTML file contains more than 20,000 samples across datasets, the report can be slow to load.
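
One way to split a large project is to generate one report per batch folder. The sketch below only prints the commands so you can review them before running anything. It assumes each immediate subfolder of a parent folder is one batch, that folder names contain no spaces, and that the helper name `print_batch_commands` and the `reports` output root are placeholders.

```bash
# Hypothetical helper, for illustration only.
# Usage: print_batch_commands <parent_folder> <output_root>
# Prints one "dragena qc report" command per immediate subfolder.
print_batch_commands() {
  parent="${1%/}"
  out="${2%/}"
  for dir in "$parent"/*/; do
    [ -d "$dir" ] || continue          # skip when the glob matched nothing
    name="$(basename "$dir")"
    printf 'dragena qc report --data %s --label %s --output-folder %s/%s\n' \
      "${dir%/}" "$name" "$out" "$name"
  done
}

# Example: print_batch_commands project_folder reports
```

After reviewing the printed commands, you can pipe the output to `sh` or paste the individual lines into a terminal.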

***

## Example input folder structures

### Single dataset folder

In this example, `project_folder/` is one dataset folder that contains the required input files for a single QC report.

```
project_folder/
├── controls.qc_metrics.csv
├── controls.raw_metrics.csv
└── gt_sample_summary.csv
```

```bash
dragena qc report \
  --data project_folder/ \
  --label pgx2025 \
  --output-folder output/single_project
```

***

### Multiple dataset folders <a href="#multiple-dataset-folders" id="multiple-dataset-folders"></a>

Use this mode in local analysis when you want to combine multiple runs or batches into a single QC report and compare them in **Trend Analysis**.

In this mode, `--data` is a comma-separated list of dataset folder paths. Each path must point directly to a folder that contains the required input files. In the example below, `dataset_A/` and `dataset_B/` are example folder paths.

When writing a list for `--data`, do not include spaces before or after the commas. If you use `--label`, write the labels the same way: one comma-separated item per dataset, with no spaces between items.

```
dataset_A/
├── controls.qc_metrics.csv
├── controls.raw_metrics.csv
└── gt_sample_summary.csv

dataset_B/
├── controls.qc_metrics.csv
├── controls.raw_metrics.csv
└── gt_sample_summary.csv
```

The command below tells `dragena qc report` to load both dataset folders into one report:

```bash
dragena qc report \
  --data dataset_A,dataset_B \
  --label pgx2025,pgx2024 \
  --output-folder output/multiple_datasets_with_labels_provided
```

Use this form when you already know the exact dataset folders you want to include.

***

### Parent folder containing multiple dataset folders <a href="#parent-folder-containing-multiple-dataset-folders" id="parent-folder-containing-multiple-dataset-folders"></a>

Use this mode when many dataset folders are organized under one parent folder and you want the local QC report to discover them automatically.

This is different from the previous example only in how `--data` is used:

* In the previous example, `--data` lists each dataset folder path explicitly.
* In this example, `--data` points to one parent folder path, and the tool searches below that parent folder to find valid dataset folders automatically.

Some users describe this as a "recursive" search. In practice, it means the tool starts from the parent folder and looks through its subfolders for dataset folders that contain the required QC files.

```
project_folder/
├── dataset_A/
│   ├── controls.qc_metrics.csv
│   ├── controls.raw_metrics.csv
│   └── gt_sample_summary.csv
├── dataset_B/
│   ├── controls.qc_metrics.csv
│   ├── controls.raw_metrics.csv
│   └── gt_sample_summary.csv
└── dataset_C/
    └── ...
```

The command below tells `dragena qc report` to start from the parent folder `project_folder` and automatically discover dataset folders under it:

```bash
dragena qc report \
  --data project_folder \
  --output-folder output/parent_folder_report
```
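
To preview which folders are likely to qualify before running the report, you can approximate the search with `find`. This is a sketch under assumptions: it treats any folder that contains both `controls.qc_metrics.csv` and `controls.raw_metrics.csv` as a dataset folder and searches at any depth, which may not match the tool's actual discovery depth or ordering. The helper name `find_dataset_folders` is hypothetical.

```bash
# Hypothetical helper, for illustration only.
# Lists folders under <parent> that contain both required control metric files.
find_dataset_folders() {
  find "$1" -type f -name 'controls.qc_metrics.csv' | sort | while IFS= read -r f; do
    d="$(dirname "$f")"
    if [ -f "$d/controls.raw_metrics.csv" ]; then
      printf '%s\n' "$d"
    fi
  done
}

# Example: find_dataset_folders project_folder
```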

If you do not provide `--label`, the report assigns a dataset label based on each detected dataset folder name (for example `dataset_A`, `dataset_B`).

To compare multiple runs in local analysis, use one of these two approaches:

* Use `--data dataset_A,dataset_B,...` when you want to list the dataset folder paths yourself.
* Use `--data project_folder` when `project_folder` is a parent folder that contains many dataset folders and you want the tool to discover them automatically.

**Parent-folder discovery labels and ordering**

* If you supply `--label`, labels are assigned in the order datasets are discovered.
* If multiple datasets share the same folder name, a numeric suffix is appended (for example `datasetA`, `datasetA_2`) to ensure uniqueness.

***

## Troubleshooting & FAQ

<details>

<summary><strong>Missing samples</strong></summary>

**Possible cause:** Sample IDs do not match across input files.\
**Recommended action:** Align Sample IDs across `controls.*` and (if provided) `gt_sample_summary.csv`.
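
To find the mismatched IDs, you can compare the ID column of two input files. The sketch below assumes simple CSV files without quoted commas. The column name `Sample_ID` in the usage comment is a placeholder, so substitute the actual ID column header from your files. The helper name `missing_ids` is hypothetical.

```bash
# Hypothetical helper, for illustration only.
# Usage: missing_ids <reference.csv> <other.csv> <id_column_name>
# Prints IDs found in the reference file but not in the other file.
missing_ids() {
  awk -F',' -v col="$3" '
    FNR == 1 { c = 0; for (i = 1; i <= NF; i++) if ($i == col) c = i; next }  # locate the column per file
    c == 0   { next }                     # column not present in this file
    FNR == NR { ref[$c] = 1; next }       # rows of the first (reference) file
    { seen[$c] = 1 }                      # rows of the second file
    END { for (id in ref) if (!(id in seen)) print id }
  ' "$1" "$2" | sort
}

# Example (column name is a placeholder):
# missing_ids controls.qc_metrics.csv gt_sample_summary.csv Sample_ID
```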

</details>

<details>

<summary><strong>Unexpected QC failures</strong></summary>

**Possible cause:** Thresholds are too strict for your dataset or application.\
**Recommended action:** Review and adjust thresholds using `--config` or the **QC Metric Config** menu in the HTML report.

</details>

<details>

<summary><strong>Scatter plots are disabled</strong></summary>

**Possible cause:** The dataset exceeds **12,000 samples**, the limit above which scatter plots are disabled.\
**Recommended action:** Filter or split the dataset and generate separate reports.

</details>

<details>

<summary><strong>CSV parsing errors</strong></summary>

**Possible cause:** Quoting/encoding issues.\
**Recommended action:** Ensure UTF-8 encoding and valid comma delimiters.

</details>

<details>

<summary><strong>Report is extremely slow to load or does not finish loading</strong></summary>

**Possible cause:**\
Very large datasets (for example, >50,000 samples) generate large input files that require significant client‑side processing. In the current implementation, extremely large datasets (for example, \~100,000 samples) may exceed browser or memory limits and fail to load completely.

**Recommended action:**\
Filter or split the dataset into smaller subsets and generate separate QC reports for each subset.

</details>

<details>

<summary><strong>What happens if the input folder contains both genotyping and methylation datasets?</strong></summary>

If `dragena qc report` detects both genotyping and methylation dataset folders within the same `--data` input set, report generation stops with an error.

In this guide, **mixed dataset types** specifically means this combination.

**Example:** A parent folder passed to `--data` contains one dataset folder with methylation `qc call` outputs and two dataset folders with genotyping `qc call` outputs.

This folder structure triggers the mixed-dataset error:

```
datasets/
├── dataset1_methylation/
│   ├── controls.qc_metrics.csv
│   └── controls.raw_metrics.csv
├── dataset2_genotyping_pgx/
│   ├── controls.qc_metrics.csv
│   ├── controls.raw_metrics.csv
│   └── gt_sample_summary.csv
└── dataset3_genotyping_non_pgx/
    ├── controls.qc_metrics.csv
    ├── controls.raw_metrics.csv
    └── gt_sample_summary.csv
```

```bash
dragena qc report --data datasets
```

**Recommended action:** Generate separate QC reports for genotyping and methylation inputs. For example, separate them into different parent folders:

```
reports_input/
├── methylation/
│   └── dataset1_methylation/
│       ├── controls.qc_metrics.csv
│       └── controls.raw_metrics.csv
└── genotyping/
    ├── dataset2_genotyping_pgx/
    │   ├── controls.qc_metrics.csv
    │   ├── controls.raw_metrics.csv
    │   └── gt_sample_summary.csv
    └── dataset3_genotyping_non_pgx/
        ├── controls.qc_metrics.csv
        ├── controls.raw_metrics.csv
        └── gt_sample_summary.csv
```

```bash
dragena qc report --data reports_input/methylation
dragena qc report --data reports_input/genotyping
```

</details>

<details>

<summary><strong>What happens if the input folder contains both PGx and non-PGx genotyping datasets?</strong></summary>

This combination is supported. The report is generated as long as each dataset folder contains the required input files.

PGx datasets can include **TGA control** information, while non-PGx datasets do not. Other shared genotyping QC outputs are generated normally.

For example, if the input contains one PGx dataset folder and one non-PGx genotyping dataset folder, the report still runs as a single genotyping report. TGA-related values are populated only for the PGx dataset.

</details>

<details>

<summary><strong>What happens if the input folder contains both legacy and EX genotyping chips?</strong></summary>

This combination is supported and does not trigger the mixed-dataset error.

There is no special difference in report generation or QC interpretation beyond chip-format-specific heatmap layout. Because legacy and EX chips use different physical layouts, the chip heatmaps can appear different even when the rest of the report is comparable.

The same guidance applies to different versions of the same genotyping BeadChip family: they are treated as supported genotyping inputs rather than as mixed dataset types.

</details>

***

## Warning and error messages

During report generation, the following situations may trigger warnings or errors:

* Input data directory does not exist
* Required input files are missing in one or more dataset folders
* Number of labels does not match number of detected datasets
* Labels are modified during sanitization and become non-unique
* Genotyping and methylation dataset folders are detected together in a single run
* Invalid output format is specified
* Invalid or unsupported QC threshold configuration is provided

Note: **mixed dataset types** refers specifically to combining **genotyping** and **methylation** datasets in one report. Supported mixed genotyping inputs, such as **PGx + non-PGx**, **legacy + EX**, or different versions of the same genotyping BeadChip family, do **not** trigger this error.

### Examples

| Message                                                                                                                                                | Cause                                                                                                                                                                       |
| ------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ❌ Input data folder does not exist: `data/non_existent_dir`                                                                                            | Input directory does not exist                                                                                                                                              |
| ❌ No valid dataset folders detected. Each folder must include at least: `controls.raw_metrics.csv`, `controls.qc_metrics.csv`                          | No folders containing both required CSV files were found                                                                                                                    |
| ❌ Dataset `MyDataset` is missing required file(s): ... `controls.qc_metrics.csv` ...                                                                   | One or more required control metric files are absent                                                                                                                        |
| ❌ Number of labels (2) does not match number of datasets (3).                                                                                          | `--label` count does not match detected datasets                                                                                                                            |
| ⚠️ Label sanitized: `My Label<1>` → `My_Label_1`                                                                                                       | Label contained unsupported characters                                                                                                                                      |
| ❌ Duplicate labels after sanitization: `dataset` (x2). Each dataset label must be unique.                                                              | Labels became non-unique after sanitization                                                                                                                                 |
| ❌ Mixed dataset types are not supported in a single report. Genotyping datasets: `geno_run_A`, `geno_run_B` ... Methylation datasets: `meth_run_C` ... | A single run included both genotyping and methylation dataset folders. This error is not raised for supported mixed genotyping inputs such as PGx + non-PGx or legacy + EX. |
| ❌ Invalid `--output-format` value. Only `csv` or `xlsx` are accepted.                                                                                  | Unsupported output format                                                                                                                                                   |
| ❌ QC config file was provided but does not exist: `/path/to/config.yaml`                                                                               | Config path does not exist                                                                                                                                                  |
| ❌ Input file must be a YAML file with extension `.yaml`/`.yml`. Current config file extension: `.json`                                                 | Config must be YAML                                                                                                                                                         |
| ❌ Failed to parse QC config as YAML ...                                                                                                                | YAML syntax error                                                                                                                                                           |
| ❌ QC config must be wrapped in a top-level `QC` object containing a `Report` object.                                                                   | Config structure is incorrect                                                                                                                                               |
| ❌ Invalid threshold overrides: Unknown threshold key ...                                                                                               | Unrecognized threshold name                                                                                                                                                 |
| ❌ Invalid threshold overrides: Invalid numeric value ...                                                                                               | Value could not be parsed as a number                                                                                                                                       |
| ❌ Threshold for `Autosomal Call Rate` must be between 0 and 1 ...                                                                                      | Autosomal Call Rate threshold out of bounds                                                                                                                                 |
| ❌ Threshold for `stainingGreen` must be non-negative ...                                                                                               | Threshold must be ≥ 0                                                                                                                                                       |
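
To check labels before a run, the sketch below applies one possible sanitization rule that reproduces the `My Label<1>` example from the table, then reports labels that collide afterward. The tool's real sanitization rule is not documented here and may differ. The helper names `sanitize_label` and `duplicate_labels` are hypothetical.

```bash
# Hypothetical helpers, for illustration only; the tool's real rule may differ.
# Replace each run of characters outside [A-Za-z0-9_-] with "_", then trim "_" from both ends.
sanitize_label() {
  printf '%s\n' "$1" | sed -E 's/[^A-Za-z0-9_-]+/_/g; s/^_+//; s/_+$//'
}

# Print any label that occurs more than once after sanitization.
duplicate_labels() {
  for label in "$@"; do sanitize_label "$label"; done | sort | uniq -d
}

sanitize_label 'My Label<1>'             # prints My_Label_1
duplicate_labels 'data set' 'data/set'   # prints data_set
```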

