DRAGEN Array QC Report

The DRAGEN Array QC Report is a self-contained, interactive HTML dashboard that helps you evaluate the quality of microarray datasets processed with the DRAGEN Array pipeline. It combines per-sample functional QC metrics, control-probe QC metrics (probe-level and summarized), and interactive visualizations to help you quickly:

  • Inspect per-sample metrics such as Autosomal Call Rate, Log R Ratio Standard Deviation (LogRDev), and Sex estimate

  • Detect assay or instrument issues using control-probe intensity patterns

  • Identify outliers, spatial artifacts, and batch effects using heatmaps and trend plots

  • Apply automated QC thresholds and export results for downstream review

  • Quickly calculate project-wide average call rate and LogRDev, and monitor trends across multiple datasets


Analysis Workflow

Use the following instructions to generate an interactive QC Report (HTML format) and QC Table (spreadsheet format). If you used a Sample Sheet in the upstream workflow, user-defined metadata can be carried into the final QC report outputs. See Command Index for all command parameters.

Methylation and genotyping workflow differences are highlighted below.

| Workflow | Upstream command | Key inputs | Dataset folder contents for dragena qc report | When to use |
| --- | --- | --- | --- | --- |
| Methylation | dragena qc call | CSV manifest (--csv-manifest) + IDAT folder | controls.raw_metrics.csv + controls.qc_metrics.csv | Standard methylation QC-report workflow |
| Genotyping, recommended | dragena genotype call | BPM manifest (--bpm-manifest) + cluster file (--cluster-file) + IDAT folder | controls.raw_metrics.csv + controls.qc_metrics.csv + gt_sample_summary.csv | Recommended when you want the richer HTML QC report experience, including functional QC, Autosomal Call Rate and LogRDev views, and sample heatmaps |
| Genotyping, limited | dragena qc call | CSV manifest (--csv-manifest) + --array-type genotyping + IDAT folder | controls.raw_metrics.csv + controls.qc_metrics.csv | Use only for control-based genotyping QC inputs when gt_sample_summary.csv is not needed |

Methylation workflow

Run dragena qc call with a CSV manifest (--csv-manifest) and the IDAT folder. The output folder contains controls.raw_metrics.csv and controls.qc_metrics.csv, which are the inputs for dragena qc report.

Genotyping workflow

For genotyping, dragena genotype call is the recommended upstream path because it produces gt_sample_summary.csv, which enables functional QC metrics and richer QC-report visualizations such as the Autosomal Call Rate and LogRDev views and the sample heatmaps.

Recommended path: dragena genotype call

Use this path for most genotyping datasets. It uses the BPM manifest and cluster file, and the output folder already contains the control QC files plus gt_sample_summary.csv for functional QC reporting.

Control-only path: dragena qc call

Use this path when you only need control-based genotyping QC inputs. It uses a CSV manifest and does not generate gt_sample_summary.csv, so the QC report will not include functional QC metrics.

If you have existing gt_sample_summary.csv files generated by DRAGEN Array versions earlier than v1.4.0, you can combine them with the outputs of dragena qc call. Point the --output-folder option of dragena qc call at the existing output folder that contains those files, and then generate the full QC Report.

Use dragena qc call for methylation or for control-only genotyping inputs that rely on a CSV manifest. For genotyping, prefer dragena genotype call because it uses the BPM manifest plus cluster file and generates gt_sample_summary.csv, enabling functional QC metrics and the fuller QC-report feature set. In either workflow, the --data option of dragena qc report points to the dataset folder, a parent folder, or a comma-separated list of dataset folders.

Instructions

  1. Open a command prompt (Windows) or terminal window (Linux) and navigate to the directory where the software was installed. Alternatively, navigate to any working directory if the executable was added to your PATH.

  2. Generate the QC input files using one of the following commands:

  • For genotyping datasets, use dragena genotype call in most cases. This is the recommended path when you want both control-based QC and functional QC metrics such as Autosomal Call Rate, LogRDev, and Sex estimate.

  • Use dragena qc call if you want control-based QC inputs only, if you already have a gt_sample_summary.csv file for the input IDATs, or if you are preparing QC-report inputs for methylation datasets. For genotyping, this command uses a CSV manifest and does not produce gt_sample_summary.csv.

  3. Prepare a dataset folder for dragena qc report. The dataset folder must contain at least the following files:

    • controls.raw_metrics.csv (required)

    • controls.qc_metrics.csv (required)

    • gt_sample_summary.csv (highly recommended)

If you ran dragena genotype call for the dataset you want to review, you can use the genotype output folder directly as the dataset folder because it already contains controls.raw_metrics.csv, controls.qc_metrics.csv, and gt_sample_summary.csv.

If you want to combine multiple datasets or use a parent folder that contains many dataset folders, see Example input folder structures below.

  4. Run dragena qc report using the dataset folder from Step 3 as input. The --config file is optional. See Configuration file (optional) below for template links and default-threshold behavior.

If you want to use the genotype output folder directly and do not need a custom config file, the command can be as simple as: dragena qc report --data <genotype_output_folder>

If you want to combine multiple known dataset folders into one QC report, provide them as a comma-separated list after --data. If you also provide --label, use the same comma-separated style and supply one label per dataset. Do not add spaces between items; use commas only.

  5. Open the HTML report in your web browser:

    • DRAGENArray_QC_Report_YYYY_MM_DD.html

If you generated the QC report in BaseSpace Sequence Hub (BSSH), download the HTML report to your computer and then double-click the file to open it in your web browser.

If you provide a parent folder (for example --data data/parent_folder), the tool can automatically discover and process multiple dataset folders. See Example input folder structures below.

Local QC reports can combine multiple dataset folders into one report. This is the supported way to compare runs or batches in a single Trend Analysis view. Current cloud QC reports are generated for a single dataset, so their Trend Analysis view summarizes that dataset only rather than comparing multiple runs.


Command-line options

The dragena qc report command supports the following options to control output location, format, and QC thresholds.

Output options

--output-folder

Directory path where output files are written.

  • Default: Current working directory

  • Applies to both the HTML report and QC tables

Example: dragena qc report --data <dataset_folder> --output-folder <output_folder>

--output-format <csv|xlsx>

Use this option to choose whether the per-sample QC table is written as an xlsx workbook or a csv file. The default output format is xlsx.

When the QC table is written as xlsx, the workbook includes conditional formatting based on the thresholds that were applied when the report was generated, and it includes a Thresholds worksheet that records those threshold settings. Those applied thresholds can come from built-in defaults, a YAML file provided with --config, or command-line overrides described in QC threshold overrides (CLI).

When the QC table is written as csv, the output contains values only. It does not include workbook formatting or additional worksheets.

Example: dragena qc report --data <dataset_folder> --output-format csv

QC threshold overrides (CLI)

These options override functional QC thresholds directly from the command line. They take precedence over built-in defaults and any corresponding values provided via --config.

For example, if --config sets callRate: 0.90 but you run with --threshold-callRate 0.95, the report uses 0.95.

--threshold-callRate <value>

Override the Autosomal Call Rate threshold.

  • Default: 0.98

  • Samples with Autosomal Call Rate below this value are marked as FAIL.

Example: dragena qc report --data <dataset_folder> --threshold-callRate 0.95

--threshold-logRdev <value>

Override the Log R Ratio Standard Deviation (LogRDev) threshold.

  • Default: 0.20

  • Samples with LogRDev above this value are marked as FAIL.

Example: dragena qc report --data <dataset_folder> --threshold-logRdev 0.25

For complex or reproducible QC configurations, use a YAML configuration file via --config. CLI threshold options are most useful for quick exploratory runs.

Precedence (highest to lowest):

  1. Command-line threshold flags (e.g., --threshold-callRate, --threshold-logRdev)

  2. YAML config file values provided via --config

  3. Built-in defaults
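
The precedence order above can be sketched in a few lines of Python. This is only an illustration of the layering, not the tool's implementation. The key name callRate appears in this document; the key name logRdev is an assumption borrowed from the --threshold-logRdev flag, and resolve_thresholds is a hypothetical helper.

```python
# Built-in defaults as stated in this document.
# "logRdev" as a key name is an assumption taken from the CLI flag name.
BUILT_IN_DEFAULTS = {"callRate": 0.98, "logRdev": 0.20}


def resolve_thresholds(config_values=None, cli_overrides=None):
    """Merge thresholds so CLI flags beat config values, which beat defaults."""
    resolved = dict(BUILT_IN_DEFAULTS)
    # Later updates win, so apply sources from lowest to highest precedence.
    resolved.update(config_values or {})
    resolved.update(cli_overrides or {})
    return resolved


# --config sets callRate: 0.90, and --threshold-callRate 0.95 is also given.
print(resolve_thresholds({"callRate": 0.90}, {"callRate": 0.95}))
```

With these inputs the call rate resolves to 0.95, matching the example earlier in this section, while LogRDev keeps its default of 0.20.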


Input files

Each dataset folder must include the required QC metric files below. Optional files enable additional features (for example, functional genotyping QC metrics) or override defaults (configuration).

| File name | Description | Required |
| --- | --- | --- |
| controls.raw_metrics.csv | Probe-level control intensities used to compute control QC metrics. | Yes |
| controls.qc_metrics.csv | Per-sample summarized QC metrics derived from control probes. | Yes |
| gt_sample_summary.csv | Per-sample genotyping metrics (including Autosomal Call Rate, LogRDev, Sex estimate). Strongly recommended for functional QC evaluation. | No (strongly recommended) |
| config.yaml | Overrides QC thresholds and report behavior when provided via --config. | No |

See Metadata propagation (gt_sample_summary) for details on how user-defined columns are exposed in the report.

Only a subset of control probe metrics from controls.raw_metrics.csv are propagated to the merged QC outputs. See Control probe propagation from controls.raw_metrics.csv for details.

Example input files

controls.qc_metrics.csv

controls.raw_metrics.csv

gt_sample_summary.csv


Metadata propagation (gt_sample_summary)

When the upstream workflow uses a sample sheet, sample-sheet metadata columns are first propagated into gt_sample_summary.csv. During QC report generation, all columns present in gt_sample_summary.csv are then propagated into the report sample metadata and into the merged QC table outputs.

In the merged QC xlsx and csv outputs, propagated metadata columns from gt_sample_summary.csv are included and ordered alphabetically.

Not every propagated metadata field is offered in the report UI for Color by or Facet by. Those controls only include fields that behave like useful categorical groupings. In particular, fields with only one unique value are not offered, and fields with more than 20 unique values are also not offered for coloring or faceting.

Metadata fields that are not eligible for Color by or Facet by can still appear in hover text when space allows. However, some metadata may be omitted from hover tooltips when there is not enough room to display all fields.
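
As a rough sketch of the unique-value rule described above, the following Python function checks whether a metadata column would qualify for Color by or Facet by. It models only the two stated limits (more than one unique value, at most 20). The report may apply additional criteria, and is_offered_for_grouping is a hypothetical name.

```python
MAX_GROUPS = 20  # fields with more than 20 unique values are not offered


def is_offered_for_grouping(values):
    """Return True if a metadata column has between 2 and 20 unique values."""
    unique_count = len(set(values))
    return 1 < unique_count <= MAX_GROUPS


print(is_offered_for_grouping(["Plate1", "Plate1", "Plate2"]))  # two groups
print(is_offered_for_grouping(["Plate1", "Plate1", "Plate1"]))  # one group
```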


Control probe propagation from controls.raw_metrics.csv

Not every raw control-probe row present in controls.raw_metrics.csv is carried forward into the QC report outputs.

Here, not propagated means that specific raw probe entries from controls.raw_metrics.csv are excluded from the merged downstream raw-probe data used by the report and QC table.

The report focuses on raw control probes that support actionable review and on summarized QC metrics used for downstream evaluation. As a result, some raw probe categories are intentionally excluded during merge and summarization.

Included versus excluded raw probe names

The key distinction is between the standard raw probe names used by the report and the additional raw probe names that may appear in some input files but are not propagated downstream.

| Control category | Raw probe names propagated downstream | Raw probe names excluded from downstream raw-probe outputs |
| --- | --- | --- |
| NON-POLYMORPHIC | Standard non-polymorphic probe rows such as NP (G) and NP (T) | Additional methylation-platform rows NP (G) 1, NP (G) 2, NP (G) 3, NP (G) 4, NP (G) 5 |
| STAINING | Standard staining probe rows Biotin (Bkg), Biotin (High), DNP (Bkg), DNP (High) | Additional methylation-platform rows Biotin(5K) and DNP(20K) |

Excluded raw control-probe entries

The following raw control-probe categories or probe names are excluded from the merged downstream raw-probe outputs:

  • NEGATIVE

  • NORM

  • Additional NON-POLYMORPHIC NP (G) probe rows found on some methylation platforms

    • NP (G) 1

    • NP (G) 2

    • NP (G) 3

    • NP (G) 4

    • NP (G) 5

  • Additional STAINING probe rows found on some methylation platforms

    • DNP(20K)

    • Biotin(5K)

These probe rows can still be present in the original controls.raw_metrics.csv input file for completeness, but they are not propagated as downstream raw-probe entries in the QC report outputs.

In other words, users may still see summarized Staining or Non-Polymorphic control metrics in the report, and may also see the standard raw probe rows used for those metrics, even though the extra raw probe rows listed above are excluded.
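
The exclusion rules in this section can be expressed as a small filter. The sketch below is illustrative only: it assumes each raw probe row can be identified by a (category, probe name) pair, which is an assumption about the file layout, and is_propagated is a hypothetical helper rather than part of the tool.

```python
# Whole categories excluded from the merged downstream raw-probe outputs.
EXCLUDED_CATEGORIES = {"NEGATIVE", "NORM"}

# Individual probe rows excluded within otherwise propagated categories.
EXCLUDED_PROBES = {
    ("NON-POLYMORPHIC", "NP (G) 1"),
    ("NON-POLYMORPHIC", "NP (G) 2"),
    ("NON-POLYMORPHIC", "NP (G) 3"),
    ("NON-POLYMORPHIC", "NP (G) 4"),
    ("NON-POLYMORPHIC", "NP (G) 5"),
    ("STAINING", "DNP(20K)"),
    ("STAINING", "Biotin(5K)"),
}


def is_propagated(category, probe_name):
    """Return True if a raw probe row would be carried into the QC outputs."""
    if category in EXCLUDED_CATEGORIES:
        return False
    return (category, probe_name) not in EXCLUDED_PROBES


print(is_propagated("STAINING", "DNP (High)"))  # standard row is kept
print(is_propagated("STAINING", "DNP(20K)"))    # extra methylation row is dropped
```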

Configuration file (optional)

Provide a YAML configuration file to override the report’s QC thresholds. You can start from one of the Illumina template config files below or supply your own YAML file. Set a value to null to disable a check, or set a numeric value to enable it (for example tgaControl: 1.0). The template config files below are suggested starting points and should be adjusted based on sample type, platform, and lab-specific performance.

Illumina template config files are available below:

Suggested starting points by product and assay chemistry:

  • Genotyping assays that use Infinium non-EX reagents/chemistry: start with config_genotyping.yaml.

  • Genotyping assays that use Infinium EX reagents/chemistry: start with config_genotyping_EX.yaml. Example products include GSA: Infinium Global Screening Array-48 v4.0 Kit, GSA-ePGx: Infinium Global Screening Array with Enhanced PGx-48 v4.0, GCRA: Infinium Global Clinical Research Array-24 v1.0 Kit, GCRA-ePGx: Infinium Global Clinical Research Array with Enhanced PGx-24 v1.0 Kit, and other customized Infinium EX products.

  • Methylation assays that use MethylationEPIC reagents/chemistry: start with config_methylation_EPIC.yaml.

  • Methylation assays that use Infinium EX reagents/chemistry: start with config_methylation_MSA.yaml. Example products include Infinium Methylation Screening Array-48 Kit and Infinium Methyl EX iSelect Custom BeadChip (24/48 formats).

If you are using a customized product and are unsure which assay chemistry it uses, contact Illumina Technical Support before selecting a QC configuration file.

You can edit the QC config YAML using a plain‑text editor (for example, Notepad).

For guidance on adjusting DNA methylation QC thresholds, see the following Illumina documentation.

Apply a configuration file

Use the --config CLI flag to apply the selected configuration file, for example: dragena qc report --data <dataset_folder> --config config_genotyping.yaml

If --config is omitted, the report uses built-in defaults.

How suggested thresholds were derived

The suggested thresholds in the example configuration files were derived empirically from an internal review of more than 10 datasets spanning both expected good-quality samples and failed samples.

For each metric, Illumina evaluated the observed distribution of values, including the center and spread of the apparent null or background distribution, and used that information to choose practical starting thresholds for routine QC review.

These values are intended as suggested starting points, not universal acceptance criteria. Users should review and revise thresholds for their own assay, sample type, laboratory workflow, scanner settings, bisulfite conversion method, FFPE usage, and historical performance.

If a dataset repeatedly shows a consistent offset for one of these control metrics while other QC evidence remains acceptable, review the threshold in context rather than treating the default value as absolute.

Note: The thresholds in config_genotyping.yaml were evaluated using only LCG and HTS datasets.

For control metrics, configured thresholds are primarily used to flag samples for review in the QC report. They are recommended operating cutoffs, not assay-independent pass/fail truths.

Generic example YAML structure (illustrative only)

The two YAML blocks below are generic illustrative examples, not chemistry-specific recommended starting points.

  • The genotyping example below is not specific to Infinium EX or LCG/HTS chemistry.

  • The methylation example below is not specific to Infinium EX/MSA or MethylationEPIC chemistry.

  • For chemistry-specific starting points, use the recommended template files listed above.

Generic genotyping YAML example

Generic DNA methylation YAML example


Output files

File name
Description

DRAGENArray_QC_Report_YYYY_MM_DD.html

Self-contained interactive HTML report intended to be distributable and viewable offline in a web browser. The report includes dashboards such as Control Dashboard, Automated QC, Sample QC Heatmaps, Trend Analysis, and a QC Metric Config menu for threshold customization.

DRAGENArray_QC_table_YYYY_MM_DD.<xlsx|csv>

Per-sample QC table (one row per sample) containing functional metrics (when available), derived control metrics, and selected raw control-probe intensities. The file extension depends on --output-format (xlsx or csv).


QC evaluation criteria

Functional QC (per sample)

Functional QC evaluates overall genotyping performance of each sample:

  • Autosomal Call Rate: Fraction of autosomal probes successfully called for a sample. Higher values indicate better performance.

  • Log R Ratio Standard Deviation (LogRDev): Measures signal noise across probes. Lower values indicate more stable intensity measurements.

Functional QC status (PASS/FAIL)

A sample’s functional QC status is determined by comparing its metrics to configured thresholds:

  • PASS

    • Autosomal Call Rate ≥ threshold

    • LogRDev ≤ threshold

  • FAIL

    • One or both metrics fall outside thresholds
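
The PASS/FAIL rule above can be sketched as follows. The default values 0.98 and 0.20 are the documented defaults. The function name and signature are hypothetical, and treating None as a disabled check is a simplified model of the "null disables a check" behavior described for configuration files.

```python
def functional_qc_status(call_rate, log_r_dev,
                         call_rate_min=0.98, log_r_dev_max=0.20):
    """Return "PASS" or "FAIL"; a threshold of None disables that check."""
    if call_rate_min is not None and call_rate < call_rate_min:
        return "FAIL"  # Autosomal Call Rate below threshold
    if log_r_dev_max is not None and log_r_dev > log_r_dev_max:
        return "FAIL"  # LogRDev above threshold
    return "PASS"


print(functional_qc_status(0.99, 0.15))
print(functional_qc_status(0.97, 0.15))
```

A value exactly equal to a threshold passes, which matches the ≥ and ≤ comparisons listed above.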

Functional QC requires a per-sample metric file (for example gt_sample_summary.csv). If that file is not provided, functional QC is unavailable. Functional QC does not currently support methylation arrays; for methylation QC, refer to DRAGEN Array - Methylation QC on cloud.


Control-based QC (per sample)

Control-based QC evaluates whether array chemistry and processing performed as expected. Control metrics originate from:

  • controls.raw_metrics.csv — raw, probe-level control intensities

  • controls.qc_metrics.csv — summarized per-sample control QC values

Common control categories include:

  • Staining

  • Extension

  • Hybridization (High/Medium/Low)

  • Non-polymorphic

  • Non-specific binding

  • Target removal

  • Stringency

  • Restoration (when applicable)

  • Bisulfite conversion controls (methylation arrays)

  • Specificity (methylation arrays)

For more background on interpreting Infinium controls, see: Evaluation of Infinium Genotyping Assay Controls Training Guide

Control QC status & flags

Each control metric is compared to its configured threshold. When a value is outside the acceptable range, the sample receives a flag indicating the affected control and channel.


Interpreting combined QC results

The report shows both functional QC and control-based QC for every sample:

  • A sample may PASS functional QC but still receive control warnings.

  • Multiple or severe control failures may indicate assay-related issues that impact downstream results.

This combined view can help distinguish:

  • Biological failures (for example, degraded DNA)

  • Technical failures (for example, staining or hybridization issues)


HTML QC report

The HTML report is organized into dashboards designed for routine QC review and deeper troubleshooting.

Control dashboard

The Control Dashboard provides interactive visualizations of raw control-probe intensities to help identify instrument and assay issues. These plots help you review whether control signals such as staining, extension, hybridization, and non-polymorphic probes behave as expected across samples. You can:

  • Select samples directly from plots to highlight those same samples across related plots in both the Control Dashboard and the Automated QC dashboard

  • Hover to see details (for example, sample ID, barcode, position, autosomal call rate when available)

  • Explore distributions and trends to detect outliers or systematic shifts

  • Zoom and filter for focused investigation

To clear a plot-based selection, double-click in the plot area. This restores the full sample set in the Automated QC table and removes the cross-plot highlighting.

[Figure: Control dashboard]

Chart toolbar reference

Each plot includes a toolbar for zooming, panning, selection, autoscaling, and exporting.

[Figure: Plotly toolbar controls]

Automated QC

The Automated QC dashboard provides a consolidated, objective view of sample-level QC by applying QC rules and thresholds across all samples. Use it to:

  • Quickly assess pass/fail status

  • Identify samples and metrics outside thresholds

  • Support decisions for sample inclusion, reprocessing, or follow-up analysis

Functional QC Status Histogram

This histogram shows the number and percentage of samples in the dataset that are classified as PASS or FAIL for functional QC.

Functional QC status is determined from the sample-level functional metrics, using the Autosomal Call Rate and LogRDev thresholds currently applied in the report. Those thresholds can come from the defaults, a YAML config file, command-line overrides, or the QC Metric Config settings described in Updating thresholds in the HTML report.

A sample is counted as FAIL if it fails any individual functional QC metric that is currently enabled. Otherwise, the sample is counted as PASS.

Control-Based QC Status Histogram

This histogram shows the number and percentage of samples in the dataset that are FLAGGED or CLEAR for control-based QC.

Control-based QC status is determined from the control QC metric thresholds currently applied in the report. A sample is counted as FLAGGED if any individual control QC metric is outside its applied threshold. Samples without any active control-based QC flags are counted as CLEAR.

Sample QC table

The Sample QC table provides a per-sample summary of the same QC decisions shown in the histograms, along with the underlying metrics used to support review. It includes:

  • Functional QC status (PASS/FAIL) when functional metrics are available

  • Control QC flags and annotations that highlight values outside thresholds

  • Sorting and filtering to focus on failing samples or specific metrics

The table can be exported in either xlsx or csv format. The xlsx output preserves conditional coloring, while the csv output contains values only. For more detail about the table output formats, see --output-format <csv|xlsx>.

[Figure: Automated QC table]

QC metric plots

Supporting plots in the Automated QC dashboard complement the table by showing per-sample QC metric scatter plots. In these plots, the x-axis represents samples in the current dataset view, and the y-axis represents one derived QC metric. You can sort the sample order by Autosomal Call Rate, LogRDev, any propagated user-provided metadata field, or any derived QC metric available in the report. You can also color samples by eligible propagated metadata fields to help reveal group-specific patterns or batch effects.

For genotyping arrays, the initial point colors are based on functional QC status: PASS or FAIL, determined from the applied Autosomal Call Rate and LogRDev thresholds.

Plotted metrics can include control-derived metrics such as Staining, Extension, Hybridization, Target Removal, Nonpolymorphic, Stringency, Specificity, and Bisulfite Conversion. For example, Staining Red is calculated as DNP High Red / DNP Bkg Red, and Stringency is calculated as Stringency PM (Red) / Stringency MM (Red).

Each Automated QC scatter plot also includes help text with the formula used to derive the selected metric.

Most control metrics are constructed as ratios that compare expected signal against background or against an opposing control signal. That makes them more stable for QC review than raw intensities alone because even if absolute intensities shift between scanners or runs, the signal-versus-background relationship is expected to remain relatively consistent.
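
A brief Python sketch of the ratio idea follows. The intensity values are made up for illustration and do not come from a real dataset; control_ratio is a hypothetical helper, and returning None for a zero background is an assumption about how such a case might be handled.

```python
def control_ratio(signal, background):
    """Return signal / background, or None when the background is zero."""
    if background == 0:
        return None
    return signal / background


# Illustrative intensities only.
staining_red = control_ratio(12000, 400)  # DNP High Red / DNP Bkg Red
stringency = control_ratio(9000, 1500)    # Stringency PM (Red) / Stringency MM (Red)

# Doubling both intensities (for example, a brighter scan) leaves the ratio unchanged.
staining_red_brighter = control_ratio(24000, 800)

print(staining_red, stringency, staining_red_brighter)
```

The last value illustrates why ratios are more stable than raw intensities: a uniform shift in signal and background cancels out.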

When a QC threshold is set for a plotted metric, the scatter plot shows that cutoff as a dashed threshold reference line. These thresholds can come from built-in defaults, a YAML config file, command-line overrides, or values applied in the QC Metric Config menu. For more detail on how methylation QC control metrics and their recommended starting thresholds are defined, see Methylation Sample QC Summary Files and Methylation QC Threshold Adjustment.

These plots also participate in linked selection. When you select samples in one scatter plot, those same samples are highlighted in the other scatter plots across the Automated QC and Control Dashboard views, and the Automated QC table is filtered to show only the selected samples. To clear the selection, double-click in the plot area.

These plots help you:

  • See which individual samples fall outside a QC threshold

  • Compare sample-to-sample variation for one metric at a time

  • Relate noisy or failing samples to other QC views in the report

In the example figure below, the bottom plot shows a group of samples below the Stringency threshold line, and those points are also colored as functional QC FAIL. Other samples in the same figure are also colored as FAIL even though they do not fall below the Stringency cutoff, which suggests that other QC metrics may be contributing to the failure; review the other QC metric plots and the table to identify the control probes or sample-level metrics associated with those samples.

[Figure: Automated QC scatter plots]

Sample QC heatmaps

The Sample QC Heatmaps dashboard provides a spatial view of sample-level QC metrics across chips or plates to help detect spatial artifacts and localized issues.

Overview

  • Each cell represents a sample

  • Cell color reflects a selected QC metric (for example Autosomal Call Rate, LogRDev)

  • Hover shows sample identifiers and metric values

Plate information may be derived from:

  • IDAT metadata, or

  • A user-provided sample sheet (columns: Sample_Plate and Sample_Well)

If both sources provide plate information, the report uses the user-provided sample sheet values.

[Figure: Heatmap by chip]
[Figure: Heatmap by plate]

Trend analysis

The Trend Analysis dashboard provides a high-level view of QC trends across multiple datasets, runs, batches, or instruments.

For local analysis, this cross-dataset summary is available when you generate one report from multiple dataset folders. If you generate the report from a single dataset folder, the dashboard still appears but summarizes that one dataset only. Current cloud QC reports also operate on a single dataset at a time.

Use it to:

  • Monitor changes over runs and scan dates

  • Detect drift, batch effects, or instrument-specific anomalies

  • Compare QC performance between datasets or groups of chips

Summary table

The summary table aggregates QC metrics at the dataset level (for example):

  • Number of samples / chips

  • Scan date range

  • Autosomal Call Rate statistics (min/mean/standard deviation, counts above/below threshold)

  • LogRDev statistics (mean/standard deviation, counts above threshold)

  • TGA control statistics for PGx Genotyping datasets

  • Sex prediction summary (number of males, females and unknowns)
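
To make the dataset-level statistics concrete, here is a minimal Python sketch that computes the Autosomal Call Rate entries for one dataset. The function name and dictionary keys are hypothetical, and the use of the sample standard deviation is an assumption; the report may compute it differently.

```python
from statistics import mean, stdev


def summarize_call_rates(call_rates, threshold=0.98):
    """Summarize per-sample Autosomal Call Rates for one dataset."""
    below = sum(1 for value in call_rates if value < threshold)
    return {
        "n_samples": len(call_rates),
        "min": min(call_rates),
        "mean": mean(call_rates),
        # Sample standard deviation needs at least two values.
        "stdev": stdev(call_rates) if len(call_rates) > 1 else 0.0,
        "n_below_threshold": below,
        "n_at_or_above_threshold": len(call_rates) - below,
    }


print(summarize_call_rates([0.99, 0.97, 0.995, 0.96]))
```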

[Figure: Trend analysis table]

Summary plots

Summary plots provide visual comparisons across datasets or barcodes, such as:

  • Autosomal Call Rate distributions

  • LogRDev box plots

  • Sample count plots

  • Sex estimate distributions

For example, box-and-whisker plots in Trend Analysis summarize how a metric is distributed within each dataset so you can compare center, spread, and outliers across runs. In a box plot, the box represents the middle 50% of the data distribution, the center line marks the median, and the whiskers extend to the smallest and largest non-outlier values shown for that dataset. When a threshold is configured for a plotted metric, the plot also shows the applied cutoff as a dashed reference line.
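
The box plot elements described above can be computed with the Python standard library. This sketch assumes the common 1.5 × IQR rule for deciding which values count as outliers and the "inclusive" quartile method. Both are assumptions; the report's plotting library may use slightly different conventions.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers for one dataset."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: beyond 1.5 * IQR from the box edges.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inliers = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 50]))
```

In this toy input the value 50 falls outside the upper fence, so the upper whisker stops at 8 and 50 is reported as an outlier.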

The following schematic shows the main parts of a box-and-whisker plot:

[Figure: Box-and-whisker plot anatomy. Whiskers extend from the box to the smallest and largest non-outlier values. The box shows the interquartile range (Q1-Q3), which is the middle 50% of the data.]
[Figure: Trend analysis plots]

Updating thresholds in the HTML report

You can update QC threshold cutoffs directly in the HTML report:

  1. Open QC Metric Config (⚙).

  2. Enter new numeric values for metrics you want to enforce.

    • Leave a field blank (or set it to null in YAML) to disable that check.

  3. Click Apply Thresholds.

Buttons

  • Apply Thresholds: Recalculates pass/fail and refreshes visuals, including threshold reference lines in plots, using the current values.

  • Download Thresholds: Exports the currently applied thresholds as a YAML file.

  • Upload Thresholds: Imports a YAML file and fills the threshold fields.

[Figure: QC thresholds config]

Interactive features

Filter table samples using scatter plot selection

Selecting samples in a scatter plot filters the corresponding rows in the Automated QC table. The same selected samples are also highlighted in related scatter plots across the Automated QC and Control Dashboard views. To clear this plot-based filtering and cross-plot highlighting, return to the plot and double-click in the plot area to remove the selection.

[Figure: Sample selection filters table]

Highlight samples in scatter plots by selecting from the table

Selecting rows in the table highlights the corresponding points in scatter plots. To clear this table-based highlighting, click Select None in the bottom-right corner of the table.

[Figure: Table selection highlights samples]

When exporting the table, any active row selection is preserved. If you click Excel or CSV while rows are selected, only those selected rows are exported. To export all samples, click Select None before exporting.

Table filtering and sorting

The interactive table supports:

  • Showing/hiding columns

  • Sorting by any metric

  • Filtering by search or criteria

  • Combining table filtering with plot selection for cross-linked exploration


Performance & dataset size limits

For performance reasons, large datasets may use a tabbed layout to keep the browser responsive.

  • More than 1,000 samples in a single dataset: control and automated-QC plots are arranged into tabs (interactive features remain available).

  • More than 12,000 samples in a single dataset: scatter plots are disabled and replaced with a notice.
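
The two size limits above amount to a simple decision rule, sketched here in Python. The function name and the returned labels are hypothetical; only the 1,000 and 12,000 sample limits come from this document.

```python
TABBED_LAYOUT_LIMIT = 1_000   # above this, plots are arranged into tabs
SCATTER_PLOT_LIMIT = 12_000   # above this, scatter plots are disabled


def plot_layout(sample_count):
    """Return the layout expected for a single dataset of the given size."""
    if sample_count > SCATTER_PLOT_LIMIT:
        return "scatter plots disabled"
    if sample_count > TABBED_LAYOUT_LIMIT:
        return "tabbed"
    return "standard"


for count in (500, 5_000, 20_000):
    print(count, plot_layout(count))
```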

Example tabbed layout:

[Figure: Tabbed Control dashboard]
[Figure: Tabbed Automated QC]

How to visualize very large projects

If your project exceeds the scatter-plot limit, consider:

  • Filtering input by run date/batch/folder and generating separate reports

  • Splitting input into logical batches (for example per-run or per-center)

When a QC report HTML file contains more than 20,000 samples across datasets, it may load slowly.


Example input folder structures

Single dataset folder

In this example, project_folder/ is one dataset folder that contains the required input files for a single QC report: controls.raw_metrics.csv and controls.qc_metrics.csv, plus the recommended gt_sample_summary.csv.


Multiple dataset folders

Use this mode in local analysis when you want to combine multiple runs or batches into a single QC report and compare them in Trend Analysis.

In this mode, --data is a comma-separated list of dataset folder paths. Each path must point directly to a folder that contains the required input files. In the example below, dataset_A/ and dataset_B/ are example folder paths.

When writing a list for --data, do not include spaces before or after the commas. If you use --label, write the labels the same way: one comma-separated item per dataset, with no spaces between items.

The following command tells dragena qc report to load both dataset folders into one report: dragena qc report --data dataset_A,dataset_B

Use this form when you already know the exact dataset folders you want to include.
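
If you assemble the command from a script, the comma-only formatting rule is easy to enforce in code. The Python sketch below builds an argument list using only the --data and --label options described in this document. The helper names are hypothetical, and the sketch only constructs the arguments; it does not run the tool.

```python
def join_cli_list(items):
    """Join items with commas only, rejecting values that would break the list."""
    for item in items:
        if "," in item or item != item.strip():
            raise ValueError(f"unsupported list item: {item!r}")
    return ",".join(items)


def build_report_args(dataset_folders, labels=None):
    """Build the argument list for a multi-dataset dragena qc report run."""
    args = ["dragena", "qc", "report", "--data", join_cli_list(dataset_folders)]
    if labels is not None:
        if len(labels) != len(dataset_folders):
            raise ValueError("provide exactly one label per dataset")
        args += ["--label", join_cli_list(labels)]
    return args


print(build_report_args(["dataset_A", "dataset_B"], ["run1", "run2"]))
```

The resulting list can be passed to a process launcher such as subprocess.run, which avoids shell quoting problems with folder paths.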


Parent folder containing multiple dataset folders

Use this mode when many dataset folders are organized under one parent folder and you want the local QC report to discover them automatically.

This is different from the previous example only in how --data is used:

  • In the previous example, --data lists each dataset folder path explicitly.

  • In this example, --data points to one parent folder path, and the tool searches below that parent folder to find valid dataset folders automatically.

Some users describe this as a "recursive" search. In practice, it means the tool starts from the parent folder and looks through its subfolders for dataset folders that contain the required QC files.
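As an illustration of this kind of search, the Python sketch below walks a parent folder and keeps every folder that contains both required control files. It is not the tool's implementation: the function name, the decision to also accept the parent folder itself, and the sorted return order are assumptions made for the example.

```python
from pathlib import Path

REQUIRED_FILES = ("controls.raw_metrics.csv", "controls.qc_metrics.csv")


def discover_dataset_folders(parent):
    """Return folders at or below `parent` that contain every required QC file."""
    parent = Path(parent)
    if not parent.is_dir():
        raise FileNotFoundError(f"Input data folder does not exist: {parent}")
    # Consider the parent itself plus every subfolder at any depth.
    candidates = [parent] + [p for p in parent.rglob("*") if p.is_dir()]
    return sorted(
        folder
        for folder in candidates
        if all((folder / name).is_file() for name in REQUIRED_FILES)
    )
```

Running a check like this before report generation can show which folders would be skipped because a required file is missing.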

The command below tells dragena qc report to start from the parent folder project_folder and automatically discover dataset folders under it:

If you do not provide --label, the report assigns a dataset label based on each detected dataset folder name (for example dataset_A, dataset_B).

To compare multiple runs in a local analysis, use one of these two approaches:

  • Use --data dataset_A,dataset_B,... when you want to list the dataset folder paths yourself.

  • Use --data project_folder when project_folder is a parent folder that contains many dataset folders and you want the tool to discover them automatically.

Parent-folder discovery labels and ordering

  • If you supply --label, labels are assigned in the order datasets are discovered.

  • If multiple datasets share the same folder name, a numeric suffix is appended (for example datasetA, datasetA_2) to ensure uniqueness.
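The suffix rule can be sketched in Python as follows. The function name is hypothetical, and the behaviour for a third or later duplicate (continuing with _3, _4, and so on) is an assumption extrapolated from the datasetA_2 example.

```python
def unique_labels(folder_names):
    """Assign one label per folder name, adding _2, _3, ... to repeated names."""
    used = set()
    labels = []
    for name in folder_names:
        label, n = name, 1
        # Keep increasing the suffix until the label has not been used yet.
        while label in used:
            n += 1
            label = f"{name}_{n}"
        used.add(label)
        labels.append(label)
    return labels
```

Labels are produced in the same order as the input, which mirrors assigning labels in discovery order.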


Troubleshooting & FAQ

Missing samples

Possible cause: Sample IDs do not match across input files.

Recommended action: Align Sample IDs across controls.* and (if provided) gt_sample_summary.csv.
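A quick way to find mismatched Sample IDs is to compare the ID column of each input file. The Python sketch below is a generic check, not part of the tool. The name of the ID column is not specified in this section, so you must pass it explicitly; "Sample ID" in the usage note is a placeholder.

```python
import csv


def read_sample_ids(csv_path, id_column):
    """Collect the non-empty values of `id_column` from one CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if id_column not in (reader.fieldnames or []):
            raise KeyError(f"{csv_path} has no column named {id_column!r}")
        return {row[id_column].strip() for row in reader if row[id_column]}


def sample_id_mismatches(paths, id_column):
    """Map each file to the sample IDs found in other files but missing from it."""
    ids = {str(p): read_sample_ids(p, id_column) for p in paths}
    all_ids = set().union(*ids.values())
    return {p: sorted(all_ids - found) for p, found in ids.items() if all_ids - found}
```

For example, sample_id_mismatches(["controls.qc_metrics.csv", "gt_sample_summary.csv"], "Sample ID") returns an empty dictionary when both files list the same samples.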

Unexpected QC failures

Possible cause: Thresholds are too strict for your dataset or application.

Recommended action: Review and adjust thresholds using --config or the QC Metric Config menu in the HTML report.

Scatter plots are disabled

Possible cause: The dataset exceeds 12,000 samples, the limit above which scatter plots are disabled.

Recommended action: Filter or split the dataset and generate separate reports.

CSV parsing errors

Possible cause: Quoting/encoding issues.

Recommended action: Ensure UTF-8 encoding and valid comma delimiters.
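Both conditions can be checked before running the report. The Python sketch below is a generic pre-check with a hypothetical function name: it tries to decode the file as UTF-8 and flags records whose field count differs from the header, which often indicates a quoting or delimiter problem.

```python
import csv


def check_csv(path):
    """Return a list of problems: invalid UTF-8, CSV syntax errors, or uneven field counts."""
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]
    except csv.Error as err:
        return [f"CSV syntax error: {err}"]
    if not rows:
        return ["file is empty"]
    expected = len(rows[0])
    problems = []
    for number, row in enumerate(rows[1:], start=2):
        # Skip completely blank records; flag any other record with the wrong width.
        if row and len(row) != expected:
            problems.append(f"record {number}: expected {expected} fields, found {len(row)}")
    return problems
```

An empty list means the file passed these two checks; it does not guarantee that the content is otherwise valid for the report.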

Report is extremely slow to load or does not finish loading

Possible cause: Very large datasets (for example, >50,000 samples) generate large input files that require significant client‑side processing. In the current implementation, extremely large datasets (for example, ~100,000 samples) may exceed browser or memory limits and fail to load completely.

Recommended action: Filter or split the dataset into smaller subsets and generate separate QC reports for each subset.

What happens if the input folder contains both genotyping and methylation datasets?

If dragena qc report detects both genotyping and methylation dataset folders within the same --data input set, report generation stops with an error.

In this guide, mixed dataset types specifically means this combination.

Example: A parent folder passed to --data contains one dataset folder with methylation qc call outputs and two dataset folders with genotyping qc call outputs.

This folder structure triggers the mixed-dataset error:

Recommended action: Generate separate QC reports for genotyping and methylation inputs. For example, separate them into different parent folders:

What happens if the input folder contains both PGx and non-PGx genotyping datasets?

This combination is supported. The report is generated as long as each dataset folder contains the required input files.

PGx datasets can include TGA control information, while non-PGx datasets do not. Other shared genotyping QC outputs are generated normally.

For example, if the input contains one PGx dataset folder and one non-PGx genotyping dataset folder, the report still runs as a single genotyping report. TGA-related values are populated only for the PGx dataset.

What happens if the input folder contains both legacy and EX genotyping chips?

This combination is supported and does not trigger the mixed-dataset error.

There is no special difference in report generation or QC interpretation beyond chip-format-specific heatmap layout. Because legacy and EX chips use different physical layouts, the chip heatmaps can appear different even when the rest of the report is comparable.

The same guidance applies to different versions of the same genotyping BeadChip family: they are treated as supported genotyping inputs rather than as mixed dataset types.


Warning and error messages

During report generation, the following situations may trigger warnings or errors:

  • Input data directory does not exist

  • Required input files are missing in one or more dataset folders

  • Number of labels does not match number of detected datasets

  • Labels are modified during sanitization and become non-unique

  • Genotyping and methylation dataset folders are detected together in a single run

  • Invalid output format is specified

  • Invalid or unsupported QC threshold configuration is provided

Note: "mixed dataset types" refers specifically to combining genotyping and methylation datasets in one report. Supported mixed genotyping inputs, such as PGx + non-PGx, legacy + EX, or different versions of the same genotyping BeadChip family, do not trigger this error.
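The label-related situations in the list above (count mismatch, sanitization, and duplicates after sanitization) can be illustrated with a short Python sketch. The exact sanitization rule used by the tool is not documented in this section. The rule below, which replaces each run of characters other than letters, digits, underscores, and hyphens with one underscore and trims underscores at both ends, is an assumption, and all function names are hypothetical.

```python
import re
from collections import Counter


def sanitize_label(label):
    """Replace runs of unsupported characters with '_' and trim underscores (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", label).strip("_")


def prepare_labels(labels, n_datasets):
    """Sanitize labels and reject count mismatches or duplicates after sanitization."""
    if len(labels) != n_datasets:
        raise ValueError(f"{len(labels)} labels provided for {n_datasets} datasets")
    cleaned = [sanitize_label(label) for label in labels]
    duplicates = sorted(name for name, count in Counter(cleaned).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate labels after sanitization: {duplicates}")
    return cleaned
```

Two labels that differ only in unsupported characters, such as "data set" and "data<set", collapse to the same sanitized value under this rule and are rejected.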

Examples

Message: ❌ Input data folder does not exist: data/non_existent_dir
Cause: Input directory does not exist

Message: ❌ No valid dataset folders detected. Each folder must include at least: controls.raw_metrics.csv, controls.qc_metrics.csv
Cause: No folders containing both required CSV files were found

Message: ❌ Dataset MyDataset is missing required file(s): ... controls.qc_metrics.csv ...
Cause: One or more required control metric files are absent

Message: ❌ Number of labels (2) does not match number of datasets (3).
Cause: --label count does not match detected datasets

Message: ⚠️ Label sanitized: My Label<1> → My_Label_1
Cause: Label contained unsupported characters

Message: ❌ Duplicate labels after sanitization: dataset (x2). Each dataset label must be unique.
Cause: Labels became non-unique after sanitization

Message: ❌ Mixed dataset types are not supported in a single report. Genotyping datasets: geno_run_A, geno_run_B ... Methylation datasets: meth_run_C ...
Cause: A single run included both genotyping and methylation dataset folders. This error is not raised for supported mixed genotyping inputs such as PGx + non-PGx or legacy + EX.

Message: ❌ Invalid --output-format value. Only csv or xlsx are accepted.
Cause: Unsupported output format

Message: ❌ QC config file was provided but does not exist: /path/to/config.yaml
Cause: Config path does not exist

Message: ❌ Input file must be a YAML file with extension .yaml/.yml. Current config file extension: .json
Cause: Config must be YAML

Message: ❌ Failed to parse QC config as YAML ...
Cause: YAML syntax error

Message: ❌ QC config must be wrapped in a top-level QC object containing a Report object.
Cause: Config structure is incorrect

Message: ❌ Invalid threshold overrides: Unknown threshold key ...
Cause: Unrecognized threshold name

Message: ❌ Invalid threshold overrides: Invalid numeric value ...
Cause: Value could not be parsed as a number

Message: ❌ Threshold for Autosomal Call Rate must be between 0 and 1 ...
Cause: Autosomal Call Rate threshold out of bounds

Message: ❌ Threshold for stainingGreen must be non-negative ...
Cause: Threshold must be ≥ 0
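The config-related messages above imply a few structural and range checks. The Python sketch below applies similar checks to a config that has already been parsed from YAML into a dictionary. It is an illustration under stated assumptions, not the tool's validator: it assumes thresholds are stored directly under QC > Report as name/number pairs, that "Autosomal Call Rate" and "stainingGreen" are usable as key names, and that every threshold other than Autosomal Call Rate must be non-negative.

```python
def validate_qc_config(config):
    """Check a parsed QC config dict; return the Report mapping or raise ValueError."""
    try:
        report = config["QC"]["Report"]
    except (KeyError, TypeError):
        raise ValueError("config needs a top-level QC object containing a Report object") from None
    if not isinstance(report, dict):
        raise ValueError("the Report object must be a mapping of threshold names to numbers")
    for key, value in report.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"invalid numeric value for {key}: {value!r}")
        if key == "Autosomal Call Rate":
            if not 0 <= value <= 1:
                raise ValueError("Autosomal Call Rate threshold must be between 0 and 1")
        elif value < 0:
            raise ValueError(f"threshold for {key} must be non-negative")
    return report
```

This sketch does not check for unknown threshold keys, because the list of recognized keys is not given in this section.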
