All of Us Genomic Quality Report

The attached document details the quality control (QC) steps that the All of Us Genome Centers (GC) and Data and Research Center (DRC) apply to the genomic data in the research pipeline. This pipeline removes or flags samples and variants that fail quality thresholds, and we apply it before we release the genomic data for research use. We, the DRC, describe only the QC processes that are performed analytically (i.e., after the sample has been genotyped and sequenced). All descriptions and results are limited to the Q2 2022 release, made available in the Researcher Workbench on June 22, 2022, which contains 165,127 array samples and 98,590 whole genome sequencing (WGS) samples. The samples in the genomic data correspond to the All of Us Curated Data Repository (CDR) release C2022Q2R2 (Controlled Tier Dataset v6). These pipelines are automated unless otherwise noted. This document covers all genomic data types made available to researchers at this time, including small variants (SNPs and indels) for arrays and short-read WGS.
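
As a rough illustration of what "removes or flags samples that fail quality thresholds" can look like in practice, the following minimal Python sketch partitions samples into passing and flagged sets. The metric names (`call_rate`, `mean_coverage`), the threshold values, and the function names are hypothetical and chosen only for this example; they are not taken from the report and should not be read as the actual All of Us QC criteria.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; NOT the All of Us values.
MIN_CALL_RATE = 0.98
MIN_MEAN_COVERAGE = 30.0


@dataclass
class SampleMetrics:
    """Per-sample QC metrics (hypothetical fields for this sketch)."""
    sample_id: str
    call_rate: float
    mean_coverage: float


def qc_failures(metrics: SampleMetrics) -> list[str]:
    """Return the list of QC checks this sample fails (empty if it passes)."""
    reasons = []
    if metrics.call_rate < MIN_CALL_RATE:
        reasons.append("low_call_rate")
    if metrics.mean_coverage < MIN_MEAN_COVERAGE:
        reasons.append("low_coverage")
    return reasons


def partition_samples(samples: list[SampleMetrics]):
    """Split samples into IDs that pass QC and a map of flagged IDs to reasons."""
    passed: list[str] = []
    flagged: dict[str, list[str]] = {}
    for metrics in samples:
        reasons = qc_failures(metrics)
        if reasons:
            flagged[metrics.sample_id] = reasons
        else:
            passed.append(metrics.sample_id)
    return passed, flagged
```

Recording the failure reasons, rather than silently dropping samples, mirrors the distinction the text draws between removing and flagging: a downstream step can decide which flags are grounds for exclusion.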
