How do I select specific variants from the Hail MatrixTables?

The simplest way to select a specific variant, or a set of variants from a defined genomic region, in the All of Us Hail MatrixTables is to use the code snippet below, taken from the “Manipulate Hail MatrixTable” notebook in the “How to Work with All of Us Genomic Data” Featured Workspace. The notebook walks through setting up the appropriate environment, loading Hail and the MatrixTable, and then using this snippet to isolate variants.

This snippet keeps only the variants located between base pairs 32355000 and 32375000 on chromosome 13. First, you define the region of interest in the “test_intervals” list, then you pass that list to the hl.filter_intervals function, which filters the MatrixTable down to that region.

# Assumes Hail is imported as hl and mt is the MatrixTable you have already loaded
test_intervals = ['chr13:32355000-32375000']  # change this to the region you are interested in
mt = hl.filter_intervals(
   mt,
   [hl.parse_locus_interval(x,)
     for x in test_intervals])

To adapt this snippet, change only the interval text inside the square brackets to match the genomic region you want to isolate.

 

Examples

Here are some examples using the snippet.

To select all of chromosome 13:

test_intervals = ['chr13'] 
mt = hl.filter_intervals(
   mt,
   [hl.parse_locus_interval(x,)
     for x in test_intervals])

To select two different intervals on different chromosomes:

test_intervals = ['chr1:100M-200M', 'chr16:29.1M-30.2M'] 
mt = hl.filter_intervals(
   mt,
   [hl.parse_locus_interval(x,)
     for x in test_intervals])
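The `M` suffix in the intervals above is shorthand for millions of base pairs, so `chr16:29.1M-30.2M` runs from position 29,100,000 to position 30,200,000. As a minimal sketch for checking what a shorthand coordinate expands to before you run the filter, the helper below converts it to a plain integer. `expand_position` is a hypothetical name and is not part of Hail or the notebook; it assumes that `k` means thousands and `m` means millions, in either upper or lower case.

```python
from decimal import Decimal

# Hypothetical helper, not part of Hail: expands a shorthand position such as
# '29.1M' into a plain base-pair integer. Assumes k = thousand, m = million.
_SUFFIXES = {"k": 1_000, "m": 1_000_000}


def expand_position(text):
    text = text.strip()
    suffix = text[-1].lower()
    if suffix in _SUFFIXES:
        # Decimal avoids floating-point rounding on values like 29.1
        value = Decimal(text[:-1]) * _SUFFIXES[suffix]
    else:
        value = Decimal(text)
    if value != value.to_integral_value():
        raise ValueError(f"{text!r} is not a whole number of base pairs")
    return int(value)


print(expand_position("29.1M"))
print(expand_position("200M"))
```

Writing the coordinates out in full (for example 'chr16:29100000-30200000') is an equivalent alternative if you prefer to avoid the shorthand.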

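To select a single specific variant (for example, the BRCA2 variant chr13-32355250-T-C), end the interval at least 1 bp past the position of the variant. This is needed because Hail, by default, includes the start position of an interval but excludes the end position.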

test_intervals = ['chr13:32355250-32355251'] 
mt = hl.filter_intervals(
   mt,
   [hl.parse_locus_interval(x,)
     for x in test_intervals])
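If you start from a variant ID rather than a pair of coordinates, the interval string can be built programmatically. The sketch below is one way to do that: `variant_to_interval` is a hypothetical helper (not part of Hail or the notebook), and it assumes the ID has the form contig-position-ref-alt, as in chr13-32355250-T-C.

```python
# Hypothetical helper, not part of Hail: turns a variant ID of the assumed form
# 'contig-position-ref-alt' into a 1 bp interval string for the snippet above.
def variant_to_interval(variant_id):
    # Split from the right so a contig name containing '-' stays intact
    contig, position, ref, alt = variant_id.rsplit("-", 3)
    start = int(position)
    # End the interval 1 bp past the variant position
    return f"{contig}:{start}-{start + 1}"


test_intervals = [variant_to_interval("chr13-32355250-T-C")]
print(test_intervals)  # ['chr13:32355250-32355251']
```

The resulting list can be passed to the filtering snippet unchanged. Note that an interval filter keeps every variant located at that position, so if more than one variant shares the position it is worth checking the alleles of the rows that remain.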

 

Important notes

We also have a walkthrough video that goes over the use of this snippet, as well as necessary prior steps to load and use the MatrixTable. 

It’s possible to select variants based on their rsID or gene name, but this is a bit more complicated because it first requires annotating a MatrixTable with the variant annotation table. To see more about annotating a MatrixTable, please see the “Getting Started with Genomic Data” notebooks in the “How to Work with All of Us Genomic Data” Featured Workspace as well as the following video from the All of Us data science team. 

 
