A "concept" represents health information obtained from electronic health records (EHR) or participant provided information (PPI) sources, which are derived from surveys. Understanding how concepts are defined, organized, and the potential complex relationships that can exist between them in the All of Us dataset will help you decide which concepts to select within your cohort for analysis.
All concepts are located in the concept table. Relationships between concepts can be found in the concept_relationship and the concept_ancestor tables. We will talk more about this below.
Example row in the concept table:
Let's explore data within the concept table using atrial fibrillation as an example. Below is the row for atrial fibrillation in the concept table. You can see that it falls within the domain "Condition" and is assigned a concept_id and concept_code.
Why do we need concept ID’s?
Some publicly available vocabularies use the same codes. This is a problem when creating a database that includes all source vocabularies.
Let’s explore different ways to pull data on specific concepts...
One benefit of a common data model is that it gives researchers flexibility to explore data. You can either pull specific data using a concept_id or pull all data related to a source vocabulary code such as an ICD9 code. We will be using SQL to perform our example queries of OMOP tables. These types of queries are done for you when you use the Cohort Builder and Dataset Builder tools. You are able to see the code once you export a dataset to a notebook. To learn more about concept tables, click here.
This is the standard form of a SQL query:
SELECT column1, column2, … (Use * to select all columns from a table)
How to pull data…
…if you know the concept NAME
Note that concept-name is not a unique identifier. For example, several concepts are available within our CDR that describe “depression.” Our example here will be modified for brevity and will not show all concepts.
WHERE concept_name = "Major depressive disorder"
…if you know the concept ID
WHERE concept_id = 4152280
…if you know the concept CODE
WHERE concept_code = '370143000'
What if you don’t know any of the above, but you do know the code in another vocabulary (e.g., ICD9CM, ICD10CM, CPT, etc.)?
WHERE concept_code = ‘296.3’
Using the concept_id found from the query above, you can find the equivalent SNOMED concept ID by looking in the concept_relationship table.
Look for “Maps to” in the relationship_id column
WHERE concept_id_1 = 44825293
Putting it all together…
1. Look up the source concept
WHERE concept_code = ‘296.3’
2. Translate to standard vocabulary
WHERE concept_id_1 = 44825293 AND relationship_id = ‘Maps to’
3. Check out the translated concept
WHERE concept_id = 4282316
As you’ve now seen, concepts stored in the data model can have complex relationships. For example, there are several different types of recurrent major depression, including mild recurrent major depression. In this example, recurrent major depression is an ancestor concept and subsumes mild recurrent major depression, which is a descendant concept. Now let's explore what other concepts are related to atrial fibrillation by looking at the relationship table. To learn more about relationship tables, click here.
Use the concept_relationship table to find the concepts that relate to major recurrent depression (concept_id = 4282316):
WHERE concept_id_1 = 4282316
To find out what is concept_id_2, join the concept table to the concept_relationship table.
SELECT cr.relationship_id, c.*
FROM concept_relationship cr
JOIN concept c
ON cr.concept_id_2 = c.concept_id
WHERE cr.concept_id_1 = 4282316
Mapped from = source vocabulary
Is a = ancestor concept
Subsumes = descendant concept
Note that shown above is only a portion of the resultant table from the SQL query above.
The OMOP common data model (CDM) is a powerful resource that helps researchers study data from disparate sources. Below is one final image using major depressive disorder as example to help clarify what we have covered so far. The source codes for major depressive disorder are ICD9CM: 296.3 or ICD10CM: F33. These source codes are assigned a concept_id and then mapped to a standard concept_code and concept_id.