
Research Reference Datasets

Large-scale Clinical Data Reference Dataset

Instructors often find it challenging to locate readily available “big data” resources for use in informatics instruction. To address this issue, Dr. Kathy Bobay collaborated with the Informatics and Clinical Research (ICR) team to develop a curated, fully de-identified “big data” resource for instructional activities. This resource (link to dataset documentation provided below) was completed in April 2020.
 
The “Big Data Clinical Reference Dataset” consists of a select set of curated longitudinal clinical data that is fully de-identified. One unique aspect of the dataset is that, beyond the usual structured data elements, it also contains a full set of National Library of Medicine (NLM) Unified Medical Language System (UMLS) concept unique identifiers (CUIs) and semantic type identifiers (TUIs), produced through large-scale natural language processing (NLP) of the clinical reports associated with the dataset’s structured elements.
 
Note:  This dataset is intended for instructional activities only and it is NOT suited for actual clinical research. The underlying data are select and may contain some synthetic components. 

Common uses of the resource: 

This reference dataset is intended for instructional activities related to clinical informatics, biostatistics and data science.

Resource available to the following users:

The dataset is available for use by Loyola University Chicago faculty and students.

Requests for access require:

Use is contingent upon execution of an Institutional Review Board (IRB) application and a data use agreement.

Current resources:

  • National Library of Medicine (NLM)
  • Unified Medical Language System (UMLS)

Reference dataset contacts:

For information about this resource or to request its use, please contact Dr. Kathy Bobay of the Parkinson School of Health Sciences and Public Health.

Last Modified:   Wed, February 5, 2025 10:33 AM CST
