The research at the Children's Hospital Informatics Program spans a wide range of problems in bioinformatics and clinical informatics. Our goal is to make significant contributions to biomedical research and patient care by understanding and utilizing various types of genomic and proteomic data and by developing innovative hardware and software technologies.

Instrumenting the Healthcare Enterprise for Discovery Research

Since its inception in 2005, i2b2 has been designed to provide the instrumentation for using the informational byproducts of health care, and the biological materials accumulated through its delivery, to conduct discovery research and to study the healthcare system in vivo, as a complement to prospective cohort studies and trials. The utility of this approach is demonstrated by the grass-roots adoption of i2b2 by over 84 academic health centers (AHCs) internationally, each implementation of which is a major, local institutional commitment. i2b2 is now at the core of the Scalable Collaborative Infrastructure for a Learning Healthcare System grant from PCORI to build a national-scale clinical data infrastructure.

Phelan-McDermid Syndrome Data Network (PMS_DN)

The goal of this network is to collect all available data from Phelan-McDermid Syndrome (PMS) patients, to make meaningful, well-annotated clinical data available to researchers, and to share insights with members of the PCORI network. The TranSMART platform, which is based on i2b2, is being used to integrate patient-reported outcomes with knowledge extracted from clinical notes using cTAKES.

SMART Platforms -- the "App Store" for health

Substitutable Medical Apps, reusable technologies. A platform with substitutable apps constructed around core services is a promising approach to driving down healthcare costs, supporting standards evolution, accommodating differences in care workflow, fostering competition in the market, and accelerating innovation. Recent developments include the adoption of the SMART API by Cerner and Intermountain Healthcare as well as Hewlett Packard and the Harris Corporation. The SMART Advisory Committee was recently convened, with members from HCA, CMS, SureScripts, Lilly, and the BMJ Group.

Growth Calculator

Growth Calculator is an online anthropometric calculator, helpful for calculating a variety of standard deviation scores and velocities, as well as for predicted height calculations using heights and bone age values.
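
As a rough illustration of the two kinds of calculation mentioned above, the following is a minimal sketch, not the Growth Calculator's actual implementation. The function names are hypothetical, the reference mean and standard deviation must come from a published growth reference for the child's age and sex, and the simple (value - mean) / SD form is an assumption; production tools often use more elaborate reference methods.

```python
from datetime import date


def sd_score(measurement, reference_mean, reference_sd):
    """Standard deviation score (z-score) of a measurement.

    reference_mean and reference_sd are assumed to be taken from a
    growth reference for the child's age and sex (not supplied here).
    """
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return (measurement - reference_mean) / reference_sd


def height_velocity(height1_cm, date1, height2_cm, date2):
    """Annualized height velocity in cm/year between two visits."""
    days = (date2 - date1).days
    if days <= 0:
        raise ValueError("date2 must be later than date1")
    return (height2_cm - height1_cm) / (days / 365.25)


if __name__ == "__main__":
    # Illustrative numbers only; they are not real reference values.
    print(sd_score(110.0, reference_mean=115.0, reference_sd=5.0))
    print(height_velocity(100.0, date(2020, 1, 1), 106.0, date(2021, 1, 1)))
```

A negative score means the measurement lies below the reference mean, and the velocity is scaled to a 365.25-day year so that visits at irregular intervals can be compared.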

HealthMap

HealthMap brings together disparate data sources to achieve a unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health. This freely available Web site integrates outbreak data of varying reliability, ranging from news sources (such as Google News) to curated personal accounts (such as ProMED) to validated official alerts (such as those from the World Health Organization). Through an automated text processing system, the data is aggregated by disease and displayed by location, giving users easy access to the original alert. HealthMap provides a jumping-off point for real-time information on emerging infectious diseases and is of particular interest to public health officials and international travelers.
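
The aggregation step described above can be pictured with a small sketch. This is an illustration only, not HealthMap's actual pipeline: it assumes the text processing has already produced one record per alert, and the field names (`disease`, `location`, `url`) and the example URLs are hypothetical.

```python
from collections import defaultdict


def aggregate_alerts(alerts):
    """Group alert URLs by (disease, location).

    Each alert is assumed to be a dict with hypothetical keys
    'disease', 'location', and 'url'. Keys are normalized to
    lower case so that 'Cholera' and 'cholera' fall together.
    """
    grouped = defaultdict(list)
    for alert in alerts:
        key = (alert["disease"].strip().lower(),
               alert["location"].strip().lower())
        grouped[key].append(alert["url"])
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        {"disease": "Cholera", "location": "Haiti", "url": "https://example.org/a1"},
        {"disease": "cholera", "location": "Haiti ", "url": "https://example.org/a2"},
        {"disease": "Measles", "location": "France", "url": "https://example.org/a3"},
    ]
    for key, urls in aggregate_alerts(sample).items():
        print(key, urls)
```

Keeping the original URL with each group preserves the link back to the source alert, whatever its reliability.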


There is an enormous trove of prior knowledge gleaned in the biological sciences. Macrobiology leverages this prior knowledge in the interpretation of whole physiologies (and pathologies) using empirical grounding offered by high-throughput, comprehensive measurements.

Self-Scaling Registries

The Self-Scaling Registry project is an open source software platform that aims to simplify multi-institutional patient registry collaboration. Built upon the successful open source projects i2b2 and SHRINE, the Self-Scaling Registry project empowers researchers to form their own data sharing networks, manage data use, and build on top of existing datasets. Our initial deployment of this platform supports the CARRA network, a group of pediatric rheumatologists from 60 medical institutions, in forming a patient registry to serve as the basis for future research, comparative effectiveness studies, and post-marketing surveillance. In 2014, we received funding from the National Heart, Lung and Blood Institute to create a self-scaling registry for patients with pediatric pulmonary hypertension.

Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ)

Clinical question answering (cQA) systems focus on the needs of physicians, usually at the point of care, or of investigators in the lab. The questions asked usually require one of two kinds of information: information highly specific to a patient (e.g., lab results or previous history), which is answered from the patient’s health record, or more general information, which is usually answered from generally available information sources.

Pharmaco-Genomics Research Network (PGRN)

We are developing a rheumatoid arthritis (RA) disease activity level classifier that works directly on clinical notes from electronic health records, using chart review and natural language processing techniques.

Shared Annotation Resources (ShARe)

We are developing standards and infrastructure that enable technology to extract scientific information from textual medical records. We are annotating a 500K-word clinical narrative corpus for syntactic information following the Penn Treebank guidelines and for semantic information following the UMLS definitions. The corpus will be made available to the research community in 2014.

Strategic Health IT Advanced Research Projects (SHARP), SHARPn

We are creating several open source NLP modules for semantic analysis of clinical narratives, including modules for coreference resolution, relation extraction, and the predicate-argument structure of sentences. We are also involved in several annotation tasks that aim to create a richly annotated corpus of clinical texts. This corpus will include multiple layers of syntactic and semantic annotation, such as treebank, propbank, and UMLS annotations. In addition, we are involved in projects that focus on using active learning to reduce the cost of annotation.
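
To make the active learning idea concrete, here is a minimal sketch of uncertainty sampling, one common active learning strategy. The text above does not say which strategy these projects use, so this is an assumption for illustration; the function name and the document identifiers are hypothetical, and the probabilities are assumed to come from some binary classifier trained on the notes annotated so far.

```python
def select_for_annotation(probabilities, k):
    """Pick the k unlabeled items the classifier is least sure about.

    probabilities maps an item id to the predicted probability of the
    positive class. Items closest to 0.5 are the most uncertain, so
    annotating them is expected to be most informative.
    """
    ranked = sorted(probabilities,
                    key=lambda item_id: abs(probabilities[item_id] - 0.5))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical classifier outputs for four unlabeled notes.
    scores = {"note_a": 0.95, "note_b": 0.52, "note_c": 0.10, "note_d": 0.40}
    print(select_for_annotation(scores, k=2))
```

Annotators then label only the selected items, the classifier is retrained, and the loop repeats, which is how the approach reduces the number of annotations needed.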

Temporal History of Your Medical Events (THYME)

Temporal relations are of prime importance in biomedicine as they are intrinsically linked to diseases, signs and symptoms, and treatments. Understanding the timeline of clinically relevant events is key to the next generation of translational research, where generalizing over large amounts of data holds the promise of deciphering biomedical puzzles. The goal of our current proposal is to automatically discover temporal relations from clinical free text and create a timeline.
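
The final step, turning discovered relations into a timeline, can be sketched as follows. This is an illustration under strong assumptions and not the THYME method: it assumes the relations have already been extracted from the text as simple (earlier, later) pairs, handles only "before" relations, and uses a topological sort from Python's standard library (3.9 or later). The function name and event labels are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def build_timeline(before_relations):
    """Order events so that every 'earlier' event precedes its 'later' one.

    before_relations is an iterable of (earlier, later) pairs.
    Raises graphlib.CycleError if the relations contradict each other.
    """
    sorter = TopologicalSorter()
    for earlier, later in before_relations:
        # 'later' depends on 'earlier', so 'earlier' is emitted first.
        sorter.add(later, earlier)
    return list(sorter.static_order())


if __name__ == "__main__":
    relations = [
        ("fever onset", "admission"),
        ("admission", "antibiotics started"),
        ("antibiotics started", "discharge"),
    ]
    print(build_timeline(relations))
```

When the relations do not fully determine the order, several timelines are consistent with them and the sort returns one of them; contradictory relations surface as a cycle error.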