Phenotyping across Multiple Institutions
Electronic Health Records (EHRs) are a rich source of data for phenotyping. However, patients, especially in urban settings, may seek health care at multiple facilities, thereby fragmenting their health information across institutions. The Chicago HealthLNK Data Repository (CHDR) is a city-wide database of merged and de-duplicated patient EHRs from seven major institutions in Chicago and the surrounding area. Using it, we can supplement single-site EHR data with data from other institutions, and then compare the outcomes of phenotype algorithms run on single-site versus multi-site de-duplicated data sets across various conditions.
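The comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the CHDR schema or any phenotype algorithm used in this work: the record layout, the site names, the toy data, and the rule (at least two diagnosis codes with a given prefix) are all hypothetical. It only shows how a patient whose codes are split across sites can be missed by a single-site run and found by a multi-site run.

```python
from collections import defaultdict

# Hypothetical encounter records: (patient_id, site, diagnosis_code).
# patient_id is assumed to be the de-duplicated, cross-site identifier.
ENCOUNTERS = [
    ("p1", "site_a", "E11.9"),
    ("p1", "site_b", "E11.9"),
    ("p2", "site_a", "E11.9"),
    ("p2", "site_a", "E11.65"),
    ("p3", "site_b", "E11.9"),
    ("p3", "site_b", "I10"),
]


def phenotype_cases(encounters, code_prefix="E11", min_codes=2):
    """Return ids of patients with at least min_codes matching diagnosis codes.

    This is a toy rule chosen for illustration only.
    """
    counts = defaultdict(int)
    for patient_id, _site, code in encounters:
        if code.startswith(code_prefix):
            counts[patient_id] += 1
    return {pid for pid, n in counts.items() if n >= min_codes}


# Run the same rule on one site's records and on the merged records.
single_site = [e for e in ENCOUNTERS if e[1] == "site_a"]
single_cases = phenotype_cases(single_site)
multi_cases = phenotype_cases(ENCOUNTERS)

# Patients identified only when data from other institutions is added.
gained = multi_cases - single_cases
print(sorted(single_cases), sorted(multi_cases), sorted(gained))
```

In this toy data, patient p1 has one qualifying code at each of two sites, so the rule identifies p1 only when the records are merged.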

Estimated Aggregation/Segregation with Geo-Spatial Simulation & Regression
U.S. Census data, given its specificity and breadth, is a powerful and unique tool for modeling the environmental impacts on human health. However, because identifying individuals and their specific health conditions would invade their privacy, most health data is aggregated to a degree that often makes it impossible to draw useful conclusions about health outcomes based on environmental factors. Our work focuses on using health outcome observations, such as diagnoses, along with spatially explicit environmental datasets and aggregated U.S. Census demographic and socio-economic data, to disaggregate existing de-identified health data sets so that useful environmental/health models can be developed. The process sometimes involves aggregating health data ‘up’ into larger, coarser spatial units to gain statistical power, as well as disaggregating ‘down’ to estimate probable patient location at extremely fine spatial resolutions.
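As a rough illustration of moving between spatial scales, the sketch below splits an aggregated count across finer units in proportion to a census-derived weight, then sums the estimates back up to the coarse unit. This is simple proportional (dasymetric-style) allocation, swapped in for clarity; it is not the simulation and regression approach described above. The unit names, populations, and counts are hypothetical.

```python
def disaggregate(total_cases, weights):
    """Split an aggregated count across fine units in proportion to weights.

    weights maps each fine unit to a non-negative weight, here an assumed
    census population count.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return {unit: total_cases * w / weight_sum for unit, w in weights.items()}


def aggregate(fine_counts, unit_to_region):
    """Sum fine-unit counts 'up' into coarser regions."""
    totals = {}
    for unit, count in fine_counts.items():
        region = unit_to_region[unit]
        totals[region] = totals.get(region, 0.0) + count
    return totals


# Hypothetical example: 30 diagnoses reported for one coarse area that
# contains three census block groups with known populations.
block_population = {"bg_1": 500, "bg_2": 1500, "bg_3": 1000}
estimates = disaggregate(30, block_population)

# Aggregating the estimates back up recovers the reported total.
region_of = {"bg_1": "area_x", "bg_2": "area_x", "bg_3": "area_x"}
recovered = aggregate(estimates, region_of)
print(estimates, recovered)
```

A fuller model would replace the population weights with weights estimated from environmental and socio-economic covariates; the conservation property, where fine-scale estimates sum to the reported aggregate, would still apply.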