Health Informatics

Mission: (1) to enable data-driven knowledge discovery through multi-modal datasets and novel analysis pipelines that translate into clinical studies, and whose findings inform clinical decision making and population health; (2) to educate the next generation of clinical informaticians, teach investigators the value of electronic health data in research, and inform the community about health data and clinical informatics; and (3) to integrate with the CTSA national network and contribute data sources, open-source software, and analysis methods.


The Health Informatics module's informatics and training components draw extensively on the expertise and the past and ongoing work of the Department of Biomedical Informatics. Visit our Research Services page to submit a request for consultations or services.

We provide a wide spectrum of informatics services and capabilities to investigators. These include an enhanced Research Data Warehouse (RDW), which hosts a curated collection of datasets from multiple data modalities. The RDW builds on scalable, reliable data ingestion, quality control, and harmonization pipelines that have been optimized and field-tested on a very large volume of data in the National COVID Cohort Collaborative (N3C). The module also provides a growing suite of machine learning-based data analysis methods. The current set of methods includes deep learning pipelines for clinical data, Radiology imaging data, and Pathology imaging data. Additional methods for text data and for integrative analyses of imaging, clinical, and text data are under active development.

Informatics Capabilities

The Research Data Warehouse (RDW)

Stony Brook's Research Data Warehouse (RDW) is intended to support data-driven knowledge discovery for researchers. Its data include health records, hand-abstracted data, administrative data, radiology and histopathology image data, clinical reports, and features derived by machine learning methods. The RDW also captures structured data from genomic testing and allows these results to be linked to patient data.

Clinical Databases

  1. Cerner HealtheIntent, Cerner’s main data warehouse, updated daily with transaction data from Cerner Millennium.
  2. Elasticsearch repository, a searchable index of a wide range of Stony Brook patients' clinical notes. The notes currently include those for aortic aneurysm screening, COVID-19-positive patients, and inpatient and emergency department patients.
  3. Customized Stony Brook patient DataMarts covering county-level data derived from NY SPARCS inpatient and emergency department encounters (to assist system-wide planning), post-COVID kidney function data, inpatient blood glucose monitoring, and aortic aneurysm clinical data linked to aortic size.
  4. A comprehensive database (OHDSI CDM) of COVID-19 patients seen at Stony Brook since the beginning of the COVID-19 pandemic.

Imaging Databases

  1. De-identified Radiology DICOM images and associated image metadata obtained from COVID-19 and aortic aneurysm patients seen at Stony Brook.
  2. A collection of more than 40,000 whole slide tissue images, including images collected in projects at Stony Brook and in collaborative projects with other institutions. We also have access to a large collection of images from The Cancer Genome Atlas project.

Data Analytics

Our team has developed innovative solutions for analysis and management of imaging data. These include the PRISM platform, a suite of software services and tools for management and visualization of Radiology and Pathology imaging data and features; QuIP, a micro-services platform for management, visualization, and interrogation of Pathology imaging data and features; and a collection of machine/deep learning pipelines for biomedical data analysis.

Example: Using Imaging for Predictive Analysis 

Our team has developed artificial intelligence (AI) methods that automatically extract salient features from images and predict outcomes. For example, we trained an AI workflow using serial chest radiographs of COVID-19 patients to predict lung infiltrate progression, mortality, and need for mechanical ventilation. 

Tissue specimen images contain highly detailed data on cancer morphology at the sub-cellular level. AI methods can be trained to detect and segment tumor regions, nuclei, and cells in these images and to predict spatial patterns of different types of cells. For example, spatial relationships between tumor regions and tumor-infiltrating lymphocytes can be used to predict patient survival outcomes.

The maps in the figure below were generated by training AI methods to segment tumor regions and detect the distribution of lymphocytes across the tissue.

[Figure: composite map]


Upper-left: Low-resolution version of the input whole slide image.

Upper-right: The tumor region map. 

Lower-left: Distribution map of tumor-infiltrating lymphocytes (TILs).

Lower-right: The composite figure that combines the tumor and TIL maps.
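
As a rough illustration of how a tumor map and a map of tumor-infiltrating lymphocytes (TILs) could be overlaid, the sketch below combines two binary patch-level grids into a single label map and computes one simple summary, the fraction of tumor patches that also contain TILs. The toy grids, the label codes, and the function names are all hypothetical and chosen for illustration only; this is not the module's actual pipeline, and real whole slide maps are far larger.

```python
# Sketch only: toy patch-level maps, hypothetical label codes and function names.
from typing import List

Grid = List[List[int]]

# Label codes for the composite map (an assumption made for this example).
BACKGROUND, TUMOR, TIL, TUMOR_AND_TIL = 0, 1, 2, 3


def composite_map(tumor: Grid, til: Grid) -> Grid:
    """Combine two binary (0/1) patch maps of equal shape into one label map."""
    if len(tumor) != len(til) or any(len(a) != len(b) for a, b in zip(tumor, til)):
        raise ValueError("tumor and TIL maps must have the same shape")
    # tumor contributes 1, TIL contributes 2, so both together give 3.
    return [
        [t + 2 * l for t, l in zip(t_row, l_row)]
        for t_row, l_row in zip(tumor, til)
    ]


def til_infiltration_fraction(composite: Grid) -> float:
    """Fraction of tumor patches that are also TIL-positive (0.0 if no tumor)."""
    flat = [value for row in composite for value in row]
    tumor_patches = sum(value in (TUMOR, TUMOR_AND_TIL) for value in flat)
    if tumor_patches == 0:
        return 0.0
    return sum(value == TUMOR_AND_TIL for value in flat) / tumor_patches


# Toy 3x3 example: 1 marks a patch predicted as tumor / TIL-positive.
tumor_map = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
til_map = [
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]

combined = composite_map(tumor_map, til_map)
fraction = til_infiltration_fraction(combined)
print(combined)
print(fraction)
```

A single overlap fraction like this discards the spatial arrangement of the patches, so it should be read only as a starting point for the kind of spatial analysis described above.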


Training and Consulting

The module provides a range of training and consulting services. These include a comprehensive collection of courses, bootcamps, and community outreach activities in informatics; one-on-one consulting services for investigators; and structured Studio sessions that help investigators get the most benefit from clinical data and informatics in their proposals and ongoing research projects. The collection of training and consulting services is designed to create a matrix of infrastructure, technologies, and education to advance biomedical research across many driving questions and to support data-driven knowledge discovery. Examples of the training and consulting services are:

  1. Training through Biomedical Informatics courses, informatics bootcamps, and high school and community outreach. Bootcamps cover a wide range of topics in telehealth, mining clinical data, and research informatics, and are directed at healthcare professionals from inside and outside Stony Brook.
  2. Consulting to guide investigators on applying informatics technologies to incorporate clinical datasets from the Research Data Warehouse and to analyze those datasets in their research projects.
  3. Studio sessions in which a panel of experts pre-reviews an investigator’s proposal and evaluates their new or ongoing project for informatics and data analysis requirements. The goal of the Studio sessions is to help investigators incorporate appropriate informatics


Module Lead

Joel Saltz, MD, PhD
Professor and Chair, Biomedical Informatics

Phone: (631) 638-1420
