Data Lake Evolution: A Longitudinal Overview
Data Source Platforms: The VISTA Oncology Data Lake integrates data from three primary platforms: genomic data from Philips IntelliSpace Precision Medicine (ISPM), cancer registry data from NeuralFrame, and DICOM imaging data from the Vendor Neutral Archive.
Analysis Data Sources: This analysis reflects raw PHI data from the releases spanning February to November 2025.
Interactive Visualizations: All plots in this analysis are interactive. Click on legend items to show/hide data series for better exploration.
Demographic Summary
The demographic composition of the cohort has remained relatively stable across releases.
†Note: The Tumor Board definition was updated in May 2025. The February 2025 data uses the original definition based on encounter type, while the May through November 2025 releases use an enhanced definition based on the OMOP visit_occurrence table, where the source value for the visit contains the phrase ‘tumor board’.
Mortality Tracking
The VISTA Oncology Data Lake captures mortality information from multiple sources to support survival analysis and outcomes research. Death records are consolidated from both the OMOP Clinical Data Warehouse (structured clinical death dates) and the Stanford Cancer Registry via NeuralFrame (vital status descriptions). Drawing on both sources reduces the chance that a death recorded in only one system is missed.
Mortality is tracked as the cumulative count of patients with a death record in either source: OMOP clinical data (death_date) or the Stanford Cancer Registry (vital status = “Dead”).
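The either-source rule described above can be sketched as a set union. This is a minimal illustration, not the data lake's actual pipeline: the row layout, the registry field name vital_status, and the case-insensitive comparison are assumptions, and the sample rows are made up.

```python
def deceased_person_ids(omop_death_rows, registry_rows):
    """Return the set of person_ids with a death record in either source."""
    # OMOP side: a patient counts as deceased when a death_date is present.
    from_omop = {
        row["person_id"] for row in omop_death_rows if row.get("death_date")
    }
    # Registry side: vital status of "Dead" (field name is hypothetical;
    # matching case-insensitively is an assumption).
    from_registry = {
        row["person_id"]
        for row in registry_rows
        if (row.get("vital_status") or "").strip().lower() == "dead"
    }
    # The union counts each patient once even when both sources agree.
    return from_omop | from_registry


# Made-up rows for illustration only.
omop_death = [
    {"person_id": 1, "death_date": "2025-03-14"},
    {"person_id": 2, "death_date": None},
]
registry = [
    {"person_id": 2, "vital_status": "Dead"},
    {"person_id": 3, "vital_status": "Alive"},
]

deceased = deceased_person_ids(omop_death, registry)
print(len(deceased))
```

Using a set rather than concatenating the two lists avoids double-counting patients whose death appears in both systems.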
Tumor Board and Thoracic Cancer Cohorts
The VISTA Oncology Data Lake includes specialized sub-cohorts for tumor board patients and thoracic cancer cases, enabling focused research on multidisciplinary care coordination and disease-specific outcomes. These curated cohorts demonstrate the platform’s capability to support both population-level analyses and targeted clinical investigations.
Tumor Board Sub-Cohort: Patients presented at multidisciplinary tumor boards represent complex cases requiring expert consensus for treatment planning. Tumor Board encounters are defined using the OMOP visit_occurrence table, where the source value for the visit contains the phrase ‘tumor board’.
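As a rough sketch of the definition above, the filter below flags visits whose source value contains the phrase 'tumor board'. The column name visit_source_value follows the OMOP visit_occurrence table; the dictionary row layout, the case-insensitive match, and the sample values are assumptions for illustration.

```python
def is_tumor_board_visit(visit_source_value):
    """Return True when the visit source value mentions 'tumor board'."""
    if not visit_source_value:
        return False
    # Case-insensitive matching is an assumption, not confirmed by the source.
    return "tumor board" in visit_source_value.lower()


def tumor_board_person_ids(visit_rows):
    """Collect distinct patients with at least one tumor board visit."""
    return {
        row["person_id"]
        for row in visit_rows
        if is_tumor_board_visit(row.get("visit_source_value"))
    }


# Made-up visit rows for illustration only.
visits = [
    {"person_id": 10, "visit_source_value": "Thoracic Tumor Board"},
    {"person_id": 11, "visit_source_value": "Office Visit"},
    {"person_id": 12, "visit_source_value": None},
]

cohort = tumor_board_person_ids(visits)
print(sorted(cohort))
```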
Thoracic Cancer Sub-Cohort: Focused on lung, bronchus, and thymus malignancies, this subset enables specialized research in thoracic oncology. The cohort tracks patients identified through the Stanford Cancer Registry (NeuralFrame), supporting longitudinal studies of treatment patterns and outcomes in thoracic malignancies.
Image Occurrence Modalities
The VISTA Oncology Data Lake integrates comprehensive medical imaging data, linking diagnostic images to patient clinical records through the OMOP image occurrence table. This integration enables research connecting imaging studies with clinical outcomes, treatments, and genomic profiles.
Imaging Coverage Growth: The imaging dataset grew from approximately 93k patients in February 2025 to over 195k patients by November 2025, a more than twofold increase in the number of patients with imaging available for multi-modal cancer research.
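The size of that increase can be checked with a short calculation. The inputs are the rounded figures quoted above (approximately 93k and over 195k), so the results are approximate.

```python
def growth(start_count, end_count):
    """Return (fold change, percent increase) between two patient counts."""
    fold_change = end_count / start_count
    percent_increase = (end_count - start_count) / start_count * 100
    return fold_change, percent_increase


# Rounded patient counts from the February and November 2025 releases.
fold, pct = growth(93_000, 195_000)
print(f"{fold:.2f}x, +{pct:.0f}%")  # prints "2.10x, +110%"
```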
Image Series and Studies: The platform captures detailed imaging metadata including study identifiers, series information, anatomic regions, and imaging modalities (CT, MRI, PET, etc.), supporting comprehensive radiomics and image-based research.
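To illustrate how series-level metadata of this kind might be summarized, the sketch below counts distinct patients per modality. The field names (person_id, study_uid, series_uid, modality) and the sample rows are hypothetical and do not reflect the actual image occurrence schema.

```python
from collections import defaultdict


def patients_per_modality(image_rows):
    """Count distinct patients for each imaging modality."""
    patients = defaultdict(set)
    for row in image_rows:
        # One row per image series; a patient may have many series per modality.
        patients[row["modality"]].add(row["person_id"])
    return {modality: len(ids) for modality, ids in patients.items()}


# Made-up series-level rows for illustration only.
rows = [
    {"person_id": 1, "study_uid": "S1", "series_uid": "S1.1", "modality": "CT"},
    {"person_id": 1, "study_uid": "S1", "series_uid": "S1.2", "modality": "CT"},
    {"person_id": 2, "study_uid": "S2", "series_uid": "S2.1", "modality": "MRI"},
    {"person_id": 1, "study_uid": "S3", "series_uid": "S3.1", "modality": "PET"},
]

counts = patients_per_modality(rows)
print(counts)
```

Counting patients through a set per modality keeps a patient with several CT series from being counted more than once.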
Philips ISPM Genomic Testing
The VISTA Oncology Data Lake integrates comprehensive molecular testing data from the Philips IntelliSpace Precision Medicine (ISPM) platform, beginning with the May 2025 release. This dataset includes genomic and molecular profiling results from various testing platforms including STAMP (Stanford Actionable Mutation Panel), Heme-STAMP, and FoundationOne, enabling research on precision oncology and genomic-driven treatment strategies.
Testing Platform Coverage: The ISPM dataset captures detailed molecular testing information including test types, accession numbers, order dates, and genomic aberrations across multiple assay platforms.
Research Applications: This integration supports studies of actionable mutations, variant frequencies, treatment selection based on genomic profiles, and longitudinal tracking of molecular testing patterns in cancer care.