Data Labeling and Modeling
This page gives an overview of our data labeling initiatives, which focus on extracting structured information from unstructured clinical data.
PHI Labeling
Our current focus is labeling Protected Health Information (PHI) in clinical notes. This work supports:
- Developing robust de-identification systems
- Ensuring patient privacy compliance
- Creating high-quality training data for machine learning models
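As a rough illustration of what a PHI label can look like, the sketch below stores each label as a character-offset span and uses two simple regular expressions to pre-annotate a note before redacting it. The `PhiSpan` class, the `DATE` and `PHONE` label names, the patterns, and the sample note are all assumptions made for this example; they are not our labeling schema or tooling, and real PHI covers many more categories than a pair of regexes can catch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PhiSpan:
    """One labeled PHI mention (hypothetical structure for illustration)."""
    start: int  # inclusive character offset into the note
    end: int    # exclusive character offset into the note
    label: str  # illustrative label name, e.g. "DATE"


# Illustrative patterns only; they cover a tiny fraction of real PHI.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def pre_annotate(text):
    """Return candidate PHI spans, sorted by start offset."""
    spans = [
        PhiSpan(match.start(), match.end(), label)
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda span: span.start)


def redact(text, spans):
    """Replace each span with a bracketed label, skipping overlapping spans."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # overlaps a span that was already redacted
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Seen on 03/14/2021. Call 555-123-4567 to follow up."
spans = pre_annotate(note)
print(redact(note, spans))  # Seen on [DATE]. Call [PHONE] to follow up.
```

Storing labels as offsets rather than editing the text directly keeps the original note intact, so the same spans can serve both as training data and as input to a redaction step.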
AI for Automatic Synoptic Reporting
Our team is using AI models to automatically extract structured information from pathology reports. Key aspects of this work include:
- Identifying key data elements from College of American Pathologists (CAP) cancer protocols
- Converting unstructured free-text reports into standardized synoptic formats
- Validating model accuracy against annotations from human experts
This initiative aims to improve data standardization, reduce manual extraction effort, and improve the completeness of cancer registry data.
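One way to picture the validation step is a field-level comparison between the model's extracted values and an expert's annotations for the same report. The sketch below computes precision, recall, and F1 under exact-match scoring. The field names and values are made up for illustration and are not actual CAP protocol element identifiers, and exact match is an assumed scoring rule; it is not a description of how our evaluation is actually run.

```python
def score_fields(predicted, gold):
    """Compare extracted fields to expert annotations using exact match.

    A predicted field counts as a true positive only if the expert
    annotated the same field with the same value.
    """
    true_pos = sum(
        1 for field, value in predicted.items()
        if field in gold and gold[field] == value
    )
    false_pos = len(predicted) - true_pos
    false_neg = sum(
        1 for field, value in gold.items()
        if field not in predicted or predicted[field] != value
    )

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical field names and values, for illustration only.
expert = {
    "histologic_type": "invasive ductal carcinoma",
    "histologic_grade": "2",
    "margin_status": "negative",
}
model = {
    "histologic_type": "invasive ductal carcinoma",
    "histologic_grade": "3",
    "margin_status": "negative",
}

scores = score_fields(model, expert)
print({name: round(value, 2) for name, value in scores.items()})
```

Under this scoring, a field with the wrong value counts as both a false positive and a false negative, so a single grading error lowers precision and recall together.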