PHI Labeling and Modeling Report

This section analyzes the population characteristics and note attributes of the labeled sample used in our study. We compare these characteristics with the broader STARR-OMOP population to provide context and highlight any significant differences. The analysis covers the age distribution, the note length distribution, and the note types. The PHI (Protected Health Information) distribution is shown for the labeled sample only.

Age Distribution

The histogram below shows the age distribution of the population represented in the labeled notes. For comparison, the same distribution is shown for the entire STARR-OMOP population. Patient age is calculated as of the extraction date, not as of the date the note was written.
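
Because age is computed at extraction time, a patient can fall into an older age group than the one they belonged to when the note was written. The following is a minimal sketch of that effect, not the pipeline used in this report: the function names, the example dates, and the assumption that age means completed years are all illustrative. The age bands are the ones used in the Age Group table.

```python
from datetime import date

# Age bands from the Age Group table; both bounds are inclusive.
AGE_BANDS = [(0, 17, "0-17"), (18, 44, "18-44"), (45, 64, "45-64")]


def age_in_years(birth_date, reference_date):
    """Completed years between birth_date and reference_date (assumed definition of age)."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age):
    """Map an age in completed years to its band label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    return "65+"


# Hypothetical patient, note date, and extraction date.
birth = date(1960, 6, 15)
note_written = date(2005, 3, 1)
extraction = date(2024, 1, 10)

age_at_note = age_in_years(birth, note_written)      # 44
age_at_extraction = age_in_years(birth, extraction)  # 63

print(age_group(age_at_note))        # 18-44
print(age_group(age_at_extraction))  # 45-64
```

In this hypothetical case the same note would be counted under 45-64 in the tables of this report, even though the patient was 44 when it was written.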

Demographic Groups

This section analyzes the demographic characteristics of the labeled sample used in our study. We compare these characteristics with the broader STARR-OMOP population to provide context and highlight any significant differences. The analysis covers age group, sex, race, and ethnicity. This comparison helps assess how representative the labeled sample is and identify the ways in which the sample is biased.

Age Group   Labeled Sample    STARR-OMOP
            n¹      %         n¹      %
0-17        169     12.12     161     11.55
18-44       361     25.90     358     25.68
45-64       343     24.61     332     23.82
65+         521     37.37     543     38.95
¹ 'n' represents the number of notes, not the number of patients.

Sex                   Labeled Sample    STARR-OMOP
                      n¹      %         n¹      %
FEMALE                719     51.58     719     51.58
MALE                  666     47.78     666     47.78
No matching concept   9       0.65      NA      NA
Unknown               NA      NA        9       0.65
¹ 'n' represents the number of notes, not the number of patients.

Race                                        Labeled Sample    STARR-OMOP
                                            n¹      %         n¹      %
American Indian or Alaska Native            72      5.16      72      5.16
Asian                                       222     15.93     222     15.93
Black or African American                   177     12.70     177     12.70
Native Hawaiian or Other Pacific Islander   111     7.96      111     7.96
Unknown                                     422     30.27     422     30.27
White                                       390     27.98     390     27.98
¹ 'n' represents the number of notes, not the number of patients.

Ethnicity                Labeled Sample    STARR-OMOP
                         n¹      %         n¹      %
Hispanic or Latino       408     29.27     408     29.27
Not Hispanic or Latino   764     54.81     764     54.81
Unknown                  222     15.93     222     15.93
¹ 'n' represents the number of notes, not the number of patients.
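
The percentages in these tables are note counts divided by the total number of notes in the column. As a small worked check, assuming nothing beyond the labeled-sample counts in the Age Group table, the sketch below recomputes that column; the function and variable names are illustrative.

```python
# Note counts per age group for the labeled sample, copied from the Age Group table.
labeled_counts = {"0-17": 169, "18-44": 361, "45-64": 343, "65+": 521}


def percentages(counts):
    """Share of notes per category, in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


total_notes = sum(labeled_counts.values())
labeled_pct = percentages(labeled_counts)

print(total_notes)  # 1394
print(labeled_pct)  # {'0-17': 12.12, '18-44': 25.9, '45-64': 24.61, '65+': 37.37}
```

Since the unit is the note rather than the patient, a patient with several labeled notes contributes to these counts once per note.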

Note Length Distribution

The table below shows the length distribution of the clinical notes sampled for the labeling task, alongside the length distribution for STARR-OMOP for comparison. Note length is measured in characters.

Range (characters)    STARR-OMOP            Labeled Sample
                      n             %       n       %
0 - 500               106,492,695   41.89   146     10.47
500 - 2,500           79,624,169    31.32   1,240   88.95
2,500 - 10,000        55,615,894    21.88   8       0.57
10,000 - 50,000       12,425,322    4.89    NA      NA
50,000 - 100,000      52,640        0.02    NA      NA
100,000 - 1,150,000   4,837         0.00    NA      NA
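
For illustration, the binning behind a table like this can be sketched as follows. The bin edges are taken from the table, but two details are assumptions: the ranges are treated as half-open intervals (a note of exactly 500 characters goes into the 500 - 2500 bin), and any note at or above the last edge is placed in the last bin. The function name and the sample notes are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges in characters, from the note length table.
EDGES = [0, 500, 2500, 10000, 50000, 100000, 1150000]
LABELS = [f"{low} - {high}" for low, high in zip(EDGES[:-1], EDGES[1:])]


def length_bin(text):
    """Return the range label for a note, based on its length in characters."""
    n_chars = len(text)
    # bisect_right finds the first edge greater than n_chars, so the bin is [low, high).
    index = bisect_right(EDGES, n_chars) - 1
    # Assumption: notes at or beyond the last edge fall into the last bin.
    index = min(index, len(LABELS) - 1)
    return LABELS[index]


# Hypothetical notes of 120, 500, 2,499, and 3,000 characters.
notes = ["a" * 120, "b" * 500, "c" * 2499, "d" * 3000]
counts = Counter(length_bin(note) for note in notes)

for label in LABELS:
    print(label, counts.get(label, 0))
```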

PHI Distribution

This section analyzes the distribution of PHI entities within the labeled sample. The analysis is divided into two parts: the total count of each PHI entity type, and the distribution of PHI entities per document. Together these show how prevalent the various PHI types are in the dataset and provide insight into the labeling process. Note that DOCTOR and HOSPITAL are not classified as PHI under the HIPAA Safe Harbor definition. However, being able to identify and obfuscate such information may still be important for potential data sharing use cases.
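
The two parts of this analysis can be sketched with simple counters. This is a minimal illustration, not the code used for the report: the annotation format (pairs of document identifier and entity type) is an assumption, the documents are made up, and of the entity labels only DOCTOR and HOSPITAL come from this report, while DATE and PATIENT are hypothetical.

```python
from collections import Counter

# Hypothetical annotations as (document_id, entity_type) pairs.
annotations = [
    ("note_1", "DOCTOR"), ("note_1", "DATE"), ("note_1", "DATE"),
    ("note_2", "HOSPITAL"),
    ("note_3", "DOCTOR"), ("note_3", "PATIENT"),
]
# All labeled documents, including one without any annotated entity.
all_documents = ["note_1", "note_2", "note_3", "note_4"]

# Part 1: total count of each entity type across the sample.
entity_totals = Counter(entity_type for _, entity_type in annotations)

# Part 2: number of entities per document, keeping documents with zero entities.
annotated = Counter(doc_id for doc_id, _ in annotations)
entities_per_document = {doc_id: annotated.get(doc_id, 0) for doc_id in all_documents}

# Number of documents that contain exactly k entities.
documents_by_entity_count = Counter(entities_per_document.values())

print(entity_totals["DATE"])          # 2
print(entities_per_document)          # {'note_1': 3, 'note_2': 1, 'note_3': 2, 'note_4': 0}
print(documents_by_entity_count[0])   # 1
```

Iterating over the full document list rather than over the annotations keeps documents with no entities in the per-document distribution.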

Note Types

This section analyzes the types of clinical notes included in the labeled sample used in our study. We compare these note types with the broader STARR-OMOP population to provide context and highlight any significant differences. This comparison helps assess how representative the labeled sample is and identify how the sample is biased with respect to the types of notes included. The analysis covers the distribution of note types and their frequencies in both the labeled sample and the STARR-OMOP population. Several note types in the distribution below contain radiology and pathology reports. Among them are the notes labeled as procedures, which combine clinical results produced by procedures.

Distribution of Note Types in the Dataset
Note Type  Labeled Sample n  Labeled Sample %  STARR-OMOP n  STARR-OMOP %
letter 45 3.23 45 3.23
imaging 36 2.58 36 2.58
procedures 36 2.58 36 2.58
progress notes 34 2.44 34 2.44
clinic support note 32 2.30 32 2.30
patient instructions 32 2.30 32 2.30
discharge instructions 31 2.22 31 2.22
telephone encounter 31 2.22 31 2.22
nursing note 29 2.08 29 2.08
consults 27 1.94 27 1.94
care plan note 26 1.87 26 1.87
h&p 26 1.87 26 1.87
pathology 26 1.87 26 1.87
rtf letter 26 1.87 26 1.87
anesthesia procedure notes 25 1.79 25 1.79
unmapped external results 24 1.72 24 1.72
anesthesia postprocedure evaluation 23 1.65 23 1.65
care plan 23 1.65 23 1.65
ecg 23 1.65 23 1.65
pathology and cytology 23 1.65 23 1.65
procedure note 23 1.65 23 1.65
assessment & plan note 22 1.58 22 1.58
lab 21 1.51 21 1.51
rn transfer note 20 1.43 20 1.43
sign out note 19 1.36 19 1.36
unmapped external note 19 1.36 19 1.36
pr charge 18 1.29 18 1.29
operative note 17 1.22 17 1.22
operative report 17 1.22 17 1.22
consult follow-up 16 1.15 16 1.15
documentation clarification 16 1.15 16 1.15
microbiology 16 1.15 16 1.15
ed provider notes 15 1.08 15 1.08
ed temp/rap patient 15 1.08 15 1.08
microbiology culture 15 1.08 15 1.08
lab panel 14 1.00 14 1.00
point of care testing 14 1.00 14 1.00
transplant summary 14 1.00 14 1.00
ancillary 13 0.93 13 0.93
group note 13 0.93 13 0.93
interval h&p note 13 0.93 13 0.93
advance care planning 12 0.86 12 0.86
ed notes 12 0.86 12 0.86
ip letter 12 0.86 12 0.86
rehab daily note 12 0.86 12 0.86
nursing referral 11 0.79 11 0.79
physical therapy 11 0.79 11 0.79
anesthesia preprocedure evaluation 10 0.72 10 0.72
er notes 10 0.72 10 0.72
h&p interval 10 0.72 10 0.72
nsg picc refer 10 0.72 10 0.72
perfusion event 10 0.72 10 0.72
pft 10 0.72 10 0.72
consult 9 0.65 9 0.65
imaging non-reportable 9 0.65 9 0.65
neurology 9 0.65 9 0.65
nursing 9 0.65 9 0.65
rehab 9 0.65 9 0.65
accountable care division cm note 8 0.57 8 0.57
discharge summary 8 0.57 8 0.57
h&p (view-only) 8 0.57 8 0.57
hiv lab non-restricted 8 0.57 8 0.57
outpatient letter 8 0.57 8 0.57
wound care 8 0.57 8 0.57
code blue/rapid response team note 7 0.50 7 0.50
health plan operations cm note 7 0.50 7 0.50
immediate post op note 7 0.50 7 0.50
lactation note 7 0.50 7 0.50
pharmacy medication review 7 0.50 7 0.50
vascular ultrasound 7 0.50 7 0.50
anesthesia post-op follow-up note 6 0.43 6 0.43
cardiac services 6 0.43 6 0.43
echo 6 0.43 6 0.43
psych 6 0.43 6 0.43
significant event 6 0.43 6 0.43
slp 6 0.43 6 0.43
dermatology 5 0.36 5 0.36
gi 5 0.36 5 0.36
hospice 5 0.36 5 0.36
ob 5 0.36 5 0.36
anesthesia post-op 4 0.29 4 0.29
blood bank 4 0.29 4 0.29
discharge instr - other orders 4 0.29 4 0.29
hospital course 4 0.29 4 0.29
manual entry echo 4 0.29 4 0.29
miscellaneous 4 0.29 4 0.29
occupational therapy 4 0.29 4 0.29
or surgeon 4 0.29 4 0.29
patient care conference 4 0.29 4 0.29
plan of care 4 0.29 4 0.29
anesthesia followup note 3 0.22 3 0.22
bh treatment plan 3 0.22 3 0.22
care conference 3 0.22 3 0.22
cath angio 3 0.22 3 0.22
clinical letter 3 0.22 3 0.22
committee review 3 0.22 3 0.22
consult follow up 3 0.22 3 0.22
echocardiography 3 0.22 3 0.22
home health plan of care 3 0.22 3 0.22
ir procedure notes 3 0.22 3 0.22
l&d delivery note 3 0.22 3 0.22
respiratory care 3 0.22 3 0.22
result encounter note 3 0.22 3 0.22
subjective & objective 3 0.22 3 0.22
transfer center follow up clinical screen 3 0.22 3 0.22
transfer center initial clinical screen 3 0.22 3 0.22
acp (advance care planning) 2 0.14 2 0.14
cardiac angio 2 0.14 2 0.14
case communication 2 0.14 2 0.14
discharge instr - brief hospital 2 0.14 2 0.14
ed triage notes 2 0.14 2 0.14
electrophysiology 2 0.14 2 0.14
event note 2 0.14 2 0.14
hiv lab restricted 2 0.14 2 0.14
lab only 2 0.14 2 0.14
lab only - beaker 2 0.14 2 0.14
ltc provider review 2 0.14 2 0.14
manual entry imaging 2 0.14 2 0.14
radiation oncology treatment summary 2 0.14 2 0.14
reference labs 2 0.14 2 0.14
speech language pathology 2 0.14 2 0.14
transfer of care summary 2 0.14 2 0.14
visit note 2 0.14 2 0.14
addendum note 1 0.07 1 0.07
anesthesia pre-op 1 0.07 1 0.07
cardiac monitors 1 0.07 1 0.07
code documentation 1 0.07 1 0.07
disch intstr - brief hosp trans 1 0.07 1 0.07
discharge instr - activity 1 0.07 1 0.07
discharge instr - appointments 1 0.07 1 0.07
discharge instr - diet 1 0.07 1 0.07
discharge instr - radiology 1 0.07 1 0.07
ent 1 0.07 1 0.07
interim summary 1 0.07 1 0.07
manual entry lab 1 0.07 1 0.07
medication review 1 0.07 1 0.07
mock-up query 1 0.07 1 0.07
multi-disciplinary team discussion 1 0.07 1 0.07
nutrition services 1 0.07 1 0.07
or postop 1 0.07 1 0.07
ot 1 0.07 1 0.07
pre-procedure assessment 1 0.07 1 0.07
pre-procedure instructions 1 0.07 1 0.07
rehab evaluation 1 0.07 1 0.07
research note 1 0.07 1 0.07
telephone encounter follow up 1 0.07 1 0.07