PHI Labeling Metrics
In this section, we present an analysis of the population characteristics and note attributes for the labeled sample used in our study. We compare these characteristics with the broader STARR-OMOP population to provide context and highlight any significant differences. The analysis covers age distribution, note length distribution, and note types. The PHI (Protected Health Information) distribution is shown for the labeled sample only.
Age Distribution
Below is a histogram of the age distribution of the patients represented in the labeled notes. For comparison, the same distribution is shown for the entire STARR-OMOP population. The patient's age is calculated as of the time of extraction, not the time the note was written.
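The difference between age at extraction and age at note time can move a patient into a different age group. The following is a minimal sketch of that calculation, not the code used in the study. The patient, the dates, and the function names `age_at` and `age_group` are hypothetical; only the group boundaries (0-17, 18-44, 45-64, 65+) come from the age-group table in this section.

```python
from datetime import date

# Age-group boundaries as used in the age-group table of this section.
AGE_GROUPS = [(0, 17, "0-17"), (18, 44, "18-44"), (45, 64, "45-64")]


def age_at(birth_date: date, reference_date: date) -> int:
    """Return whole years elapsed between birth_date and reference_date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age: int) -> str:
    """Map an age in whole years to one of the four age groups."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return "65+"


# Hypothetical patient and dates, for illustration only.
birth = date(1955, 6, 15)
note_written = date(2015, 3, 1)
extraction = date(2024, 1, 10)

age_when_written = age_at(birth, note_written)   # 59
age_when_extracted = age_at(birth, extraction)   # 68

print(age_group(age_when_written))    # 45-64
print(age_group(age_when_extracted))  # 65+
```

Because the report uses the age at extraction, this hypothetical note would be counted in the 65+ group even though the patient was 59 when it was written.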
Demographic Groups
In this section, we present a detailed analysis of the demographic characteristics of the labeled sample used in our study. We compare these characteristics with the broader STARR-OMOP population to provide context and highlight any significant differences. The analysis covers age group, sex, race, and ethnicity. This comparison helps to assess how representative the labeled sample is and to identify the ways in which the sample is biased.
| Age Group | Labeled Sample n¹ | Labeled Sample % | STARR-OMOP n¹ | STARR-OMOP % |
|---|---|---|---|---|
| 0-17 | 59 | 12.55 | 24,709,539 | 10.82 |
| 18-44 | 112 | 23.83 | 51,841,502 | 22.70 |
| 45-64 | 121 | 25.74 | 53,873,202 | 23.59 |
| 65+ | 178 | 37.87 | 97,932,528 | 42.89 |

¹ Note: 'n' represents the number of notes, not the number of patients.
| Sex | Labeled Sample n¹ | Labeled Sample % | STARR-OMOP n¹ | STARR-OMOP % |
|---|---|---|---|---|
| FEMALE | 248 | 52.77 | 128,810,665 | 56.41 |
| MALE | 220 | 46.81 | 99,506,081 | 43.57 |
| No matching concept | 2 | 0.43 | NA | NA |
| Unknown | NA | NA | 40,025 | 0.02 |

¹ Note: 'n' represents the number of notes, not the number of patients.
| Race | Labeled Sample n¹ | Labeled Sample % | STARR-OMOP n¹ | STARR-OMOP % |
|---|---|---|---|---|
| American Indian or Alaska Native | 23 | 4.89 | 985,689 | 0.43 |
| Asian | 77 | 16.38 | 40,808,962 | 17.87 |
| Black or African American | 62 | 13.19 | 11,182,306 | 4.90 |
| Native Hawaiian or Other Pacific Islander | 35 | 7.45 | 2,854,092 | 1.25 |
| Unknown | 161 | 34.26 | 51,605,777 | 22.60 |
| White | 112 | 23.83 | 120,919,945 | 52.95 |

¹ Note: 'n' represents the number of notes, not the number of patients.
| Ethnicity | Labeled Sample n¹ | Labeled Sample % | STARR-OMOP n¹ | STARR-OMOP % |
|---|---|---|---|---|
| Hispanic or Latino | 151 | 32.13 | 39,152,701 | 17.15 |
| Not Hispanic or Latino | 242 | 51.49 | 173,951,827 | 76.18 |
| Unknown | 77 | 16.38 | 15,252,243 | 6.68 |

¹ Note: 'n' represents the number of notes, not the number of patients.
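To make the comparison between the two columns concrete, the sketch below recomputes the ethnicity percentages from the note counts and divides the labeled-sample share by the STARR-OMOP share. The counts are taken from the ethnicity table; the function name `percentages` and the "representation ratio" are our own illustrative choices, not a measure reported by the study. A ratio above 1 means the group is over-represented in the labeled sample, and a ratio below 1 means it is under-represented.

```python
# Note counts taken from the ethnicity table (n = number of notes).
labeled = {
    "Hispanic or Latino": 151,
    "Not Hispanic or Latino": 242,
    "Unknown": 77,
}
starr_omop = {
    "Hispanic or Latino": 39_152_701,
    "Not Hispanic or Latino": 173_951_827,
    "Unknown": 15_252_243,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each group's share of the total, in percent."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}


labeled_pct = percentages(labeled)
starr_pct = percentages(starr_omop)

# Illustrative "representation ratio": labeled share divided by population share.
ratio = {group: labeled_pct[group] / starr_pct[group] for group in labeled}

for group in labeled:
    print(f"{group}: {labeled_pct[group]:.2f}% vs {starr_pct[group]:.2f}% "
          f"(ratio {ratio[group]:.2f})")
```

The same calculation applies unchanged to the age, sex, and race tables, provided rows marked NA in one of the two columns are excluded from that column's total.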
Note Length Distribution
Below is the note length distribution for the clinical notes sampled for the labeling task. For comparison, the distribution of note lengths in STARR-OMOP is also shown. Note length is measured in characters.
| Range (characters) | STARR-OMOP n | STARR-OMOP % | Labeled Sample n | Labeled Sample % |
|---|---|---|---|---|
| 0 - 500 | 96,583,879 | 42.30 | 60 | 12.77 |
| 500 - 2500 | 71,191,327 | 31.18 | 407 | 86.60 |
| 2500 - 10000 | 49,905,133 | 21.85 | 3 | 0.64 |
| 10000 - 50000 | 10,631,213 | 4.66 | NA | NA |
| 50000 - 100000 | 41,626 | 0.02 | NA | NA |
| 100000 - 1150000 | 3,593 | 0.00 | NA | NA |
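The binning behind this table can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the bin edges are taken from the Range column, but the source does not say which side of each boundary is inclusive, so the sketch assumes half-open bins `[low, high)` with the final bin closed at its upper edge. The function name `length_bin` and the example notes are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges in characters, taken from the Range column of the table.
EDGES = [0, 500, 2500, 10_000, 50_000, 100_000, 1_150_000]
LABELS = [f"{low} - {high}" for low, high in zip(EDGES, EDGES[1:])]


def length_bin(n_chars: int) -> str:
    """Return the range label for a note of n_chars characters.

    Assumption: bins are half-open [low, high), except the last bin,
    which also includes its upper edge.
    """
    if not 0 <= n_chars <= EDGES[-1]:
        raise ValueError(f"length {n_chars} is outside the table's range")
    # bisect_right finds the first edge greater than n_chars; the bin index
    # is one less. Clamp so the maximum length falls in the last bin.
    index = min(bisect_right(EDGES, n_chars) - 1, len(LABELS) - 1)
    return LABELS[index]


# Hypothetical note texts, for illustration only.
notes = ["short note", "x" * 1200, "y" * 3000]
counts = Counter(length_bin(len(text)) for text in notes)

for label in LABELS:
    print(label, counts.get(label, 0))
```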
PHI Distribution
In this section, we present an analysis of the distribution of PHI entities within the labeled sample. The analysis has two parts: the total count of each PHI entity type, and the distribution of PHI entities per document. Together these show how prevalent the various PHI types are in the dataset and provide insight into the labeling process. Note that DOCTOR and HOSPITAL are not classified as PHI under the Safe Harbor definition. However, being able to identify and obfuscate such information may be important for potential data sharing use cases.
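The two parts of this analysis can be sketched as below. The annotation format, the document identifiers, and the labels `DATE` and `PATIENT` are hypothetical placeholders; only `DOCTOR` and `HOSPITAL` are entity types named in this section. The sketch is meant to show the shape of the two summaries, not the study's actual pipeline or label set.

```python
from collections import Counter

# Hypothetical annotations: document id -> list of entity-type labels found in it.
annotations = {
    "note_a": ["DOCTOR", "DATE", "DATE", "HOSPITAL"],
    "note_b": ["PATIENT", "DATE"],
    "note_c": [],
}

# Entity types that this section says fall outside the Safe Harbor definition.
NON_SAFE_HARBOR = {"DOCTOR", "HOSPITAL"}

# Part 1: total count of each entity type across all documents.
total_by_type = Counter(
    label for labels in annotations.values() for label in labels
)

# Part 2: number of entities in each document.
entities_per_doc = {doc: len(labels) for doc, labels in annotations.items()}

# Number of documents in which each entity type appears at least once.
docs_by_type = Counter(
    label for labels in annotations.values() for label in set(labels)
)

# Entities that count as PHI under the Safe Harbor definition.
safe_harbor_total = sum(
    n for label, n in total_by_type.items() if label not in NON_SAFE_HARBOR
)

print(total_by_type.most_common())
print(entities_per_doc)
print(docs_by_type["DATE"], safe_harbor_total)
```

Keeping `DOCTOR` and `HOSPITAL` in a separate set makes it straightforward to report totals both with and without the entity types that fall outside Safe Harbor.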
Note Types
In this section, we present an analysis of the different types of clinical notes included in the labeled sample used in our study. We compare these note types with the broader STARR-OMOP population to provide context and highlight any significant differences. This comparison helps to assess how representative the labeled sample is and to identify how the biased sample differs in the types of notes it includes. The analysis covers the distribution of note types and their frequencies in both the labeled sample and the STARR-OMOP population. Several of the types in the distribution below contain radiology and pathology reports. Among these are the notes labeled as procedures, which combine clinical results produced by procedures.
Distribution of Note Types in the Dataset

| Note Type | Labeled Sample n | Labeled Sample % | STARR-OMOP n | STARR-OMOP % |
|---|---|---|---|---|
| care plan note | 17 | 3.62 | 4,139,518 | 1.81 |
| letter | 17 | 3.62 | 6,889,804 | 3.02 |
| procedures | 16 | 3.40 | 2,542,658 | 1.11 |
| discharge instructions | 14 | 2.98 | 1,983,103 | 0.87 |
| anesthesia postprocedure evaluation | 13 | 2.77 | 915,373 | 0.40 |
| care plan | 13 | 2.77 | 145,491 | 0.06 |
| nursing note | 13 | 2.77 | 1,605,802 | 0.70 |
| progress notes | 13 | 2.77 | 54,086,167 | 23.68 |
| imaging | 12 | 2.55 | 27,170,511 | 11.90 |
| patient instructions | 12 | 2.55 | 9,786,849 | 4.29 |
| assessment & plan note | 11 | 2.34 | 2,240,039 | 0.98 |
| lab | 11 | 2.34 | 24,825,380 | 10.87 |
| anesthesia procedure notes | 10 | 2.13 | 762,798 | 0.33 |
| microbiology culture | 10 | 2.13 | 1,097,747 | 0.48 |
| er notes | 9 | 1.91 | 43,442 | 0.02 |
| clinic support note | 8 | 1.70 | 3,041,493 | 1.33 |
| ecg | 8 | 1.70 | 527,231 | 0.23 |
| h&p | 8 | 1.70 | 1,787,536 | 0.78 |
| lab panel | 8 | 1.70 | 118,246 | 0.05 |
| pathology and cytology | 8 | 1.70 | 395,258 | 0.17 |
| rtf letter | 8 | 1.70 | 3,549,891 | 1.55 |
| telephone encounter | 8 | 1.70 | 38,675,938 | 16.94 |
| microbiology | 7 | 1.49 | 812,474 | 0.36 |
| unmapped external results | 7 | 1.49 | 900,521 | 0.39 |
| advance care planning | 6 | 1.28 | 97,282 | 0.04 |
| consults | 6 | 1.28 | 3,313,915 | 1.45 |
| ed notes | 6 | 1.28 | 6,496,913 | 2.85 |
| ed provider notes | 6 | 1.28 | 1,862,895 | 0.82 |
| pathology | 6 | 1.28 | 1,812,971 | 0.79 |
| sign out note | 6 | 1.28 | 2,080,313 | 0.91 |
| unmapped external note | 6 | 1.28 | 277,058 | 0.12 |
| code blue/rapid response team note | 5 | 1.06 | 8,238 | 0.00 |
| documentation clarification | 5 | 1.06 | 179,822 | 0.08 |
| h&p interval | 5 | 1.06 | 244,086 | 0.11 |
| interval h&p note | 5 | 1.06 | 200,704 | 0.09 |
| pft | 5 | 1.06 | 172,001 | 0.08 |
| ancillary | 4 | 0.85 | 1,176,069 | 0.52 |
| anesthesia post-op follow-up note | 4 | 0.85 | 50,193 | 0.02 |
| consult follow-up | 4 | 0.85 | 1,159,388 | 0.51 |
| lactation note | 4 | 0.85 | 274,944 | 0.12 |
| operative report | 4 | 0.85 | 969,031 | 0.42 |
| physical therapy | 4 | 0.85 | 184,380 | 0.08 |
| point of care testing | 4 | 0.85 | 368,205 | 0.16 |
| pr charge | 4 | 0.85 | 617,387 | 0.27 |
| rehab daily note | 4 | 0.85 | 516,657 | 0.23 |
| transplant summary | 4 | 0.85 | 1,282,579 | 0.56 |
| consult | 3 | 0.64 | 533,155 | 0.23 |
| discharge summary | 3 | 0.64 | 1,040,761 | 0.46 |
| hiv lab non-restricted | 3 | 0.64 | 25,412 | 0.01 |
| hospice | 3 | 0.64 | 4,221 | 0.00 |
| manual entry echo | 3 | 0.64 | 11,058 | 0.00 |
| miscellaneous | 3 | 0.64 | 6,922 | 0.00 |
| procedure note | 3 | 0.64 | 241,589 | 0.11 |
| accountable care division cm note | 2 | 0.43 | 143,998 | 0.06 |
| anesthesia post-op | 2 | 0.43 | 88,070 | 0.04 |
| anesthesia preprocedure evaluation | 2 | 0.43 | 1,148,683 | 0.50 |
| blood bank | 2 | 0.43 | 21,218 | 0.01 |
| cardiac angio | 2 | 0.43 | 6,686 | 0.00 |
| committee review | 2 | 0.43 | 21,630 | 0.01 |
| discharge instr - other orders | 2 | 0.43 | 37,133 | 0.02 |
| ed triage notes | 2 | 0.43 | 13,480 | 0.01 |
| group note | 2 | 0.43 | 66,342 | 0.03 |
| h&p (view-only) | 2 | 0.43 | 171,763 | 0.08 |
| imaging non-reportable | 2 | 0.43 | 1,584,644 | 0.69 |
| lab only | 2 | 0.43 | 1,341 | 0.00 |
| lab only - beaker | 2 | 0.43 | 45,177 | 0.02 |
| neurology | 2 | 0.43 | 112,384 | 0.05 |
| nsg picc refer | 2 | 0.43 | 127,275 | 0.06 |
| occupational therapy | 2 | 0.43 | 193,917 | 0.08 |
| operative note | 2 | 0.43 | 521,214 | 0.23 |
| outpatient letter | 2 | 0.43 | 307,086 | 0.13 |
| plan of care | 2 | 0.43 | 646,309 | 0.28 |
| result encounter note | 2 | 0.43 | 37,539 | 0.02 |
| rn transfer note | 2 | 0.43 | 238,578 | 0.10 |
| transfer of care summary | 2 | 0.43 | 21,769 | 0.01 |
| addendum note | 1 | 0.21 | 2,098,264 | 0.92 |
| anesthesia followup note | 1 | 0.21 | 48,523 | 0.02 |
| cardiac monitors | 1 | 0.21 | 732 | 0.00 |
| cardiac services | 1 | 0.21 | 40,949 | 0.02 |
| care conference | 1 | 0.21 | 29,554 | 0.01 |
| case communication | 1 | 0.21 | 8,315 | 0.00 |
| cath angio | 1 | 0.21 | 42,514 | 0.02 |
| dermatology | 1 | 0.21 | 5,511 | 0.00 |
| discharge instr - activity | 1 | 0.21 | 15,477 | 0.01 |
| discharge instr - radiology | 1 | 0.21 | 291 | 0.00 |
| echo | 1 | 0.21 | 642,794 | 0.28 |
| echocardiography | 1 | 0.21 | 287,578 | 0.13 |
| electrophysiology | 1 | 0.21 | 26,980 | 0.01 |
| hiv lab restricted | 1 | 0.21 | 162,940 | 0.07 |
| hospital course | 1 | 0.21 | 18,606 | 0.01 |
| immediate post op note | 1 | 0.21 | 83,068 | 0.04 |
| ip letter | 1 | 0.21 | 301,987 | 0.13 |
| ir procedure notes | 1 | 0.21 | 13,264 | 0.01 |
| manual entry imaging | 1 | 0.21 | 7,752 | 0.00 |
| manual entry lab | 1 | 0.21 | 31,434 | 0.01 |
| medication review | 1 | 0.21 | 20,691 | 0.01 |
| nursing | 1 | 0.21 | 19,401 | 0.01 |
| nursing referral | 1 | 0.21 | 205,098 | 0.09 |
| ob | 1 | 0.21 | 41,869 | 0.02 |
| or postop | 1 | 0.21 | 4,351 | 0.00 |
| or surgeon | 1 | 0.21 | 14,425 | 0.01 |
| patient care conference | 1 | 0.21 | 25,671 | 0.01 |
| pharmacy medication review | 1 | 0.21 | 50,057 | 0.02 |
| pre-procedure instructions | 1 | 0.21 | 1,953 | 0.00 |
| radiation oncology treatment summary | 1 | 0.21 | 25,144 | 0.01 |
| reference labs | 1 | 0.21 | 503,157 | 0.22 |
| research note | 1 | 0.21 | 4,243 | 0.00 |
| respiratory care | 1 | 0.21 | 10,302 | 0.00 |
| transfer center follow up clinical screen | 1 | 0.21 | 9,203 | 0.00 |
| transfer center initial clinical screen | 1 | 0.21 | 20,031 | 0.01 |
| vascular ultrasound | 1 | 0.21 | 78,447 | 0.03 |
| wound care | 1 | 0.21 | 46,420 | 0.02 |