PHI Labeling Metrics
In this section, we present an analysis of the population characteristics and note attributes for the labeled sample used in our study. We compare these characteristics with the broader STARR-OMOP population to provide context and highlight any significant differences. The analysis covers age distribution, note length distribution, and note types. The PHI (Protected Health Information) distribution is shown only for the labeled sample.
Age Distribution
The histogram below shows the age distribution of the patients represented in the labeled notes. For comparison, the distribution for the entire STARR-OMOP population is shown as well. Patient age is calculated as of the date of extraction, not the date the note was written.
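To make the age convention concrete, the following is a minimal Python sketch of computing age as of the extraction date and assigning it to one of the age groups used in the demographic breakdown (0-17, 18-44, 45-64, 65+). The function names and the dates are hypothetical and chosen only for illustration; they are not taken from the STARR-OMOP pipeline.

```python
from datetime import date

# Age-group boundaries (inclusive), matching the groups reported in this document.
AGE_GROUPS = [(0, 17, "0-17"), (18, 44, "18-44"), (45, 64, "45-64")]


def age_at(birth_date: date, reference_date: date) -> int:
    """Return the age in whole years on reference_date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age: int) -> str:
    """Map an age in years to its age-group label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return "65+"


# Hypothetical dates, for illustration only.
birth = date(1955, 6, 15)
note_written = date(2015, 3, 1)
extraction = date(2024, 1, 10)

# The same patient can fall into different groups depending on the reference date.
group_at_note = age_group(age_at(birth, note_written))
group_at_extraction = age_group(age_at(birth, extraction))
print(group_at_note, group_at_extraction)
```

Because the reference date is the extraction date, a note written years earlier is attributed to the age group the patient belongs to at extraction time, which can be older than the group the patient was in when the note was written.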
Demographic Groups
In this section, we present a detailed analysis of the demographic characteristics of the labeled sample used in our study. We compare these characteristics with the broader STARR-OMOP population to provide context and highlight any significant differences. The analysis includes age distribution, sex, race, and ethnicity. This comparison helps to assess how representative the labeled sample is and to identify the characteristics of the biased sample.
| Age Group | Labeled Sample n¹ | Labeled Sample % | STARR-OMOP n¹ | STARR-OMOP % |
|---|---|---|---|---|
| 0-17 | 169 | 12.12 | 24,709,539 | 10.82 |
| 18-44 | 361 | 25.90 | 51,841,502 | 22.70 |
| 45-64 | 343 | 24.61 | 53,873,202 | 23.59 |
| 65+ | 521 | 37.37 | 97,932,528 | 42.89 |

¹ 'n' represents the number of notes, not the number of patients.
| Sex | Labeled Sample n¹ | Labeled Sample % | STARR-OMOP n¹ | STARR-OMOP % |
|---|---|---|---|---|
| FEMALE | 719 | 51.58 | 128,810,665 | 56.41 |
| MALE | 666 | 47.78 | 99,506,081 | 43.57 |
| No matching concept | 9 | 0.65 | NA | NA |
| Unknown | NA | NA | 40,025 | 0.02 |

¹ 'n' represents the number of notes, not the number of patients.
| Race | Labeled Sample n¹ | Labeled Sample % | STARR-OMOP n¹ | STARR-OMOP % |
|---|---|---|---|---|
| American Indian or Alaska Native | 72 | 5.16 | 985,689 | 0.43 |
| Asian | 222 | 15.93 | 40,808,962 | 17.87 |
| Black or African American | 177 | 12.70 | 11,182,306 | 4.90 |
| Native Hawaiian or Other Pacific Islander | 111 | 7.96 | 2,854,092 | 1.25 |
| Unknown | 422 | 30.27 | 51,605,777 | 22.60 |
| White | 390 | 27.98 | 120,919,945 | 52.95 |

¹ 'n' represents the number of notes, not the number of patients.
| Ethnicity | Labeled Sample n¹ | Labeled Sample % | STARR-OMOP n¹ | STARR-OMOP % |
|---|---|---|---|---|
| Hispanic or Latino | 408 | 29.27 | 39,152,701 | 17.15 |
| Not Hispanic or Latino | 764 | 54.81 | 173,951,827 | 76.18 |
| Unknown | 222 | 15.93 | 15,252,243 | 6.68 |

¹ 'n' represents the number of notes, not the number of patients.
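One way to read the tables above is to compare each group's share of the labeled sample with its share of STARR-OMOP. The sketch below does this for the race table, using the note counts copied from it. The "representation ratio" (sample share divided by population share) is an illustrative measure introduced here, not a metric reported by the study, and the function name is hypothetical.

```python
# Note counts copied from the race table: (labeled sample, STARR-OMOP).
race_counts = {
    "American Indian or Alaska Native": (72, 985_689),
    "Asian": (222, 40_808_962),
    "Black or African American": (177, 11_182_306),
    "Native Hawaiian or Other Pacific Islander": (111, 2_854_092),
    "Unknown": (422, 51_605_777),
    "White": (390, 120_919_945),
}


def representation_ratios(counts):
    """Return {group: (sample %, population %, sample % / population %)}."""
    sample_total = sum(sample for sample, _ in counts.values())
    population_total = sum(population for _, population in counts.values())
    result = {}
    for group, (sample, population) in counts.items():
        sample_pct = 100 * sample / sample_total
        population_pct = 100 * population / population_total
        result[group] = (sample_pct, population_pct, sample_pct / population_pct)
    return result


ratios = representation_ratios(race_counts)
for group, (sample_pct, population_pct, ratio) in ratios.items():
    # A ratio above 1 means the group is over-represented in the labeled sample.
    print(f"{group:42s} {sample_pct:6.2f} {population_pct:6.2f} {ratio:6.2f}")
```

A ratio above 1 indicates that a group contributes a larger share of notes to the labeled sample than to STARR-OMOP, and a ratio below 1 indicates the opposite.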
Note Length Distribution
The table below shows the note length distribution for the clinical notes sampled for the labeling task. For comparison, the distribution of note lengths in STARR-OMOP is shown as well. Note length is measured in characters.
| Range | STARR-OMOP n | STARR-OMOP % | Labeled Sample n | Labeled Sample % |
|---|---|---|---|---|
| 0 - 500 | 96,583,879 | 42.30 | 146 | 10.47 |
| 500 - 2500 | 71,191,327 | 31.18 | 1,240 | 88.95 |
| 2500 - 10000 | 49,905,133 | 21.85 | 8 | 0.57 |
| 10000 - 50000 | 10,631,213 | 4.66 | NA | NA |
| 50000 - 100000 | 41,626 | 0.02 | NA | NA |
| 100000 - 1150000 | 3,593 | 0.00 | NA | NA |
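As an illustration of how notes could be assigned to these length ranges, here is a minimal Python sketch that bins a note by its character count. The bin edges come from the table above; treating each range as left-inclusive (for example, a 500-character note falls in "500 - 2500") and clamping longer notes into the last range are assumptions, since the source does not state how boundaries are handled. The function name is hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges in characters, taken from the note length table.
EDGES = [0, 500, 2500, 10000, 50000, 100000, 1150000]
LABELS = [f"{low} - {high}" for low, high in zip(EDGES, EDGES[1:])]


def length_bin(note_text: str) -> str:
    """Return the length-range label for a note, measured in characters."""
    n_chars = len(note_text)
    # bisect_right makes each bin left-inclusive: [low, high). ASSUMPTION.
    # Lengths at or beyond the final edge are clamped into the last bin.
    index = min(bisect_right(EDGES, n_chars) - 1, len(LABELS) - 1)
    return LABELS[index]


# Hypothetical notes, for illustration only.
notes = ["a" * 120, "b" * 500, "c" * 2499, "d" * 3000]
distribution = Counter(length_bin(text) for text in notes)
print(distribution)
```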
PHI Distribution
In this section, we present an analysis of the distribution of PHI entities within the labeled sample. The analysis includes the frequency of different PHI entities and their distribution across documents. This helps to understand the prevalence of various PHI types in the dataset and provides insights into the labeling process. The analysis is divided into two parts: the total count of each PHI entity type and the distribution of PHI entities per document. Note that DOCTOR and HOSPITAL are not classified as PHI under the HIPAA Safe Harbor definition. However, being able to identify and obfuscate such information may be important for potential data sharing use cases.
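The two parts of this analysis can be sketched in a few lines of Python. The annotation format below, a list of (document id, entity type) pairs, is a hypothetical simplification and not the labeling tool's actual export format. DOCTOR and HOSPITAL are entity types named in this document; DATE is an assumed label used only for illustration.

```python
from collections import Counter

# Hypothetical annotations: one (document id, PHI entity type) pair per labeled span.
annotations = [
    ("note-1", "DOCTOR"),
    ("note-1", "DATE"),
    ("note-1", "DATE"),
    ("note-2", "HOSPITAL"),
    ("note-3", "DATE"),
]
# All labeled documents, including those in which no PHI was found.
all_doc_ids = ["note-1", "note-2", "note-3", "note-4"]

# Part 1: total count of each PHI entity type.
entity_totals = Counter(entity_type for _, entity_type in annotations)

# Part 2: number of PHI entities per document, then how many documents
# contain exactly k entities (documents without PHI count as k = 0).
entities_per_doc = Counter(doc_id for doc_id, _ in annotations)
docs_by_entity_count = Counter(entities_per_doc.get(doc_id, 0) for doc_id in all_doc_ids)

print(entity_totals)
print(docs_by_entity_count)
```

Keeping the full list of document ids separate from the annotations ensures that documents with no PHI still appear in the per-document distribution.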
Note Types
In this section, we present an analysis of the different types of clinical notes included in the labeled sample used in our study. We compare these note types with the broader STARR-OMOP population to provide context and highlight any significant differences. This comparison helps to assess how representative the labeled sample is and to identify how the biased sample differs in the types of notes it includes. The analysis includes the distribution of note types and their frequencies in both the labeled sample and the STARR-OMOP population. In the distribution below, several types contain radiology and pathology reports. Among these are the notes labeled as procedures, which combine clinical results produced by procedures.
Distribution of Note Types in the Dataset

| Note Type | Labeled Sample n | Labeled Sample % | STARR-OMOP n | STARR-OMOP % |
|---|---|---|---|---|
| letter | 45 | 3.23 | 6,889,804 | 3.02 |
| imaging | 36 | 2.58 | 27,170,511 | 11.90 |
| procedures | 36 | 2.58 | 2,542,658 | 1.11 |
| progress notes | 34 | 2.44 | 54,086,167 | 23.68 |
| clinic support note | 32 | 2.30 | 3,041,493 | 1.33 |
| patient instructions | 32 | 2.30 | 9,786,849 | 4.29 |
| discharge instructions | 31 | 2.22 | 1,983,103 | 0.87 |
| telephone encounter | 31 | 2.22 | 38,675,938 | 16.94 |
| nursing note | 29 | 2.08 | 1,605,802 | 0.70 |
| consults | 27 | 1.94 | 3,313,915 | 1.45 |
| care plan note | 26 | 1.87 | 4,139,518 | 1.81 |
| h&p | 26 | 1.87 | 1,787,536 | 0.78 |
| pathology | 26 | 1.87 | 1,812,971 | 0.79 |
| rtf letter | 26 | 1.87 | 3,549,891 | 1.55 |
| anesthesia procedure notes | 25 | 1.79 | 762,798 | 0.33 |
| unmapped external results | 24 | 1.72 | 900,521 | 0.39 |
| anesthesia postprocedure evaluation | 23 | 1.65 | 915,373 | 0.40 |
| care plan | 23 | 1.65 | 145,491 | 0.06 |
| ecg | 23 | 1.65 | 527,231 | 0.23 |
| pathology and cytology | 23 | 1.65 | 395,258 | 0.17 |
| procedure note | 23 | 1.65 | 241,589 | 0.11 |
| assessment & plan note | 22 | 1.58 | 2,240,039 | 0.98 |
| lab | 21 | 1.51 | 24,825,380 | 10.87 |
| rn transfer note | 20 | 1.43 | 238,578 | 0.10 |
| sign out note | 19 | 1.36 | 2,080,313 | 0.91 |
| unmapped external note | 19 | 1.36 | 277,058 | 0.12 |
| pr charge | 18 | 1.29 | 617,387 | 0.27 |
| operative note | 17 | 1.22 | 521,214 | 0.23 |
| operative report | 17 | 1.22 | 969,031 | 0.42 |
| consult follow-up | 16 | 1.15 | 1,159,388 | 0.51 |
| documentation clarification | 16 | 1.15 | 179,822 | 0.08 |
| microbiology | 16 | 1.15 | 812,474 | 0.36 |
| ed provider notes | 15 | 1.08 | 1,862,895 | 0.82 |
| ed temp/rap patient | 15 | 1.08 | 142,004 | 0.06 |
| microbiology culture | 15 | 1.08 | 1,097,747 | 0.48 |
| lab panel | 14 | 1.00 | 118,246 | 0.05 |
| point of care testing | 14 | 1.00 | 368,205 | 0.16 |
| transplant summary | 14 | 1.00 | 1,282,579 | 0.56 |
| ancillary | 13 | 0.93 | 1,176,069 | 0.52 |
| group note | 13 | 0.93 | 66,342 | 0.03 |
| interval h&p note | 13 | 0.93 | 200,704 | 0.09 |
| advance care planning | 12 | 0.86 | 97,282 | 0.04 |
| ed notes | 12 | 0.86 | 6,496,913 | 2.85 |
| ip letter | 12 | 0.86 | 301,987 | 0.13 |
| rehab daily note | 12 | 0.86 | 516,657 | 0.23 |
| nursing referral | 11 | 0.79 | 205,098 | 0.09 |
| physical therapy | 11 | 0.79 | 184,380 | 0.08 |
| anesthesia preprocedure evaluation | 10 | 0.72 | 1,148,683 | 0.50 |
| er notes | 10 | 0.72 | 43,442 | 0.02 |
| h&p interval | 10 | 0.72 | 244,086 | 0.11 |
| nsg picc refer | 10 | 0.72 | 127,275 | 0.06 |
| perfusion event | 10 | 0.72 | 13,641 | 0.01 |
| pft | 10 | 0.72 | 172,001 | 0.08 |
| consult | 9 | 0.65 | 533,155 | 0.23 |
| imaging non-reportable | 9 | 0.65 | 1,584,644 | 0.69 |
| neurology | 9 | 0.65 | 112,384 | 0.05 |
| nursing | 9 | 0.65 | 19,401 | 0.01 |
| rehab | 9 | 0.65 | 657,127 | 0.29 |
| accountable care division cm note | 8 | 0.57 | 143,998 | 0.06 |
| discharge summary | 8 | 0.57 | 1,040,761 | 0.46 |
| h&p (view-only) | 8 | 0.57 | 171,763 | 0.08 |
| hiv lab non-restricted | 8 | 0.57 | 25,412 | 0.01 |
| outpatient letter | 8 | 0.57 | 307,086 | 0.13 |
| wound care | 8 | 0.57 | 46,420 | 0.02 |
| code blue/rapid response team note | 7 | 0.50 | 8,238 | 0.00 |
| health plan operations cm note | 7 | 0.50 | 37,015 | 0.02 |
| immediate post op note | 7 | 0.50 | 83,068 | 0.04 |
| lactation note | 7 | 0.50 | 274,944 | 0.12 |
| pharmacy medication review | 7 | 0.50 | 50,057 | 0.02 |
| vascular ultrasound | 7 | 0.50 | 78,447 | 0.03 |
| anesthesia post-op follow-up note | 6 | 0.43 | 50,193 | 0.02 |
| cardiac services | 6 | 0.43 | 40,949 | 0.02 |
| echo | 6 | 0.43 | 642,794 | 0.28 |
| psych | 6 | 0.43 | 386,747 | 0.17 |
| significant event | 6 | 0.43 | 8,713 | 0.00 |
| slp | 6 | 0.43 | 82,010 | 0.04 |
| dermatology | 5 | 0.36 | 5,511 | 0.00 |
| gi | 5 | 0.36 | 317,911 | 0.14 |
| hospice | 5 | 0.36 | 4,221 | 0.00 |
| ob | 5 | 0.36 | 41,869 | 0.02 |
| anesthesia post-op | 4 | 0.29 | 88,070 | 0.04 |
| blood bank | 4 | 0.29 | 21,218 | 0.01 |
| discharge instr - other orders | 4 | 0.29 | 37,133 | 0.02 |
| hospital course | 4 | 0.29 | 18,606 | 0.01 |
| manual entry echo | 4 | 0.29 | 11,058 | 0.00 |
| miscellaneous | 4 | 0.29 | 6,922 | 0.00 |
| occupational therapy | 4 | 0.29 | 193,917 | 0.08 |
| or surgeon | 4 | 0.29 | 14,425 | 0.01 |
| patient care conference | 4 | 0.29 | 25,671 | 0.01 |
| plan of care | 4 | 0.29 | 646,309 | 0.28 |
| anesthesia followup note | 3 | 0.22 | 48,523 | 0.02 |
| bh treatment plan | 3 | 0.22 | 7,561 | 0.00 |
| care conference | 3 | 0.22 | 29,554 | 0.01 |
| cath angio | 3 | 0.22 | 42,514 | 0.02 |
| clinical letter | 3 | 0.22 | 357,922 | 0.16 |
| committee review | 3 | 0.22 | 21,630 | 0.01 |
| consult follow up | 3 | 0.22 | 103,375 | 0.05 |
| echocardiography | 3 | 0.22 | 287,578 | 0.13 |
| home health plan of care | 3 | 0.22 | 674 | 0.00 |
| ir procedure notes | 3 | 0.22 | 13,264 | 0.01 |
| l&d delivery note | 3 | 0.22 | 62,042 | 0.03 |
| respiratory care | 3 | 0.22 | 10,302 | 0.00 |
| result encounter note | 3 | 0.22 | 37,539 | 0.02 |
| subjective & objective | 3 | 0.22 | 120,904 | 0.05 |
| transfer center follow up clinical screen | 3 | 0.22 | 9,203 | 0.00 |
| transfer center initial clinical screen | 3 | 0.22 | 20,031 | 0.01 |
| acp (advance care planning) | 2 | 0.14 | 752 | 0.00 |
| cardiac angio | 2 | 0.14 | 6,686 | 0.00 |
| case communication | 2 | 0.14 | 8,315 | 0.00 |
| discharge instr - brief hospital | 2 | 0.14 | 76,477 | 0.03 |
| ed triage notes | 2 | 0.14 | 13,480 | 0.01 |
| electrophysiology | 2 | 0.14 | 26,980 | 0.01 |
| event note | 2 | 0.14 | 3,478 | 0.00 |
| hiv lab restricted | 2 | 0.14 | 162,940 | 0.07 |
| lab only | 2 | 0.14 | 1,341 | 0.00 |
| lab only - beaker | 2 | 0.14 | 45,177 | 0.02 |
| ltc provider review | 2 | 0.14 | 2,003 | 0.00 |
| manual entry imaging | 2 | 0.14 | 7,752 | 0.00 |
| radiation oncology treatment summary | 2 | 0.14 | 25,144 | 0.01 |
| reference labs | 2 | 0.14 | 503,157 | 0.22 |
| speech language pathology | 2 | 0.14 | 36,237 | 0.02 |
| transfer of care summary | 2 | 0.14 | 21,769 | 0.01 |
| visit note | 2 | 0.14 | 12,203 | 0.01 |
| addendum note | 1 | 0.07 | 2,098,264 | 0.92 |
| anesthesia pre-op | 1 | 0.07 | 4,118 | 0.00 |
| cardiac monitors | 1 | 0.07 | 732 | 0.00 |
| code documentation | 1 | 0.07 | 1,069 | 0.00 |
| disch intstr - brief hosp trans | 1 | 0.07 | 41,461 | 0.02 |
| discharge instr - activity | 1 | 0.07 | 15,477 | 0.01 |
| discharge instr - appointments | 1 | 0.07 | 38,729 | 0.02 |
| discharge instr - diet | 1 | 0.07 | 14,731 | 0.01 |
| discharge instr - radiology | 1 | 0.07 | 291 | 0.00 |
| ent | 1 | 0.07 | 1,473 | 0.00 |
| interim summary | 1 | 0.07 | 22,826 | 0.01 |
| manual entry lab | 1 | 0.07 | 31,434 | 0.01 |
| medication review | 1 | 0.07 | 20,691 | 0.01 |
| mock-up query | 1 | 0.07 | 2,147 | 0.00 |
| multi-disciplinary team discussion | 1 | 0.07 | 19 | 0.00 |
| nutrition services | 1 | 0.07 | 179,213 | 0.08 |
| or postop | 1 | 0.07 | 4,351 | 0.00 |
| ot | 1 | 0.07 | 16,264 | 0.01 |
| pre-procedure assessment | 1 | 0.07 | 3,159 | 0.00 |
| pre-procedure instructions | 1 | 0.07 | 1,953 | 0.00 |
| rehab evaluation | 1 | 0.07 | 86,611 | 0.04 |
| research note | 1 | 0.07 | 4,243 | 0.00 |
| telephone encounter follow up | 1 | 0.07 | 45,473 | 0.02 |