HIPPO Benchmark
Gold Standard Full Text
This table contains the full text of the clinical notes, plus some patient demographics.
| field | type |
|---|---|
| note_id | STRING |
| note_text | STRING |
| note_title | STRING |
| note_length | INTEGER |
| sex | STRING |
| age | INTEGER |
| race | STRING |
| ethnicity | STRING |
| note_source_value | STRING |
| stanford_patient_uid | STRING |
| load_table_id | STRING |
Columns Description
note_id
Unique identifier for the clinical note
note_text
Text of the clinical note
note_title
Short description with the type of note
note_length
Length of the clinical note text in characters
sex
Biological sex of the person the note is about.
age
Age in years, computed as the difference between 2025 and the birth date.
race
Self-reported race of the person the note is about.
ethnicity
Self-reported ethnicity of the person the note is about.
note_source_value
Unique identifier of the note in EPIC Clarity.
stanford_patient_uid
Unique patient identifier within the STARR DataLake, consisting of the concatenation of the patient's MRN and DOB.
load_table_id
Internal identifier for the data loading process that brought this record into the system.
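As an illustration of how the columns above might be represented in code, the following minimal sketch defines a record type mirroring the table and checks that note_length matches the character count of note_text. The names `FullTextRecord` and `length_is_consistent` are hypothetical, and the sample row is invented rather than drawn from the dataset. It also assumes that "characters" corresponds to what Python's `len` counts (Unicode code points), which the source does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FullTextRecord:
    """Hypothetical in-memory mirror of one row of the full-text table."""

    note_id: str
    note_text: str
    note_title: str
    note_length: int
    sex: str
    age: int
    race: str
    ethnicity: str
    note_source_value: str
    stanford_patient_uid: str
    load_table_id: str


def length_is_consistent(record: FullTextRecord) -> bool:
    """Return True if note_length equals the number of characters in note_text.

    ASSUMPTION: the table counts characters the same way Python's len() does.
    """
    return record.note_length == len(record.note_text)


# Invented example row for illustration only; not real data.
example = FullTextRecord(
    note_id="note-001",
    note_text="Patient seen in clinic.",
    note_title="Progress Note",
    note_length=23,
    sex="Female",
    age=54,
    race="Unknown",
    ethnicity="Unknown",
    note_source_value="src-001",
    stanford_patient_uid="uid-001",
    load_table_id="load-001",
)

consistent = length_is_consistent(example)
```

A check like this can flag rows whose text was truncated or re-encoded after note_length was computed.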
Gold Standard Spans
This table contains the annotated spans.
| field | type |
|---|---|
| uuid | STRING |
| note_id | STRING |
| span_start | INTEGER |
| span_end | INTEGER |
| span_tag | STRING |
Columns Description
uuid
Unique identifier of the sample
note_id
Unique identifier of the clinical note to which the span corresponds.
span_start
Integer index of the start of the annotation inside the note_text.
span_end
Integer index of the end of the annotation inside the note_text.
span_tag
Label of the annotation.
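To illustrate how the spans table relates to the full-text table, the sketch below looks up a note by note_id and slices note_text with span_start and span_end. It assumes 0-based character offsets with an exclusive end index (Python slice semantics). The source does not state the convention, so it is worth verifying against a few known annotations first. The function name `extract_span_text`, the sample rows, and the tag value `EXAMPLE_TAG` are hypothetical.

```python
from typing import Dict, List, Tuple


def extract_span_text(note_text: str, span_start: int, span_end: int) -> str:
    """Return the annotated substring of a note.

    ASSUMPTION: offsets are 0-based character indices and span_end is exclusive.
    """
    if not 0 <= span_start <= span_end <= len(note_text):
        raise ValueError(
            f"span ({span_start}, {span_end}) falls outside the note text"
        )
    return note_text[span_start:span_end]


# Invented rows for illustration only; not real data.
notes: Dict[str, str] = {"note-001": "Seen by Dr. Smith on Monday."}
spans: List[dict] = [
    {
        "uuid": "span-001",
        "note_id": "note-001",
        "span_start": 12,
        "span_end": 17,
        "span_tag": "EXAMPLE_TAG",
    },
]

# Join each span to its note via note_id, then slice out the annotated text.
extracted: List[Tuple[str, str]] = [
    (s["span_tag"], extract_span_text(notes[s["note_id"]], s["span_start"], s["span_end"]))
    for s in spans
]
```

If the dataset turns out to use an inclusive end index, the slice would need `span_end + 1` instead.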