Table 1

Details on the categories of extracted information and criteria for manual review which were used as algorithm development guidelines

Category | Details
Clinic date | The date the patient visited the clinic.
Date of birth | The patient’s date of birth.
Epilepsy diagnosis | Items of information which confirmed a diagnosis of epilepsy, for example, ‘this lady has a diagnosis of focal epilepsy’ or ‘… has recurrent unprovoked generalised tonic-clonic seizures’. We specified that the epilepsy diagnosis must be attributable to the patient (eg, not a family member), and we did not count items of information that described epilepsy clinic attendance, or a discussion about epilepsy in general, as confirmation of an epilepsy diagnosis. Only an epilepsy diagnosis with a certainty level of 4 or 5 was considered to be a true positive.
Epilepsy type | Whether the patient had focal or generalised epilepsy, or an epilepsy syndrome from which the epilepsy type could be inferred, for example, generalised epilepsy if the letter confirmed juvenile myoclonic epilepsy. We based this information on the UMLS CUI extracted with the epilepsy diagnosis information. We only used explicit mentions of epilepsy types or syndromes within the clinic letters, and did not use other information, such as seizure type or investigation results, to infer the epilepsy type. Only an epilepsy type with a certainty level of 4 or 5 was considered to be a true positive.
Seizure type | Specific seizure types, for example, ‘focal motor seizures’ or ‘absence seizures’. We categorised the seizure type into focal seizures or generalised seizures at the validation stage. Only a seizure type with a certainty level of 4 or 5 was considered to be a true positive.
Seizure frequency | The number of seizures in a specific time period, for example, ‘two seizures per day’, ‘seven seizures in a year’ or ‘seizure-free since last seen in clinic’.
Medication | An identifiable drug name with a quantity and frequency, for example, ‘Lamotrigine 250 mg bd’.
Investigation | The type of investigation and the classification of its results (normal or abnormal). We used UMLS CUIs to assign a normal/abnormal value to investigation results, using the simplified abnormal outcomes gazetteers. We categorised the investigation results into CT, MRI and EEG results at the validation stage.
Levels of certainty | Not a category in itself, but an annotation qualifier addressing the uncertainty of diagnosis expressed in clinic letters. We defined five levels of certainty: (1) no diagnosis, for example, ‘epilepsy has been ruled out’; (2) unlikely diagnosis, for example, ‘I doubt that these episodes are epileptic in nature’; (3) uncertain diagnosis, for example, ‘it is possible that these are focal motor seizures’; (4) likely diagnosis, for example, ‘the impression is that this is JME’; and (5) definite diagnosis, for example, ‘this patient is having complex partial seizures’. We applied these certainty levels to epilepsy diagnosis, epilepsy type and seizure type.
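
The certainty rule in Table 1 can be illustrated with a short sketch. This is not the authors' implementation; the names `Certainty`, `CERTAINTY_QUALIFIED` and `counts_as_positive` are hypothetical, and raising an error when a qualified category has no certainty level is an assumption made for this example. The sketch only encodes what the table states: levels run from 1 to 5, they apply to epilepsy diagnosis, epilepsy type and seizure type, and only levels 4 and 5 count towards a true positive.

```python
from enum import IntEnum
from typing import Optional


class Certainty(IntEnum):
    """The five certainty levels defined in Table 1."""
    NO_DIAGNOSIS = 1   # 'epilepsy has been ruled out'
    UNLIKELY = 2       # 'I doubt that these episodes are epileptic in nature'
    UNCERTAIN = 3      # 'it is possible that these are focal motor seizures'
    LIKELY = 4         # 'the impression is that this is JME'
    DEFINITE = 5       # 'this patient is having complex partial seizures'


# Categories to which Table 1 applies the certainty qualifier.
CERTAINTY_QUALIFIED = {"epilepsy diagnosis", "epilepsy type", "seizure type"}


def counts_as_positive(category: str, certainty: Optional[int] = None) -> bool:
    """Return True if an extracted item is eligible to count as a positive.

    Qualified categories need a certainty level of 4 or 5. Other categories
    (for example, medication) carry no certainty qualifier in Table 1, so
    they are passed through unchanged.
    """
    if category.strip().lower() in CERTAINTY_QUALIFIED:
        if certainty is None:
            # Assumption: a qualified item without a level is treated as an error.
            raise ValueError(f"{category!r} requires a certainty level")
        return Certainty(certainty) >= Certainty.LIKELY
    return True


if __name__ == "__main__":
    print(counts_as_positive("Epilepsy diagnosis", 5))  # definite diagnosis
    print(counts_as_positive("Seizure type", 3))        # uncertain diagnosis
    print(counts_as_positive("Medication"))             # no qualifier applies
```

Using an `IntEnum` keeps the levels ordered, so the threshold is a single comparison against `Certainty.LIKELY`, and an out-of-range level such as 0 or 6 fails loudly with a `ValueError` rather than being silently accepted.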