Validating epilepsy diagnoses in routinely collected data

Purpose: Anonymised, routinely-collected healthcare data is increasingly being used for epilepsy research. We validated algorithms using general practitioner (GP) primary healthcare records to identify people with epilepsy from anonymised healthcare data within the Secure Anonymised Information Linkage (SAIL) databank in Wales, UK.



Introduction
Vast amounts of electronic, routinely-collected, medical and related administrative data are generated in modern healthcare systems. These data can be anonymised, linked and used for healthcare research [1,2]. Large numbers of individuals can be studied without having to recruit them specifically for projects, which can be expensive, time-consuming and introduce selection bias. Records can also be linked from a wide variety of sources, enabling a wide breadth of data to be analysed. Routinely-collected data are increasingly being used for high quality epilepsy studies [3-5].
Every individual in the United Kingdom (UK) is entitled to register with a primary care General Practitioner (GP), and there is evidence that almost everyone in the UK does so [6]. GPs have a central role in providing primary care for people with epilepsy through assessment, diagnosis, appropriate referral to secondary and tertiary services, managing and prescribing medications (including the vast majority of anti-epileptic drugs) and creating and maintaining a centralised health care record. GPs are the patient's primary contact point for access to specialist services. GP health records contain details of encounters with GPs and other healthcare providers, recorded using Read codes.
Read codes are the current clinical terminology coding system used in UK primary care systems to record symptoms, diagnoses and prescriptions [7]. Read codes are hierarchical (the level of detail increases with the number of digits), e.g. F25 is used to record epilepsy, F25A. for juvenile myoclonic epilepsy and F2540 for temporal lobe epilepsy.
Contents lists available at ScienceDirect. Seizure journal homepage: www.elsevier.com/locate/yseiz
GP records have been used as the basis for epilepsy studies within data repositories such as the Clinical Practice Research Datalink (CPRD) and the Secure Anonymised Information Linkage (SAIL) databank [4,8,9]. One of the limitations of using routinely-collected data for epilepsy studies is the possibility of including incorrectly recorded epilepsy diagnoses. In particular, it is not known how accurately epilepsy diagnoses made by hospital specialists are recorded in GP records. Guidelines advise that algorithms used for case ascertainment in routinely-collected data studies are validated in each population studied [10]. The accuracy of UK GP diagnosis codes has been validated for many diseases but, to our knowledge, has only been partially validated for epilepsy diagnosis [8,9,11]. In this study we specifically aimed to validate the accuracy of algorithms using GP records to identify people with epilepsy from anonymised, linked, routinely-collected Welsh healthcare data.

Method
In Wales, anonymised GP primary care electronic health records are collated and linked with other data within the Secure Anonymised Information Linkage (SAIL) databank [1,12]. We searched the SAIL databank on 13th April 2016; at this time GP records were available up to 31st December 2015 and there were records for 73% of GP practices across Wales (approx. 2.4 million people). GP records can be tracked over time, so that an individual patient's records can be analysed longitudinally across multiple GP practices. We used combinations of epilepsy diagnosis and antiepileptic drug (AED) prescription codes to create three epilepsy case ascertainment algorithms.
We anonymously uploaded and linked a list of 150 individuals with epilepsy and 150 individuals without epilepsy (reference population) to existing SAIL records, using an established and validated split-file approach [1,12]. We then compared the performance of the three different epilepsy case ascertainment algorithms within SAIL in identifying the reference population.

The reference population
The Swansea Epilepsy Database currently holds detailed clinical information (including diagnosis, medications, imaging and EEG results) for 960 patients seen by a clinician with a specialist interest in epilepsy (neurologist or paediatric neurologist) and treated at Morriston Hospital, Swansea. Of these patients, 283 (29%) have generalised epilepsy, 510 (53%) have focal epilepsy, 125 (13%) have unclassifiable epilepsy and 42 (4%) have an uncertain diagnosis.
Between January and March 2015, we examined the database and used a random number generator to select a sample of 100 adults (50 men and 50 women, who were over 16 at their last consultation date) and 50 children (25 boys and 25 girls, who were 16 and under at their last consultation date) with a clinically definite diagnosis of epilepsy. The clinical record and investigation results for each of these 150 individuals were reviewed to confirm a clinically definite diagnosis as per the International League Against Epilepsy's (ILAE) practical clinical definition of epilepsy. These 150 individuals formed the reference population of people with epilepsy.
To ascertain a control cohort, 300 patients were reviewed from consecutive general neurology clinics run by neurologists and paediatric neurologists. Their diagnosis was checked using clinic letters stored in an electronic format on the hospital system. Patients with a diagnosis of epilepsy were excluded. Using a random number generator, we randomly selected a sample of 100 adults (50 men and 50 women, who were over 16 at their last consultation date) and 50 children (25 boys and 25 girls, who were 16 and under at their last consultation date) from these 300 patients. These 150 individuals formed the reference population of people without epilepsy.
We have previously estimated the sensitivity of an epilepsy case ascertainment algorithm using GP diagnosis and AED prescription codes at 90% [9]. Based on this, a sample size of 150 gives a 95% confidence interval approximately 10 percentage points wide (roughly ±5) for sensitivities (proportions) of 90%.
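The precision quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes a normal-approximation (Wald) interval, which the text does not specify, and the function name `wald_ci` is our own.

```python
import math

def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a proportion p observed in n people.

    Assumption: the Wald method is used here purely for illustration; the
    sample size method is not stated in the text.
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Expected sensitivity of 90% in a reference group of 150 people.
lower, upper = wald_ci(0.90, 150)
width = upper - lower
print(f"95% CI: {lower:.3f} to {upper:.3f} (width {width:.3f})")
# prints "95% CI: 0.852 to 0.948 (width 0.096)"
```

Under this assumption the interval is a little under 10 percentage points wide, in line with the figure given above.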

Table 1
Proportion of epilepsy cases (n = 145) and cases without epilepsy (n = 143) identified within SAIL using three different algorithms: A - individuals with a primary care epilepsy diagnosis code and at least two consecutive codes for prescription of an anti-epileptic drug (AED); B - individuals with an epilepsy diagnosis code only; C - individuals with at least two consecutive codes for prescription of an AED. See method section for definitions of positive predictive value, sensitivity, false positive rate, specificity and Youden's index.

Algorithm construction and assessment
We used three different algorithms to identify people with epilepsy within SAIL: A) individuals with an epilepsy diagnosis Read code and two prescriptions of the same AED within six months; B) individuals with an epilepsy diagnosis Read code only; and C) individuals with two prescriptions of the same AED within six months only.
We used version 2 Read codes. For diagnosis we used F25 and all subcodes beginning with F25, as well as Read codes 1O30, 667B. and SC200. For AEDs we used dn and do (and all subcodes). For a full list of Read codes used, see the Supporting information in Pickrell et al. 2015 [9].
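The three algorithms can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the queries run within SAIL: the record layout (a list of `(read_code, date)` pairs per person), the 183-day reading of "six months", the use of an identical Read code as a proxy for "the same AED", and the drug code `dn11` are all hypothetical choices made for the example.

```python
from datetime import date, timedelta

DIAGNOSIS_PREFIX = "F25"                     # F25 and all subcodes
DIAGNOSIS_EXACT = {"1O30", "667B.", "SC200"}  # additional diagnosis codes
AED_PREFIXES = ("dn", "do")                   # AED chapters and all subcodes
SIX_MONTHS = timedelta(days=183)              # assumption: six months = 183 days

def has_diagnosis(events):
    """True if any event carries an epilepsy diagnosis Read code."""
    return any(
        code.startswith(DIAGNOSIS_PREFIX) or code in DIAGNOSIS_EXACT
        for code, _ in events
    )

def has_repeat_aed(events):
    """True if the same AED code appears twice within six months.

    Assumption: an identical Read code stands in for "the same AED".
    """
    dates_by_drug = {}
    for code, when in events:
        if code.startswith(AED_PREFIXES):
            dates_by_drug.setdefault(code, []).append(when)
    for dates in dates_by_drug.values():
        dates.sort()
        # Compare each prescription with the next one for the same code.
        if any(later - earlier <= SIX_MONTHS
               for earlier, later in zip(dates, dates[1:])):
            return True
    return False

def classify(events):
    """Apply algorithms A, B and C to one person's events."""
    diagnosis = has_diagnosis(events)
    repeat_aed = has_repeat_aed(events)
    return {"A": diagnosis and repeat_aed, "B": diagnosis, "C": repeat_aed}

# Hypothetical person: a diagnosis code plus two prescriptions three months apart.
person = [
    ("F25A.", date(2014, 1, 10)),
    ("dn11", date(2014, 2, 1)),   # hypothetical AED code
    ("dn11", date(2014, 5, 1)),
]
print(classify(person))
```

Here algorithm B is read as "diagnosis code is the only criterion" and algorithm C as "AED prescriptions are the only criterion", so a person meeting A necessarily meets both B and C.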

Analysis and statistical tests
True positive (TP) cases had a hospital diagnosis of epilepsy and were identified within SAIL as having epilepsy; true negative (TN) cases did not have epilepsy as confirmed by hospital records and were not identified as having epilepsy within SAIL; false positive (FP) cases did not have epilepsy as confirmed by hospital records and were identified as having epilepsy within SAIL; and false negative (FN) cases had a hospital diagnosis of epilepsy and were not identified as having epilepsy within SAIL. Positive predictive value (PPV) was defined as TP/(TP + FP); sensitivity as TP/(TP + FN); specificity as TN/(TN + FP); and false positive rate (FPR) as FP/(FP + TN). We calculated Youden's index (J) as sensitivity + specificity - 1, as a measure of the accuracy of the algorithms. J ranges from -1 to 1 (J = 1 for a perfect test) [13]. Confidence limits were calculated using the exact binomial method. We used R version 3.0.1 to perform the statistical analysis.
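The definitions above translate directly into code. The following is a minimal Python sketch rather than the R analysis used in the study; the function names are our own, the exact binomial limits are computed as a Clopper-Pearson interval solved by bisection, and the counts in the example are hypothetical and are not the study results.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _bisect(func, target, increasing):
    """Find p in [0, 1] with func(p) = target for a monotone func."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(k, n, alpha=0.05):
    """Exact binomial (Clopper-Pearson) confidence limits for k successes in n."""
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= k | p) = alpha / 2, which increases with p.
        lower = _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    if k == n:
        upper = 1.0
    else:
        # Upper limit: P(X <= k | p) = alpha / 2, which decreases with p.
        upper = _bisect(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper

def diagnostic_metrics(tp, fp, tn, fn):
    """PPV, sensitivity, specificity, FPR and Youden's J from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "ppv": tp / (tp + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": fp / (fp + tn),
        "youden_j": sensitivity + specificity - 1,
    }

# Hypothetical counts for illustration only (not taken from Table 1).
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
sens_ci = exact_ci(90, 100)  # exact limits for the sensitivity of 90/100
print(metrics)
print(sens_ci)
```

Note that FPR equals 1 - specificity, so Youden's J can equivalently be written as sensitivity - FPR.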

Ethical approval
This study was approved by SAIL's independent Information Governance Review Panel (project 387). The National Research Ethics Service has confirmed that SAIL projects using anonymised data do not require specific NHS research ethics committee approval.

Results
Of the 150 reference cases with epilepsy, 145 (97%) were found to be registered with a SAIL GP, as were 143 of the 150 reference cases without epilepsy (95%). The sensitivity, specificity, positive predictive value, false positive rate and accuracy of each of the three algorithms in identifying the reference cases are shown in Table 1.

Discussion
Our results show that anonymised GP records can be used to accurately identify patients with epilepsy diagnosed by a hospital specialist in Wales. The best sensitivities achieved for all patients, adults and children were 92%, 94% and 88% respectively. The corresponding figures for specificity were 99%, 98% and 100%. These figures compare well with sensitivities and specificities from other similar epilepsy case definition validation studies in different healthcare systems, e.g. Australian, Italian and American studies achieved sensitivities of 82-90% and specificities of 94-100% [14-16]. We have previously used similar algorithms within the SAIL databank to estimate the prevalence of epilepsy in Wales to be 0.77% (95% CI 0.76-0.79%) [9].
As with previous studies, algorithm A (diagnosis and AED code) is the most specific (98-100%) given that it has the 'narrowest' criteria, and algorithm C (AED only) is the most sensitive (88-94%) with its 'broader' criteria. The large difference in specificity between adults and children for algorithm C (61% compared with 98%) can be explained by the widespread use of AEDs for indications other than epilepsy in adults (e.g. migraine, mood disorders and neuropathic pain). AEDs are seldom prescribed for indications other than epilepsy in children in the UK [8]. Our results suggest that AED prescription alone can be used to identify children with epilepsy.
There was surprisingly little difference in performance between algorithms A and B. Algorithm A (additional AED code) was more specific than algorithm B, and algorithm B was more sensitive, but their overall accuracy was comparable. GP diagnosis codes for epilepsy therefore seem reliable in their own right. Although this is expected, given that epilepsy diagnosis should be made in secondary care in the UK and later transcribed into the primary care record by GPs [17], to our knowledge this has not been described before and is an important result for future research involving GP epilepsy diagnosis codes.
Several factors may have improved UK GP epilepsy diagnosis coding practice in recent years. The Quality and Outcomes Framework (QOF) for GPs was introduced in 2004 and provides financial incentives to GPs who achieve certain indicators. The current QOF indicator for epilepsy includes a record of patients aged 18 or over on drug treatment for epilepsy who have been seizure-free for the last 12 months [18]. Previous QOF indicators have included an indicator for maintaining a register of adults with epilepsy on antiepileptic drug treatment. Current UK guidelines for the diagnosis and management of epilepsy advise structured management systems and regular reviews within primary care, which are likely to encourage accurate epilepsy diagnosis coding [17,19].
We have used relatively small numbers in the reference population in this study due to the resources needed to manually check medical records and test results. We could not review the anonymised data within SAIL to ascertain the reasons for the false positive and false negative cases. We also did not stratify the non-epilepsy cases into epilepsy mimics, such as dissociative seizures and acute symptomatic seizures, which may have a higher miscoding rate in GP records. At the time of analysis, 73% of the Welsh population's GP records were available within SAIL. This lack of 100% coverage likely explains why a small number of the reference cases with and without epilepsy were not 'found' in SAIL.
These results are specific to primary care records in Wales and are not applicable to other healthcare systems or methods of ascertaining epilepsy cases (for example, hospital discharge summaries). Other parts of the UK do have similar healthcare systems and, although the results may be generalizable to the remainder of the UK, further work needs to be done to prove this. Currently there is no facility to include EEG and imaging data within SAIL, so we could not include these in our ascertainment algorithms. Additionally, it is impossible to identify people with epilepsy who do not attend their GP or have not been seen by a hospital specialist.
Our reference epilepsy cohort was obtained from a secondary care epilepsy database, which may have provided a bias towards people with more severe epilepsy. We selected a group of people without epilepsy from patients who had attended general neurology clinics as a control group. This group therefore does not represent the 'general' population without epilepsy. However, this group of patients may be considered a 'better test' of ascertainment algorithms, as patients with other neurological conditions may be more likely to be incorrectly coded as having epilepsy than the general population. Conversely, it is also possible (although unlikely in our opinion) that neurologists would not record a diagnosis of epilepsy in a general neurology clinic appointment with a different focus (e.g. headache).
The strengths of our study are that we have used a carefully validated reference population with an epilepsy diagnosis from an epilepsy specialist and without an epilepsy diagnosis to validate algorithms within an established anonymised databank containing extensive primary care records for at least 2.4 million people.

Conclusion
Using primary care epilepsy diagnosis codes is an accurate method to identify patients with epilepsy within the SAIL databank. Using AED prescription codes in addition to epilepsy diagnosis codes increases the specificity and positive predictive value by 2% at a cost of a 2% reduction in sensitivity. In children, using AED prescription codes alone is an accurate way to identify epilepsy cases. These results are generalizable to other studies that use UK primary care records for epilepsy case ascertainment and can serve as a baseline measure of accuracy of case ascertainment in such studies.

Disclosure of conflicts of interest
Author MK has received honoraria from UCB and Johnson and Johnson. The remaining authors have no conflicts of interest. We confirm that we have read the Journal's position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.
*Table 1 note: we included 145 (97 adults and 48 children) people with a hospital diagnosis of epilepsy and 143 (98 adults and 45 children) people without a hospital diagnosis of epilepsy.