
Diagnostic accuracy study of three alcohol breathalysers marketed for sale to the public
Helen F Ashdown, Susannah Fleming, Elizabeth A Spencer, Matthew J Thompson, Richard J Stevens
Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
Correspondence to Dr Helen F Ashdown; helen.ashdown{at}phc.ox.ac.uk

Abstract

Objectives To assess the diagnostic accuracy of three personal breathalyser devices that are available for sale to the public and marketed for testing safety to drive after drinking alcohol.

Design Prospective comparative diagnostic accuracy study comparing two single-use breathalysers and one digital multiuse breathalyser (index tests) to a police breathalyser (reference test).

Setting Establishments licensed to serve alcohol in a UK city.

Participants Of 222 participants recruited, 208 were included in the main analysis. Participants were eligible if they were 18 years old or over, had consumed alcohol and were not intending to drive within the following 6 h.

Outcome measures Sensitivity and specificity of the breathalysers for the detection of being at or over the UK legal driving limit (35 µg/100 mL breath alcohol concentration).

Results 18% of participants (38/208) were at or over the UK driving limit according to the police breathalyser. The digital multiuse breathalyser had a sensitivity of 89.5% (95% CI 75.9% to 95.8%) and a specificity of 64.1% (95% CI 56.6% to 71.0%). The two single-use breathalysers had sensitivities of 94.7% (95% CI 75.4% to 99.1%) and 26.3% (95% CI 11.8% to 48.8%), and specificities of 50.6% (95% CI 40.4% to 60.7%) and 97.5% (95% CI 91.4% to 99.3%), respectively. Self-reported alcohol consumption, using any threshold of 5 UK units or fewer, had a higher sensitivity than any of the personal breathalysers.

Conclusions One alcohol breathalyser had a sensitivity of 26%, corresponding to false reassurance for approximately three in four people who are over the limit by the reference standard, at least on the evening of drinking alcohol. The other devices tested had sensitivities of approximately 90% or higher. All estimates were subject to uncertainty, and there is no clearly defined minimum sensitivity for this safety-critical application. We conclude that current regulatory frameworks do not ensure high sensitivity for these devices, which are marketed to consumers for a decision with potentially catastrophic consequences.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/


Strengths and limitations of this study

  • Personal breathalysers, some very inexpensive, are available for sale to the public in pharmacies for assessing safety to drive after consuming alcohol, and their use is now promoted by law in some jurisdictions.

  • To our knowledge, the accuracy of personal breathalyser devices has not previously been studied.

  • This study tested diagnostic accuracy of these devices in a real-life setting including participant estimation of readings.

  • Limitations include the uncontrolled environment of public houses and bars, use of a pragmatic reference standard, and wide CIs for some results due to low prevalence of those over the driving limit.

  • However, our conclusions for the worst-performing device are robust to this uncertainty: even at the upper confidence limit for its sensitivity (48.8%), about one in two individuals over the limit would be falsely reassured, which could lead to potentially dangerous driving decisions.

Introduction

Road traffic collisions (RTCs) are the second leading cause of death worldwide among people aged 5–29 years; they cause an estimated 1.2 million deaths each year, a figure forecast to rise further by 2020.1 Consumption of alcohol is an important factor influencing the likelihood and severity of RTCs, and has been found to be a causal factor in 17–40% of RTCs worldwide.2 The risk of an RTC increases rapidly with increasing blood alcohol concentration (BAC): relative risk rises significantly beyond a BAC of 0.04 g alcohol per 100 mL blood (g/dL) and reaches approximately 5 at 0.10 g/dL.2 Introduction of maximum legal BAC limits for driving, which vary from 0.02 g/dL (eg, Russia) to 0.15 g/dL (eg, Uganda), has been effective in reducing alcohol-related injuries and deaths.1

Alcohol breathalysers, which have long been available to law enforcement agencies, are now marketed directly to consumers, for example in some UK pharmacies and motoring stores, to test safety to drive after drinking alcohol, including the morning after. In July 2012, it became a legal requirement for drivers in France to carry a personal breathalyser at all times.3 Devices marketed to consumers are orders of magnitude lower in cost than those intended for law enforcement: for example, at the time of writing, the Alcosense Single is more than 300 times cheaper than the Dräger 6510 Home Office certified police breathalyser.4,5

The theoretical accuracy of breath alcohol measurement as a surrogate for blood alcohol measurement has been well studied,6 and in the UK breath alcohol forms part of the prescribed legal limit, along with blood and urine alcohol concentration.7 However, to the best of our knowledge, the accuracy of devices currently marketed to consumers and motorists has not been studied. Many such devices carry regulatory approvals such as the Conformité Européenne (CE) mark, French NF certification and the British Standard (BSI Kitemark), but in general such marks are statements of engineering quality rather than diagnostic accuracy. We therefore aimed to assess the diagnostic accuracy of personal breathalysers compared with a police breathalyser for detection of being at or over the UK legal driving limit in a real-world situation representing a possible application for such devices.
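As a worked illustration of how breath and blood limits relate (not part of the study methods), the sketch below derives the blood:breath ratio implied by the UK prescribed limits of 35 µg/100 mL breath and 80 mg/100 mL blood, and converts a breath reading using an assumed ratio of 2300:1. The 80 mg/100 mL figure is the UK statutory blood limit, which is not stated in this article; the 2300:1 ratio and all function names are assumptions for illustration, and real blood:breath ratios vary between individuals.

```python
# UK prescribed limits (statutory values; the blood figure is not stated in this article)
UK_BREATH_LIMIT_UG_PER_100ML = 35.0   # µg alcohol per 100 mL breath
UK_BLOOD_LIMIT_MG_PER_100ML = 80.0    # mg alcohol per 100 mL blood

# Assumed blood:breath partition ratio (illustrative only; varies between people)
ASSUMED_BLOOD_BREATH_RATIO = 2300.0


def implied_ratio(breath_ug_per_100ml: float, blood_mg_per_100ml: float) -> float:
    """Blood:breath ratio implied by a pair of equivalent limits."""
    # Convert blood mg to µg so both concentrations share the same units.
    return blood_mg_per_100ml * 1000.0 / breath_ug_per_100ml


def breath_to_blood_mg_per_100ml(breath_ug_per_100ml: float,
                                 ratio: float = ASSUMED_BLOOD_BREATH_RATIO) -> float:
    """Estimate blood alcohol (mg/100 mL) from breath alcohol (µg/100 mL)."""
    return breath_ug_per_100ml * ratio / 1000.0


def mg_per_100ml_to_g_per_dl(mg_per_100ml: float) -> float:
    """100 mL is 1 dL, so only the mg-to-g conversion is needed."""
    return mg_per_100ml / 1000.0


if __name__ == "__main__":
    r = implied_ratio(UK_BREATH_LIMIT_UG_PER_100ML, UK_BLOOD_LIMIT_MG_PER_100ML)
    print(f"Implied ratio: {r:.0f}:1")
    blood = breath_to_blood_mg_per_100ml(UK_BREATH_LIMIT_UG_PER_100ML)
    print(f"35 µg/100 mL breath ~ {blood:.1f} mg/100 mL blood "
          f"= {mg_per_100ml_to_g_per_dl(blood):.3f} g/dL")
```

The implied ratio is close to 2300:1, which is why 35 µg/100 mL breath is treated as equivalent to 0.8‰ (0.08 g/dL) in blood.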

Methods

Index tests and reference standard

We selected as index tests the Alcosense Single, the UK counterpart of an NF-approved device widely sold for motorists in France; the Dräger Alco-Check, as a comparable single-use device from a competing manufacturer; and the Alcosense Elite, as an example of a digital multiuse device readily available from pharmacies, high street and online stores (Boots, Halfords, Amazon and others). We selected as reference standard the Dräger Alcotest 6510 device. This has Home Office approval and is standard issue to UK police for use as an initial test at the roadside,5 and it is approved in the USA as an evidential breath-testing device (National Highway Traffic Safety Administration Standard 49 FR 48854), meaning that its readings can be used as evidence for prosecution of drink-driving offences in US courts. Manufacturer information states measurement precision as ±0.008 mg/L or 1.7% of the measured value.8 The Dräger Alcotest 6510 and Alcosense Elite devices come with instruction manuals that detail breathalyser operating technique and result interpretation. The single-use devices provide details of operating technique and result interpretation on their packaging and on an enclosed leaflet. These manuals and leaflets recommend that at least 15 min (Dräger Alco-Check), at least 20 min (Dräger 6510 and Alcosense Single) or at least 30 min (Alcosense Elite) elapse between alcohol consumption and use. All personal breathalyser instructions clearly state that any amount of alcohol, even below the limit, can impair driving ability.

Study participants

To test the breathalysers in a real world situation, representing one possible application for personal breathalysers, we recruited participants from establishments licensed to serve alcohol in the city centre of Oxford, UK, including college bars and public houses. In the absence of prior data with which to estimate sample size, we decided to recruit 200 participants, which would allow us to report the prevalence of intoxication with small SE (maximum 3.5%). Recruitment took place on evenings during the period from October 2012 to January 2013: 11 study evenings were required to reach a sample size greater than 200. Participants were eligible if they were 18 years or over, had consumed alcohol, and were not intending to drive within the following 6 h. We excluded potential drivers for ethical reasons.
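The maximum SE quoted above follows from the binomial SE of a proportion, √(p(1−p)/n), which is largest when p=0.5. A minimal sketch of that calculation, with hypothetical function names:

```python
import math


def prevalence_se(p: float, n: int) -> float:
    """Standard error of an estimated proportion p from n participants."""
    return math.sqrt(p * (1.0 - p) / n)


def worst_case_se(n: int) -> float:
    """Upper bound on the SE for any prevalence: p(1 - p) peaks at p = 0.5."""
    return prevalence_se(0.5, n)


if __name__ == "__main__":
    print(f"Worst-case SE with n=200: {worst_case_se(200):.4f}")
    print(f"SE at the observed prevalence 38/208: {prevalence_se(38 / 208, 208):.4f}")
```

Because the bound holds for every prevalence, a sample of 200 guarantees an SE of about 3.5% or less before any data are collected.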

Recruitment and consent

Individuals in these establishments were informally approached by a member of the research team and given preliminary information about the study. Potentially eligible interested participants were then asked to sit with the research team, at tables reserved within the same premises, where they received a full description of the study, were asked to read the study literature, and given an opportunity to ask further questions. Eligibility was checked and written consent taken. Participants were given a card with contact details to use in the event of withdrawing consent subsequently (eg, the next day): in the event, no participants used this option to withdraw consent.

Study procedures

The research team consisted of between two and four researchers, who followed the reference breathalyser manufacturer's written instructions in directing use of the devices: a minimum of 20 min was enforced between recruitment and use of the breathalyser devices, and participants were asked not to drink further alcohol, smoke, use mouthwash or drink fruit juice during this period. Participants were provided with water and asked to take at least one sip in order to clear any residual alcohol from the upper airway. During this 20 min period, basic demographic details (age and sex) and reported alcohol consumption during the preceding 12 h were recorded by the researchers. Participants were not required to remain under observation during the 20 min period.

The 20 min waiting period did not comply with the manufacturers’ instructions for one of the devices (Alcosense Elite). However, it was felt that a longer gap between recruitment and testing would result in high attrition rates, and increased potential for protocol violation by participants (eg, by smoking or drinking liquids other than water). We therefore compromised on a waiting time of 20 min based on the instructions for the reference device, and the fact that this complied with manufacturer instructions for the other two index devices.

Participants then used three breathalysers, at intervals of at least 1 min, at the study tables under the supervision of a research team member. Each participant used, in random order, the reference standard, the Alcosense Elite multiuse and a randomly selected single-use breathalyser. Randomisation was carried out in advance by study number in random permuted blocks.
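The text does not specify the block size or how device order and single-use device were combined, so the following is only a sketch of one permuted-block scheme consistent with the description. Each hypothetical block of 12 contains every combination of the six possible orders of the three devices and the two single-use devices exactly once, in shuffled order, and allocations are assigned by study number. The labels follow figure 1; the block size, seed and function names are assumptions.

```python
import itertools
import random

DEVICES = ("R", "IM", "IS")    # reference, multiuse index, single-use index slot
SINGLE_USE = ("ISD", "ISA")    # Dräger Alco-Check, Alcosense Single


def block_allocations():
    """All 12 combinations of device order (3! = 6) and single-use device (2)."""
    return [(order, su) for order in itertools.permutations(DEVICES) for su in SINGLE_USE]


def permuted_block_schedule(n_participants: int, seed: int = 2012):
    """Allocation list indexed by study number, built from shuffled blocks of 12."""
    rng = random.Random(seed)  # fixed seed so the list can be generated in advance
    schedule = []
    while len(schedule) < n_participants:
        block = block_allocations()
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_participants]


if __name__ == "__main__":
    for study_number, (order, single_use) in enumerate(permuted_block_schedule(6), start=1):
        # Replace the generic single-use slot "IS" with the allocated device.
        sequence = [single_use if d == "IS" else d for d in order]
        print(study_number, " -> ".join(sequence))
```

Blocking keeps the two single-use devices and the six orders balanced within every complete block, while shuffling within blocks keeps the next allocation unpredictable.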

A member of the research team recorded digital outputs of the multiuse and reference breathalysers. The single-use breathalysers require subjective assessment of colour of crystals after use: we recorded assessments by researcher (primary outcome) and participant (secondary outcome). Participants were blinded to the researcher assessment and to the reference result.

Statistical analysis

We calculated sensitivity, specificity, positive and negative predictive values of the breathalysers for the outcome of being at or over the current UK legal driving limit7 of 35 µg/100 mL or 0.8‰ BAC according to the reference standard. Self-reported alcohol consumption, converted to UK alcohol units (1 unit=8 g alcohol) using nutritional tables,9 was also assessed for diagnostic accuracy of being at or over the UK legal driving limit, and a receiver operating characteristic (ROC) analysis undertaken to assess different unit thresholds. Statistical calculations were carried out using standard methods10 and the ROC analysis using Stata (Release V.11). Minor protocol violations were discussed among the research team, with sensitivity analyses performed to determine whether there was any difference in overall results due to inclusion or exclusion of these. Participants with missing data (eg, sex, self-reported alcohol consumption or participant estimation of result) were included except for analyses involving the missing component itself.
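The article cites only "standard methods" for its CIs. The intervals reported for sensitivity are consistent with the Wilson score interval, so the sketch below uses that method as an assumption. It computes sensitivity, specificity, positive and negative predictive values with 95% CIs from a 2×2 table against the reference standard; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    if n == 0:
        raise ValueError("denominator must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% interval
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int):
    """Point estimates and Wilson CIs from a 2x2 table against the reference."""
    def est(k, n):
        return (k / n, *wilson_ci(k, n))
    return {
        "sensitivity": est(tp, tp + fn),   # positives detected / all at or over limit
        "specificity": est(tn, tn + fp),   # negatives correct / all under limit
        "ppv": est(tp, tp + fp),
        "npv": est(tn, tn + fn),
    }
```

For example, `wilson_ci(34, 38)` returns approximately (0.759, 0.958) and `wilson_ci(5, 19)` approximately (0.118, 0.488), matching the intervals reported for the Alcosense Elite and Alcosense Single if their sensitivities were 34/38 and 5/19; these counts are our back-calculation from the reported percentages.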

Results

A total of 208 participants were included in the main analysis (see figure 1); 148/207 (71.5%) were male and the median age was 20 years. Participants reported having consumed a median of 6 UK units of alcohol (range 1–25), equivalent to a median of 46 g (range 8–204) of alcohol. One hundred and eight participants were tested with the Dräger Alco-Check single-use breathalyser, and 100 with the Alcosense Single single-use breathalyser.
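The study converted reported drinks to UK units using nutritional tables.9 For readers unfamiliar with the unit system, a minimal sketch of the standard arithmetic (units = volume in mL × ABV% / 1000, with 1 unit = 8 g alcohol as stated in the methods) is shown below. The drink volumes and strengths are hypothetical examples, and the function names are illustrative.

```python
GRAMS_PER_UK_UNIT = 8.0  # 1 UK unit = 10 mL = 8 g pure alcohol


def uk_units(volume_ml: float, abv_percent: float) -> float:
    """UK units in a drink: volume (mL) x ABV (%) / 1000."""
    return volume_ml * abv_percent / 1000.0


def grams_of_alcohol(units: float) -> float:
    """Convert UK units to grams of pure alcohol."""
    return units * GRAMS_PER_UK_UNIT


if __name__ == "__main__":
    # Hypothetical drinks: a pint (568 mL) of 4% beer and a 175 mL glass of 12% wine.
    pint = uk_units(568, 4.0)
    wine = uk_units(175, 12.0)
    print(f"pint: {pint:.2f} units, wine: {wine:.2f} units")
    print(f"pint: {grams_of_alcohol(pint):.1f} g alcohol")
```

Under these assumptions, a pint of 4% beer is about 2.3 units (about 18 g alcohol).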

Figure 1

Study flow chart. All participants (except exclusions detailed in the figure) were tested with the Dräger Alcotest 6510 (R), the Alcosense Elite (IM), and either the Dräger Alco-Check (ISD) or the Alcosense Single (ISA). The order of using the breathalysers, and the selection of ISD or ISA, were determined by randomisation. *Three participants left before analysis; results for participant estimation of ISA are therefore available for only 97 participants.

Thirty-eight participants (18.3%, 95% CI 11.7% to 27.4%) were at or over the current UK driving limit of 35 µg/100 mL according to the reference breathalyser. Table 1 compares the performance of the three index breathalysers at detecting those at or over the UK limit. Compared with the reference breathalyser, the Alcosense Elite multiuse breathalyser had a sensitivity of 89.5% (95% CI 75.9% to 95.8%), the Dräger Alco-Check single-use breathalyser had a sensitivity of 94.7% (95% CI 75.4% to 99.1%) and the Alcosense Single breathalyser had a sensitivity of 26.3% (95% CI 11.8% to 48.8%). When analyses were repeated using the participants' interpretation of colour change in the single-use breathalysers instead of the researchers', sensitivity was 94.7% (95% CI 75.4% to 99.1%) for the Dräger Alco-Check (ie, identical to researcher estimation) and 16.7% (95% CI 5.8% to 39.2%) for the Alcosense Single (see online supplementary appendix 1 for full data for participant estimation).

Table 1

Diagnostic accuracy of index breathalysers compared with reference police breathalyser, using researcher interpretation of single-use breathalysers

We conducted three sensitivity analyses, in turn excluding: (1) two results where the participants were suspected to have used the device incorrectly; (2) a colour-blind participant who may have had difficulty interpreting the colour change of the crystals; and (3) a participant who was suspected to have violated the protocol (by consuming alcohol between use of the three devices). In general, these analyses made minimal difference to the overall results (see online supplementary appendix 1), with the exception that the sensitivity of the Dräger Alco-Check increased to 100% (95% CI 83.2% to 100%) in the third sensitivity analysis. There were no reported adverse events from using any of the breathalyser devices.

Because there is no single standard threshold for driving decisions based on self-estimated alcohol consumption, we calculated sensitivity and specificity for all alcohol unit thresholds (table 2) and plotted them in ROC space (figure 2).

Table 2

Diagnostic accuracy of self-reported alcohol consumption compared with reference police breathalyser

Figure 2

Receiver operating characteristic curve of self-reported alcohol consumption in UK units, with comparative sensitivity and specificity for breathalysers tested.
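Table 2 and figure 2 treat self-reported consumption as a test that is positive when reported units reach a given threshold. The sketch below shows that threshold sweep, assuming "positive" means units ≥ threshold; the toy data are invented for illustration and are not study data, and the study itself used Stata rather than Python. Each threshold yields one point (1 − specificity, sensitivity) in ROC space.

```python
from typing import List, Sequence, Tuple


def threshold_sweep(units: Sequence[float],
                    over_limit: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, specificity) for the rule 'positive if units >= t'."""
    n_pos = sum(over_limit)
    n_neg = len(over_limit) - n_pos
    results = []
    for t in sorted(set(units)):
        # True positives: over the limit and reported at least t units.
        tp = sum(1 for u, y in zip(units, over_limit) if y and u >= t)
        # True negatives: under the limit and reported fewer than t units.
        tn = sum(1 for u, y in zip(units, over_limit) if not y and u < t)
        results.append((t, tp / n_pos, tn / n_neg))
    return results


if __name__ == "__main__":
    # Invented toy data: reported units and reference result (True = at or over limit).
    toy_units = [2, 4, 6, 8, 3, 10, 7]
    toy_over = [False, False, True, True, False, True, False]
    for t, sens, spec in threshold_sweep(toy_units, toy_over):
        print(f"t={t:>2}: sensitivity={sens:.2f}, 1-specificity={1 - spec:.2f}")
```

Lowering the threshold can only raise sensitivity and lower specificity, which is why low unit thresholds outperform the breathalysers on sensitivity alone.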

Discussion

We have shown that breathalysers available for sale to the public for personal use vary considerably in their performance in detecting being at or over a legal driving limit during the period directly after drinking. Two of the devices tested, the Alcosense Elite digital multiuse breathalyser and the Dräger Alco-Check single-use breathalyser, had sensitivities of approximately 90% and 95%, respectively, in the main analyses. However, even a sensitivity of 95% means that approximately 1 in 20 people over the driving limit would be falsely reassured by these tests, and we question whether even this would be sufficient sensitivity to assess safety to drive. The third device, the Alcosense Single single-use breathalyser, had a sensitivity of only 26%, meaning that only approximately one in four individuals over the legal limit would be identified by this device. When participants (rather than researchers) interpreted the results, as would occur in real life, sensitivity fell further to only 17%. This device has a correspondingly high specificity, but specificity is not the safety-critical aspect of performance of a device assessing safety to drive. Surprisingly, we found that self-reported alcohol consumption, using any threshold of up to 5 UK units, was a more sensitive test for being at or over the legal driving limit than any of the three breathalysers tested.
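The false-reassurance figures above are simply 1 − sensitivity. The short sketch below makes that arithmetic explicit, using counts back-calculated from the reported sensitivities (our inference; the counts themselves are not stated in the text). The dictionary and function names are illustrative.

```python
# Counts back-calculated from the reported percentages (our inference):
# (number detected by the device, number at or over the limit by the reference)
SENSITIVITY_COUNTS = {
    "Alcosense Elite": (34, 38),
    "Dräger Alco-Check": (18, 19),
    "Alcosense Single": (5, 19),
}


def false_reassurance_rate(detected: int, over_limit: int) -> float:
    """Share of people at or over the limit whom the device misses (1 - sensitivity)."""
    return (over_limit - detected) / over_limit


if __name__ == "__main__":
    for device, (detected, over_limit) in SENSITIVITY_COUNTS.items():
        rate = false_reassurance_rate(detected, over_limit)
        print(f"{device}: {rate:.1%} of those over the limit falsely reassured")
    # Even at the upper 95% confidence limit of 48.8% sensitivity,
    # the miss rate is 1 - 0.488, i.e. about one in two.
    print(f"Upper-limit case: {1 - 0.488:.1%}")
```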

Strengths of our study include testing participants in a ‘real world’ environment, including assessment of participants’ estimation of breathalyser readings, which supports generalisability to everyday life. However, this pragmatic approach also brings limitations, in that the setting in college bars and public houses could not be rigorously controlled. It is possible, for example, that poor lighting, or unknown protocol violation by participants who could not always be perfectly monitored in this busy environment, could have introduced some inaccuracies. Operating the three breathalysers in close succession and randomising the order of use should have helped reduce the risk of such effects biasing the overall results. There is also the possibility that ambient alcohol vapour in the environment may have resulted in excess false positives. As discussed above, we may have underestimated the sensitivity of the Dräger Alco-Check because of a suspected protocol violation. Three participants inadvertently used the breathalysers in a different order from that planned: we decided to include these participants because the aim was to achieve an overall variety in breathalyser order, and this was unlikely to have any impact for such a small number of participants. The use of a 20 min rather than 30 min minimum time after drinking alcohol did not adhere to the manufacturer's instructions for the Alcosense Elite, so it is possible that this affected results for this breathalyser. Hyperventilation immediately prior to a breath sample reduces breath alcohol concentration11 and breath holding increases it.12,13 We recorded only minimal information about participants to facilitate recruitment, and monitoring breathing would not have been possible in this deliberately real-life environment.
However, we followed the instructions in the Alcosense Elite manual to “wait until you are breathing normally again” by ensuring that participants were relaxed and allowing at least a minute between breathalysers for any recovery required. Participants were typically seated while completing the consent process and initial data collection before breathalysing took place. Randomising the order of breathalysing would further reduce any bias in the overall results due to temporal effects such as hyperventilation before the first test. Because of its high solubility, alcohol is thought to enter exhaled air largely from the proximal conducting airways, so breath alcohol concentration is not reliant on alveolar equilibration.14 Our methodology of allowing 1 min between tests was designed to allow satisfactory measurement of breath alcohol concentration.

Participants were blinded to their result on the reference breathalyser until after completing estimation of colour change; however, researchers were not, and this could have introduced some bias. We met our planned recruitment target, but the low prevalence of those over the legal limit by the reference standard means that our CIs are fairly wide. Even using the upper confidence limit for sensitivity, however, the worst-performing breathalyser would still have sensitivity under 50%. Our sample of participants recruited from colleges and pubs may not be representative of the population who purchase personal breathalysers, particularly in age, quantity of alcohol consumed and sociodemographic characteristics; although this might affect the sensitivity of self-reported alcohol consumption, it should not significantly affect the overall accuracy of the breathalyser devices. We tested the index devices on the night of drinking and in a bar environment, so generalisability to other uses such as ‘morning-after’ use may be unclear; blood and breath alcohol concentrations decline with time after drinking alcohol at similar rates and maintain high correlation,15 but it is possible that the diagnostic accuracy of personal breathalysers may differ at lower alcohol concentrations.

Use of blood alcohol as a gold standard was not possible in this pragmatic field study. We chose as our reference standard the Dräger Alcotest 6510, which has passed a Home Office testing protocol requiring error of less than 10% in all readings16,17 and is also an evidential device for criminal prosecutions in the USA. The devices currently approved for evidential use in the UK are not portable and were therefore not suitable for our study setting. For convenience, we selected index devices easily available in the UK for testing; however, the same devices calibrated for different legal limits are sold outside the UK, and other devices sold elsewhere use similar technology and are similarly priced. Therefore, while we cannot directly apply our results to other countries, we would anticipate similar findings elsewhere.

We have not found other studies testing the accuracy of personal breathalysers. Studies in the USA of college student parties have found a mean BAC of 0.077% (SD 0.063%) (for context, the UK driving limit is 0.08%).18 A Canadian study found a linear relationship between self-reported alcohol consumption and BAC in emergency room attendees, up to seven drinks.19 However, a US study of college students, which compared estimated BAC with measured BAC, found that students tended to overestimate their levels of consumption when surveyed in the midst of a night of drinking.20 We did not attempt to convert self-reported alcohol consumption to BAC and recorded only the total quantity consumed; the BAC resulting from a given quantity differs between individuals depending on weight, sex and the time over which the drinks were consumed. However, despite the variation these factors would introduce into BAC, self-reported consumption up to 5 units was still a more sensitive test than the breathalysers tested, and BAC is known to correlate poorly with symptoms of intoxication.21
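To illustrate why the same self-reported quantity can correspond to very different BACs, the sketch below uses the Widmark approximation, which the study did not use. The distribution factors (about 0.68 for men and 0.55 for women) and the elimination rate (about 0.015 g/dL per hour) are commonly cited textbook values adopted here as assumptions; the approximation also ignores the absorption phase, and the function name is hypothetical.

```python
def widmark_bac_g_per_dl(alcohol_g: float, weight_kg: float, r: float,
                         hours: float, beta: float = 0.015) -> float:
    """Rough Widmark estimate of BAC in g/dL (illustrative only).

    r: assumed body-water distribution factor (about 0.68 men, 0.55 women).
    beta: assumed elimination rate in g/dL per hour.
    """
    peak_g_per_kg = alcohol_g / (r * weight_kg)    # g alcohol per kg body mass
    peak_g_per_dl = peak_g_per_kg / 10.0           # approximating 1 g/kg as 1 g/L = 0.1 g/dL
    return max(0.0, peak_g_per_dl - beta * hours)  # assumes absorption is complete


if __name__ == "__main__":
    grams = 6 * 8.0  # 6 UK units at 8 g per unit
    man = widmark_bac_g_per_dl(grams, weight_kg=80, r=0.68, hours=3)
    woman = widmark_bac_g_per_dl(grams, weight_kg=60, r=0.55, hours=1)
    print(f"80 kg man, 3 h: {man:.3f} g/dL; 60 kg woman, 1 h: {woman:.3f} g/dL")
```

Under these assumptions, 6 units gives roughly 0.04 g/dL for an 80 kg man 3 h after drinking but roughly 0.13 g/dL for a 60 kg woman 1 h after drinking, either side of the 0.08 g/dL limit.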

Our research suggests that at least some personal breathalysers available for sale to the public are not always sufficiently sensitive to test safety to drive after drinking alcohol, a setting in which acting on inaccurate results from breathalysers believed to be accurate could have catastrophic safety implications for drivers. The fact that these devices are sold in well-established pharmacies, including national chains, does not guarantee sufficient accuracy for safe use. Medical and measurement devices may carry regulatory approvals such as CE or NF marking, but such marking does not appear to correlate with accuracy, which raises wider questions over how it may be perceived by users. A derivative device of the worst-performing breathalyser in our study is widely sold in France for use under the new law requiring breathalysers to be carried when driving, and has French NF approval.4 Although our results cannot be directly applied to a derivative device or to the lower French driving limit of 0.05 g/dL, they raise questions about the utility of the new law, which may improve public awareness of drink-driving in general but risks ill-informed driving decisions based on inaccurate results from a personal breathalyser. Replication of our results in other settings and with other breathalysers could further inform policymakers planning to introduce similar laws in other jurisdictions, and could explore the characteristics of the population who purchase personal breathalysers and how they use the results obtained. Finally, our research raises worrying questions about the level of scrutiny that medical tests intended for sale to the public undergo in Europe, and wider concerns about how diagnostic accuracy in particular is evaluated, whether further field evaluations with intended users are required, how users perceive the accuracy of such devices, and how use of such devices interacts with medical testing in other healthcare settings.

Acknowledgments

The authors would like to thank the study participants, and those colleges of the University of Oxford and public houses in Oxford that permitted the research team to recruit on their premises; Michael Ashdown for assistance with legal literature; and David McCartney for additional advice on the final manuscript. They would also like to thank the reviewers of this article for their hard work and insightful comments.

References

Supplementary materials

Footnotes

  • Contributors All authors contributed to study design, data collection, data interpretation and helped write the paper. RJS conceived the idea for the study. HFA, EAS and RJS analysed the data. HFA is the guarantor. All authors approved the final version of the manuscript.

  • Disclaimer The views expressed in this article are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

  • Funding This study received no external funding.

  • Competing interests None.

  • Ethics approval The study was approved by the University of Oxford Medical Sciences Division Interdivisional Research Ethics Committee (reference MSD/IDREC/2011/13).

  • Patient consent All participants gave informed consent before taking part.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. Full data are available from the corresponding author on request. Consent for data sharing was not obtained but the presented data are anonymised and risk of identification is low.
