Remote data collection speech analysis and prediction of the identification of Alzheimer’s disease biomarkers in people at risk for Alzheimer’s disease dementia: the Speech on the Phone Assessment (SPeAk) prospective observational study protocol
  1. Sarah Gregory1,
  2. Nicklas Linz2,
  3. Alexandra König3,
  4. Kai Langel4,
  5. Hannah Pullen1,
  6. Saturnino Luz5,
  7. John Harrison6,7,
  8. Craig W Ritchie1
  1. 1Edinburgh Dementia Prevention, Centre for Clinical Brain Sciences, The University of Edinburgh Centre for Clinical Brain Sciences, Edinburgh, UK
  2. 2ki elements, ki elements, Saarbrucken, Saarland, Germany
  3. 3Stars Team, National Institute for Research in Computer Science and Automation, Nice, France
  4. 4Janssen Healthcare Innovation, Beerse, Belgium
  5. 5Usher Institute of Population Health Sciences and Informatics, The University of Edinburgh, Edinburgh, UK
  6. 6Metis Cognition Ltd, Kilmington Common, UK
  7. 7Department of Neurology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  1. Correspondence to Sarah Gregory; Sarah.Gregory{at}


Introduction Identifying cost-effective, non-invasive biomarkers of Alzheimer’s disease (AD) is a clinical and research priority. Speech data are easy to collect, and studies suggest they can identify those with AD. We do not know if speech features can predict AD biomarkers in a preclinical population.

Methods and analysis The Speech on the Phone Assessment (SPeAk) study is a prospective observational study. SPeAk recruits participants aged 50 years and over who have previously completed studies with AD biomarker collection. Participants complete a baseline telephone assessment, including spontaneous speech and cognitive tests. A 3-month visit will repeat the cognitive tests with a conversational artificial intelligence bot. Participants complete acceptability questionnaires after each visit. Participants are randomised to receive their cognitive test results either after each visit or only after they have completed the study. We will combine SPeAk data with AD biomarker data collected in a previous study and analyse for correlations between extracted speech features and AD biomarkers. The outcome of this analysis will inform the development of an algorithm for prediction of AD risk based on speech features.

Ethics and dissemination This study has been approved by the Edinburgh Medical School Research Ethics Committee (REC reference 20-EMREC-007). All participants will provide informed consent before completing any study-related procedures, and participants must have capacity to consent to participate in this study. Participants may find that the tests, or receiving their scores, cause anxiety or stress. Previous exposure to similar tests may make this more familiar and reduce this anxiety. The study information will include signposting in case of distress. Study results will be disseminated to study participants, presented at conferences and published in a peer-reviewed journal. No study participants will be identifiable in the study results.

  • dementia
  • delirium & cognitive disorders
  • old age psychiatry

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Strengths and limitations of this study

  • A key strength of this study is the novel speech data collection in a preclinical cohort of participants at risk for dementia with Alzheimer’s disease (AD) biomarkers.

  • The Speech on the Phone Assessment study and associated materials have been designed with input from participants to ensure that the project is of interest and value to the target participant population.

  • This study will collect data on participant acceptability of the test methodology to collect critical information for future projects.

  • Unfortunately, in this project, there is no concurrent collection of AD biomarkers, so only historical data are available.

  • The participants recruited to this study do not represent the diversity of the wider population, as the previous studies they were involved in did not have inclusive recruitment strategies.


More than 40 million people worldwide are living with dementia.1 There is a growing body of work to understand risk factors and how we can prevent dementia.2 3 Identification of biomarkers of Alzheimer’s disease (AD) has been important in advancing the field.4 Current gold-standard biomarkers for AD include amyloid positron emission tomography (PET) scans,5 cerebrospinal fluid (CSF) amyloid and tau levels6 7 and blood-based amyloid and tau concentrations.8 However, these techniques can be expensive,9 perceived as invasive or provoking anxiety for some participants10 and may only be available in certain geographical locations.

Speech data are easy to collect and non-invasive, with recent studies identifying possible markers of AD.11 Semantic errors and difficulties in semantic processing may be important in preclinical AD.12 The failure to stop autocorrect errors when reading aloud correlates with Aβ1–42.13 Changes in connected speech beginning as early as prodromal stages of the illness have been associated with autopsy-confirmed AD, alongside linear declines in syntactic complexity and semantic content.14 Participants with preclinical AD, defined by amyloid beta positivity on a PET scan, were found to have more rapid declines on specific word content, a feature of connected speech.15 Acoustic features may also reveal information on emotional states and potential presence of neuropsychiatric symptoms such as apathy or depression which are comorbid conditions in AD.16–19

Adapting traditional face-to-face cognitive assessments to allow remote data collection may increase access to both research studies and clinical services. Recent studies have shown that automated screening processes, including automated speech analysis, give neurocognitive screening decisions as accurate as those of human evaluators.20–22 These tasks are also validated in a telephone-based setting,23 and artificial intelligence-empowered pipelines can capture clinically relevant features beyond those available in traditionally administered neurocognitive tests.24 25

The aim of the Speech on the Phone Assessment (SPeAk) is to collect speech data from participants across a spectrum of risk for AD and assess the usefulness of speech features collected over the phone as potential digital biomarkers for AD.

Methods and analysis

Study design

The SPeAk study is a prospective observational study.


The objectives of the SPeAk study are:

  • To develop algorithms to identify speech biomarkers in the data that are predictive of CSF AD biomarkers.

  • To identify ceiling performance on a set of frequently performed cognitive tasks when administered on the telephone.

  • To assess the test–retest reliability where the initial testing has a human element and the follow-up testing is fully automated.

  • To explore participant acceptability of both human-administered and fully automated cognitive testing via telephone.

  • To conduct conversational analysis on the spontaneous speech generated in the baseline testing session.

The SPeAk project began in January 2021 and is expected to complete data collection in December 2021, with data analysis ongoing until June 2022.

Participants: sample size and power calculation

Participants will all have engaged with in-person cognitive testing at the research site within the last 12–24 months. The rationale for this is twofold. First, some of these participants were enrolled in the European Prevention of Alzheimer’s Dementia Longitudinal Cohort Study (EPAD LCS)26 27 and have CSF biomarkers available for analysis in combination with the speech data collected in this protocol. Participants formerly enrolled in the EPAD LCS represent a range of risk states for AD dementia, from cognitively healthy to preclinical AD. Participants recruited from other studies are also either cognitively healthy or have preclinical AD. Importantly, none of the participants is currently living with dementia. Second, we wanted to recruit participants who were familiar with cognitive testing procedures to ensure that they were psychologically robust (as cognitive testing can elicit anxiety) and had a reference point of comparison for acceptance evaluations. All participants included are known to be native or fluent English-language speakers from their previous study involvement. If fluent but non-native speakers choose to enrol in the SPeAk study, a sensitivity analysis will be conducted to assess for any impact of non-native language patterns.

Participants will be included in this study if they have previously enrolled in the EPAD LCS or another closed commercial research study at the Edinburgh Dementia Prevention research site. Participants must have a landline or mobile phone and have the capacity to consent to the study. Participants will be excluded from the study if they do not have the capacity to provide informed consent or if they anticipate that they will not be available by landline or mobile phone at the 3-month follow-up time point. As participants are being recruited from previous studies, we have not specified further eligibility criteria. On the basis of the previous studies, all participants will be aged 50 years or above, cognitively healthy or with mild cognitive impairment, will not have dementia, will be fluent English speakers and will not have visual or hearing impairments severe enough to interfere with cognitive testing.

The primary objective is the evaluation of an algorithm, and power calculations remain challenging in this field, with the generally accepted wisdom being that more data are better. Using the rationale explained by de la Fuente Garcia et al,28 a minimum of 75 participants would be sufficient for this analysis; however, we will aim for at least 150 participants. Briefly, in that protocol, the authors justify the minimum sample size by placing the lower bounds between 1.2×f and 1.4×f, where f is the number of features of the data set. Using the Geneva Minimalistic Acoustic Parameter Set,29 which contains 62 features, the authors concluded that a minimum of 75 participants would be required for 90% accuracy. In this study, we anticipate 30–50 core features. For the machine learning analysis, we have adopted a pragmatic approach to sample size calculation, with an aim of 150 participants allowing us to develop two reasonably sized data sets, one for learning and one for testing. A simple classification algorithm such as the Euclidean distance requires 1.2×v participants, where v is the number of variables. If we anticipate 50 core speech features as well as a small number of demographic covariates (described later), we arrive at a required minimum sample size of 66.
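The arithmetic behind these lower bounds can be checked with a minimal sketch. The function name `min_sample_size` is hypothetical, and the figure of 5 demographic covariates is an assumption inferred from the stated result (66 = 1.2 × 55); the protocol itself says only "a small number".

```python
import math


def min_sample_size(n_variables: int, factor: float = 1.2) -> int:
    """Lower bound on participants: factor * number of variables, rounded up."""
    # round() removes floating-point noise before taking the ceiling.
    return math.ceil(round(factor * n_variables, 9))


# 62 acoustic features, as stated for the parameter set cited in the text.
gemaps_minimum = min_sample_size(62, 1.2)

# 50 core speech features plus an ASSUMED 5 demographic covariates.
speak_lower = min_sample_size(50 + 5, 1.2)
speak_upper = min_sample_size(50 + 5, 1.4)

print(gemaps_minimum, speak_lower, speak_upper)
```

Under these assumptions, the target of 150 participants sits well above both the 1.2× and the 1.4× bounds, which leaves room for splitting the data into learning and testing sets.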

Experimental procedure and design

This study will trial speech analysis for cognitive assessments over the telephone in a study population previously exposed to in-person cognitive testing. Participants will be asked to complete two assessments during the study period. In the first testing session, participants will interact with a staff member who will deliver the cognitive assessments on the phone, facilitated by an iPad application. The participants do not need to have an iPad themselves; the app will call either a landline or mobile phone number. These cognitive assessments will be audio recorded for speech analysis. There will also be free speech recorded during the telephone conversation, which will be used for conversational analysis, and to assess the predictive power and clinical relevance of acoustic features extracted from spontaneous conversational speech.30 In the follow-up testing session, completed 3 months after the first session (±14 days), participants will complete automated cognitive tests. For the automated tests, a computerised voice will read the instructions and initiate tasks for the participants. A computer voice was selected instead of a prerecorded human voice to ensure that it was clear to participants that they are engaging with a computer and cannot freely interact as with a human tester.


Participants will complete consent forms electronically using Online Survey software. An electronic consent whereby participants will tick consent clauses and type their name has been deemed to be proportionate to the study risks. At the start of the first telephone call for the study, participants will be asked to verbally reconfirm consent to participate in the trial. If the participant does not have the capacity to consent to their participation in the study, they will not continue in the study. Members of the research team are experienced in working with this population and assessing capacity. Only appropriately trained staff will be delegated to this task. Senior staff will be available to discuss any concerns about a participant’s capacity to consent.


Demographic details will be collected from all participants after they have provided informed consent to the study. Sex, age, education (years), living status and current medications will be recorded.

Preparing the participant

Prior to the spontaneous speech and cognitive tests, the participants will be prepared for the session by the interviewer. The interviewer will check the audio quality, both for the participant and for the interviewer. The participants will be asked to position themselves in a quiet room away from distractions and to ensure that they do not have anything available to write on (such as pen and paper or an electronic device). Participants will be informed that if the call is terminated for any reason, an attempt will be made to re-establish the call. Prior to the second call, which is led by the computer, the participants will be reminded of these preparations in an email. If participants would like a phone call to review these preparations this will be offered.

Spontaneous speech

Participants will be informed that the rest of the session will be audio recorded and the session will begin with approximately 5 min of spontaneous speech, prompted with the completion of two conversational tasks. In the first task, participants will be asked to respond to the question: ‘What animal makes the best pet?’. To encourage conversational exchange between the interviewer and the participant, the interviewer will ask follow-up questions, such as ‘Why do you think that?’ to elicit turn-taking. In task 2, participants will be invited to engage in a game of 20 questions. The interviewer will explain briefly that it is a game in which one participant thinks of an object, animal, plant or substance and the other participant asks yes-or-no questions to determine what it is. To offer a level of standardisation, the interviewer will always think of the same animal and the participant will ask questions to try to guess the answer. The participant may only ask questions that can be answered with a yes or a no. These data will only be collected at the baseline visit as a human interviewer is needed for this.

Cognitive testing

Participants will complete a series of cognitive tasks selected to test a range of domains and assessed for suitability to complete on the telephone. Domains were selected to represent cognitive abilities sensitive to decline in early AD.31 32 Encoding, immediate and delayed recall of verbal words will be tested using the immediate and delayed Rey Auditory Verbal Learning Test list learning task.33 Between the initial and delayed recall tasks, participants will also be asked to complete the Wechsler Adult Intelligence Scale-IV forward digit span, phonemic (letter S) and semantic (category animals) verbal fluency tasks. All test instructions will be read verbatim by a human interviewer at baseline and by a computerised voice at the 3-month follow-up. An audio recording is made of participant responses, which will be used for automatic transcription and scoring of the tasks.

Results feedback

Participants will be randomised 1:1 to receive results of their cognitive tests after each session or only after the final session. This is to analyse whether participants have a preference for receiving immediate feedback or are happy to receive feedback at the end of their involvement in the study. No individual-level speech feature data will be fed back to participants as this remains experimental. Participants will be provided with a copy of the overall study results. Individual cognitive test results will be sent to the participant using a proforma that explains the experimental nature of the tasks (ie, being completed on the phone rather than face to face), the expected score range for their age (based on the published normative range for face-to-face assessment) and their scores. This proforma will be sent via email from the research team. If a participant has a score that falls below the published normative range, their data will be reviewed by the chief investigator (CWR, an experienced consultant psychiatrist, clinical triallist and professor of old age psychiatry) prior to result feedback. Any results that may be of concern will be highlighted to the participant in a phone call prior to receiving the email with their results, and a letter to their General Practitioner (GP) will be written if the participant consents to this.
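The protocol does not state how the 1:1 allocation is generated. As an illustration only, the sketch below uses permuted blocks, one common way to keep two arms balanced; the block size, seed, arm labels and participant ID format are all assumptions and are not taken from the study.

```python
import random


def allocate(participant_ids, block_size=4, seed=2021):
    """Permuted-block 1:1 allocation to two feedback arms (illustrative)."""
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)  # fixed seed so the list can be reproduced
    arms = ("after_each_visit", "end_of_study")  # hypothetical arm labels
    allocation = {}
    for start in range(0, len(participant_ids), block_size):
        # Each block holds equal numbers of both arms in random order.
        block = list(arms) * (block_size // 2)
        rng.shuffle(block)
        for pid, arm in zip(participant_ids[start:start + block_size], block):
            allocation[pid] = arm
    return allocation


ids = [f"P{n:03d}" for n in range(1, 151)]  # hypothetical unique ID codes
allocation = allocate(ids)
n_each_visit = sum(1 for arm in allocation.values() if arm == "after_each_visit")
print(n_each_visit, len(ids) - n_each_visit)
```

With an incomplete final block, the two arms can differ by at most half a block, so the groups stay close to equal even if recruitment stops early.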

Acceptability questionnaires

Following each assessment point, participants will complete an acceptability questionnaire designed for this study. These questionnaires will evaluate how easy the assessments were to set up, how comfortable participants felt completing these assessments remotely and whether participants preferred face-to-face assessments (completed in earlier trials), human tester-administered assessments or computer-administered assessments, and will explore participants’ experiences of receiving their test results.

Previously acquired biomarker data

Participants have previously taken part in studies which collected AD biomarkers (CSF, PET and structural MRI), information about AD risk factors (such as apolipoprotein E genetic status) and cognitive test batteries (Repeatable Battery for the Assessment of Neuropsychological Status, Clinical Dementia Rating scale, Mini-Mental State Examination). These data will be used in the machine learning model to build the algorithm for the primary outcome.

Data management

All speech data will be recorded on the Delta Testing App on an iPad and securely transferred to a University of Edinburgh server. Participants are all assigned a unique ID code, which is used to label all of their study data. Due to the potential benefit of speech data for future research, participants will be asked to consent to long-term storage of their raw speech files. These files will be securely held on University of Edinburgh servers and will be available for analysis for the research team as part of this protocol, and in the future, for approved studies from researchers approved by the study team. Data from the EPAD LCS study are held securely on the Alzheimer’s Drug Discovery Foundation platform and relevant participant data will be extracted for use in planned analyses. Demographic and questionnaire data will initially be stored using before transfer to the study server. All procedures are in line with the University of Edinburgh data protection policy, which is informed by the General Data Protection Regulation.

Patient and public involvement statement

Two participants from the EPAD LCS participant panel34 were involved in the study design process and reviewed the study documentation electronically. In particular, the participants were asked their opinions on whether they thought it was acceptable to conduct cognitive assessments on the telephone and their thoughts on receiving results from the assessments. Both participants agreed that completing cognitive assessments on the telephone would be acceptable to them, and they felt that would apply to the rest of the group. Both welcomed the feedback of results as this was something the panel had advocated for with previous studies. The participants felt that it should be clear in the information sheet that feedback would be available, so that anyone who did not want to receive this would know to decline the study invitation. All comments on the participant-facing documentation were incorporated into the final document to improve readability and maximise ease of use and understanding of the study documents.


To achieve the primary objective of the study, three different types of variables will be computed and extracted from speech recordings: classical neuropsychological outcome variables, novel or qualitative outcome variables based on produced language and low-level speech descriptors. These measures will be computed on a task level, meaning for each cognitive task performed. Language and speech variables will be extracted at both the linguistic and the acoustic level. These include temporal, voice source, formant, semantic and syntactic variables. Some variables are task specific and encode performance or strategy related to a task; others are general descriptors of voice quality. For the primary analysis, we will develop machine learning models, with patients’ biomarker status as a target variable. Models will be constructed with task-specific features and using aggregated variables spanning multiple cognitive tasks. Patients with missing data will not be considered in the final analysis. An interim analysis will be performed once 100 participants have been recorded.
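The modelling approach is not specified in detail. As a rough sketch of the general idea (speech features in, biomarker status out, with separate learning and testing sets), the example below uses a nearest-centroid classifier based on Euclidean distance. This is a stand-in technique chosen for simplicity and is not the study's algorithm. All data are synthetic and generated purely for illustration; the three features, the class separation and the 75/75 split are assumptions.

```python
import math
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def fit(features, labels):
    """Learn one centroid per biomarker status."""
    by_label = {}
    for row, label in zip(features, labels):
        by_label.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, row):
    """Assign the status whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(row, model[label]))


# Synthetic toy data: NOT study data. Three made-up speech features per
# participant, shifted for the "positive" group so the classes separate.
rng = random.Random(0)
labels = ["positive" if i % 2 == 0 else "negative" for i in range(150)]
features = [
    [rng.gauss(3.0 if label == "positive" else 0.0, 1.0) for _ in range(3)]
    for label in labels
]

# One data set for learning and one for testing.
train_X, test_X = features[:75], features[75:]
train_y, test_y = labels[:75], labels[75:]

model = fit(train_X, train_y)
correct = sum(predict(model, x) == y for x, y in zip(test_X, test_y))
accuracy = correct / len(test_y)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Evaluating only on the held-out half is the reason two reasonably sized data sets are needed: accuracy measured on the learning set would overstate how well the features generalise.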

Secondary objectives are to evaluate ceiling effects and stability in cognitive performance between a human and a computer tester. For this analysis, classical neuropsychological outcome variables will be extracted from the recorded audio. To this end, human raters will use classical scoring schemes to score speech recordings of cognitive tasks. This will lead to a data set where each participant has multiple repeated measurements of the same task. Results from multiple time points will be compared using a repeated measures analysis of variance (ANOVA) and repeated measures correlation. Patients with missing data will not be considered in the final analysis. An interim analysis will be performed once 100 participants have been recorded. We will use frequency statistics and qualitative analysis for the acceptability questionnaires and compare baseline and 3-month responses using paired t tests. We will also analyse and predict levels of participant satisfaction, as recorded in the acceptability questionnaires, based on acoustic and conversational speech features.
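A paired comparison of baseline and 3-month responses can be sketched as follows. The function name and the example ratings are hypothetical, and participants with a missing response at either time point are dropped, in line with the complete-case approach described above. The sketch returns the t statistic and degrees of freedom only; obtaining a p value would need a t distribution from a statistics package.

```python
import math
import statistics


def paired_t(baseline, follow_up):
    """Paired t statistic and degrees of freedom for complete pairs only."""
    pairs = [
        (b, f) for b, f in zip(baseline, follow_up)
        if b is not None and f is not None  # drop participants with missing data
    ]
    diffs = [f - b for b, f in pairs]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up questionnaire ratings for illustration; None marks a missing response.
baseline_scores = [4, 5, 3, 4, None, 5]
month3_scores = [5, 5, 4, 3, 4, None]

t_stat, df = paired_t(baseline_scores, month3_scores)
print(f"t = {t_stat:.3f}, df = {df}")
```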

Analysis of the conversational (dialogue) data will generate speech interaction features and vocalisation graphs, which will be used as input representations for machine learning modelling. These interactional features will be combined with low-level acoustic descriptors for predictive modelling of biomarker status and results of the cognitive tasks.35 36
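The protocol does not define the vocalisation graph representation here. One plausible reading, assumed for this sketch, is a first-order transition table over dialogue events (who is speaking, or a pause), with each entry giving the probability of moving from one event to the next. The event labels and the example sequence are made up for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(events):
    """Estimate first-order transition probabilities between dialogue events."""
    counts = defaultdict(Counter)
    for current, following in zip(events, events[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Hypothetical event sequence for a short stretch of the 20 questions game.
events = [
    "interviewer", "pause", "participant", "pause",
    "interviewer", "participant", "pause", "participant",
]

graph = transition_probabilities(events)
for state, row in graph.items():
    print(state, row)
```

Flattening such a table into a fixed-length vector is one way the interaction features could be combined with acoustic descriptors as model input.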

Ethics and dissemination

This study has been reviewed and given favourable ethical opinion by the Edinburgh Medical School Research Ethics Committee (EMREC) (REC Reference 20-EMREC-007). The study raises few ethical concerns. Participants may find participation in cognitive testing causes stress or anxiety. All participants invited to this study will previously have completed at least one testing session in a face-to-face setting. We hope that this will reduce the risk of participants experiencing stress as the cognitive tasks will be somewhat routine and participants who did find the cognitive testing anxiety provoking in the preceding studies are likely to decline this study invitation. Participants with concerns about their memory will be encouraged to speak to their primary care practitioner or local Alzheimer’s charity hotlines.

All participants will provide informed consent prior to participating in the SPeAk study. We will use electronic consent for this study, via the Online Survey software. This software provides an electronic consent, whereby participants will tick consent clauses and type their name. Electronic consent has been deemed to be proportionate to the SPeAk study risks. At the start of the first telephone call for the study, participants will be asked to verbally reconfirm consent to participate in the trial. If the participant does not have the capacity to consent to their participation in the study, they will not continue in the study. Members of the research team are experienced in working with this population and assessing capacity. Only appropriately trained staff will be delegated to this task. Senior staff will be available to discuss any concerns about a participant’s capacity to consent.

Receiving results from the cognitive tests may also lead participants to worry about their memory. Providing participants with their results is increasingly common, and the plan to do so in this study was well received by our participant panel advisors. Nearly half of participants enrolled in a longitudinal AD study were interested in receiving their cognitive test results.37 In that study, participants highlighted that receiving personal feedback would be a motivation for engagement in longitudinal research studies.37 We were not able to identify studies reporting on participants’ experiences of receiving their cognitive test results and will evaluate this within the acceptability questionnaires in our study. Studies that have investigated disclosure of AD risk factors, such as amyloid PET and APOE results, have identified both benefits and harms of disclosure.38–40 When done in a safe and appropriate way, there is little short-term psychological harm in risk factor disclosure.41 On balance, disclosure, when done in a safe and sensible way, allows participants to plan and prepare while causing little psychological distress. As previously detailed, any participants who express particular concern about their cognitive test scores will be encouraged to seek help from their primary care practitioner or AD charities.

The final ethical consideration is data security. Participants’ voices will be recorded, stored and analysed. The study has been reviewed in line with university information governance guidelines and complies with data security requirements. Publications arising from this study will not identify any participants. Participants will be asked to explicitly consent to the recording, storage and analysis of their voice data. Any participants who do not agree to this will not be enrolled in the study.


The SPeAk study offers a unique opportunity to identify speech biomarkers in a preclinical population at risk for dementia. The study will combine the collection of spontaneous speech through two conversational tasks and a brief battery of cognitive tasks previously validated in person. The primary aim of the study is to develop an algorithm to identify speech features that are predictive of the target variable of AD biomarkers (collected in a prior study). Secondary objectives will evaluate test–retest reliability of the cognitive tasks when delivered on the phone and participant acceptability of the assessments, including experiences of receiving their cognitive test results. We will also undertake exploratory conversational analysis of the spontaneous speech data.

Early detection of AD is a priority for both clinical and research purposes. Identifying a low-cost, non-invasive biomarker will be important in this field, and speech is an exciting biomarker to explore.

Ethics statements

Patient consent for publication


We would like to thank the EPAD Scotland participants’ panel for their review of the study, and all prospective participants for taking the time to consider this study.



  • Twitter @GregorySarah

  • Contributors SG coordinated the design of the protocol, managed the ethical approval process, drafted the manuscript and edited the final version. NL, AK, KL, HP, SL and JH all contributed to the design of the protocol and provided comments on the manuscript drafts. CWR is the principal investigator of the protocol and oversaw the design and ethical approval process as well as provided comments on the manuscript drafts.

  • Funding This work is supported by Janssen Pharmaceutica NV through a collaboration agreement (Award/Grant number is not applicable).

  • Competing interests KL is an employee of Janssen (the funder). NL and AK are employees of KI Elements who own and provide the Delta Testing App. JH reports receipt of personal fees in the past 2 years from Actinogen, AlzeCure, Aptinyx, Astra Zeneca, Athira Therapeutics, Axon Neuroscience, Axovant, Bial Biotech, Biogen Idec, BlackThornRx, Boehringer Ingelheim, Brands2life, Cerecin, Cognito, Cognition Therapeutics, Compass Pathways, Corlieve, Curasen, EIP Pharma, Eisai, G4X Discovery, GfHEU, Heptares, Ki Elements, Lundbeck, Lysosome Therapeutics, MyCognition, Neurocentria, Neurocog, Neurodyn Inc, Neurotrack, the NHS, Novartis, Novo Nordisk, Nutricia, Probiodrug, Prothena, Recognify, Regeneron, reMYND, Rodin Therapeutics, Samumed, Sanofi, Signant, Syndesi Therapeutics, Takeda, Vivoryon Therapeutics and Winterlight Labs. Additionally, he holds stock options in Neurotrack Inc. and is a joint holder of patents with My Cognition Ltd. CWR has received consultancy fees from Biogen, Eisai, MSD, Actinogen, Roche and Eli Lilly, as well as payment or honoraria from Roche and Eisai.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Provenance and peer review Not commissioned; externally peer reviewed.