Objective With an elderly population that is set to more than double by 2050 worldwide, there will be an increased demand for elderly care. This poses several impediments in the delivery of high-quality health and social care. Socially assistive robot (SAR) technology could assume new roles in health and social care to meet this higher demand. This review qualitatively examines the literature on the use of SAR in elderly care and aims to establish the roles this technology may play in the future.
Design Scoping review.
Data sources A search of the CINAHL, Cochrane Library, Embase, MEDLINE, PsycINFO and Scopus databases was conducted, complemented with a free search using Google Scholar and reference harvesting. All publications went through a selection process, which involved sequentially reviewing the title, abstract and full text of the publication. No limitations regarding date of publication were imposed, and only English publications were taken into account. The main search was conducted in March 2016, and the latest search was conducted in September 2017.
Eligibility criteria The inclusion criteria consist of elderly participants, any elderly healthcare facility, humanoid and pet robots and all social interaction types with the robot. Exclusions were acceptability studies, technical reports of robots and publications surrounding physically or surgically assistive robots.
Results In total, 61 final publications were included in the review, describing 33 studies and including 1574 participants and 11 robots. 28 of the 33 papers report positive findings. Five roles of SAR were identified: affective therapy, cognitive training, social facilitator, companionship and physiological therapy.
Conclusions Although many positive outcomes were reported, a large proportion of the studies have methodological issues, which limit the utility of the results. Nonetheless, the reported value of SAR in elderly care does warrant further investigation. Future studies should endeavour to validate the roles demonstrated in this review.
Systematic review registration NIHR 58672.
- socially assistive robots
- geriatric medicine
- social medicine
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This is the first scoping review of the literature that has evaluated and categorised the effects of socially assistive robot (SAR) interventions aimed to improve the health and social care of elderly people.
The novelty of the field means that the quantity and quality of studies available in the current literature is limited, making generalisations difficult.
The retrospective creation of SAR roles grouped together sets of studies that differed in quality, design and sometimes outcome, which may misrepresent the actual weight of data in the respective roles.
The global population is undergoing a demographic shift. Life expectancy is growing, and the postwar baby boom generation is entering retirement. The implications on resource allocation will impact the delivery of elderly care. As of 2015,1 21% of Western Europe’s population were over the age of 60 years, and this is expected to rise to 33% by 2030. By 2050, there are expected to be more people over the age of 60 years globally than under 15 years, reaching a total population of 2.1 billion compared with 901 million in 2015. This is compounded by a proportional decrease in the number of social and healthcare providers shouldering this increased burden. In 2015, seven workers were allocated for every elderly person globally, but this is projected to fall to 4.9 in 15 years.1 Moreover, the situation is magnified in Europe by an accelerated ageing population. Currently, there are 3.5 workers for every elderly person, but this is set to fall to 2.4 by 2030. The shift in societal proportions will place new pressures on all aspects of elderly care.
Loneliness, for instance, is a consequence of social, psychological and personal factors. Over half of people over the age of 75 live alone2 and 17% of older people see family, friends or neighbours less than once a week.3 A recent meta-analysis4 showed that the impact of loneliness and isolation carries the same mortality risk as smoking 15 cigarettes a day. This is compounded by the fact that social care is a labour intensive industry in a world with a proportionally shrinking workforce.
Throughout many industries, the ‘robot revolution’ promises to solve this growing personnel shortage. At present, physically or surgically assistive robots dominate the healthcare sector’s robot usage. This includes: (1) increasingly sophisticated wheelchairs transforming the limitations imposed on paraplegics; (2) robotic limbs redefining amputee capabilities; and (3) robotic surgeons revolutionising how and where surgery can be performed. Nonetheless, physically assistive robots do not combat the increasing mental health burden recognised in the elderly population. It is here that the concept of socially assistive robots (SARs) is gaining headway. These are robots adept at completing a complex series of physical tasks with the addition of a social interface capable of convincing a user that the robot is a social interaction partner.5
SARs have been categorised into two operational groups: (1) service robots and (2) companion robots. Service robots are tasked with aiding activities of daily living.6 Companion robots, by contrast, are more generally associated with improving the psychological status and overall well-being of their users. Examples include Sony’s AIBO7 and Paro.8 Despite much of the hype, the usefulness of this technology in elderly care has not been fully established.
The aim of this scoping review is to establish the clinical usefulness of SARs in elderly care. Through examination and qualitative analyses of existing literature, studies will showcase the utility of SAR and their associated clinical outcomes. A better understanding of SAR and its ability to provide integral care, both socially and physiologically, will provide an indication of its future role in society.
The protocol for this review was conducted in accordance with the principles of the Cochrane Handbook for Systematic Reviews of Interventions.9
The following bibliographical databases were searched: CINAHL, Cochrane Library, Embase, MEDLINE, PsycINFO and Scopus, using Medical Subject Headings (MeSH or, where appropriate, the database-specific thesaurus equivalent) or text word terms. The database search query was composed of two search concepts: the intervention (SAR) and the context (elderly care). Free-text terms for the intervention included: ‘service robot*’, ‘therapeutic robot*’ and ‘socially assistive robot*’; their associated MeSH terms were ‘Robotics’ and ‘Artificial Intelligence’. The names of specific robot systems were also searched for. The free-text terms used for the context included: ‘elder*’, ‘senior*’, ‘older person*’, ‘old people’ and ‘dementia’; their associated MeSH term was ‘Aged, 80 and over’. The asterisk (*) enables the word to be treated as a prefix. For example, ‘elder*’ will represent ‘elderly’ and ‘eldercare’ among others (see online supplementary material for an example of a bibliographical search). Additional studies were selected through a free search (Google Scholar) and from reference lists of selected publications and relevant reviews. The main search was conducted in March 2016, and the latest search was conducted in September 2017.
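As a rough illustration of how the two search concepts combine, the sketch below joins the free-text terms listed above with OR inside each concept and AND between concepts, and emulates the trailing-asterisk prefix matching. The function names and the exact query syntax are assumptions made for illustration; real databases each have their own syntax, and the MeSH terms and robot names are omitted here.

```python
# Illustrative sketch only: function names and query syntax are assumptions,
# not the syntax of any particular database.
INTERVENTION_TERMS = ["service robot*", "therapeutic robot*", "socially assistive robot*"]
CONTEXT_TERMS = ["elder*", "senior*", "older person*", "old people", "dementia"]


def or_block(terms):
    """Join the terms of one search concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(intervention, context):
    """Combine the two search concepts with AND."""
    return f"{or_block(intervention)} AND {or_block(context)}"


def matches_prefix(pattern, word):
    """Emulate truncation: a trailing asterisk makes the pattern a prefix."""
    if pattern.endswith("*"):
        return word.lower().startswith(pattern[:-1].lower())
    return word.lower() == pattern.lower()


query = build_query(INTERVENTION_TERMS, CONTEXT_TERMS)
print(query)
```

Under this sketch, `matches_prefix("elder*", "eldercare")` is true while `matches_prefix("elder*", "older")` is not, which mirrors the prefix behaviour described above.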
Supplementary file 1
Two reviewers (JA and AA-H) independently screened the publications in a three-step assessment process covering the title, abstract and full text, with selection made in accordance with the inclusion criteria. All publications collected during the database search, free search and reference list harvesting were scored on a three-point scale (0=not relevant, 1=possibly relevant and 2=very relevant), and those with a combined score of 2 between the reviewers would make it through to the next round of scoring. All publications with a total score of 0 were excluded. A publication with a combined score of 1 indicated a disagreement between the reviewers and would be resolved through discussion. At the end of the full-text screening round, a final set of publications to be included in the review was acquired. Cohen’s kappa coefficient was calculated to ascertain the agreement between the reviewers in the title, abstract and full-text screening phases.
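The scoring rule described above can be sketched as a small decision function. This is a minimal sketch, not the authors' procedure: the function name is hypothetical, and because the text only states the outcomes for combined scores of 0, 1 and 2, treating any combined score of 2 or more as advancing is an assumption.

```python
def screening_decision(score_a: int, score_b: int) -> str:
    """Map two reviewers' scores (0, 1 or 2 each) to a screening outcome."""
    for score in (score_a, score_b):
        if score not in (0, 1, 2):
            raise ValueError("scores must be 0, 1 or 2")
    combined = score_a + score_b
    if combined == 0:
        return "exclude"   # both reviewers: not relevant
    if combined == 1:
        return "discuss"   # reviewers disagree; resolve through discussion
    # Assumption: a combined score of 2 or more advances to the next round.
    return "advance"


print(screening_decision(1, 1))
```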
A study was considered eligible if it assessed the usefulness of SAR in the elderly population with a clinical outcome measure. A study that simply assessed the robot’s acceptability to elderly users without a clinical outcome measure, or was a technical report, or concerned the use of physically or surgically assistive robots was excluded. No limitations regarding date of publication were imposed, and only English publications were considered.
Since the field of socially assistive robotics is in its infancy, many of the studies are small and exploratory. Nonetheless, they provide an insight into what is currently being researched and the potential applications of SAR in elderly care. For this reason, no publication was excluded on the grounds of methodological quality.
The data extraction form was designed in line with the Participants, Intervention, Comparator and Outcomes approach. This process was conducted by one reviewer (JA) to ensure consistent extraction of all studies. All clinical outcome measures reported in selected studies were extracted. Data extraction included, in addition to outcomes, country in which study was conducted, number of included participants, mean age of participants, gender ratio of participants, specific robot used, cognitive status of participants, settings, study design, study duration and assessment tools.
Duplicate reports of the same study may present in different journals, papers or conference proceedings and may each focus on different outcome measures or include a follow-up data point. To minimise the impact of duplicates, the final set of publications were collated into ‘study groups’ containing duplicate reports. The data extraction process was conducted on the most comprehensive report of a given study.
Data synthesis and analysis
Studies were categorised into groups by the role of the robot in the study. The categories were generated retrospectively by the authors and were not predefined or directly referenced in the original studies themselves.
Some studies used comparable quantitative outcome measures in their assessment of clinical utility of SAR. As different assessment tools were used across studies, a standardised mean score (0–100) was generated to allow comparison across similar assessment tools. The result is a unit-free size.
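The text does not spell out how the standardised score was computed. One plausible reading, shown below purely as an assumption, is a linear rescaling of each mean score onto 0–100 using the minimum and maximum possible scores of the assessment tool. The function name is hypothetical.

```python
def standardise(mean_score: float, scale_min: float, scale_max: float) -> float:
    """Linearly rescale a mean score to 0-100 (assumed method, not confirmed by the source)."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return 100.0 * (mean_score - scale_min) / (scale_max - scale_min)


# Example: a mean of 15 on a tool scored from 0 to 30, such as the MMSE.
print(standardise(15, 0, 30))
```

Because the scale's own units cancel out, the result is unit-free, which is what allows means from different tools to be set side by side.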
The database search yielded 2356 publications and a further 40 were included from reference harvesting and the free search. Duplicate publications were removed (n=173), and following three screening phases, 61 publications were eligible and included in the review. Once duplicate reports were collated, a total of 33 original studies were identified and subject to detailed review. Descriptions of these studies can be found in table 1.
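The selection flow above can be checked with simple arithmetic. In the sketch below, the inputs are the figures reported in the text; the intermediate totals (records identified, records screened, records excluded during screening) are derived here and are not reported by the authors.

```python
def selection_flow(database_hits, additional, duplicates, included):
    """Derive intermediate counts for a publication selection flow."""
    identified = database_hits + additional
    screened = identified - duplicates
    excluded = screened - included
    return {"identified": identified, "screened": screened, "excluded": excluded}


# Reported figures: 2356 database hits, 40 additional, 173 duplicates, 61 included.
flow = selection_flow(database_hits=2356, additional=40, duplicates=173, included=61)
print(flow)
```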
The inter-rater agreement between the reviewers was calculated to be 0.91 for the title screen, 0.64 for the abstract screen and 0.89 for the final report, demonstrating very good, good and very good agreement between the reviewers, respectively, according to Cohen’s kappa coefficient.10
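For readers unfamiliar with the statistic, Cohen's kappa compares the observed proportion of agreement with the agreement expected by chance from each rater's marginal frequencies: kappa = (p_o − p_e) / (1 − p_e). The sketch below implements that standard formula; the two rating lists are made-up illustrative data, not the reviewers' actual scores.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening scores on the 0/1/2 scale for ten publications.
reviewer_a = [2, 2, 0, 0, 1, 2, 0, 0, 1, 0]
reviewer_b = [2, 2, 0, 0, 1, 2, 0, 1, 1, 0]
kappa = cohens_kappa(reviewer_a, reviewer_b)
print(round(kappa, 2))
```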
Participants and settings
Across the studies, 1574 participants were included. However, due to inconsistent reporting, overall age and gender information are not available. All participants were considered elderly, and among the studies that reported age information (n=28; 1411 participants), only one participant was under the age of 60 years. The number of participants included in any given study varied from 3 to 415 subjects. In the 24 studies that reported gender information (comprising 1264 participants), 71% of the participants were women. The majority of studies exclusively assessed participants with a dementia diagnosis (n=18; 1036 participants), while a further six studies (151 participants) included some patients with dementia. A large proportion of studies were conducted in Japan (n=10; 178 participants), the USA (n=8; 182 participants) and Australia (n=4; 577 participants). The most common setting was the nursing home (n=17; 621 participants). In total, 11 robot systems were used across the studies. Assessed in 22 of the 33 studies, Paro was the most popular choice of SAR intervention. Robots are divided into those capable of learning responses, such as NAO using closed-loop architecture, and those which cannot, such as Paro, using open-loop architecture. In total, only two closed-loop robots were used (NAO and AIBO) in a total of six studies. Descriptions of individual robot systems reviewed can be found in table 2.
Identified roles of SARs
Eligible studies were organised into sets by the role assumed by SAR. Five roles were identified: affective therapy, cognitive training, social facilitator, companionship and physiological therapy. Specific details of the studies below, such as assessment tools or subject demography, are described in table 1.
Fifteen studies (889 participants) evaluated the effect SAR can have in improving the general mood and well-being of elderly participants, or its ability to overcome episodes of mood disturbance. In this review, this role is collectively termed affective therapy. Nine of these studies (650 participants) were conducted on participants diagnosed with dementia. In total, 11 reported positive findings including reductions in depression scores, agitation scores and increases in quality of life scores. While these studies were evaluating similar effects of SAR, their intervention design can broadly be divided into two types: one-on-one interactions with SAR or group interactions with SAR.
Eight studies (657 participants) assessed SAR in one-on-one settings, whereas the remaining seven studies (232 participants) had group settings. All of the group setting studies reported positive findings, including reduced agitation and depression levels and higher expression of positive emotions. Of the eight one-on-one interaction studies, only five report positive findings. Indeed, two of these studies12 13 report negative findings with increased agitation and worsening dementia, respectively.
This contrasting set of results could indicate a mechanism by which elderly users gain emotional benefit from SAR. A Japanese pilot study14 assessed group interactions of 26 subjects with Paro and found significant improvements in mood scores during the intervention period. Of note, the authors commented on improved sociability between subjects. As discussed later, several studies15–19 demonstrate that SAR can increase the sociability of subjects within groups, which may play a direct role in the mood changes seen here.
However, a Dutch crossover study20 compared two types of one-on-one intervention: therapeutic interventions (Paro introduced at times when the subject was distressed) and care support interventions (Paro introduced to facilitate activities of daily living). Only the therapeutic intervention showed a significant improvement in the mood score (P<0.01). This suggests that while group interventions may be better at generating positive emotions, one-on-one interventions may be appropriate to remedy negative emotions.
Some studies in this set also investigated how SAR compared with soft toys in improving general mood and well-being of participants. A large Australian randomised controlled trial (RCT)21 of 415 participants with dementia compared one-on-one interventions with Paro switched ‘on’ and Paro switched ‘off’ (placebo Paro) to identify if Paro’s additional social capabilities translated into any positive outcomes. The study found Paro was more effective than usual care in improving pleasure and agitation but was no different to placebo Paro. Similarly, a Japanese study8 compared the effect of group interactions with Paro and placebo Paro and again did not demonstrate any differences between the groups.
These results are mimicked by a Danish RCT13 of 100 subjects, which compared interactions with Paro, a living dog or soft toy cat. The study found intervention type did not affect cognitive state, independence or depression scores and did not affect sleep quality. However, depressive scores improved compared with baseline scores in all groups (P<0.05).
Indeed, only two small pilot studies found differences between SAR and soft toys. The first22 showed subjects engaged more with Paro (P<0.05) and showed more positive emotional expressions with Paro (P<0.01) when compared with a stuffed lion. The second23 was a study on participants with dementia; it showed that agitation scores were significantly decreased only with a toy cat (P<0.05), whereas NeCoRo (a cat-like SAR) only improved scores of pleasure and interest (P<0.01 and P<0.05, respectively).
Six studies (344 participants) assessed whether SAR can improve aspects of cognition, such as working memory or executive function, and as such this review has termed this set cognitive training. This set included four studies (239 participants) that assessed elderly subjects with dementia, and two studies (105 participants) that assessed elderly subjects who were cognitively intact. Several robot types have been used in this set including two closed loop robots capable of learnt responses. This means that while broad conclusions surrounding the role of SAR in cognitive training can be made, the evidence for any individual robot system is limited. Five of the six studies (133 participants) concluded with positive findings, although there is a breadth of outcome measures used as surrogate markers for cognitive improvement.
Two studies used cognitive tests, such as the Mini-Mental State Examination (MMSE), as the primary outcome measure to assess the impact of SAR interactions. The first was an RCT24 of 34 cognitively healthy subjects in Japan using the Nodding Kabochan as the SAR intervention. Subjects received either the fully functional Nodding Kabochan or a non-functional Nodding Kabochan (control) for 8 weeks. All interactions were one on one between the participant and the SAR in the participant’s home. Only subjects receiving the functional Nodding Kabochan demonstrated an improved cognitive function score (P<0.01) after the study period. This result contrasts with the conclusion of the previous set, affective therapy, where it was difficult to distinguish the positive effects between functional SAR and placebo toys. The distinction here may be that the Nodding Kabochan robot is a communication robot that can talk and sing with the user, a function that a placebo toy is incapable of. The communication itself may be key to this study’s findings.
The other study that used cognitive tests as an outcome measure for cognition was a two-phase block RCT.25 This Spanish study involved 101 and 110 subjects with dementia, in the respective phases, and assessed the cognitive effects of group interactions with SAR. In phase 1, the study compared an open-loop robot, Paro, with a closed-loop robot, NAO, and a control group receiving treatment as usual. Compared with the control group, phase 1 showed a decrease in cognitive function scores in the NAO group only (P<0.05) at follow-up. Notably, there were no significant differences between the NAO and Paro groups at follow-up. This set of results contrasts with the previous study conducted on cognitively healthy subjects in one-on-one settings. Given that different robot systems were used in the studies, it is difficult to establish which factor is responsible for the differing results.
Two studies used neuroimaging modalities as outcome measures of interactions with SAR. The first was a South Korean study26 that used MRI in an RCT of 71 cognitively healthy subjects. The primary outcome measure was change in cortical thickness in the brains of participants over the 12-week study period. Subjects were randomised into three arms: (1) robot-assisted group training using Silbot and Mero (SAR), (2) traditional intervention training using computer software or (3) a non-intervention control arm. The study showed attenuation of cortical thinning on MRI in both intervention groups (P<0.05) and estimated it would take 15.3 months for intervention groups to reach the same level of cortical thinning as controls. This study also used neuropsychiatric tests as a secondary outcome measure. Both intervention groups showed greater improvement in the executive function scores than the control group (P<0.001). However, in the general cognitive and visual memory tasks, the traditional intervention group had greater improvement than the robot group. Indeed, the robot group did not outperform the traditional group on any neuropsychological tests. Both Silbot and Mero are communication robots, like the Nodding Kabochan, which may underpin the improvements in executive function. Nonetheless, the SAR arm did not prove to be any more effective than traditional computer software in either outcome measure for cognitive function.
The other study to use a neuroimaging modality was a Japanese pilot study27 of 14 subjects with dementia. This study investigated the neuropsychological influence of Paro within an interactive group setting by analysing the electroencephalogram (EEG) recordings. They found an increase in cortical neuronal activity in seven participants, particularly in participants who liked Paro. It is unclear what the clinical meaning of this finding is, and without a control group, one cannot distinguish the effect of SAR from any other stimulating activity on EEG.
The two final studies used game performance as a surrogate marker for cognitive function in participants with dementia. These were very small studies without control groups. The first28 included three subjects and found that verbal encouragement from SAR (Bandit) improved response time in a game quiz, while the second study, with 11 participants, concluded the participants’ performance in group ball games and individual card games improved following interactions with SAR (AIBO). Again, the clinical utility of this is unclear, and without objective outcome measures or control groups, there is little that can be learnt from these studies.
Seven studies (230 participants) assessed the utility of SAR as facilitators for improved sociability between subjects or between subjects and other people. As such, this review has titled this role social facilitator. All of these studies concluded that the respective SAR intervention improved sociability of participants. Five of these studies (210 participants) were conducted with participants who had been diagnosed with dementia. Four of the studies used Paro as the SAR intervention, and two used AIBO, the robotic dog, which allowed for a greater degree of comparison between the studies. The final study used Sophie and Jack as the SAR intervention.
Most studies used observed behaviour changes on video recording or via a live assessor during the interaction period. One study16 used a validated communication scale to assess how group Paro interactions affected sociability. The study concluded that after the 4-week programme, subjects exhibited a significant improvement in communication and interaction skills (P<0.05) and an increase in activity participation (P<0.05).
Two studies compared SAR with comparative soft toys/animals. The first was a crossover study17 of 23 subjects in the USA. Subjects were grouped into sessions with Paro, placebo Paro or no object. The study concluded that the group with Paro engaged in more social interactions than the group with placebo Paro. This suggests that the sociability effects are associated with SAR itself. The authors note that the novelty around SAR may have contributed to the excitement manifested in increased social engagement. However, as this study was conducted over 4 months, any novelty effects would not likely have been sustained.
The other comparative study was another crossover study29 in the USA, which involved 18 female subjects with dementia. Subjects were divided into sessions with AIBO, a real dog or no object. The study concluded that although all visit types with AIBO, a dog or no object stimulated social interaction by the subject, there were no significant differences in the frequency of social behaviours exhibited by the subjects between visit types.
A similar US pilot study15 of seven subjects with dementia was instead conducted in a group setting. Subjects within a group were divided into primary users, those individuals who engaged with Paro at any one time, or non-primary users who were defined as everyone else in the group. The study showed an increase in social interaction over the 7-week period between primary and non-primary users towards each other and towards staff.
This study’s results are reflected in two larger, more recent studies that also investigate effects of group interactions with SAR on participants with dementia. The first is an Australian study18 of 139 participants conducted over 5 years with Sophie and Jack. The study reported that social engagement increased over the study period. The second was a Norwegian study19 with 23 participants that evaluated the effects of group interactions with Paro on those with mild to moderate dementia compared with those with severe dementia. The study found that those with mild to moderate dementia paid more attention to Paro than those with severe dementia. The authors note that SAR interventions may need to be more tailored towards the degree of dementia severity. Another finding was that over the 12-week study period, there was a reported increase in interactions with other subjects and a decrease in interactions with Paro.
Three studies (78 participants) assessed the utility of SAR in overcoming the feeling of loneliness and social isolation in the elderly. These studies are collected into a set this review has titled the companionship role. All three of the studies examining SAR in this role showed reductions in loneliness scores. None of these studies were conducted on patients with diagnosed dementia. Two studies used AIBO as the intervention, while the third used Paro.
Only one study assessed this in a one-on-one setting. This was an RCT30 of 38 subjects in the USA. Subjects were randomised to have weekly one-on-one sessions with a real dog, AIBO or no object (control). Subjects in the dog and AIBO groups were each significantly less lonely than those in the control group at week 7 (P<0.05). In both intervention groups, there was a higher attachment score compared with the control group. No significant differences were found between the dog and AIBO groups in the assessment of loneliness or attachment. This is an important finding that suggests an artificial animal (SAR) can be as effective a companion as a pet.
The other two studies were conducted in a group setting. The first study was a pilot study7 of 11 subjects in Japan using AIBO. Mean loneliness scores after the session were significantly lower than those before the session (P<0.05), although longer term benefits were not established. The second was a larger RCT31 of 34 subjects in New Zealand that investigated the effects of Paro on loneliness. Subjects were randomised into a Paro group or a control group that attended normal activities. Subjects in the Paro group had a significantly greater decrease in loneliness score at the 12-week follow-up than the control group (P<0.05). This indicated that sustained effects can be achieved.
The last two studies do show promising results; however, in the context of the previous set of studies, the decreased sense of loneliness may result from increased sociability in the group setting. Sociability was not measured in either study and therefore may act as a confounder.
Two studies (33 participants) investigated the effects of SAR on physiological markers, and as such, this review titles this set physiological therapy. The clinical applicability of this set is less clear but does raise some questions that future studies may be able to answer. Both of these studies used Paro as the SAR intervention.
The first was a pilot study32 of 21 subjects in New Zealand that investigated the effect of Paro on blood pressure and heart rate. Subjects had a single 10 min session with Paro where they were free to interact with the robot. Blood pressure and heart rate were recorded before (T1), immediately after (T2) and 5 min after (T3) the 10 min interaction. Overall, no significant changes in blood pressure or heart rate were demonstrated; however, the authors then excluded four residents who did not interact with or touch the robot. Subsequently, significant decreases in systolic blood pressure (P<0.05) from T1 to T2 were shown, and such decreases were sustained at the T3 measurement. Similarly, significant decreases in diastolic blood pressure (P<0.05) from T1 to T2 were shown; however, this decrease was not sustained at T3. Between T1 and T3, heart rate significantly decreased (P<0.05).
In the other study33 of 12 subjects in Japan, physiological effects of interacting with Paro were investigated. Compared with baseline readings, a significant increase in the ratio of urinary 17-ketosteroid:17-hydroxycorticosteroid (P<0.01), by week 4 of Paro being introduced, was found. The authors suggest this confers an improved physiological reaction to stress. A confounder noted was an increase in social interactions with other residents (P<0.05) by week 4, compared with baseline. It is also not clear from this study if Paro played any role in the increased sociability of residents; however, in the context of other studies on the topic, it seems likely.
These two studies do not provide much indication of the clinical use of SAR; however, they do give a direction for what future studies could investigate further.
Several studies reported comparative quantitative data by using the same or similar assessment scales to others within their role category. The data from these studies have been reproduced from the studies and are compiled in tables 3–5. As different assessment tools were used across studies, a standardised mean score (0–100) was generated to allow comparison across similar assessment tools. Five comparable studies were identified in the affective therapy set, each using a mood scale to assess either anxiety or depression or both, giving rise to seven comparable sets of data. Of these, five showed significant improvements in the mood scores either in the robot intervention group or in the follow-up score, depending on study design.
Four comparable studies were identified in the cognitive training set of studies, and of these, three studies showed significant improvements in the cognitive scores. Of note, the two phases of the Spanish paper25 have been listed as two separate sets of data as they are different studies with different interventions and different subject numbers; they both use the same control data, however, as seen in table 4.
Finally, three studies with comparable data were identified in the companionship set of studies, each of which used validated loneliness scales. All of these studies showed significant improvements in loneliness scores in the robot intervention group or in the follow-up score, depending on study design.
No comparative data were identified in the social facilitator or physiological therapy groups.
The aim of this review is to identify the roles SAR could play in elderly care. Despite the infancy of this field, the qualitative amalgamation of the studies demonstrated five roles for SAR.
Evaluation of SAR technology
This review identifies five roles for SAR in elderly care: affective therapy, cognitive training, social facilitation, companionship and physiological therapy. These roles provide a comprehensive classification of how this technology has been used in social and physical care to date.
The first set of studies demonstrated that SAR can be used to improve the overall sense of well-being of users and alleviate acute states of mood disturbance. Interestingly, interactions conducted in a group setting proved to be more consistently effective than one-on-one interactions. However, one study20 showed that one-on-one interventions were useful in alleviating states of distress. This result may apply to patients with delirium, and future studies are required to explore this possibility. The overall picture suggests that while SAR is capable of improving the mood of subjects, it does not seem to be much better than a comparative soft toy or placebo robot. This is demonstrated in patient groups with and without dementia.
This was not true for the second set, cognitive training, where communication robots were significantly more effective at improving cognitive outcome measures than soft toys. The clearest evidence for SAR in improving cognitive function was found in those who are cognitively healthy. While positive findings have been found in participants with dementia, obscure outcome measures make it difficult to interpret the meaning of the findings. The South Korean study26 showed that computer programmes are at least as effective as SAR interventions, which may raise doubts about the cost-effectiveness of using SAR solely to improve elderly users' cognitive function.
All the studies in the social facilitator set demonstrated improved sociability. This is demonstrated in subjects with and without dementia and across three robot systems (AIBO, Paro and Sophie and Jack). When compared in group settings, SAR was shown to be more effective than a comparator, such as a soft toy. In one US study,29 subjects were divided into one-on-one sessions with AIBO, a real dog or no object at all, and while all sessions increased the frequency of exhibited social behaviour, the study found no significant differences between session types. Conversely, in a different US study,17 participants had group interactions with Paro, placebo Paro or no object. The study concluded that the group with Paro engaged in more social interactions than the group with placebo Paro. This suggests that the sociability effects are associated with a group setting, and perhaps in the absence of a group of users, these effects may not exist.
The companionship set all showed positive findings. However, two studies were conducted in group settings, and the observed improved loneliness scores may be confounded by the increased sociability seen in aforementioned studies. This set has far fewer studies than the other sets generated in this review; however, the findings are insightful. If animal-like SAR can be as much a companion as a pet, then such technology may have particular utility in care homes, where health and safety concerns regarding pets, such as allergies and infection risks, restrict their use.
The final set, physiological therapy, did show positive findings; however, these findings are clinically uninterpretable. Nonetheless, these studies create new questions about the use of SAR for future studies to address. For example, one study32 demonstrated short-term reductions in blood pressure and heart rate following Paro interactions. The potential implications of these results are twofold: this short-term reduction in cardiovascular markers could reflect results seen in the affective therapy set, which show calming effects of Paro. Additionally, it may be the case that these reductions can be sustained for the long term and that SAR may have a role as a non-pharmacological intervention for hypertension. Future studies may benefit from incorporating blood pressure and heart rate outcome measures, alongside other metrics in longer term studies.
While the utility of SAR in affective therapy or cognitive training can be replaced by cheaper, existing alternatives (eg, soft toys or computer software), the main value of SAR may lie in its multidomain functionality. This review has identified five such domains where a single intervention may be of simultaneous value.
Quality of selected studies
Of all 33 included studies, 11 were RCTs, 12 included more than 30 subjects and 16 had a comparative intervention. These metrics are not in their own right indicative of the quality of the studies; however, together they do provide a general picture. The quality of studies is not evenly distributed across the set. Of the RCTs, six are in the affective therapy set, while there are none in the social facilitator set. Similarly, nine studies in the affective therapy set have a comparative intervention compared with two in the social facilitator set.
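As a quick arithmetic check, the counts above can be expressed as shares of the 33 included studies. The sketch below uses only the figures reported in this paragraph; the variable names are illustrative, and the three categories overlap, so the percentages are not expected to sum to 100.

```python
TOTAL_STUDIES = 33

# Counts as reported in the text; categories are not mutually exclusive.
counts = {
    "RCT": 11,
    "more than 30 subjects": 12,
    "comparative intervention": 16,
}

# Share of all included studies, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / TOTAL_STUDIES, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL_STUDIES} = {pct}%")
```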
This review did not exclude studies based on methodology. The rationale is that low-quality studies can offer an insight into the potential utility of SAR and guide study design improvements for future studies. For example, a companionship role is a popular concept for SAR among commentators in the literature, but very few studies demonstrating this have been conducted. Evidence supporting a companionship role is socially desirable because of its potential to serve the elderly population. As reported by one of the selected studies,30 AIBO, the robotic dog, was as effective a companion as a real dog. This has real implications for its use, specifically where a real animal companion may be inappropriate.
Although no studies were excluded on the basis of quality, there are several underlying methodological limitations facing the selected studies that need to be addressed. Low-quality data complicate the task of establishing clinical applications of SAR. They also risk undermining the field’s efforts or sensationalising exploratory research. Another limitation is the narrow set of robots assessed, primarily Paro. This restricts the applicability of results to wider SAR systems with different functionality.
There is also a concern for cultural bias as around a third of the studies were conducted in Japan alone. Although more recent studies have been conducted in other cultural environments, most notably the USA and Australia, it is not clear if the results are universally applicable. Additionally, there is evidence of gender bias. Around two-thirds of the participants were women. This is a concern since men and women as populations have been shown to regard robot technology differently,70 and therefore some of the reported findings may be exaggerated or diminished by the participant composition.
Another common study design issue relates to the supervision of interactions, which was present in 20 of the included studies. Although supervision ensures safety for the user, it risks altering how the participant interacts with the robot and may change how the participant reports the robot’s utility, a phenomenon known as the Hawthorne effect. While this is difficult to control for when the study is not randomised and no comparator is used, direct supervision may lead to subjects reporting greater positive effects than is actually the case. An example where this may be the case is a US study29 where subjects were divided into supervised sessions with AIBO, a real dog, or no object at all. One would anticipate that sessions with AIBO or a real dog would stimulate a greater behavioural response than sessions with no object at all. However, the study concluded there were no significant differences between the responses to the sessions, irrespective of whether an object was present or not. This suggests that the positive findings were independent of the intervention and may instead be a consequence of supervision.
Another main limitation of the selected studies is the nature of chosen outcome measures. They are often abstract, with a limited number of studies identifying a direct clinical need or problem. Although around half of the studies included a comparator intervention, it often involved uninspiring activities or no activity at all. This is an unfair comparison and may inflate the value attributed to the results. As momentum grows behind SAR, these study design flaws will need to be addressed if the technology is going to play a clinical role in the future.
The primary limitation of this review is the validity of the categorisation of studies into the defined roles. The roles were created retrospectively, as part of a discovery process on extracting data from the final set of studies. While they have utility in evaluating the state of the field and providing defined expectations for the technology, they have generalised sets of studies that are very different in quality, design and sometimes outcome. There is also the issue that some studies demonstrated several roles for SAR. The studies were categorised on the basis of the primary outcome measures, irrespective of whether a secondary outcome measure would fit into another set. A consequence of this is that the weight of data in the respective roles may be misleading. All outcomes have been reported in table 1 for purposes of data transparency.
Furthermore, this review carries a risk of inadvertently excluding relevant papers in the screening phase. Although high concordance between the reviewers was reported, the large volume of studies that had to be reviewed invites the possibility that relevant publications were excluded. The main reason for the high exclusion rate was that the broad search criteria identified irrelevant robot interventions, such as surgical robots or telecommunication devices. It is unlikely, however, that an additional study would have changed the conclusions of this review.
Finally, the comparison of assessment values between studies, illustrated in tables 3–5, aimed to provide some comparison between studies where different outcome measures were used. The comparison does have limitations because, although each assessment tool was scaled from 0 to 100, a score of 50 in one measure does not necessarily correspond to 50 in a different scale. This has made it difficult to reach broad conclusions about the sets of studies.
Future of the field
In order to achieve successful application of SAR in elderly care, future studies should be more conscious of the outcome measure chosen and its translation into care. Some studies used surrogate measures such as frequency of laughter,22 or performance in particular games.60 While these may be desired outcomes, it is not clearly demonstrated how they meet quantifiable needs of the elderly population. It is likely that any application of SAR will incorporate several of the previously defined roles. Therefore, larger studies should assess the intervention’s impact in the context of these clear roles with validated outcome measures. For example, one study24 involved a robot staying at home with the elderly participants for 8 weeks and assessed its impact using questionnaires, cognitive tests, and blood and saliva samples. While the study demonstrated an improvement in cognitive scores and a reduction in salivary cortisol, it did not assess whether living with a robot for 8 weeks had any impact on loneliness. Larger RCTs using valid comparators are needed to definitively show where SAR is and is not useful in elderly care.
SARs have shown potential in elderly care which, in light of recent demographic shifts, promises to reform the delivery of care for the elderly. Although many of the studies described have methodological issues, the size and quality of studies are improving. This review has qualitatively assessed the existing research and comprehensively outlined the state of the field as it stands. In establishing the five roles to which SAR can be ascribed, this review intends not to restrict ambition but to provide a basis for clinical applicability and design of future studies. This review urges that new studies should be clearer about the precise role any robot intervention intends to serve and use validated measures to assess their effectiveness. Future studies need to demonstrate how SAR can solve real problems in order to shift from novelty to functionality in elderly care.
Contributors Design of the review: JA and MPV. Data collection: JA and AA-H. Data analysis and interpretation: JA, TN and MPV. Drafting the article: JA, AA-H and TN. Critical revision of the article: JA, TN and MPV. Final approval of the version to be published: MPV.
Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement There is no additional unpublished data from this review.
Collaborators Peter Forrest; Sundhiya Mandalia.