Objectives Assessment of the significance of online queries regarding smell impairment to evaluate the epidemiological status and effectiveness of COVID-19 epidemic control measures using levofloxacin as an example.
Setting There are 81 regions of Russia and several large cities, such as Moscow, St. Petersburg and Nizhny Novgorod.
Methods Weekly online queries from Yandex Russian users regarding smell impairment and levofloxacin were analysed in regions and large cities of Russia from 16 March 2020 to 21 February 2021.
Results A strong positive direct correlation (r>0.7) was found between the number of smell-related queries in Yandex and new cases of COVID-19 in 59 out of 85 Russian regions and large cities (70%). During the ‘first’ peak of COVID-19 incidence in Russia (April–May 2020), the increase in the number of smell-related queries outpaced the increase in new cases by 1–2 weeks in 23 out of 59 regions of Russia. During the ‘second’ peak of COVID-19 incidence in Russia (October–December 2020), the increase in the number of smell-related queries outpaced the increase in the number of new cases by 1–2 weeks in 36 regions of Russia, including Moscow. The query/new case ratio increased by more than 100% in 24 regions. The regions where the increase in queries was more than 160% compared with new infection cases during the ‘second’ peak of incidence demonstrated significantly higher search activity related to levofloxacin than the regions where the increase in queries was lower than 160% compared with the increase in new infection cases.
Conclusion The sudden interest in certain symptoms of COVID-19, such as smell impairment and the growing frequency of online queries among the population, can be used as an indicator of the spread of coronavirus infection among the population and for evaluation of the effectiveness of the COVID-19 epidemic control measures.
- health informatics
Data availability statement
All data relevant to the study are included in the article or uploaded as supplementary information. Extra data are available by emailing KM.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
One of the major strengths of the study is that it uses the absolute number of internet queries from the Yandex search engine, which, unlike the relative values provided by Google Trends, allows the number of web queries to be compared with real cases of coronavirus infection or drug consumption.
The study included user queries in Russia that contained the words ‘smell’ and ‘levofloxacin’, which reflect COVID-19 symptoms and control measures, respectively.
One limitation of this study is that the study design does not allow us to draw conclusions at the individual patient level.
The design of our study can cover all intended search queries with high sensitivity, although this reduces the specificity of the search.
In recent years, big data analytics have become increasingly integrated into studies conducted in public health informatics, and web data analysis has become a valuable tool for monitoring and analysing population behaviour in terms of health-related content. The use of data from web-based sources to improve public health is known as ‘infodemiology’, a combination of the words ‘information’ and ‘epidemiology’.1 Infodemiology and infoveillance studies using different web-based sources, such as Google, Twitter, and other social media platforms, show the importance of real-time access to data when evaluating health status.2–6
During the global COVID-19 pandemic, the scale of public interest in this disease has been unprecedented, suggesting that the trends demonstrated by web search traffic should remain steady and reliable.7 Changes in smell and taste are prominent symptoms of COVID-19,8–12 as has consistently been demonstrated in many countries (eg, Iran, Spain, France, Italy, Germany, and the UK).8 13–16 More critically, these chemosensory changes generally occur earlier than other symptoms and may constitute more specific symptoms than fever or dry cough.8 15 17 Accordingly, web-based monitoring of changes in smell and taste can provide early and specific information on the spread of COVID-19 in the general population and support health system monitoring to evaluate the effectiveness of epidemic control measures taken by countries against COVID-19.
In a previous study, we used Google Trends to assess ‘smell’ queries and demonstrated a strong correlation between their relative search volume (RSV) in Russia and actual infection cases (r=0.81).18 The interest in these queries increased along with the number of new cases from 16 March 2020 to 11 May 2020 and from 27 August 2020 to 1 October 2020 (r=0.93 and 0.87, respectively). From 2 April 2020 to 12 April 2020, the increase in queries outpaced the increase in actual COVID-19 cases by 2–5 days. The fact that the increase in smell-related queries ‘outpaced’ the increase in new infection cases implies that smell-impaired patients tend to study the problem through a web search and only then opt for SARS-CoV-2 testing. We found that starting on 27 August 2020, the smell-related queries outnumbered the detected cases of infection as of 1 October 2020.
Our work aimed to determine the significance of web queries related to smell impairment to evaluate the epidemiological status and effectiveness of COVID-19 epidemic control measures using the example of levofloxacin. In this study, we analysed smell-related queries (hereafter ‘smell queries’) from Yandex Russian users in regions and large cities of Russia from 16 March 2020 to 21 February 2021 (49 weeks) and compared them with new cases of infection. In contrast to Google Trends, the Yandex.Wordstat service provides absolute rather than relative data, thus making it possible to compare queries with actual cases in absolute quantities, for example, coronavirus infection or drug consumption. We assume that certain internet queries such as smell impairment can be markers of the spread of SARS-CoV-2 infection and can be used as a supplementary tool for evaluating the effectiveness of COVID-19 epidemic control measures.
Materials and methods
The study period lasted from 16 March 2020 to 21 February 2021 and was broken down into weeks.
The data on new (confirmed) cases reported weekly in regions and large cities of Russia were obtained from https://стопкоронавирус.рф (https://stopcoronavirus.rf) and from Yandex and Johns Hopkins University services. Where required, the data were normalised on a scale from 0 to 100 by the maximum number of new (confirmed) cases per week during the study period.
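The normalisation described above can be sketched as follows. This is a minimal illustration, assuming that each weekly value is simply divided by the series maximum and multiplied by 100; the function name and the example counts are hypothetical and are not data from the study.

```python
def normalise_to_100(weekly_counts):
    """Scale a weekly series so that its largest value becomes 100."""
    peak = max(weekly_counts)
    if peak == 0:
        # A series with no cases at all stays at zero (assumed convention).
        return [0.0 for _ in weekly_counts]
    return [100.0 * value / peak for value in weekly_counts]


# Hypothetical weekly new-case counts for one region (not study data).
example_cases = [120, 300, 600, 450, 150]
normalised_cases = normalise_to_100(example_cases)
print(normalised_cases)
```

The same function can be applied to the weekly query counts, so that queries and cases can be plotted on a common 0 to 100 scale.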
Databases of search queries
Yandex.Wordstat is a service that provides information about queries made by Yandex users. For example, it helps determine the monthly number of people looking for a certain phrase and find queries similar in meaning to the entered phrase. The Yandex.Wordstat query statistics show how frequently users entered queries containing the search term into the search box (the number of impressions). Our study estimated the number of user queries in Russia that included the words ‘smell’ and ‘levofloxacin’. According to Yandex, the most popular keyword combinations in which the words ‘smell’ and ‘levofloxacin’ occur are shown in online supplemental table 1.
The data on smell-related web queries were obtained as absolute weekly values, from the week of 16–22 March 2020 to the week of 15–21 February 2021.
The strength of the relationship between the daily increase in the number of cases of COVID-19 and the number of queries associated with changes in smell was then tested using Spearman’s rank correlation because the data were not normally distributed.19 Statistical significance was determined for all correlations.
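As an illustration of the rank correlation mentioned above, the following sketch computes Spearman's coefficient as the Pearson correlation of average ranks, using only the Python standard library. The function names and the example series are hypothetical, and the significance test (p value) is deliberately left out; in practice a statistics package would normally supply both.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly series for one region (not study data).
weekly_queries = [500, 800, 1500, 1200, 700]
weekly_cases = [100, 150, 400, 420, 160]
rho = spearman_rho(weekly_queries, weekly_cases)
print(round(rho, 3))
```

Because only the ordering of the weekly values matters, the result is the same whether the raw or the normalised series are used.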
Estimated time gaps between queries related to smell and new cases of COVID-19
To estimate these time gaps, weekly queries and new cases of infection were normalised by the maximum number of queries and the maximum number of new cases of infection per week over the entire observation period, respectively. Then, we estimated the weekly increase in queries and new cases of infection by finding the difference between the number of queries (new cases of infection) per week and the number of queries (new cases of infection) during the previous week. The increase or decrease was considered significant if it was more than 2% and if the subsequent data had an upward or a downward trend. The analysis included only the regions that demonstrated a significant correlation (r>0.5) between smell-related queries and confirmed cases of infection.
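One possible reading of this procedure is sketched below. It rests on several assumptions that the text does not spell out: the 2% threshold is taken as 2 points on the 0 to 100 normalised scale, an "upward trend" is taken to mean that the following week also rises, and the time gap is taken as the difference between the onset weeks of the two series. All names and numbers are hypothetical.

```python
def normalise_to_100(series):
    """Scale a weekly series so that its largest value becomes 100."""
    peak = max(series)
    return [100.0 * value / peak for value in series]


def weekly_changes(series):
    """Difference between each week and the previous week."""
    return [cur - prev for prev, cur in zip(series, series[1:])]


def first_significant_rise(series, threshold=2.0):
    """Index of the first week whose rise exceeds `threshold` points
    (assumed reading of 'more than 2%') and is followed by a further rise
    (assumed reading of 'upward trend'). Returns None if there is none."""
    changes = weekly_changes(normalise_to_100(series))
    for i in range(len(changes) - 1):
        if changes[i] > threshold and changes[i + 1] > 0:
            return i + 1  # changes[i] is the rise from week i to week i + 1
    return None


# Hypothetical weekly counts for one region (not study data).
queries = [100, 110, 400, 900, 1000, 800]
cases = [50, 52, 55, 200, 450, 500]

query_onset = first_significant_rise(queries)
case_onset = first_significant_rise(cases)
lead_weeks = case_onset - query_onset  # positive: queries rise first
print(query_onset, case_onset, lead_weeks)
```

In this toy example the queries start rising one week before the cases, which corresponds to the kind of 1–2 week gap described for the regional data.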
Patient and public involvement
This research was done without patient and public involvement.
Dynamics of smell-related queries in the studied regions
Figure 1 and online supplemental movie 1 show changes in smell-related web queries from the 12th week of 2020 (16 March 2020 to 22 March 2020) to the 7th week of 2021 (15 February 2021 to 21 February 2021) (the total observation period consists of 49 weeks) in regions and large cities of Russia (85 regions). The maximum number of smell-related queries was recorded in most of the regions during the pandemic’s second wave in October to November, for example, in the Ivanovo, Kaluga, Moscow and Ryazan regions from 2 November 2020 to 8 November 2020 and in Moscow from 23 November 2020 to 29 November 2020. In several regions, the highest activity was observed during the first wave, including Dagestan, Ingushetia, and North Ossetia from 27 April 2020 to 3 May 2020, Chechnya from 11 May 2020 to 17 May 2020 and Tyva from 22 June 2020 to 28 June 2020.
As expected, the number of queries strongly correlates (r>0.99) with the population in the respective areas. When the smell-related queries (online supplemental table 2) were normalised, the largest number of queries per population was found in Moscow (0.11 queries/person), St. Petersburg (0.10 queries/person), Nizhny Novgorod (0.09 queries/person), Moscow, Novosibirsk, and Sverdlovsk regions (0.08 queries/person). At the same time, Moscow, the Magadan Region, and the Altai Republic had the highest coronavirus infection incidence per person (>0.08). The smallest number of queries/person (0.01) was found in the Chechen Republic, the Republic of Ingushetia, and the Republic of Dagestan.
The obtained data can imply specific behavioural patterns typical of populations in different regions of Russia during the coronavirus infection pandemic in terms of information search regarding one of the COVID-19 symptoms: smell impairment.
Correlations between the number of queries and new COVID-19 cases
We analysed correlations between the number of queries and new COVID-19 cases in regions and large cities of Russia (online supplemental table 3, figure 2). The presented data show that Moscow was characterised by a very high correlation between queries and new cases of infection (r=0.96). A moderate (0.5<r<0.7) correlation between smell-related queries and new COVID-19 cases was detected for 25 regions (p<0.05). Thirty-three regions of Russia, including St. Petersburg, demonstrated a strong correlation (0.7<r<0.9) between the number of smell-related queries and new COVID-19 cases (p<0.05). For 26 regions, the correlation coefficient was less than 0.5 or p>0.05. Thus, in total, 70% of the regions and large cities of Russia (59 out of 85 regions) demonstrated a significantly strong correlation (r>0.7) between smell-related queries and confirmed COVID-19 cases; these data suggest a strong relationship between the information search for one of the COVID-19 symptoms, smell impairment, and the confirmed cases of coronavirus infection.
The increase/decrease in the number of smell-related online queries precedes the increase/decrease in the number of new COVID-19 cases
We attempted to estimate the time gap between the online search for smell-related information by the population and new cases of coronavirus infection in different regions of Russia. As a visual illustration of the obtained data, figure 3 shows the number of smell-related queries and the number of new cases of infection, before and after normalisation, for Moscow and the Moscow and Vladimir Regions.
The analysis results are shown in table 1 and online supplemental table 4. During the first peak of coronavirus infection incidence in Russia, smell-related queries outpaced the increase in the number of new cases of infection by 1–2 weeks in 23 out of 59 regions of Russia. Such a relationship was not observed for the other regions.
During the second peak of coronavirus infection incidence in Russia, the increase in the number of smell-related queries outpaced the increase in the number of new cases of infection by 1–2 weeks in 36 regions of Russia, including Moscow. In 14 regions, the increase in queries outpaced the increase in new cases of infection by more than 2 weeks. Two regions of Russia did not demonstrate such patterns during the first and second waves.
Relationship between queries and new infection cases as a supplementary tool for the evaluation of the effectiveness of COVID-19 epidemic control measures
Since Yandex.Wordstat provides queries in absolute rather than relative numbers, we estimated the increase in the number of queries during the second peak of coronavirus infection incidence as an indicator of the effectiveness of COVID-19 epidemic control measures in regions of Russia. For this purpose, we compared the relationship between smell-related queries and new COVID-19 cases in two periods: the plateau period, when the number of new infections in the studied regions was minimal for several weeks (from the week of 6–12 July 2020 (28th week) to the week of 14–20 September 2020 (38th week)), and the peak period, when the number of infections had an upward trend, rising to the maximum number (from the week of 21–27 September 2020 (39th week) to the week of 30 November–6 December 2020 (49th week)).
Figure 4 and table 2 show the results of the query/new infection case ratio calculation for 45 regions between the peak and plateau periods. The obtained data show that, except for Moscow, St. Petersburg, and the Tver Region, the ratio of queries to the number of new cases of infection increased during the second peak of coronavirus infection incidence compared with the plateau period; in 24 regions, the increase in queries was more than 100%.
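The comparison of the query/new case ratio between the two periods can be sketched as follows. This is only an illustration: it assumes that the ratio for a period is the mean of the weekly ratios and that the increase is expressed relative to the plateau value, neither of which is stated explicitly in the text. The function names and the weekly counts are hypothetical.

```python
def mean_ratio(queries, cases):
    """Mean of the weekly query/new-case ratios, skipping weeks with no cases."""
    ratios = [q / c for q, c in zip(queries, cases) if c > 0]
    if not ratios:
        raise ValueError("no weeks with a non-zero number of cases")
    return sum(ratios) / len(ratios)


def ratio_change_percent(plateau_queries, plateau_cases, peak_queries, peak_cases):
    """Percentage change of the query/case ratio from plateau to peak."""
    plateau = mean_ratio(plateau_queries, plateau_cases)
    peak = mean_ratio(peak_queries, peak_cases)
    return 100.0 * (peak - plateau) / plateau


# Hypothetical weekly counts for one region (not study data).
plateau_queries = [200, 220, 180]
plateau_cases = [100, 110, 90]
peak_queries = [1500, 2500, 2000]
peak_cases = [300, 500, 400]

change = ratio_change_percent(plateau_queries, plateau_cases,
                              peak_queries, peak_cases)
print(change)
```

In this toy example the ratio rises from 2 queries per case in the plateau period to 5 in the peak period, that is, by 150%, which would place the region among those with an increase of more than 100%.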
Furthermore, in these regions, we assessed the popularity of queries related to levofloxacin, an antibiotic mentioned in the Ministry of Health of Russia guidelines as an agent administered to treat bacterial infections in patients with COVID-19 (online supplemental table 5). We can point out two groups where the ratio of levofloxacin-related queries to the total number of new cases of infection was less than one or larger than one. The first group includes Moscow and St. Petersburg, where the ratio of queries to the number of new cases of infection decreased during the peak period compared with the plateau period.
No significant relationship was found between the increase in smell-related queries during the peak period with the cut-off point set at 100% and the increase in levofloxacin-related queries compared with the number of new cases of infection (p=0.1690). However, when the cut-off point was set at 160%, a significant relationship (p=0.0216) was found between the increase in smell-related queries during the peak period and the increase in levofloxacin-related search activity. In other words, in the regions where the increase in queries compared with the number of new cases of infection during the second peak of coronavirus infection incidence increased by more than 160%, the levofloxacin-related search activity was significantly higher than in regions where the number of queries increased by less than 160% compared with the number of new cases of infection.
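The text does not name the statistical test behind these p values. Purely as an illustration of how regions might be split at a cut-off and compared, the sketch below uses a two-sided permutation test on the difference in group means; this is a swapped-in technique and should not be read as the method used in the study. All names and values are hypothetical.

```python
import random


def split_by_cutoff(regions, cutoff):
    """Split (query_increase_percent, levofloxacin_activity) pairs at `cutoff`."""
    above = [activity for increase, activity in regions if increase > cutoff]
    below = [activity for increase, activity in regions if increase <= cutoff]
    return above, below


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference in means.
    Assumed stand-in test; the source does not state which test was used."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a = pooled[:len(group_a)]
        perm_b = pooled[len(group_a):]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(perm_a) - mean(perm_b)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical regions: (increase in smell queries vs new cases in %,
# levofloxacin queries per new case). Not study data.
regions = [
    (210, 1.8), (250, 2.1), (180, 1.9), (170, 2.4),
    (90, 0.6), (120, 0.9), (140, 0.7), (60, 0.8), (155, 1.0),
]

above, below = split_by_cutoff(regions, cutoff=160)
p_value = permutation_p_value(above, below)
print(len(above), len(below), p_value)
```

Changing `cutoff` from 160 to 100 moves regions between the two groups, which illustrates why the reported significance can depend on where the cut-off point is placed.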
In response to the pandemic, numerous studies have attempted to identify the causes and symptoms of COVID-19 disease. The cumulative estimate for the prevalence of loss of smell was 77% when assessed using objective measures (95% CI 61.4% to 89.2%) and 44% when measured by subjective measures (95% CI 32.2% to 57.0%).20 21
Our study demonstrated a strong correlation (r>0.7) between the number of smell-related queries in Yandex and new COVID-19 cases in 59 out of 85 regions and large cities of Russia (70%). The obtained results are consistent with our previous data that revealed a strong correlation between smell queries and new infection cases (r=0.81) in Russia using Google Trends.18 Higgins et al22 also pointed out that worldwide search queries related to shortness of breath, anosmia, dysgeusia and ageusia, headache, chest pain, and sneezing had a strong correlation (r>0.60; p<0.001) both with daily new confirmed cases and with the number of deaths caused by COVID-19. Similar results were obtained by Walker et al,19 who found strong correlations, ranging from 0.633 to 0.952 (p<0.05), between the daily RSV associated with loss of smell and the daily increase in COVID-19 cases and deaths in several countries.
In addition, the obtained data showed that during the first peak of the coronavirus infection incidence in Russia, the increase in smell-related queries outpaced the increase in new infection cases by 1–2 weeks in 23 out of 59 regions. During the second peak of the coronavirus infection incidence in Russia, the increase in the number of smell-related queries outpaced the increase in new infection cases by 1–2 weeks in 36 regions of Russia, including Moscow. In 14 regions, the increase in queries outpaced the increase in new infection cases by more than 2 weeks. A previous study also showed the time interval between the onset of COVID-19-associated symptoms and their actual detection.23
An important question raised in our study is whether the smell-related queries are primarily attributable to users with COVID-19 or whether they can be explained by other time-related reasons, such as seasonal diseases, allergies, or an infodemic (a rapid spread of information), in which users who do not experience COVID-19 symptoms, including loss of smell, try to find more information about the disease.22 23
When analysing the Yandex.Wordstat data, we did not observe such peaks in queries related to smell impairment during the respective period in previous years. The assumption that symptom-free occasional users are interested in COVID-19 symptoms can be challenged by several arguments. For example, a comparative study conducted in Israel24 showed that, among patients suspected of having COVID-19, those with positive COVID-19 results (68%) demonstrated changes in smell almost 10 times as frequently as those with negative COVID-19 results (8%). Our study did not detect any increase in queries compared with new cases of infection during the second wave of infection in Moscow, St. Petersburg, and the Tver Region, the total population of which accounts for 50% of the population of the 24 regions where there was a 100%–250% increase in the number of queries compared with the number of new cases of infection. In addition, for Moscow, the ratio of queries to new cases of infection decreased from 2.4±0.3 to 1.5±0.3, while for St. Petersburg and the Tver Region, the ratio of queries to new cases of infection remained unchanged. Moreover, from the week of 16–22 November 2020 to the week of 25–31 January 2021, the number of queries in Moscow exceeded the number of new cases of infection by only 10%–20% (figure 3). Therefore, it is quite probable that smell-related queries are generated by people suffering from loss of smell, which is primarily associated with COVID-19.
The existing difference (100%–250%) between the number of queries and confirmed COVID-19 cases during the second peak of coronavirus infection incidence can be explained by the fact that some of the users who suffered from loss of smell and searched the internet for information did not see a doctor in a healthcare facility. In this case, patients can opt for self-treatment and look for information about methods of treating coronavirus infection. One such treatment is the antibiotic levofloxacin, which was mentioned in the Ministry of Health of Russia guidelines as an agent to treat bacterial infections in patients with COVID-19. A shortage of this antibiotic was reported in Russia starting in November 2021. We found that in regions where the queries increased by more than 160% compared with the number of new cases of infection during the second wave, the levofloxacin-related search activity was also significantly higher than in regions where the number of queries increased by less than 160% compared with the number of new cases of infection.
Our analysis of search queries in Yandex.Wordstat supports the timewise relationship: internet users first look for information about their initial COVID-19 symptoms (smell impairment) and then confirm their disease. The presented data demonstrate that the increase (decrease) in the number of smell-related queries precedes the increase (decrease) in the number of infections by several weeks. Therefore, the ratio of queries to new cases of infection can be used to estimate the actual number of patients with recent coronavirus infections.25 For example, from the week of 16–22 November 2020 to the week of 25–31 January 2021, the queries in Moscow outnumbered the new infections by only 10%–20%. This suggests that the COVID-19 epidemic control measures in Moscow were effective during this period, with people affected by COVID-19 being detected promptly.
Our results should be interpreted with caution because of several limitations. First, the design of our study does not allow us to draw conclusions at the individual patient level. The increase in the number of queries may, at least in part, be due to an increase in the presence of related topics in the media and not to individual situations. In addition, searches for symptoms may be related to more than just COVID-19. However, given the high prevalence of COVID-19 in Russia and the significant correlation with confirmed cases of COVID-19, it is reasonable to assume that the increase in related searches is due to COVID-19. Second, we tried to select terms that would cover the largest percentage of related queries, which could reduce the specificity of the search. For example, the search topic is ‘levofloxacin’ and not ‘levofloxacin in the treatment of COVID-19’ or ‘how to use levofloxacin in COVID-19 complications’. In this way, we can cover all intended search queries with high sensitivity, but the specificity of the search is reduced because queries related to the use of levofloxacin in urological diseases, for example, ‘fourth generation fluoroquinolones in urology’ or ‘urological antibiotics for men’, also fall into the statistics, as follows from the analysis of search queries.
We assume that a sudden increase in interest in some symptoms of COVID-19, such as smell impairment, can be used as a valuable, minimally invasive indicator of coronavirus spread among populations and as a tool for evaluating the effectiveness of COVID-19 epidemic control measures.
Patient consent for publication
Contributors VA and KM designed and developed the study. KM identified the thematic framework and interpreted the data. DK collected the data. VA, KM and DK analysed and refined the data. KM prepared the manuscript’s main text, and all authors revised it. All authors contributed to interpreting the data and finalised and approved the manuscript. KM is responsible for the overall content as the guarantor.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Map disclaimer The inclusion of any map (including the depiction of any boundaries therein), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.