Original research
Reopening Italy’s schools in September 2020: a Bayesian estimation of the change in the growth rate of new SARS-CoV-2 cases
  1. Luca Casini,
  2. Marco Roccetti
  1. Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
  1. Correspondence to Dr Luca Casini; luca.casini7@unibo.it

Abstract

Objectives COVID-19’s second wave started a debate on the potential role of schools as a primary factor in the contagion resurgence. Two opposite positions appeared: those convinced that schools played a major role in spreading SARS-CoV-2 infections and those who were not. We studied the growth rate of the total number of SARS-CoV-2 infections in all the Italian regions, before and after the school reopening (September–October 2020), investigating the hypothesis of an association between schools and the resurgence of the virus.

Methods Using a Bayesian piecewise linear regression to scrutinise the number of daily SARS-CoV-2 infections in each region, we looked for an estimate of a changepoint in the growth rate of those confirmed cases. We compared the changepoints with the school opening dates, for each Italian region. The regression allows us to discuss the change in steepness of the infection curve, before and after the changepoint.

Results In 15 out of 21 Italian regions (71%), an estimated change in the rate of growth of the total number of daily SARS-CoV-2 infection cases occurred after an average of 16.66 days (95% CI 14.47 to 18.73) since the school reopening. The number of days required for the SARS-CoV-2 daily cases to double went from an average of 47.50 days (95% CI 37.18 to 57.61) before the changepoint to an average of 7.72 days (95% CI 7.00 to 8.48) after it.

Conclusion Studying the rate of growth of daily SARS-CoV-2 cases in all the regions provides some evidence in favour of a link between school reopening and the resurgence of the virus. The factors that could have played a role are too numerous to give a definitive answer. Still, the temporal correspondence warrants further systematic experiments to investigate potential confounders that could clarify how much reopening schools mattered.

  • public health
  • epidemiology
  • infectious diseases

Data availability statement

Data are available in a public, open access repository: https://github.com/pcm-dpc/COVID-19. All data and code are available upon request to the corresponding author (email: luca.casini7@unibo.it).

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • The use of a Bayesian linear regression model represents a reliable method to account for the uncertainty in the estimation of the changes of the growth rate of the number of daily SARS-CoV-2 infections.

  • Analysing the variation of the total number of new daily SARS-CoV-2-confirmed cases in each Italian region, around the time of school reopening, avoided the problems of looking for specific data collected in schools.

  • This approach avoids the problem of infections missed inside schools: positive children and adolescents tend to display fewer symptoms and are therefore less likely to be detected with a passive surveillance methodology.

  • Data, made available by the Italian government, used to count the number of daily SARS-CoV-2 infections, amount to aggregated measures, uploaded daily: in some cases, those measures changed meaning over time and/or contained errors that were never corrected.

  • Many confounding factors, besides schools, may have played a role; nonetheless, in September 2020 in Italy, those factors were still fewer than in the following months, when various containment measures were put in place.

Introduction

We recognise that attending school plays a key role for younger generations in supporting their development toward shared societal values and in promoting their positive physical and mental well-being. Nonetheless, if the scientific community is to overcome the challenges posed by SARS-CoV-2, a serious assessment of the impact that educational settings may have had in the spread of the pandemic cannot take a back seat. Hence the main motivation of our study.

As the summer of 2020 was coming to its end, with its relatively low number of COVID-19 cases in most western countries, many started debating whether it was wise to restart school activities as normal. During the first wave, schools in most nations were closed, like any other activity, and only some nations partially reopened them as the situation got under control and the lockdown was lifted. In this context, the large COVID-19 outbreak in a high school in Israel in May 2020 ignited the discussions about the role of schools in the spread of the virus.1 2 Two opposing sides appeared quite clearly: on one side, those who considered schools to be a minor risk and the importance of school paramount; on the other, those who were concerned by the lack of clear data on the contagion dynamics in schools and were alarmed by the high number of asymptomatic cases in younger people. The same discussion emerged in all the other countries affected by the ongoing pandemic, with the two sides bringing to the table mostly the same arguments with the occasional country-specific remarks.

Most international literature that suggests the absence of considerable risk factors connected to schools focuses on the fact that children and adolescents seem to be the least affected by the virus, in terms of the number of positive cases as well as symptoms and contagiousness. Ismail et al,3 studying UK schools, show that the incidence in students is not larger than the total incidence in the region and that most cases inside schools are transmissions between staff members. Studies targeting other countries draw similar conclusions, especially for primary schools and kindergartens.4–9 The hypothesis that asymptomatic children could easily and unknowingly spread the virus in their families was rejected by Munro and Faust10 and by other similar analyses, based on the argument that children have a lower susceptibility to the virus and thus play a lesser role in the transmission.11–13

The researchers presenting the opposite hypothesis point to the weaknesses of many of the aforementioned studies, namely the very small samples and the fact that the role of asymptomatic subjects is often not tracked and considered properly. Many point out that the second wave began in many countries within 2–3 weeks of the school openings, and point to data that suggest a higher spread in the school-aged population in those months, especially in high school and university students.14 15 Flasche and Edmunds responded to the study by Ismail et al, arguing that it was conducted with schools not fully populated and thus underestimated the potential of children to spread the virus, especially in the age bracket 10–18 years.16 This group saw a considerable increase in cases in September, as did college students, and seemed to be a common source of SARS-CoV-2 infections in households. Yamey and Walensky17 also express their concerns about universities reopening, while Sebastiani and Palù18 studied the Italian situation and argued that the rise of new cases in September, with most SARS-CoV-2 infections happening in the household, was compatible with the hypothesis of school being a factor. While measures were taken inside schools, they argued that outside contact was inevitable due to public transportation and social gatherings, so that young people spread the virus among themselves. Larosa et al19 conducted a study in the Reggio Emilia province (Emilia-Romagna region, Italy) showing that there were non-negligible clusters in the age bracket 10–18 years. They also suggested that more prompt isolation and testing could have hindered the spread, stressing how important timeliness is in this context. Despite their opposing positions, most researchers emphasise the need for the same measures: an active case finding approach with systematic and thorough testing of students and personnel.

Following this scientific debate, this work focuses on Italy, looking at the contagion curves and relating them to the dates schools opened in each of the 21 regions. Italy has faced a hard time recently, being hit by a second wave of COVID-19 cases that brought the healthcare system to its knees. This second wave started during the autumn, around the start of October 2020, and peaked in November, when the government imposed a new form of lockdown with colour-coded zones, based on the risk in every region. September was a crucial period, as with the end of summer many activities were going back to normal, and the virus prevalence in the nation was quite low.20 In the first days of that month, a slight increase in the number of new cases was registered in most regions, probably due to the cross-regional movement for the summer holidays.21 This prompted many to warn of the arrival of the second wave, but the number of new cases stabilised in the following days, and the growth was considered small in any case. Schools reopened between the third and the fourth week of September,22 23 while people had also already started going back to their offices and activities. Also worth noting is the referendum of 20 September, which in some regions coincided with other elections for local government and senate representatives. While attendance was quite low, one could wonder about the effects of such an event.

Because of the shocking scarcity of available and reliable data on SARS-CoV-2 infections inside Italian schools, we chose another perspective in order to investigate the hypothesis of an association between schools and the resurgence of the virus, by analysing the growth rate of the total number of SARS-CoV-2 infections in all the Italian regions, before and after the school reopening.

Methods

Given the scarce availability of data collected in schools that could better describe their role in Italy (for the reasons stated in the Introduction section), we decided to work with the population-wide data at the regional level. We fitted a piecewise linear regression model where the only dependent variable was the number of new daily confirmed COVID-19 cases, and the only independent variable was the number of days since 1 September (until 31 October). The result is a model comprising a changepoint and two segments, whose slopes represent the growth rate before and after the acceleration in the exponential growth causing the second wave; we passed from a stable situation with the exponent close to 0, which means little to no growth, to a situation with a markedly positive exponent. To have a measurement of the uncertainty in our estimates, we decided to use a Bayesian framework for the regression, as described in ref 24. Two transformations were applied to the initial data. First, we used a 7-day rolling average, as the raw data present a weekly periodicity due to the way COVID-19 tests are carried out and registered. Second, we applied a natural logarithm so that the exponential growth appears as an easily identifiable slope. Using the R package mcp, built on top of JAGS,25 we estimated a piecewise linear regression model for each region.
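The two transformations can be sketched as follows. This is a minimal Python illustration, not the authors' code (the study used R). The trailing alignment of the 7-day window, the dropping of days whose average is zero and the function names are all assumptions made for the example.

```python
import math


def rolling_mean(values, window=7):
    """Trailing moving average (assumed alignment).

    The first window - 1 days have no complete window and are skipped.
    """
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def log_smoothed_cases(daily_cases, window=7):
    """Return (day_index, ln(smoothed cases)) pairs.

    Days whose smoothed value is zero are dropped because the logarithm
    is undefined there (an assumption of this sketch).
    """
    pairs = []
    for offset, value in enumerate(rolling_mean(daily_cases, window)):
        day = offset + window - 1  # index of the last day in the window
        if value > 0:
            pairs.append((day, math.log(value)))
    return pairs
```

With this transformation, a series that grows exponentially with a constant daily rate becomes a straight line whose slope is that rate, which is what the piecewise regression then estimates.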

We modelled our dependent variable y (the natural log of confirmed daily cases) as a normal distribution whose mean μ depends on the regression coefficients a1 and b1 (the intercept and angular coefficient) before a changepoint τ, and a2 and b2 after the changepoint, as represented in the formulas below, where x is the number of days since 1 September and σ² is the variance.

\[ y \sim \mathcal{N}(\mu, \sigma^2) \]

\[ \mu = \begin{cases} a_1 + b_1 x & \text{if } x < \tau \\ a_2 + b_2 x & \text{if } x \ge \tau \end{cases} \]

The model was fitted using a Markov Chain Monte Carlo method. Since the two lines are joined at the changepoint, without discontinuities, the second intercept term a2 is not estimated, as it is bound to be \( a_2 = a_1 + (b_1 - b_2)\tau \).
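As an illustration of the continuity constraint, the mean of the model can be written as a single function of the estimated parameters. Requiring the two lines to meet at the changepoint, a1 + b1·τ = a2 + b2·τ, gives a2 = a1 + (b1 − b2)·τ. The Python sketch below assumes that x is the number of days since 1 September; the function name and the parameter values used in the check are hypothetical and are not estimates from the study.

```python
def piecewise_mean(x, a1, b1, b2, tau):
    """Mean of the log daily cases at day x for two joined line segments."""
    if x < tau:
        return a1 + b1 * x
    # Second intercept implied by continuity at the changepoint.
    a2 = a1 + (b1 - b2) * tau
    return a2 + b2 * x


# Hypothetical parameters: nearly flat before day 30, steeper afterwards.
params = dict(a1=5.0, b1=0.01, b2=0.10, tau=30.0)
just_before = piecewise_mean(30.0 - 1e-9, **params)
at_changepoint = piecewise_mean(30.0, **params)
print(abs(just_before - at_changepoint) < 1e-6)  # the two segments meet
```

With these illustrative values, the mean at day 40 is approximately 6.3, that is, the value at the changepoint (5.3) plus 10 days at the second slope.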

As described in ref 25, the intercept and slope priors used for starting the Bayesian estimation were chosen from Gaussian families, while for the changepoint a uniform distribution was used, precisely as reported below:

Embedded Image

Embedded Image

Embedded Image

Embedded Image

The final estimates for all the parameters (the changepoint, the two slopes, the initial intercept and the variance) are reported in table 1 with the 95% CI. To give an idea of the increase in slope, we computed the number of days (DT1 and DT2) necessary to observe a doubling in the number of new cases from the changepoint onwards, with both slopes (using the average angular coefficient), as shown in the following equations.
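The doubling times can be reconstructed from the log-linear model. The following is a sketch derived from that model rather than a reproduction of the authors' original equations. Because the regression is performed on the natural log of the daily cases, a slope b_i corresponds to cases growing proportionally to e^{b_i t}, so the doubling time DT_i is the time at which this factor equals 2:

```latex
\[
e^{b_i \, DT_i} = 2
\quad\Longrightarrow\quad
DT_i = \frac{\ln 2}{b_i}, \qquad i \in \{1, 2\}
\]
```

For example, a slope of 0.09 per day gives a doubling time of ln 2 / 0.09, or about 7.7 days. The expression is only meaningful for b_i > 0; a null or negative slope means that cases never double.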

Table 1

Estimated parameters from the 21 Italian regions along with school opening date (Open), number of days between opening and changepoint dates (D) and the doubling time for the two slopes (DTi)

Embedded Image

All the data used are available from the Italian government repository: https://github.com/pcm-dpc/COVID-19. The results of all the computations are fully reproducible by using the method stated above. The authors are open to sharing the code with whoever reaches out by email.

Patient and public involvement

Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Results

Table 1 shows all the estimated quantities for each region. Similarly, figure 1 shows all the curves and regression lines. Out of the 21 Italian regions, 15 (71%) of them have a changepoint within 28 days from the date when the school opened. In particular, the average number of days between the opening and the changepoint is 16.66 days (95% CI 14.47 to 18.73). We believe this number is plausible in a scenario where one has to be exposed to the virus and then manifest symptoms in order to be tested, also considering that often children and adolescents are asymptomatic and that they only function as a vector to other people around them.
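The comparison between changepoints and opening dates can be sketched as below. This is a hypothetical Python illustration: the function name, the category labels, the rounding of the changepoint to a whole day and the treatment of a changepoint falling exactly on the opening date are assumptions, and the dates in the usage example are made up rather than taken from table 1.

```python
from datetime import date, timedelta

START = date(2020, 9, 1)  # day 0 of the regression


def classify_region(changepoint_day, school_opening, max_lag_days=28):
    """Return (lag in days, category) for one region.

    changepoint_day is the estimated changepoint, in days since START.
    """
    changepoint_date = START + timedelta(days=round(changepoint_day))
    lag = (changepoint_date - school_opening).days
    if lag <= 0:
        return lag, "before or at opening"
    if lag > max_lag_days:
        return lag, "later than window"
    return lag, "within window"


# Made-up example: changepoint at day 30 (1 October), opening on 14 September.
print(classify_region(30, date(2020, 9, 14)))  # (17, 'within window')
```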

Figure 1

Plots for the regions whose changepoint (blue with 95% CI area) is within 28 days since school openings (yellow). Black dots are the natural log of daily confirmed COVID-19 cases from 01 September 2020 to 31 October 2020. In green and red the regression lines using the mean parameters, before and after the changepoint. On the rightmost side of each plot, the y-axis is reported without the log transformation, to allow the reader to infer what the confirmed case rates were in each region.

Looking at the estimated slopes in figure 1, we can get an idea of the strength of the increase. While one could comment on the difference in the angular coefficients, converting it into an angle, the number does not convey the idea very effectively. Translating it into the number of days required for the number of new daily cases to double is more interpretable. Of the 15 regions mentioned above, 4 had a slope that is null or slightly negative before the change, making this time infinite, but the fact that the trend was inverted to an average doubling time of 6.23 days (95% CI 4.30 to 8.20) is significant enough by itself. The remaining 11 regions went from an average of 47.50 days (95% CI 37.18 to 57.61) at the rate before the changepoint to an average of 7.72 days (95% CI 7.00 to 8.48) at the rate after the changepoint. Figure 1 and table 1 illustrate these results.
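The conversion between slopes and doubling times, including the infinite case, can be illustrated with a short Python sketch. The function names are hypothetical. The slopes computed here are those implied by the reported average doubling times, which are not necessarily equal to the average of the regional slopes.

```python
import math


def doubling_time(slope):
    """Days needed for daily cases to double, given the slope of log cases per day."""
    if slope <= 0:
        return math.inf  # a flat or decreasing curve never doubles
    return math.log(2) / slope


def slope_from_doubling_time(days):
    """Inverse conversion: slope of log cases per day for a given doubling time."""
    return math.log(2) / days


before = slope_from_doubling_time(47.50)  # average before the changepoint
after = slope_from_doubling_time(7.72)    # average after the changepoint
print(round(before, 4), round(after, 4), round(after / before, 2))
```

Under these assumptions, the implied slopes are about 0.0146 and 0.0898 per day, a roughly sixfold increase in the exponential growth rate.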

Of the six regions that break the pattern, two, P.A. Trento and P.A. Bolzano (often considered one region, called Trentino-Alto Adige), presented a changepoint more than 4 weeks after the school opening; they are indicated in blue in table 1. The remaining four, in yellow in table 1, began their rapid increase before or at about the same time as the school opening. Both groups are displayed in figure 2.

Figure 2

The same plots as in figure 1 but for the regions that do not exhibit the pattern. In the first four, the exponential growth starts before schools open; in the last two, it starts more than 28 days after that. On the rightmost side of each plot, the y-axis is reported without the log transformation, to allow the reader to infer what the confirmed case rates were in each region.

All the models converge quite well, with narrow 95% CIs, except for Basilicata, where the CI of the changepoint estimate is extremely wide, probably due to the high variability in the reported numbers.

Discussion

The results highlight how the second wave in Italy started in the days between September and October, with a degree of variability. These changepoints are on average a couple of weeks after the school openings. Certainly, multiple confounding factors played a role in the acceleration of the SARS-CoV-2 infection, but our opinion is that schools are surely one of those, and the magnitude of their effects should be investigated more thoroughly.

In the short period that precedes the second wave, there were not many events that involved as many people as schools. The referendum and elections on 20 September almost coincided with school opening and did not have a large participation. Workers going back to their offices could also have played a major role; however, by looking at the nationwide mobility report by Google news, we can see that in September the number of people moving to their workplace increased steadily from −30% to −20% with respect to the reference level before the pandemic.26 If we consider that in Italy there are approximately 25 million workers, an increase of 10 percentage points would translate into 2.5 million more people circulating. If we compare this number with that of students and school personnel, which is approximately 11 million, we can conclude that perhaps, even when considering that both categories use public transportation heavily, schools could be more influential in spreading the virus.

The regions that do not follow the pattern may tell us something more. In a regionalised country like Italy, with a strong territorial differentiation, Trento and Bolzano could be outliers.27 They are two northern, neighbouring autonomous provinces, often considered a single region, which share characteristics of a higher care coverage of social and educational services that could set them apart from other regions. In the other four, the change is before or coincident with school opening, so any factor that could have ignited the second wave is to be sought outside school activity and all the other connected activities. The effect of schools, if any, would be absorbed in the growth already under way. As all four are maritime regions, tourism could perhaps be one of the major causes. Regarding those regions where the number of cases has high variability, the reason is most likely that the number of tests done each day varies just as much. We could have normalised the cases by the number of tests, but this datum is often unreliable and leads to unrealistic normalised values, so we decided to avoid this. It should also be noted that any research hypothesis concerning SARS-CoV-2 infections in schools is hard to verify in Italy, as no region has so far seriously investigated the dynamics of the spread of the virus inside schools by using a systematic active case finding methodology.

There are, finally, some important technical considerations to highlight concerning the possible limitations of this study. First, the data we have used to count the number of daily SARS-CoV-2 cases were made available by the Italian government in the form of aggregated measures. In several cases, those measures have changed meaning/value over time, with errors that were never corrected. Second, in order to highlight the shift in the infection growth rate (ie, the slopes), along with the moment in time when this happened (ie, the changepoint), we adopted a simple normal model, resulting in two lines before and after the changepoint of interest. We recognise that this method is not the most accurate one with which to model counts of COVID-19 cases; Poisson-like distributions would be more appropriate. However, it should be considered that the aim of this study was not to create a model for the count of COVID-19 infections on each single day of observation, but to look at how quickly they grew, comparing the slopes. In any case, a Poisson model yields comparable results. In particular, the 15 regions that showed the pattern continue to do so (the changepoint falls within 3 weeks from the school reopening). Also, the relevant parameters (the slope coefficients b1 and b2) for all the 21 Italian regions have an average absolute difference from their respective normal-model estimates of less than 0.01 (0.0057 for b1, 0.0081 for b2), with the average number of days since the reopening of schools equal to 15.2 days, well within the CI computed with the normal model. A further limitation of this study is the impossibility of providing an estimate of the many other confounding factors that could have played a role during the period of observation, besides schools. Another limitation resides in the use of Italian data only. Extension to different geographies could yield more robust results.
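To make the Poisson alternative concrete, the sketch below computes the log-likelihood of daily counts under a Poisson distribution whose log rate follows the same continuous two-segment line. It is a hypothetical Python illustration, not the authors' implementation: the function name is invented, and whether the Poisson model was applied to raw or smoothed counts is not stated, so raw integer counts are assumed here.

```python
import math


def poisson_log_likelihood(counts, days, a1, b1, b2, tau):
    """Sum of Poisson log probabilities with a piecewise linear log rate.

    The log rate is a1 + b1 * x before tau and continues from the same
    point with slope b2 afterwards, so the two segments are joined.
    """
    total = 0.0
    for k, x in zip(counts, days):
        log_rate = a1 + b1 * min(x, tau) + b2 * max(0.0, x - tau)
        # ln P(k) = k * ln(rate) - rate - ln(k!)
        total += k * log_rate - math.exp(log_rate) - math.lgamma(k + 1)
    return total


# Synthetic counts: flat until day 10, then growing with slope 0.2 per day.
days = list(range(20))
counts = [round(math.exp(2.0 + 0.2 * max(0, x - 10))) for x in days]
with_change = poisson_log_likelihood(counts, days, 2.0, 0.0, 0.2, 10.0)
without_change = poisson_log_likelihood(counts, days, 2.0, 0.0, 0.0, 10.0)
print(with_change > without_change)  # the changepoint model fits better
```

In a Bayesian fit, this likelihood would replace the normal one while the priors on the slopes and the changepoint play the same role as before.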

Conclusion

We have applied a piecewise regression to the number of new daily COVID-19 cases since 1 September 2020 in order to highlight the start of the second wave and relate it to the reopening of schools in Italy. The numbers that emerged from this study are not enough to rule out schools, as some suggest, nor do they indicate a direct link, as others are sure of. Nonetheless, they show that the acceleration in the exponential growth is compatible, most of the time, with schools opening 2–3 weeks prior. We believe that a serious study which systematically addresses potential confounders and extends this approach, or similar approaches, to other geographies is necessary to give a definitive answer to this question, with a comparison between different age brackets and schools, and an effort to track the virus both inside and outside the school’s walls, using an active approach to case finding. The differences between regions, especially those that break the pattern, should be further investigated, as they could provide clues on how the dynamics of the spread have changed according to context.

Ethics statements

Ethics approval

Not applicable: no humans, animals or personal data were involved in this study.

Footnotes

  • Contributors LC and MR contributed equally to conceiving, designing, writing, managing and revising the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.