
Original research
STI epidemic re-emergence, socio-epidemiological clusters characterisation and HIV coinfection in Catalonia, Spain, during 2017–2019: a retrospective population-based cohort study
  1. Alexis Sentís1,2,3,
  2. Marcos Montoro-Fernandez1,4,
  3. Evelin Lopez-Corbeto1,4,5,
  4. Laia Egea-Cortés1,4,
  5. Daniel K Nomah1,6,
  6. Yesika Díaz1,4,
  7. Patricia Garcia de Olalla5,7,
  8. Lilas Mercuriali7,
  9. Núria Borrell8,
  10. Juliana Reyes-Urueña1,4,5,
  11. Jordi Casabona1,4,5,6
  12. The Catalan HIV and STI Surveillance Group
    1. 1Centre of Epidemiological Studies of Sexually Transmitted Disease and AIDS in Catalonia (CEEISCAT), Department of Health, Generalitat of Catalonia, Badalona, Spain
    2. 2Pompeu Fabra University (UPF), Barcelona, Spain
    3. 3Epidemiology Department, Epiconcept, Paris, France
    4. 4Fundació Institut d'Investigació Germans Trias i Pujol (IGTP), Badalona, Spain
    5. 5Spanish Consortium for Research on Epidemiology and Public Health (CIBERESP), Instituto de Salud Carlos III, Madrid, Spain
    6. 6Department of Paediatrics, Obstetrics and Gynecology and Preventive Medicine, Universitat Autònoma de Barcelona, Barcelona, Spain
    7. 7Epidemiology Service, Public Health Agency of Barcelona, Barcelona, Spain
    8. 8Epidemiological Surveillance and Response to Public Health Emergencies Service in Tarragona, Agency of Public Health of Catalonia, Generalitat of Catalonia, Tarragona, Spain
    1. Correspondence to Alexis Sentís; a.sentis{at}


    Objectives To describe the epidemiology of sexually transmitted infections (STIs), identify and characterise socio-epidemiological clusters and determine factors associated with HIV coinfection.

    Design Retrospective population-based cohort.

    Setting Catalonia, Spain.

    Participants 42 283 confirmed syphilis, gonorrhoea, chlamydia and lymphogranuloma venereum cases, among 34 600 individuals, reported to the Catalan HIV/STI Registry in 2017–2019.

    Primary and secondary outcomes Descriptive analysis of confirmed STI cases and incidence rates. Factors associated with HIV coinfection were determined using logistic regression. We identified and characterised socio-epidemiological STI clusters by Basic Health Area (BHA) using K-means clustering.

    Results The incidence rate of STIs increased by 91.3% from 128.2 to 248.9 cases per 100 000 population between 2017 and 2019 (p<0.001), primarily driven by increases among women (132%) and individuals below 30 years old (125%). During 2017–2019, 50.1% of STIs were chlamydia and 31.6% gonorrhoea. Reinfections accounted for 10.8% of all cases and 6% of cases affected HIV-positive individuals. Factors associated with the greatest likelihood of HIV coinfection were male sex (adjusted OR (aOR) 23.69; 95% CI 16.67 to 35.13), age 30–39 years (vs <20 years, aOR 18.58; 95% CI 8.56 to 52.13), having 5–7 STI episodes (vs 1 episode, aOR 5.96; 95% CI 4.26 to 8.24) and living in urban areas (aOR 1.32; 95% CI 1.04 to 1.69). Living in the most deprived BHAs (aOR 0.60; 95% CI 0.50 to 0.72) was associated with the least likelihood of HIV coinfection. K-means clustering identified three distinct clusters, showing that young women in rural and more deprived areas were more affected by chlamydia, while men who have sex with men in urban and less deprived areas showed higher rates of STI incidence, multiple STI episodes and HIV coinfection.

    Conclusions We recommend socio-epidemiological identification and characterisation of STI clusters and factors associated with HIV coinfection to identify at-risk populations at a small health area level to design effective interventions.

    • epidemiology
    • HIV & AIDS
    • sexual medicine
    • infection control
    • public health
    • preventive medicine

    Data availability statement

    All data relevant to the study are included in the article or uploaded as supplementary information.

    This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


    Strengths and limitations of this study

    • In this retrospective population-based cohort study, the use of data from the Catalan HIV/STI Registry allowed us to characterise the re-emergence of sexually transmitted infections (STIs), perform socio-epidemiological clustering and reveal factors associated with HIV coinfection.

    • To the best of our knowledge, this is the first study to apply the k-means clustering methodology to identify and characterise distinct socio-epidemiological clusters of STI at a small health area level.

    • A key limitation of this study is the high proportion of missing data around sociodemographic and lifestyle characteristics such as education level, sexual preference and country of birth. Nonetheless, our findings are consistent with previous analyses.


    The epidemic of sexually transmitted infections (STIs) continues to be a major concern and threat to global public health. Undiagnosed and untreated STIs can lead to a multitude of complications including HIV acquisition, long-term disabilities, infertility, adverse pregnancy outcomes and death.1 2 Across Europe, the incidence of STIs continues to rise, with confirmed cases reported in national surveillance systems increasing by 50% for gonorrhoea, 36% for syphilis, 68% for lymphogranuloma venereum (LGV) and 0.6% for chlamydia from 2014 to 2018.3–6 This trend is reflected in Spain, where new STI cases have been reported to increase 10-fold from 2000 to 2017, with 23 975 cases of gonorrhoea, syphilis, chlamydia and LGV reported in 2017 alone.7 8 During 2018 to 2019, the region of Catalonia in Spain recorded the highest incidence of STIs across the country, with a rise of 37% in the number of cases.9 Incidence rates were highest among men who have sex with men (MSM), women and young adults, particularly among young women, who in recent years have shown a proportionally higher increase than men.7 9 The surge in STI incidence rates may be explained by improvements in surveillance systems, introduction of new diagnostic methods with enhanced sensitivity, changes in sexual attitudes and behaviours, sociocultural shifts in society and the effects of tourism and globalisation.10

    STIs and HIV infections are overlapping epidemics which, besides their biological synergies, are largely driven by socioeconomic and other contextual factors acting as syndemics. Individuals affected by STIs are at increased risk of HIV infection and people living with HIV are more vulnerable to STIs.11 12 Some studies have described social determinants of health, discrimination and inequalities as the main factors associated with the spatiotemporal clustering of STI cases.13 14 While spatiotemporal clustering may be useful in grouping events or cases, other methodologies including k-means clustering allow grouping of different geographical units by common characteristics such as sociodemographic and epidemiological factors.15 The socio-epidemiological characterisation of STIs, including association with HIV coinfection, and identification of distinct clusters are imperative to strengthen the integrated surveillance of STIs and HIV. Data from such an exercise could potentially increase the sensitivity, timeliness and representativeness of surveillance systems, and generate information to tailor public health strategies to tackle a continuously growing epidemic.

    Therefore, we aimed to describe the epidemiology of STIs, identify and characterise socio-epidemiological clusters of STI and determine factors associated with HIV coinfection in Catalonia, Spain, during 2017–2019.


    Study design and data source

    We conducted a retrospective population-based cohort analysis of all confirmed cases of the notifiable STIs (syphilis, gonorrhoea, chlamydia and LGV) in Catalonia between 1 January 2017 and 31 December 2019. Data were obtained from the Catalan HIV/STI Registry,16 which uses information from the Epidemiological Repository of Catalonia (REC, in Catalan), an electronic database used by the Epidemiological Surveillance Network of Catalonia (XVEC, in Catalan). REC collects information from two sources: (1) the microbiological notification system (SNM, in Catalan) of confirmed cases from microbiological laboratories; and (2) the mandatory disease notification system (MDO, in Catalan) based on physician reporting of clinically suspected/probable and laboratory-confirmed cases as per established case definitions. Information collected through an epidemiological questionnaire that records clinical, epidemiological and behavioural variables is included along with the mandatory notification in REC (online supplemental table S1). Case definitions for surveillance reporting are standardised according to the European Union definitions established by the European Centre for Disease Prevention and Control (online supplemental table S2).17 18

    Analysis variables

    We extracted data around epidemiological, sociodemographic and clinical variables as detailed in online supplemental table S3. All individuals who had experienced at least one STI episode during the study period were linked, through the Spanish healthcare system personal identification code (CIP), to the Catalan HIV/STI Registry to identify HIV coinfections either before or after the recorded STI episode. In addition to the CIP, the Catalan HIV/STI Registry surveillance team performs checks of duplicates at least twice annually using a unique STI episode number (assigned to each notification and disease), name and date of birth. For our analysis, a deduplicated, HIV/STI-linked and anonymised version was provided.

    A Basic Health Area (BHA; Àrea Bàsica de Salut (ABS), in Catalan) is a territorial unit of coverage served by a primary healthcare team. Each BHA typically serves a population of approximately 5000–25 000 people. The socioeconomic level of the BHAs was classified according to a deprivation index (calculated by the Agency of Health Quality and Assessment of Catalonia) which was attributed to each individual according to their residential address. The deprivation index is a composite measure based on indicators such as proportion of residents with low educational level, proportion of manual workers, proportion of residents with an annual income below a specified amount, and rate of premature mortality. Deprivation indices were categorised in quintiles, with the first quintile being the least deprived.19
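    The quintile categorisation described above can be sketched as follows. This is a minimal illustration, not the procedure used by the Agency of Health Quality and Assessment of Catalonia: the function name `deprivation_quintiles`, the cut-point method and the ten deprivation scores are all hypothetical, and the only property taken from the text is that the first quintile is the least deprived.

```python
from statistics import quantiles

def deprivation_quintiles(index_by_bha):
    """Map each BHA to a quintile (1 = least deprived, 5 = most deprived)."""
    values = list(index_by_bha.values())
    # Four cut points split the distribution into five groups.
    cuts = quantiles(values, n=5, method="inclusive")
    result = {}
    for bha, value in index_by_bha.items():
        # Count the cut points the value exceeds; add 1 for a 1-based quintile.
        result[bha] = 1 + sum(value > c for c in cuts)
    return result

# Hypothetical deprivation scores for ten made-up BHAs (not study data).
scores = [12.0, 18.5, 25.6, 31.9, 36.2, 39.8, 44.9, 51.3, 58.7, 66.0]
example = {f"BHA_{i}": s for i, s in enumerate(scores, start=1)}
quintile = deprivation_quintiles(example)
print(quintile)
```

    Each individual would then inherit the quintile of the BHA that contains their residential address.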

    Clinical variables that were extracted included reinfections, multiple STI episodes and coinfection with HIV. Reinfection was defined as an episode of the same STI detected after a defined period, which differed for each STI, following the previously recorded infection in the same individual during the study period. Multiple STI episodes were defined as total number of episodes of any STI reported for the individual during the study period (online supplemental table S3). As information regarding treatment response was not available, episodes occurring outside of the specific time frames for each STI were assumed not to be a persistent infection resulting from treatment failure.
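    As an illustration of the reinfection definition, the sketch below flags an episode as a reinfection when the same individual had an earlier episode of the same STI more than a minimum number of days before. The minimum gaps in `MIN_GAP_DAYS` are placeholders chosen for the example; the windows actually used are those in online supplemental table S3. The function and variable names are likewise hypothetical.

```python
from datetime import date

# Placeholder minimum gaps (days) between episodes of the same STI.
# These are assumptions for illustration, not the study's definitions.
MIN_GAP_DAYS = {"chlamydia": 30, "gonorrhoea": 30, "syphilis": 365, "LGV": 30}

def flag_reinfections(episodes):
    """episodes: list of (person_id, sti, diagnosis_date) tuples.

    Returns (person_id, sti, diagnosis_date, is_reinfection) tuples,
    ordered by person and date.
    """
    last_seen = {}
    flagged = []
    for person, sti, when in sorted(episodes, key=lambda e: (e[0], e[2])):
        previous = last_seen.get((person, sti))
        # A reinfection needs an earlier episode of the same STI in the same
        # person, separated by more than the STI-specific minimum gap.
        is_reinfection = (previous is not None
                          and (when - previous).days > MIN_GAP_DAYS[sti])
        flagged.append((person, sti, when, is_reinfection))
        last_seen[(person, sti)] = when
    return flagged

# Made-up episodes for two hypothetical individuals.
example_episodes = [
    ("p1", "chlamydia", date(2017, 3, 1)),
    ("p1", "chlamydia", date(2017, 3, 15)),   # within the gap: not a reinfection
    ("p1", "chlamydia", date(2018, 1, 10)),   # beyond the gap: reinfection
    ("p1", "gonorrhoea", date(2018, 1, 10)),  # different STI: not a reinfection
    ("p2", "syphilis", date(2019, 5, 2)),
]
flagged = flag_reinfections(example_episodes)
for row in flagged:
    print(row)
```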

    K-means clustering of STIs

    We implemented k-means clustering to define STI clusters by sociodemographic characteristics. K-means is an unsupervised machine learning approach that groups heterogeneous units (in our case, BHAs) into clusters based on similarities in their characteristics (variable values and categorical distribution among the BHAs).15 A clustering algorithm is a procedure for grouping a series of vectors according to a specific criterion, which could be distance or similarity. In our analysis, proximity was defined by Euclidean distance, which quantifies similarities or differences between observations. This method is sensitive to outliers and requires both internal and external validation processes using different combinations of variables in the algorithms. The internal clustering validation considered the average Euclidean distance within each cluster and similarities between cases according to the correlation matrix of distances to determine the optimum number of clusters, for which intracluster variation is minimal. Our external validation process was based on a description of the possible socio-epidemiological variables to determine the most appropriate clustering algorithm for our data set. Based on these validations, we arrived at a final number of clusters and a defined set of variables to include in the algorithm. The following variables were chosen to identify and build the final three socio-epidemiological clusters of STI by BHA: incidence rate of each STI, percentage of women, percentage of people with HIV coinfection, median age among all STI cases in each BHA and deprivation index of each BHA.
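    For readers unfamiliar with the method, the following is a minimal pure-Python sketch of k-means with Euclidean distance. It is not the code used in the study (the analyses were run in R). The z-score standardisation, the deterministic farthest-first initialisation and the nine rows of BHA-level values are assumptions made for the example. Each hypothetical row holds STI incidence per 1000 population, percentage of women, percentage with HIV coinfection, median age and deprivation index.

```python
import math

def standardise(rows):
    """Z-score each column so no variable dominates the Euclidean distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def farthest_first(rows, k):
    """Deterministic start: first row, then rows farthest from chosen centroids."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def within_cluster_ss(rows, labels, centroids):
    """Total squared distance to assigned centroids (intracluster variation)."""
    return sum(math.dist(r, centroids[lab]) ** 2 for r, lab in zip(rows, labels))

# Hypothetical BHA-level rows (not study data).
bha_rows = [
    [2.1, 33, 9, 31, 32], [2.3, 32, 8, 31, 31], [2.0, 34, 9, 32, 33],
    [1.3, 53, 3, 26, 45], [1.4, 54, 2, 26, 46], [1.2, 52, 3, 27, 44],
    [7.2, 12, 16, 34, 25], [7.0, 11, 15, 34, 26], [7.4, 13, 16, 35, 24],
]
scaled = standardise(bha_rows)
labels, centroids = kmeans(scaled, k=3)
print(labels, round(within_cluster_ss(scaled, labels, centroids), 3))
```

    Comparing `within_cluster_ss` across candidate values of `k` is one simple way to examine intracluster variation when choosing the number of clusters.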

    Statistical analyses

    We performed a descriptive analysis to summarise epidemiological, clinical, sociodemographic and geographical variables for the total confirmed STI cases and by STI clusters. Continuous variables were summarised as median and interquartile range (IQR), while categorical variables were reported as absolute frequencies and percentages. Annual incidence rates of STIs are described per 100 000 population for the total confirmed STI cases and by STI cluster, and calculated based on census information from the Statistical Institute of Catalonia (IDESCAT) (online supplemental table S4). Incidence trends were analysed using the χ2 test for linear trend. For identifying and building clusters in k-means clustering, STI incidence rates were described per 1000 population due to the small population size per BHA.
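    The incidence rate and relative change calculations can be written out as below. The helper names and the case and population counts passed to `incidence_rate` are hypothetical; the two rates passed to `percent_change` are the rates for women under 30 years reported in the Results (193.6 and 491.9 per 100 000 population).

```python
def incidence_rate(cases, population, per=100_000):
    """Cases per `per` population (100 000 by default, 1000 for BHA-level rates)."""
    return cases * per / population

def percent_change(start, end):
    """Relative change from `start` to `end`, as a percentage."""
    return (end - start) / start * 100

# Hypothetical counts, only to show the two scales used in the analysis.
rate_per_100k = incidence_rate(250, 100_000)          # 250.0 per 100 000
rate_per_1000 = incidence_rate(25, 10_000, per=1000)  # 2.5 per 1000

# Rates reported in the article for women under 30 years (2017 and 2019).
change = percent_change(193.6, 491.9)
print(f"{change:.1f}%")  # 154.1%
```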

    We assessed risk factors associated with HIV coinfection among individuals diagnosed with STIs using multivariable logistic regression models to estimate odds ratios (ORs) and 95% confidence intervals (CIs). Individuals with more than one STI episode were counted once (first episode), and successive episodes in the same individual were grouped in a variable that considers the number of episodes, and included in the models. Sexual preference, country of birth and education level were excluded from the models because more than 50% of values were missing. We used backward stepwise elimination regression to include all analysed variables that showed statistical significance (p<0.05) by the Wald test in the final multivariable logistic regression model. All analyses were performed using R Statistical Software (V.3.6.1).
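    As a reminder of how an OR and its CI relate to a logistic regression coefficient, the sketch below exponentiates a coefficient and its Wald limits. It assumes a Wald-type interval; the article does not state which interval type was used, and the coefficient and standard error in the example are hypothetical.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error.

    Assumes a Wald interval: exp(beta - z * se) to exp(beta + z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error (not taken from the study).
or_, lower, upper = odds_ratio_ci(beta=math.log(2.0), se=0.25)
print(f"OR {or_:.2f}; 95% CI {lower:.2f} to {upper:.2f}")
```

    On the log-odds scale the interval is symmetric around the coefficient, which is why the limits of a Wald interval multiply to the square of the OR.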

    Ethics approval statement

    Data from the Catalan HIV/STI Registry, REC and all aggregated variables used in the study were handled according to international recommendations, the Helsinki Declaration revised by the World Medical Association in Fortaleza in 2013 and Spanish Law 3/2018 on Data protection and Public Health 33/2011. Patient information was anonymised and deidentified prior to analysis and therefore no informed consent was required.

    Patient and public involvement

    Patients were not directly involved in this study; only data from the nationally notifiable disease surveillance system were used.


    STI epidemic and trends

    Between 2017 and 2019, a total of 42 283 cases of STIs were reported among 34 600 individuals in Catalonia (table 1). Throughout the study period, half of all reported STIs were chlamydia (50.1%) and almost a third were gonorrhoea (31.6%). Reinfections accounted for 10.8% of all reported cases. Gonorrhoea had the highest proportion of reinfections (15.7%) and chlamydia the lowest (6.7%).

    Table 1

    Epidemiological characteristics of reported STI cases in Catalonia, Spain (2017–2019)

    The number of STI cases almost doubled from 9687 in 2017 to 18 872 in 2019 (table 2). The incidence rate of STIs increased by 91.3% from 128.2 cases per 100 000 population in 2017 to 248.9 cases per 100 000 population in 2019. The annual incidence rate of STIs for the period 2017–2019 was 185.5 per 100 000 population. Incidence rates increased significantly (p<0.001) from 2017 to 2019 for all STI types except syphilis, which remained stable over the 3 years, with the highest increase in number of cases seen in chlamydia (188.8%) followed by gonorrhoea (63.8%) and LGV (56.1%). In 2017, chlamydia and gonorrhoea represented 36.8% and 36.0% of all reported STIs, respectively, but by 2019 chlamydia accounted for 55.1% of all cases. Gonorrhoea showed the second greatest increase from 2017 to 2019, with 47.5% of cases occurring in individuals under 30 years of age throughout the study period. The increase in the number of confirmed STI cases from 2017 to 2019 was markedly higher in women (132% vs 75% in men) and individuals below the age of 30 years (125% vs 68% in those ≥30 years). Indeed, women under 30 years of age presented the highest increase both in number of cases (155.8% vs 93.6% in men below 30 years) and in incidence rates (154.1%, from 193.6 to 491.9 per 100 000 population, vs 93.6% in men under 30 years, from 202.9 to 384.0 per 100 000 population).

    Table 2

    Reported STI cases and incidence rates by year in Catalonia, Spain (2017–2019)

    The vast majority of reported cases occurred in men for all STI types except chlamydia, of which 61.9% occurred in women (table 1). Among all STI cases, 78.6% were reported in individuals below 40 years of age. Chlamydia was reported most frequently among individuals below 30 years of age (66.1%), while syphilis occurred most in those above 30 years of age (77.1%). Among the 15 023 (35.5%) reported STI cases for which information regarding sexual preference was available, more than half (54.5%) were reported in women who have sex with men (WSM), 21.8% in MSM and 21.0% in men who have sex with women (MSW) (table 1).

    When examining the distribution of STI cases according to deprivation index, the highest proportion of cases was seen in less deprived areas, with 24.3% of all cases reported in the first quintile. In more deprived areas (fifth quintile), chlamydia (56%) and gonorrhoea (29%) occurred more frequently than syphilis (14%) and LGV (1%). Data around country of birth and education level were limited due to high rates of missing data (56.6% for country of birth and 76.5% for education level). Nevertheless, we report that among cases with available information, 72.4% were observed among individuals born in Spain and 85.0% in those with secondary or higher education (online supplemental table S5).

    STI cases were disproportionately concentrated in Barcelona (83.3%) compared with the other six regions combined (table 1). Barcelona reported the highest incidence rate of STIs, while Alt Pirineu i Aran consistently recorded the lowest throughout the study period (table 2). In 2019, the incidence rate of STIs was 307.8 cases per 100 000 population in Barcelona and 45.7 cases per 100 000 population in Alt Pirineu i Aran. Nevertheless, incidence rates of STIs increased significantly from 2017 to 2019 in all regions, regardless of STI type. Similarly, the large majority of STI cases occurred in urban BHAs (70.9%) throughout the study period.

    Factors associated with HIV coinfection among individuals with STIs

    In total, 6% of STI episodes affected HIV-positive individuals with a higher proportion of HIV coinfection observed with cases of syphilis and LGV (13% and 25%, respectively) and the lowest with cases of chlamydia (2%) (table 1). Factors associated with HIV coinfection among individuals with STIs in the multivariable analyses are shown in table 3. The likelihood of HIV coinfection was greater among males (adjusted OR (aOR) 23.69; 95% CI 16.67 to 35.13 compared with females) and in urban BHAs (aOR 1.32; 95% CI 1.04 to 1.69 compared with rural BHAs). All age groups from 20 years and above and having multiple STI episodes were also associated with greater odds of HIV coinfection. BHA deprivation indices beyond the first quintile were associated with lower likelihood of HIV coinfection among individuals with STIs.

    Table 3

    Factors associated with HIV coinfection among individuals diagnosed with STIs in Catalonia, Spain (2017–2019)

    Identification and characterisation of the socio-epidemiological clusters of STIs

    Of the 373 Catalan BHAs, five (Garraf rural, Polinyà-Sentmenat, Ribes-Olivella, Roquetes-Canyelles and Viladecans 3) were excluded from the K-means clustering analysis because their delimitations and populations changed during the study period. In these five BHAs, 679 episodes were reported during the 3 years of the study period. Together with the 5773 episodes for which no information about BHA of residence was available, this reduced the sample size for the cluster analysis from 42 283 to 35 831 STI cases. Among the 368 BHAs included in the analysis, we identified three distinct clusters (table 4). Among the included BHAs, the incidence rate of STIs in 2017–2019 was 160.6 per 100 000 population per year.
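    The sample accounting in the paragraph above can be checked with a few lines of arithmetic. All figures are taken from the text; only the variable names are introduced here.

```python
total_cases = 42_283
cases_in_excluded_bhas = 679   # episodes in the five excluded BHAs
cases_missing_bha = 5_773      # episodes with no BHA of residence recorded
analysed_cases = total_cases - cases_in_excluded_bhas - cases_missing_bha

total_bhas = 373
excluded_bhas = 5
analysed_bhas = total_bhas - excluded_bhas

print(analysed_cases, analysed_bhas)  # 35831 368
```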

    Table 4

    Characteristics of socio-epidemiological STI clusters in Catalonia, Spain (2017–2019)

    Of the three clusters, the socio-epidemiological characteristics of STI-infected individuals in cluster A most closely resembled those of the total cases reported in the Catalan surveillance system that were included in the cluster analysis. Among the 109 BHAs in cluster A, median age was 31 years compared with 29 years among all reported STI cases, median deprivation index was 31.9 versus 39.8, the proportion of men was 67.4% versus 58.1% and HIV coinfection rate was 8.8% versus 6.1% (table 4).

    Cluster B consisted of the largest number of BHAs (251) and had the highest deprivation index (44.9) of all three clusters and compared with the total. The incidence rate of STIs was lower in cluster B compared with that of the total reported cases included in the cluster analysis (136.3 vs 160.6 per 100 000 population), but represented the majority of all reported STI cases (55.7%). STI-infected individuals in cluster B were the youngest among all groups (26 years) and were predominantly women (53.0%), whereas men represented the majority in all other groups. Compared with the total, cluster B consisted of more rural BHAs (15.8% vs 11.0%) and had a higher proportion of heterosexual men and women (approximately 12% higher) and chlamydia cases (61.7% vs 55.0%). Rates of multiple STI episodes and HIV coinfection in cluster B were the lowest of all three clusters and compared with the total (table 4).

    Cluster C consisted of only eight BHAs and had the lowest deprivation index (25.6) among all groups (table 4). The incidence rate of STIs in cluster C was the highest among all groups (721.0 per 100 000 population), with all cases reported in urban BHAs. STI-infected individuals in cluster C were the oldest among all groups (34 years) and had the highest proportion of MSM. Similar to other clusters and the total reported cases, chlamydia remained the most common STI type; however, cluster C was characterised by higher rates of gonorrhoea (33.2%), syphilis (23.6%), LGV (5.4%), multiple STI episodes (24.0%) and HIV coinfection (15.7%).

    Almost 60% of STI cases in cluster B occurred in BHAs in the three lowest quintiles of STI incidence rates, while more than 60% in cluster A occurred in areas of high STI incidence rates (fourth and fifth quintiles). All 4359 STI cases in cluster C were reported in BHAs in the highest quintile of STI incidence rate. This is consistent with the fact that the number of STI cases per BHA was higher in clusters A and C (105.8 and 544.9 cases per BHA, respectively) than in the total (97.4 cases per BHA), which indicates a higher proportion of BHAs with high incidence rates in these clusters (table 4 and figure 1).
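    The cases-per-BHA figures quoted above follow directly from the cluster sizes and case counts given in the text, as this short check shows. The dictionary and variable names are introduced only for the example.

```python
# Number of BHAs per cluster, as reported in the text.
cluster_bhas = {"A": 109, "B": 251, "C": 8}
total_bhas = sum(cluster_bhas.values())  # the 368 BHAs included in the analysis

cases_per_bha_total = 35_831 / total_bhas     # all cases in the cluster analysis
cases_per_bha_c = 4_359 / cluster_bhas["C"]   # all cluster C cases

print(round(cases_per_bha_total, 1), round(cases_per_bha_c, 1))  # 97.4 544.9
```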

    Figure 1

    Incidence rates (per 1000 population) and socio-epidemiological clusters of STIs by BHA during 2017–2019. (A) STI incidence rates in Catalonia; (B) STI incidence rates in Barcelona city*; (C) STI socio-epidemiological clusters in Catalonia and (D) STI socio-epidemiological clusters in Barcelona city*. *Health Regions were used as a bigger unit of analysis than BHA. The municipality of Barcelona is shown to enhance the visualisation of cluster C. From a total of 373 Catalan BHAs, five (Garraf rural, Polinyà-Sentmenat, Ribes-Olivella, Roquetes-Canyelles and Viladecans 3) were excluded from the K-means clustering analysis because their delimitations and populations changed during the study period. BHA, Basic Health Area; STIs, sexually transmitted infections.


    Our findings revealed that the incidence of STIs in Catalonia almost doubled from 2017 to 2019, primarily driven by the increase in cases among young adults (under 30 years) and in cases of chlamydia (particularly in women) and gonorrhoea. In 2017–2019, the majority of STI cases also occurred in individuals below the age of 30 years and in those living in urban and less deprived areas, with most cases reported in Barcelona. The identification and characterisation of socio-epidemiological clusters of STI showed that young women living in rural and more deprived areas were more likely to be affected by chlamydia. Furthermore, MSM living in urban and less deprived areas showed higher STI incidence rates, more multiple STI episodes and higher percentages of HIV coinfection than other population groups. Similarly, the factors associated with HIV coinfection were being male, being older than 20 years, living in urban and less deprived areas and having multiple STI episodes.

    After a long period of continuous reduction of STI incidence in Western countries, which coincided with the beginning and hardest times of the HIV epidemic from the 1980s to 2010s, many countries including the USA and European countries are recently reporting an ongoing re-emergence of STIs.3–6 The rise in STI cases has been partially attributed to enhancement of surveillance systems and the introduction of improved diagnostic tools in recent years.10 Other contributing factors, described mostly among MSM, include the use of HIV pre-exposure prophylaxis, the use of recreational drugs for sex, substance and/or alcohol abuse and widespread use of the internet and other technologies to seek sexual partners.20–22

    Chlamydia has been reported more frequently in WSM, while syphilis, gonorrhoea and LGV were more common in MSW and MSM.3–9 Similarly, in our study, we found different epidemiological characteristics for each STI type. Chlamydia was more common in women, mostly in WSM, with a large majority occurring in individuals below the age of 30 years. Gonorrhoea, syphilis and LGV were substantially more frequent in men, specifically among MSM, and showed higher percentages of reinfections and HIV coinfections than chlamydia. Most STI cases were observed in Spanish-born individuals and among those with secondary or higher education levels, although these findings should be interpreted with caution because of the high proportion of missing data for sexual preference, country of birth and education level. Our findings are consistent with earlier studies and reports of STIs in Catalonia in 2007–2015,12 2012–201723 and 2018–2019,9 showing a proportionally higher increase in young adults, mostly women, especially for chlamydia but also for gonorrhoea. Our findings are also consistent with those of a previous study among residents of Barcelona showing that STIs are becoming more prevalent in individuals with favourable socioeconomic status and education levels.24

    Consistent with previous data,12 we found that male sex, age above 20 years (particularly 30–60 years), living in urban or less deprived areas, and having multiple STI episodes were associated with an increased risk of HIV coinfection. STIs and HIV have been described as synergistic infections and should be viewed as a syndemic.25 The WHO and other public health agencies have emphasised the importance of integrating surveillance of STIs, HIV and even viral hepatitis, and of strengthening understanding of the determinants of these infections by linking biological and behavioural surveillance, to enhance the identification and characterisation of populations at increased risk of infection.10 25 Sociodemographic and socioeconomic factors are increasingly being established as more important risk factors for STI acquisition than individual behaviours, particularly among women from disadvantaged groups.26 27

    The k-means clustering methodology is a machine learning approach that has proven its utility and potential in classifying and grouping health-related outcomes. It has been used in the field of bipolar disorder to define cluster-based disease severity using heterogeneous variables such as sociodemographic, clinical, cognitive, vital signs and laboratory parameters.28 More recently, its potential to monitor and group SARS-CoV-2 prevalence by magnitude and trends (higher, medium and lower) at a regional level in Italy has been described.29 Identification of these ‘clusters of characteristics’ may be useful, in their specific context, to better detect and characterise case profiles by site or geographical area, which could ultimately lead to better-designed interventions to improve health outcomes. In a recent study of STI risk among MSM, hierarchical cluster analysis, another machine learning methodology, identified factors other than behaviour, such as sexual networks and risk perception, that influence vulnerability to STIs and HIV infections.30 To the best of our knowledge, the current study is the first to apply the k-means clustering methodology to identify and characterise socio-epidemiological clusters of STI.

    A key limitation of this study is the high proportion of missing data around sociodemographic and lifestyle characteristics, a common phenomenon in population-based epidemiological studies where questionnaires are used. This may have introduced information bias or an inaccurate representation of the true situation when describing high-risk populations. Although not formally assessed, we consider these data to be missing completely at random, owing to time constraints in completion of the epidemiological questionnaires by the surveillance officers and healthcare professionals who notified the diseases to the surveillance systems. Nonetheless, our findings are similar to those reported in previous analyses.3–6 9 12 24 The age category above 60 years old may contribute to residual confounding, although the risk is minimal because it is the age group with the smallest sample size and the range is larger than for other age categories. Categorisation of the deprivation indices by quintiles could have diluted the findings if deprivation was a strong confounder or unevenly distributed, although we do not believe either to be the case in our analysis.

    A strength of our study is the inclusion of ecological variables of socioeconomic status which are highly relevant and pertinent for describing groups at increased risk of STIs. We believe that the most valuable outcome of our study is that it shows the utility of complementing traditional epidemiological analyses with new methodologies, in this case, a machine learning approach, to combine heterogeneous data sources. This would allow identification and characterisation of target populations at increased risk of STIs to design more efficient measures to prevent and control STIs and HIV infection at a small health area level.

    In conclusion, consistent with other European countries, our study found that STIs increased at an alarming rate during 2017 to 2019 in Catalonia, Spain, and continue to be a worrisome public health concern. The STI epidemic is not only an issue for the health sector but also a threat to the broader global development framework and agenda. While declines in HIV infection have been observed in the last decade in Catalonia, as in many other regions in Europe, primarily due to the success of wider and earlier use of antiretroviral therapies, STI rates have been increasing dramatically, both among the MSM population and among heterosexual women and young adults. We found that young women living in rural and deprived areas were more likely to be affected by chlamydia, while MSM living in urban and less deprived areas had higher overall STI incidence rates, more multiple STI episodes and a greater likelihood of HIV coinfection. Preventative strategies must consider these populations priority targets and take into account the structural social determinants identified as crucial in our analysis. Our findings suggest that monitoring the STI epidemic in accordance with determinants of health and designing intervention programmes targeted at the local context would be of paramount importance, rather than using national or regional prevalence as the key monitoring variable.

    Data availability statement

    All data relevant to the study are included in the article or uploaded as supplementary information.

    Ethics statements

    Patient consent for publication


    Acknowledgments

    We thank all healthcare professionals working in STI/HIV surveillance, prevention and control in Catalonia who enable case detection, diagnosis and treatment, as well as the notification and information gathering for the epidemiological questionnaires. We thank Stefanie Chuah for her valuable support in editing the manuscript.




    • Collaborators The Catalan HIV and STI Surveillance Group: A Sentís, E López, V Gonzalez, R Lugo, MP Bonamusa, J Reyes, J Casabona (Centre d’Estudis Epidemiològics sobre les Infeccions de Transmissió Sexual i Sida de Catalunya); P Garcia de Olalla, Lilas Mercuriali, E Masdeu, M Ros, C Rius (Servei d’Epidemiologia de l’Agència de Salut Pública de Barcelona); M Company, M Danés, N Camps (Servei de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública a Girona); RM Vileu, G Ferrús, N Borrell, S Minguell (Servei de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública a Tarragona); J Ferràs (Servei de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública a Terres de l’Ebre); I Parrón (Servei de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública al Barcelonès Nord i Maresme); I Mòdol, A Martinez, P Godoy (Servei de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública a Lleida); MA Tarrès, J Pérez, M Boldú, I Barrabeig (Servei de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública a Barcelona Sud); E Donate, L Clotet, MR Sala (Servei de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública al Vallès Occidental i Vallès Oriental); M Carol, V Guadalupe-Fernández (Servei de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública a Catalunya Central) and J Mendioroz, P Ciruela, G Carmona, R Mansilla, JL Martínez, S Hernández (Subdirecció General de Vigilància Epidemiològica i Resposta a Emergències de Salut Pública, Agència de Salut Pública de Catalunya).

    • Contributors AS conceptualised and designed the study. MM-F cleaned the database; MM-F, LE-C and YD performed the statistical and cluster analyses. AS, EL-C and DKN reviewed the scientific literature. AS, JMR-U and JC drafted the manuscript, and AS, EL-C, PGdO, LM, NB, JMR-U and JC interpreted the results. All authors critically reviewed the manuscript and approved the final version to be published. AS is the author acting as guarantor.

    • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

    • Map disclaimer The inclusion of any map (including the depiction of any boundaries therein), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

    • Competing interests None declared.

    • Provenance and peer review Not commissioned; externally peer reviewed.

    • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
