Development of a prediction model to aid primary care physicians in early identification of women at high risk of developing endometriosis: cross-sectional study
Authors
  • Nina Julie Verket (1, 2)
  • Ragnhild Sørum Falk (3)
  • Erik Qvigstad (1, 4)
  • Tom Gunnar Tanbo (1, 5)
  • Leiv Sandvik (3)

Affiliations
  1. Institute of Clinical Medicine, University of Oslo, Oslo, Norway
  2. Research Center for Obstetrics and Gynecology, Oslo University Hospital, Oslo, Norway
  3. Oslo Center for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
  4. Department of Gynecology, Oslo University Hospital, Oslo, Norway
  5. Department of Reproductive Medicine, Oslo University Hospital, Oslo, Norway

Correspondence to Dr Nina Julie Verket; ninaverket{at}gmail.com

Abstract

Objectives To identify predictors of disease among a few factors commonly associated with endometriosis and, if successful, to combine these to develop a prediction model to aid primary care physicians in early identification of women at high risk of developing endometriosis.

Design Cross-sectional anonymous postal questionnaire study.

Setting Women aged 18–45 years recruited from the Norwegian Endometriosis Association and a random sample of women residing in Oslo, Norway.

Participants 157 women with and 156 women without endometriosis.

Main outcome measures Logistic and least absolute shrinkage and selection operator (LASSO) regression analyses were performed with endometriosis as the dependent variable. Predictors were identified and combined to develop a prediction model. The predictive ability of the model was evaluated by calculating the area under the receiver operating characteristic curve (AUC), positive predictive values (PPVs) and negative predictive values (NPVs). To take into account the likelihood of skewed representativeness of the patient sample towards high symptom burden, we considered hypothetical prevalences of endometriosis in the general population of 0.1%, 0.5%, 1% and 2%.

Results The predictors absenteeism from school due to dysmenorrhea and family history of endometriosis demonstrated the strongest association with disease. The model based on logistic regression (AUC 0.83) included these two predictors only, while the model based on LASSO regression (AUC 0.85) included two more: severe dysmenorrhea in adolescence and use of painkillers due to dysmenorrhea in adolescence. For the prevalences 0.1%, 0.5%, 1% and 2%, both models ascertained endometriosis with PPV equal to 2.0%, 9.4%, 17.2% and 29.6%, respectively. NPV was at least 98% for all values considered.

Conclusions External validation is needed before model implementation. Meanwhile, endometriosis should be considered a differential diagnosis in women with frequent absenteeism from school or work due to painful menstruations and positive family history of endometriosis.

  • primary care
  • gynaecology
  • endometriosis

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • The present study is the first to identify and combine predictors of endometriosis to develop a prediction model that may be used in primary care.

  • A randomly selected sample from the general population was used to recruit control subjects.

  • We did not have access to medical records.

  • Possible recall and selection bias cannot be excluded.

  • External validation is needed before model implementation.

Introduction

Endometriosis is a chronic inflammatory gynaecological disease with an estimated prevalence of ~5% among women of childbearing age.1 2 Tissue similar to the inner lining of the uterus in aberrant locations can cause pain, most frequently painful menstruations and painful intercourse, and infertility.3 Disease onset can be as early as adolescence, with disease persistence throughout reproductive age until a presumed burnout at menopause. Both disease expression and disease progression can vary markedly.2 There is no cure, and symptomatic treatment can vary from occasional use of over-the-counter painkillers to multiple extensive surgeries with adhesiolysis and organ resection or removal.4 Thus, the potential consequences of early-onset progressive endometriosis can be substantial and can last for multiple decades.5 6

Endometriosis is difficult to diagnose because painful menstruations, painful intercourse and infertility are also common among women without endometriosis. To date, the only way of diagnosing endometriosis is visual confirmation of abnormal patches of tissue during surgery.7 Thus, it is not surprising that for some it may take years before endometriosis is diagnosed, prolonging patient uncertainty and delaying treatment and care.8–10 It follows from the lack of diagnostic tools that the longest delay takes place in primary care.5 11

Screening tools are often developed for screening of general populations. However, in the field of endometriosis, screening tool development has been confined to women attending secondary and tertiary gynaecological surgical units or infertility clinics.12 13 Even if successful, screening tools developed from such studies would not be applicable in primary care due to the requirement of specialised examinations, such as ultrasound, MRI or surgery.14 In the present study, we used a control group from the general population. Our objectives were to identify predictors of disease among a few factors commonly associated with endometriosis and available to physicians through medical interview and, if successful, to combine these to develop and internally validate a prediction model to aid primary care physicians in early identification of women at high risk of developing endometriosis.

Participants and methods

Study design and data collection

Cross-sectional data collection was performed from 2012 to 2013. A postal questionnaire for anonymous reply was sent to women with endometriosis and a random sample of women from the general population.

Study populations

Women with endometriosis were recruited from the Norwegian Endometriosis Association. Inclusion criteria were 18–45 years of age and surgically confirmed diagnosis. In total, 162 of 375 women successfully completed and returned the questionnaire. Among these, five reported that their diagnosis had not been confirmed surgically and were excluded. Thus, 157 women with endometriosis were included, representing a response rate of 41.9% (online supplementary flow chart).

Following approval from the Norwegian Tax Administration, the Norwegian Civil Registry provided names and addresses of a random sample of women aged 18–45 years living in Oslo, Norway. Inclusion criteria were 18–45 years of age and no known diagnosis of endometriosis. In total, 159 of 1050 women successfully completed and returned the questionnaire. Although the survey included a letter asking only women without endometriosis to participate, three women reported having endometriosis and were excluded. Thus, 156 women without endometriosis were included, representing a response rate of 14.9% (online supplementary flow chart).

Basic characteristics

Background information included age, height, weight and symptoms (dysmenorrhea, pelvic pain, dysuria, dyschezia, fatigue, nausea, irregular menstrual bleeding and irregular bowel movement) experienced at any time during the 4 weeks prior to answering the questionnaire. For participants with endometriosis, diagnostic delay was recorded as the year of diagnosis minus the year the participant started having symptoms. Disease duration was recorded as the year of data collection minus the year of diagnosis. Further, the questionnaire included a multiple choice question on organs/anatomical locations affected by endometriosis, and two open questions inviting free description of previous and present treatments.

Candidate predictors

The candidate predictors were chosen based on three criteria: (1) they had to be applicable to most, if not all, female adolescents; by this criterion, variables such as dyspareunia (according to surveys from 99 700 Norwegian high school students from 2016 to 2018, about half have had intercourse by the age of 18), ultrasound/MRI findings, surgical findings, infertility and previous pregnancies were excluded as candidate predictors15; (2) they had to be simple and comprehensible to young adolescents, without the need for supplementary explanation; by this criterion, variables such as pelvic pain (eg, we were not confident in adolescents’ ability to readily localise symptoms as from the pelvis) and the concept of cyclic versus non-cyclic symptoms were excluded; and (3) they had to be available from early stages of the disease and reasonably frequent; by this criterion, variables such as dysuria and dyschezia were excluded. The following candidate predictors (with the questions (Q) and answer (A) alternatives given in parentheses) were included in the final questionnaire:

  1. Age at menarche

    (Q: How old were you when you had your first period?)

  2. Severe dysmenorrhea in adolescence

    (Q: Did you have very painful periods as a teenager?)

    (A: never/rarely/sometimes/often/always)

  3. Absenteeism from school due to dysmenorrhea

    (Q: Did you have to be absent from school—junior high school/high school—because of painful periods?)

    (A: never/rarely/sometimes/often/always)

  4. Use of painkillers due to dysmenorrhea in adolescence

    (Q: Did you use painkillers for painful periods as a teenager?)

    (A: never/rarely/sometimes/often/always)

  5. Use of oral contraceptives due to dysmenorrhea in adolescence

    (Q: Did you use oral contraceptives because of painful periods as a teenager?)

    (A: yes/no)

  6. Family history of endometriosis

    (Q: Does anyone in your family have endometriosis?)

    (A: yes/no/irrelevant)

Statistical analysis

Data were presented as mean with SD for continuous variables and as frequencies with percentages for categorical variables. Continuous variables were compared using independent samples t-test. Categorical variables were compared using Pearson’s χ2 test. Ordered categorical variables were compared using linear-by-linear association χ2 test.

Development of risk indices

Two different approaches were used to develop two risk indices: Endometriosis Risk Index Variant 1 (ERI-1), based on logistic regression analysis, and Endometriosis Risk Index Variant 2 (ERI-2), based on least absolute shrinkage and selection operator (LASSO) regression analysis. Logistic regression analysis is one of the most frequently used methods to develop prediction models by selecting relevant predictors and combining them statistically into a multivariable model.16 However, logistic regression may overestimate performance. We therefore applied LASSO regression analysis, a penalisation procedure that performs both variable selection and regularisation, during model development, as recommended in the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist for developing and validating prediction models.16

In the regression analyses, age at menarche was included as a continuous variable. To increase test power, the ordered categorical variables severe dysmenorrhea in adolescence and absenteeism from school due to dysmenorrhea were included as continuous variables based on linearity of the beta coefficients, which supported the assumption that the categories (never/rarely/sometimes/often/always) are equally spaced. The ordered categorical variable use of painkillers due to dysmenorrhea in adolescence was recoded into three categories (never/rarely, sometimes and often/always) based on deviations from linearity of the beta coefficients. Use of oral contraceptives due to dysmenorrhea in adolescence was included as a dichotomous (yes/no) variable. The categorical variable family history of endometriosis was recoded into two categories (yes and no/irrelevant/missing) to be able to handle the real-life response category ‘irrelevant’ (eg, if adopted). Missing responses were also included in this dichotomous categorisation because blank responses are likely comparable to participants simply not knowing. Participants with complete data for the candidate predictors, coded as described above, were included in the analyses (154 cases and 145 controls). Further, a sensitivity analysis was performed, that is, a reanalysis with an alternative dichotomous categorisation (yes/no) for the categorical variable family history of endometriosis, excluding the responses irrelevant and missing (142 cases and 130 controls).
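The recoding rules in the preceding paragraph can be sketched in a few lines of Python. This is an illustration only, not the authors' code (the analyses were run in SPSS, STATA and R); the function names and the use of `None` for a blank response are assumptions made for the example.

```python
from typing import Optional

# Five-level answers collapsed into the three categories used for the
# painkiller variable in the regression analyses.
PAINKILLER_GROUPS = {
    "never": "never/rarely",
    "rarely": "never/rarely",
    "sometimes": "sometimes",
    "often": "often/always",
    "always": "often/always",
}


def recode_painkillers(response: str) -> str:
    """Map a five-level answer onto never/rarely, sometimes or often/always."""
    return PAINKILLER_GROUPS[response]


def recode_family_history(response: Optional[str]) -> int:
    """Return 1 for 'yes'; 0 for 'no', 'irrelevant' or a blank (None) answer."""
    return 1 if response == "yes" else 0
```

Under this coding, a blank family history answer is treated the same as 'no', which mirrors the main analysis; the sensitivity analysis described above would instead drop such respondents.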

First, univariable and multivariable logistic regression analyses were performed to assess the relationship between the six candidate predictors and endometriosis. Backward stepwise variable selection was performed using p≤0.157 as the criterion (corresponding to the Akaike information criterion). The results were presented as beta coefficients and ORs with 95% CIs based on 1000 bootstrap samples. ERI-1 was based on the relative ratio between the beta coefficients. Second, LASSO regression analysis was performed with 10-fold cross-validation and 1000 bootstrap samples, as implemented in the R package mami. The results were presented as means of the LASSO regression coefficients with 95% CIs. ERI-2 was based on the relative ratios between the LASSO regression coefficients.
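The link between the p≤0.157 threshold and the Akaike information criterion can be checked numerically. AIC penalises each additional parameter by 2, so a single-parameter predictor is retained when its likelihood ratio χ2 statistic (1 degree of freedom) exceeds 2; the tail probability at that value is about 0.157. The short Python check below uses only the standard library, and the function name is chosen for the example.

```python
import math


def chi2_upper_tail_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


AIC_PENALTY_PER_PARAMETER = 2.0
p_threshold = chi2_upper_tail_1df(AIC_PENALTY_PER_PARAMETER)
print(round(p_threshold, 3))  # 0.157
```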

Internal validation

The predictive abilities of the two risk indices, ERI-1 and ERI-2, were described by area under the receiver operating characteristic curve (AUC). Sensitivity and specificity for different cut-off values of the risk indices were calculated, as well as positive predictive values (PPVs) and negative predictive values (NPVs). To take into account the likelihood of skewed representativeness of the patient sample towards high symptom burden,17 we considered the following hypothetical prevalences of endometriosis in the general population: 0.1%, 0.5%, 1% and 2%. Participants with complete data for the predictors included in ERI-1 and ERI-2 (155 cases and 148 controls) were included in the analyses.
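The dependence of PPV and NPV on the assumed prevalence follows from Bayes' theorem. As a restatement of that standard relationship (the symbols Se for sensitivity, Sp for specificity and p for prevalence are notation introduced here, not taken from the article):

```latex
\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)},
\qquad
\mathrm{NPV} = \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}
```

At a fixed sensitivity and specificity, lowering p shrinks the numerator of the PPV faster than its denominator, which is why PPV falls steeply at the lower hypothetical prevalences.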

A significance level of 5% was used if not otherwise stated. All analyses were performed with IBM SPSS Statistics V.22, STATA/SE V.15 and R V.3.5.

Patient and public involvement

A representative of the Norwegian Endometriosis Association assessed the readability and the respondent burden of the questionnaire prior to survey administration. Patients were not consulted to interpret the results. Patients were not invited to contribute to the writing or editing of this document for readability or accuracy.

Results

Basic characteristics of the participants

Basic characteristics of the participants are presented in tables 1–3. All 157 participants with endometriosis reported surgically confirmed diagnosis. Of these, 123 reported previous or present involvement of one or both ovaries, bladder, vagina and/or bowels. To an open question inviting free description of previous treatment, 122 reported surgical treatment. Of these, 33 reported specific surgical procedures, including 18 hysterectomies, 12 oophorectomies (11 unilateral and 1 bilateral), 5 cystectomies of endometriomas and 7 partial colectomies.

Table 1. Recent characteristics of the participants

Table 2. Adolescent characteristics and family history of the participants

Table 3. Further characteristics of the endometriosis group

Candidate predictors

Responses to the candidate predictors are presented in table 2. Blank responses were described as missing. In the control group, six participants skipped an entire page of the questionnaire (including the candidate predictors), most likely by error, and therefore had blank responses for all candidate predictors.

Regarding family history of endometriosis in the endometriosis group, 42 participants reported positive family history; 102 reported negative family history; 5 answered irrelevant; and 8 did not answer at all (however, seven of these eight had written ‘I don’t know’ as a comment in the answer field). Of the 42 who reported positive family history, 41 specified nature of kinship (reporting one to three relatives each). Nineteen reported a mother, 13 a sister, 9 one or more aunts, 4 a grandmother, 3 a cousin, 2 a parent’s cousin, 1 a niece and 1 a great aunt. In total, 28 of 41 (68.3%) reported one or more first-degree relatives with endometriosis. In the control group, 7 participants reported positive family history; 126 reported negative family history; 8 answered irrelevant; and 15 did not answer at all. Of the seven who reported positive family history, six reported one or more sisters, one a mother and one a cousin. All seven reported one or more first-degree relatives with endometriosis.

Development of ERI-1 using logistic regression analysis

Table 4. Logistic and LASSO regression analyses of candidate predictors of endometriosis

Based on univariable logistic regression analysis, use of painkillers due to dysmenorrhea in adolescence, family history of endometriosis, use of oral contraceptives due to dysmenorrhea in adolescence, absenteeism from school due to dysmenorrhea and severe dysmenorrhea in adolescence were the strongest predictors of endometriosis (table 4). Multivariable logistic regression analysis with backward stepwise variable selection procedure resulted in two predictors: absenteeism from school due to dysmenorrhea (A) and family history of endometriosis (F). Based on the relative ratio between the beta coefficients (A:F ratio was 1.1:2.3, rounded to 1:2), the following risk index was developed and assigned scores from 0 to 6:

ERI-1=A+2F, where

  • A=absenteeism from school due to dysmenorrhea (never=0 points, rarely=1 point, sometimes=2 points, often=3 points, always=4 points)

  • F=family history of endometriosis (yes=1 point, not yes=0 points).
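The scoring rule above can be written as a small Python function. This is a sketch for illustration; the function and dictionary names are assumptions, and only the point values come from the article.

```python
# Points for the five-level frequency answers, as listed for predictor A.
FREQUENCY_POINTS = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3, "always": 4}


def eri1(absenteeism: str, family_history: str) -> int:
    """Compute ERI-1 = A + 2F on the 0-6 scale."""
    a = FREQUENCY_POINTS[absenteeism]
    # Any answer other than 'yes' ("not yes") scores 0 points.
    f = 1 if family_history == "yes" else 0
    return a + 2 * f


print(eri1("often", "yes"))   # 5
print(eri1("always", "no"))   # 4
```

Because A alone contributes at most 4 points, a score of 5 or more is only reachable with a positive family history combined with absenteeism reported as often or always.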

Development of ERI-2 using LASSO regression analysis

Based on LASSO regression analysis, four predictors were selected: severe dysmenorrhea in adolescence, absenteeism from school due to dysmenorrhea, use of painkillers due to dysmenorrhea in adolescence (the categories often or always) and family history of endometriosis (table 4). Based on the relative ratios between the means of the LASSO regression coefficients, the following risk index was developed and assigned scores from 0 to 44:

ERI-2=D+6A+2P+14F, where

  • D: severe dysmenorrhea in adolescence (never=0 points, rarely=1 point, sometimes=2 points, often=3 points, always=4 points).

  • A: absenteeism from school due to dysmenorrhea (never=0 points, rarely=1 point, sometimes=2 points, often=3 points, always=4 points).

  • P: use of painkillers due to dysmenorrhea in adolescence (never/rarely/sometimes=0 points, often/always=1 point).

  • F: family history of endometriosis (yes=1 point, not yes=0 points).
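As with ERI-1, the ERI-2 rule can be sketched as a Python function. The names are assumptions made for the example; the weights and point values are those listed above.

```python
# Points for the five-level frequency answers used by predictors D and A.
FREQUENCY_POINTS = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3, "always": 4}


def eri2(dysmenorrhea: str, absenteeism: str, painkillers: str, family_history: str) -> int:
    """Compute ERI-2 = D + 6A + 2P + 14F on the 0-44 scale."""
    d = FREQUENCY_POINTS[dysmenorrhea]
    a = FREQUENCY_POINTS[absenteeism]
    # Painkiller use scores 1 point only for 'often' or 'always'.
    p = 1 if painkillers in ("often", "always") else 0
    # Any answer other than 'yes' ("not yes") scores 0 points.
    f = 1 if family_history == "yes" else 0
    return d + 6 * a + 2 * p + 14 * f


print(eri2("always", "always", "always", "yes"))      # 44
print(eri2("sometimes", "often", "sometimes", "yes"))  # 34
```

Without a positive family history the maximum attainable score is 4 + 24 + 2 = 30, so any score of 33 or more implies F = 1.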

Logistic and LASSO regression analyses, including participants with complete data for the candidate predictors, who only responded ‘yes’ or ‘no’ to the candidate predictor ‘family history of endometriosis’ (142 cases and 130 controls), did not alter the findings (online supplementary table).

Internal validation

The AUC was 0.83 and 0.85 for ERI-1 and ERI-2, respectively. Sensitivities and specificities for different cut-off values for ERI-1 and ERI-2 are presented in tables 5 and 6. Estimated specificities for ERI-1 with a cut-off of ≥5 (ERI-1≥5) and ERI-2 with a cut-off of ≥33 (ERI-2≥33) were 100%. As a true specificity of 100% is highly unlikely, we chose a value of 99.5% when calculating PPV for ERI-1≥5 and ERI-2≥33.

Table 5. PPVs and NPVs for ERI-1 (score range 0–6) with cut-off values 2, 3, 4 and 5 for different possible prevalences of endometriosis

Table 6. PPVs and NPVs for ERI-2 (score range 0–44) with cut-off values 12, 19, 26 and 33 for different possible prevalences of endometriosis

For each hypothetical prevalence, PPV and NPV were calculated for ERI-1 cut-off values of 2, 3, 4 and 5 (table 5) and for ERI-2 cut-off values of 12, 19, 26 and 33 (table 6). The highest cut-off value provided the highest PPV. For the prevalences of 0.1%, 0.5%, 1% and 2%, both prediction models ‘ERI-1≥5’ (score range 0–6) and ‘ERI-2≥33’ (score range 0–44) ascertained endometriosis with PPVs equal to 2.0%, 9.4%, 17.2% and 29.6%, respectively. For both indices, PPV was low for the cut-off value that provided the highest sensitivity. NPV was at least 98% for all values considered (tables 5 and 6). In the present dataset, 16 of 155 participants with endometriosis achieved ERI-1≥5 and ERI-2≥33. Among participants without endometriosis, the highest achieved ERI-1 and ERI-2 scores were 4 and 32, respectively.
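The PPVs reported for the highest cut-offs can be approximately reproduced from Bayes' theorem. The Python sketch below assumes a sensitivity of 16/155 (inferred here from the count of participants with endometriosis reaching the cut-off, not stated as such in the article) together with the 99.5% specificity chosen by the authors; the function names are illustrative.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENSITIVITY = 16 / 155   # assumption: 16 of 155 cases reached the cut-off
SPECIFICITY = 0.995      # value chosen in the article in place of 100%

for prevalence in (0.001, 0.005, 0.01, 0.02):
    print(
        f"prevalence {prevalence:.1%}: "
        f"PPV {ppv(SENSITIVITY, SPECIFICITY, prevalence):.1%}, "
        f"NPV {npv(SENSITIVITY, SPECIFICITY, prevalence):.1%}"
    )
```

The output should land close to the reported 2.0%, 9.4%, 17.2% and 29.6%; differences in the last digit can arise depending on how the sensitivity was rounded before the calculation.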

Discussion

Statement of principal findings

In the present study, regression analysis was used to develop two endometriosis risk indices. The predictors absenteeism from school due to dysmenorrhea and family history of endometriosis demonstrated the strongest association with disease. ERI-1 included these two predictors only. ERI-2 included two more: severe dysmenorrhea in adolescence and use of painkillers due to dysmenorrhea in adolescence. These two predictors had the lowest weight among the predictors included in ERI-2. For the hypothetical prevalences of endometriosis in the general population of 0.1%, 0.5%, 1% and 2%, both prediction models ERI-1≥5 (score range 0–6) and ERI-2≥33 (score range 0–44) ascertained endometriosis with PPVs equal to 2.0%, 9.4%, 17.2% and 29.6%, respectively, and NPV was at least 98% for all values considered. Thus, no apparent additional value was observed for ERI-2 relative to ERI-1. However, this issue should be investigated in an external validation study. For the predictor family history of endometriosis, comments from participants suggest that ‘I don’t know’ should be included as a response category (in addition to ‘yes’, ‘no’ and ‘irrelevant’) in future studies.

Strengths and weaknesses of the study

A major strength of the present study is that it is the first to identify predictors of endometriosis which may be used in primary care. When developing prediction models, high PPV is preferable to high sensitivity and specificity. Thus, cut-off values for the risk indices providing the highest PPV were chosen. Depending on the prevalence, the prediction models may identify women at high risk of developing endometriosis with PPVs comparable to that of mammography screening, where PPVs close to 15% are common.18 However, a sensitivity close to 10% is lower than we would prefer. Still, our patient sample has previously been demonstrated to carry a high disease burden, with marked pain and low health-related quality of life, comparable to or worse than women with rheumatoid arthritis, but with the disease hitting them at a much younger age.17 Thus, we have a patient sample representing a subtype of endometriosis that would undoubtedly benefit from early diagnosis and treatment. Hence, a screening tool with a sensitivity of 10% seems much better than the alternative of no screening tool. Cut-offs giving a sensitivity and a specificity of ~80% provided an unacceptable PPV of ~3%.

Our study has several weaknesses. First, we did not have access to medical records. Thus, severity of endometriosis could not be assessed. A second weakness is that we cannot exclude the possibility of recall bias. Women with endometriosis may be more liable to recall symptoms suggestive of endometriosis experienced in adolescence compared with women without endometriosis. A third weakness is the low response rate from the general population, following an overall international trend of declining response rates to postal surveys.19 Thus, the control group may not be fully representative of the general population even though random sampling was used for selection. However, the prevalences of absenteeism from school due to dysmenorrhea and family history of endometriosis in the control group in the present study were comparable to those found in a Finnish survey involving 1103 adolescent girls from the general population, in which 2.7% reported having a first-degree relative with endometriosis, and 5% reported regular absenteeism from school or voluntary activities because of painful menstruation.20

Comparison with other studies

Previous studies on screening tool development have not included control groups from the general population and have not been intended for use in primary care settings, making comparisons of findings difficult.12 13 21–23 In general, reporting of pain, such as frequency of dysmenorrhea, is subject to substantial individual variation and is expected to be of limited predictive value. However, interference of pain with daily life, such as absenteeism from school due to dysmenorrhea, is less common and likely less subject to individual variation. The response options ‘never’, ‘rarely’, ‘sometimes’, ‘often’ and ‘always’ for the question on frequency of absenteeism from school, although seldom used in other studies, were most likely suitable. Endometriosis has an estimated total heritability of about 50%.24 25 It is therefore not surprising that a positive family history of endometriosis is required for both prediction models to identify women at high risk of developing endometriosis.

The predictors identified in the current study are in line with a French study, although more so for advanced endometriosis than for endometriosis in general.26 In a cross-sectional study comparing adolescent markers among women with endometriosis, women with deeply infiltrating endometriosis were found to have a positive family history of endometriosis more often (OR 3.2) and higher absenteeism from school during menstruation (OR 1.7) than women with superficial peritoneal endometriosis and/or ovarian endometriomas.26 In a genome-wide association study regarding heredity of endometriosis, moderate and severe endometriosis showed greater genetic burden than minimal or mild endometriosis.27 Thus, our models may be more predictive of advanced endometriosis than of endometriosis in general. The prevalence of deep endometriosis is assumed to be ~2%,2 28 which may be somewhat overstated according to some prevalence studies.29–32 Thus, the chosen range of hypothetical prevalences in the present study seems appropriate.

Future research

More studies on screening tool development for endometriosis including control groups from the general population are needed. Register studies should be encouraged. However, newer candidate predictors such as absenteeism from school due to dysmenorrhea with suitable response options may not always be available. In view of the diversity of endometriosis, different subtypes may require different prediction models.

Conclusions and clinical implications

The developed prediction models need to be validated in future studies before use. Meanwhile, endometriosis should be considered a differential diagnosis in women with frequent absenteeism from school or work due to dysmenorrhea and positive family history of endometriosis.

Persistent or increasing interference of pain with daily life should prompt referral to secondary or tertiary care clinics experienced in handling endometriosis patients.

Dissemination declaration

We aim to disseminate the results in the Norwegian Endometriosis Association newsletter. If the prediction models are validated, primary care physicians will be informed through national health care and primary care physician websites. School nurses will be informed through school nurse networks, including presentation at the annual national school nurse conference.

Acknowledgments

We gratefully acknowledge the contribution of Karen Bertelsen of the Norwegian Endometriosis Association and the association itself.

Footnotes

  • Contributors Study concept and design: NJV, RSF and LS. Acquisition of data: NJV. Analysis and interpretation of data: NJV, RSF, EQ, TGT and LS. Drafting of manuscript: NJV, RSF and LS. The final manuscript was critically revised and approved by all authors.

  • Funding The present study was funded by the University of Oslo.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval The study was approved by the Regional Committee for Medical and Health Research Ethics, division south-eastern Norway (trial registration number: 2011/2213/Regional Committee for Medical and Health Research Ethics, division south-eastern Norway B).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement The data used in the present study is part of a larger dataset. Due to ongoing data analysis, the data used in the present study will not be available until all data analysis is completed. The corresponding author can be contacted for details.