Original research
Symptom heterogeneity and patient subgroup classification among US patients with post-treatment Lyme disease: an observational study
  1. Alison W Rebman,
  2. Ting Yang,
  3. John N Aucott
  1. Lyme Disease Research Center, Division of Rheumatology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
  1. Correspondence to Dr John N Aucott; jaucott2{at}

Abstract

Objectives To identify underlying subgroups with distinct symptom profiles, and to characterise and compare these subgroups across a range of demographic, clinical and psychosocial factors, within a heterogeneous group of patients with well-defined post-treatment Lyme disease (PTLD).

Design A clinical case series of patients.

Setting Participants were recruited from a single-site, Lyme disease referral clinic patient population and were evaluated by physical exam, clinical laboratory testing and standardised questionnaires.

Participants Two hundred and twelve participants met study criteria for PTLD, with medical record-confirmed prior Lyme disease as well as current symptoms and functional impact.

Results Exploratory factor analysis classified 30 self-reported symptoms into 6 factors: ‘Fatigue Cognitive’, ‘Ocular Disequilibrium’, ‘Infection-Type’, ‘Mood-Related’, ‘Musculoskeletal Pain’ and ‘Neurologic’. A final latent profile analysis was conducted using ‘Fatigue Cognitive’, ‘Musculoskeletal Pain’ and ‘Mood-Related’ factor-based scores, which produced three emergent symptom profiles, and participants were classified into corresponding subgroups with 59.0%, 18.9% and 22.2% of the sample, respectively. Compared with the other two groups, subgroup 1 had similarly low levels across all factors relative to the sample as a whole, and reported lower rates of disability (1.6% vs 10.0%, 12.8%; q=0.126, 0.035) and higher self-efficacy (median: 7.5 vs 6.0, 5.3; q=0.068, <0.001). Subgroup 2 had the highest ‘Musculoskeletal Pain’ factor-based scores (q≤0.001). Subgroup 3 was characterised overall by higher symptom factor-based scores, and reported higher depression (q≤0.001).

Conclusions This analysis identified six symptom factors and three potentially clinically relevant subgroups among patients with well-characterised PTLD. We found that these subgroups were differentiated not only by symptom phenotype, but also by a range of other factors. This may serve as an initial step towards engaging with the symptom heterogeneity that has long been observed among patients with this condition.

  • internal medicine
  • infectious diseases
  • primary care

Data availability statement

Data are available upon reasonable request. De-identified participant data are available upon reasonable request to the corresponding author.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • We operationalised a rigorous definition of post-treatment Lyme disease (PTLD) in our sample population, which ensured greater specificity of our findings to patients whose current illness is more evidently linked to prior Lyme disease.

  • This specificity, and the regional focus of our sample population, may limit generalisability to the larger population of patients with persistent symptoms following treatment for Lyme disease, or those from other regions of the USA.

  • Reproducibility of the subgroup analysis may be affected by necessary methodological decisions incorporating statistical and clinical criteria which were made during the analytic process.

  • We were able to draw on a relatively large sample size of participants with well-characterised PTLD, which allowed for clear and concise interpretability of data.

Introduction

Lyme disease is a tick-borne disease of increasing public health importance found primarily across temperate regions of the Northern Hemisphere.1 2 Clinical signs of early infection may include a round, red, skin lesion occurring at the site of the bite of infected Ixodes ticks, and/or a transient, non-specific illness consisting of fever, fatigue, myalgia or arthralgia.1 3 If not promptly identified or otherwise left untreated, the bacteria (Borrelia burgdorferi in the USA) can disseminate to other areas of the skin, and via the blood stream to other organs such as the nervous system, heart and joints.4 Consequently, although less commonly observed, patients with untreated infection can present with objective, later manifestations of neurologic disease, carditis or arthritis.3

While the majority of patients treated appropriately for Lyme disease recover, a subset develop a poorly understood, chronic illness of persistent or recurrent symptoms following treatment.5 The presence of chronic or persistent symptoms following acute infection has been documented in a subset of patients for a number of viral and bacterial pathogens.6 Although more research is needed, the symptom phenotype of these illnesses, including that of the newly described ‘long COVID-19’, shares many overlapping characteristics.6 7 In order to methodically advance scientific understanding, a standardised, highly specific, research definition for post-treatment Lyme disease (PTLD, alternatively previously called post-treatment Lyme disease syndrome or PTLDS) has been used and operationalised to identify a subset of these patients with on-going symptoms linked temporally to strong evidence of prior exposure to B. burgdorferi.3 8 9 The most prominent symptoms, and those included in the Infectious Diseases Society of America’s (IDSA) proposed case definition of PTLD,3 include fatigue, musculoskeletal pain and cognitive dysfunction. However, patients with PTLD often also report a broad range of other neurologic, sleep, mood, ocular and other symptoms.8 10 11 This heterogeneity is often compounded by the significant impact of these symptoms on patient quality of life and functioning.8 12 Additionally, given the lack of: (a) a sensitive and specific test to aid diagnosis, (b) United States Food and Drug Administration-approved treatment options for patients and (c) a known aetiology, PTLD presents a complex challenge to physicians.

As large studies among patients with well-characterised PTLD have not been conducted, this diversity in PTLD symptom reporting has not been comprehensively examined and it is unknown whether it may obscure the presence of distinct clinical patient subgroups. However, it is increasingly common that through advances in personalised medicine, diseases previously considered a single entity have been found instead to be comprised of clinically and/or biologically coherent subgroups.13 14 Furthermore, similar to fibromyalgia, PTLD is likely a complex, multifactorial illness with immunologic, microbiologic, genetic and/or psychosocial factors contributing to disease development, severity and persistence.5 15 Consequently, examining the heterogeneity of clinical presentations and symptom reporting that exists among patients with PTLD is important because it may inform a deeper understanding of aetiology and effective treatment approaches. Therefore, the aims of this observational study were (a) to identify underlying patient subgroups with distinct symptom profiles within a heterogeneous group of patients with well-defined PTLD and (b) to characterise and compare these subgroups across a range of demographic, clinical, laboratory and psychosocial factors.

Methods

Study participants

Participants were recruited from a referral-based clinic population. Detailed recruitment information and enrolment criteria for this study were included in an initial publication describing a subset of the larger sample of participants included in the current analysis.8 In brief, we replicated much of the criteria set forth in the IDSA’s proposed case definition for PTLD through our enrolment criteria.3 8 Participants were required to have prior evidence in their medical record of appropriately treated, Centers for Disease Control and Prevention (CDC)-definite or probable Lyme disease.16 They were also required to have current, functionally impairing fatigue, pain and/or cognitive dysfunction, and were excluded for a range of specific comorbid medical conditions, as previously described.8 For the current analysis, we did not limit the sample to those with greater than 6 months’ illness duration, and thus, we refer to our sample as meeting criteria for PTLD. The Institutional Review Board of the Johns Hopkins University School of Medicine approved this study, and written informed consent was obtained from all study participants.

Patient and public involvement

Patients and the public were not directly involved in the design, recruitment or assessment of this study.

Data collection instruments

Participants were asked to self-administer a 36-item post-Lyme questionnaire of symptoms (PLQS) developed based on prior clinical and research experience among patients with PTLD.8 Participants indicated both presence and severity over the past 2 weeks for each symptom (0=absent, 1=mild, 2=moderate or 3=severe). Of the original 36 symptoms, we excluded the following, which occurred with low frequency in our sample and were not considered to be core symptoms of PTLD (the per cent endorsed at a moderate or severe level): urination pattern change (9%), diarrhoea (8%), sore throat (5%), drooping eyelid(s) (2%), Bell’s palsy (1%) and tender lymph nodes (2%). Data from the remaining 30 symptoms provided the basis for the subgroup analyses described below (see online supplemental table 1 for the complete list of symptoms).
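The endorsement percentages quoted above can be illustrated with a minimal sketch. It assumes that "endorsed at a moderate or severe level" means a rating of 2 or 3 on the 0–3 scale and that missing responses (marked `None` here) are left out of the denominator; the function name and the example responses are hypothetical and are not study data.

```python
def percent_endorsed(scores, threshold=2):
    """Percentage of non-missing 0-3 ratings at or above `threshold`.

    Assumption: 2 = moderate and 3 = severe, so threshold=2 counts
    "moderate or severe". Missing ratings are passed as None.
    """
    answered = [s for s in scores if s is not None]
    if not answered:
        raise ValueError("no non-missing responses")
    endorsed = sum(1 for s in answered if s >= threshold)
    return 100.0 * endorsed / len(answered)


# Hypothetical ratings for a single symptom (illustration only).
example = [0, 1, 2, 3, 0, None, 2, 1]
print(round(percent_endorsed(example), 1))
```

Whether the study used all participants or only those who answered each item as the denominator is not stated in the text, so that choice is an assumption of this sketch.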

Participants were also asked to self-administer a battery of additional questionnaires included in the current analyses. The Beck Depression Inventory-II is a 21-item depression metric which can be divided into ‘Somatic’ and ‘Cognitive-Affective’ subscales.17 18 In order to avoid duplication with other variables in this analysis, only the ‘Cognitive-Affective’ subscale of the Beck Depression Inventory-II (BDI-C/A) was included, which has a total score of 0–48. Quality of life was measured by the Short-Form Health Survey, Version 2 (SF-36).19 This 36-item metric can be summarised into Physical and Mental Component Scores (PCS and MCS, respectively), with a higher score indicating higher quality of life. These scores can also be compared with the US population mean (50.0±10.0). The Life Events Checklist (LEC) is a 17-item measure of prior potentially traumatic events, with total scores of 0–17, originally developed to aid in the diagnosis of post-traumatic stress disorder.20 The Stanford Chronic Disease Self-Efficacy Scale (CDSE) is a 6-item measure of perceived self-efficacy for chronic disease self-management.21 22 The Big Five Inventory (BFI) is a 44-item measure of five personality dimensions: extraversion, agreeableness, conscientiousness, emotional stability and openness.23–25 Variables related to prior, initial Lyme disease clinical presentation, treatment(s) and duration of illness were abstracted from participants’ medical records from the time of Lyme disease onset. Participants self-reported other prior medical diagnoses as part of a structured clinical interview.

During the study visit, a physical exam was performed which included routine measures of height, weight, pulse and blood pressure. Body mass index was calculated using the standard formula (weight (kg)/height (m)²). Vibratory index was measured on the distal interphalangeal joint of the index finger and on the interphalangeal joint of the hallux using a Rydel-Seiffer 64 Hz tuning fork.26 Lastly, participants underwent a blood draw, and standard clinical tests (complete blood count, comprehensive metabolic panel, C reactive protein and two-tier serology for antibodies to B. burgdorferi) were performed by a large, commercial laboratory.

Statistical analysis

We hypothesised that subcollections of symptoms are caused by different but interrelated underlying biological mechanisms, which are not directly observable in our study.

Therefore, we first performed exploratory factor analysis (EFA) to identify the latent relational structure of the symptoms included in the PLQS, which subsequently also reduced the dimensionality of the data. The Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett’s test of sphericity were used to check whether the data were suitable for factor analysis. Considering the ordinal nature of the variables, both polychoric and Pearson’s correlation coefficients were used. We chose the minimal residual estimation method because it can be used when the sample size is relatively small and when the correlation matrix is non-positive definite.27 Oblique rotation was used to allow for correlations between extracted factors. The number of retained factors was informed by the visual scree test and parallel analysis, while taking into consideration clinical meaningfulness and the balance between parsimony and comprehensiveness. We used a factor loading cut-off value of 0.3.
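As an illustration of one of the suitability checks named above, the following is a minimal sketch of Bartlett's test of sphericity computed from a correlation matrix. It uses the standard statistic −(n − 1 − (2p + 5)/6)·ln|R| with p(p − 1)/2 degrees of freedom. The study's analyses were run in R, so this Python version and its function names are illustrative assumptions and not the authors' code. The p-value lookup against the χ² distribution is omitted to keep the sketch free of external dependencies.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom).

    corr  : p x p correlation matrix as a list of lists
    n_obs : number of observations used to estimate corr
    """
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        # A non-positive determinant means the matrix is not positive
        # definite, so the log-determinant is undefined.
        raise ValueError("determinant is not positive")
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    df = p * (p - 1) // 2
    return statistic, df


# Hypothetical 2 x 2 correlation matrix (illustration only).
print(bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_obs=212))
```

An identity correlation matrix gives a statistic of zero, which is the null hypothesis of the test: the larger the statistic, the stronger the evidence that the variables are correlated enough to factor.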

Next, to uncover subgroups of participants, we performed latent profile analysis (LPA) on the standardised symptom factor-based scores generated by the EFA. The number of identified clusters was determined based on minimisation of the Bayesian information criterion and the correlational structure of the data. Lastly, pairwise subgroup differences were examined and summarised using two-sample t test or Wilcoxon rank sum test for continuous variables and χ² or Fisher’s exact test for categorical variables. Considering the accumulation of type 1 error across multiple hypothesis tests, we calculated q values to control false discovery rate at 5%.28 All statistical analyses were performed using R (V.3.6.1).
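To make the false discovery rate step concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment, one common way of turning a set of p values into FDR-adjusted values. The text does not state which q value procedure was used (it only cites reference 28), so Benjamini-Hochberg is substituted here as an assumption, and the function name and example p values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in input order.

    Each sorted p value is scaled by m / rank, then a running minimum
    taken from the largest p value downwards enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values from three pairwise comparisons.
raw = [0.001, 0.5, 0.04]
flags = [q <= 0.05 for q in bh_adjust(raw)]
print(flags)
```

Under this sketch a comparison would be called significant when its adjusted value is at or below 0.05, matching the 5% false discovery rate stated above.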

Results

Participant characteristics

A total of 225 participants with PTLD were enrolled in the study. We excluded six participants whose PTLD symptoms began more than 6 months after their initial Lyme disease episode, and seven participants who were missing all symptom variables on the PLQS, for a total of 212 in the final sample. We employed mean imputation for three participants who were each missing 1 of the 30 PLQS variables included in the analysis. Table 1 shows a description of the final participant sample. The average age was 48 years and a slight majority (58.5%) of the sample was male. A large majority were residents of Mid-Atlantic states at the time of their disease onset (93.4%) and/or residents of states considered ‘high-incidence’ for Lyme disease (96.7%).29
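The mean imputation step can be sketched as follows. The text does not say whether a missing item was replaced by the item's mean across participants or by the participant's own mean across items; this sketch assumes the former, and the function name and data are hypothetical.

```python
def impute_item_means(rows):
    """Replace missing ratings (None) with the mean of that item.

    rows : list of per-participant lists, one rating per item.
    Assumption: the item mean is taken over participants with an
    observed rating for that item. Returns new rows; the input is
    left unchanged.
    """
    n_items = len(rows[0])
    means = []
    for j in range(n_items):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical ratings: three participants, two items, one value missing.
data = [[1, None], [3, 2], [2, 0]]
print(impute_item_means(data))

# Sample accounting from the text: 225 enrolled, 6 and 7 excluded.
assert 225 - 6 - 7 == 212
```

Participants missing every item have no information to impute from, which is consistent with their exclusion from the final sample.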

Table 1

Characteristics of 212 participants with well-defined post-treatment Lyme disease*

Latent relational structure among symptoms

The total symptom score among patients with PTLD ranged from 2 to 70, with a median and first and third quartile interval of 22 (14, 33). Histograms of individual symptom scores are presented in online supplemental figure 1. In the EFA, the original polychoric correlation matrix was non-positive definite. After smoothing was performed to arrive at a positive definite matrix, it resulted in a poor overall sampling adequacy index (0.10) and an ultra-Heywood case was detected. However, the overall measure of sampling adequacy based on the Pearson’s correlation coefficient was 0.86 (meritorious), and Bartlett’s test of sphericity was significant (p<0.001). A six-factor model was suggested by both statistical criteria and clinical meaningfulness (figure 1, see online supplemental table 1 for the complete factor pattern matrix). The root mean square of the residuals was 0.04, the root mean square error of approximation index was 0.06 and the Tucker Lewis Index of factoring reliability was 0.85. The symptom headache did not significantly load to any factor (maximum loading: 0.22, online supplemental table 1). Poor coordination and lower back pain loaded weakly to multiple factors (maximum loading ≤0.33), and had close cross loading (difference less than 0.10) across two or more factors. These three symptoms were therefore removed. The percent endorsed at a moderate or severe level for these symptoms was 34.9%, 15.6% and 35.8%, respectively. An expert physician on the study team (JNA) named the factors as ‘Fatigue Cognitive’, ‘Ocular Disequilibrium’, ‘Infection-Type’, ‘Mood-Related’, ‘Musculoskeletal Pain’ and ‘Neurologic’. All six factors were weakly or moderately correlated with each other (0.21–0.41), with the strongest correlation between the ‘Fatigue Cognitive’ and ‘Mood-Related’ factors.
For a more straightforward interpretation, six factor-based scores were calculated for each participant by adding up the scores of the symptoms within each factor, and then these factor-based scores were standardised to have a mean of 0 and an SD of 1.
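The factor-based scoring described above (sum the symptom ratings within a factor, then standardise) can be sketched in a few lines. The item names below are placeholders and not the actual symptom-to-factor assignments, which are given in online supplemental table 1. Use of the sample SD (n − 1 denominator) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def factor_based_scores(responses, items):
    """Sum each participant's 0-3 ratings over the items in one factor.

    responses : list of dicts mapping item name -> rating
    items     : item names assigned to the factor (placeholders here)
    """
    return [sum(person[item] for item in items) for person in responses]


def standardise(values):
    """Rescale to mean 0 and SD 1 (sample SD; an assumption)."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


# Hypothetical ratings for three participants on two placeholder items.
responses = [
    {"item_a": 0, "item_b": 1},
    {"item_a": 1, "item_b": 1},
    {"item_a": 2, "item_b": 1},
]
raw = factor_based_scores(responses, ["item_a", "item_b"])
print(raw, standardise(raw))
```

Unlike regression-based factor scores, these unit-weighted sums ignore the size of each loading, which is what makes them simpler to interpret.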

Figure 1

Exploratory factor analysis of 30 common post-treatment Lyme disease syndrome symptoms suggests a six-factor model. Three of the symptoms either did not load or loaded weakly and had close cross-loading, and they were not included in the final model.

Participant subgroup analysis

For the LPA, we did not include the ‘Ocular Disequilibrium’ factor as it prevented the LPA from converging for most of the specified models in model selection, possibly due to its low endorsement rate (the percentage endorsing symptoms included in this factor at a moderate or severe level ranged from 0.9% to 24.1%). When conducted on the remaining five factors, LPA classified participants into two groups based on their overall level of symptom reporting (high vs low) relative to the sample as a whole.

We then conducted a secondary LPA incorporating those factors which contained only the most common PTLD-defining symptoms as well as mood (ie, ‘Fatigue Cognitive’, ‘Musculoskeletal Pain’ and ‘Mood-Related’). Three symptom profiles emerged (figure 2) and participants were classified into subgroups corresponding to these symptom profiles. Subgroup 1 contained 59.0% of the participants and was characterised by similarly low levels across all three factors relative to the sample as a whole. Subgroups 2 and 3 contained 18.9% and 22.2% of the participants, respectively, and were characterised by overall higher levels of the three factors relative to the entire sample. These results remained stable when the ‘Neurologic’ factor was reintroduced in the LPA.

Figure 2

Three subgroups of participants identified based on latent profile analysis (A,B).

Participant subgroup comparisons

We first compared the three subgroups generated by the LPA across all six original PLQS factor-based symptom scores (figure 3). Compared with subgroup 1, ‘Fatigue Cognitive’ and ‘Neurologic’ factor-based scores were significantly higher among both subgroups 2 and 3 participants. ‘Musculoskeletal Pain’ was the only factor to statistically significantly differentiate all three subgroups from one another, with scores in subgroup 1 being the lowest and subgroup 2 being the highest. ‘Infection-Type’ and ‘Ocular Disequilibrium’ factor scores trended in the direction of increasing from subgroups 1 to 3. Lastly, ‘Mood-Related’ factor scores were significantly higher among subgroup 3 participants compared with both subgroups 1 and 2, which did not differ significantly from each other.

Figure 3

Participant subgroup differences in median standardised symptom factor-based scores, depicted as a heat map. The higher the score, the higher the severity of reported symptoms within each factor.

The results of detailed demographic, clinical, laboratory and psychosocial characteristic comparisons by subgroup are presented in table 2. Notably, neither the percentage male (q≥0.887 for all pair-wise comparisons) nor LEC total score (q≥0.615 for all pair-wise comparisons) was statistically significantly different across subgroups. Participants in subgroup 1, which generally included those with lower symptom factor-based scores, also reported lower rates of being on disability than the other two groups and had higher CDSE scores. Subgroup 2 was found to have higher blood pressure, and a higher percentage of participants with an abnormal C reactive protein than subgroup 1.

Table 2

Participant subgroup comparisons across demographic, clinical laboratory and psychosocial characteristics*

Overall, participants in subgroup 3 were younger, with a lower percentage reporting an annual household income >US$100 000. This group was also found to have a median illness duration of almost a year longer than the other two groups, and a higher percentage who reported prior intravenous antibiotic treatment. Consistent with the pattern of symptom reporting in the factor-based PLQS scores, subgroup 3 had significantly worse BDI-C/A scores than the other two subgroups. On the BFI, subgroup 3 had significantly lower scores in the conscientiousness and emotional stability domains than the other two subgroups.

Those comorbid diagnoses occurring with at least 5% prevalence in the sample as a whole are also reported in table 2. No statistically significant differences were found for any of the conditions with the exception that participants in subgroup 3 were almost three times as likely as those in subgroup 1 to report migraine headaches. In examining differences by subgroup in SF-36 quality of life scores, we found that subgroup 2 had significantly lower PCS scores compared with the other two groups, whereas subgroup 3 had significantly lower MCS scores compared with the other two groups (figure 4). This is consistent with the pattern of symptom reporting in the factor-based scores which differentiated the three groups.

Figure 4

Short-Form Health Survey-36 (SF-36) health-related quality-of-life physical and mental component scores19 for the three patient subgroups. ns=not significant; *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001.

Discussion

PTLD is a complex illness which is characterised by a wide range of clinical symptoms that can significantly impact quality of life for many patients.8 10–12 The aim of this study was to examine heterogeneity in symptom reporting in order to ultimately identify and characterise clinically relevant patient subgroups. Using the PLQS, we first identified six symptom-based factors through EFA. The relational structure of these results had overall clinical face validity, with symptoms clustering in seemingly physiologically relevant rather than randomly distributed ways. For example, all three cognitive symptoms loaded onto the same factor, as did joint pain, muscle pain and joint swelling. Furthermore, the six factors we identified represent commonly recognised domains in the clinical phenotype of PTLD.

Although the analyses and the measure differed, results from our EFA were generally consistent with those from a recent study with some participant sample overlap, which aimed to validate the General Symptom Questionnaire-30 (GSQ-30) in PTLD.30 One noticeable difference was that fatigue loaded with the musculoskeletal pain factor in the GSQ-30 study rather than with cognitive symptoms, as it did in the current study. This suggests that fatigue in PTLD could arise from multiple sources including pain, the central nervous system, or muscle weakness. Similarly, insomnia may also be a multifactorial symptom, as it showed low loading (0.32) to the ‘Infection-Type’ factor in the current study, with significant cross-loading to the ‘Fatigue Cognitive’, ‘Musculoskeletal Pain’ and ‘Mood-Related’ factors.

Several additional symptom factor loadings were informative as well. Neck pain is relatively common in the general population;31 however, it is reported with greater frequency and severity in this sample population compared with controls,8 and the cause is unknown. Given that neck pain loaded the strongest onto the ‘Neurologic’ factor, with the second strongest loading to ‘Fatigue Cognitive’ and not ‘Musculoskeletal Pain’, we hypothesise the potential for a neurologic rather than arthritic origin. We also found that difficulty breathing and heart palpitations loaded onto the ‘Mood-Related’ factor, implying that this constellation of symptoms may result from a common pathway such as autonomic nervous system activation or central sensitisation32 rather than specific cardiac or pulmonary pathology. Alternatively, anxiety and other mood-related symptoms could result secondary to experiencing these types of distressing physiologic symptoms. The hypothetical relational constructs we uncovered using EFA may shed light on, but not necessarily equate to, distinct biological mechanisms resulting in symptoms. Some symptoms may have a composite underlying mechanism, some may correlate with each other despite different mechanisms and some distinct factors could represent different subtypes of a shared general mechanism.

We then used a subset of the symptom-based factors in an LPA to ultimately identify three patient subgroups corresponding to specific symptom profiles. This subgroup classification was prominently differentiated first by overall severity of symptom reporting, where high and low symptom reporters were identified. We plan to investigate factors associated with severity in the sample as a whole in future multivariate analyses. It is important to clarify that symptom severity in the current study is relative to this study sample of participants with PTLD and not the general population; we have previously shown a higher symptom burden in a subset of this sample of patients with PTLD compared with non-Lyme infected controls.8

Similar to our previous GSQ-30 study,30 we conclude that morbidity in this population can exist above and beyond the effects of mood-related symptoms. Indeed, in our EFA an independent ‘Mood-Related’ factor was formed whose symptoms failed to load with other core symptoms of PTLD such as fatigue, pain and cognitive difficulty. This is also supported by the pattern of symptom factor-based score reporting in subgroup 2. This subgroup had the highest ‘Musculoskeletal Pain’ factor-based scores; however, their ‘Mood-Related’ factor-based scores remained relatively low, similar to those of subgroup 1. This pattern also suggests that mood-related symptoms in PTLD may be more likely to be associated with fatigue or cognitive symptoms than with pain. Moreover, although fatigue/cognitive, mood-related and pain symptoms all formed discrete factors in our analysis, ‘Mood-Related’ factor scores were more strongly correlated with ‘Fatigue Cognitive’ than they were with ‘Musculoskeletal Pain’ scores (0.41 vs 0.21, respectively).

We did define a subset of our sample (22.2%, subgroup 3) who overall reported significantly higher ‘Mood-Related’ factor-based scores relative both to the other two subgroups and to their other symptom factor-based scores. Comparing subgroups across a variety of domains suggests several possible explanations for this finding. First, despite being younger, participants in subgroup 3 had a longer illness duration, as abstracted from their medical record. We would hypothesise that the effects of a chronic, often functionally impairing illness on mood would both compound over time and be more pronounced among younger patients. Second, subgroup 3 also endorsed lower self-efficacy in managing their illness. This is unsurprising, as lower self-efficacy has been found to be associated with a higher degree of mood symptoms in a number of studies.33 34 Furthermore, participants in subgroup 3 also scored lower on the conscientiousness and emotional stability dimensions of the BFI, although additional research is warranted to explore the complex construct of personality among patients with PTLD. In sum, our findings suggest that participants in subgroup 3 may have been more psychologically vulnerable to the effects of a significant chronic illness over time when they first encountered Lyme disease. Indeed, many of the psychosocial variables that we measured have been shown to impact illness and resilience in other similar chronic disease populations.35–37

Finally, our data also suggest that participants with prior neurologic pathology may be over-represented in subgroup 3. Although the subgroup comparisons were not statistically significant, we observed that these participants had almost three times the rate of prior neurologic Lyme disease (cranial nerve palsy, neuropathy, meningitis or encephalitis), as abstracted from their medical record, compared with the other two groups. This is consistent with the higher rate of prior intravenous antibiotic treatment in this group as well. We also found that participants in subgroup 3 were significantly more likely to report a comorbid diagnosis of migraines. In post-hoc analyses, the diagnosis of migraine predated the Lyme disease onset for 57% of those in subgroup 3 with migraine. It is possible that pre-existing neurologic vulnerabilities, such as a history of migraine and/or frank neurologic Lyme disease, are associated with a post-treatment phenotype that encompasses an increase in mood-related symptoms.38 Although, per the IDSA case definition, we excluded participants with major psychiatric illness, Lyme disease has been associated with a range of neurologic and neuropsychiatric symptoms.39 Strikingly, although female gender40 41 and greater exposure to prior stressful life events42 have both been associated with higher mood symptoms in a number of studies, we did not observe that these participants were any more likely to report heightened mood-related symptoms when faced with similar physical symptom levels.

Our study does have limitations. We ensured greater specificity of our findings to patients whose current illness is more evidently linked to B. burgdorferi exposure by operationalising a narrow research definition of PTLD as eligibility criteria for inclusion into our sample. However, this specificity may also limit generalisability of our findings to a larger population of patients with persistent symptoms following treatment for Lyme disease, especially atypical early presentations not meeting CDC criteria. It is possible that different eligibility criteria, or different patient samples drawn from other regions of the USA, may have different results. Given the relatively high median household income of our sample, which may have resulted from the geographic location and specialty referral-based nature of our clinic, it will also be important to understand if our findings are generalisable across a broader income range. Furthermore, we relied on self-report symptom data for these analyses, which are subject to response bias as well as individual variation in perception of symptom severity.43

Finally, when applying EFA, Pearson’s correlation was used for data from a 4-point Likert scale, which does not satisfy the assumption of a multivariate normal distribution. A non-convergence issue prevented us from using the more appropriate polychoric correlation. This could lead to spurious multidimensionality and biased factor loadings.44 However, EFA conceptually met the needs of our research aim, and the results based on Pearson’s correlation matrix exhibited meritorious factorability and produced results with satisfactory performance measures. We also followed recommendations to improve our EFA for ordinal data,45 such as using parallel analysis-based methods for the factor retention decision and an oblique rotation method. In addition, the main structure of the EFA results is largely consistent with an exploratory symptom clustering analysis we conducted using Kendall’s Tau-b, which is non-parametric and is appropriate for ordinal variables.

Reproducibility of the subgroup analysis may be affected by necessary methodological decisions made during the analytic process, including: the scale of the data, the inclusion of a large number of symptoms in the analysis and the statistical and clinical criteria used during the model selection process. However, the approaches we employed were chosen to achieve as high a degree of theoretical soundness and feasibility as possible. These approaches, in conjunction with the relatively large sample of participants with PTLD that we were able to draw on for this analysis, allowed for clear and concise interpretability of data.

This analysis represents one of the first to identify and characterise potentially clinically relevant patient subgroups in PTLD. This is important as it may serve as an initial step towards engaging with the heterogeneity in symptom reporting that has long been observed among patients with this condition. Furthermore, in the future it may lead to more targeted interventions or other novel treatment approaches to address the varied and/or multiple factors which contribute to illness perpetuation in PTLD.

Ethics statements

Patient consent for publication


We gratefully acknowledge Dr Kristian Nitsch and Dr Pegah Touradji for their assistance reviewing the manuscript, and Cheryl Novak and Erica Mihm for their assistance conducting participant study visits.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • AWR and TY contributed equally.

  • Correction notice This article has been corrected since it was first published. The provenance and peer review statement has been included.

  • Contributors AWR, TY and JNA all contributed to the conception and design of this study. TY and AWR conducted the data management and statistical analyses. AWR, TY and JNA drafted, revised and gave final approval of the manuscript for publication.

  • Funding This work was supported by a grant from the Steven and Alexandra Cohen Foundation (#122279). The funding organisation had no role in any of the following: design and conduct of the study; data collection, analysis or interpretation; preparation, review or approval of the manuscript; decision to submit for publication.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.