

Classification of osteoarthritis phenotypes by metabolomics analysis
  1. Weidong Zhang1,
  2. Sergei Likhodii2,
  3. Yuhua Zhang1,
  4. Erfan Aref-Eshghi1,
  5. Patricia E Harper1,
  6. Edward Randell2,
  7. Roger Green1,
  8. Glynn Martin3,
  9. Andrew Furey3,
  10. Guang Sun4,
  11. Proton Rahman4,
  12. Guangju Zhai1,5
  1. 1Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St John's, Newfoundland, Canada
  2. 2Department of Laboratory Medicine, Faculty of Medicine, Memorial University of Newfoundland, St John's, Newfoundland, Canada
  3. 3Department of Surgery, Faculty of Medicine, Memorial University of Newfoundland, St John's, Newfoundland, Canada
  4. 4Discipline of Medicine, Faculty of Medicine, Memorial University of Newfoundland, St John's, Newfoundland, Canada
  5. 5Department of Twin Research & Genetic Epidemiology, King's College London, London, UK
  1. Correspondence to Dr Guangju Zhai; guangju.zhai{at}


Objectives To identify metabolic markers that can classify patients with osteoarthritis (OA) into subgroups.

Design A case-only study design was utilised.

Participants Patients were recruited from those who underwent total knee or hip replacement surgery due to primary OA between November 2011 and December 2013 in St. Clare's Mercy Hospital and Health Science Centre General Hospital in St. John's, capital of Newfoundland and Labrador (NL), Canada. 38 men and 42 women were included in the study. The mean age was 65.2±8.7 years.

Outcome measures Synovial fluid samples were collected at the time of their joint surgeries. Metabolic profiling was performed on the synovial fluid samples by the targeted metabolomics approach, and various analytic methods were utilised to identify metabolic markers for classifying subgroups of patients with OA. Potential confounders such as age, sex, body mass index (BMI) and comorbidities were considered in the analysis.

Results Two distinct patient groups, A and B, were clearly identified in the 80 patients with OA. Patients in group A had significantly higher concentrations of 37 of 39 acylcarnitines, but free carnitine was significantly lower in their synovial fluids than in those of patients in group B. The latter group was further subdivided into two subgroups, B1 and B2. The metabolites that contributed to this subgrouping were 86 metabolites including 75 glycerophospholipids (6 lysophosphatidylcholines, 69 phosphatidylcholines), 9 sphingolipids, 1 biogenic amine and 1 acylcarnitine. The grouping was not associated with any known confounders including age, sex, BMI and comorbidities. The possible biological processes involved in these clusters are carnitine, lipid and collagen metabolism, respectively.

Conclusions The study demonstrated that OA consists of metabolically distinct subgroups. Identification of these distinct subgroups will help to unravel the pathogenesis and develop targeted therapies for OA.


This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • The strength of the study is the use of synovial fluid samples rather than plasma or urine samples. The metabolic profiling in synovial fluid reflects directly what is happening in a joint and yields the most accurate, real-time and joint-specific metabolic profile that is relevant to osteoarthritis (OA).

  • OA is a heterogeneous disease. This is the first study that demonstrated that OA consists of at least three metabolically distinct subgroups, which are likely due to the differences in carnitine, lipid and collagen metabolism.

  • The study was limited to only patients with OA, and we did not have synovial fluid samples from healthy people. A study with control synovial fluid samples is needed to confirm the findings.

Introduction


Osteoarthritis (OA) is a heterogeneous disease with various pathogenic factors and consists of different phenotypes which continually evolve, eventually leading to common clinical and radiographic manifestations.1 Various classifications have been proposed depending on the main underlying pathophysiological mechanisms,2 clinically relevant patient characteristics,3 stage of disease,4 involved joints5 ,6 and degree of inflammation.7 However, most patients could not fit easily into these proposed OA subgroups2 due to the complexity of the influencing factors.

Recent studies have implied that OA is a metabolic disease linked to several components of the metabolic syndrome, such as hypertension, type-2 diabetes and dyslipidemia.8–10 Metabolites represent intermediate and end products of various cellular processes, and their levels can be regarded as a consequence of the biological system's response to genotypic and environmental influences. Synovial fluid (SF) is an ultrafiltrate of plasma that also contains locally synthesised factors. Altered composition or concentrations of SF components are directly linked to OA.11 Using a metabolomics approach, we identified the branched-chain amino acid to histidine ratio as a novel metabolic biomarker for knee OA.12,13 Beyond biomarker identification, classifying a heterogeneous condition such as OA requires a reliable analytical method with good sensitivity and accuracy, because the variation in metabolite concentrations between phenotypes of a heterogeneous disease may be much narrower than that between people with and without the disease.

The Biocrates AbsoluteIDQ p180 kit is a commercially available product for targeted metabolomics which can simultaneously identify and quantify 186 metabolites from 5 different compound classes. The assay is performed using a combined ultra performance liquid chromatography (UPLC) and mass spectrometry-based flow injection analysis (FIA) method which has proven to be in conformance with the Food and Drug Administration (FDA) Guideline “Guidance for Industry—Bioanalytical Method Validation”.14 The method uses the multiple reaction monitoring (MRM) mode, which can offer better sensitivity and quantitative accuracy than untargeted screening. Our previous study12 has demonstrated it to be a reliable and sensitive method for metabolite detection. In the present study, we used the p180 kit to identify metabolic markers in SFs that can be used to classify patients with OA into distinct subgroups.

Patients and methods


The present study was part of the Newfoundland Osteoarthritis Study (NFOAS) that was initiated in 2011 and aimed at identifying novel genetic, epigenetic and biochemical markers for OA.15 Patients with OA were recruited from those who underwent total knee or hip replacement surgery due to primary OA between November 2011 and December 2013 in St. Clare's Mercy Hospital and Health Science Centre General Hospital in St. John's, the capital city of Newfoundland and Labrador (NL), Canada. The response rate was 90%. OA was diagnosed based on the American College of Rheumatology criteria and the judgement of the attending orthopaedic surgeon.

Demographics and comorbidities

Demographic and medical information was collected by a self-administered general questionnaire with the assistance of research staff if necessary. Age was calculated at the time of surgery. Comorbidities including eight metabolic-related diseases (heart disease, hypertension, high blood pressure in pregnancy, high cholesterol, diabetes, stroke, gout and osteoporosis) were confirmed by reviewing the electronic hospital records, which contain relevant clinical, laboratory and radiographic information. Height and weight measurements were obtained from the patient's hospital medical records. BMI was calculated as weight in kilograms divided by the squared height in metres.

Metabolic profiling

SFs were collected during the joint surgeries. Prior to knee arthrotomy or hip capsulotomy, a syringe was inserted into the suprapatellar pouch of the knee, or along the femoral neck of the hip, and 2–4 mL of SF was aspirated. The samples were then put in vials and stored in liquid nitrogen until analysis. Metabolic profiling was performed by using the Waters XEVO TQ MS system (Waters Limited, Mississauga, Ontario, Canada) coupled with the Biocrates AbsoluteIDQ p180 kit, which measures 186 metabolites including 90 glycerophospholipids, 40 acylcarnitines (1 free carnitine), 21 amino acids, 19 biogenic amines, 15 sphingolipids and 1 hexose (>90% is glucose). The details of these 186 metabolites are listed in online supplementary table S1. The metabolic profiling method using this kit was described previously.12

Statistical methods

Data analyses encompassed hundreds of variables that are highly correlated. Dimension reduction was performed by multivariate methods that not only sought to capture changes of single metabolites between different groups, but also to utilise the dependency structures between the individual molecules. Principal component analysis (PCA), cluster analysis and partial least squares (PLS) regression, which are the most prominent multivariate analysis techniques applied in the research of metabolomics,16 were used in the analysis.

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated principal components. The first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.17 Hierarchical cluster analysis (HCA), also an unsupervised multivariate technique, was used to provide a visual description of the evolution of the clusters.18 Characteristic metabolites that differed significantly between clusters were identified using the PLS-Discriminant Analysis (PLS-DA) method implemented in SIMCA-P 11.5 (Umetrics AB, Umea, Sweden) software. In PLS-DA, the R2X, R2Y and Q2 (cum) parameters were used for model evaluation. R2X is the proportion of the variation in the predictor variables (here, the metabolite concentrations) explained by the model, R2Y is the proportion of the variation in the response variable (here, group membership) explained by the model, and Q2 is the proportion of the variation in the response predicted by the model under cross-validation. The importance of each metabolite in the PLS-DA was evaluated by the variable importance in the projection (VIP) score. A higher VIP score reflects a greater influence of the metabolite on the classification, and metabolites with a score greater than 1 were considered important in this study. Additionally, the Kruskal-Wallis test was performed using Multi Experiment View (V.4.9) software to determine the significant metabolites, with the significance level defined as p<0.01. A heatmap was made using the same software to present a detailed description of each group. Metabolites meeting these criteria were identified as discriminatory.
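As a rough illustration of the rank-based test named above, the Kruskal-Wallis H statistic can be sketched in plain Python. This is not the Multi Experiment View implementation: the function names `kruskal_wallis_h` and `p_value_two_groups` and the example values are hypothetical. Tied values receive average ranks and the usual tie correction is applied. The p value helper covers only the two-group case, where H is compared with a chi-square distribution with 1 degree of freedom.

```python
import math

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction (illustrative sketch)."""
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(grp) for rs, grp in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)  # equals 1 when there are no ties
    return h / correction

def p_value_two_groups(h):
    """Upper-tail chi-square probability with 1 degree of freedom."""
    return math.erfc(math.sqrt(h / 2.0))

# Hypothetical concentrations of one metabolite in two patient groups.
h = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
p = p_value_two_groups(h)
```

With more than two groups, H would be compared with a chi-square distribution with (number of groups − 1) degrees of freedom, which needs a statistics library or an incomplete gamma function and is left out of this sketch.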

The above analyses were performed on concentrations obtained from the Absolute IDQ kit. Before analysis, raw data were filtered by the presence of metabolites in at least 80% of patients and all data were mean-centred and standardised.

Results


A total of 38 men and 42 women were included in the study. The mean age was 65.2±8.7 years, and the mean BMI was 33.3±6.9 kg/m2. We had data on eight metabolic-related diseases including hypertension, dyslipidemia and diabetes that were previously reported to be associated with OA.10 The detailed descriptive statistics are presented in table 1.

Table 1

Descriptive statistics of the study population*

Over 90% of the potential metabolites (168/186) were successfully determined in each sample. These included 40 acylcarnitines (1 free carnitine), 20 amino acids, 9 biogenic amines, 87 glycerophospholipids, 11 sphingolipids and 1 hexose (>90% is glucose). Since there were vast differences in the absolute concentrations among different metabolites, we standardised the concentration by using the Z-score for comparability between different metabolites for their biological relevance and used them in subsequent analyses.
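The filtering and standardisation steps can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `filter_and_standardise`, the use of `None` for an undetected metabolite, the use of the sample standard deviation and the example concentrations are all assumptions made for the example.

```python
from statistics import mean, stdev

def filter_and_standardise(table, min_presence=0.8):
    """Keep metabolites detected in at least `min_presence` of patients,
    then convert each kept metabolite to Z-scores.

    `table` maps a metabolite name to a list of concentrations, one per
    patient, with None marking a missing measurement (an assumption).
    """
    result = {}
    for metabolite, values in table.items():
        present = [v for v in values if v is not None]
        if len(present) / len(values) < min_presence:
            continue  # detected in too few patients
        if len(present) < 2:
            continue  # a standard deviation needs at least two values
        mu, sd = mean(present), stdev(present)
        if sd == 0:
            continue  # constant metabolite carries no information
        # Mean-centre and scale; missing values stay missing.
        result[metabolite] = [
            None if v is None else (v - mu) / sd for v in values
        ]
    return result

# Hypothetical concentrations for five patients.
example = {
    "C0": [10.0, 20.0, 30.0, 40.0, 50.0],
    "C2": [1.0, None, None, 2.0, 3.0],  # present in only 60% of patients
}
standardised = filter_and_standardise(example)
```

After standardisation each retained metabolite has mean 0 and unit variance, so metabolites with very different absolute concentrations contribute on a comparable scale to the multivariate analyses.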

Figure 1 presents the PCA results. The 80 patients with OA were clearly clustered into two distinct groups, that is, cluster A and cluster B (the latter including several sub-assembling groups). Cluster A, comprising 11 patients, mainly assembled in the first quadrant, while cluster B, comprising 69 patients, was scattered along the X-axis. From the loading values, PC ae C40:1, PC ae C40:5, PC ae C36:1, PC ae C40:4 and PC ae C40:3 were the major contributors for component 1, whereas C12, C6:1, C3-OH, C3-DC (C4-OH), C3:1, C14:1 and C14 were the main contributors for component 2.

Figure 1

The result of the principal component analysis.
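To make the idea of loadings and explained variance concrete, the first principal component can be sketched with power iteration on the covariance matrix. This is an illustrative standard-library sketch under stated assumptions, not the software used in the study: the function name `first_principal_component`, the starting vector and the toy data are hypothetical, and a real analysis would normally use an established linear algebra library.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loadings, explained-variance fraction) of the first
    principal component. `rows` holds one list of values per patient."""
    n, p = len(rows), len(rows[0])
    # Mean-centre each column (metabolite).
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_variance = sum(cov[i][i] for i in range(p))
    if total_variance == 0:
        raise ValueError("data have zero variance")
    # Power iteration converges to the eigenvector with the largest
    # eigenvalue, i.e. the direction of greatest variance.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector is orthogonal to the data")
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return v, eigenvalue / total_variance

# Toy data: the second variable is exactly twice the first.
loadings, fraction = first_principal_component(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
)
```

The entries of `loadings` play the role of the loading values discussed in the text: variables with large absolute loadings contribute most to that component.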

Using the HCA method, the patients of cluster B can be further classified into two subgroups, B1 and B2. It also appeared that group B1 could be divided into B1-1, B1-2-1 and B1-2-2 groups, and B2 could be subdivided into B2-1 and B2-2 groups, respectively (figure 2).

Figure 2

Hierarchical clustering analysis for group B (69 patients).
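A minimal sketch of agglomerative hierarchical clustering is given below for readers unfamiliar with the technique. The distance measure and linkage rule used in the study are not stated, so Euclidean distance and average linkage are assumptions here, as are the function names and the toy profiles.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two metabolite profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until `n_clusters` remain.

    Returns a list of clusters, each a sorted list of patient indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                total = sum(
                    euclidean(profiles[i], profiles[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy standardised profiles for four patients (hypothetical values).
groups = average_linkage_clusters(
    [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]], n_clusters=2
)
```

Recording the order and distance of each merge, rather than stopping at a fixed number of clusters, yields the dendrogram that shows how subgroups such as B1 and B2 split further.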

Identification of the metabolites with the most significantly different concentrations between groups was performed using the PLS-DA method. Data obtained from cluster A and cluster B were first used to construct a PLS-DA model, which showed performance statistics of R2X=0.603 and R2Y=0.983, with a high prediction parameter Q2 (cum) of 0.98. The discriminant analysis showed that the concentrations of 37 acylcarnitines in cluster A were significantly higher than those in cluster B (13.5±14.0 fold for 34 of 37, 309±261 fold for 3 of 37, all p<0.01), while C0 (free carnitine, L-3-hydroxy-4-aminobutyrobetaine) and C2 (acetylcarnitine) were significantly lower in cluster A than in cluster B (2.7 fold for C2, 3.1 fold for C0, all p<0.01; figure 3). When the ratios of acylcarnitines to C0 were calculated, cluster A had far greater ratios than cluster B (p<0.01). However, the total carnitine (acyl+free) level did not differ significantly between group A and group B (p=0.75).
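The two derived quantities mentioned here, the acylcarnitine to free carnitine ratio and total carnitine, can be sketched as below. The text does not specify whether the ratio was computed per acylcarnitine species or on the summed acylcarnitines, so the summed form is an assumption. The function name `carnitine_summary` and the concentrations are hypothetical.

```python
def carnitine_summary(concentrations):
    """Summarise one patient's carnitine profile.

    `concentrations` maps metabolite names to concentrations, where "C0"
    is free carnitine and every other key is an acylcarnitine.
    """
    free = concentrations["C0"]
    acyl_total = sum(
        value for name, value in concentrations.items() if name != "C0"
    )
    return {
        "acyl_to_free_ratio": acyl_total / free,  # summed acylcarnitines / C0
        "total_carnitine": acyl_total + free,     # acyl + free
    }

# Hypothetical concentrations, in arbitrary units.
summary = carnitine_summary({"C0": 20.0, "C2": 5.0, "C12": 1.0, "C14": 2.0})
```

A pattern like the one reported would appear as a higher `acyl_to_free_ratio` in group A with a similar `total_carnitine` in both groups, consistent with a shift between free and esterified forms rather than a change in the overall pool.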

Figure 3

Heatmaps of significant metabolites for the separation of groups A and B.

Data from clusters B1 and B2 were used to construct a PLS-DA model with the parameters R2X=0.546, R2Y=0.854 and Q2 (cum) of 0.98, and the discriminant analysis between clusters B1 and B2 was carried out. A total of 86 metabolites including 75 glycerophospholipids (6 lysophosphatidylcholines, 69 phosphatidylcholines), 9 sphingolipids, 1 biogenic amine and 1 acylcarnitine were found to be significantly higher in cluster B2 than in cluster B1 by 1.58 fold, 1.83 fold, 1.72 fold, 1.19 fold and 1.82 fold, respectively (all p<0.01). However, two acylcarnitines, C3 and C5, had a slightly lower concentration in B2 than in B1.

Specific metabolites that were distinct between B1-1, B1-2-1, B1-2-2, B2-1 and B2-2 were obtained by the PLS-DA method with VIP>1 and p<0.01, and the results are shown in figure 4 and online supplementary table S2. The significant metabolites included 14 amino acids, 24 glycerophospholipids, 12 acylcarnitines and 1 sphingolipid, among which proline had the largest VIP value. Group B1-2-2 samples had higher concentrations of 12 amino acids and 1 biogenic amine than the other four groups. Group B2-1 samples had the highest levels of two amino acids (glutamate and ornithine) and of all glycerophospholipids except two lyso-PC types (lyso-PC 16:0 and 16:1). On the whole, groups B1-2-1 and B2-2 had intermediate concentrations of all significant metabolites, and group B1-1 had the lowest concentrations of all metabolites.

Figure 4

Heatmaps of significant metabolites for the separation of clusters B1-1, B1-2-1, B1-2-2, B2-1 and B2-2, p<0.01.

We explored whether the groupings were associated with any of the potential covariates. For groups A and B, the mean ages were 67.2±5.8 years and 64.9±9 years, respectively, but the difference was not statistically significant (p=0.42). Women made up 55% of group A and 52% of group B. The average BMI in group A was 35.9±5.7 kg/m2, which was higher than that in group B (32.8±7.0 kg/m2), but the difference was not statistically significant (p=0.17). We had data on eight metabolic-related diseases (heart disease, hypertension, high blood pressure in pregnancy, high cholesterol, diabetes, stroke, gout and osteoporosis) for all the study participants. The most prevalent diseases in these patients with OA were hypertension (41.3%), high cholesterol (35%) and diabetes (16.3%). Although group A tended to have a lower prevalence of these metabolic-related diseases than group B, the differences were not statistically significant (all p>0.05).

There was no significant difference in age or sex between groups B1 and B2, but BMI in group B1 was higher than that in group B2 (34.2±7.2 vs 29.6±5.3, 0.01<p<0.05).

Discussion


To the best of our knowledge, this is the first study that used the metabolomics approach to classify patients with OA into different distinct subgroups. The strength of the study is the use of SF samples rather than plasma or urine samples. Our unpublished data suggested that the metabolite correlation between plasma and SF was only modest (an average correlation coefficient of 0.22). Metabolic profiling in SF reflects directly what is happening in a joint and yields the most accurate, real-time and joint-specific metabolic profile that is relevant to OA.19

On the basis of metabolic profiling of the SFs, we classified 80 patients with OA into two large groups, that is, group A and group B. This is largely due to the difference in concentrations of acylcarnitines. Fasting might be a factor explaining the difference; however, all patients fasted on their surgery dates. When using the ratio of the concentrations of acylcarnitines to free carnitine (C0), group A consistently had greater ratios than did group B, indicating that the two distinct OA subphenotypes may be related to the carnitine metabolism pathway.

In humans, the major sources of carnitine (C0) are de novo synthesis and the diet. In the process of synthesis of carnitine, glycine can be released as a by-product;20 however, there were no significant differences between the two groups in the concentration of this metabolite. Carnitine acyltransferases are responsible for the production of acylcarnitines, and the value of the acylcarnitine/carnitine ratio can reflect their activity.20,21 In the present study, the ratio of acylcarnitine to carnitine in group A was significantly higher than that in group B, which may indicate that the activity of acyltransferases is higher in patients of group A than in those of group B. Carnitine and its acyl esters, the acylcarnitines, are essential compounds for the metabolism of fatty acids. Carnitine can assist in the transport and metabolism of fatty acyl-CoA from the cytosol to the mitochondrial matrix, where the enzymes of β-oxidation are located and fatty acids are oxidised as a major source of energy. Inside the mitochondria, carnitine and acyl-CoA are regenerated, and the latter is catabolised in two-carbon units by β-oxidation, with production of acetyl-CoA in normal circumstances. The acetyl groups are then converted to acetylcarnitine via the action of carnitine acetyltransferase for transport out of the mitochondria. In the patients of group A, both carnitine (C0) and acetylcarnitine (C2) were significantly lower than in patients of group B; however, the concentrations of the other acylcarnitines were all significantly higher in group A than in group B. This suggests that the activity of carnitine acetyltransferase is significantly lower in group B and there are differences in fatty acid metabolism between these two groups. Elevated acylcarnitine levels have been detected in obesity,22 type-2 diabetes,23 cardiovascular disease24 and encephalopathy.25

Group B can be classified into groups B1 and B2 based on the HCA. The significant metabolites that contribute to these subgroups are mainly glycerophospholipids, sphingolipids and amino acids.

The physiological significance of these kinds of metabolites in OA has been previously studied. Glycerophospholipids form the essential lipid bilayer of all biological membranes and are intimately involved in signal transduction, regulation of membrane trafficking and many other membrane-related phenomena.26,27 Studies by Hills28 indicated that alterations in phospholipid composition and concentrations are associated with the development of OA. Kosinska et al29 also found that in comparison with control SF, the levels of glycerophospholipids (five phosphatidylglycerol and two lysophosphatidylglycerol species) were all elevated in late OA by 3.6-fold. Our results are consistent with their findings. The concentrations of 24 glycerophospholipids in patients of group B2 were all significantly higher than those in patients of group B1, especially for the PC types.

Sphingolipids are a class of lipids that include ceramide species, sphingomyelins (SMs) and more complex glycosphingolipids, which are an important part of SF. Sphingolipids are structural components of plasma membranes and bioactive molecules that have significant functions in proliferation and growth as well as differentiation, cellular signal transduction and apoptosis in many mammalian cells, for instance, fibroblast-like synoviocytes and neural cells.30–33 Studies by Kosinska et al29 suggest that SM species had risen approximately twofold in SF from early OA to late OA. In our study, the concentrations of nine significant sphingolipids (6 SM, 3 SM(OH)) in group B2 were significantly higher than those in group B1 by about 1.7 fold.

In metabolic disorders, knowledge of the concentration of one amino acid or a related group of amino acids is essential for correct diagnosis. For example, the cellular energy metabolism assessed by amino acid profiling can be used for in-depth analysis of chronic fatigue syndrome.34 The branched-chain amino acids (BCAA), valine, isoleucine and leucine, are essential amino acids, accounting for 35% of the essential amino acids in muscle proteins. On the basis of our previous studies, we believe it is possible that an increased concentration of BCAA leads to an increased production of cytokines, which in turn leads to an increased rate of joint collagen degradation. An increased level of cytokines has been associated with OA.12,35 In the current study, some amino acids, including the three BCAAs, were elevated in group B1-2-2, which could indicate that the patients with OA in this group were more likely to be affected by collagen degradation. The proline, hydroxyproline and hydroxylysine levels are important indicators of connective tissue status.36 There were reports that proline incorporation into osteoarthritic cartilage was increased 4-fold as compared to normal cartilage.37

Age, sex, BMI and comorbidities could be potential confounders. However, we did not find the grouping based on metabolic profiling in the present study to be associated with any of these potential confounders. Nevertheless, patients in group B tended to have a higher prevalence of hypertension than patients in group A. Liu et al,38 using metabolomics analysis, showed significantly increased serum fatty acid levels in patients with hypertension, which is consistent with our results.

There are some caveats. First, this is a case-only study and we do not have SF samples from healthy people. Although the metabolic profiles showed clear diversity among patients with OA, metabolite concentrations in healthy people would provide a normal range so that we could distinguish which group has normal concentrations. Second, we do not have information on diet and drug use for the study participants, which might have an influence on metabolite concentrations; however, the Newfoundland population is an isolated population characterised by relative environmental homogeneity, and metabolite concentrations in SFs are less influenced by dietary intake. Third, we used a targeted metabolomics approach; thus, we might have missed important OA-associated metabolites which we were unable to measure. Lastly, our sample size was modest and a follow-up study with a larger sample size is required to verify the findings.

Conclusions


This is the first study to use a metabolomics approach to classify patients with OA, and it demonstrated that OA consists of metabolically distinct subgroups. While the findings need to be confirmed, the identification of these distinct subgroups will help to unravel the pathogenesis and develop targeted therapies for OA.

Acknowledgements


The authors thank all the study participants who made this study possible, and all the staff in the hospital operation theatres who helped us in the collection of samples.



  • Contributors GZ conceptualised and designed the study, oversaw the sample collection, data analyses, and contributed to the drafting of the final manuscript; WZ contributed to metabolic profiling, data analyses and manuscript drafting; YZ contributed to the sample collection and data analyses; PEH, EA-E, GM and AF contributed to the sample collection and critical comments on the final manuscript; SL and ER contributed to the metabolic profiling, data analyses and critical comments on the final manuscript; RG, PR and GS contributed to the data collection and critical comments on the final manuscript. All authors approved the final manuscript as submitted.

  • Funding Canadian Institutes of Health Research (CIHR) (grant number RNL-132178); Newfoundland and Labrador RDC (grant number 5404.1423.102); Memorial University of Newfoundland (MUN); EA-E is supported by the Dean's PhD fellowship, Faculty of Medicine, MUN; PEH is supported by the CIHR CGS-Master's award.

  • Competing interests None.

  • Patient consent Obtained.

  • Ethics approval The study was approved by the Health Research Ethics Authority (HREA) of Newfoundland and Labrador and written consent was obtained from all the participants.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
