Objective Reliable semiquantitative assessment of histological placental acute inflammation is problematic, even among experts. Tissue samples in histology slides often show variability in the extent and location of neutrophil infiltrates. We sought to determine whether the variability in pathologists' scoring of neutrophil infiltrates in the placenta could be reduced by the use of ‘regions of interest’ (ROIs) that break the sample into smaller components.
Design ROIs were identified within stained H&E slides from a cohort of 56 women. Each ROI was scored by at least two independent raters using a semiquantitative scale (0–4) for the average number of neutrophils.
Setting Preterm singleton births at Yale New Haven Hospital.
Participants This study used stained H&E placental slides from a cohort of 56 women with singleton pregnancies who had a clinically indicated amniocentesis within 24 hours of delivery.
Primary and secondary outcome measures Interrater agreement was assessed with the intraclass correlation coefficient (ICC) and log-linear regression. Predictive validity was assessed using amniotic fluid protein profile scores (neutrophil defensin-2, neutrophil defensin-1, calgranulin C and calgranulin A).
Results Excellent agreement by the ICC was found for the average neutrophil scores within a region of interest. Log-linear analyses suggest that even where there is disagreement, responses are positively associated along the diagonal. There was also strong evidence of predictive validity comparing pathologists' scores with amniotic fluid protein profile scores.
Conclusions Agreement among observers of semiquantitative neutrophil scoring through the use of digitised ROIs was demonstrated to be feasible with high reliability and validity.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This study assessed reliability and validity of semiquantitative histology scoring for histological placental acute inflammation.
Agreement was assessed using intraclass correlation coefficients as well as log-linear models.
Additional studies are needed to assess this methodology in placentas that are <20 weeks or >37 weeks gestation.
Methodology to combine individual scores from regions of interest into a single summary score has not been developed, thus limiting the utility of this method.
Intra-amniotic infection and inflammation are major risk factors for preterm birth and contribute to the development of significant childhood diseases, including cerebral palsy and upper and lower respiratory tract diseases.1–6 As such, their diagnosis needs to be both reproducible (in the same patient and across patients and institutions) and valid (consistently predictive of important clinical features of infection including severity, duration and risk of sequelae, such as neonatal sepsis or longer term outcomes). Three methods have been used to determine the presence and quantify the degree of intra-amniotic infection: (1) clinical assessment by the obstetrician,7 (2) assay of the levels of inflammatory mediators (‘cytokines’) in maternal serum,8 amniotic fluid8 ,9 or fetal umbilical cord blood8 and (3) placental histology.10 However, there are important differences among these methods. Clinical assessment has poor sensitivity and specificity,11 leading to missed cases that might benefit from treatment, as well as misclassification of healthy pregnancies. While perhaps more sensitive and specific than clinical assessment, cytokine measures from any source are technically measures of inflammation as opposed to infection. Importantly, inflammation can result from multiple pathways, one of which is infection.10 Assessment of histological placental acute inflammation depends on the assessment of neutrophil numbers in H&E-stained slides of samples of extraplacental membranes, chorionic plate and fetal chorionic vessels and umbilical cord.12 ,13 This method has been treated as the gold standard for placental assessment and uses a semiquantitative scoring system (eg, 0, 1, 2 and 3; 0, 1, 2, 3 and 4).13
Unfortunately, current diagnostic placental histology ‘gold standards’ are not reliable, nor have they been validated against biologically valid endpoints such as amniotic fluid or cord blood proteomics. A recent publication illuminates this issue. A panel of six reviewers was provided 20 histology slides, 14 of which had lesions related to acute infection. The slides were discussed, but a consensus scoring could not be obtained for six of the slides. These six slides were swapped out for six slides with similar lesions for which such consensus could be obtained. Then the same 20 slides were circulated among pathologists for independent scoring. Kappa values were as follows: acute chorioamnionitis/maternal inflammatory response (any 0.93; severe 0.76 and advanced stage 0.49); chronic (subacute) chorioamnionitis (0.25) and acute chorioamnionitis/fetal inflammatory response (any 0.90; severe 0.55 and advanced stage 0.52). By their own criteria, kappa values for anything other than a ‘present/absent’ code (ie, any semiquantitative score) indicated only fair-to-moderate agreement.14 Reliable determination of the presence and the quantity of inflammation may assist in a more valid prediction of risk.15 In addition, accurate and reliable information regarding the severity of the fetal inflammatory response in the context of intra-amniotic infection may assist in newborn care. The disappointing reproducibility of semiquantitative histology scoring, even among experts, has compromised clinical usefulness and limited its value in research as well as clinical evaluation.
In this manuscript, we describe the results of an approach to pathologist assessment that markedly decreases scoring variability, a first step in achieving improved semiquantitative assessment in this field. In preparation for development of an automated algorithm to detect neutrophils using image analysis software, we first digitised slides and used image segmentation software to create smaller ‘regions of interest’ (ROIs) for application of the algorithm. In so doing, a single slide was subdivided into multiple ROIs. We then sought to estimate agreement between (human) raters when comparing ratings for a single ROI rather than a single rating for an entire ‘case’. Our results suggest that reliability can be improved by this simple step alone. As will be described in the Methods section, the criteria used to define an ROI may relate to the improvement in reliability. It also may be that agreement is enhanced when the field of view for the assessment is limited, as occurs with evaluation of a single ROI rather than evaluation of a full slide (comprising multiple ROIs that are not circumscribed), regardless of the criteria for ROI selection. We describe in more detail below the method used, the cases and whether reliability appeared to vary with case or tissue characteristics.
All women presented with symptoms of preterm labour or preterm premature rupture of membranes. Eligible subjects for this substudy met the following criterion: singleton fetus at <37 weeks gestational age at delivery with a clinically indicated amniocentesis to rule out intra-amniotic infection performed within 24 hours of delivery. Exclusion criteria included anhydramnios, HIV or hepatitis infections and non-reassuring fetal status. Gestational age was established based on an ultrasonographic examination before 20 weeks of gestation.
Amniocentesis for microbiological studies and for evaluation of the inflammatory status of the amniotic fluid was offered routinely. Amniocentesis was performed using sterile technique and ultrasound guidance. Protein profiling of the amniotic fluid was used to detect a set of four protein ‘markers’ closely associated with intra-amniotic inflammation, and a score based on those proteins, termed the amniotic fluid mass restricted (AFMR) score, was derived. The AFMR score was generated immediately from the fresh amniotic fluid using a single surface-enhanced laser desorption ionisation time-of-flight mass spectrometry instrument. The AFMR score provides qualitative information regarding intra-amniotic inflammation. It ranges from 0 to 4, depending on the presence or absence of each of four protein biomarkers (neutrophil defensin-2, neutrophil defensin-1, calgranulin C and calgranulin A). A categorical value of 1 is assigned if a particular peak is present and 0 if absent.16 In the current investigation, we also stratified the study population based on the ‘severity’ of inflammation (AFMR=0 indicates ‘no’ inflammation; AFMR=1–2 indicates ‘minimal’ inflammation and AFMR=3–4 indicates ‘severe’ inflammation16 ,17).
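The scoring rule described above (one point per detected marker peak, then a three-level severity grouping) can be sketched as follows. This is a minimal illustration only: the function names and the dictionary input are hypothetical, and peak detection from the mass spectrometry trace is not modelled.

```python
# Marker names are taken from the text; everything else here is an
# illustrative assumption, not the authors' software.
MARKERS = (
    "neutrophil defensin-2",
    "neutrophil defensin-1",
    "calgranulin C",
    "calgranulin A",
)


def afmr_score(peaks_present):
    """Return the AFMR score: 1 point for each marker peak that is present.

    peaks_present maps a marker name to True/False; missing markers count as absent.
    """
    return sum(1 for marker in MARKERS if peaks_present.get(marker, False))


def afmr_severity(score):
    """Map an AFMR score (0-4) to the severity strata used in this study."""
    if not 0 <= score <= 4:
        raise ValueError("AFMR score must be between 0 and 4")
    if score == 0:
        return "no"
    if score <= 2:
        return "minimal"
    return "severe"


example = {"neutrophil defensin-2": True, "calgranulin A": True}
print(afmr_score(example), afmr_severity(afmr_score(example)))
```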
In addition, histological evaluation of the placenta is performed routinely in women who deliver prematurely. In all cases, a membrane roll extending from the area of membrane rupture to the placental margin and samples of chorionic plate with at least two chorionic vessels per sample were taken. Sections of tissue blocks were stained with H&E and digitised using an Aperio XT slide digitiser (Aperio, Vista, California, USA). This data set was limited to those cases in which amniocentesis was performed within 24 hours of delivery in order to allow optimal correlation between the AFMR score and histopathology findings. From the 56 cases, digitised slide files were reviewed by a research associate trained to capture ROIs at 20x magnification in the requisite tissues. Criteria for selection of an ROI were specific to the tissue type and are listed below.
Maternal extraplacental membranes: viable (containing appropriately basophilic nuclei) areas that appeared to be cut perpendicular (non-tangential) to the membrane plane, which yielded ROIs with valid and consistent samples of decidua, chorion and amnion.
Chorionic plate, maternal side: regions of chorionic plate with subchorionic fibrin <50% of the width of the chorionic plate connective tissues and without chorionic vessels intervening between the maternal intervillous blood space and the chorionic plate surface.
Umbilical cord vessels: all umbilical vessels including the portion of each vessel lumen with the shortest distance between the lumen and the umbilical cord surface.
Chorionic plate vessel, fetal side: included the shortest distance from the chorionic surface to the endothelium of chorionic vessels in the chorionic plate.
From the 56 cases, we collected a total of 1591 ROIs. In order to ensure each tissue type was represented, we stratified the sample of ROIs by tissue type (maternal extraplacental membranes (n=713); chorionic plate, maternal side (n=124); chorionic plate, fetal side (n=109) and umbilical cord (n=645)). For each of these four placental tissue types (maternal extraplacental membranes; chorionic plate, maternal side; chorionic plate vessel, fetal side and umbilical cord), we selected all available ROIs or a random subsample (where the numbers were larger) to be assessed by at least two and in some cases all three pathologists. For chorionic plate and fetal chorionic plate vessels, all ROIs were selected. For maternal extraplacental membrane ROIs, we stratified the ROIs on the AFMR score (0, 1, 2, 3 and 4) and randomly selected 70% of each stratum for random assignment to two pathologists each. This stratification decreased the number of ROIs that had to be reviewed, but still ensured variation in inflammation. The same method was used for the umbilical cord vessel ROIs. Of the 1591 ROIs available, 1051 were included in this analysis. The distribution of the included ROIs is as follows: maternal extraplacental membranes (n=448); chorionic plate, maternal side (n=119); chorionic plate, fetal side (n=73) and umbilical cord (n=412).
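The stratified subsampling described above (grouping ROIs by AFMR score and drawing 70% of each stratum at random) could be sketched roughly as below. The data layout, the rounding rule for the stratum sample size and the fixed seed are assumptions made for illustration; the source does not state how fractional counts were handled.

```python
import random
from collections import defaultdict


def stratified_sample(rois, fraction=0.70, seed=0):
    """Randomly select a fixed fraction of ROIs within each AFMR stratum.

    rois: list of (roi_id, afmr_score) tuples (hypothetical layout).
    Returns the list of selected roi_ids.
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    strata = defaultdict(list)
    for roi_id, afmr in rois:
        strata[afmr].append(roi_id)

    selected = []
    for afmr in sorted(strata):
        members = strata[afmr]
        k = round(fraction * len(members))  # assumed rounding rule
        selected.extend(rng.sample(members, k))
    return selected


# Toy data: 100 ROIs spread evenly over AFMR scores 0-4.
toy_rois = [(i, i % 5) for i in range(100)]
chosen = stratified_sample(toy_rois)
print(len(chosen))
```

Sampling within strata, rather than from the pooled ROIs, keeps every level of inflammation represented in the reduced set.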
The selected ROIs were randomly assigned to each of three pathologists, who had never practiced together at the same institution, with blinding as to the case of origin. Each pathologist was then also assigned half of each of the other two pathologists' ROIs. Therefore, for each ROI selected, there were at least two pathologists providing scores. In a small sample, all three pathologists provided scores as one of us (CS) reviewed additional ROIs randomly selected by the epidemiologist from those reviewed by the other two pathologists. In total, each pathologist reviewed ∼600–650 ROIs over a period of 4 weeks.
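One possible reading of this assignment scheme is sketched below: ROIs are shuffled, dealt to a primary pathologist, and each primary pile is then split evenly between the other two pathologists for a second read. The function name, rater indices and seed are hypothetical, and blinding and the additional third reads are not modelled.

```python
import random


def assign_raters(roi_ids, n_raters=3, seed=0):
    """Give every ROI a primary and a secondary rater (indices 0..n_raters-1)."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    shuffled = list(roi_ids)
    rng.shuffle(shuffled)

    assignment = {}
    for idx, roi in enumerate(shuffled):
        primary = idx % n_raters
        others = [r for r in range(n_raters) if r != primary]
        # Alternate the second reader so each of the other raters
        # receives about half of this primary rater's pile.
        secondary = others[(idx // n_raters) % len(others)]
        assignment[roi] = (primary, secondary)
    return assignment


pairs = assign_raters(range(12))
workload = {r: sum(r in raters for raters in pairs.values()) for r in range(3)}
print(workload)
```

Under this reading each rater scores roughly two-thirds of the selected ROIs, and every ROI is scored by two different raters.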
ROIs were scored using a semiquantitative scale for the average number of neutrophils in tissues. For example, neutrophils in the extraplacental amnion originate in the decidua, while fetal neutrophils in the Wharton's jelly outside the umbilical vessels originate in the vessel lumens. Distance migrated from the site of origin, as proxied by the specific tissues in which maternal or fetal neutrophils are identified, may be an independent reflection of infection duration.18 The scale for determining average number of neutrophils was very simply cast; ‘0’ indicated ‘no cells identified as neutrophils’, ‘4’ reflected ‘too many neutrophils to count’ and grades 1, 2 and 3 were left to the pathologist's judgement to partition into quartiles of neutrophil numbers. The ROIs were retained as .svs format, so that each ROI could be operated by ImageScope to move from 2 to 20x magnification across the ROI. Magnification and scanning were left to the judgement of each pathologist. Customised data entry screens were used for pathologists' entry of scores for each ROI so that errors from direct entry into spreadsheets would be eliminated. While the pathologists were given no special training for this scoring project, two (LE and CS) are board certified by the American Board of Pathology in paediatric pathology while the other (AC) is similarly board certified by the Royal Colleges of Pathology in the UK and Australia.
We used the intraclass correlation coefficient (ICC)19 ,20 to compare the semiquantitative scores by each pair, and in some cases, trio of pathologists on average. ICCs measure agreement between raters allowing for more than two raters and more than two categories of classification. Kappa statistics are a subset of ICC in which there are two raters and dichotomous classification and therefore are not applicable here. In addition, log-linear modelling was used to describe the pattern of agreement between each pair of raters. A series of increasingly complex log-linear regression models was fitted to the data with the goal of identifying the best fitting model for each pair of raters' scores. Each model is described in detail elsewhere.21 ,22 In short, each model describes a different pattern of agreement (independence, diagonal, linear by linear, diagonal plus linear by linear, triangular and quasi-independence). The model of independence is the simplest model. A good fit of this model to the data suggests that the response of one pathologist was not related to the response of another pathologist. Diagonal agreement indicates exact agreement or agreement along the main diagonal. Linear by linear association indicates a positive association between two pathologists' scores. Diagonal plus linear by linear association indicates the presence of exact agreement (ie, agreement along the main diagonal) and a tendency for discordant respondent pairs to be positively associated. Other models include triangular agreement and quasi-independence. The model that best describes the type and amount of agreement present was selected by determining the best fitting and most parsimonious model.21 ,22 The deviance (G2) of each increasingly complex model was compared using the likelihood ratio test. If the likelihood ratio test was non-significant (ie, a more complex model did not significantly improve model fit), the more parsimonious model was considered to best describe the pattern of agreement.
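To make the ICC concrete, the sketch below computes a one-way random-effects, single-rating ICC from a balanced set of scores. The source does not state which ICC form was used, so the one-way form is an assumption chosen for simplicity, and the function name and input layout are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings (assumed form).

    ratings: list of per-ROI score lists, each holding the same number k
    of scores (one per rater who scored that ROI).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need at least 2 ROIs, each with the same k >= 2 scores")

    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    roi_means = [sum(r) / k for r in ratings]

    # Between-ROI and within-ROI mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in roi_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, roi_means) for x in r
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy scores on the 0-4 scale for three ROIs, two raters each.
print(icc_oneway([[0, 1], [2, 2], [4, 3]]))
```

The value approaches 1 when raters give the same score to each ROI and falls as within-ROI disagreement grows relative to between-ROI spread. The function is undefined (division by zero) if every score in the data set is identical.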
Figure 1 is a graphic representation of each pair of reviewers' scores for an ROI. The size of the circle is directly related to the number of ROIs represented by that circle. The larger the circle, the more ROIs that are represented. The largest circles are generally along the diagonal, where there is perfect agreement between a pair of reviewers or just off the diagonal where the pairs differ by a single number (eg, one reviewer gives a score of 2 and the second reviewer gives a score of 3). The ICCs for average score were computed overall as well as within strata by tissue type and gestational age category (table 1). For the average scores, the ICCs were excellent and appeared to vary little across most tissue types. The ICC for the chorionic plate vessel (fetal inflammatory response) was lower than for the other categories but was still good (ICC>0.75). The ICCs were also high for gestational ages of 20 or more weeks. There was a notable decrease in the ICC for the placentas at 20 weeks gestation or earlier (ICC=0.45).
In spite of the lower agreement for the few subgroups described above, log-linear models comparing the scores suggested that the average scores of each pair of raters were positively associated or had exact agreement between the raters along with positively associated responses for discordant pairs (table 2). Comparisons of the average scores suggested that there was linear by linear association between raters 1 and 2. When the scores from raters 1 and 3 as well as the scores from raters 2 and 3 were compared, we found diagonal agreement plus linear by linear association. In other words, there was exact agreement between the raters, but when there was disagreement, discordant pairs were still positively associated.
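As an illustration of the deviance used to compare these models, the sketch below computes G2 for the simplest model, independence, from a cross-tabulation of two raters' scores. Only this model has closed-form expected counts; the diagonal and linear-by-linear models require iterative fitting in statistical software and are not shown. The function name and the toy tables are assumptions.

```python
import math


def independence_g2(table):
    """Deviance (G2) and degrees of freedom for the independence model.

    table: list of rows of counts; rows are rater A's scores and
    columns are rater B's scores. Assumes no empty row or column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to G2
                expected = row_totals[i] * col_totals[j] / n
                g2 += 2.0 * observed * math.log(observed / expected)

    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df


# Toy 2x2 table in which the raters always agree: independence fits poorly.
print(independence_g2([[20, 0], [0, 20]]))
```

A large G2 relative to its degrees of freedom indicates that independence fits poorly, which is the starting point for adding diagonal or linear-by-linear terms and testing the drop in G2 with the likelihood ratio test.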
The distribution of the AFMR scores (reported by case but assigned to the relevant ROI for the case) by the ROI histology score is presented in figure 2. We compared the AFMR scores with each pathologist's average score for an ROI. Associations between the pathologists' scores and the AFMR scores were consistent across pathologists. The χ2 test was highly significant for all three pathologists when compared with AFMR scores (p<0.0001) as was the test for a linear-by-linear association (p<0.0001). Approximately 20% of the ROIs and the AFMR scores were in perfect agreement. For the majority of the ROIs, each scored in isolation and without knowledge of the scores of other ROIs from the same case, the AFMR score (which measures a global process) was higher than the score given by the pathologist, who in this study design necessarily scored the case piecemeal. Approximately 40% of all pairs were just one category higher (eg, 3 instead of 2).
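The agreement proportions quoted above (exact agreement, AFMR higher, AFMR one category higher) can be tallied from paired scores as in the sketch below. The function name and the toy pairs are purely illustrative and are not study data.

```python
from collections import Counter


def compare_scores(pairs):
    """Summarise how case-level AFMR scores compare with ROI histology scores.

    pairs: iterable of (afmr_score, roi_score) tuples, both on the 0-4 scale.
    Returns proportions keyed by the type of (dis)agreement.
    """
    differences = Counter(afmr - roi for afmr, roi in pairs)
    n = sum(differences.values())
    return {
        "exact": differences[0] / n,
        "afmr_higher_by_one": differences[1] / n,
        "afmr_higher": sum(c for d, c in differences.items() if d > 0) / n,
    }


# Made-up pairs for illustration only.
toy_pairs = [(2, 2), (3, 2), (4, 2), (1, 2), (3, 3)]
print(compare_scores(toy_pairs))
```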
This study found that by reducing the visual field to a smaller ROI, we can dramatically improve the inter-rater reliability of pathologists for a semiquantitative score of maternal and fetal neutrophil infiltrates. In addition, we demonstrated linear-by-linear agreement between the case AFMR scores and the pathologist's scores of individual ROIs. Together, these results suggest that the lack of reliability using traditional scoring methods likely originates in the mental summation that is necessary when viewing an entire slide as opposed to an individual ROI.
The morbidity and mortality associated with fetal inflammation demand the development of measures that can provide reliable and precise semiquantitative histological measurement of the maternal and fetal inflammatory responses to acute intra-amniotic infection.23 ,24 The fetal inflammatory response, defined as elevated levels of inflammatory cytokines in cord blood and by vasculitis in the umbilical and chorionic vessels of the placenta, predicts recurrence risk for preterm birth25 ,26 as well as risks of death,27 cerebral palsy,4 childhood asthma and lung damage more generally.2 ,3 Cord blood cytokine levels are highly correlated with fetal neutrophil infiltration of umbilical and chorionic plate fetal vessels.28
We have identified a clinically feasible method to decrease variability in pathologist assessment. However, the present methodology is limited in that we and others have not yet determined the best method to combine ROI scores into a single summary score that is representative of clinically meaningful outcomes. As such, a next step will necessarily require the development of some sort of ‘weighted average’; our data suggest that one consideration in developing a ‘case score’ may be the number of ROIs with neutrophils, as well as the number and score of neutrophils in each ROI. Understanding the relationships between scores on the different tissues and different individuals (maternal vs fetal inflammation) and proteomics for the case will likely help us to refine the pathologists' assessments and future efforts to automate scoring with algorithms using the digitised data.
The utility of the findings is also limited by the low ICC in placentas that are <20 weeks gestation. It is unclear why the ICC is lower for this group than for later gestational ages. However, few placentas contributed to the total number of ROIs in this group. Finally, although more than 1000 ROIs were assessed, relatively few placentas were used in this study and all of them were preterm. As such, it is unclear how population variability contributed to the findings. Additional studies will be needed to examine the impact of later gestational ages (term births) on the ICC.
In summary, current histological tools show excellent reproducibility only when a ‘present/absent’ categorisation of the complex physiology of histological placental acute inflammation is used, even among ‘experts’. We have demonstrated that the reliability of semiquantitative scores of numbers of neutrophils in tissue infiltrates can be improved to an acceptable level by simply digitising the slide and limiting the field of view for the pathologist. We suggest that scoring variability is based in the natural heterogeneity of cells and tissue samples and the current requirement of scoring whereby the pathologist must mentally sum the whole slide, with all its variability, and provide a single score across one or several tissue samples. Furthermore, we report an association between pathologist assessment and proteomics scores that bodes well for future developments in measurement in this field. In future work, we will further examine the predictive validity of these semiquantitative scores (eg, analysing the patterns of agreement with proteomics using all ROI scores within a case) as well as refine image-processing algorithms that should provide reliable quantification for histological assessment of placental acute inflammation (ie, neutrophil number in placental tissue). Our goal is to take these tools and validate them against expert pathologists and more concrete and biologically meaningful endpoints germane to considerations of maternal, fetal–neonatal and childhood health.
Contributors DM and CS conceived and coordinated the study. JKS, DM and CS participated in the design of the study and drafted the manuscript. LE, CS and AC reviewed the slides and quantified the histological parameters of amniotic fluid infection. SV prepared the slides. DM, SG, GD and JKS conceptualised and performed the statistical analysis. IB, CB and CS contributed data, reagents and materials to the study. All authors read and approved the final manuscript.
Funding This work was supported by a Small Business Innovative Research (SBIR) grant, ‘Placental Pathology: Digital Assessment and Validation’, from the National Institutes of Health (CS, grant number R43HD062307-01).
Disclaimer The funding source had no role in the writing of this manuscript.
Competing interests None declared.
Ethics approval This study used anonymised data and slides from 56 women recruited as part of a larger study between May 2004 and January 2007 during their presentation for delivery at Yale New Haven Hospital. That study was approved by the Yale University Institutional Review Board. The present study used only anonymised data and slides. No contact with study participants occurred. As such, the present study was exempt.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Data from this study were generated as part of SBIR R43HD062307-01; the PI of this grant may be contacted at email@example.com.