Original research
Retrospective analysis to describe trends in first-ever prostate-specific antigen (PSA) testing for primary healthcare facilities in the Gauteng Province, South Africa, between 2006 and 2016
  1. Naseem Cassim1,2,
  2. Timothy R Rebbeck3,
  3. Deborah K Glencross1,2,
  4. Jaya A George2,4
  1. 1 Faculty of Health Sciences, Department of Molecular Medicine and Haematology, University of the Witwatersrand, Johannesburg-Braamfontein, Gauteng, South Africa
  2. 2 Department of Molecular Medicine and Haematology, National Health Laboratory Service, Johannesburg, Gauteng, South Africa
  3. 3 Dana Farber Cancer Institute, Harvard TH Chan School of Public Health, Harvard University, Cambridge, Massachusetts, USA
  4. 4 Department of Chemical Pathology, Faculty of Health Sciences, University of Witwatersrand, Johannesburg, South Africa
  1. Correspondence to Dr Naseem Cassim; naseem.cassim{at}


Objectives The objective of our study was to use laboratory data to describe prostate-specific antigen (PSA) testing trends for primary healthcare (PHC) services from a single province. PHC is a basic package of services offered to local communities, serving as the first point of contact within the health system. These services are offered at clinics and community health centres (CHC), the latter providing additional maternity, accident and emergency services.

Design The retrospective descriptive study design was used.

Methods We analysed national laboratory data between 2006 and 2016 for men ≥30 years in the Gauteng Province. We used a probabilistic matching algorithm to create a first-ever PSA cohort. We used hot-deck imputation to assign missing race group values and the district health information system facility descriptors to identify PHC testing. We reported patient numbers by calendar year, age category and race group as well as descriptive statistics. We used multivariable logistic regression to assess any association of race group and age with a PSA ≥4 µg/L.

Results Between 2006 and 2016, the number of men tested per year increased from 1782 to 67 025, and 186 984/239 506 (78.1%) tests were from clinics. The majority of testing was for men in the 50–59 age category (31.5%) and Black Africans (86.4%). We reported a median of 0.9 µg/L that increased with age. A PSA ≥4 µg/L was reported for 11.7% of men, increasing to 35.5% for the ≥70 age category. The logistic regression reported that the adjusted odds of having a PSA ≥4 µg/L were significantly lower for Indian/Asians and whites than for Black Africans (p value<0.0001).

Conclusions Our study has shown a marked increase in PSA testing from clinics and CHC suggestive of screening for prostate cancer. The approaches reported in this study can be extended for national data.

  • prostate disease
  • chemical pathology
  • primary care

Data availability statement

No data are available. The authors do not have permission to share the data.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • This study reported data for 239 506 men presenting for a first-ever prostate-specific antigen (PSA) test at primary healthcare facilities over an 11-year period.

  • This study demonstrated that the categorisation of men with a first-ever PSA result as low (<10 µg/L), intermediate (10–19.9 µg/L) and high (≥20 µg/L) risk can inform clinical practice.

  • This study describes presentation at the primary healthcare level in a developing African country.

  • Due to the absence of a national unique patient identifier, this study used the probabilistic matching algorithm to deduplicate patients.

  • Due to the paucity of race group data, the hot-deck imputation algorithm was used.


Prostate cancer (PCa) ranks as the second most frequent neoplasm and the fifth leading cause of cancer death in men globally.1 PCa was the most common male neoplasm in South Africa in 2012.2 The 2018 global burden of cancer report indicated that PCa incidence is highest in developed countries such as Australia, New Zealand and North America.1 PCa was reported to be the leading cause of death among black men, particularly in Sub-Saharan Africa (SSA) and the Caribbean.1 Globally, it is estimated that there will be 1.3 million new cases of PCa and about 359 000 deaths in 2018.1 A meta-analysis of PCa incidence rates across Africa by Adeloye et al reported an estimated pooled PCa incidence rate of 22.0 (95% CI 19.93 to 23.97) per 100 000 population.3 This study reported that the incidence rate is higher in South Africa with rates ranging from 47.7 to 67.8 per 100 000 population.3 In 2014, the local cancer registry reported a PCa incidence of 43.7 per 100 000 population, which is lower than estimates provided by Adeloye et al.3 4 This incidence is expected to increase with a concomitant improvement in life expectancy.

The high incidence in African populations may be due to a genetic predisposition to PCa, specifically in relation to race group and family history.5 6 PCa is much more common among men of African descent, who also have a more advanced stage of disease at presentation.7

Several local studies conducted in urology settings have evaluated prostate-specific antigen (PSA) levels, specifically for PCa cases at hospitals, and have shown that African men present late to hospitals with very high PSA levels.8–10 PSA is a glycoprotein having a close structural relationship to the glandular kallikreins, functioning as a serine proteinase.11 PSA levels may be elevated in benign prostatic hypertrophy as well as in PCa.5 Despite these limitations, PSA testing has improved the early detection of PCa for many countries.12 Schroder et al reported that PSA-based screening reduced PCa mortality by 20% but was associated with a high risk of over diagnosis.12 This requires that healthcare workers carefully identify patient cohorts that would benefit most from individualised early diagnosis, taking into account the potential benefits and harms involved.5 However, as an independent variable, PSA is a better predictor of PCa than either digital rectal examination (DRE) or transrectal ultrasound.5 PSA may also be used to describe risk. For example, the European Association of Urology guidelines define risk groups for the biochemical recurrence of localised and locally advanced PCa, with a PSA ≥20 µg/L classified as high risk.5

A local study that assessed PSA testing for Black Africans showed that cases presented with advanced disease, as evidenced by a median PSA of 98.8 µg/L compared with 9.1 µg/L for controls.9 Similar findings have been reported by other investigators.8 10 For example, a study from the KwaZulu-Natal Province reported that Black African men (n=81) diagnosed with PCa had a median PSA of 154 µg/L (IQR 39–448) at presentation.10 A similar study in the Western Cape Province (n=901) for patients diagnosed with PCa reported that the mean PSA was significantly higher for Black Africans (766.1 µg/L) compared with 673.3 µg/L and 196.1 µg/L for multiracials and whites, respectively.8 On the other hand, men presenting at primary healthcare (PHC) facilities have been shown to have much lower PSA levels.13

South Africa does not have a national screening policy and there is a paucity of data on PSA testing trends from PHC services, which act as a gatekeeper to higher levels of care.14 These services are offered by clinics and community health centres (CHC).15 Clinics offer a range of basic services for 8 hours compared with CHC, which, in addition, offer 24-hour maternity, accident and emergency services.15 The objective of our study was to describe first-ever PSA testing trends for PHC services in the Gauteng Province. These findings would contribute important information to a country with high rates of PCa incidence and mortality.


Study design

We used a retrospective descriptive study design with convenience sampling. Our inclusion criteria were tests requested by PHC facilities in the Gauteng Province, South Africa, and a patient age ≥30 years.

Data extract

Data were extracted for the 2006 to 2016 period from national laboratory data, provided as password-protected data files.

Data preparation

The PSA data extract included the following variables: (1) episode number, (2) age, (3) race group, (4) facility description, (5) reviewed date of the PSA result, (6) unique patient identifier (UPI) (for the patient), (7) numeric PSA result and (8) non-numeric PSA result. The data were extracted from the corporate data warehouse (CDW), which houses all laboratory data for public-sector testing in South Africa.

Data were prepared using Microsoft Access and Excel (Redmond, Washington) and analysed using SAS V.9.4 (Cary, North Carolina).16 17 Age, gender and race group are captured from details provided on the laboratory request form by healthcare workers. We classified data into the following age categories: (1) 30–39, (2) 40–49, (3) 50–59, (4) 60–69, (5) ≥70 and (6) Unknown.
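The age categorisation described above can be sketched in a few lines of Python. This is an illustration only, not the Microsoft Access or SAS logic the study used; the function name `age_category`, the ASCII category labels and the `"Excluded"` label for ages below the ≥30 inclusion criterion are assumptions made for the example.

```python
from typing import Optional


def age_category(age: Optional[int]) -> str:
    """Map an age in completed years to the study's age categories.

    Hypothetical helper: the labels mirror the categories listed in the
    text (30-39, 40-49, 50-59, 60-69, >=70, Unknown).
    """
    if age is None:
        return "Unknown"      # age not captured on the request form
    if age < 30:
        return "Excluded"     # assumption: outside the >=30 inclusion criterion
    if age >= 70:
        return ">=70"
    lower = (age // 10) * 10  # start of the decade band, e.g. 47 -> 40
    return f"{lower}-{lower + 9}"
```

For instance, `age_category(47)` falls in the `"40-49"` band, and a missing age maps to `"Unknown"`.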

We converted PSA values both below and above the analytical range reported as text values to numbers, eg, <0.01 and >100.0 reported as 0.01 and 100.00 µg/L, respectively.11 We used the district health information system organisational hierarchy to identify testing by PHC facilities, which are the first point of contact between the population and the health system.14 These PHC services are offered by clinics and CHCs.18 Clinics offer a range of basic services for 8 hours compared with CHC, which, in addition, offer 24-hour maternity, accident and emergency services.18
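The conversion of out-of-range text results to numbers, described at the start of the preceding paragraph, might look like the following minimal sketch. The function name `parse_psa` is hypothetical, and the sketch assumes that a text result is simply a number optionally prefixed with `<` or `>`; the actual extract may contain other formats.

```python
def parse_psa(raw: str) -> float:
    """Convert a reported PSA result (in µg/L) to a number.

    Results outside the analytical range arrive as text such as "<0.01"
    or ">100.0"; following the approach in the text, the comparison sign
    is dropped and the limit itself is used as the value.
    """
    text = raw.strip()
    if text[:1] in ("<", ">"):
        text = text[1:].strip()  # keep only the numeric limit
    return float(text)           # raises ValueError for non-numeric text
```

Under these assumptions `parse_psa("<0.01")` gives 0.01 and `parse_psa(">100.0")` gives 100.0, matching the example in the text.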

Race group was identified as an important PCa risk factor, with higher incidence and late presentation reported specifically for Black Africans.8–10 Because race group was poorly recorded in our study (94.5% missing), we opted to use a locally developed hot-deck imputation method, described in an unpublished study, to assign missing data.19

We used the Statistics South Africa 2011 Census reported race groups: (1) Indian/Asian, (2) Black African, (3) Coloured and (4) White.20 We used these race group descriptions, except for ‘Coloured’, which is described here as ‘Multiracial’.20 We used the CDW-developed UPI, which is generated by a probabilistic matching algorithm that includes fuzzy logic matching.21 This algorithm uses the patient’s first name, last name, date of birth, gender and hospital folder number for matching.22 We used this algorithm to generate a deduplicated PSA presentation (first-ever result) cohort. This cohort consists of patients who presented for care or were screened for PCa, using only the earliest recorded PSA result. Throughout this manuscript, any reference to patients is to men. Bassett et al have shown that, using the CDW UPI, they were able to identify a cohort of patients with HIV transferred from a hospital to PHC facilities with 90% accuracy.23
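Once each test carries a UPI, building the first-ever cohort reduces to keeping the earliest result per identifier. The sketch below illustrates that step only; it does not reproduce the CDW probabilistic matching itself. The field names `upi`, `reviewed_date` and `psa`, and the sample records, are hypothetical.

```python
from datetime import date


def first_ever_results(records):
    """Keep only the earliest PSA record for each unique patient identifier.

    `records` is an iterable of dicts with the hypothetical keys
    'upi', 'reviewed_date' (a datetime.date) and 'psa'.
    """
    earliest = {}
    for rec in records:
        current = earliest.get(rec["upi"])
        if current is None or rec["reviewed_date"] < current["reviewed_date"]:
            earliest[rec["upi"]] = rec
    return list(earliest.values())


# Made-up records: patient A was tested twice, patient B once.
records = [
    {"upi": "A", "reviewed_date": date(2014, 6, 1), "psa": 2.4},
    {"upi": "A", "reviewed_date": date(2010, 3, 15), "psa": 1.2},
    {"upi": "B", "reviewed_date": date(2015, 9, 30), "psa": 0.8},
]
cohort = first_ever_results(records)  # one record per patient
```

Here the 2010 result is retained for patient A, so later repeat tests do not inflate the cohort.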

Statistical methods

We reported the number of men who had a first-ever PSA per year, together with the percentage year on year increase. As lower test volumes were reported in 2012 and 2013, the calculation for 2014 was based on 2011 numbers. Annual test volumes were reported for clinics and CHC, together with the annual PSA median and 75th percentile. We reported the number of patients tested by year for both age category and race group, with the fold increase reported between 2006 and 2016. The inset reported total volumes, percentage contribution as well as the fold increase. We used the Kruskal-Wallis rank test to assess whether there was a statistically significant difference in the PSA between 2006 and 2016, reporting the p value. We also ran Dunn’s test for pairwise comparison by year. We reported the first-ever PSA median and IQR for age category and race group. The proportion of patients with a first-ever PSA at or above the 75th percentile value was also reported by age category and race group. We also reported the proportion of men with a first-ever PSA result in three risk categories stipulated by urological guidelines (low: <10 µg/L, intermediate: 10–19.9 µg/L and high: ≥20 µg/L) as well as for a PSA ≥4 µg/L.24 We used the Kruskal-Wallis rank test to assess whether there was a statistically significant difference in the PSA by age category and race group. We also ran Dunn’s test for pairwise comparisons. We used multivariable logistic regression to determine the risk factors associated with a PSA ≥4 µg/L (binary dependent variable), with age and race group as independent variables. We excluded the 30–39 age category from this analysis because local guidelines do not recommend PSA testing below 40 years of age.24 25 We reported the adjusted OR (aOR), p values and a 95% CI. A p value of less than 0.05 was used to indicate a significant association between the categorical variables of interest and a PSA ≥4 µg/L.
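The risk categories and the ≥4 µg/L threshold named above translate directly into two small classification rules. The following sketch uses hypothetical function names and assumes PSA values have already been converted to numbers in µg/L.

```python
def risk_category(psa: float) -> str:
    """Classify a first-ever PSA result (µg/L) into the three risk groups
    given in the text: low <10, intermediate 10-19.9, high >=20."""
    if psa < 10:
        return "low"
    if psa < 20:
        return "intermediate"
    return "high"


def is_elevated(psa: float) -> bool:
    """Binary outcome used for the logistic regression: PSA >= 4 µg/L."""
    return psa >= 4
```

A result of 4.0 µg/L is therefore both elevated and low risk, which is why the two measures are reported separately.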

Patient and public involvement

Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.


The data extract included 277 983 tests that were requested by PHC facilities in the Gauteng Province for men aged ≥30 years (figure 1). Following deduplication, there were 239 506 (86.2%) men with a first-ever PSA. The PSA data were positively skewed; hence, only the median and IQR are reported. There were 186 984 men who tested at clinics compared with 52 522 for CHC. Patient age had a skewness of 0.22, with data captured for 98% of men. There was a paucity of race group data, with values provided for only 1.7% of men. We populated all race values using hot-deck imputation, with 7.1% classified as unknown.

Figure 1

Flowchart depicting all the data steps to generate the data used to create the first-ever PSA cohort for primary health services between 2006 and 2016 in the Gauteng Province, South Africa. PSA, prostate-specific antigen.

Number of patients receiving first-ever PSA test

The annual number of men receiving a first-ever PSA increased from 1782 in 2006 to 67 025 by 2016. The percentage year on year change ranged from −20% to 134%. An unexplained decrease was noted for 2012 and 2013, with a percentage year on year change of −5% and −20%, respectively (figure 2). Patient numbers increased from 24 220 in 2014 to 67 025 by 2016.
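As a worked illustration of the year on year calculation, the sketch below uses only annual totals that can be recovered from this article: 1782 (2006), 24 220 (2014) and 67 025 (2016) are stated directly, while the 2007 and 2015 totals are obtained here by summing the clinic and CHC figures reported in the results (2 316 + 1 089 and 46 144 + 10 588). The function name `pct_change` is hypothetical.

```python
def pct_change(previous: int, current: int) -> float:
    """Percentage change from one year's volume to the next."""
    return (current - previous) / previous * 100


total_2007 = 2316 + 1089     # clinics + CHC, summed from the results text
total_2015 = 46144 + 10588   # clinics + CHC, summed from the results text

change_2007 = pct_change(1782, total_2007)    # 2006 -> 2007
change_2015 = pct_change(24220, total_2015)   # 2014 -> 2015
change_2016 = pct_change(total_2015, 67025)   # 2015 -> 2016

print(round(change_2007), round(change_2015), round(change_2016))
```

The 2014 to 2015 change works out to about 134%, which agrees with the upper end of the reported range.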

Figure 2

Percentage year on year change reported as a bar chart for patients with a first-ever total prostate-specific antigen (PSA) test at primary healthcare services in the Gauteng Province, South Africa. The annual test volumes were reported as a line chart.

In 2006, the number of patients tested was similar for clinics (n=924: 51.9%) and CHC (n=858: 48.1%). Between 2007 and 2016, clinic patient numbers increased from 2 316 (68.0%) to 53 939 (80.5%). Over the same period, CHC patient numbers increased from 1 089 (32.0%) to 13 086 (19.5%) (figure 3). Between 2014 and 2015, clinic test volumes increased dramatically from 19 172 to 46 144 (2.4-fold). Similarly, CHC test volumes increased from 5 048 in 2014 to 10 588 by 2015 (2.1-fold).

Figure 3

Annual PSA test volumes by unit type for primary healthcare services. The annual PSA volumes are reported for clinics (green bars) and community health centres (blue bars) between 2006 and 2016 in the Gauteng Province, South Africa.

The annual median PSA decreased from 1.1 µg/L in 2006 to 0.9 µg/L in 2016. Similarly, the 75th percentile PSA decreased from 2.6 µg/L in 2006 to 1.7 µg/L in 2016. The Kruskal-Wallis rank test reported a p value of ≤0.005 for the first-ever PSA, indicating a statistically significant difference between 2006 and 2016. The pairwise comparison of first-ever PSA between 2006 and subsequent years reported a p value of ≤0.005 for each year from 2008 to 2016. For 2007, a p value of 0.1453 was reported when compared with 2006.

Number of patients receiving a first-ever PSA test by age category and race group

Across all 11 years, the majority of testing was performed for men aged 50–59 (31.5%), followed by the 40–49 (24.8%) and 60–69 (23.6%) age categories (figure 4A). We noted a 94.7-fold increase in patient numbers between 2006 and 2016 for the 40–49 age category (from 196 to 18 561). The 30–39 age category reported a 78.0-fold increase, with patient numbers increasing from 72 in 2006 to 5 613 by 2016. For the 50–59 and 60–69 age categories, we noted a 35.8-fold and 27.3-fold increase, with patient numbers increasing from 561 to 20 094 and from 537 to 14 656, respectively. The ≥70 age category reported a 20.8-fold increase.

Figure 4

Line chart reporting the number of patients with a first-ever total prostate-specific antigen (PSA) test by year and age category (A) and race group (B) for primary healthcare facilities between 2006 and 2016 in the Gauteng Province, South Africa.

We reported a 40.4-fold rise in patient numbers for Black Africans between 2006 and 2016 increasing from 1 301 in 2006 to 52 575 by 2016 (figure 4B). A 26.0, 22.0 and 19.5-fold increase was reported for Multiracials, Indian/Asians and Whites, respectively.
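The fold increases quoted for age category and race group are simply the 2016 count divided by the 2006 count. The short sketch below recomputes them from the patient numbers given in the text; the dictionary layout and the function name `fold_increase` are illustrative assumptions.

```python
def fold_increase(start: int, end: int) -> float:
    """Ratio of the final count to the initial count."""
    return end / start


# (2006 count, 2016 count) pairs taken from the results text.
counts = {
    "30-39": (72, 5613),
    "40-49": (196, 18561),
    "50-59": (561, 20094),
    "60-69": (537, 14656),
    "Black African": (1301, 52575),
}

folds = {group: round(fold_increase(a, b), 1) for group, (a, b) in counts.items()}
for group, fold in folds.items():
    print(f"{group}: {fold}-fold")
```

The recomputed values agree with the reported 78.0, 94.7, 35.8, 27.3 and 40.4-fold increases.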

Descriptive statistics by age and race group

Our study reported a median PSA of 0.9 µg/L (IQR: 0.5–1.8). Median PSA increased with age group from 0.7 µg/L (IQR 0.4–1.0 µg/L) for the 30–39 age category to 2.3 µg/L (IQR 1.0–6.4 µg/L) for men seventy years and older. The median PSA ranged from 0.9 µg/L (IQR 0.5–1.8) for Black Africans to 1.0 µg/L (IQR 0.5–2.0) for Whites.

Overall, 11.7% of men reported a PSA ≥4 µg/L. This ranged from 1.7% for the 30–39 age category to 35.5% for the ≥70 age category. For Black African men, 11.6% reported a PSA ≥4 µg/L compared with 12.6% for Multiracials.

A first-ever PSA ≥75th percentile was reported for 25.2% of men. By age category, this ranged from 25.0% to 26.0%. Indian/Asians reported 21.8% with a first-ever PSA ≥75th percentile compared with 25.8% for Black Africans.

Our findings indicate that we classified 95.0%, 2.3% and 2.7% of samples as low, intermediate and high risk, respectively (table 1). A high-risk PSA result (≥20 µg/L) was reported for 0.1%, 0.2%, 1.2%, 4.2% and 10.5% of men for age categories 30–39, 40–49, 50–59, 60–69 and ≥70, respectively. A high-risk PSA result was reported for 1.6% of Black Africans and Indian/Asians compared with 2.7% and 2.4% for Multiracials and Whites, respectively. For Indian/Asians and Black Africans, 96.3% of men reported a low-risk PSA result (<10 µg/L) compared with 94.9% and 95.5% for Whites and Multiracials, respectively.

Table 1

First-ever total prostate-specific antigen (PSA) median and IQR reported for age category and race group between 2006 and 2016 in the Gauteng Province, South Africa

The Kruskal-Wallis rank test reported a p value of ≤0.05 for the first-ever PSA, indicating a statistically significant difference for both age category and race group (data not shown). In the Dunn’s test for pairwise comparisons, a p value of ≤0.05 was reported for all the other age categories when compared with men aged 30–39. When compared with Indian/Asians, a p value of ≤0.05 was reported for Multiracials and Whites. Black African men reported a p value of 0.1527 when compared with Indian/Asians.

Association between an elevated PSA with age and race group

The multivariable logistic regression reported that the odds of a PSA ≥4 µg/L were significantly lower for Indian/Asians and Whites than for Black Africans (table 2). We reported an aOR of 0.70 (CI 0.63 to 0.78) for Indian/Asians, and aORs of 0.94 (CI 0.88 to 1.02) and 0.77 (CI 0.74 to 0.82) for Multiracials and Whites, respectively. The association was significant for Indian/Asians and Whites (p value <0.0001) but not for Multiracials, whose CI includes 1. These results indicate that the odds of a PSA ≥4 µg/L were 30%, 6% and 23% lower for Indian/Asians, Multiracials and Whites, respectively, than for Black Africans.
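The percentages quoted above follow from the aORs as (1 − aOR) × 100, and an association is conventionally read as significant at the 5% level when the 95% CI excludes 1. The sketch below applies both rules to the reported values; the function names are hypothetical and the code re-reads the published estimates rather than refitting the regression model.

```python
def pct_lower_odds(aor: float) -> int:
    """Percentage reduction in odds relative to the reference group."""
    return round((1 - aor) * 100)


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when the confidence interval does not contain 1."""
    return not (lower <= 1 <= upper)


# (aOR, CI lower, CI upper) as reported, with Black Africans as reference.
estimates = {
    "Indian/Asian": (0.70, 0.63, 0.78),
    "Multiracial": (0.94, 0.88, 1.02),
    "White": (0.77, 0.74, 0.82),
}

for group, (aor, lower, upper) in estimates.items():
    print(group, pct_lower_odds(aor), ci_excludes_one(lower, upper))
```

Only the Multiracial interval spans 1, which is consistent with the non-significant result reported for that group.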

Table 2

Multivariable logistic regression to assess the association between samples with a first-ever total prostate-specific antigen (PSA)≥4 µg/L (dependent binary variable) and race group as well as age category (independent variables) between 2006 and 2016 in the Gauteng Province, South Africa

The ≥70 age category was determined to have increased odds of a PSA ≥4 µg/L with an aOR of 23.90 (CI 22.45 to 25.45) when compared with the 40–49 age category. The 50–59 and 60–69 age categories reported an aOR of 3.41 (CI 3.20 to 3.64) and 9.73 (CI 9.15 to 10.35), respectively. Controlling for race, an age of 50 years and older reported a significantly higher risk of a PSA ≥4 µg/L.


The purpose of our study was to describe first-ever PSA testing trends for PHC services in the Gauteng Province between 2006 and 2016. We reported a gradual increase in patients receiving a first-ever PSA test between 2006 and 2013. Between 2014 and 2015, a striking increase in patient numbers was reported. Furthermore, we noted an increase in testing across all age groups with 33% of tests conducted in those below the age of 50 years, and almost 12% in those above 70. The median PSA for a first-ever test decreased over the study period.

While South Africa does not have a screening policy, the large increase in patients being tested at PHC facilities suggests that some of the testing may have been for screening purposes. This is further supported by a low median PSA, very similar to that reported by Capitanio et al for PCa screening in men without PCa.26

An unexplained decrease was noted for 2012 and 2013 followed by a dramatic increase in 2015. One possible explanation for the lower numbers in 2013 could be industrial action that specifically affected the health sector, involving thousands of nurses and other workers at the Charlotte Maxeke Johannesburg Academic Hospital and other health facilities in April 2013.27 The absence of nursing staff would dramatically impede healthcare service delivery.27

The increase between 2006 and 2011 could be due to the inclusion of the PSA test in the essential test list, public-sector campaigns, the introduction of clinicians at PHC facilities or other factors we are not aware of. However, the dramatic increase in 2015 is most likely due to the introduction of PHC public sector guidelines in 2014. For many years, it was debatable as to whether PSA screening did, in fact, reduce PCa mortality. The European Randomised Study of Screening for Prostate Cancer, clearly showed a reduction in PCa mortality with screening.12 On the other hand, screening can result in harm such as detection of cancers, which would otherwise never have been diagnosed during the life of the patient, with subsequent over treatment.12

Various clinical guidelines for PCa assessment have been released in South Africa. Public sector guidelines introduced in 2014 for PHC services state that PCa occurs in men >50 years and is most often asymptomatic.25 These guidelines for the first time recommended PSA testing for men over 50 years of age.25 At the same time, guidelines issued by the Urology Society of South Africa in 2013 recommended targeted screening.24 They recommend testing in men with a life expectancy of >10 years and suggest race and age-based cut-off values for PSA, with testing for Black African men from the age of 40 years.24 These are not dissimilar to European guidelines, which recommend individualised risk-adapted strategy for early detection of PCa to a well-informed man with a life expectancy of at least 10–15 years.5

The development of conflicting guidelines is seen across the globe and most likely arises as a result of three large prospective randomised controlled trials that published data on PCa screening.12 28 29 Andriole et al compared annual PCa screening with standard care at 10 American study centres.28 Following 7–10 years of follow-up, the PCa mortality rate was very low and did not differ significantly between the two study groups.28 Another study randomised men in a 1:1 ratio to either a screening group invited for PSA testing every 2 years or a control group that was not screened.29 This study showed that PCa mortality was reduced almost by half over 14 years in the intervention arm, with a risk of over diagnosis. To complicate matters, a Cochrane review reported that PCa screening is not likely to result in a significant reduction in cancer-specific and overall mortality.30 In the USA, a definite reduction in PSA screening was noted following the release of the 2012 US Preventive Services Task Force recommendations against PSA screening in all men.31 This was followed by a decline in prostate biopsy and PCa incidence with a trend to higher grade and stage of cancer at diagnosis.32 It is important to note that none of these studies were conducted in Africa and hardly any African men were included. It is, therefore, necessary to conduct African studies that will enable the development of appropriate targeted screening or testing algorithms to identify those who would benefit the most.

Our finding of a significant increase in testing for the 30–39 and ≥70 age categories can be described as low value testing, which has been defined as use of a health service whose costs or harms exceed its benefits. Low value care has been shown to be a major driver of wasteful healthcare expenditure.33 Testing of elderly men without a history of PCa is not unique to our setting. It has been reported in the USA where the Choosing Wisely Campaign was initiated to reduce low value care.34

PCa is the leading cause of cancer death among men particularly in SSA.1 In spite of the increase in PSA testing across all ages, a local study shows that over the same period, the PCa age-standardised incidence rate has increased from 44.9 (2006) to 57.3 (2016) per 100 000 population.35 One of the challenges with PSA testing is limited sensitivity and specificity.36 37 Therefore, better methods are needed to detect men at high risk of aggressive disease. For example, biomarkers such as the prostate health index, the 4K score or multiparametric MRI may be used to enhance the specificity of PSA for PCa and reduce the number of men undergoing unnecessary biopsy.36 More Black African men were diagnosed with PCa than men from other racial groups and more Black African men presented with high-risk PCa compared with men from other racial groups.35 While we were not able to assess mortality due to PCa, it is of concern that Black men in particular present with more advanced disease. There is a clear need for education about when and who to test for PSA and how follow-up should be carried out. Clinical guidelines provide an efficient mechanism to introduce evidence-based clinical practices to offer a standard package of services to the entire population serviced by public health facilities.38 There are multiple HIV and tuberculosis examples that demonstrate that national guidelines are able to extend a package of services to the most remote communities.39 40 It is similarly necessary to provide clear guidelines for PSA testing in our setting.

Our findings of a low median PSA differ substantially from local urological studies.8–10 These studies offered PSA testing to men with a clinical suspicion of PCa, which could explain why they reported much higher median PSA values.8–10 Furthermore, our data suggest that in spite of increased testing for PSA, we are not picking up early PCa in men. The reasons for this are not known and should be investigated.

We have emphasised the value of laboratory data to describe PSA testing for PHC facilities across a province. It would be feasible to implement these approaches for national data across South Africa. We could use this data to link patients who require follow-up to a urology service. Such an approach has been used effectively locally for HIV.41 Laboratory data provide a ready resource to provide real-time PCa screening data to target healthcare interventions to reduce PCa incidence. To enrich PCa screening, we would also need the DRE results captured electronically.


Our study used laboratory data to describe PSA testing by PHC facilities. Due to the absence of a national UPI, we have used the CDW probabilistic matching algorithm to deduplicate patients.23 Deduplication may be suboptimal for patients with incomplete demographic information. A local study demonstrated the validity of the National Health Laboratory Service (NHLS) CDW UPI by matching patient identifiers for cluster of differentiation 4 and viral load testing to the local hospital cohort, returning records for 89.6% of the patients.42 Similarly, the CDW UPI was compared with the Road to Health booklet identifier, reporting an overall accuracy of 87.7%.43 The CDW UPI has also been used to develop a longitudinal HIV cohort, reported by multiple studies.44 45

Due to the paucity of race group data, we used the hot-deck imputation algorithm. Hot-deck imputation handles missing data by replacing each missing value with an observed response from a ‘similar’ unit, and it has been applied in epidemiologic, medical and survey settings.46–48 Well-populated cancer registry data with patient-reported race grouping were used to construct the imputation reference panel.49 This imputation reference panel was collected at the South African National Cancer Registry (NCR) from passive pathology-based submissions by both public and private laboratory service providers in South Africa from 1998 to date.50 The panel is administered and curated by staff at the NCR and contains entries with the race group populated for 1 566 779 patients.51 The hot-deck algorithm identifies missing race group values and then uses the surname to assign the most prevalent race group for that surname in the imputation reference panel.52 In an unpublished local study, Chen et al used a hold-out test to compare imputed versus cancer registry patient-reported race group values for 406 642 unique surname-race group pairings and reported a 94.3% agreement.49

Our racial distribution does not match population estimates (Census 2011).53 In particular, Whites and Indian/Asians appear to be under-represented.53 This could be explained by relative differences in the use of state services by indigent patients versus patients with private medical aid coverage in South Africa; among patients with medical aid coverage, there is a disproportionately higher representation of White and Indian/Asian men.54 The absence of private sector laboratory data could explain the under-representation (there is no inclusive national database and the NHLS has no access to these data).
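The surname-based assignment described above can be illustrated with a minimal sketch. This is not the locally developed algorithm, whose details are in the cited unpublished study; it only shows the stated idea of assigning the most prevalent race group for a surname from a reference panel. The function names, the placeholder surnames, the case-insensitive matching and the `"Unknown"` fallback for surnames absent from the panel are all assumptions.

```python
from collections import Counter, defaultdict


def build_reference(panel):
    """Map each surname to its most prevalent race group.

    `panel` is an iterable of (surname, race_group) pairs, standing in
    for the imputation reference panel.
    """
    counts = defaultdict(Counter)
    for surname, race in panel:
        counts[surname.strip().lower()][race] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def impute_race(surname, recorded, reference):
    """Return the recorded race group, or impute it from the surname."""
    if recorded:  # keep any value captured on the request form
        return recorded
    # Assumption: surnames not found in the panel stay unclassified.
    return reference.get(surname.strip().lower(), "Unknown")


# Made-up panel with placeholder surnames.
panel = [
    ("surname_a", "Black African"),
    ("surname_a", "Black African"),
    ("surname_a", "Multiracial"),
    ("surname_b", "White"),
]
reference = build_reference(panel)
```

With this panel, a missing value for `surname_a` is imputed as the majority group for that surname, while a surname that is not in the panel remains `"Unknown"`.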

Ethics statements

Patient consent for publication

Ethics approval

The study obtained ethical clearance from the University of the Witwatersrand (M170419). The study did not use any patient identifiers.


The authors thank the National Cancer Registry (NCR) and Academic Affairs, Research and Quality Assurance (AARQA) department for their assistance. We would also like to thank Nigel Crowther for reviewing the manuscript and Innocent Maposa for statistical advice. This work was supported in part by U01-CA184374 to TRR.



  • Contributors NC: made substantial contributions to the conception or design of the work, acquisition of laboratory data, data analysis, drafting the work and revising it critically for important intellectual content. TRR: made substantial contributions to the conceptualisation of the study design, interpretation of data and revising the work critically for important intellectual content. DKG and JAG: contributed to the drafting and revising of the work critically for important intellectual content as well as overall supervision. JAG was responsible for the overall content as the guarantor. All authors read and approved the final manuscript.

  • Funding This work was supported in part by a grant from the US National Cancer Institute to TRR (U01-CA184374). The funders played no role in the development of this manuscript.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.