Cross-sectional study evaluating data quality of the National Cancer Registration and Analysis Service (NCRAS) prostate cancer registry data using the Cluster randomised trial of PSA testing for Prostate cancer (CAP)

Objectives To compare the completeness and agreement of prostate cancer data recorded by the National Cancer Registration and Analysis Service (NCRAS) with research-level data specifically abstracted from medical records from the Cluster randomised triAl of prostate specific antigen (PSA) testing for Prostate cancer (CAP) trial.

Design Cross-sectional comparison study.

Participants We included 1356 men from the CAP trial cohort who were linked to the NCRAS registry.

Primary and secondary outcome measures Completeness of prostate cancer data in NCRAS and CAP and agreement for tumour, node, metastases (TNM) stage (T1/T2; T3; T4/N1/M1) and Gleason grade (4–6; 7; 8–10), measured by differences in proportions and Cohen’s kappa statistic. Data were also stratified by year and pre-2010 versus post-2010, when NCRAS reporting standards changed.

Results Compared with CAP, completeness was lower in NCRAS for Gleason grade (41.2% vs 76.7%, difference 35.5, 95% CI 32.1 to 39.0) and TNM stage (29.9% vs 67.6%, difference 37.6, 95% CI 34.1 to 41.1). NCRAS completeness for Gleason grade (pre-2010 vs post-2010 31.69% vs 64%; difference 32.31, 95% CI 26.76 to 37.87) and TNM stage (19.31% vs 55.50%; difference 36.19, 95% CI 30.72 to 41.67) improved over time. Agreement for Gleason grade was high (Cohen’s kappa, κ=0.90, 95% CI 0.88 to 0.93), but lower for TNM stage (κ=0.41, 95% CI 0.37 to 0.51) overall. There was a trend towards improved agreement on Gleason grade, but not TNM stage, when comparing pre-2010 and post-2010 data.

Conclusion NCRAS case identification was very high; however, data on prostate cancer grade was less complete than CAP, and agreement for TNM stage was modest. Although the completeness of NCRAS data has improved since 2010, the higher completeness rate in CAP demonstrates that gains could potentially be achieved in routine registry data. This study’s findings highlight a need for improved recording of stage and grade data in the source medical records.
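The two outcome measures named in the abstract, a difference in proportions with a 95% CI and Cohen's kappa, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the source: the interval is an unpaired Wald interval, both proportions use a denominator of 1356, and the agreement table is hypothetical and is not study data. The function names are illustrative only.

```python
import math


def completeness_difference(p_a, p_b, n_a, n_b, z=1.96):
    """Return (p_b - p_a, lower, upper) using an unpaired Wald interval.

    Assumption: the source does not say which interval method was used.
    """
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


def cohen_kappa(table):
    """Cohen's kappa from a square agreement table.

    Rows are categories from one source, columns from the other.
    """
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Gleason grade completeness from the abstract: NCRAS 41.2% vs CAP 76.7%.
# Assumption: n = 1356 is the denominator for both proportions.
diff, lower, upper = completeness_difference(0.412, 0.767, 1356, 1356)
print(f"difference {diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}")

# Hypothetical 2x2 agreement table, for illustration only (not study data).
example_table = [[20, 5], [10, 15]]
print(f"kappa {cohen_kappa(example_table):.2f}")
```

Because both sources describe the same men, a paired method would also be defensible; the unpaired form is used here only because it is the simplest to show.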

In the discussion note should be taken that prostate cancer patients are older than the average cancer patient and that prostate cancer is not always diagnosed pathologically as some patients are kept under review while PSA levels are monitored and surgery is delayed. This may have affected the levels of completeness in the NCRAS.
The tables need better labelling re the numbers.
In the discussion mention is made of prostate cancer registries; however, as far as I am aware there are just population-based cancer registries which include prostate.

GENERAL COMMENTS
The reviewer completed the checklist but made no further comments.

Markus Aly
Karolinska Institutet Department of Molecular Medicine and Surgery, Sweden
REVIEW RETURNED 16-Apr-2017

GENERAL COMMENTS
Thank you for letting me review the manuscript entitled "Agreement and completeness of prostate cancer data comparing the National Cancer Registration and Analysis Service (NCRAS) and the Cluster randomised triAl of PSA testing for Prostate cancer (CAP)". It is interesting to compare research data with clinical registry data, and I think this manuscript could be used as leverage to get more funding to update government-funded clinical registries, which is very important in countries where a fair part of the health care is funded by taxes.

General comments
The manuscript is well written and has a good structure.
Minor comments
Introduction, page 5 row 20-30: Is there any information on how complete the NCRS and the NCIN registries are? Maybe it could be exemplified by data on how complete they are.

Materials and Methods
Participants selection, page 6 row 14-40: It is not clear why the authors focus on the men who had died due to prostate cancer. In this type of study I assume the data should also be easily extracted for men who are alive and could therefore be compared as well. I suggest a short comment to clarify this.
Data, page 7 row 6-10: It is not clear whether or not it is the authors who combine NCRAS clinical TNM stage data with pathological stage data in order to produce 'best' stage data. I am also a little bit sceptical of this, as such registry-based data may be used to build prediction models later on, and then the best stage data is a combination of the information you have at time of diagnosis and the information you have after treatment, which then may differ between men undergoing surgery and men undergoing radiation therapy.
Results, page 8, row 10-15: The results for complete information on Gleason grade are surprisingly low in the NCRAS data. I would like a comment on that in the discussion of the manuscript; it is difficult for policymakers to draw any conclusions on trends in prostate cancer when completeness is low.
Tables, page 10: To make table 1 easier to read, is it possible to grey-shade every other row? Otherwise I have no comments.

Discussion
Adequate discussion; however, I think the fact that there is fairly low coverage in both registers could be stressed more.
However, I think the reference used on page 14 row 5 is wrong; the article referenced does not mention this controversy, at least not to my knowledge.

This report is not a study comparing the contents of the accumulated data but a report simply comparing the accumulation situation. Collecting TNM and Gleason score when conducting clinical research on prostate cancer is fundamentally important. Therefore the research itself to compare the collection situation does not make sense. For CAP trial in particular, it is necessary to reconsider the data collection method and accuracy so that it can be applied as more useful clinical data for research purpose.

VERSION 1 - AUTHOR RESPONSE
Reviewer(s)' Comments to Author:
Reviewer: 1
Dr Anna Gavin
N. Ireland Cancer Registry, Centre for Public Health, Queens University Belfast, N. Ireland
Competing interests: none declared
This is an important area for research as the national cancer registry is used for service planning and monitoring, for evaluation of public health initiatives, for quality assessment of cancer screening programs, to provide information essential for genetic counselling services and also for the examination of possible cancer clusters. Accuracy and full completeness of data are essential for the completion of these tasks. The method of comparison with an independently collected dataset is a recognised gold standard of data examination. The researchers treated each source in the same way looking at data items within a six month window. The researchers focus on stage and grade data items and report levels of completeness from both sources noting improvements over time in the data collected by the National Cancer Registration Service (NCRAS). This was not reflected in the title as from the title I expected more data items to be compared.
Thank you for your kind feedback. The manuscript title has been amended to meet the journal guidelines.
The discussion transiently mentions the completeness of other data items such as date of diagnosis; however these are not documented in the results.
Please see the 4th paragraph of the Results section (Lines 30-34) for data on completeness of date of diagnosis.
I feel that a table to include how many patients in the CAP trial were in the NCRAS and the accuracy of data items such as date of diagnosis within 7 days, NHS number, etc. would be useful if possible and would better reflect the completeness of the NCRAS in patient notification, which is their primary role.
The authors did discuss whether to have a separate table for demographic data. It was decided to only include the data at the start of the Results section (see Lines 1-16).
The figure indicates that 2111 patients were eligible for note review however, this was completed on 1,356. The reasons for this reduction should be explained.
At the time this analysis was performed, there were 1,356 CAP participants who had complete medical record review data available for use in this study. We did not have enough data on the remaining 755 CAP participants. This was outlined in Lines 26-43 of the Methods section, to complement Figure 1.
In the discussion note should be taken that prostate cancer patients are older than the average cancer patient and that prostate cancer is not always diagnosed pathologically as some patients are kept under review while PSA levels are monitored and surgery is delayed. This may have affected the levels of completeness in the NCRAS.
The tables need better labelling re the numbers.
All tables have been amended to indicate whether the numbers are raw counts (n), proportions (%) or Kappa statistic (k).
In the discussion mention is made of prostate cancer registries; however, as far as I am aware there are just population-based cancer registries which include prostate.
Thank you for highlighting this. The Discussion section has been adjusted accordingly.

General comments
The manuscript is well written and has a good structure.
Thank you for reviewing our manuscript. We appreciate your general comments.
Minor comments
Introduction, page 5 row 20-30: Is there any information on how complete the NCRS and the NCIN registries are? Maybe it could be exemplified by data on how complete they are.
Comparison data from the UKIACR has been added to the second paragraph of the Discussion section.

Materials and Methods
Participants selection, page 6 row 14-40: It is not clear why the authors focus on the men who had died due to prostate cancer. In this type of study I assume the data should also be easily extracted for men who are alive and could therefore be compared as well. I suggest a short comment to clarify this.
Apologies that this was not made clear. In the CAP trial a full medical record review was only completed for deceased men. The Methods section has been amended.
Data, page 7 row 6-10: It is not clear whether or not it is the authors who combine NCRAS clinical TNM stage data with pathological stage data in order to produce 'best' stage data. I am also a little bit sceptical of this, as such registry-based data may be used to build prediction models later on, and then the best stage data is a combination of the information you have at time of diagnosis and the information you have after treatment, which then may differ between men undergoing surgery and men undergoing radiation therapy.
Where staging data was incomplete in the NCRAS, the NCRAS staff generated the 'best stage'. The Methods section has been amended to make this clearer.
Results, page 8, row 10-15: The results for complete information on Gleason grade are surprisingly low in the NCRAS data. I would like a comment on that in the discussion of the manuscript; it is difficult for policymakers to draw any conclusions on trends in prostate cancer when completeness is low.
The final paragraph of the Discussion section has been amended to include this comment.
Tables, page 10: To make table 1 easier to read, is it possible to grey-shade every other row? Otherwise I have no comments.
Thank you for the suggestion. We have amended Table 1 to try and make it easier to read.

Discussion
Adequate discussion; however, I think the fact that there is fairly low coverage in both registers could be stressed more.
The final paragraph of the Discussion section has been amended to include this comment.
However, I think the reference used on page 14 row 5 is wrong; the article referenced does not mention this controversy, at least not to my knowledge.
Thank you for reviewing this article. The authors would be happy to consider any important observations that may have been missed if the reviewer would like to specify them.
This report is not a study comparing the contents of the accumulated data but a report simply comparing the accumulation situation. Collecting TNM and Gleason score when conducting clinical research on prostate cancer is fundamentally important. Therefore the research itself to compare the collection situation does not make sense. For CAP trial in particular, it is necessary to reconsider the data collection method and accuracy so that it can be applied as more useful clinical data for research purpose.
The CAP trial was used as an independently collected, comparable dataset to assess the quality of case reporting, stage and grade data in the NCRAS cancer registry for prostate cancer. The use of independent datasets, as done in this study, is the gold standard for assessing data quality in cancer registries. The authors believe that ensuring national cancer registries have high quality data is vital to inform policy and funding decisions aimed at improving cancer outcomes.
The data collection and accuracy of the CAP trial were not the main focus of this study, and peer-reviewed studies of the CAP trial will be published in due course. Further information about the CAP trial methodology can be found in the BJC paper by Turner et al (Reference 9).

GENERAL COMMENTS
In the material and methods second paragraph, second sentence, it would be worth indicating at the start "In this trial these".
In the conclusion the authors miss that the completeness of NCRAS is good in terms of not missing cases; it is the data items for each registered case which are not so complete. This is better summarised in the paper conclusion on page 15, and the authors should consider using some of the words from that in the conclusion of the abstract.

Reviewer 1 comments
Thank you for reviewing our revised manuscript, and recommending it for publication.
1. In the material and methods second paragraph, second sentence, it would be worth indicating at the start "In this trial these".
The suggested amendment has been made to the materials and methods section.
2. In the conclusion the authors miss that the completeness of NCRAS is good in terms of not missing cases; it is the data items for each registered case which are not so complete. This is better summarised in the paper conclusion on page 15, and the authors should consider using some of the words from that in the conclusion of the abstract.
The conclusion section of the abstract has been amended to be more consistent with the discussions and conclusions section of the main article.