Objectives Effective researcher assessment is key to decisions about funding allocations, promotion and tenure. We aimed to identify what is known about methods for assessing researcher achievements, leading to a new composite assessment model.
Design We systematically reviewed the literature via the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols framework.
Data sources All Web of Science databases (including Core Collection, MEDLINE and BIOSIS Citation Index) to the end of 2017.
Eligibility criteria (1) English language, (2) published in the last 10 years (2007–2017), (3) full text was available and (4) the article discussed an approach to the assessment of an individual researcher’s achievements.
Data extraction and synthesis Articles were allocated among four pairs of reviewers for screening, with each pair randomly assigned 5% of their allocation to review concurrently against inclusion criteria. Inter-rater reliability was assessed using Cohen’s kappa (κ). The κ statistic showed agreement ranging from moderate to almost perfect (0.4848–0.9039). Following screening, selected articles underwent full-text review and bias was assessed.
Results Four hundred and seventy-eight articles were included in the final review. Established approaches developed prior to our inclusion period (eg, citations and outputs, h-index and journal impact factor) remained dominant in the literature and in practice. New bibliometric methods and models emerged in the last 10 years including: measures based on PageRank algorithms or ‘altmetric’ data, methods to apply peer judgement and techniques to assign values to publication quantity and quality. Each assessment method tended to prioritise certain aspects of achievement over others.
Conclusions All metrics and models focus on an element or elements at the expense of others. A new composite design, the Comprehensive Researcher Achievement Model (CRAM), is presented, which supersedes past anachronistic models. The CRAM is modifiable to a range of applications.
- researcher assessment
- research metrics
- journal impact factor
- Comprehensive Researcher Achievement Model (CRAM)
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
A large, diverse dataset of 478 articles, containing many ideas for assessing researcher performance, was analysed.
Strengths of the review include a wide-ranging search strategy and the consequent high number of included articles; the results are limited by the literature itself, for example, some new metrics were not mentioned in the articles and were therefore not captured in the results.
A new model combining multiple factors to assess researcher performance is now available.
Its strengths include combining quantitative and qualitative components in the one model.
The Comprehensive Researcher Achievement Model, despite being evidence oriented, is a generic one and now needs to be applied in the field.
Judging researchers’ achievements and academic impact continues to be an important means of allocating scarce research funds and assessing candidates for promotion or tenure. It has historically been carried out through some form of expert peer judgement to assess the number and quality of outputs and, in more recent decades, citations to them. This approach requires judgements regarding the weight that should be assigned to the number of publications, their quality, where they were published and their downstream influence or impact. There are significant questions about the extent to which human judgement based on these criteria is an effective mechanism for making these complex assessments in a consistent and unbiased way.1–3 Criticisms of peer assessment, even when underpinned by relatively impartial productivity data, include the propensity for bias, inconsistency among reviewers, nepotism, group-think and subjectivity.4–7
To compensate for these limitations, approaches have been proposed that rely less on subjective judgement and more on objective indicators.3 8–10 Indicators of achievement focus on one or a combination of four aspects: quantity of researcher outputs (productivity); value of outputs (quality); outcomes of research outputs (impact); and relations between publications or authors and the wider world (influence).11–15 Online publishing of journal articles has provided the opportunity to easily track citations and user interactions (eg, number of article downloads) and thus has provided a new set of indices against which individual researchers, journals and articles can be compared and the relative worth of contributions assessed and valued.14 These relatively new metrics have been collectively termed bibliometrics 16 when based on citations and numbers of publications, or altmetrics 17 when calculated by alternative online measures of impact such as number of downloads or social media mentions.16
The most established metrics for inferring researcher achievement are the h-index and the journal impact factor (JIF). The JIF measures the average number of citations received in a given year by the articles a journal published in the preceding 2 years, and hence is a good indication of journal quality but is increasingly regarded as a primitive measure of quality for individual researchers.18 The h-index, proposed by Hirsch in 2005,19 attempts to portray a researcher’s productivity and impact in one data point. The h-index is defined as the largest number (h) such that a researcher has published h articles that have each received at least h citations. Use of the h-index has become widespread, reflected in its inclusion in author profiles on online databases such as Google Scholar and Scopus.
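The h-index definition can be made concrete with a short calculation. The following is a minimal sketch in Python; the function name `h_index` and the example citation counts are illustrative and are not drawn from any article in the review.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one researcher's seven papers.
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))  # 3: three papers have at least 3 citations each
```

In this example the single paper with 25 citations does not raise the h-index beyond 3, because the fourth most cited paper has only 3 citations.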
Also influenced by the advent of online databases, there has been a proliferation of other assessment models and metrics,16 many of which purport to improve on existing approaches.20 21 These include methods that assess the impact of articles measured by: downloads or online views received, practice change related to specific research, take-up by the scientific community or mentions in social media.
Against the backdrop of growth in metrics and models for assessing researchers’ achievements, there is a lack of guidance on the relative strengths and limitations of these different approaches. Understanding them is of fundamental importance to funding bodies that drive the future of research, tenure and promotion committees and more broadly for providing insights into how we recognise and value the work of science and scientists, particularly those researching in medicine and healthcare. This review aimed to identify approaches to assessing researchers’ achievements published in the academic literature over the last 10 years, considering their relative strengths and limitations and drawing on this to propose a new composite assessment model.
All Web of Science databases (eight in total, including the Web of Science Core Collection, MEDLINE and BIOSIS Citation Index) were searched using terms related to researcher achievement (researcher excellence, track record, researcher funding, researcher perform*, relative to opportunity, researcher potential, research* career pathway, academic career pathway, funding system, funding body, researcher impact, scientific* productivity, academic productivity, top researcher, researcher ranking, grant application, researcher output, h*index, i*index, impact factor, individual researcher) and approaches to its assessment (model, framework, assess*, evaluat*, *metric*, measur*, criteri*, citation*, unconscious bias, rank*) with ‘*’ used as an unlimited truncation to capture variation in search terms, as seen in online supplementary appendix 1. These two searches were combined (using ‘and’), and results were downloaded into EndNote,22 the reference management software.
After removing duplicate references in EndNote, articles were allocated among pairs of reviewers (MB–JL, CP–CB, KL–JH and KC–LAE) for screening against inclusion criteria. Following established procedures,23 24 each pair was randomly assigned 5% of their allocation to review concurrently against inclusion criteria, with inter-rater reliability assessed using Cohen’s kappa (κ). The κ statistic was calculated for pairs of researchers, with agreement ranging from moderate to almost perfect (0.4848–0.9039).25 Following the abstract and title screen, selected articles underwent full text review. Reasons for exclusion were recorded.
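For readers unfamiliar with the statistic, Cohen’s kappa compares the observed agreement between two raters with the agreement expected by chance from each rater’s marginal frequencies. The sketch below is a generic illustration in Python, not the software used in this review; the function name and the screening decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions by one reviewer pair on 10 abstracts.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "include", "exclude"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 4))
```

Here the reviewers agree on 8 of 10 abstracts (observed agreement 0.80), while chance agreement is 0.52, so κ is well below the raw agreement rate.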
The following inclusion criteria were operationalised: (1) English language, (2) published in the last 10 years (2007–2017), (3) full text for the article was available, and (4) the article discussed an approach to the assessment of an individual researcher’s achievements (at the researcher or singular output-level). The research followed the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols framework.26 Empirical and non-empirical articles were included because many articles proposing new approaches to assessment, or discussing the limitations of existing ones, are not level one evidence or research based. Both quantitative and qualitative studies were included.
Data from the included articles were extracted, including: the country of article origin, the characteristics of the models or metrics discussed, and the perspective the article presented on the metric or model (positive, negative or indeterminable), including any potential benefits or limitations of the assessment model (and whether these were perceived or based on some form of evidence). A customised data extraction sheet was developed in Microsoft Excel, trialled among members of the research team and subsequently refined. This information was synthesised for each model and metric identified through narrative techniques. The publication details and classification of each paper are contained in online supplementary appendix 2.
Appraisal of the literature
Due to the prevalence of non-empirical articles in this field (eg, editorial contributions and commentaries), it was determined that a risk of bias tool such as the Quality Assessment Tool could not be applied.27 Rather, assessors were trained in multiple meetings (24 October, 30 October and 13 November 2017) to critically assess the quality of articles. Given the topic of the review (focusing on the publication process), the type of models and metrics identified (ie, a predominance of metrics that use publication data) may influence the cumulative evidence and subsequently create a risk of bias. In addition, three researchers (JH, EM and CB) reviewed every included article to extract documented conflicts of interest of authors.
Patient and public involvement
Patients and the public were not involved in this systematic review.
The final dataset consisted of 478 academic articles. The data screening process is presented in figure 1.
Of the 478 included papers (see online supplementary appendix 2 for a summary), 295 (61.7%) had an empirical component. These included interventional studies that assessed researcher achievement as an outcome measure (eg, a study measuring the outcomes of a training programme),28 studies that used it as a predictor29–31 (eg, a study that demonstrated the association between number of citations early in one’s career and later career productivity) and studies that reported a descriptive analysis of a new metric.32 33 One hundred and sixty-six (34.7%) papers were not empirical, including editorial or opinion contributions that discussed the assessment of research achievement, or proposed models for assessing researcher achievement. Seventeen papers (3.6%) were reviews that considered one or more elements of assessing researcher achievements. The quality of these contributions varied in terms of the risk of bias in the viewpoint expressed. Authors declared a potential conflict of interest in only 19 papers (4.0%).
Across the study period, 78 articles (16.3%) involved authors purporting to propose new models or metrics. Most articles described or cited pre-existing metrics and largely discussed their perceived strengths and limitations. Figure 2 shows the proportion of positive or negative discussions of five of the most common approaches to assessing an individual’s research achievement (altmetrics, peer-review, h-index, simple counts and JIF). The approach with most support was altmetrics (51.0% of articles mentioning altmetrics). The JIF was discussed with mostly negative sentiments in relevant articles (69.4%).
Publication and citation counts
One hundred and fifty-three papers (32.0%) discussed the use of publication and citation counts for purposes of assessing researcher achievement, with papers describing them as a simple ‘traditional but somewhat crude measure’,34 as well as the building blocks for other metrics.35 A researcher’s number of publications, commonly termed an n-index,36 was suggested by some to indicate researcher productivity,14 rather than quality, impact or influence of these papers.37 However, the literature suggested that numbers of citations indicated the academic impact of an individual publication or researcher’s body of work, calculated as an author’s cumulative or mean citations per article.38 Some studies found support for the validity of citation counts and publications in that they were correlated with other indications of a researcher’s achievement, such as awards and grant funding,39 40 and were predictive of long-term success in a field.41 For example, one paper argued that having larger numbers of publications and being highly cited early in one’s career predicted later high-quality research.42
A number of limitations of using citation or publication counts were observed. For example, Minasny et al 43 highlighted discrepancies between publication and citation counts in different databases because of their differential structures and inputs.43 Other authors38 44 45 noted that citation patterns vary by discipline, which they suggested can make them inappropriate for comparing researchers from different fields. Average citations per publication were reported as highly sensitive to change and could be skewed if, for example, a researcher has one heavily cited article.46 47 A further disadvantage is the lag-effect of citations48 49 and that, in most models, citations and publications count equally for all coauthors despite potentially differential contributions.50 Some also questioned the extent to which citations actually indicated quality or impact, noting that a paper may influence clinical practice more than academic thinking.51 Indeed, a paper may be highly cited because it is useful (eg, a review), controversial or even by chance, making citations a limited indication of quality or impact.40 50 52 In addition to limitations, numerous authors made the point that focusing on citation and publication counts can have unintended, negative consequences for the assessment of researcher achievement, potentially leading to gaming and manipulation, including self-citations and gratuitous authorship.53 54
Singular output-level approaches
Forty-one papers (8.6%) discussed models and metrics at the singular output or article-level that could be used to infer researcher achievement. The components of achievement they reported assessing were typically quality or impact.55 56 For example, some papers reported attempts to examine the quality of a single article by assessing its content.57 58 Among the metrics identified in the literature, the immediacy index focused on impact by measuring the average number of cites an article received in the year it was published.59 Similarly, Finch21 suggested adapting the Source Normalized Impact per Publication (a metric used for journal-level calculations across different fields of research) to the article-level.
Many of the article-level metrics identified could also be upscaled to produce researcher-level indications of academic impact. For example, the sCientific currENcy Tokens (CENTs), proposed by Szymanski et al,60 involved giving a ‘cent’ for each new non-self-citation a publication received; CENTs are then used as the basis for the researcher-level i-index, which follows a similar approach to the h-index but removes self-citations.60 The temporally averaged paper-specific impact factor calculates an article’s average number of citations per year combined with bonus cites for the publishing journal’s prestige and can be aggregated to measure the overall relevance of a researcher (temporally averaged author-specific impact factor).61
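The idea of counting only non-self-citations and then applying an h-type rule can be sketched as follows. This is a simplified illustration and not the published CENTs or i-index procedure: it assumes that a self-citation is any citing paper sharing at least one author with the cited paper, and all names and data are hypothetical.

```python
def non_self_citations(paper_authors, citing_author_lists):
    """Count citing papers that share no author with the cited paper.

    Assumption: any overlap in author names counts as a self-citation.
    """
    own = set(paper_authors)
    return sum(1 for citing in citing_author_lists if own.isdisjoint(citing))


def h_type_index(counts):
    """Largest h such that h papers have at least h (filtered) citations each."""
    ranked = sorted(counts, reverse=True)
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


# Hypothetical researcher "Lee": each paper lists its authors and the
# author lists of the papers that cite it.
papers = [
    {"authors": ["Lee", "Kim"],
     "cited_by": [["Lee"], ["Kim"], ["Cho", "Kim"], ["Diaz"]]},
    {"authors": ["Lee"],
     "cited_by": [["Park"], ["Diaz"], ["Roy"]]},
    {"authors": ["Lee"],
     "cited_by": [["Lee"], ["Park"]]},
]

raw_counts = [len(p["cited_by"]) for p in papers]
filtered_counts = [non_self_citations(p["authors"], p["cited_by"]) for p in papers]

print(h_type_index(raw_counts))       # index on all citations
print(h_type_index(filtered_counts))  # index after removing self-citations
```

With these invented data, removing self-citations lowers the index, which illustrates why such variants were proposed as a guard against self-citation inflating a researcher’s score.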
Journal impact factor
The JIF, commonly recognised as a journal-level measure of quality,59 62 was discussed in 211 (44.1%) of the papers reviewed in relation to assessing singular outputs or individual researchers. A number of papers described the JIF being used informally to assess an individual’s research achievement at the singular output-level and formally in countries such as France and China.63 It implies article quality because it is typically a more competitive process to publish in journals with high impact factors.64 Indeed, the JIF was found to be the best predictor of a paper’s propensity to receive citations.65
The JIF has a range of limitations when used to indicate journal quality,66 including that it is disproportionately affected by highly cited, outlier articles41 67 and is susceptible to ‘gaming’ by editors.17 68 Other criticisms focused on using the JIF to assess individual articles or the researchers who author them.69 Some critics claimed that using the JIF to measure an individual’s achievement encourages researchers to publish in higher impact but less appropriate journals for their field, which ultimately means their article may not be read by relevant researchers.70 71 Furthermore, the popularity of a journal was argued to be a poor indication of the quality of any one article, with the citation distributions for calculating JIF found to be heavily skewed (ie, a small subset of papers receive the bulk of the citations, while some may receive none).18 Ultimately, many commentators argued that the JIF is an inappropriate metric to assess individual researchers because it is an aggregate metric of a journal’s publications and expresses nothing about any individual paper.21 49 50 72 However, Bornmann and Pudovkin73 suggested one case in which it would be appropriate to use JIF for assessing individual researchers: in relation to their recently published papers that had not had the opportunity to accumulate citations.73
h-index
The h-index was among the most commonly discussed metrics in the literature (254 [53.1%] of the papers reviewed); in many of these papers, it was described by authors as more sophisticated than citation and publication counts but still straightforward, logical and intuitive.74–76 Authors noted its combination of productivity (h publications) and impact indicators (h citations) as being more reliable77 78 and stable than average citations per publication,41 because it is not skewed by the influence of one popular article.79 One study found that the h-index correlated with other metrics more difficult to obtain.76 It also showed convergent validity with peer-reviewed assessments80 and was found to be a good predictor of future achievement.41
However, because of the lag-effect with citations and publications, the h-index increases with a researcher’s years of activity in the field, and cannot decrease, even if productivity later declines.81 Hence, numerous authors suggested it was inappropriate for comparing researchers at different career stages,82 or those early in their career.68 The h-index was also noted as being susceptible to many of the critiques levelled against citation counts, including potential for gaming and inability to reflect differential contributions by coauthors.83 Because disciplines differ in citation patterns,84 some studies noted variations in author h-indices between different methodologies85 and within medical subspecialties.86 Some therefore argued that the h-index should not be used as the sole measure of a researcher’s achievement.86
A number of modified versions of the h-index were identified; these purported to draw on its basic strengths of balancing productivity with impact while redressing perceived limitations. For example, the g-index measures global citation performance87 and was defined similarly to the h-index but with more weight given to highly cited articles, by requiring that the top g articles have together received at least g² citations.88 Azer and Azer89 argued it was a more useful measure of researcher productivity.89 Another variant of the h-index identified, the m-quotient, was suggested to minimise the potential to favour senior academics by accounting for the time passed since a researcher has begun publishing papers.90 91 Other h-index variations reported in the articles reviewed attempted to account for author contributions, such as the h-maj index, which includes only articles in which the researcher played a core role (based on author order), and the weighted h-index, which assigns credit points according to author order.87 92
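Two of these variants can be illustrated with a brief sketch in Python. It assumes the common formulations in which the g-index is capped at the number of papers and the m-quotient is the h-index divided by the number of years since the researcher’s first publication; some sources count years slightly differently. The function names and data are hypothetical.

```python
from itertools import accumulate


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    Assumption: g cannot exceed the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    for rank, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= rank * rank:
            g = rank
    return g


def m_quotient(h, first_publication_year, current_year):
    """h-index divided by years since first publication (assumed convention)."""
    years_active = current_year - first_publication_year
    if years_active <= 0:
        raise ValueError("current_year must be later than first_publication_year")
    return h / years_active


# Hypothetical researcher who first published in 2007, assessed in 2017.
counts = [25, 8, 5, 3, 3, 1, 0]
h = h_index(counts)
print(h, g_index(counts), m_quotient(h, 2007, 2017))
```

In this example the highly cited first paper lifts the g-index to 6 while the h-index stays at 3, showing how the g-index gives more weight to highly cited articles.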
Recurring issues with citation-based metrics
The literature review results suggested that no one citation-based metric was ideal for all purposes. All of the common metrics examined focused on one aspect of an individual’s achievement and thus failed to account for other aspects of achievement. The limitations with some of the frequently used citation-based metrics are listed in box 1.
Box 1 Common limitations in the use of citation-based metrics
- Challenges with reconciling differences in citation patterns across varying fields of study.
- Time-dependency issues stemming from differences in career length of researchers.
- Prioritising impact over merit, or quality over quantity, or vice versa.
- The lag-effect of citations.
- Gaming and the ability of self-citation to distort metrics.
- Failure to account for author order.
- Contributions from authors to a publication are viewed as equal when they may not be.
- Perpetuation of a ‘publish or perish’ culture.
- Potential to stifle innovation in favour of what is popular.
Altmetrics
In contradistinction with the metrics discussed above, 54 papers (11.3%) discussed altmetrics (or ‘alternative metrics’), which included a wide range of techniques to measure non-traditional, non-citation based usage of articles, that is, influence.17 Altmetric measures included the number of online article views,93 bookmarks,94 downloads,41 PageRank algorithms95 and attention by mainstream news,63 in books96 and social media, for example, in blogs, commentaries, online topic reviews or Tweets.97 98 These metrics typically measure the ‘web visibility’ of an output.99 A notable example is the social networking site for researchers and scientists, ResearchGate, which uses an algorithm to score researchers based on the use of their outputs, including citations, reads and recommendations.100
A strength of altmetrics lies in providing a measure of influence promptly after publication.68 101 102 Moreover, altmetrics allows tracking of the downloads of multiple sources (eg, students, the general public, clinicians, as well as academics) and multiple types of format (eg, reports and policy documents),103 which are useful in gauging a broader indication of impact or influence, compared with more traditional metrics that solely or largely measure acknowledgement by experts in the field through citations.17
Disadvantages noted in the articles reviewed included that altmetrics calculations have been established by commercial enterprises such as Altmetrics LLC (London, UK) and other competitors,104 and there may be fees levied for their use. The application of these metrics has also not been standardised.96 Furthermore, it has been argued that, because altmetrics are cumulative and typically at the article-level, they provide more an indication of influence or even popularity,105 instead of quality or productivity.106 Consistent with this, one study suggested no correlation between attention on Twitter and expert analysis of an article’s originality, significance or rigour.107 Another, however, showed that Tweets predict citations.108 Overall, further work needs to assess the value of altmetric scores in terms of their association with other traditional indicators of achievement.109 Notwithstanding this, there were increasing calls to consider altmetrics alongside more conventional metrics in assessing researchers and their work.110
Past funding
A past record of being funded by national agencies was identified as a common measurement of individual academic achievement (particularly productivity, quality and impact) in a number of papers and has been argued to be a reliable method that is consistent across medical research.111–113 For example, the National Institutes of Health’s (NIH) Research Portfolio Online Reporting Tools system encourages public accountability for funding by providing online access to reports, data and NIH-funded research projects.111 114
New metrics and models identified
The review also identified and assessed new metrics and models that were proposed during the review period, many of which had not gained widespread acceptance or use. While there was considerable heterogeneity and varying degrees of complexity among the 78 new approaches identified, there were also many areas of overlap in their methods and purposes. For example, some papers reported on metrics that used a PageRank algorithm,115 116 a form of network analysis based on structural characteristics of publications (eg, coauthorship or citation patterns).14 Metrics based on PageRank purported to measure both the direct and indirect impacts of a publication or researcher. Other approaches considered the relative contributions of authors to a paper in calculating productivity.117 Numerous metrics and models that built on existing approaches were also reported.118 For example, some developed composite metrics that included a publication’s JIF alongside an author contribution measure119 or other existing metrics.120 However, each of these approaches reported limitations, in addition to their strengths or improvements on other methods. For example, in focusing on productivity, a metric necessarily often neglected impact.121 Online supplementary appendix 3 provides a summary of these new or refashioned metrics and models, with details of their basis and purpose.
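To illustrate how a PageRank-style metric captures indirect as well as direct impact, the sketch below runs a basic power iteration over a small citation network. It is a generic textbook formulation, not any specific metric from the reviewed articles; the damping factor of 0.85, the even redistribution of rank from papers that cite nothing, and the four-paper network are all assumptions made for the example.

```python
def pagerank(cites, damping=0.85, iterations=100):
    """Basic PageRank by power iteration.

    `cites` maps each paper to the list of papers it cites (no duplicates).
    """
    nodes = sorted(set(cites) | {t for targets in cites.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[node] for node in nodes if not cites.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for source, targets in cites.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical network: B cites A; C cites A and B; D cites B.
citation_network = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["B"]}
scores = pagerank(citation_network)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

Papers A and B each receive two direct citations, yet A scores higher because one of its citations comes from B, which is itself cited; this is the indirect impact that simple citation counts do not register.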
This systematic review identified a large number of diverse metrics and models for assessing an individual’s research achievement that have been developed in the last 10 years (2007–2017), as evidenced in online supplementary appendix 3. At the same time, other approaches that pre-dated our study time period of 2007–2017 were also discussed frequently in the literature reviewed, including the h-index and JIF. All metrics and models proposed had their relative strengths, based on the components of achievement they focused on, and their sophistication or transparency.
The review also identified and assessed new metrics emerging over the past few decades. Peer-review has been increasingly criticised for reliance on subjectivity and propensity for bias,7 and there have been arguments that the use of specific metrics may be a more objective and fair approach for assessing individual research achievement. However, this review has highlighted that even seemingly objective measures have a range of shortcomings. For example, there are inadequacies in comparing researchers at different career stages and across disciplines with different citation patterns.84 Furthermore, the use of citation-based metrics can lead to gaming and potential ethical misconduct by contributing to a ‘publish or perish’ culture in which researchers are under pressure to maintain or improve their publication records.122 123 New methods and adjustments to existing metrics have been proposed to explicitly address some of these limitations; for example, normalising metrics with ‘exchange rates’ to remove discipline-specific variation in citation patterns, thereby making metric scores more comparable for researchers working in disparate fields.124 125 Normalisation techniques have also been used to assess researchers’ metrics with greater recognition of their relative opportunity and career longevity.126
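One simple reading of an ‘exchange rate’ between disciplines is a multiplier that rescales a citation count by the ratio of average citation rates in two fields. The sketch below illustrates that reading only; it is not the method of any particular article cited here, and the field averages are invented for the example.

```python
def exchange_rates(field_means, reference_field):
    """Multiplier that converts each field's citations into reference-field units."""
    reference_mean = field_means[reference_field]
    return {field: reference_mean / mean for field, mean in field_means.items()}


def normalise(citations, field, rates):
    """Express a citation count in reference-field units."""
    return citations * rates[field]


# Hypothetical mean citations per paper in two fields (illustrative values only).
field_means = {"cell biology": 30.0, "mathematics": 6.0}
rates = exchange_rates(field_means, reference_field="cell biology")

print(normalise(12, "mathematics", rates))   # 60.0
print(normalise(60, "cell biology", rates))  # 60.0
```

Under these assumed averages, a mathematics paper with 12 citations and a cell biology paper with 60 citations receive the same normalised score, because each has twice its field’s average.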
Other criticisms of traditional approaches centre less on how they calculated achievement and more on what they understood or assumed about its constituent elements. In this review, the measurement of impact or knowledge gain was often exclusively tied to citations.127 Some articles proposed novel approaches to using citations as a measure of impact, such as giving greater weight to citations from papers that were themselves highly cited128 or that come from outside the field in which the paper was published.129 However, even other potential means of considering scientific contributions and achievement, such as mentoring, were still ultimately tied to citations because mentoring was measured by the publication output of mentees.130
A focus only on citations was widely thought to disadvantage certain types of researchers. For example, researchers who aim to publish with a focus on influencing practice may target more specialised or regional journals that do not have high JIFs, where their papers will be read by the appropriate audience and findings implemented, but they may not be well cited.51 In this regard, categorising the type of journal in which an article has been published in terms of its focus (eg, industry, clinical and regional/national) may go some way towards recognising those publications that have a clear knowledge translation intention and therefore prioritise real-world impact over academic impact.122 There were only a few other approaches identified that captured broader conceptualisations of knowledge gain, such as practical impact or wealth generation for the economy, and these too were often simplistic, such as including patents and their citations131 or altmetric data.96 While altmetrics hold potential in this regard, their use has not been standardised,96 and they come with their own limitations, with suggestions that they reflect popularity more so than real-world impact.105 Other methodologies have been proposed for assessing knowledge translation and real-world impact, but these can often be labour intensive.132 For example, Sutherland et al 133 suggested that assessing individual research outputs in light of specific policy objectives through peer-review based scoring may be a strategy, but this is typically not feasible in situations such as grant funding allocation, where there are time constraints and large applicant pools to assess.
In terms of how one can make sense of the validity of many of these emerging approaches for assessing an individual’s research achievements, metrics should demonstrate their legitimacy empirically, as well as having a theoretical basis for their use and clearly differentiating what aspects of quality, achievement or impact they purport to examine.55 65 If the recent, well-publicised134–136 San Francisco Declaration on Research Assessment137 is anything to go by, internationally, there is a move away from the assessment of individual researchers using the JIF and the journal in which the research has been published.
There is momentum, instead, for assessment of researcher achievements on the basis of a wider mix of measures, hence our proposed Comprehensive Researcher Achievement Model (CRAM) (figure 3). On the left-hand side of this model is the researcher to be assessed and key characteristics that influence the assessment. Among these factors, some (ie, field or discipline, coauthorship and career longevity) can be controlled for depending on the metric, while other components, such as gaming or the research topic (ie, whether it is ‘trendy’ or innovative), are less amenable to control or even prediction. Online databases, which track citations and downloads and measure other forms of impact, hold much potential and will likely be increasingly used in the future to assess both individual researchers and their outputs. Hence, assessment components (past funding, articles, citations, patents, downloads and social media traction) included in our model are those primarily accessible online.
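To make the idea of a configurable mix of components concrete, the following is a minimal sketch only, not part of the CRAM as published. It assumes, purely for illustration, that each component is normalised as a percentile rank within a field-matched reference group and that the assessor supplies the weights. The function names, the percentile normalisation, the weighted mean and all numbers are our assumptions.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, reference):
    """Share of the reference group at or below `value`, in [0, 1].

    Ties count as half. This normalisation is an illustrative assumption,
    not a method prescribed by the model.
    """
    if not reference:
        raise ValueError("reference group must not be empty")
    ordered = sorted(reference)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return (below + 0.5 * ties) / len(ordered)


def composite_score(researcher, reference_group, weights):
    """Weighted mean of percentile ranks over the components named in `weights`."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = 0.0
    for component, weight in weights.items():
        rank = percentile_rank(researcher[component], reference_group[component])
        score += weight * rank
    return score / total_weight


# Hypothetical data: component names echo those in the model, values are invented.
reference_group = {
    "articles": [4, 9, 12, 20, 31],
    "citations": [15, 80, 140, 300, 950],
    "downloads": [100, 450, 900, 2500, 7000],
}
researcher = {"articles": 12, "citations": 300, "downloads": 450}

# Assessor-chosen weights; a different purpose would call for a different mix.
weights = {"articles": 1.0, "citations": 2.0, "downloads": 0.5}

print(round(composite_score(researcher, reference_group, weights), 3))
```

Changing `weights` or the reference group changes the result, which is the point of the sketch: the outcome depends on choices made by the assessor, so those choices should be stated explicitly alongside any score.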
Strengths and limitations
The findings of this review suggest assessment components should be used with care, with recognition of how they can be influenced by other factors, and what aspects of achievement they reflect (ie, productivity, quality, impact and influence). No single metric or model captures all aspects of achievement, and hence use of a range, such as the examples in our model, is advisable. CRAM recognises that the configuration and weighting of assessment methods will depend on the assessors and their purpose, the resources available for the assessment process and access to assessment components. Our results must be interpreted in light of our focus on academic literature. The limits of our focus on peer-reviewed literature were evident in the fact that some new metrics were not mentioned in articles and therefore not captured in our results. While we defined impact broadly at the outset, overwhelmingly, the literature we reviewed focused on academic, citation-based impact. Furthermore, although we assessed bias in the ways documented, the study design limited our ability to apply a standardised quality assessment tool. A strength of our focus was that we set no inclusion criteria with regard to scientific discipline, because novel and useful approaches to assessing research achievement can come from diverse fields. Many of the articles we reviewed were broadly in the area of health and medical research, and our discussion is concerned with the implications for health and medical research, as this is where our interests lie.
There is no ideal model or metric by which to assess individual researcher achievement. We have proposed a generic model, designed to minimise the risks of relying on any one metric or a small number of metrics, but it is not proposed as an ultimate solution. The mix of assessment components and metrics will depend on the purpose. Greater transparency in approaches used to assess achievement, including their evidence base, is required.37 Any model used to assess achievement for purposes such as promotion or funding allocation should include some quantitative components, based on robust data, and be able to be rapidly updated, presented with confidence intervals and normalised.37 The assessment process should be difficult to manipulate and explicit about the components of achievement being measured. No current metric suitably fulfils all these criteria. The best strategy to assess an individual’s research achievement is likely to involve the use of multiple approaches138 in order to dilute the influence and potential disadvantages of any one metric while providing a more rounded picture of a researcher’s achievement83 139; this is what the CRAM aims to contribute.
All in all, achievement in terms of impact and knowledge gain is broader than the number of articles published or their citation rates and yet most metrics have no means of factoring in these broader issues. Altmetrics hold promise in complementing citation-based metrics and assessing more diverse notions of impact, but usage of this type of tool requires further standardisation.96 Finally, despite the limitations of peer-review, the role of expert judgement should not be discounted.41 Metrics are perhaps best applied as a complement or check on the peer-review process, rather than the sole means of assessment of an individual’s research achievements.140
Contributors JB conceptualised and drafted the manuscript, revised it critically for important intellectual content and led the study. JH, KC and JCL made substantial contributions to the design, analysis and revision of the work and critically reviewed the manuscript for important intellectual content. CP, CB, MB, RC-W, FR, PS, AH, LAE, KL, EA, RS and EM carried out the initial investigation, sourced and analysed the data and revised the manuscript for important intellectual content. PDH and JW critically commented on the manuscript, contributed to the revision and editing of the final manuscript and reviewed the work for important intellectual content. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
Funding The work on which this paper is based was funded by the Australian National Health and Medical Research Council (NHMRC) for work related to an assessment of its peer-review processes being conducted by the Council. Staff of the Australian Institute of Health Innovation undertook this systematic review for Council as part of that assessment.
Disclaimer Other than specifying what they would like to see from a literature review, NHMRC had no role in the conduct of the systematic review or the decision to publish.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All data have been made available as appendices.
Patient consent for publication Not required.