
Systematic review of the measurement properties of self-report physical activity questionnaires in healthy adult populations
  1. Zoë Silsbury1,2,
  2. Robert Goldsmith1,3,
  3. Alison Rushton2
  1. 1Physiotherapy Department, University Hospital of Wales, Cardiff, UK
  2. 2School of Sport, Exercise and Rehabilitation Sciences, College of Life and Environmental Sciences, University of Birmingham, Birmingham, UK
  3. 3School of Physiotherapy, Cardiff University, Cardiff, UK
  1. Correspondence to Zoë Silsbury; Zoe.Grant{at}


Objective This systematic review evaluated the measurement properties of current self-report physical activity questionnaires (SRPAQs) completed within healthy adult populations.

Design Two reviewers independently searched seven electronic databases and hand searched for articles investigating measurement properties of a SRPAQ evaluating physical activity over the previous 6 months. Articles published from 1 May 2001 to 4 December 2014 were systematically screened and eligible studies were not limited to English language sources. Articles investigating specific race, gender or socioeconomic populations were excluded.

Results 10 studies investigating 10 SRPAQs were included. The methodological quality of the included studies was evaluated using COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) and ranged from ‘poor’ to ‘good’. The Recent Physical Activity Questionnaire, International Physical Activity Questionnaires and Physical Activity Assessment Tool demonstrated good/excellent test–retest reliability (intraclass correlation coefficient (ICC)=0.76, p<0.0001; r=0.627–0.91; r=0.618, p<0.001, respectively), but variable criterion validity (r=0.67, p<0.0001; r=−0.02–0.43; r=0.392, p<0.01, respectively). The single-item measure showed significant criterion validity against an accelerometer (for moderate to vigorous physical activity (MVPA), k=0.23, 95% CI 0.05 to 0.41; and for physical activity in ≥10 min bouts, 0.39, 95% CI 0.14 to 0.64). Construct validity of the six-point scale and Human Activity Profile varied significantly with age, marital status and presence of comorbidities (p<0.05, <0.01, <0.000 and p<0.05, <0.05, <0.000, respectively). The 1 week Godin-Shephard recall demonstrated ‘moderate’ validity with the gold standard measure of accelerometry (r=0.43).

Conclusions Inconclusive evidence exists. Further investigation of criterion validity of the short-form International Physical Activity Questionnaire is required, as it demonstrated excellent test–retest reliability.

PROSPERO number CRD42012002484.

Strengths and limitations of this study

  • The ambiguity of common terminology may have affected the electronic database search for this systematic review, possibly contributing to omissions in the included studies.

  • This review searched for articles, reported in any language, investigating the English language version of a self-report physical activity questionnaire (SRPAQ). In one study, this led to the exclusion of a subset of the overall sample that investigated the non-English version(s) of the SRPAQ. This may have diminished the overall confidence in this study's findings.

  • Studies investigating use of SRPAQs on healthy subjects were used, which reduces the inference of the conclusions on those suffering from comorbidities or populations with disease.

  • The focus of the systematic review was all measurement properties of SRPAQs; however, the eligible studies only investigated the properties of test–retest reliability, internal consistency, criterion, construct and structural validity.

  • The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodological quality analysis was self-taught by both the reviewers, using the handbook and articles explaining its use. This may have affected the ranking outcome for the included articles; however, as it was completed independently, this reduced the risk of error and bias.


Physical activity (PA) prevents chronic diseases, independent of ethnicity, income, education and body morphology.1 The WHO estimated that, in 2008, 31% of the global healthy adult population failed to achieve the recommended 150 min of moderate-intensity aerobic PA, or 75 min of vigorous intensity PA, a week.2 This has been described as the greatest public health problem of the 21st century.3 PA encompasses ‘any bodily movement produced by contraction of skeletal muscle resulting in energy expenditure above basal level’.4 A recent systematic review5 described a positive association between PA and physical as well as psychological health; however, heterogeneity of study designs reduces confidence in these findings. This was supported by a systematic review evaluating the use of PA in managing the pathogenesis, physical strength and fitness, quality of life and symptoms of patients with chronic disease.6 The review found strong evidence for effectiveness within cardiovascular disease, type 2 diabetes and obesity; and strong evidence for improved symptom management and quality of life for patients with cancer, osteoarthritis, osteoporosis, fibromyalgia and depression. The evidence supporting the role of PA in the prevention and management of chronic disease assists advancement of the public health research agenda,7 thus improving quality of life and healthcare cost-effectiveness.8 Research investigating PA in healthy participants is therefore valuable to inform prevention of chronic disease.

The International Classification of Functioning, Disability, and Health (ICF)9 considers the individual's PA level and participation alongside external factors (eg, environment) for disease management. The integration of the biopsychosocial model and the ICF into physiotherapy practice has enabled a focus on physical ability and graded return to function rather than pain. Healthcare professionals commonly see patients for pain relief prior to the development of chronic disease and are best placed to address issues surrounding risk factors, including lack of PA. Therefore, the demand on physiotherapists to accurately evaluate PA for health status and treatment efficacy is increasing. Physiotherapists use a wide range of outcome measures to inform and evaluate their clinical practice, but selection of the most appropriate measure is challenging.10 PA outcome measures assess the actual or perceived ability of an individual to carry out a variety of daily tasks and recreational or competitive sport.11 However, the multidimensional and individual-specific nature of PA has resulted in a diverse range of outcome measures, contributing to a lack of consensus from clinicians/researchers regarding the best measure.

Patient-reported PA outcome measures are relatively inexpensive and easy to administer. It is acknowledged that self-report PA questionnaires (SRPAQs) comprise a detailed assessment to allow the detection of clinically relevant change in diverse populations.12 However, due to the breadth of the activity dimensions analysed, it is argued that SRPAQs may lead to misclassification.12 With advances in real-time data acquisition, performance-based outcome measures, including accelerometers and the doubly labelled water (DLW) technique, are now considered the most accurate methods for determining energy expenditure; as PA is defined as energy expended above the basal level, these methods are considered the gold standard for assessing PA. However, as the DLW technique cannot predict patterns or type of activity performed, and accelerometers are not able to provide information on swimming, step/inclined activity, or strength training, there is a risk of under-estimating the energy expended in these activities.13 Furthermore, performance-based outcome measures tend to use expensive equipment and require data analysis from trained professionals, which is generally beyond the scope of departments interested in evaluating PA. Therefore, real-time data acquisition is used in combination with SRPAQs that consider the multidimensional nature of the activity completed.13 This informs the development of government guidelines.

Limited resources for implementing performance-based outcome measures reinforce the need to identify the best SRPAQ for clinical practice and research, as they are the most practical and economical outcome measures for heterogeneous populations.14 It is important that the measurement properties of SRPAQs are evaluated to reduce the risk of data misinterpretation and bias.15 An International Delphi study of experts formulated the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN), defining the essential properties of outcome measures as: reliability, measurement error, validity, responsiveness and interpretability.16

One systematic review11 has evaluated the measurement properties of SRPAQs in patients aged 18–55 years. The authors incorporated all SRPAQs regardless of time frame, administration type and whether the participants were already experiencing chronic disease. Heterogeneity and ambiguity in terminology may have contributed to their diverse data, making synthesis difficult; hence, identification of the best SRPAQ was not possible. Therefore, further study utilising a more homogeneous population was advocated, the results of which may allow for focused data analysis and synthesis, enabling conclusions to be drawn.

The global recommendation for health documents the dose–response relationship of PA to prevent non-communicable diseases in healthy people in specific age categories.1 This recognises the differing PA levels to evidentially achieve optimum results,17 and addresses the health benefits of PA in healthy populations to prevent chronic disease. Research investigating PA in an elderly population found that motivation was the most significant barrier.18 As motivation is not a component in most SRPAQs, specific PA questionnaires were developed for the elderly, and their role and measurement properties have already been evaluated.19 Furthermore, it has been reported that assessing PA for greater than a 6-month period and across seasons is unreliable due to poor subject recall.19 Consequently, a systematic review investigating the best existing SRPAQ for the healthy adult population of 18–60 years will improve accuracy of PA reporting. As WHO guidelines stipulate the health benefits gained from PA over 1 week and the subjectivity of self-reported PA in greater than a 6-month period, the systematic review targeted daily, weekly and monthly PA to inform health assessments and recommendations for disease prevention.



To evaluate the measurement properties of existing SRPAQs to ascertain the optimum PA outcome measure for use in a healthy adult population.

Study design

The systematic review followed a predefined and published protocol adapted from the Cochrane Handbook,20 the Centre for Reviews and Dissemination Group,21 and Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA).22

Search strategy

The following electronic databases were searched to maximise the identification of appropriate articles20:

  • OVID Medline without Revisions

  • CINAHL Plus with Full Text Database


  • CAB Abstracts

  • Health Management Information Consortium

  • Journals @ OVID Full Text

  • PubMed

Databases were searched from 1 May 2001, when the WHO first highlighted the importance of PA evaluation,9 to 4 December 2014. Citation tracking of reference lists ensured all relevant articles were obtained21 (boxes 1 and 2).

Box 1

Inclusion criteria for this systematic review: Inclusion criteria were articles

  • Completed on the adult human population (>18 and <60 years old); The National Institute for Health and Care Excellence guidelines recognise different activity thresholds for varying age groups and so specific questionnaires were developed for the adult population.23 Further research suggests that motivation and the ability to perform activities is more of a limitation for the elderly, rather than the amount.18 Therefore the age group of 60 years old and above have specific physical activity (PA) questionnaires which were beyond the scope of this review.19

  • Reporting assessment of ≥1 measurement property; Including content, construct or criterion validity, internal consistency, interpretability, responsiveness, reliability or absolute measurement error; justified for self-report PA questionnaires (SRPAQs) by an International Delphi study.16

  • Reporting assessment of ≥1 written self-report standardised original English version outcome measure(s) focusing on PA; SRPAQs were used as they are practical for both cost and participant convenience, and feasible for use and analysis by clinicians.24

  • Investigating English SRPAQs as this was the aim of this systematic review.

  • Utilising a participant group representative of the general population, non-exclusive of race, gender or socioeconomic group. This allows transfer of the findings to the general population.25

Box 2

Exclusion criteria for this systematic review: Exclusion criteria were articles

  • That were systematic reviews as original articles were required for methodological quality analysis.26

  • Completed on patients where reduced physical activity (PA) was due to neurological conditions or psychological factors. This reduces the variables that may have affected PA or completion of the self-report PA questionnaires (SRPAQs).27

  • Completed on a specific population group as this reduces transferability of the conclusions drawn to the general population.27

  • Utilising occupational PA or sedentary questionnaires or those conducted via the telephone. PA should be explicitly measured rather than being defined by involuntary occupational activity or a lack of sedentary behaviour.28 Telephone conducted questionnaires introduce the potential for inconsistent administration from the interviewer and potentially limit the number of participants from low socioeconomic groups.29 As the International Classification of Functioning, Disability, and Health emphasises the importance of PA for health promotion this is the concept of interest.30

  • Examining outcome measures focusing on pain or specific limb injury as the causative factor for activity modification. Assessment of PA due to pain or potentially short-term inactivity reduces transfer of the results to the general public.31

  • Examining PA logbooks rather than questionnaires. PA logbooks are less likely to reflect usual behaviour as PA levels may vary between seasons or as a result of illness or time constraints.24

  • Investigating long-term recall of PA (classified as >6 months). As the recall accuracy for vigorous and less intense activity over this timescale has been deemed unreliable.32 Furthermore, seasonal variations of PA are difficult to account for in the same survey.19

  • Investigating SRPAQs written in a language other than English. As investigating the consistency and accuracy of SRPAQ translation was beyond the scope of the review.33


“Self Report*” AND “Motor Activity” OR “Physical Activity” AND “Outcome Assessment” AND “Healthcare” OR “Outcome Measure*” OR Questionnaire* AND [“Physical Activity” OR “GPAQ” OR “International Physical Activity Questionnaire”] AND Valid* OR Reliab* OR “Measurement Propert*” NOT Obesity NOT Girl NOT Boy NOT Psych* NOT Environment* NOT Elderly.

Eligibility criteria

Included articles were original studies investigating ≥1 measurement property of an English language SRPAQ focusing on PA within the past 6 months,34 in a non-exclusive group. Articles published in any language (to ensure comprehensiveness20), investigating English language SRPAQs in participants aged 18–60 years,19 were included. Articles investigating SRPAQs focusing on pain, specific injury or pathological conditions were excluded.

Study selection

Two reviewers (ZS and RG) completed an independent search of electronic databases using the keywords. Titles and abstracts were evaluated independently by both reviewers using the eligibility criteria, removing duplicate articles. The two reviewers then independently evaluated the full-text article for eligibility.20 Any disagreements were resolved through discussion, and inconsistencies discussed with a third reviewer (AR).35 Level of agreement was evaluated using Cohen's κ. Details of study inclusion were recorded using the PRISMA flow diagram.22
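As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's κ for two reviewers' include/exclude decisions. The function name `cohens_kappa` and the screening decisions are hypothetical and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical label throughout; kappa is undefined (0/0).
        return float("nan")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical screening decisions for ten abstracts (illustrative only).
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this made-up example the reviewers agree on 8 of 10 abstracts, but κ is lower than 0.8 because part of that agreement is expected by chance.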

Assessment of methodological quality and data extraction

The COSMIN checklist of measurement properties26 was used to evaluate the methodological quality of included studies. When studies are deemed to have good methodological quality, it indicates that their conclusions are more trustworthy.26 The COSMIN checklist was developed in an International Delphi study36 and comprises 12 components. Nine components analyse the standard of the included measurement property and one component assesses the interpretability of the study. Finally, two components question the generic requirements needed for the studies where Item Response Theory methods are utilised. Furthermore, there is a section to record the general requirements of results found to summarise the study's findings.

Training in use of the COSMIN analysis involved reading the background to its development and the manual on its use and scoring system.36 ,26 The COSMIN scoring system and manual were referred to when interpreting the checklist for each measurement property.16 ,37 The two independent reviewers completed the COSMIN analysis, with disagreement resolved by the third reviewer who was experienced in using COSMIN. Agreement was reached before data extraction. Level of agreement between reviewers was calculated at each stage of the process using Cohen's κ.38 The two independent reviewers extracted the data from the included studies,39 and the third reviewer mediated any discrepancies in data.

Data synthesis within and across studies

The SRPAQ investigated by each study was documented along with the measurement property assessed and the study's design and aim. The demographics of the population examined within each study were outlined and the COSMIN ranking for each measurement property methodology was recorded.37

For SRPAQs investigated in more than one study, the results and observed trends across the studies were collated. Consequently, a narrative synthesis of the data was used to discuss the relationship between the studies and their findings.


Included articles

Ten studies investigating the measurement properties of SRPAQs, comprising over 2500 participants of multiple ethnicities across 11 countries, were included. Figure 1 depicts the PRISMA process of study eligibility (adapted from Moher et al22). All 346 articles electronically identified were written in English, with 269 removed as they were duplicates or did not satisfy the inclusion criteria from title and abstract assessment. Forty-nine articles were retrieved by hand searching reference lists but all were excluded. The full text of the remaining articles (n=77) was screened. Following full text screening, a further 67 articles were excluded. The most common reason for exclusion was that the article did not analyse an appropriate outcome measure (n=26), or population group (n=16). Because the reviewers reached complete agreement (only three articles required discussion, and all three were subsequently excluded), Cohen's κ was not calculated.
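The screening counts reported above can be checked with simple arithmetic. The sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
# Counts as reported in the text; variable names are illustrative.
identified_electronically = 346
removed_at_title_abstract = 269   # duplicates or failed inclusion criteria
hand_searched = 49
hand_searched_excluded = 49       # all hand-searched articles were excluded
excluded_at_full_text = 67

full_text_screened = identified_electronically - removed_at_title_abstract
included = full_text_screened - excluded_at_full_text
hand_search_retained = hand_searched - hand_searched_excluded

print(full_text_screened, included, hand_search_retained)
```

The totals reproduce the 77 full-text articles and the 10 included studies, with no additions from hand searching.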

Figure 1

Flow diagram depicting study identification (adapted from Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)).

PA questionnaires and measurement properties

The data extraction for each article is presented in online supplementary table S1. Consensus between reviewers was reached at each stage following discussion on each article and reference to the COSMIN handbook, providing perfect agreement.38 Online supplementary table S2 summarises data synthesis across multiple studies investigating an SRPAQ by combining data analyses from individual studies.

Self-report physical activity questionnaires

Ten SRPAQs were investigated over the 10 studies, with four versions of the International Physical Activity Questionnaire (IPAQ) identified. The studies analysed test–retest reliability and criterion validity of SRPAQs, except for the human activity profile (HAP), six-point scale and Godin-Shephard 1 week (G-S) recall. The six-point scale, HAP and IPAQ long form ‘Past 7 days’ (IPAQ-L7S) were compared, and specific demographic groups were assessed for differing PA.40 The COSMIN methodological rank ranged between poor and good.

International Physical Activity Questionnaire

Both the written IPAQ short form ‘Usual Week’ (IPAQ-SUS) and IPAQ short form ‘Past 7 days’ (IPAQ-S7S) demonstrated excellent test–retest reliability over 7 days (r=0.79 and 0.75, respectively).41 Pooled correlation between the short and long forms, and within the short versions, was moderate (ρ=0.67, 95% CI 0.64 to 0.70; and ρ=0.58, 95% CI 0.51 to 0.64, respectively).41 Assessment of the IPAQ short form (IPAQ-SF) walking component demonstrated excellent reliability over 3 days (r=0.77 for IPAQ-S7S, 0.72 for IPAQ-SUS) and was very good over 7 days (r=0.91 for IPAQ-SUS).42 The criterion validity of the IPAQ-SUS and its walking component against an accelerometer was poor (r=0.1331 and 0.26,35 respectively), whereas the IPAQ-S7S and its separate walking component demonstrated small to moderate (r=0.26–0.4)41 and moderate criterion validity (r=0.39),42 respectively.

Similarly, the IPAQ-L7S demonstrated very good/excellent repeatability over 1 week (r=0.627, p<0.00143; ρ=0.7–0.8841; r=0.74–0.79, p<0.000144) and 2 weeks (r=0.74–0.79; p<0.0001), even when tested on a small sample (n=36).44 Furthermore, no significant difference was found between repeated IPAQ-L7S administrations for separate intensities of PA (r=0.74–0.84) in a study of good methodological quality.45 The IPAQ-L7S demonstrated moderate validity with both the accelerometer (ρ=0.05–0.43)42 and DLW technique (for activity-related energy expenditure (AEE) r=0.31, p=0.06 and metabolic energy equivalent per week r=0.33, p<0.05).44 Moreover, classification of ‘active’ and ‘non-active’ participants demonstrated good agreement between the IPAQ-L7S and accelerometer (p<0.001).43

The test–retest reliability of the IPAQ long form ‘Usual Week’ (IPAQ-LUS) was excellent (r=0.91), but poor correlation was found with an accelerometer (p=0.91), although the sample size was small (n=28).41 The IPAQ-LUS demonstrated better agreement with the Actigraph accelerometer for combined moderate and vigorous PA (r=0.3), and solely vigorous PA (r=0.42), than for moderate PA (r=0.19).46 Above this intensity, the IPAQ-LUS tends to over-estimate moderate/vigorous PA, which correlates to a 165% increase in PA,46 whereas the IPAQ-L7S has been shown to under-estimate PA by 27%.44 This component of the study demonstrated fair methodological quality.
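The criterion validity coefficients in this section compare questionnaire scores with accelerometer or DLW data. The sketch below shows one way such a coefficient could be computed, assuming a rank-based Spearman correlation; the text does not state which coefficient each study used, and the weekly minutes are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly PA minutes for six participants (illustrative only).
questionnaire_minutes = [120, 300, 90, 450, 200, 150]
accelerometer_minutes = [100, 180, 110, 260, 140, 170]

rho = spearman_rho(questionnaire_minutes, accelerometer_minutes)
print(round(rho, 3))
```

A rank-based coefficient measures whether participants are ordered similarly by both methods, so it can be high even when the questionnaire systematically over-estimates or under-estimates the absolute amount of PA.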

Recent Physical Activity Questionnaire

The Recent Physical Activity Questionnaire (RPAQ) possesses high test–retest reliability for physical AEE (PAEE) (ICC=0.76), with the separate PA domains demonstrating poor (home ICC=0.62) to high reliability (work ICC=0.85).47 Validity of the estimated PAEE was associated significantly with the DLW technique (r=0.39; p=0.0004), and strongly with vigorous PA (r=0.67; p<0.0001).47 The study was deemed to be of good COSMIN rank.
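For readers unfamiliar with the ICC, the sketch below computes two common single-measure forms from a participants-by-occasions table, following the Shrout and Fleiss two-way formulas. Which ICC form the RPAQ study used is not stated here, so the choice is an assumption, and the energy expenditure values are hypothetical.

```python
def icc_two_way(scores):
    """Return (absolute-agreement ICC, consistency ICC) for single measures.

    scores is a list of rows, one per participant, each holding k repeated
    measurements. Formulas follow the Shrout and Fleiss two-way ANOVA forms.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares for participants, occasions and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return agreement, consistency


# Hypothetical test and retest scores for five participants (illustrative only).
# Every retest value is exactly 2 units higher than the first administration.
test_retest = [[10, 12], [20, 22], [30, 32], [40, 42], [50, 52]]

icc_agreement, icc_consistency = icc_two_way(test_retest)
print(round(icc_agreement, 4), round(icc_consistency, 4))
```

In this made-up data the constant shift between administrations leaves the consistency form at 1 while the absolute-agreement form falls slightly below 1, which is one reason ICC values from different studies are hard to compare unless the form is reported.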

PA Assessment Tool

A single study of good COSMIN methodological quality evaluating the PA Assessment Tool (PAAT) demonstrated significant test–retest reliability (r=0.618, p<0.001).43 When assessing the PAAT against the accelerometer, significant correlation (r=0.38–0.392; p<0.01) and fair agreement regarding participant classification as ‘active’ (k=0.338) was found.43

Six-point scale

Fair agreement was found between the six-point scale and IPAQ-L7S (k=0.46), and good agreement with the HAP (k=0.57), in a study of poor COSMIN rank.40 The authors found that increasing age and presence of comorbidities had a significant negative effect on PA (p<0.05 and 0.000, respectively). Gender and smoking had no effect (p>0.05) in an experiment of fair COSMIN methodological quality.40

Human activity profile

One study found that the HAP poorly correlated with the IPAQ-L7S (k=0.38), with this portion being of poor COSMIN ranking. Age (p<0.05) and the presence of comorbidities (p<0.000) significantly affected PA, while gender and smoking did not (p>0.05), which was similar to the six-point scale.40 The findings demonstrated that occupation significantly affected PA categorisation using the HAP (p<0.000), and this portion of the study was deemed to have fair COSMIN ranking.37

Single-item measure

One study of good COSMIN methodological quality reported significant correlation between the accelerometer and the single-item measure for all moderate-vigorous PA over 30 min (r=0.46; p<0.001). Stronger correlation was shown when this intensity of PA was taken at 10 min intervals (r=0.57; p<0.001).48 Participants were found to under-report activity using the single-item measure (−1.59 days) compared with all objectively measured moderate-vigorous PA, although when compared with recorded PA in 10 min bouts, there was stronger correlation of data (0.38 days).

G-S 1 week recall

The G-S recall correlated moderately with an accelerometer for moderate and moderate-vigorous intensity PA (r=0.3 and 0.4, respectively), with a stronger association for vigorous PA (r=0.5). The G-S recall data were not significantly altered by use in participants of different genders, weight classifications or ethnicity (p>0.05).49 Furthermore, the subgroup analysis demonstrated that males performed more moderate-vigorous activity than females (p<0.001), and non-overweight participants completed more moderate and moderate-vigorous PA than overweight/obese individuals (p<0.05). The COSMIN rank for this study was poor.


Measurement properties

Self-report physical activity questionnaires

This systematic review identified available English language SRPAQs, updating and focusing on the previously completed research.11 Single studies of good methodological quality each evaluated the RPAQ,47 PAAT,43 single-item measure48 and G-S recall,49 giving confidence to the demonstrated significant test–retest reliability and criterion validity. The six-point scale demonstrated a moderate level of agreement with the HAP and IPAQ-L7S, whereas the HAP only demonstrated a fair level of agreement with the IPAQ-L7S.40 The HAP was able to distinguish PA differences between participant subgroups; however, this study was of only fair COSMIN methodological quality40 because of limited statistical analysis, and its small sample size reduces confidence in the results. The G-S recall identified females as less active than males and the overweight/obese subgroup as less active than those who are non-overweight.49 This study also demonstrated that gender, ethnicity and weight did not significantly alter the data derived from the accelerometer or G-S recall questionnaire. This testing of different participant subgroups supports the construct validity of the SRPAQs, and consequently their use in heterogeneous sample groups, which is representative of healthcare patient populations.

International Physical Activity Questionnaire

The most investigated SRPAQ was the IPAQ, specifically the IPAQ-L7S, which is the most extensively used SRPAQ worldwide, due to its varying formats and translation into many different languages. All forms of the IPAQ demonstrated very good/excellent test–retest reliability,41–45 with the results for the IPAQ-L7S and short form IPAQ ‘walking only’ component being classified as excellent. The English IPAQ-SUS was evaluated with a small sample, which limits its generalisability. However, cross-cultural comparisons involving non-English speaking countries corroborate the excellent reliability of the IPAQ-SF50 and, consequently, their use in clinical practice. Repeated administrations for separate PA intensities demonstrated no significant difference on IPAQ-L7S value, demonstrating excellent test–retest reliability. Unfortunately, this was completed on a group of university students of limited age range,45 which reduces transferability to the general population, and the authors stated poor recording consistency of the data collected. However, total PA was correlated, albeit weakly, to motivation and competency, demonstrating an attempt to explain PA scores within the excellent sample size.

The criterion validity of the IPAQ against an accelerometer or DLW technique was variable for the differing intensities of PA across the studies, with poorer correlation being found for total PA.41 ,42 The use of limited accelerometer data on a small convenience sample may have caused this and, with increased numbers, the validity of the IPAQ-S7S did significantly improve.41 Furthermore, a systematic review analysing the IPAQ-SF reported an over-estimation of PA by approximately 84%,51 so despite being reliable, the validity of the IPAQ-SF requires further investigation. Poor correlation between the IPAQ-LUS and accelerometer was significantly improved by increasing the sample size by adding its non-English counterpart,41 but it demonstrated better criterion validity at <1000 min/week of PA.46

Testing the ability of the SRPAQ to identify PA change in different subgroups of participants assesses its construct validity.16 Expectedly, the IPAQ-L7S showed that age significantly affects an individual’s PA; however, the presence of comorbidities and smoking did not.40 This may be due to low numbers within these participant groups and lack of clarity over the degree to which the comorbidity affected their health status. The authors suggested that the unexpected weak associations between total PA, motivation and the ability to undertake PA,45 were due to variability in the recording skills of participants, affecting data consistency, so repeated experimentation would be beneficial.

Encouragingly, it has been shown that higher IPAQ scores correlate with lower mortality rates and a reduced risk of cardiovascular disease, demonstrating its worth for public health assessment.52 This supports its interpretability, which although not considered a measurement property, is a clinically important characteristic of any outcome measure. However, suggestion of inconsistent translation of the IPAQ may not allow the same accreditation to non-English versions, therefore investigation into each translation would be necessary.33

Recent Physical Activity Questionnaire

The development of the RPAQ with a 1 month recall period was to calculate the individual's PA in relation to home, work, transport and leisure time.37 The RPAQ demonstrated high test–retest reliability and was strongly associated with the DLW technique for vigorous PA, giving the RPAQ criterion validity. This single study used different populations for the reliability and validity components of analyses, with the validity study consisting of a good sample size.37 The stability of the PA between repeated measures was assumed and the testing environment was not described, allowing for the COSMIN methodological quality to be described as good. Interestingly, the authors evaluated the ICC of 0.62 for home activity as poor, whereas the other literature would have ranked this more highly,53 which reduces standardisation of the ICC and data synthesis across studies.

PA Assessment Tool

The PAAT measures moderate and vigorous PA over the preceding 7 days, and is intended to be completed within 7 min.43 This tool would be ideal for implementation in a healthcare professional's waiting area and determines whether this 7-day PA is similar to the patient's normal activity. A single study evaluating the PAAT demonstrated significant reliability for vigorous activity (r=0.618; p<0.001); however, by only using 67 participants, the strength of the evidence is reduced. Furthermore, participant interpretation of moderate and vigorous PA may introduce ambiguity of reporting, and cause inconsistency in results. This study evaluated a high proportion of female volunteers who, it has been stated, may demonstrate reduced PA in their lifetime due to pregnancy and child care. This could reduce the application of findings into male populations; however, limited research on female PA neither supports nor refutes this hypothesis.54 Additionally, the PAAT quantifies the energy for moderate and vigorous PA only, and against potentially outdated Compendium values,55 which may affect the applicability of the results found.

Six-point scale

The six-point scale was devised to reduce the time and financial constraints associated with lengthy questionnaires. PA is classified on a sliding scale with additional descriptors referring to the amount of perspiration and depth of breathing when examples of exercise are inappropriate.40 This may introduce subjectivity in participant reporting and inconsistency between test subjects. Structural validity was assessed by comparing the six-point scale and HAP with the IPAQ-L7S,40 demonstrating that a brief scale can assess PA adequately when compared to the longer SRPAQs in a heterogeneous sample recruited by verbal invitation. By subgrouping the participants, the authors found that increasing age and the presence of comorbidities had the expected negative effect on PA. Smoking did not have an effect on the amount of PA completed, but the small sample size may not have enabled suitable analysis to be performed. It was concluded that the six-point scale can be used for assessing PA in large-scale epidemiological studies with heterogeneous samples, but the poor COSMIN methodological quality, due to the limited description of the six-point scale and the non-directional hypotheses, limits confidence in this finding and more rigorous investigations are encouraged.37

Human activity profile

The HAP measures PA levels to give an estimate of the individual's average energy expenditure.37 The HAP correlated poorly with the IPAQ-L7S,40 although, using a different κ classification, this would have been described as fair,56 and as moderate with the six-point scale. Smoking did not significantly affect PA but, as expected, physicality of occupation did significantly affect the categorisation of PA by the HAP, which supports its construct validity. Disappointingly, this study did not use exploratory or factor analysis statistical methods and, consequently, methodological quality was poor,37 reducing confidence in the findings.
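How a κ value is computed and then labelled can be sketched as follows. The agreement table is hypothetical and is not data from the included study, and the Landis and Koch bands are used only as one widely quoted classification; it is an assumption that they resemble either of the classifications discussed above.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table (rows: instrument A, columns: instrument B)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    # Proportion of participants placed in the same category by both instruments.
    observed = sum(table[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total**2
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Label kappa using the Landis and Koch bands (one scheme among several)."""
    if kappa < 0:
        return "poor"
    for upper_bound, label in ((0.20, "slight"), (0.40, "fair"),
                               (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper_bound:
            return label
    return "almost perfect"


# Hypothetical 2x2 table: active/inactive according to two questionnaires.
example_table = [[20, 10],
                 [10, 20]]
kappa = cohens_kappa(example_table)
print(round(kappa, 3), landis_koch_label(kappa))
```

Because the label depends entirely on the chosen bands, reporting the κ value itself alongside any descriptor makes comparison between studies easier.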

Single-item measure

The single-item measure was investigated to establish whether the longer SRPAQs are, in fact, any more beneficial. The validity of the single-item measure48 was tested against the accelerometer in a healthy population that was well representative of the general public. This study reports significant correlation between the single-item measure and the accelerometer, which was stronger when PA was analysed in 10 min bouts. Analysing 10 min bouts of PA improved the sensitivity of testing, although specificity was better when testing total moderate-vigorous PA.57 This lack of standardisation may limit the use of the single-item measure in clinical practice and research. However, the idea that a single question could be as reliable and valid as the longer SRPAQs is interesting and clinically relevant, so further investigation would be beneficial.

G-S recall

Testing the criterion validity of the G-S recall against the accelerometer showed moderate to strong correlation across varying PA. The authors of this study noted greater variability at higher levels of PA and reported that the G-S recall has restrictions when used to assess individual PA levels; however, the G-S recall data were not significantly altered by their use in heterogeneous groups.49 This portion of the validity study was ranked as ‘poor’ by COSMIN, owing to the lack of hypotheses prior to data collection. Furthermore, the statistical analysis was not optimal,16 which restricts the inferences that can be made for clinical and research practice. This may limit its use, so further investigation is encouraged.

PAQs in research and clinical practice

To enable SRPAQ data to be equated to national guidelines,23 units need to be comparable. This is a limitation for questionnaires assessing PAEE because the guidelines focus on time; the mismatch reduces the clinical significance of SRPAQ data, makes comparison between different SRPAQs ambiguous and limits transferability into practice.58 Development of SRPAQs has reduced completion time, with the PAAT, six-point scale and single-item measure feasible for use in clinical practice, but further research is required to support their use across languages and populations. Conversely, the HAP is a lengthy questionnaire, which increases costs and reduces clinical usability.

The RPAQ demonstrates a strong correlation for assessing monthly vigorous PA energy expenditure, but not sedentary PA.47 Likewise, the PAAT and single-item measure only investigate moderate/vigorous PA over the preceding 7 days, which calls into question their use for all intensities of activity, as this may affect the categorisation of the individual as active or sedentary. The inconsistency of recording 10 and 30 min PA data for the single-item measure reduces standardisation, potentially leading to errors in data analysis.48 The G-S recall did detect differences between genders and weight classifications, with no difference shown when correlating these data with the gold standard.49 Similarly, moderate to vigorous PA recorded by the G-S recall showed moderate to strong correlation with the accelerometer. However, the poor COSMIN methodological quality of its hypothesis testing substudy16 calls into question the inferences that can be made when using the tool clinically or in research. Therefore, due to the ambiguity surrounding the PA intensity analysed, further scrutiny is required to improve consistency.

The IPAQ has the advantage of being researched extensively across languages, populations and demographic groups. It has been developed into multiple formats, making the IPAQ the most extensively used SRPAQ worldwide.51 There is good correlation between the long and short forms via all modes of delivery; however, caution should be used if comparing the two, as the IPAQ long form (IPAQ-LF) can over-estimate PA to a greater extent than the IPAQ short form (IPAQ-SF).46 The questionable validity of the IPAQ-SUS and the time-consuming administration of the IPAQ-LF restrict their use clinically. The IPAQ-S7S is, at present, the preferred measure, owing to its excellent test–retest reliability over 7 days. Moreover, the IPAQ-S7S is supported across cultures within non-English speaking countries, assisting transferability of information, and this, along with its short format, makes it feasible for research and clinical practice. Because the studies demonstrating poor validity had small sample sizes, a study with an adequate sample size of over 100 participants is required.


Only limited narrative comparison of the SRPAQs was achievable because each aims to report different PA intensities. For example, the IPAQ, HAP and RPAQ investigate a wide variety of PA, whereas the six-point scale and single-item measure are brief overviews of daily activity or exercise, and the PAAT concentrates on moderate/vigorous activity. The individual studies did not define their understanding of PA, and the SRPAQs did not indicate the difference between PA intensities. Furthermore, the single-item measure57 may compromise breadth and depth of recorded data for speed of administration.49 The main methodological flaws encountered were the use of limited statistical tests and low subject numbers. Using the COSMIN scoring system, successive methodological limitations reduce the rank from excellent methodological quality through to good, fair or poor, reflecting the confidence we can have in study findings. Additionally, within the reliability studies, it was generally assumed that no change in PA occurred, rather than PA stability being analysed statistically. Furthermore, the Spearman correlation coefficient indicates association rather than true agreement.40 Therefore, caution should be taken when accepting validity supported only by this statistical test, which is the case for the majority of included studies. Moreover, there were discrepancies between studies in how the numerical outcomes of the statistical tests were ranked, which should be noted during data synthesis for each SRPAQ.
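The distinction between association and agreement can be made concrete with a small sketch. The minutes-per-day values below are invented for illustration and are not data from any included study; the helper names are hypothetical, and the mean difference is used here as a simple Bland-Altman style indicator of agreement.

```python
from statistics import mean


def ranks(values):
    """Rank values from 1 to n (assumes no tied values, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho using the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    squared_rank_diffs = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * squared_rank_diffs / (n * (n**2 - 1))


# Hypothetical daily minutes of moderate to vigorous PA for six participants.
accelerometer = [15, 30, 45, 60, 90, 120]
questionnaire = [40, 70, 85, 130, 150, 260]

rho = spearman_rho(accelerometer, questionnaire)
# Mean difference (questionnaire minus accelerometer): a basic agreement check.
bias = mean(q - a for q, a in zip(questionnaire, accelerometer))

print("Spearman rho:", rho)
print("Mean over-report (min/day):", bias)
```

In this invented data set the questionnaire ranks participants identically to the accelerometer, so the rank correlation is perfect, yet every participant over-reports. A correlation coefficient alone would not reveal that discrepancy.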

The ambiguity of common terminology used within PA research may have affected the electronic database search for this systematic review, possibly contributing to omissions in the included studies. In addition, as only English language SRPAQs were included, the sample sizes within the Craig et al41 study were greatly reduced, diminishing the significance of their results. Furthermore, the aim of this systematic review was to comment on the measurement properties of SRPAQs but, unfortunately, only studies investigating test–retest reliability, internal consistency and criterion, construct and structural validity were identified, reflecting the limited research to date on measurement properties.


Ambiguity in PA terminology, patient reporting of PA, and the variable nature of activity across the seasons and across 7 days make daily activity difficult to assess using structured SRPAQs. Further inconsistencies within the PA assessed by each SRPAQ, the limited measurement properties evaluated and the poor methodological quality of many studies made data synthesis across studies difficult. Consequently, the optimum SRPAQ has not been reported. The IPAQ-L7S is the most investigated SRPAQ, with all versions reported to have good test–retest reliability but limited criterion validity, potentially due to limited methodological quality. Based on current data, the IPAQ-S7S is the most appropriate outcome measure for clinical and research use, as it has excellent reliability and moderate correlation with accelerometry. Its short format makes it efficient for clinicians and therefore more cost-effective. Future research should continue to assess SRPAQs against the ‘gold standard’ PA performance-based measures, in diverse socioeconomic groups worldwide. Assessment of responsiveness and interpretability would be valuable to relate the levels of PA required to reduce the risk of chronic disease for the individual. This is fundamental for disease prevention and therefore essential for the promotion of public health.


The authors acknowledge Mr Gethin Lynch, affiliated to School of Sport, Exercise and Rehabilitation Sciences, University of Birmingham, UK, for assisting with article selection.


  • Contributors ZS was involved in finding and appraising all articles, completing COSMIN analysis, data extraction, analysis and synthesis, and wrote the article. This included documenting the supporting evidence for the need for the systematic review, adapting pre-defined protocol for its completion, and recording and synthesising the results obtained with inferences for clinical practice and research. RG was involved in the selection of articles, COSMIN analysis, and data extraction and synthesis for this manuscript. AR was the mediating third reviewer, academic supervisor of the research and peer reviewed the submitted article.

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Ethics approval Ethical approval was obtained from the University of Birmingham.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
