How to Read a Paper: The Basics of Evidence-Based Medicine
Authors: Trisha Greenhalgh
Before we turn to the critical appraisal, a word about terminology. A questionnaire is a form of psychometric instrument; that is, it is designed to measure formally an aspect of human psychology. We sometimes refer to questionnaires as ‘instruments’. The questions within a questionnaire are sometimes known as items. An item is the smallest unit within the questionnaire that is individually scored. It might comprise a stem (‘pick which of the following responses corresponds to your own view’) and then five possible options. Or it might be a simple ‘yes/no’ or ‘true/false’ response.
Ten questions to ask about a paper describing a questionnaire study
Question One: What was the research question, and was the questionnaire appropriate for answering it?
Look back to section ‘The science of “trashing” papers’, where I describe three preliminary questions to get you started in appraising any paper. The first of these was ‘what was the research question—and why was the study needed?’. This is a particularly good starter question for questionnaire studies, because (as explained in the previous section) inexperienced researchers often embark on questionnaire research without clarifying why they are doing it or what they want to find out. In addition, people often decide to use a questionnaire for studies that need a totally different method. Sometimes, a questionnaire will be appropriate but only if used within a mixed methodology study (e.g. to extend and quantify the findings of an initial exploratory phase).
Table 13.1 gives some real examples, based on papers that Petra Boynton and I collected from the published literature and on examples offered by participants in courses we have run.
Table 13.1 Examples of research questions for which a questionnaire may not be the most appropriate design
There are many advantages to researchers of using a previously validated and published questionnaire. The research team will save time and resources; they will be able to compare their own findings with those from other studies; they need only give outline details of the instrument when they write up their work; and they will not need to have gone through a thorough validation process for the instrument. Sadly, inexperienced researchers (most typically, students doing a dissertation) tend to forget to look thoroughly in the literature for a suitable ‘off the peg’ instrument, and such individuals often do not know about formal validation techniques (see subsequent text). Even though most such studies will be rejected by journal editors, a worrying proportion find their way into the literature.
Increasingly, health services research uses standard ‘off the peg’ questionnaires designed explicitly for producing data that can be compared across studies. For example, clinical trials routinely include standard instruments to measure patients' knowledge about a disease [6]; satisfaction with services [7]; or health-related quality of life (QoL) [8] [9]. The validity (see subsequent text) of this approach depends crucially on whether the type and range of closed responses (i.e. the list of possible answers that people are asked to select from) reflects the full range of perceptions and feelings that people in all the different potential sampling frames might actually hold.
Question Two: Was the questionnaire used in the study valid and reliable?
A valid questionnaire measures what it claims to measure. In reality, many fail to do this. For example, a self-completion questionnaire that seeks to measure people's food intake may be invalid, because in reality it measures what they say they have eaten, not what they have actually eaten [10]. Similarly, responses to questionnaires asking general practitioners (GPs) how they manage particular clinical conditions have been shown to differ significantly from actual clinical practice [11]. Note that an instrument developed in a different time, country or cultural context may not be a valid measure in the group you are studying. Here's a quirky example. The item ‘I often attend gay parties’ was a valid measure of a person's sociability level in the UK in the 1950s, but the wording has a very different connotation today [1]! If you are interested in the measurement of QoL through questionnaires, you might like to look out for the controversy about the validity of such instruments when used beyond the context in which they were developed [12].
Reliable questionnaires yield consistent results from repeated samples and different researchers over time [4] [5]. Differences in the results obtained from a reliable questionnaire come from differences between participants, and not from inconsistencies in how the items are understood or how different observers interpret the responses. A standardised questionnaire is one that is written and administered in a strictly set manner, so all participants are asked precisely the same questions in an identical format and responses recorded in a uniform manner. Standardising a measure increases its reliability. If you participated in the UK Census (General Household Survey) in 2011, you may remember being asked a rather mechanical set of questions. This is because the interviewer had been trained to administer the instrument in a highly standardised way, so as to increase reliability. It's often difficult to ascertain from a published paper how hard the researchers tried to achieve standardisation, but they may have quoted inter-rater reliability figures.
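One common way of reporting an inter-rater reliability figure is Cohen's kappa, which compares how often two raters agree with how often they would be expected to agree by chance. The sketch below is illustrative only: the text does not say which statistic any given paper used, and the function name and the two lists of ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same set of responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: proportion of responses both raters coded identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: derived from each rater's own category proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: two interviewers coding the same ten answers.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 answers, but because a good deal of that agreement could arise by chance, kappa comes out well below 0.8. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.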
Question Three: What did the questionnaire look like, and was this appropriate for the target population?
When I say ‘what did it look like?’ I am talking about two things: form and content. Form concerns issues such as how many pages it ran to, whether it was visually appealing (or off-putting), how long it took to fill in, the terminology used, and so on. These are not minor issues! A questionnaire that goes on for 30 pages, includes reams of scientific jargon, and contains questions that a respondent might find offensive, will not be properly filled in, and hence the results of the survey will be meaningless [2].
Content is about the actual items. Did the questions make sense, and could the participants in the sample understand them? Were any questions ambiguous or overly complicated? Were ambiguous weasel words such as ‘frequently’, ‘regularly’, ‘commonly’, ‘usually’, ‘many’, ‘some’ and ‘hardly ever’ avoided? Were the items ‘open’ (respondents can write anything they like) or ‘closed’ (respondents must pick from a list of options) and, if the latter, were all potential responses represented? Closed-ended designs enable researchers to produce aggregated data quickly, but the range of possible answers is set by the researchers, not the respondents, and the richness of responses is therefore much lower [13]. Some respondents (known as yea-sayers) tend to agree with statements rather than disagree. For this reason, researchers should not present their items so that ‘strongly agree’ always links to the same broad attitude. For example, on a patient satisfaction scale, if one question is ‘my GP generally tries to help me out’, another question should be phrased in the negative, for example, ‘the receptionists are usually impolite’.
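Mixing positively and negatively worded items only works if the negative items are reverse-scored before the answers are added up. The following is a minimal sketch of that step, assuming a five-point agreement scale (1 = strongly disagree, 5 = strongly agree); the item names and the scoring function are hypothetical and are not taken from any published instrument.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed: 1 = strongly disagree, 5 = strongly agree

# Hypothetical item names; items listed here are worded in the negative.
NEGATIVELY_WORDED = {"receptionists_impolite"}


def satisfaction_score(responses):
    """Sum the item scores, reverse-scoring the negatively worded items."""
    total = 0
    for item, score in responses.items():
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"{item}: score {score} is outside the scale")
        if item in NEGATIVELY_WORDED:
            # Flip the scale so that a high score always means 'more satisfied'.
            score = SCALE_MIN + SCALE_MAX - score
        total += score
    return total


# A yea-sayer strongly agrees with everything, including the negative item.
yea_sayer = {"gp_tries_to_help": 5, "receptionists_impolite": 5}
# A consistently satisfied patient disagrees with the negative item.
satisfied = {"gp_tries_to_help": 5, "receptionists_impolite": 1}

print(satisfaction_score(yea_sayer))
print(satisfaction_score(satisfied))
```

With reverse-scoring, the yea-sayer ends up near the middle of the scale rather than at the top, so indiscriminate agreement no longer masquerades as high satisfaction.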
Question Four: Were the instructions clear?
If you have ever been asked to fill out a questionnaire and ‘got lost’ halfway through (or discovered you don't know where to send it once you've filled it in), you will know that instructions contribute crucially to the validity of the instrument. These include
These aspects of the study are unlikely to be listed in the published paper, but they may be in an appendix, and, if not, you should be able to get the information from the authors.
Question Five: Was the questionnaire adequately piloted?
Questionnaires often fail because participants don't understand them, can't complete them, get bored or offended by them or dislike how they look. Although friends and colleagues can help check spelling, grammar and layout, they cannot reliably predict the emotional reactions or comprehension difficulties of other groups. For this reason, all questionnaires (whether newly developed or ‘off the peg’) should be piloted on participants who are representative of the definitive study sample to see, for example, how long people take to complete the instrument, whether any items are misunderstood, or whether people get bored or confused halfway through. Three specific questions to ask are (i) What were the characteristics of the participants on whom the instrument was piloted? (ii) How was the piloting exercise undertaken, and what details are given? and (iii) In what ways was the definitive instrument changed as a result of piloting?
Question Six: What was the sample?
If you have read the previous chapters, you will know that a skewed or non-representative sample will lead to misleading results and unsafe conclusions. When you appraise a questionnaire study, it's important to ask what the sampling frame was for the definitive study (e.g. purposive, random or snowball) and also whether the sample was sufficiently large and representative. Table 13.2 lists the main types of sample for a questionnaire study.
Table 13.2 Types of sampling frame for questionnaire research
Sample type | How it works | When to use |
Opportunity/haphazard | Participants are selected from a group who are available at time of study (e.g. patients attending a GP surgery on a particular morning) | Should be avoided if possible |
Random | A target group is identified, and a random selection of people from that group is invited to participate. For example, a computer might be used to select a random one in four sample from a diabetes register | Use in studies where you wish to reflect the average viewpoint of a population |
Stratified random | As random sample but the target group is first stratified according to a particular characteristic(s)—for example, diabetic people on insulin, tablets and diet. Random sampling is carried out separately for these different subgroups | Use when the target group is likely to have systematic differences by subgroup |
Quota | Participants who match the wider population are identified (e.g. grouped by social class, gender, age, etc.). Researchers are given a set number within each group to interview (e.g. so many young middle-class women) | For studies where you want the sample to be as closely representative of the wider population as possible. Frequently used in political opinion polls, and so on |
Snowball | Participants are recruited, and asked to identify other similar people to take part in the research | Helpful when working with hard-to-reach groups (e.g. lesbian mothers) |
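The difference between the random and stratified random rows of Table 13.2 can be made concrete with a short sketch. It assumes a hypothetical diabetes register of 120 patients split evenly across insulin, tablets and diet, and draws a one in four sample both ways; the register, the function names and the seed are all invented for illustration.

```python
import random


def simple_random_sample(register, fraction, rng):
    """Draw a random sample of the given fraction from the whole register."""
    k = round(len(register) * fraction)
    return rng.sample(register, k)


def stratified_random_sample(register, stratum_of, fraction, rng):
    """Group the register into strata, then sample each stratum separately."""
    strata = {}
    for person in register:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical register: 120 patients, 40 on each type of treatment.
treatments = ["insulin", "tablets", "diet"]
register = [{"id": i, "treatment": treatments[i % 3]} for i in range(120)]

rng = random.Random(2011)  # fixed seed so the draw can be repeated
random_sample = simple_random_sample(register, 0.25, rng)
stratified_sample = stratified_random_sample(
    register, lambda person: person["treatment"], 0.25, rng
)

print(len(random_sample), len(stratified_sample))
```

Both approaches return 30 patients, but only the stratified draw guarantees exactly 10 from each treatment group. The simple random draw will usually come close to that split, though it can over- or under-represent a subgroup by chance.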