Spencer Greenberg and Clare Harris

Our research shows how word choice can have a huge impact on survey results

Updated: May 7

When you read about surveys, polls, and psychology studies in the news, you’re probably aware that these involve asking people questions. While this can be an effective way to learn about human psychology and opinions held around the world, it can also be misleading; you might not realize just how much the exact wording of these questions can affect the results. This is a big deal if you want to learn from polls, let study results inform your beliefs, or conduct your own survey-based research.


In this article, we give three real-life examples of how surveys can lead to false conclusions for subtle reasons, followed by a longer write-up of our own randomized controlled trial where we demonstrate how unscrupulous actors (or just inexperienced researchers) can use small changes in question phrasing to dramatically alter opinion poll results.





Here are some dramatic, real examples of studies whose results were unintentionally influenced by word choice:



Example 1: Challenges to Measuring the Sunk Cost Fallacy


In a study conducted with Larry Friedman, we tried to measure the Sunk Cost Fallacy, the bias many of us have to stick with things because of the time or money we’ve already invested in them, even when it would be better to abandon them. We asked study participants to consider a hypothetical scenario where they order food at a restaurant, then realize they're already full and don't like the taste. We asked: "What would you do in this scenario?"


Many said they would eat the food anyway, suggesting they fell for the fallacy. In other words, the sunk cost of having already spent money on food made it difficult to leave the meal uneaten, even though eating it was not pleasant.


But when we asked study participants to explain why they gave the answers they did, we realized that the question wasn’t measuring what we thought it was! Many participants assumed they wouldn't be eating alone and didn't want to seem weird by leaving all the food on their plate. Some even said they felt obligated to eat the food because the chef had put work into preparing it. We had accidentally measured a social phenomenon instead of the bias we were attempting to find.


It took quite a few more iterations until we found a wording that seemed to actually measure people's susceptibility to the Sunk Cost Fallacy.



Example 2: Surprises Studying Delusional Beliefs


We found an academic paper claiming that a surprisingly high percentage of people have delusional beliefs, including thoughts that insects crawl on them or that the TV plays messages just for them. In another collaboration with Larry Friedman, we ran the same questions from the paper on some new study participants and found a few (though not all) of these same effects.


But when we asked respondents to explain their answers, we learned that their responses were almost entirely NOT delusional! Some people reported having lice (so they did indeed have insects crawling on them), while some people who watched TV online pointed out that it did show messages targeted "just for them": advertising based on their prior clicks! It turns out that what appeared delusional to the original researchers wasn't delusional after all; it was largely a product of poorly worded questions and a failure to explore why participants gave those answers.



Example 3: Difficulties in Experimental Philosophy


In philosophy, there is a question of whether "ought implies can" (meaning that someone can only be morally obligated to do something they are capable of doing). An influential study from experimental philosophy found that ordinary people largely reject this principle, despite many philosophers accepting it.


However, experimental philosopher Kyle Thompson got study participants to explain their answers. It turns out that about 90% of his respondents did adhere to the principle of "ought implies can!" They were just not interpreting the questions the way that the original research team had assumed. Specifically, respondents reinterpreted the questions they answered to be consistent with the principle: people in thought experiments were only morally obligated to do things that they were capable of doing.


An example of how subtle (but deliberate) changes in wording can dramatically change results: findings from our own experiment

Given these concerning examples, we wanted to test the effects of question wording directly. So we ran a randomized controlled trial (RCT) to find out whether we could change respondents’ opinions on a specific topic based purely on how we worded the questions.

We used Positly.com to ask a group of predominantly conservative Americans [1] if they think that too much or too little money is being spent on national defense and military. In one condition, 61% of respondents said too little is being spent on national defense and military. But in another condition, despite us asking exactly the same questions, only 30% said too little is being spent. While this sounds like a contradiction, this was actually a predictable outcome that resulted from simply presenting different numbers of answer options to study participants.

Take a look at the two panels below, showing the exact question that participants were asked - half got the wording on the left, and half got the wording on the right.



Figure 1: The three-option and two-option versions of the main experimental question. Among the participants who had three answer options available, 27.2% said the U.S. is spending "too much" on military and defense, 42.8% said that spending is "about right," and 30.0% said that the U.S. is spending "too little." In contrast, people who had only two answer options had to choose a non-neutral answer (because there was no neutral answer available!). Among them, 39.1% said the U.S. is spending "too much" on military and defense, and 60.9% said the U.S. is spending "too little." To put it another way, the percentage saying the U.S. is spending "too much" was 27.2% when there was a neutral option available, and 39.1% when there wasn't (a difference of 11.9 percentage points). The percentage saying the U.S. is spending "too little" was 30.0% when there was a neutral option available, and 60.9% when there wasn't (a difference of 30.9 percentage points).


As you can see, with just one fewer response option, the proportion of respondents saying that the U.S. is spending "too little" on national defense and military appeared to more than double, jumping from 30% to 61%. This is simply a consequence of forcing people to choose a non-neutral answer (by excluding the neutral option from their set of answer options).
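A gap between two percentages can be described in percentage points or as a relative change, and the two are easy to confuse. As a rough illustration that uses only the figures reported above, the arithmetic can be sketched in Python (the function names are invented for this example):

```python
def point_gap(before_pct, after_pct):
    """Absolute gap between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct, after_pct):
    """Change relative to the starting percentage, as a fraction."""
    return (after_pct - before_pct) / before_pct


# Share saying "too little": 30.0% with a neutral option, 60.9% without one.
too_little_gap = point_gap(30.0, 60.9)
too_little_rel = relative_change(30.0, 60.9)

print(f"{too_little_gap:.1f} percentage points")  # 30.9 percentage points
print(f"{too_little_rel:.0%} relative increase")  # 103% relative increase
```

A relative increase of about 103% is what "more than double" means here, while the gap itself is 30.9 percentage points.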


And it gets worse. When participants were given the same information on current military spending immediately prior to answering, they gave very different answers depending on how this figure was expressed. For those shown current spending as a percentage of all U.S. spending (Figure 2, bottom panel), 62% thought we are spending too little on national defense - but for those shown current spending as a percentage of discretionary* U.S. spending (Figure 2, top panel), only 26.5% thought we are spending too little! (*Discretionary spending is essentially optional government spending, as opposed to the mandatory funding of social programs.)


Let's look at exactly how we produced our second seemingly contradictory result, causing people to give different responses by varying the context in which spending figures were presented. Prior to being asked the question shown above, people were shown the following sentence: "In the U.S., there is much discussion regarding the amount of money the government in Washington should spend for national defense and military purposes." This was followed by either no information or one of seven true pieces of information, some of which were designed to bias their response in a particular direction. Below, you can see the information that caused the lowest proportion of people to say that military spending was too low (Figure 2, top panel) and the information that caused the highest proportion of people to say that military spending was too low (Figure 2, bottom panel).



Figure 2: The presentation of U.S. military spending as a percentage of discretionary spending produced different survey responses compared to the presentation of U.S. military spending as a percentage of all U.S. spending.

If we were trying to mislead the public, and if we hadn't preregistered our study [2], we could report just one of the results above to make it seem like our survey supported whichever conclusion we wanted. [3]

Of course, that's not our goal - instead, we purposely aimed to get contradictory results in order to reveal some of the ways that survey responses can misrepresent what people actually believe. By seeing how we produced these results, we hope you’ll better understand how polls can distort your perception of reality (whether that is the intention of the surveyors or not). And, if you are someone who runs psychology surveys or experiments yourself, we hope that this study serves as a reminder of the importance of preregistration, as well as the critical importance of ensuring your study wording is appropriate for answering the research questions you have set. Slight changes to wording or the options available to participants can have huge impacts on study results!

Our experiment also illustrates how critical it is to see the exact wording on polls before you can interpret them properly. Just as a magician hides aspects of their performance to create an intended effect, unscrupulous research organizations can hide aspects of their research design while producing exactly the effect that they want you to see. This hilarious video from the sitcom Yes, Prime Minister illustrates these ideas well:




Our Study in a Nutshell


In this study, we used different design "tricks" in different conditions of the experiment to get people to both agree and disagree that military spending is too low. Below, we explain in more detail how respondents' opinions differed depending on which condition of the study they were randomized into and what you can learn from these results.


Overall goal:

To directly test the effect of study wording and answer options on survey results, using a single-issue opinion poll about military spending in the U.S.

Study design:

Randomized controlled trial.

  • Information randomization: Participants were randomized to either no information or to one of seven different pieces of true information about military spending in the U.S.

  • Answer option randomization: Approximately half the participants were given two answer options and the other ~half of them were given three answer options.

  • Comparisons reported here are being made between participant groups (rather than within participants).
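A two-factor random assignment like the one described above could be sketched as follows. This is a hypothetical Python illustration, not the study's actual GuidedTrack code: the condition labels, the `assign_conditions` helper, and the use of independent simple randomization for each factor are all assumptions.

```python
import random

# Hypothetical labels: one no-information condition plus seven information conditions.
INFO_CONDITIONS = ["no_information"] + [f"info_{i}" for i in range(1, 8)]
ANSWER_OPTION_COUNTS = [2, 3]


def assign_conditions(participant_ids, seed=None):
    """Independently draw an information condition and an answer-option
    count for each participant (simple randomization, an assumption)."""
    rng = random.Random(seed)
    return {
        pid: (rng.choice(INFO_CONDITIONS), rng.choice(ANSWER_OPTION_COUNTS))
        for pid in participant_ids
    }


assignments = assign_conditions(range(1350), seed=0)
two_option_share = sum(1 for _, k in assignments.values() if k == 2) / len(assignments)
print(f"Share assigned to the two-option version: {two_option_share:.2f}")
```

With simple randomization the groups come out only approximately equal in size, which is consistent with "approximately half" above.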

Number of participants:

n=1350 (this is the number of people who completed the study and passed the single-question attention check).

The question posed to participants:

3-answer-option version: Do you think we are spending too little, about the right amount, or too much on national defense and military?


2-answer-option version: Do you think we are spending too little or too much on national defense and military?

The outcomes we measured:

Participants answered “too much” (coded as -1), “about right” (coded as 0), or “too little” (coded as 1). We measured the average response on a -1 to 1 scale. We also checked the percentage saying “too little.”
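The coding scheme and the two outcome measures can be sketched in a few lines of Python. The counts below are not the raw data: they are back-calculated from the rounded three-option percentages in the Figure 1 caption, assuming 670 participants in that group, and the `summarize` helper is hypothetical.

```python
# Coding used for the outcome: "too much" = -1, "about right" = 0, "too little" = 1.
CODES = {"too much": -1, "about right": 0, "too little": 1}


def summarize(answers):
    """Return (mean coded response, share answering "too little")."""
    coded = [CODES[a] for a in answers]
    mean_response = sum(coded) / len(coded)
    share_too_little = coded.count(1) / len(coded)
    return mean_response, share_too_little


# ASSUMPTION: counts reconstructed from rounded percentages (27.2%, 42.8%, 30.0%)
# and an assumed group size of 670; the real dataset may differ slightly.
three_option_answers = (
    ["too much"] * 182 + ["about right"] * 287 + ["too little"] * 201
)

mean_response, share_too_little = summarize(three_option_answers)
print(f"mean response: {mean_response:.3f}")
print(f"share saying 'too little': {share_too_little:.1%}")
```

A mean near 0 on this scale can come from most people answering "about right" or from an even split between the two extremes, which is why the percentage saying "too little" is worth checking alongside it.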

The effects we found with different qualitative info:

Q(i) Are people less likely to say the U.S. is spending too little on military and defense if told about its negative effects (compared to being shown the justifications for such spending)?


According to our data: yes. People who were randomized to the group that emphasized the negative impacts of U.S. military spending had a mean response of -0.058 (with -1 meaning “too much,” 0 meaning “about right,” and 1 meaning “too little”). In contrast, people who were randomized to the group that listed geopolitical justifications for the spending had a mean response of 0.362. This represents a statistically significant difference (Mann-Whitney U = 11275, n1 = 172, n2 = 174, p = 1.48 * 10^-5).

The effects we found with different ways of expressing the same number:

Q(ii) Are people less likely to say the U.S. is spending too little on military and defense if presented with true information emphasizing the (large) magnitude of current military spending, compared to if they are presented with true information that makes the spending seem smaller in magnitude (by changing how the spending is presented/contextualized)?


According to our data: yes. People who were randomized to groups in which the presented information made current military spending sound relatively large had a mean response of -0.111 (with -1 meaning “too much,” 0 meaning “about right,” and 1 meaning “too little”). In contrast, people who were randomized to groups in which military spending sounded relatively low had a mean response of 0.370. This represents a statistically significant difference (Mann-Whitney U = 38704.50, n1 = 332, n2 = 335, p = 3.15 * 10^-13).

The effects we found with 2 vs. 3 answer options:

Q(iii) Does the availability of only two answer options artificially polarize responses, compared to if a third, neutral answer option is available to choose?


According to our data: yes. About 61% of the participants who were presented with two answer options (“too much” or “too little”) said that U.S. military spending is too low, whereas only 30.0% of the participants who were presented with three options (including a neutral option, “about right”) said that spending was too low. This represents a statistically significant difference (Mann-Whitney U = 157420, n1 = 680, n2 = 670, p = 4.33 * 10^-30).
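For readers unfamiliar with the Mann-Whitney U test, here is a minimal pure-Python sketch of one standard way to compute it, using a tie-corrected normal approximation without continuity correction. The inputs are assumptions: the counts are back-calculated from the rounded percentages above, and each response is reduced to 1 ("too little") or 0 (anything else). How the reported test was actually set up is not stated here, so treat this as an illustration rather than a reproduction.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p-value) using the tie-corrected
    normal approximation, with no continuity correction."""
    n1, n2 = len(x), len(y)
    cx, cy = Counter(x), Counter(y)

    # U for x: number of (x, y) pairs where x is larger; ties count one half.
    u_x = 0.0
    for vx, kx in cx.items():
        below = sum(ky for vy, ky in cy.items() if vy < vx)
        u_x += kx * (below + 0.5 * cy.get(vx, 0))

    # Variance of U under the null hypothesis, corrected for tied values.
    n = n1 + n2
    tie_term = sum(t**3 - t for t in (cx + cy).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# ASSUMPTION: counts reconstructed from rounded percentages, coded 1 = "too little".
three_options = [1] * 201 + [0] * 469  # 30.0% of an assumed 670
two_options = [1] * 414 + [0] * 266    # 60.9% of an assumed 680

u_stat, p_value = mann_whitney_u(three_options, two_options)
print(f"U = {u_stat:.0f}, p = {p_value:.2e}")
```

With these reconstructed counts the sketch gives U = 157450, close to but not identical to the reported value, which is unsurprising given that the inputs come from rounded percentages. The p-value it produces is similarly tiny.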

Our analyses:

You can find all our analyses summarized here. Please contact us if you'd like the Jasp file that we used to generate the summary.

Our data:

You can find our de-identified dataset here. Please let us know if you perform any analyses on it and find anything interesting you’d like to share!

Study code:

You can see the GuidedTrack code used to administer the study here.

Study preview:

You can preview the study, exactly as participants saw it, here. You will need to run it many times to see all conditions.

Preregistration:

You can find our preregistration here.



Our study illustrated two key points


(a) When interpreting the results of opinion surveys or polls, it's important to know what answer options the participants had to choose from, and in particular, whether there was a neutral or "unsure" option available. If there wasn't, this might artificially polarize responses.


As we predicted, a large percentage (61%) of the participants who were presented with two answer options ("too much" or "too little") in our study said that U.S. military spending is too low, whereas a much smaller percentage (only 30.0%) of the participants who were presented with three options (including a neutral option, "about right") said that spending was too low. This represents a significant difference (Mann-Whitney U = 157862, n1 = 680, n2 = 671, p = 6.01 * 10^-30).


Effect of Answer Options on Survey Responses


Figure 3: Proportion of respondents saying “too little” in the three-answer-option vs. two-answer-option conditions. Bars represent 95% confidence intervals.
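The caption does not say how the 95% confidence intervals were computed. One common choice for a proportion is the Wilson score interval, sketched below in Python. The method, the `wilson_interval` helper, and the counts (back-calculated from the rounded percentages reported earlier) are all assumptions, so the numbers will not necessarily match the bars in the figure.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


# ASSUMPTION: counts reconstructed from rounded percentages, not the raw data.
three_opt_low, three_opt_high = wilson_interval(201, 670)  # ~30.0% "too little"
two_opt_low, two_opt_high = wilson_interval(414, 680)      # ~60.9% "too little"

print(f"three options: {three_opt_low:.3f} to {three_opt_high:.3f}")
print(f"two options:   {two_opt_low:.3f} to {two_opt_high:.3f}")
```

Under these assumptions the two intervals do not overlap, which fits the large difference between conditions described in the text.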


(b) In any study, prior to interpreting people's responses to any given question, it's important to understand the *context* in which it was asked.


The different information conditions we included in our study were designed to nudge people's responses in different ways. As we predicted, people were more likely to say that U.S. military spending is too low if they were presented with true information that made the current spending sound small (relative to other spending).


People who were randomized to groups in which the presented (true) information made current military spending sound relatively large had a mean response of -0.111 (with -1 meaning "too much," 0 meaning "about right," and 1 meaning "too little"). In contrast, people who were randomized to groups in which military spending sounded relatively low (by comparing it to larger numbers) had a mean response of 0.370. This represents a significant difference (Mann-Whitney U = 38704.50, n1 = 335, n2 = 332, p = 3.15 * 10^-13). All information presented to respondents was factually true, and yet different true information caused different responses.


Effects of Information Presentation on Survey Responses


Figure 4: Participants’ average judgment of the size of U.S. military spending, from -1 (“too much”) to 1 (“too little”). Bars represent 95% confidence intervals.



What should our takeaways be from cases like these?


i) Before interpreting a poll, survey, or study, check the exact wording of the questions being asked, along with any additional information shown to respondents at the time. Doing so will help you better understand what the study is actually measuring and help prevent you from coming to false conclusions based on the results.


ii) Keep in mind that unscrupulous actors (or inexperienced researchers) may choose wordings that bias survey respondents in a direction that leads to their preferred results. Try to stay alert to this whenever a poll asks for your opinion or shapes it.


iii) If you run studies, be sure to choose your question wordings with care so that they reflect what you are truly trying to measure. Strongly consider piloting the question wordings that you want to use (e.g., asking pilot study participants to explain why they gave the answers they did) to help you avoid ambiguities and misinterpretations! Quantitative questions (such as Likert scale multiple-choice items) allow you to make a measurement, but without qualitative questions, you sometimes can't tell what it is you actually measured.


Acknowledgements

We would like to thank all study participants for their time, Cassandra Xia for inspiring this research, Larry Friedman for working with Spencer to conduct the studies mentioned in Examples 1 and 2, and Holly Muir and Adam Binks for their help editing this article.



Footnotes

[1] We aimed to recruit only conservative Americans so that participants' prior opinions on military spending would be more heterogeneous than they would have been if our sample contained mostly progressive Americans. In a pilot study, we found that a majority of participants were left-leaning and that the majority of them were saying that the U.S. is spending "too much." So we decided to purposely over-sample conservative participants in order to have a sample of people with more heterogeneous prior opinions on military spending. We selected participants who were recognized in the Positly system as conservative; however, some of those participants had entered their political affiliation into the system some time ago, and reported on the day of the study that they were no longer conservative. These participants were still included in the final analysis. You can find further information about the political positions of everyone in our sample in our analysis summary document.

[2] However, we should also note that preregistration on its own does not guarantee reproducible, insight-generating studies. (See here for a detailed discussion of what preregistration does and does not help with.) Also, our preregistration document was underspecified in some ways: we did not explain that we would include an attention check and did not specify the political orientation of our study participants.

[3] To make it easy for you to review all the analyses we preregistered and ran, we have provided a summary document here (which has headings that make it clear how the various analyses reflect the planned comparisons in the preregistration document).