If you want to change behavior (either your own, or someone else’s), what method do you think would be best? Recently a megastudy with more than 61,000 participants was published on this topic, and we want to share with you the surprising results they found! We'll also discuss what you can learn from the study, and how you may be able to apply those lessons to your own life.
The study had a total of 61,293 participants, all with gym memberships. Thirty scientists worked in small teams to develop interventions, each aiming to increase gym attendance over a four-week period. This resulted in 53 interventions to test against each other, to see what really works for getting people to go to the gym more often - a promising setup for figuring out what truly changes behavior!
Participants were randomized into different groups:
Control group
These participants received a small financial reward for participating in the study, but that’s all.
Baseline group
These participants received 3 things:
Planning: Participants planned the dates and times when they'd work out
Small incentives: They earned a small financial incentive for every workout
Reminders: They got a text 30 mins before each scheduled workout
Intervention groups
The remaining participants were each assigned to one of 52 other interventions. Importantly, these interventions incorporated all of the elements that the baseline group got (planning + incentives + reminders), but then added extra elements on top to test different strategies.
So, what did they find in this massive study?
Takeaway 1: An effective baseline
The first important finding is that the baseline group went to the gym more than the control group (about 0.14 more gym visits per week).
So, a small monetary incentive to work out + scheduling when to work out + text reminders had a modest positive effect on gym attendance!
This suggests a practical approach to behavior change. By planning activities in advance (i.e., deciding exactly when you will do the activity), introducing small incentives for completion (e.g., giving yourself a small reward for doing a good behavior), and setting up reminders, you can create a structured framework to help you succeed at establishing or maintaining new habits.
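If you like to tinker, here's a toy Python sketch of that three-part recipe (purely our illustration - the schedule, reward size, and function names are all hypothetical, not from the study):

```python
# A toy sketch of the baseline recipe: plan workouts in advance, attach a
# small reward to each one, and surface a reminder 30 minutes beforehand.
# All names, dates, and the reward size here are hypothetical.
from datetime import datetime, timedelta

REWARD_PER_WORKOUT = 1.00          # small incentive, e.g. $1 into a treat fund
REMINDER_LEAD = timedelta(minutes=30)

# Step 1: plan exact dates and times in advance
schedule = [
    datetime(2024, 5, 6, 18, 0),   # Mon 6pm
    datetime(2024, 5, 8, 18, 0),   # Wed 6pm
    datetime(2024, 5, 11, 10, 0),  # Sat 10am
]

def due_reminders(now, completed):
    """Step 3: workouts whose 30-minute reminder window is currently open."""
    return [t for t in schedule
            if t not in completed and t - REMINDER_LEAD <= now < t]

def earnings(completed):
    """Step 2: a small reward for every completed workout."""
    return REWARD_PER_WORKOUT * len(completed)

completed = {schedule[0]}  # suppose Monday's session happened
print(due_reminders(datetime(2024, 5, 8, 17, 45), completed))  # Wed reminder fires
print(f"Earned so far: ${earnings(completed):.2f}")
```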
But what about the other 52 interventions that were tested?
The paper abstract says: "45% of these interventions significantly increased weekly gym visits by 9% to 27%." Later in the paper, we learn this means 45% achieved p<0.05 compared to the control group (this p<0.05 cutoff, although somewhat arbitrary, is a very commonly used approach for deciding which results are probably not just the result of chance or luck). So, at first glance, this seems extremely promising: a whopping 23 techniques (45% of 52) showed promise!
But we believe that comparing the interventions to the control group can be misleading and lead to inaccurate conclusions.
The 52 interventions each incorporated all of the elements of the baseline and then added extra strategies on top. But we know that the baseline intervention beats the control, so of course many interventions beat the control: they incorporate the baseline! Think of it this way: suppose you have three groups of medical patients, each randomized to receive different medications. If group A+B gets both Drug A and Drug B, group B gets just Drug B, and group C (the control group) gets no drug at all, how do you figure out if Drug A works?
If you compare group A+B to the control group and find that group A+B has better outcomes (equivalent to what they did in their main analysis), you can't tell whether Drug A works, because you might just be finding the impact of Drug B. In fact, if you already know that Drug B works, then of course you'll find an effect from comparing group A+B to the control group, because Drug B alone will make group A+B better than the control. Hence, the only way to see if Drug A works is to compare group A+B to group B. In the behavior change study we've been discussing, this means that each intervention must be compared to the baseline group, not to the control group, because each intervention also included the techniques of the baseline group in addition to whatever unique method it employed!
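To make the logic vivid, here's a quick simulation of the drug analogy (our illustration, not anything from the paper - the effect sizes and sample sizes are made up). Drug B has a real effect and Drug A does nothing, yet the A+B group still looks impressively better than the control:

```python
# Drug B works (+0.5 on some outcome), Drug A does nothing. Comparing the
# A+B group to the untreated control "detects" an effect even though A is
# useless; only the A+B vs. B comparison isolates Drug A's contribution.
# Effect sizes and sample sizes are invented for illustration.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 500
control  = rng.normal(0.0, 1.0, n)   # no drug
group_b  = rng.normal(0.5, 1.0, n)   # Drug B alone (real effect: +0.5)
group_ab = rng.normal(0.5, 1.0, n)   # Drug A + B (A adds nothing)

print(ttest_ind(group_ab, control).pvalue)  # tiny p-value: "A+B works!"
print(ttest_ind(group_ab, group_b).pvalue)  # large p-value: A adds nothing
```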
We think it’s possible to get more insight by seeing which interventions outperformed the baseline, not the control.
This comparison between each intervention and the baseline is discussed in the paper (so the authors did not hide this result), but not in the abstract, and the results are very different when considered this way (which we believe is a more informative way to look at them). Remarkably, only 4 distinct interventions out of the 52 (the ones highlighted in red, below) beat the baseline (highlighted in gray, below) at p<0.05!
So, what actually increased gym attendance? Or, in other words, what are the promising behavior change techniques from this study that you might be able to apply yourself?
Takeaway 2: Behavior change is hard
This leads us to our next big lesson from this study, which is that behavior change is incredibly hard! About 30 scientists who study behavior change worked to develop more than 50 interventions, and at most, only 4 of them meaningfully beat the baseline!
Takeaway 3: The techniques that showed promise
Here are the techniques that showed promise:
1️⃣ Bonuses after messing up
Giving extra, small (financial) incentives to participants who went to the gym at the scheduled time after they missed a scheduled workout. This rewards people for getting back on track after they slip up (i.e., rewards them for recovering after failure). It's important to note, though, that these rewards were not so big that participants had an incentive to slip up - they were still best off if they didn't miss a day.
2️⃣ Bigger incentives
Simply awarding more points each time people went to the gym. This is presumably not a surprise - it's aligned with standard economic theory and common sense.
3️⃣ Information about what's normal
Telling participants "that a majority of Americans exercise frequently and that the rate is increasing" was associated with greater gym attendance. However, some relatively similar interventions did not have meaningful effects, so we suspect that this intervention's apparent success may have been a false positive. The more statistical tests you run, the more likely some appear to "work" just by chance, so it is reasonable to expect at least one false positive in this study even if none of the interventions worked.
4️⃣ Choice of gain/loss frame
This was quite a fascinating intervention that points at an interesting psychological principle. Participants learned they could choose to earn small financial rewards each day they visited the gym OR to start with all the rewards and lose money each day they did not visit the gym. They were told that their earnings would be the same in both programs, and that the only difference would be whether they framed the situation as gaining points each time they succeeded or losing points whenever they failed. While perfectly rational agents with unlimited computational power would presumably treat these situations as identical (since the number of points you get is the same either way), human psychology doesn't necessarily treat gains and losses in an equivalent manner - effects like loss aversion (where we experience a loss as worse than an equivalent gain) may increase our motivation if used strategically!
Here’s the table from the study, showing the results:
In the table, b is the estimated number of extra gym visits per week, 95% CI is the 95% confidence interval, and P is the p-value (less than 0.05 is typically deemed statistically significant).
It may be no surprise that bigger incentives increased gym attendance, and we suspect "information about what's normal" (aka "Exercise social norms shared") may have been a false positive, but the other 2 findings are quite interesting: bonuses after messing up, and giving participants a choice of a gain frame vs. a loss frame seemed to work!
It should be noted, however, that the effect sizes of even these best interventions were still modest (at most an additional 0.27 gym visits per week, on average).
Furthermore, since these interventions were selected from a wide set based on their apparent effectiveness, we should anticipate regression to the mean. That is, we should predict that the effect sizes of these best interventions, modest as they are, are likely inflated.
It’s also possible that they are so inflated that these are all false positives (the results of random noise, rather than real effectiveness). Since 52 interventions were tested at a significance threshold of p<0.05, we'd expect to find about 2.6 false positives on average (52 × 0.05 = 2.6) if we ran this study over and over, even if none of the interventions worked at all. That being said, the strongest case can be made in favor of the "bonus for returning after missed workouts" intervention, since two different versions of it achieved p<0.05 (and they were analyzed as though they were separate, unrelated groups - numbers 1 and 5 in the table).
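Here's a back-of-the-envelope simulation of both concerns at once (ours, with made-up sample sizes): when 52 interventions with zero true effect are each tested against a baseline at p<0.05, a few "significant" winners usually emerge anyway, and their estimated effects are inflated relative to the true effect of zero:

```python
# Simulate 52 useless interventions tested against a shared baseline at
# p < 0.05. Sample size and outcome distribution are made up for illustration.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n = 400                                   # hypothetical participants per arm
baseline = rng.normal(0.0, 1.0, n)        # baseline group outcomes

false_pos, winner_effects = 0, []
for _ in range(52):                       # 52 interventions, all true effect = 0
    arm = rng.normal(0.0, 1.0, n)
    if ttest_ind(arm, baseline).pvalue < 0.05:
        false_pos += 1
        winner_effects.append(arm.mean() - baseline.mean())

print(false_pos)        # about 2.6 on average (52 * 0.05); varies run to run
print(winner_effects)   # the "winners" look nonzero despite true effects of 0
```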
Takeaway 4: It's hard to predict intervention effectiveness
The final takeaway we want to highlight from this paper is that it's really hard to predict what will work with regard to behavior change!
This is illustrated by another fascinating aspect of this paper: the authors tested whether different groups could predict which interventions would work to increase gym attendance (in 3 separate studies).
For one study, they had 301 ordinary people make predictions about which interventions would work.
Another study had 156 professors from the top 50 schools of public health making predictions.
A final study used 90 practitioners recruited from companies specializing in applied behavioral science.
None of the groups made accurate predictions about what behavior change methods work! Correlations between estimated treatment effects and observed effects were:
Ordinary people: r = 0.25, p=0.07
Professors: r = −0.07, p=0.63
Practitioners: r = −0.18, p=0.19
This appears to show that, in this instance, ordinary people's predictions correlated slightly more positively with the actual effects than professors' or practitioners' did, though the correlations are not strong enough to warrant generalizations about those groups' capabilities. The takeaway is that none of the groups were significantly accurate in predicting what works.
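For readers curious how those numbers are produced: each one is a Pearson correlation between predicted and observed treatment effects, along with its p-value. A minimal sketch in Python (the effect values below are invented stand-ins, not the study's data):

```python
# Pearson correlation between predicted and observed treatment effects.
# The numbers below are hypothetical, NOT the study's data.
from scipy.stats import pearsonr

observed  = [0.14, 0.09, 0.21, 0.03, 0.12, 0.07]  # made-up effects (visits/week)
predicted = [0.20, 0.25, 0.10, 0.18, 0.15, 0.22]  # made-up forecasts

r, p = pearsonr(predicted, observed)
print(f"r = {r:.2f}, p = {p:.2f}")
```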
In the following chart, the left-hand column shows the actual results recorded in the experiment, and the right-hand column shows the results predicted by the ordinary people, professors, and practitioners (equally weighted). Error bars represent 95% confidence intervals.
What does this mean for you?
Remember, the takeaways we’ve highlighted from this megastudy are:
A small (monetary) incentive + scheduling when to work out + text reminders had a modest positive effect on gym attendance. This was the baseline used in the study.
Behavior change is incredibly hard!
There is evidence that three interventions are better than the baseline (a fourth also beat it, but we suspect that one was a false positive):
1️⃣ Bonuses after messing up
2️⃣ Bigger incentives
3️⃣ Choice of gain/loss frame
It’s also incredibly hard to predict what will work!
If you want to apply these insights to your own life, you could try finding ways to incorporate the elements of the baseline and the more-effective interventions into your attempts to change behavior. Try establishing rewards for good behavior, scheduling specific times to engage in the behavior you want, and setting reminders. Then also consider the importance of trying again after failure (and incentivize restarting), and reflect on whether you are more motivated by achieving gains or avoiding losses when it comes to your incentives (and pick the appropriate framing).
You might also use these insights to run an experiment on yourself. If there is some behavior you have been struggling to change (it needn’t necessarily be increasing your gym attendance!), why not try utilizing one or more of the interventions that performed better than the baseline? For a little bit of help with this, you could try our simple, free, interactive tool for thinking about running personal experiments:
Overall, we learned a lot from reading this paper, and we have great respect for the research team that conducted these studies through a herculean effort - including X users @katy_milkman, @angeladuckw, and many others. We hope they conduct many more megastudies like this one!
This article was written by Spencer Greenberg and edited by Travis Manuel. Thanks also to X users @yashkaf and @LukeIRowe for a couple of helpful comments.