A tool to help you with tough decisions
- Travis M. and Spencer Greenberg

What’s the most difficult decision you’re facing right now?
Maybe you’re deciding whether to change jobs, move to a new city, or leave a partner. Decisions like these can involve long periods of agonizing and lots of stressful uncertainty.
Our free Decision Advisor tool is back, and it may be able to help! It’s designed to assist you in organizing your thoughts and weighing your options. It can broaden your understanding of your options and perform an ‘expected value’ calculation on them, which may help you see which option has the best prospects (based on what you value).
We’re relaunching it now that we've removed unhelpful portions of it - based on the insights from a large, longitudinal, randomized controlled trial that we conducted. That means it is now more streamlined and (we believe) more helpful than the prior version. So while we can't promise it works, we're letting you know that it's back and improved if you want to give it a spin for a decision in your life.
You can try the Decision Advisor tool now, completely free, by clicking here:
If you are curious about why and how we improved this tool, you may find the full story below interesting. It's the story of how we tried something big that failed, how we tried to adapt to that failure, and what we learned along the way. This article has two parts.
Part 1: The story of how we ran a randomized controlled trial on this tool, found out the original version was unhelpful, and what we did to try to improve it.
Part 2: The general lessons we took away from this experience, which you may find useful in your own work.
Part 1: Studying the Decision Advisor
When we build our free tools, we usually start by reading academic papers and learning what’s known about the subject (if we're not already deeply knowledgeable about the topic). We identify promising methods and concepts in the literature and build interactive tools that walk you through those methods or apply those concepts. Additionally, before release to the public, our tools typically go through two rounds of empirical testing:
Alpha testing: Usually around 40 people (but sometimes several hundred), drawn randomly from the US population, are recruited through Positly, a participant recruitment platform.
Beta testing: We send out a request to our mailing list of around 15,000 real users of ours around the world who have very kindly agreed to be beta testers, asking them to try our tool and give us feedback on it.
Both alpha testers and beta testers are asked many questions about their experience using the tool, including open-ended questions like "What did you dislike most about this program?" and "What could we change about this program to make it better?", and closed-ended questions, such as rating whether it would be "helpful" or "harmful" if millions of people used this program and saw their results, and whether they would recommend this tool to a friend.
This feedback gives us extremely valuable information that we use to improve our tools (and fix many of their flaws) before we release them to the public.
However, this is not the process we would use if money and time were unlimited; it balances speed and cost against the reliability of feedback. In a perfect world, the process would involve running a randomized controlled trial on each new tool, with a large number of participants randomized to use the tool and others randomized into a control condition (not using the tool). Outcomes would then be tracked for months to see to what extent those randomized to use the tool end up better off, on the relevant outcomes, than the control group.
Unfortunately, this hypothetical, ideal process would be incredibly expensive and time-consuming, so we rely mostly on alpha and beta testing (in addition to putting thought and research into what we build in the first place).
An example of a tool that we were able to perform the ideal testing process on, however, is our habit formation tool, Daily Ritual. We were delighted to find in our study on 400 people that those randomized to use Daily Ritual succeeded more reliably at sticking to habits than the control group.
Due to a grant we received, we more recently had the opportunity to run a randomized controlled trial on the Decision Advisor tool. While the Decision Advisor tool is based on academic theories with a lot of support (such as the theory of expected value maximization), we were especially excited to run a study on this tool in particular, because decision-making is both (i) a challenging domain with many unknowns and (ii) an important area that impacts almost everyone.
When we launched the Decision Advisor, back in 2017, it had gone through our usual alpha testing and beta testing process. We received helpful feedback from our testers and made a number of improvements based on it. However, we know that while this process catches many issues, it can't catch them all. The only way to have a really high degree of confidence in a domain like decision-making is to run an expensive randomized controlled trial. Thankfully, we received a grant to do just that.
Study Design
Here’s how the study worked:
Participants answered a bunch of questions about themselves (including decoy questions, to obscure the purposes of the study).
They each picked a decision they wanted to make. The decision had to be one whose outcome they would be able to assess within six months.
Participants were randomly assigned into one of two groups: the intervention group used the Decision Advisor tool, and the control group merely answered a few questions about the decision they had to make, how many options they were considering, and which option they thought was best.
We automatically followed up with each participant when they said they’d know the outcome of their decision. We asked them a number of questions about how satisfied they were with the outcome, and how much they regretted the decision they made, which were combined into a decision satisfaction score and a decision regret score. We used those two values to calculate a ‘total decision score’ which reflected their overall happiness with the choice they made:
Total decision score = decision satisfaction - decision regret
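To make the scoring concrete, here's a minimal sketch in Python. The exact survey items and scales aren't described here, so the 0-4 scales below are an assumption (chosen so the total runs from -4 to 4, matching the range reported in the results):

```python
# A minimal sketch of the scoring above. Assumption: satisfaction and regret
# are each averaged onto a 0-4 scale, so the total runs from -4 to 4
# (matching the range reported in the results below).

def total_decision_score(satisfaction_items, regret_items):
    """Combine survey responses into a single total decision score."""
    satisfaction = sum(satisfaction_items) / len(satisfaction_items)  # 0..4
    regret = sum(regret_items) / len(regret_items)                    # 0..4
    return satisfaction - regret                                      # -4..4

# Example: high satisfaction and mild regret give a positive total score.
print(total_decision_score([4, 3, 4], [1, 0, 1]))  # 3.0
```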
We also collected a bunch of intermediate data in the tool, asking people questions along the way, such as how confident they were in their decision.
Results
The overall result was quite shocking! We had a total of 194 participants who finished the study (95 in the control group, and 99 getting the intervention) and found that the intervention group had lower total decision scores (on average)! And though the difference was small, it was statistically significant!


The average total decision score for the intervention group was 2.0 (on a scale from -4 to 4), while the average for the control group was 2.38 (T(193)=2.01, p-value=0.045). Since this result is just barely statistically significant (at a p<0.05 level), there is a possibility that the result is due to chance. Nevertheless, after finding this result, we operated under the assumption that it is not a fluke. We took the tool down and explored what had happened.
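For the statistically curious, here's a rough sketch of the kind of two-group comparison reported above. The data below are randomly generated placeholders centered near the reported group means, not the study data, so the printed statistic and p-value won't match ours.

```python
# A sketch of a two-sample t-test comparing total decision scores between
# groups. The arrays are hypothetical placeholders, not the study data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
intervention = rng.normal(loc=2.0, scale=1.3, size=99)   # hypothetical scores
control = rng.normal(loc=2.38, scale=1.3, size=95)       # hypothetical scores

t_stat, p_value = stats.ttest_ind(control, intervention)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # "significant" if p < 0.05
```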
An interesting thing to note here is that this result has two different interpretations, which we unfortunately can't distinguish between:
People who used the tool had outcomes due to their decision that were a little bit (objectively) worse.
People who used the tool felt a bit worse about their choice for the decision (but outcomes were not actually worse, only people's feelings about them).
To understand this a bit better, imagine that two people (Alex and Beth) each buy a different pair of shoes, and that Alex is less satisfied with his purchase than Beth is with hers. Even if Alex objectively has the better pair of shoes (they’re cheaper, more durable, more functional, and more aesthetically pleasing), that doesn't guarantee that Alex is happier than Beth with the purchase. How good we feel depends not just on what we get, but on how we think about what we get.
Similarly, there's no way to know whether participants in the study actually had worse (objective) outcomes as a result of using the Decision Advisor. It’s possible that they were less satisfied for reasons that don’t relate to getting worse outcomes (such as spending more time focusing on other possibilities that they could have had). That said, satisfaction is itself an outcome and we want people who use the Decision Advisor to experience greater satisfaction (as well as better, objective outcomes). So, needless to say, we were upset with these results! Our mission is to improve people's lives - and that is the goal behind everything we create. So we were disappointed that we had failed to achieve this mission in this instance. That made us especially eager to explore the study data to try to understand why this occurred.
As an exploratory analysis, we looked into 36 things that might predict the total decision score (i.e., people's overall happiness with how their decision turned out). For the full list, you can check out the longer report of this study by Magda Zena, but here are the variables with the largest correlations:
| Variable | Description (how it was measured) | Correlation with total decision score | Lasso regression coefficient (R^2 = 0.12) | Ridge regression coefficient (R^2 = 0.15) |
| --- | --- | --- | --- | --- |
| Personal freedom | “I have total personal freedom: I am able to do whatever I choose to do” | 0.32 | 0.18 | 0.11 |
| Decision affects identity | “The choice I picked for this decision reflects the kind of person I am better than the other choices” | 0.26 | 0.18 | 0.13 |
| Agreeableness | Measured by a standard Big 5 test | 0.28 | 0.11 | 0.08 |
| Optimism | “How much are you the sort of person who is typically optimistic about the future?” | 0.27 | 0.05 | 0.06 |
| Conscientiousness | Measured by a standard Big 5 test | 0.24 | 0.01 | 0.03 |
| Stability | Emotional stability measured by a standard Big 5 test | 0.25 | 0.01 | 0.04 |
| Status quo | “Did the choice that you ended up picking for this decision involve sticking with the status quo or default option, or did it instead involve implementing a change?” | 0.25 | 0.003 | 0.05 |
| Self-confidence | “How self-confident are you?” | 0.22 | 0.00 | 0.02 |
| Identifies as depressed | “Do you believe that you are depressed?” | -0.29 | -0.04 | -0.05 |
| Total decision importance | The number of important decisions the participant has to make in their life right now and their intensity | -0.22 | -0.08 | -0.06 |
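If you're curious how estimates like the ones in this table can be produced, here's a rough sketch using pandas and scikit-learn. The DataFrame, column names, and regularization strengths are hypothetical stand-ins, not the ones from our analysis.

```python
# A sketch of the exploratory analysis above: pairwise correlations with the
# total decision score, plus Lasso and Ridge regression coefficients.
# `df` and the alpha values are hypothetical, not the study's.
import pandas as pd
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

def exploratory_coefficients(df: pd.DataFrame, outcome: str = "total_decision_score") -> pd.DataFrame:
    X = df.drop(columns=[outcome])
    y = df[outcome]

    # Simple pairwise correlations between each predictor and the outcome.
    correlations = X.corrwith(y)

    # Standardize predictors so regression coefficients are comparable.
    X_std = StandardScaler().fit_transform(X)

    lasso = Lasso(alpha=0.05).fit(X_std, y)  # alpha values chosen arbitrarily here
    ridge = Ridge(alpha=1.0).fit(X_std, y)

    return pd.DataFrame(
        {"correlation": correlations, "lasso_coef": lasso.coef_, "ridge_coef": ridge.coef_},
        index=X.columns,
    )
```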
People were happier with their decisions if they had a higher degree of personal freedom to do whatever they chose to do. Other factors predictive of decision satisfaction included some positive personality traits like agreeableness, optimism, pragmatism, emotional stability, conscientiousness, and self-efficacy (i.e., belief in one’s own abilities).
Negative predictors of total decision score included depression, number of negative stressors, and making decisions out of a sense of obligation to someone else rather than out of a participant’s genuine desire.
But the most notable finding here was that belonging to the intervention group turned out to be a negative predictor of the total decision score:
| Variable | Description (how it was measured) | Correlation with total decision score | Lasso regression coefficient (R^2 = 0.12) | Ridge regression coefficient (R^2 = 0.15) |
| --- | --- | --- | --- | --- |
| Group assignment | Whether the participant was assigned to the intervention group (1) or the control group (0). | -0.14 | -0.03 | -0.05 |
The correlation is small, but statistically significant (and matches the result we mentioned earlier: those in the control group had higher total decision scores, on average).
In summary, we can say that people with specific positive personality traits, a high degree of personal freedom, and less cognitive overload tended to be happier with their decisions. Completion of the decision-making program did not enhance their total decision score and appeared to have the opposite effect.
So, what went wrong?
We expected the tool to work (that's why we made it). So, we thought the most likely outcome would be a positive result. But we thought that the second-most-likely outcome (by far!) would be that it just didn’t work at all. We really didn’t expect it to (at least appear to) have an average negative impact. So, the question is: what went wrong?
We did further exploratory analyses in order to try to figure this out. In order to understand these analyses, it’s important to know that the tool had four parts:
Narrow framing/Brainstorming
Brainstorming options that the participant may not have considered. The rationale here is that people often get stuck on just 2 or 3 options when making a decision (sometimes called "narrow framing") and the purpose of this was to help the user explore other options that might be even better than those they are considering.
Information gathering
Considering other sources of information that could help with their decision (e.g., a person they could talk to about the decision and other information sources they could consult with to help guide the decision). The rationale here is that sometimes we simply don't have enough information to make an informed decision. While not all information is useful, the goal was to help the user consider whether there was other information that would be useful to gather before deciding.
Cognitive bias training
Learning about cognitive biases that are relevant to and might negatively impact their decision. The rationale here was to raise awareness of the relevant biases that might impact their decision right before the decision was made, with the goal of helping people avoid such biases by keeping them top of mind.
Expected value calculation
Going through a process of estimating the expected value of different options (relative to each other) and performing an expected value calculation, to help evaluate which is best. The rationale was to help people apply one of the most theoretically sound decision-making frameworks: expected value maximization.
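To make that last step concrete, here's a minimal sketch of an expected value comparison of the sort the tool walks you through: each option gets a set of possible outcomes, each with a probability and a value, and the option with the highest probability-weighted sum comes out on top. The options and numbers below are made up for illustration.

```python
# A minimal sketch of an expected value comparison. The options, probabilities,
# and values are illustrative, not output from the tool.

options = {
    "take the new job": [(0.6, 8), (0.4, -2)],        # (probability, value)
    "stay in current job": [(0.9, 3), (0.1, -1)],
}

def expected_value(outcomes):
    """Sum of probability-weighted values for one option."""
    return sum(p * v for p, v in outcomes)

for name, outcomes in options.items():
    print(f"{name}: EV = {expected_value(outcomes):.2f}")

best = max(options, key=lambda name: expected_value(options[name]))
print("Highest expected value:", best)  # "take the new job" (EV 4.00 vs 2.60)
```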
Here are some hypotheses regarding why our tool was ineffective, along with our conclusion about these hypotheses and our reasoning (each of these is discussed in much more depth, along with other hypotheses, in the full write-up of this study):
Hypothesis 1: The tool lowers confidence
We found that people's confidence in their decision at the end of the tool was predictive of their total decision score at the end of the study. This naturally raises one hypothesis for what went wrong: maybe the tool caused people to be less confident, therefore lowering their total decision score.
For instance, perhaps learning about all the biases that affect you or the various other things you could do instead will tend to reduce confidence in a decision.
However, we have evidence that this is not what happened: participants who were assigned to use the tool actually had their confidence in their decision go up over the course of using the tool, not down.
Verdict: likely false.
Hypothesis 2: The expected value section was the problem
While the theory of expected value maximization has very strong theoretical backing, could it be that somehow going through that process caused people to be less satisfied with their decisions? For instance, perhaps people took the result of the expected value calculation too seriously and went with whatever it told them, even if it somehow failed to capture important factors that they were aware of on a gut level.
We measured participants’ confidence just before the expected value section and again right after it. We found that 75% of the participants in the intervention group became more confident! Very few became less confident. And as we discussed before, greater confidence was actually predictive of higher total decision scores.
Additionally, we can look at participants who changed their mind about the option they were leaning toward during the expected value section. If the expected value section was the problem, we would expect that those who changed their mind during this section would have lower total decision scores - but they didn't! Those who changed their mind during this section actually had slightly higher total decision scores (though the difference was not statistically significant).
Although we can't be certain that the expected value section was not a problem, the two lines of evidence discussed here both suggest that it actually helped people have higher total decision scores, not lower ones.
Verdict: likely false.
Hypothesis 3: The narrow framing section was harmful
Although missing out on a good option (because you didn't think to consider it) seems clearly bad, the paradox of choice literature suggests that considering more options might reduce feelings of satisfaction about a decision (even, potentially, in situations where the outcome is objectively better). The basic idea is that the more options you consider, the more foregone opportunities you are aware of, which may increase negative feelings associated with missed opportunities. Additionally, considering more options may increase the cognitive burden of deciding. Since the narrow framing section was all about expanding the number of options you’re considering, we looked at whether this might explain the lower total decision scores of the intervention group.
Our data show that, on its own, the number of options a participant considered was positively related to being more satisfied (r = 0.13). If this were a paradox of choice effect, we’d expect the exact opposite to be true. When we controlled for all other factors, this did become very slightly negative (Lasso coefficient: -0.06; Ridge coefficient: -0.05), but this is a negligibly small effect - essentially zero. So we think the paradox of choice explanation is unlikely.
We also looked at whether there was a difference in total decision score between people who picked an option that was generated during the narrow framing exercise (which they hadn’t considered previously) and those who did not. We found no statistically significant difference. If the narrow framing exercise was really harming people's total decision scores, then wouldn’t the people who changed their minds due to the narrow framing exercise be worse off?
As with the expected value section, we have two lines of evidence suggesting that the narrow framing section was not harmful to people's total decision scores.
Verdict: likely false.
Hypothesis 4: One of the other sections led people to have lower total decision scores
Because of the way the tool is designed, we gathered a lot more data on the expected value and narrow framing sections than on the information gathering and cognitive bias training sections (in retrospect, we wish we had thought to include more data gathering in those sections!).
Although we have evidence that the problem with the tool is not in the narrow framing or expected value calculation sections, we unfortunately don't have enough evidence to get traction on the question of which of the other sections might be causing harm. So, to play it safe, we have cut those other sections from the tool, leaving just the sections for which we had some evidence of benefit and evidence of a lack of harm.
We can speculate (and the full version of our study report contains many more hypotheses about all of these sections) but that’s unfortunately the limit of what we can do with this data.
Verdict: probably true.
How we improved the Decision Advisor
As we mentioned, when we discovered the data indicated that the Decision Advisor was not working for people (on average), we took it down from our website. We were then faced with a choice: either we delete the tool or we try to make it better.
We were certainly open to deleting the tool, and we would always opt for doing so in cases where a tool was causing clear harm and could not be fixed. But we received lots of emails from people saying that they were disappointed to find the tool was gone because they had found it valuable and helpful. This is some further evidence that the tool does contain valuable content, and this motivated us to change it (in line with the findings of the study) to make it better, rather than delete it completely.
Here are the changes we made:
We deleted the two sections that had neither evidence in favor of their effectiveness nor evidence against potential harm (the information gathering section and the cognitive bias training section). Although we did not have direct evidence that they were harmful, the fact that we couldn’t get evidence that they weren’t harmful was enough to warrant removing them.
We added a section that explains the framework used by the tool and some of its limitations. This section is called ‘The Theory of Maximizing Expected Value’.
This means we kept the two sections that we had good reason to believe were not harmful: the narrow framing section and the expected value calculation section. A side benefit of these removals (though not the reason we made them) is that the tool is now faster to use and more streamlined. This helps address another complaint that we sometimes got: that the tool felt too long.
We believe that the improved version of the Decision Advisor is more helpful than before, and while we can't be completely confident it will help you, it's back online if you'd like to try it out.
Part 2: Lessons You Can Use in Your Own Work
What are some generalizable lessons that can be taken away from this failure? We see a few.
1. The pre-mortem technique
Before undertaking a plan, project, or idea, imagine for a moment that it fails and then ask: "What are the most plausible reasons why it failed?" Even if you’re really confident you'll succeed, it’s extremely useful to think about what you could do to steer away from these failure modes. In our case, this could have helped us plan better which data to collect (in case of failure).
This can be done in almost any context. Considering hiring someone new? Ask them: "If we make an offer but you decide not to join this company, what do you think the most likely reason would be?" Considering dating someone? Imagine for a second that you do date, but then break up shortly after. Imagine what the most likely reason for that breakup would be.
In our case, if we had spent more time on the question, “What would we want to know if the randomized trial shows that the tool is harmful?” we likely would have added more questions to the study to help diagnose where things were going wrong. This would have saved us time and given us more clarity when we saw surprising results. We recommend using this approach on any major project or decision you're working on. It can expose blind spots before they become real problems.
2. Expect difficult trade-offs
We would love to run a randomized controlled trial of this size on every tool, but the reality is that they’re very expensive, require a huge amount of labor, and typically take at least a year. This means we can usually only do them when we have specific grants to do so.
Instead, our typical process involves looking at the evidence and what’s known about a topic, and running our tools through multiple rounds of testing. We really do think that these steps reduce the chances of making something harmful and increase the chances of making something valuable, as user feedback contains a lot of valuable suggestions. Nevertheless, there is a trade-off here between (i) certainty and (ii) cost and speed, and there are tricky questions about what the right trade-off is. We use two rounds of user feedback in which we solicit criticisms (from alpha testers, then beta testers) to try to strike a balance between speed and reasonable cost on one hand and quality on the other, but sometimes this process doesn't get the desired result. In our case, this failure is a reminder that our process is not foolproof.
If you are working on a large project of any kind, you will likely also face trade-offs between speed (or cost) and quality. Finding the right balance can be tricky, but going back to your mission (or your principles or your values) may help guide you on those tradeoffs.
And, finally…
3. If you try anything hard, you will make mistakes
We failed in this instance. Were we surprised that this particular tool failed? Yes! Do we wish it hadn't? Absolutely. Are we surprised that some of the work we do fails? No, not at all. The fact is (much as we wish it weren't the case) that when you try to do difficult things (which is what we aspire to do), some of them will fail.
Obviously, we aim to prevent mistakes and to make our tools as valuable as possible. The reality is, we will sometimes fail. Our commitment is that we will continue to try hard to make valuable things for you. Unfortunately, we can't promise that they will work every time. The only ways we know of to never fail are (1) to do only easy things, or (2) to grind our processes to a halt by requiring a large-scale randomized controlled trial before producing anything (which, sadly, would mean we would barely produce anything at all). We don't think either of these approaches would serve our mission of helping the world as well as our current process does, which aims to strike the best balance we know of between speed/cost and quality. But that means, unfortunately, that you can expect failures from us from time to time.
If you're curious to try the new Decision Advisor tool for yourself, here's the link: