When is evidence “sufficient”?
- Travis M.
- Sep 9
Updated: Sep 11

Key Takeaways
The claim that "You should only believe that for which you have sufficient evidence” is popular but it contains an extremely important error. It treats belief as binary: either you believe something (because you have enough evidence for it) or you don't (because you don't have enough evidence).
It is more accurate to think of belief as probabilistic. There are probably contexts where you already do this, but some where it is much less common.
Misleading ideas about sufficient evidence are built into the idea of 'statistical significance' in science. It causes problems in science but it's hard to know whether things would be better or worse without it.
Some researchers advocate for alternatives, but it's easy to make errors there too. Common misunderstandings of p-values can make alternatives less useful.
Whatever the context, you'll have a more accurate picture of the world if you reject binary thinking about sufficient evidence. Bertrand Russell's conception of rationality is a more accurate alternative.
A shipowner was preparing to send a ship to sea. He knew it was old, poorly built, and had needed frequent repairs. People warned him it might not be seaworthy. He was troubled by these doubts and so reflected on whether it would be best to delay the voyage for a full inspection and refit.
Ultimately, he talked himself into believing that the ship would be fine. It had survived many voyages; why wouldn’t it survive one more? He told himself to trust in providence and assume the builders had done their jobs. He stopped entertaining “ungenerous suspicions” and gradually became comfortable with a sincere conviction that all would be well. He watched the ship sail away with a light heart, and then collected the insurance money when it sank and everyone on board died.
Was the shipowner guilty of those deaths? The mathematician and philosopher William Clifford invented this parable and argued that the shipowner is guilty of the deaths despite sincerely believing the ship would be fine because “he had no right to believe on such evidence as was before him.” That is to say, because instead of basing his belief on sufficient evidence, he based it on squashing his doubts and wishful thinking. In that same paper, Clifford goes on to argue for the bold conclusion that:
“[I]t is wrong always, everywhere, and for anyone, to believe anything upon insufficient evidence.”
Would you agree with that statement?
Some people are put off by how moralistic Clifford’s wording is. To make it more neutral-sounding, we could rephrase it to: “It is rational only to believe that for which you have sufficient evidence.”
That sounds reasonable, right? It’s a common sentiment among people who care about rationality and critical thinking. Yet despite how plausible it seems and how widely it is endorsed, claims like “You should only believe that for which you have sufficient evidence” contain a subtle but extremely important error. This article is all about that error.
In what follows, we’ll explain where claims like the ones above go wrong and what a stronger claim would be, instead. Along the way, we’ll talk about a related issue with the idea of ‘statistical significance’ that is widely used in science. All of this will give you a more nuanced conception of the idea of sufficient evidence, which you can apply across all domains of your life. This may help you to make better-informed decisions about your beliefs and improve your critical thinking.
What Did Clifford Get Wrong?
The thing that Clifford’s famous phrase gets wrong is very simple: It treats belief as binary. Clifford appears to have assumed that, for any claim, there are only two states you can be in: either you believe it or you don’t believe it. There is no more detail or nuance than that. He’d say: If you have sufficient evidence for a belief, then you should have that belief - but without sufficient evidence, you should not.
One of the greatest improvements you can make to your critical thinking and reasoning skills is to reject that view. Instead of thinking of belief as binary, you can think of belief as probabilistic. That means, instead of thinking things like “I believe the ship will be fine” and “I don’t believe the ship won’t be fine”, you think things like:
“I mostly believe the ship will be fine, but I slightly believe it won’t be.”
Or, even better, you can make your belief more precise by thinking:
“I believe there's a 60% chance the ship will be fine, and a 40% chance it won't be."
This kind of thinking will help you to more carefully account for uncertainty, make better judgments about the future, and avoid jumping to simplistic one-sided conclusions. It is touted by experts in forecasting and rationality as one of the most important changes you can make to your thinking. We also see it as one of the important aspects of critical thinking and have developed a free, interactive mini-course to help you with it.
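To see how a probabilistic belief behaves in practice, here is a minimal sketch (in Python) of updating a degree of belief with a new piece of evidence using Bayes’ rule. Every number in it - the 60% starting point and the two likelihoods - is invented purely for illustration.

```python
# A minimal sketch of treating belief as a probability and updating it
# with evidence via Bayes' rule. All numbers are made up for illustration.

def update_belief(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) given a prior and two likelihoods."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator

# Start at 60% confidence that the ship will be fine (as in the example above).
belief = 0.60

# Suppose an inspector reports worrying corrosion. Assume (hypothetically) that
# such a report is 5x more likely if the ship is NOT sound than if it is.
belief = update_belief(belief, p_evidence_if_true=0.10, p_evidence_if_false=0.50)
print(f"Belief after the report: {belief:.0%}")  # ~23%: lower, but not zero
```

The point is not the particular numbers but the shape of the move: evidence shifts your confidence up or down by degrees, rather than flipping a belief on or off.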
This idea might seem obvious, because you probably already do this in some contexts. For example, when watching a sports game: Maybe you start the game with a low degree of belief that your team will win, but then they score in the first few minutes and suddenly you believe more strongly that a win is possible. When the other team equalizes, your confidence goes down a little, but as the game goes on and your team plays well, your confidence rises. In the final minutes, with your team ahead, it rises further still. A comeback is always possible, but your belief in victory keeps building and building.
So, you probably already think about your beliefs as probabilistic (rather than binary) in some contexts. Unfortunately, it’s much less common to think of certain types of beliefs this way - such as those regarding political issues, moral issues, or situations where your ego is on the line. When it comes to, say, thinking that our own political group would be better for the economy than groups we oppose, it's easy to feel extremely confident - we might even feel 100% confident. But of course, people in those other groups are also 100% confident that we're wrong.
The truth is that most (perhaps all) of your beliefs are things you have some confidence in, but you can’t completely rule out every alternative possibility. There are simple possibilities: maybe you forgot something relevant, you’re misremembering something, or there’s an error in your reasoning. Or maybe you’re succumbing to a bias or to social influence. These things happen to everyone. Then there are more outlandish possibilities, such as that you’ve been systematically lied to, or even (as René Descartes pondered) that you might be dreaming without knowing it, or being deceived by a malicious demon who is manipulating your senses. You might even be a brain in a vat. Those latter possibilities appear extremely implausible to most people, but they help illustrate that absolute, 100% certainty isn’t really attainable in almost any situation.
For all of these reasons, we advocate rejecting Clifford’s dictum and instead opting for a view of rational, evidence-based beliefs that is probabilistic. For instance, here’s what Bertrand Russell has to say:
“Perfect rationality consists, not in believing what is true, but in attaching to every proposition a degree of belief corresponding to its degree of credibility.” - Bertrand Russell
It is laudable that Clifford cared so much about whether beliefs are based on evidence, but his account missed an important detail. Unfortunately, his mistake is still made in lots of contexts - and scientists, despite their training, are not immune to it. Let’s explore how it damages science.
What’s Wrong with ‘Statistical Significance’?
Imagine you’re a scientist and you think you’ve just discovered a connection between two things. Maybe it’s a connection between a drug and brain tumors, or (if you were a scientist in the 1950s) seatbelt wearing and surviving car crashes. Whatever it is, if you want to get your findings published and taken seriously, you’ll probably have to state whether they are ‘statistically significant’ or not.
It’s a widespread convention in science today that findings deemed ‘statistically significant’ are publishable (as "real" findings), whereas ones that aren’t are not. The problem is that neatly separating findings into those that are statistically significant and those that aren’t is another form of binary thinking. It treats an arbitrary threshold (usually a p-value below 0.05) as a magical boundary separating insufficient evidence of an effect from sufficient evidence of it. In reality, a p-value of 0.049 provides virtually the same amount of evidence as a p-value of 0.051, yet the two are treated extremely differently.
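To make that point concrete, here is a small sketch. It assumes a two-sided test on a standard-normal test statistic, for which the p-value is erfc(|z|/√2); the z-values themselves are invented.

```python
# Illustration of how close p = 0.049 and p = 0.051 really are.
# Two-sided p-value for a standard-normal test statistic z:
# p = P(|Z| >= |z|) = erfc(|z| / sqrt(2)). The z-values below are invented.
from math import erfc, sqrt

def two_sided_p(z):
    return erfc(abs(z) / sqrt(2))

for z in (1.95, 1.97):
    p = two_sided_p(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:.2f}  ->  p = {p:.3f}  ({verdict})")

# z = 1.95 -> p ≈ 0.051 (not significant)
# z = 1.97 -> p ≈ 0.049 (significant)
# Nearly identical evidence, opposite verdicts under the binary rule.
```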
Remarkably, Sir Ronald Aylmer Fisher - the statistician who did more than anyone to popularize the p-value - himself vehemently argued against using p-values this way. He warned that they should not be used to make automatic inferences about hypotheses and should instead be weighed as part of an overall body of evidence.
In addition to being based on flawed reasoning, the binary notion of statistical significance may have created problems in science. Unfortunately, we can’t know for sure whether things would have been better or worse without this notion (e.g., if it would have been easier to publish false findings had this notion not been used), but it is clear that it has introduced incentives that negatively affect how science is practiced. Goodhart’s Law offers some insight - it warns us that:
Goodhart’s Law
“When a measure becomes a target, it ceases to be a good measure”
For example: Imagine a company notices that its most productive employees are the ones who send the most emails (because at this organization, most sales are made through email), so they start offering bonuses based on the number of emails sent. But that just incentivizes people to send more emails without becoming more productive. Before it was turned into a target, the number of emails sent may have been a somewhat informative proxy measure for productivity at that particular organization, but as soon as it became a target, it stopped being so.
We have also seen this law play out in science. The binary concept of statistical significance has resulted in a convenient threshold (p < 0.05) becoming a target to reach or exceed for publication, and this has resulted in incentives for scientists to engage in a group of practices called ‘p-hacking’ that lower p-values artificially. For instance, this can be done via fishy statistics or by trying lots of analysis methods and just keeping the one that has the best-looking result. This lowers the quality of studies and increases the chances that published research findings are false.
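Here is a rough sketch of why this matters, simulating one simple form of p-hacking: a researcher measures several outcomes on pure noise and reports only the best-looking one. The setup (ten independent, normally distributed outcomes, groups of 30) is invented purely to illustrate the inflation.

```python
# A sketch of one form of p-hacking: when there is truly no effect, testing
# several outcome measures and keeping only the best-looking result inflates
# the rate of "significant" findings well above the nominal 5%.
import random
from math import erfc, sqrt

random.seed(0)

def two_sided_p(z):
    return erfc(abs(z) / sqrt(2))

def one_study(n=30, n_outcomes=10):
    """Compare two groups of pure noise on several outcomes; keep the smallest p."""
    p_values = []
    for _ in range(n_outcomes):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        z = (sum(a) / n - sum(b) / n) / sqrt(2 / n)  # two-sample z statistic
        p_values.append(two_sided_p(z))
    return min(p_values)

trials = 2000
false_positives = sum(one_study() < 0.05 for _ in range(trials))
print(f"'Significant' despite no real effect: {false_positives / trials:.0%}")
# Roughly 40% with ten outcomes, versus the nominal 5% for one pre-specified test.
```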
Fortunately, scientists are aware of this. Debates are being conducted, and attitudes towards statistical significance are changing. At least one prominent academic journal has even banned the use of p-values and talk of ‘significance’ (whether this will turn out to be an overcorrection is still an open question).
The problem is not actually p-values themselves. They really do tell you something interesting: how likely you’d be to get a result at least as extreme as your findings if the effect you’ve found wasn’t real. That’s very useful to know, because it tells you how surprising your results would be if they were just due to chance. All p-values are probabilities, and so are always numbers from 0 to 1. That means the number of possible p-values is uncountably infinite (there is one for each of the real numbers between 0 and 1). The real problem is that many academic journals act as if there are only two possible p-values - ‘significant’ and ‘not significant’ - and sometimes use that dichotomy as a substitute for careful scientific reasoning.
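That definition can be simulated directly. The sketch below uses a hypothetical example (61 heads in 100 coin flips) and estimates the p-value by asking how often a result at least that extreme shows up in a world where there is no real effect, i.e. the coin is fair.

```python
# Illustration of the definition above: a p-value is the probability of getting
# a result at least as extreme as yours if the effect you've found wasn't real.
# Hypothetical setup: 100 coin flips came up heads 61 times; is the coin biased?
import random

random.seed(1)

observed_heads = 61
n_flips, n_sims = 100, 100_000

# Simulate the "no effect" world (a fair coin) many times and count how often
# the result is at least as far from 50 heads as the observed 61.
extreme = 0
for _ in range(n_sims):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    if abs(heads - 50) >= abs(observed_heads - 50):
        extreme += 1

print(f"Approximate two-sided p-value: {extreme / n_sims:.3f}")  # ~0.035
```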
In reality, a statistically insignificant p-value like 0.10 is still more evidence of an effect (all other things being equal) than 0.20 would be for the same finding. It’s not extremely strong evidence, but it’s evidence nonetheless, and (all other things being equal) it usually justifies increasing your degree of belief in the reported findings to some extent. Just not quite as much as a lower p-value for the same finding would.
These considerations have led some researchers to argue that we should be thinking of p-values as providing different amounts of evidence “for a certain finding or effect”, on a scale like this:
But even this is not quite right. The p-value alone doesn't tell us how much evidence a result is for a conclusion. Consider the following two hypothetical studies that have the same results with the same p-value. Can you tell why one provides much stronger evidence than the other (for the question of whether the supplement being tested broadly improves memory)?

Study 1 has a more biased sample and uses tests of memory that are less evidence-based, so even though the two studies have the same finding and the same p-value, Study 2 provides much stronger evidence for the supplement broadly helping memory.
In truth, if you want to know the strength of evidence provided by a finding, you need to ask what we’ve called ‘The Question of Evidence’: “How many times more likely am I to see this evidence if my hypothesis is true than if it's false?”
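Here is a minimal sketch of that question as a likelihood ratio, together with the odds form of Bayes’ rule for turning it into an updated degree of belief. The specific probabilities (a 20% prior, 40% vs. 5% likelihoods) are invented for illustration.

```python
# 'The Question of Evidence' as a likelihood ratio: how many times more likely
# is this evidence if the hypothesis is true than if it is false?
# All numbers below are invented for illustration.

def likelihood_ratio(p_evidence_if_true, p_evidence_if_false):
    return p_evidence_if_true / p_evidence_if_false

def posterior(prior, lr):
    """Update a prior probability using the likelihood ratio (odds form of Bayes' rule)."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical memory-supplement study: suppose a result like this would be seen
# in 40% of trials if the supplement works, but in only 5% of trials if it doesn't.
lr = likelihood_ratio(0.40, 0.05)                    # evidence is 8x more likely if true
print(f"{posterior(prior=0.20, lr=lr):.0%}")         # belief rises from 20% to ~67%
```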
Here’s where all this leaves us. The binary notion of statistical significance has led many people and publishers to think of the p-value as a single, simple test that can tell you whether a result counts as sufficient evidence for a finding. It isn’t, and it cannot. A p-value is just one component of scientific reasoning - to be considered alongside things like study design, prior plausibility, and so on. Evidence is fundamentally a probabilistic concept that doesn’t have simple thresholds.
This doesn’t necessarily mean science would be improved by getting rid of the simplistic, binary idea of ‘statistical significance’. Even though the threshold is arbitrary and incentivizes things like p-hacking, removing it might end up making the system even easier to game. Getting rid of statistical significance might be like having easy-to-pick locks on a door and saying, “Well, since people have learned to pick them, let's just remove the locks altogether” - it might actually make things even worse!
Ultimately, whether or not there are good, pragmatic reasons for setting some kind of standard threshold for ‘statistical significance’, we should stay aware of its drawbacks and not let it lead us into binary thinking about evidence.
Is ‘Sufficient Evidence’ Always Misguided?
There are times when it makes sense to talk about ‘sufficient evidence’. For example, it is perfectly reasonable to say something along the lines of: “In order to have a 60% confidence level in this belief, I need sufficient evidence - i.e., evidence that indicates this belief is roughly 60% likely to be true.”
Also, there will be times when you must decide whether or not to act on a belief. If the action is very risky, you might require a very high degree of belief (and amount of evidence) before feeling comfortable acting. For instance, legal systems often require proof “beyond a reasonable doubt” before a jury may hand down a ‘guilty’ verdict. Sitting on a jury, you might believe very confidently that the defendant is guilty but still recognize that you don’t have sufficient evidence to justify the act of giving a ‘guilty’ verdict. Thus, it can be perfectly reasonable to talk about sufficient evidence for action, even when it doesn’t make sense to talk about sufficient evidence for belief.
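One way to make this precise is an expected-value sketch: act only when your degree of belief clears a threshold set by the stakes. The costs and benefits below are invented, and this is an illustration of the idea, not a model of real legal standards.

```python
# Sufficient evidence for ACTION: act only when your degree of belief clears a
# threshold determined by the stakes. All numbers are invented for illustration.

def should_act(belief, cost_if_wrong, benefit_if_right):
    # Acting is worthwhile in expectation only if
    #   belief * benefit_if_right > (1 - belief) * cost_if_wrong,
    # i.e. belief > cost_if_wrong / (cost_if_wrong + benefit_if_right).
    threshold = cost_if_wrong / (cost_if_wrong + benefit_if_right)
    return belief > threshold, threshold

# Low stakes: being wrong costs about as much as being right gains,
# so a modest degree of belief is enough.
print(should_act(belief=0.70, cost_if_wrong=1, benefit_if_right=1))   # (True, 0.5)

# High stakes: being wrong is 19x worse than being right is good,
# so you need at least 95% confidence before acting.
print(should_act(belief=0.70, cost_if_wrong=19, benefit_if_right=1))  # (False, 0.95)
```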
Finally, when the stakes are low, thinking in simplistic terms like “Do I have enough evidence to treat this claim as if I believe it 100%?” can speed things up or reduce cognitive load. You’ll be trading accuracy for speed and ease, but sometimes that’s worth it.
The takeaway: Reject the binary view of belief
The key thing to remember is that, regardless of whether you’re:
- a sports fan watching a game,
- a scientist thinking about statistical significance,
- a voter thinking about economic policies,
- a shipowner thinking about seaworthiness,
or anybody else, for that matter, you will increase your chances of having true beliefs if you reject the binary view of belief (and its accompanying ideas about ‘sufficient evidence’) and think in terms of probabilities. When the stakes are high, this can matter greatly.
Let’s end by returning to Clifford’s parable from the start of this article. Clifford argued that the shipowner was responsible for the deaths of the people aboard his ship because he (the shipowner) believed they would be fine without sufficient evidence for holding that belief. You can see now that this is a way of thinking that lacks nuance and misunderstands the nature of belief - treating it as binary when it isn’t. However, we can still say that the shipowner from Clifford’s parable was guilty. On the more nuanced (probabilistic) view of belief that we have advocated in this article, you can say:
Through negligence of reason, the shipowner failed to proportion his degree of belief to the evidence before him, and this led directly to the deaths of all the people aboard his ship. For that reason, the deaths of all those people are, to a substantial degree, on his hands.
If you found this article interesting, you might want to try our Nuanced Thinking Techniques tool. It teaches you to recognize three common binary-thinking traps and the nuanced thinking techniques you can use to combat them.

