Understanding Moral Disagreement Through Data

  • Markus Over
  • 2 hours ago
  • 15 min read

Short of time? Click here to read the key takeaways!

📊 We ran a study to better understand the moral judgments of people in the US, and we found striking disagreement. When participants rated the immorality of scenarios, a substantial share of them routinely gave maximally opposite judgments (0.0 vs. 5.0). Many scenarios showed almost as much disagreement as a totally random spread of answers, and extreme ratings (0.0 or 5.0) were surprisingly common (26% of all ratings). So, far from being universal, moral judgments vary a lot even within a single country!


📈 Moral judgments don’t follow a ‘normal’ distribution. Instead of forming a neat bell curve, many scenarios produce unusual, skewed, or bimodal distributions.


🧭 A person’s moral judgments can be predicted using underlying moral dimensions. We modelled people’s moral frameworks as being built out of 15 underlying ‘moral dimensions’ like Unfairness, Abuse of Power, Impurity, Social Taboos, Emotional Pain, or Loyalty. Taken together, these strongly predict how people rate scenarios.


🔍 The Clearer Thinking 15-dimension model aligns with (but differs from) Moral Foundations Theory. Both frameworks capture how people differ morally, but our model is data-driven and optimized for prediction within the U.S., whereas MFT is theory-driven and designed for cross-cultural application. Our study also confirms MFT’s classic finding: that political ideology correlates with moral foundations.


⚖️ Identifying with various political or social worldviews correlated with scoring higher on different moral dimensions. This means that certain moral dimensions are more predictive of the moral judgments of people with those worldviews than of people without them. For example:


  • Self-identified feminists (n=96), when compared to non-feminists, made judgments that were better predicted by the moral dimensions of Emotional Pain (r = 0.23), Harm to Vulnerable (r = 0.22), and Unfairness (r = 0.19).


  • Self-identifying as Christian (n=167) correlated with having moral judgments that were predicted by Impurity (r = 0.26), Utilitarianism (r = -0.17), Emotional Pain (r = -0.16), and more.


🪞 You can find out what moral dimensions predict your own judgments (and how you compare to others) by using our free Understanding Your Morality tool.


In a recent newsletter, we told you about a large study we ran to better understand people's moral intuitions. We asked US study participants to rate the immorality of actions in a variety of different scenarios. Today, we'll dive deeper into our study findings. 


You don’t need to have read the previous article in order to enjoy this one, but if you’re interested: that article explains the rationale for the study, introduces our novel approach to understanding people's moral compasses, and launches our new Understanding Your Morality tool, where you can rate the scenarios from our study yourself and receive a personalized analysis of what your ratings reveal about your own moral compass.


Let's start with a tiny experiment. Read the scenario below and, without overthinking it, come up with a quick rating of how immoral it is, from 0.0 (fine) to 5.0 (extremely unethical):


Poe, an American, is cleaning out their closet and finds their old American flag. Poe doesn't want the flag anymore but needs some rags for cleaning, so Poe cuts it up into pieces and uses the scraps to clean their bathroom.


If you felt an immediate "that’s obviously fine," or "that’s obviously wrong," can you notice how objective that reaction feels? Does it feel like you're reporting a fact rather than a preference? Would you expect most other people’s answers to cluster near yours, or would you bet there are significant numbers of people with very different answers?


This matters because moral conflicts rarely start with one person thinking, "I want to do harm." They typically start with people using different moral lenses and feeling similarly certain. If you can understand which lens you're using (and which lens someone else is using), you can get an idea where you'll talk past each other and how to have a more productive disagreement. Our Understanding Your Morality tool aims to help you gain that self-understanding.


In our study, hundreds of participants rated 300+ scenarios like the one above. What we found wasn't just that people disagree: We found that, for many scenarios, disagreement is structured and often extreme. And you can partially predict someone's judgments from the moral ‘dimensions’ their responses reveal.


In the rest of this post, we'll show what those disagreement patterns look like in the data, why the average opinion can be misleading, and how our 15-dimension model (and Moral Foundations Theory) can help explain which kinds of people see the same act as harmless versus immoral.



How We Studied Morality


If you read our last article about this study, you may want to skip this section and jump right ahead to What We Learned About Disagreements and Extremes. If you need a refresher, here's a brief summary of what we did.


We set out to answer a central question in practical ethics: how do people make rapid moral judgments? We aimed to identify which moral principles best explain why different people label the same situation as right or wrong. This premise – that judgments rest on underlying principles – appears in frameworks like Moral Foundations Theory (MFT). The key idea is that individuals endorse different principles and weight them differently, which can yield sharply divergent judgments of the very same scenario. Often, people are not consciously aware of these principles; they judge as if following them without knowing why. To probe this, we assembled a broad set of candidate principles and identified the ones that best account for people's judgments.


Moral Foundations Theory (MFT) is the most well-established framework of this kind, but it was developed through a very different method from ours: MFT is theory-driven (it synthesized existing observations and theory into a framework, then experimentally validated it), whereas our method involved identifying principles more directly from data using a novel method based on predicting people's judgments. In brief, here's what we did: 


[Image: overview of our study method]

As a result, we have developed a method for creating personalized linear regression models that assign weights (coefficients) to 15 different moral dimensions, showing which dimensions are most predictive of how any given person makes moral judgments. You can try this for yourself and find out more about your moral compass by trying our new tool:
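For readers who like to see the mechanics, here is a minimal sketch of how a personalized linear regression can assign a weight to each of 15 moral dimensions. All data below are simulated placeholders, not the study's actual pipeline or numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each scenario has been scored in advance on 15 moral
# dimensions, and one participant has rated the immorality of each scenario
# on the 0.0-5.0 scale. All values here are simulated stand-ins.
n_scenarios, n_dims = 60, 15
X = rng.uniform(0, 1, size=(n_scenarios, n_dims))  # scenario dimension scores
true_w = rng.normal(0, 0.5, size=n_dims)           # this person's "moral compass"
y = np.clip(2 + X @ true_w + rng.normal(0, 0.3, n_scenarios), 0, 5)

# Personalized model: ordinary least squares with an intercept. The fitted
# coefficients are this participant's weights on the 15 dimensions.
A = np.column_stack([np.ones(n_scenarios), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
weights = coef[1:]
print(weights.shape)  # (15,)
```

Each participant gets their own fitted weight vector; dimensions with larger absolute weights are more predictive of that person's ratings.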



Here are the 15 moral dimensions we found were most informative, and which are included in our model:


[Image: the 15 moral dimensions included in our model]

This work not only informed our research (the results of which we'll share in the rest of this article), it also allowed us to build our new Understanding Your Morality tool: a free interactive tool that walks you through a similar process of answering some moral judgment questions, after which you'll get your own personalized analysis of which moral dimensions you rely on (and how strongly), how you compare to others, as well as a personalized Moral Foundations Theory result.


With the background covered, let's roll up our sleeves and dive into the findings of our study.



What We Learned About Disagreements and Extremes


In our study, we included a scenario about a person eating their deceased dog, which many find disturbing. It was phrased as follows:


Poe's dog was killed by a car in front of their house. Poe had heard that dog meat is delicious, so Poe cut up the dog's body, cooked it, and ate it for dinner.


Participants were then asked: How would you rate the immorality of this action, on a scale from 0.0 (fine) to 5.0 (extremely unethical)?


This is one of the scenarios from our study where we can most clearly see disagreement between study participants. Let's have a look at the distribution of ratings that people gave this scenario:


Distribution of judgments of our study participants on the scenario about a person eating their dog. N = 118, mean = 3.3, median = 3.9.

One interesting observation is that the two most common ratings for this scenario were 5.0 and 0.0 - the two opposing extremes. More than a quarter of respondents (26%) chose the highest immorality rating possible, while 8% chose the lowest possible rating, meaning they found this scenario to be not immoral at all. This suggests a high degree of disagreement about the immorality of this scenario.


Looking at the 15 moral dimensions, two that seem particularly relevant to this scenario are Impurity and Social Taboos. Would you expect these to predict what judgments people made for this scenario?


It turns out that these dimensions do explain the disagreement that we've observed above quite well. Here are two reasons to think so: 


First, if we look only at participants who have a positive coefficient for Impurity and Social Taboos (in other words, only the participants whom our statistical model finds rate actions as more immoral when they are more impure or more transgressive against social taboos), then we get an even higher share of 5.0 ratings (45%), and the ratings from the lower half of the spectrum are greatly reduced. 


Second, if we look only at participants whose Impurity and Social Taboos coefficients are both 0 (meaning that our model found neither of those dimensions is at all predictive of how these participants rate the immorality of our scenarios), we end up with a rather small sample size: out of the 118 participants who rated this scenario, only 15 had both Impurity and Social Taboos coefficients of 0. But those 15 people gave half of the original 0.0 ratings and almost none of the 5.0 ratings:


Distribution of judgments of our study participants on the scenario about a person eating their dog, when filtering for users with either positive coefficients for Impurity and Social Taboos (in red, N = 45, mean = 4.2, median = 4.8) or both coefficients at 0 (in blue, N = 15, mean = 1.5, median = 0.9)

This helps illustrate the idea that moral dimensions can be meaningful predictors of how people make moral judgments, while also showing how strongly different groups can disagree in their moral judgments of such scenarios.
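The subgroup comparison above can be sketched in a few lines. This uses entirely simulated coefficients and ratings (random placeholders, not the study data), assuming each participant has one fitted coefficient per dimension:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-ins: per-participant fitted coefficients for Impurity and
# Social Taboos (non-negative here for simplicity), plus each participant's
# 0.0-5.0 rating of the dog scenario.
n = 118
impurity = rng.choice([0.0, 0.5, 1.0], size=n, p=[0.3, 0.4, 0.3])
taboos = rng.choice([0.0, 0.5, 1.0], size=n, p=[0.3, 0.4, 0.3])
ratings = np.round(rng.uniform(0, 5, size=n), 1)

# Subgroup 1: both coefficients positive; subgroup 2: both exactly zero.
pos = (impurity > 0) & (taboos > 0)
zero = (impurity == 0) & (taboos == 0)

for label, mask in [("positive", pos), ("zero", zero)]:
    sub = ratings[mask]
    print(label, int(mask.sum()), round(float(sub.mean()), 2))
```

With the real data, the "positive" subgroup's mean rating sits far above the "zero" subgroup's, which is what the histograms above show.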


So, this single case already demonstrates several interesting insights that, as we'll see, show up in many of our scenarios:


  • Disagreement: There is enormous disagreement on many of the scenarios, with some people finding them extremely immoral while others find them not the slightest bit concerning. The standard deviation of the judgments of all scenarios averaged around 1.2 on our 5-point scale, which is quite large. For comparison, if the spread of participants’ answers were even across all options (meaning there was a uniform distribution of judgments, indicating a severe lack of agreement), the standard deviation would be about 1.46 - not much more than what we observe.


  • Extremes: Many users tend to choose the extreme values, 0.0 and 5.0, rather than the values in between. In the above scenario, 35% of users chose the extremes. Across all scenarios, 26% of all judgments taken together were either 0.0 (8%) or 5.0 (18%).


  • Non-standard Distributions: In studies where a large number of participants generates a distribution of numbers, it's quite common to end up with something reminiscent of a (possibly skewed) normal distribution (also known as a bell curve because of its shape). As our histograms of scenario judgments show, though, many scenarios produce distributions that are very far from a bell curve (with a few exceptions).
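The uniform-distribution benchmark in the first bullet is easy to verify. Depending on whether you treat the 0.0-5.0 scale as continuous or as 51 discrete steps, the standard deviation comes out slightly differently, bracketing the figure quoted above:

```python
import math

# Benchmark for "maximum disagreement": answers spread uniformly over the
# 0.0-5.0 scale. Continuous uniform on [0, 5]: sd = 5 / sqrt(12).
sd_continuous = 5 / math.sqrt(12)

# Discrete uniform over the 51 selectable values 0.0, 0.1, ..., 5.0:
values = [i / 10 for i in range(51)]
mean = sum(values) / len(values)
sd_discrete = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

print(round(sd_continuous, 2), round(sd_discrete, 2))  # 1.44 1.47
```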


Let's look at another scenario:


Poe and Avery are platonic friends, but secretly, Poe often masturbates while imagining engaging in sexual acts with Avery.


This is how our study participants judged it:


Distribution of judgments of our study participants on the Platonic Friends scenario. N = 144, mean = 1.8, median = 1.5.

This is one of the scenarios that study participants, on aggregate, found comparatively harmless. Still, 5% of respondents assigned the highest possible immorality rating, and while 15% found it completely fine, 37.5% assigned a judgment from the upper half of the scale. So, even though this is far from our most controversial scenario, disagreement among our US participants was still very large.


These examples illustrate something that we observe across the majority of scenarios: moral judgments are indeed highly heterogeneous, with non-negligible percentages of people judging the very same scenario as maximally harmless and as maximally immoral. In fact, 77% of our scenarios received judgments at both extremes (0.0 and 5.0). These findings give some credence to the model of people subscribing to different moral principles: if some people weigh, say, purity very highly, while others don't think it has any moral relevance, this would explain why certain scenarios strike some people as highly immoral and others as perfectly acceptable. That said, while the idea of moral principles aligns well with the evidence we see and allows us to make decent predictions about people's judgments, it is, of course, still a simplification, and other important factors beyond the scope of our research may also contribute to how people make moral judgments.


Here is a table showing a small selection of scenarios at different levels of agreement and disagreement among study participants, using standard deviation as a measure of disagreement. The 'Distribution of Ratings' column uses the same scale as this graphic.


Warning: Some people will find some of the scenarios described to be disturbing:


[Table: selected scenarios by level of agreement, with their rating distributions]


How Our 15-Dimension Framework Compares to Moral Foundations Theory


We have now focused a lot on our new 15-dimensional moral framework. But it's obviously not the only such framework that exists. A popular framework that aims to explain people's moral intuitions based on a set of core dimensions is Moral Foundations Theory (MFT). This framework, spearheaded by Jonathan Haidt and others, originally proposed five foundations: Care, Fairness, Loyalty, Authority, and Sanctity. Later, Liberty was added as a sixth principle, and Fairness was eventually split into Equality and Proportionality to better differentiate between "equal outcomes" and "outcomes based on merit", which can be viewed as two distinct forms of fairness. Thus, the original five foundations have grown to seven. More recently, there have been proposals to raise Honor and Ownership to "foundationhood", but the discussion around this is ongoing.


In our tool and research, we developed our own Clearer Thinking 15-Dimension framework, and we also worked with MFT. This way, users of our new tool get insights into what MFT says about their judgments as well, plus we are able to compare the two frameworks and even test some of the empirical claims made by the developers of MFT. For all this work, we relied on the following seven MFT foundations:


  • Care

  • Equality

  • Loyalty

  • Authority

  • Ownership

  • Sanctity

  • Liberty


The reason we didn't include Proportionality and Honor was that our 323 moral scenarios were not well-suited to tap those particular foundations.


One big difference between these two frameworks is how they were created: while our framework is almost entirely empirical (and based on US data only), the MFT framework is grounded in theory (and is designed to be applied internationally). Also, we're pursuing different goals: We strive to accurately predict people's quick moral judgments in the US on moral scenarios, whereas MFT aims to provide a theoretical underpinning to morality across cultures. So, it was not our intention to come up with a "better" framework than MFT. Instead, we wanted to cover somewhat different use cases, such as prediction and empirical verification, while casting a wide net for potential moral dimensions, and therefore chose our distinct approach.


So, what are the empirical claims around MFT that we tested?


One of the most widespread claims about MFT is that the political orientation of people in the US predicts which moral foundations they care about: the theory holds that people on the political left care much more about Care and Equality than about the other foundations, while conservatives are said to value the remaining foundations more strongly, on average. And indeed, our data suggest this is fairly accurate. To test it, we used our system to predict each participant's judgments, this time assigning weights to the seven MFT foundations rather than our 15 moral dimensions. This gave us seven foundation weights per participant. Then we looked at the correlation between these foundation weights and political progressivism:


[Table: standardized linear regression coefficients relating each MFT foundation to political conservatism]
Note: In the 'linear regression coefficient' column above, each number reflects the strength of the relationship between a given MFT foundation and political conservatism while controlling for the other foundations (i.e., statistically holding them constant). Prior to running the linear regression, all variables were standardized to have a mean of 0 and a standard deviation of 1.

In our study, we found that Care, Equality, and Liberty were strong predictors of political progressivism, while the other foundations were moderately correlated with and predictive of political conservatism. Liberty/Oppression is a bit of a surprise here, as some would assume it's more associated with conservatism, but previous MFT research helps make sense of this result: it found that both progressives and conservatives value liberty but emphasize different aspects, with conservatives focusing on freedom from government interference and progressives emphasizing protecting vulnerable groups from domination. Our correlation findings might reflect that our liberty-related scenarios leaned more toward progressive interpretations of liberty than conservative ones, suggesting an opportunity for future research to capture both aspects more fully.


Demographic Findings


We didn’t just examine how people’s political views relate to their MFT foundations. We also collected various demographic and psychological data (such as religion, education, gender, and Big Five personality traits) and assessed how these relate to the dimensions in both frameworks, MFT and our own.


To do this, we looked at one or both of these sets of numbers:


  1. The simple correlations between the demographic or psychological traits and the moral dimensions

  2. In some cases, the coefficients of additional linear regression models trained to predict people's demographic or psychological traits from their moral dimension scores. Like the table shared above, these coefficients show how a given moral dimension relates to the trait while statistically holding all other dimensions constant.
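A minimal sketch of the second approach, using simulated data (all names and numbers here are illustrative): standardize the trait and the dimension weights, then fit one linear regression so each coefficient reflects a dimension's relationship to the trait while holding the others constant:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated stand-ins: per-participant weights on 15 moral dimensions, and a
# trait score (say, progressivism) we want to relate to those weights.
n, k = 359, 15
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + rng.normal(size=n)

def standardize(a):
    # Rescale to mean 0 and standard deviation 1 (per column for 2-D input).
    return (a - a.mean(axis=0)) / a.std(axis=0)

Xs, ys = standardize(X), standardize(y)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), Xs]), ys, rcond=None)
betas = coef[1:]  # one standardized coefficient per dimension
print(betas.shape)  # (15,)
```

Because everything is standardized, the coefficients are on a common scale and can be compared across dimensions, as in the table above.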



Political Ideology and Clearer Thinking's 15 Dimensions


Our Clearer Thinking framework revealed several strong correlations with political conservatism:

  • Christianity showed the strongest positive correlation with conservatism (r = 0.39), indicating that the scenarios our model associated with Christian values are strongly linked to conservative identity

  • The Prejudice dimension showed a fairly strong negative correlation with conservatism (r = -0.31), suggesting that progressives were more likely to consider prejudice when making moral judgments

  • Emotional Pain and Utilitarianism both showed notable negative correlations with conservatism (r = -0.28), suggesting that conservatives were less likely to base moral judgments on emotional suffering or on calculating the greatest happiness for the greatest number

  • Social Contract (valuing societal cooperation and judging actions by commonly agreed-upon rational rules) and Unfairness were also less emphasized by conservatives (r = -0.24 and -0.20, respectively)

The table below shows the correlations between political progressivism and the 15 Dimensions of Moral Judgment by Clearer Thinking:


ree


How Other Demographics Relate to Moral Dimensions


The following sections explore correlations between moral dimensions and other demographic factors (e.g., feminist identity, religious affiliation, urban/rural living, and personality). These are exploratory findings, so please note that they have a higher risk of being false positives than some of our other results, especially since many hypotheses were tested for this section.



Feminist Identity and Moral Dimensions


Self-identified feminists (n = 96) showed:

  • Moderately strong positive correlations with CT's Emotional Pain (r = 0.23) and CT's Harm to Vulnerable (r = 0.22), highlighting an emphasis on empathy for negative emotions and protecting those who are less able to protect themselves

  • Positive correlations with CT's Utilitarianism (r = 0.18) and CT's Unfairness (r = 0.19), indicating a tendency towards considering the extent to which actions cause suffering/reduce happiness and equity in resource distribution

  • Negative correlation with CT's Impurity (r = -0.20)

  • Negative correlations with CT's Loyalty (r = -0.17) and CT's Christianity (r = -0.15), suggesting a potentially critical stance on religious/patriarchal structures and rejecting group conformity pressures


Religious Identity and Moral Dimensions


  • Self-identified Christians (n = 167; Note: 'Christians' here refers to a demographic group and is different from CT's Christianity moral dimension) showed negative correlations with CT's Utilitarianism (-0.17) and CT's Emotional Pain (r = -0.16), and a positive correlation with CT's Christianity (r = 0.28) and CT's Impurity (r = 0.26)

  • Self-identified atheists (n = 53) showed a positive correlation with MFT's Equality (r = 0.27)

  • People identifying as spiritual but not religious (n = 49) showed a positive correlation with CT's Abuse of Power (r = 0.19)


Big Five Personality Traits


  • Openness to experience correlated negatively with MFT's Sanctity/Degradation (r = -0.19), meaning people who are higher in Openness were less likely to make moral judgments based on concepts of purity or disgust

  • Extraversion correlated positively with CT's Authority (r = 0.15)

  • Emotional stability (i.e., low neuroticism) correlated negatively with CT's Emotional Pain (r = -0.17), possibly because emotionally stable people experience less emotional distress and may be less attuned to it in others

  • Agreeableness correlated negatively with MFT's Authority/Subversion (r = -0.18) and CT's Abuse of Power (r = -0.15)


Economic Trust Games


  • Giving behavior in the Trust Game correlated positively with CT's Inequality (r = 0.16) and CT's Christianity (r = 0.15), and negatively with CT's Loyalty (r = -0.16). In this hypothetical game, participants proposed splitting a theoretical $100 with an anonymous partner. The partner could accept or reject the offer (if rejected, neither received money). Then, the amount participants offered was doubled by researchers, and the partner then decided how much, if any, to return. Our 'Giving behavior' metric refers to the share of the initial $100 that participants allocated to their partner. Note that no real money was actually exchanged.


Other Findings


  • Self-identified environmentalists (n = 80) showed positive correlations with MFT's Equality (r = 0.18) and MFT's Care/Harm (r = 0.15)

  • Living in more urban areas showed positive correlations with MFT's Care/Harm (r = 0.17), CT's Emotional Pain (r = 0.19), and CT's Utilitarianism (r = 0.15) — suggesting city residents may place a higher priority on empathy and considering others' suffering compared to rural residents

  • Higher donations to charity in the past 12 months showed a positive correlation with CT's Impurity (r = 0.19)

  • A higher number of visits to churches, temples, or other places of worship in the past 30 days (excluding weddings and funerals) showed a positive correlation with MFT's Sanctity/Degradation (r = 0.18) and CT's Impurity (r = 0.20)

  • Socioeconomic status correlated negatively with CT's Prejudice (r = -0.17)

  • Education also correlated negatively with CT's Prejudice (r = -0.17)

  • Income and gender showed relatively weak correlations (r < 0.15) with all moral dimensions, suggesting that these factors may play a smaller role in shaping moral judgments


Note: These findings are based on a sample size of n=359. We focused on correlations of r ≥ 0.15, a threshold that ensures a p-value of less than 0.01 – indicating that, for each result considered individually, a result at least this extreme would be obtained less than 1% of the time if there actually was no correlation. Given the multiple comparisons being made (and the lack of other adjustments to account for this), this more conservative approach increases the robustness of our conclusions – even though, as mentioned, this section is more exploratory and may contain false positives.
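The threshold in this note can be checked directly: for a correlation of r = 0.15 with n = 359, the usual t-test for a correlation coefficient gives a two-sided p-value comfortably below 0.01 (we approximate the t distribution with a normal here, which is accurate at 357 degrees of freedom):

```python
import math

# With n = 359, does r = 0.15 clear the p < 0.01 bar?
n, r = 359, 0.15
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # t-statistic, df = n - 2

# At 357 degrees of freedom the t distribution is essentially normal, so we
# approximate the two-sided p-value with the normal CDF (stdlib only).
phi = 0.5 * (1 + math.erf(t / math.sqrt(2)))
p_two_sided = 2 * (1 - phi)
print(round(t, 2), round(p_two_sided, 4))  # t ≈ 2.87, p well below 0.01
```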



What's Next?


In part one of this three-part series, we explained the research behind our new 15-dimensional Clearer Thinking framework of morality and introduced our new Understanding Your Morality tool that this research enabled. 


In this second part, we presented many of the findings from our data and compared our new framework with Moral Foundations Theory.


To conclude our series, part three will soon further explore what our new framework can tell us about the moral tendencies of AI.


In the meantime, we'd love to hear your thoughts – on the tool, on the research, or on the ideas and questions it raises.



 
 