André Ferretti, Spencer Greenberg, Emmanuel Nnaemeka
- Feb 24

How accurate are popular personality test frameworks at predicting life outcomes? A detailed investigation.

Updated: Apr 9

Our Aims

Our aim was to measure the predictive accuracy of the most popular personality test frameworks, and then to create a single test that could measure all of these at once. You can take the free 12-minute test that we developed while completing this study by clicking here.

It seems that the three most popular general-purpose personality tests in the world are the Myers–Briggs Type Indicator (also known as MBTI, with >30 million Google search results), the Enneagram (>11 million search results), and the Big Five (>4 million search results).

Source: Global web search traffic from Google Trends

Summary

Here are the key takeaways from our study:

Big Five Superiority: The Big Five personality test outperformed the Jungian (an MBTI-inspired framework) and Enneagram tests in predicting life outcomes.
Neuroticism's Impact: Removing Neuroticism from the Big Five resulted in a substantial drop in predictive accuracy.
Trait Distribution: Most personality traits approximately formed bell curves, meaning that most people fall near the middle on each trait, suggesting binary categorization (as is typical with MBTI-style tests) might introduce substantial noise.
Continuous vs. Binary: Continuous scores in the Jungian framework predicted outcomes substantially better than binary categories (which is important since MBTI-style tests are usually presented in a categorical form).
Jungian Limitations: The Jungian 4-letter framework showed less predictive accuracy than the Big Five, mostly due to its use of binary types (splitting participants into letters like I vs. E and N vs. S) and its failure to measure Neuroticism. When we adapt the Jungian framework to give continuous scores (rather than categories) and exclude Neuroticism from the Big Five, the predictive gap between the two frameworks narrows. However, even with these adjustments, the Big Five (without Neuroticism) still slightly outperformed the modified Jungian test (with continuous scores, not binary types).
Cross-framework Relations: Almost every Jungian trait correlated with a specific Big Five trait: the Jungian Extraversion/Introversion aligned with Big Five’s Extraversion, Intuition/Sensing with Openness, and Feeling/Thinking with Agreeableness. However, the Judging/Perceiving trait was associated with three of the Big Five traits.
Integration Ineffectiveness: Combining Big Five and Jungian test results didn't improve prediction accuracy over using just the Big Five alone. This suggests that the Jungian test does not add significant predictive value beyond what is already captured by the Big Five.
Enneagram's Surprisingly Good Performance: Despite its simplicity, the Enneagram binary (where we used only the 1-digit Enneagram type as the variable, e.g., Type 9) performed better than the binary Jungian Type at predicting life outcomes. However, the Enneagram still underperformed the Big Five.
Participant Perception: Despite the Jungian test’s lower predictive accuracy, participants felt better after reading their Jungian assessments than their Big Five assessment, likely due to the Jungian test's positive framing — it’s better to be called “Thinking” than “with low Agreeableness”.

After releasing our research and article about this topic, we received a letter from The Myers-Briggs Company. It is at the bottom of this report if you'd like to read it (as well as our response to their letter).

How we developed our own versions of each test

To develop our own personality tests modeled on each of these personality frameworks, we studied the theoretical constructs that each framework claims to measure, and then developed our own questions to measure those constructs. For example, Conscientiousness in the Big Five is about being organized, rule-abiding, hard-working, and more; the hard-working construct can be captured by a statement like “I aim for maximum productivity.”

We built the personality tests using GuidedTrack, a low-code platform that we developed to create surveys, studies, learning modules and behavior interventions. All of our online tools on ClearerThinking.org are built using GuidedTrack.

We then used our Positly platform to recruit study participants to complete our survey. We used this empirical data to test the performance of questions we had drafted, and also to select the final set of questions for each test. Our initial study had 323 valid participants (after removing spammers and low-quality data).

For all personality questions, we standardized the answers using a 7-point Likert scale from “totally disagree” to “totally agree.” So when we say personality “question”, we actually mean a statement that the user agrees or disagrees with (or they can be neutral).

For the Big Five framework, we had already completed a factor analysis of Big Five-style questions in the past, resulting in a curated list of questions for each trait (alongside their statistical correlation with each trait). From this pool, we chose 60 questions, prioritizing those with the highest correlation to their trait, but avoiding questions that were overly similar to each other (aiming to get a broad measurement of each of the five traits).

While developing our versions of an MBTI-style test and an Enneagram test, we found the work of OpenPsychometrics.org very useful. They have published insightful and open-source work on Enneagram and Jungian Type scales, which helped inspire our own versions of these tests. We would like to thank them for this work.

For the MBTI personality framework in particular, due to its proprietary nature, we couldn't examine the official test. Instead, we crafted our own questions inspired by similar constructs. While we tried to reflect the theory as accurately as possible, our results will inevitably differ from the official test. (For the official version of that test, see the work of the Myers-Briggs Company.)

We named our MBTI-inspired test “Jungian Type.”

For both the Jungian and Enneagram tests, we purposely wrote more questions than would be needed (each question being designed to measure one of the underlying theoretical constructs). The data that we then gathered enabled us to see which questions were most and least correlated with the underlying trait, helping us to select the best questions to measure each trait.

We examined the effectiveness of each question using these criteria:

Cohesiveness: How well do responses to this question correlate with responses to other questions designed to measure the same assigned trait? The higher, the better.
Uniqueness: How unique is the question? Is it very different from the other questions? We removed those questions that were overly similar to other ones in the test.
Self-evaluation: For each MBTI-style trait and Enneagram type, we showed each participant a detailed description of that trait and asked them to self-evaluate on it. A question was considered more effective if its responses closely matched the self-evaluations for the intended trait.
Consistency: For the Jungian questions, we also looked at how each question’s responses correlated with people’s past MBTI-style test results (only for those who claimed to know their type). A question was deemed better if its responses aligned as expected with these self-reported results.
Clarity: All else equal, we favored questions that were clear, short and unambiguous.

To compute correlations between questions and traits, we used a Python program to exclude each question from its assigned trait temporarily. This step avoids inflating correlations, since a question naturally correlates with itself.
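
As a rough illustration of this exclusion step (not the code used in the study), the sketch below computes, for each question, the correlation between its answers and a trait score built from the other questions only. The question ids, the toy answers, and the function names `pearson` and `item_rest_correlations` are all hypothetical, and averaging the remaining questions is an assumed way of scoring the trait.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def item_rest_correlations(responses):
    """responses: dict mapping question id -> list of answers (one per participant).

    Each question is correlated with the mean of all the OTHER questions
    assigned to the same trait, so it never correlates with itself.
    """
    results = {}
    for qid, answers in responses.items():
        others = [vals for other_id, vals in responses.items() if other_id != qid]
        rest_score = [sum(col) / len(col) for col in zip(*others)]
        results[qid] = pearson(answers, rest_score)
    return results

# Hypothetical 7-point Likert answers from six participants to three questions
extraversion_items = {
    "q1": [7, 6, 2, 1, 5, 3],
    "q2": [6, 7, 1, 2, 4, 3],
    "q3": [4, 3, 5, 4, 3, 5],  # tracks the other two poorly
}
item_rest = item_rest_correlations(extraversion_items)
for qid, r in item_rest.items():
    print(qid, round(r, 2))
```

In this toy data, q1 and q2 move together while q3 does not, so q3 would be a candidate for removal under the Cohesiveness criterion.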

Applying all of these criteria led us to a final set of questions designed to measure the traits of each framework.

A limitation of this work is that our findings are only as good as the tests we designed for each framework. We endeavored to make high-quality tests and used empirical evidence for the design of each test to ensure accurate measurements. However, like all personality frameworks, none of our tests can be considered perfect at measuring the underlying constructs.

How we tested the predictive accuracy of each test

To figure out which personality test is more accurate, we gave our tests to 559 paid U.S. participants recruited on Positly.com. We also asked participants approximately 40 questions on their life outcomes (e.g., to what extent they exercise, are satisfied with their life, have many close friends, etc.).

In our study, we randomized the question order to reduce order effects. We mixed the order of both personality and life outcome questions. However, participants had to finish the personality section before moving on to the life outcomes.

Since not all participants pay attention while completing surveys, we added attention checks in the personality questions. If a participant failed two or more, we excluded them from our dataset. For example, we expected U.S. participants to agree with “I have heard of the historical figure George Washington”.

We measured the performance of each personality test by seeing how accurately it could predict each of these life outcomes. In particular, we attempted to predict each life outcome using:

The five scores from each person's Big Five traits
Each person's 4 letters (I vs. E, N vs. S, T vs. F and J vs. P) from their Jungian type (MBTI-style)
Which of the 9 Enneagram categories the person was assigned to

As a comparison point to help us better understand the performance of each test, we also tried making these predictions with just 4 out of the Big Five traits (excluding Neuroticism), and treating the Jungian traits as four continuous scores (rather than the typical use as 4 binary categories, or a single assignment to one of 16 categories).

Technical details about our methods: Our primary measure of prediction "accuracy" is what's known as R (the square root of R^2). It measures how close each prediction is to the true value of that outcome, with a maximum score of 1 indicating a perfect prediction. It is similar to (but not quite the same as) the correlation between predictions and outcomes. For continuous life outcomes, we made our predictions using multivariate linear regression. For dichotomous outcomes (of which there were only a few), we made our predictions using multivariate logistic regression (using the mode value rather than the mean value in the denominator of our calculation of R^2 and hence R). When R^2 values were negative (which can happen when a prediction is worse than simply using the mean or mode to predict for every person, indicating that the regression algorithm has learned nothing), we assigned an R of 0. All prediction models included an intercept. To help prevent the possibility of overfitting (which can cause inflated accuracy estimates if performance is evaluated on the training data) we measured predictive performance out-of-sample using 10-fold cross-validation (by computing the sum of squared error on each withheld cross-validation fold, then summing this error across the folds). Measuring performance out-of-sample is critical in this case because we're comparing models with different numbers of independent variables, and in-sample (training data) error measurement would be biased toward favoring models with more independent variables, such as the Big Five, over models with fewer, like the Jungian/MBTI-style framework. For astrological sun signs, in addition to ordinary least squares and logistic regression, we also experimented with adding an L2 penalty to these models just in case the very poor prediction performance was caused by overfitting the training data, but the L2 penalty versions did not improve the predictions as can be seen here.
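
To make the out-of-sample procedure more concrete, here is a minimal sketch for a continuous outcome. It is not the study's code: the data is synthetic, the helper names (`fit_ols`, `predict`, `cross_validated_r`) are hypothetical, and using each training fold's mean as the baseline in the denominator is an assumption. It fits a linear regression with an intercept on nine folds, sums squared error over every withheld fold, and sets R to 0 when R^2 is negative. The logistic regression case for dichotomous outcomes is not shown.

```python
import random
from math import sqrt

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, x):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))

def cross_validated_r(X, y, n_folds=10, seed=0):
    """Out-of-sample R, with squared error summed across all withheld folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    sse = sst = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        coefs = fit_ols([X[i] for i in train], [y[i] for i in train])
        train_mean = sum(y[i] for i in train) / len(train)  # assumed baseline
        for i in held_out:
            sse += (y[i] - predict(coefs, X[i])) ** 2
            sst += (y[i] - train_mean) ** 2
    r_squared = 1.0 - sse / sst
    return sqrt(r_squared) if r_squared > 0 else 0.0  # negative R^2 -> R of 0

# Synthetic example: five "trait" scores, one outcome with signal, one pure noise
rng = random.Random(1)
traits = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(300)]
outcome = [0.5 * t[0] - 0.3 * t[4] + rng.gauss(0, 1) for t in traits]
noise_only = [rng.gauss(0, 1) for _ in traits]
r_signal = cross_validated_r(traits, outcome)
r_noise = cross_validated_r(traits, noise_only)
print(round(r_signal, 2), round(r_noise, 2))
```

Because the error is measured only on data the model did not see, adding predictors that carry no signal tends to lower the score rather than raise it, which is why this kind of evaluation allows a fairer comparison between models with different numbers of variables.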
For outcome variables with outliers, we clipped outliers before training the prediction models. You can download an anonymized copy of our data (including Big Five scores, Enneagram scores, MBTI-style/Jungian scores, zodiac signs, and life outcomes) and the Python code from our main analysis by clicking here. When assigning Enneagram categories to study participants, we calculated what score each participant received for each of the 9 Enneagram types and then assigned to that person whichever of those 9 types their score was the highest percentile for (compared to other study participants). For Jungian/MBTI-style categories, we assigned participants a type by calculating the percentile scores for each person for each pair (E/I, S/N, T/F, and P/J) and then assigned that person to the type in the pair that they are higher than the 50th percentile for.
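
The percentile-based assignment described above could be sketched roughly as follows. This is an illustration under assumptions, not the study's code: the function names, the toy scores, and the handling of ties (counting half of tied scores as below) are all hypothetical, and only three Enneagram types are shown for brevity.

```python
def percentile_ranks(scores):
    """Percentile (0-100) of each score relative to all participants."""
    n = len(scores)
    ranks = []
    for s in scores:
        below = sum(other < s for other in scores)
        ties = sum(other == s for other in scores)  # includes the person themselves
        ranks.append(100.0 * (below + 0.5 * ties) / n)
    return ranks

def assign_enneagram(type_scores):
    """type_scores: dict of type number -> list of raw scores (one per participant).

    Each person gets the type on which their percentile is highest.
    """
    pct = {t: percentile_ranks(s) for t, s in type_scores.items()}
    n = len(next(iter(type_scores.values())))
    return [max(pct, key=lambda t: pct[t][i]) for i in range(n)]

def assign_jungian_letters(pair_scores):
    """pair_scores: dict like {"EI": [...]}; higher scores lean toward the first letter.

    A person gets the first letter of a pair if they are above the 50th percentile.
    """
    pct = {pair: percentile_ranks(s) for pair, s in pair_scores.items()}
    n = len(next(iter(pair_scores.values())))
    return ["".join(pair[0] if pct[pair][i] > 50 else pair[1] for pair in pair_scores)
            for i in range(n)]

# Hypothetical raw scores for four participants
enneagram_types = assign_enneagram({
    1: [5.0, 3.0, 4.0, 2.0],
    2: [4.0, 6.0, 5.0, 3.0],
    9: [6.5, 6.0, 7.0, 6.8],  # everyone scores high on this type in raw terms
})
jungian_types = assign_jungian_letters({
    "EI": [0.9, -0.4, 0.1, -1.2],
    "SN": [-0.5, 0.7, 0.3, -0.1],
    "TF": [0.2, 0.8, -0.6, -0.3],
    "JP": [-0.7, 0.4, 1.1, -0.2],
})
print(enneagram_types)  # [1, 2, 9, 9]
print(jungian_types)    # ['ENTP', 'ISTJ', 'ESFJ', 'INFP']
```

Comparing percentiles rather than raw scores matters in the toy data: every participant's highest raw score is on Type 9, yet only the two who are unusually high on it relative to others are assigned that type.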

What were the studies we ran?

Our work on this actually consisted of two separate studies.

Study 1 involved collecting data on a wide range of questions with the goal of winnowing them down to a smaller set. We also collected information about 37 life outcomes that we could use the three test frameworks to predict. After filtering out spammers and low-quality responses, this first study gathered 323 valid participants.

We used the data from study one to create a "study" version of our tests, featuring an equal number of questions for the Big Five and Jungian frameworks — 60 questions each. For the Big Five, we included 12 questions per trait, while the Jungian framework, having four traits, had 15 questions per trait. At this stage, we intentionally avoided using any question across more than one framework to prevent artificially enhancing their similarity.

After having created the “study” version of our test, we retroactively assigned personality trait scores to our 323 participants on the Big Five and Jungian frameworks. We used this data to test how well each framework performed in predicting each of our 37 life outcomes.

Our “ClearerThinking.org” version of the test, used in our free online personality test, consists of 120 questions: 60 for the Big Five, 60 for the Jungian Type, and 45 for the Enneagram. Despite what the numbers suggest, the total isn't 165 because some questions count towards two traits at once (as we were able to validate that some questions measure more than one trait), thus keeping the total question count at 120.

This dual-purpose approach streamlines the test, making it more time-efficient for users. For example, if a participant agrees with the statement "During conflicts, I focus on acknowledging others' emotions", it not only lowers their score in the Thinking trait (the reverse of Feeling) within the Jungian framework but also increases Agreeableness in the Big Five.

Our second study was to explore how people react to the reports generated by the different personality tests. We used this to explore the hypothesis that, despite their lower predictive accuracy, people may find MBTI-style tests more compelling than the Big Five. We recruited another 250 U.S. participants for this, ultimately including 236 in our analysis after filtering out spammers. These participants took our "ClearerThinking" version of the personality test.

After completing the test, participants received their personality reports, the order of which (Big Five or Jungian or Enneagram type) was randomized. After each section, we asked participants to rate that section using a Likert scale on the following aspects:

Whether the report made them feel good about their personality.
Whether it made them learn valuable things about themselves.
Whether it accurately described their personality.
Whether they found it interesting.

How well does our Jungian scale measure the same thing as MBTI-like tests?

We asked those participants who were familiar with their Myers-Briggs Type Indicator (MBTI) type what they believed it to be. Our findings show a solid (though far from perfect) correlation between the participants' self-identified MBTI types and our Jungian scores, which supports the validity of our scale.

There are three challenges of using this self-report method to help validate a test:

Some participants think they remember their test results, but actually are misremembering them.
Some participants took low-quality tests that are not reliable, and so are reporting the inaccurate results given by those tests.
The test-retest reliability on personality tests is far from perfect (this is especially known to be a problem on some MBTI-style tests). So even if participants took accurate tests in the past, it’s reasonable to expect they might get different results today.

All three of these factors will reduce the correlations between any new test and what people report about their scores on similar tests. That being said, looking at these correlations can still be a useful method for seeing that a new personality test is on the right track.

For 118 participants familiar with their MBTI type, we found an average 0.36 correlation between the Jungian scores we assigned (e.g., the E, S, T and J scores) and the corresponding type they reported they were given in the past test they took (e.g., “I am an Extrovert, not an Introvert”). See the discussion below for context on whether 0.36 is a "good" correlation.

Out of all the correlations between each self-reported MBTI trait and our assigned Jungian scores, each Jungian trait has the highest correlation with its corresponding self-reported MBTI trait. This key finding helps confirm the validity of our approach.

Additionally, we asked participants to read detailed descriptions of traits (for instance, the descriptors for I versus E) and then rate themselves on this trait using a slider. Here is the form those participants took (we included common elements from descriptions of MBTI-style tests):

Each self-evaluated trait from these slider questions also correlated most with its assigned Jungian trait, which is another positive sign for our test’s validity.

Interestingly, the self-evaluated J vs. P showed a significant correlation with our assigned N vs. S measurement. This likely occurred because there’s also a substantial correlation between our assigned J vs. P trait and our assigned N vs. S trait. Below is a plot of how each Jungian trait score correlated with each other Jungian trait:

The correlation of our measurements with self-reported scores (from a prior test people recalled taking, and with the self-report sliders) can be seen below:

Within this group of 118 people, our assigned scores accurately predicted the full four-letter self-reported type in 32% of cases, and the individual traits in 72% of cases. That means that, on average, we correctly predict about three out of four MBTI traits per person.

At first, an average correlation of 0.36 for the binary category correlation and a 72% accuracy at predicting each letter code might seem low. However, these figures are better than they appear. Imagine this: if 15% of participants misremember a letter code, and another 15% report the wrong code even if they remember their previous test result correctly (maybe they took an inaccurate test, or their personality changed, or they fall right near the middle of the trait distribution and so even a single answer change would flip them to the other letter), then only about 72% of reported letter codes are reliable (0.85 × 0.85 ≈ 0.72). This figure is almost exactly the rate at which we were able to predict individual traits! Given these factors, a new test achieving near-perfect accuracy in predicting dichotomous categories is unrealistic.

Given the scenario where 15% of participants misremember their letter code and another 15% face inaccuracies due to prior testing, the highest achievable correlation between predicted and actual categories for the best possible test would be around r=0.45. Unfortunately, we don't know the exact rate of misremembering for each letter and the test-retest reliability of each letter (presumably taken from free online tests). Therefore, the theoretical maximal correlation for the best possible test is unknown, but it’s clear that it’s far below 1.0.
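
The article does not spell out how these figures were computed, so the following is only one reading that lands near them. It assumes that the two 15% error sources are independent, that a reported letter is reliable only if neither error occurs, that unreliable reports name the opposite letter, and that each letter pair splits participants 50/50. The function name `phi_coefficient` and all variable names are hypothetical.

```python
from math import sqrt

def phi_coefficient(both_first, first_second, second_first, both_second):
    """Correlation between two binary variables from a 2x2 table of proportions."""
    row1 = both_first + first_second
    row2 = second_first + both_second
    col1 = both_first + second_first
    col2 = first_second + both_second
    numerator = both_first * both_second - first_second * second_first
    return numerator / sqrt(row1 * row2 * col1 * col2)

p_remembered = 0.85    # letter recalled correctly
p_prior_valid = 0.85   # earlier test result still matches the person today
p_reliable = p_remembered * p_prior_valid

# Assumed 50/50 split of true letters; unreliable reports give the other letter
agree = p_reliable / 2
disagree = (1 - p_reliable) / 2
ceiling_r = phi_coefficient(agree, disagree, disagree, agree)

print(round(p_reliable, 4))  # 0.7225
print(round(ceiling_r, 3))   # 0.445
```

Under these assumptions, even a test that recovered every participant's true letter perfectly would agree with about 72% of reported letters and correlate at roughly 0.45 with them.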

Which personality framework is best at predicting life outcomes?

As mentioned, we asked 559 U.S. participants approximately 40 life outcome questions, ranging from the size of their social circles to their exercise habits and overall happiness. Our aim was to determine which personality test's results (e.g., the 5 scores from the Big Five, the 4 dichotomies or scores from the Jungian test, and their Enneagram type number) can be used to most accurately predict each of these outcomes.

The Big Five emerged as the winner, boasting twice the predictive accuracy of our Jungian test (when used as a binary category, like ‘ISTJ’, which is the typical use of such tests; not using a continuous numerical score for each trait). The results below show the average correlations:

In our analysis, the Big Four (which is the Big Five minus Neuroticism) outperformed the Jungian binary type in predicting outcomes, showing that the Big Five test did not merely outperform because it has five traits instead of four. We chose Neuroticism to remove in this analysis because it’s the Big Five trait that is least captured in the Jungian test.

We also compared the binary Jungian Type classifications (E vs. I, N vs. S, F vs. T and P vs. J) to the continuous Jungian scores (with each participant being assigned a continuous number for each of the 4 Jungian traits). Unsurprisingly, the numerical continuous version performed better, though still not as well as the Big Four (the Big Five without Neuroticism).

The Jungian binary version loses information because each score is turned into just a 0 or 1, which probably is the reason that it underperforms the continuous version. The Jungian binary categories did, however, have greater predictive accuracy than astrological sun signs (e.g., "I am an Aquarius" or "I am a Pisces"), which had no predictive accuracy whatsoever.
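
The information loss from turning a score into a 0 or 1 can be seen in a small simulation. This is a sketch on synthetic data, not a reanalysis of the study: the effect size, sample size, and variable names are hypothetical. It compares how well a normally distributed trait score and its median-split version each correlate with an outcome.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(42)

# Synthetic bell-curve trait and an outcome that depends on it weakly
trait = [rng.gauss(0, 1) for _ in range(20000)]
outcome = [0.3 * t + rng.gauss(0, 1) for t in trait]

# Median split: 1 for the upper half (one "letter"), 0 for the lower half
median = sorted(trait)[len(trait) // 2]
letter = [1 if t > median else 0 for t in trait]

r_continuous = pearson(trait, outcome)
r_binary = pearson(letter, outcome)
print(round(r_continuous, 3), round(r_binary, 3), round(r_binary / r_continuous, 2))
```

For a normally distributed trait split at the median, the correlation is expected to shrink to about 80% of its continuous value (a factor of sqrt(2/pi)). The study's own comparison involves four traits and regression-based R, so its numbers need not match this single-trait case.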

Interestingly, combining the Big Five and Jungian frameworks didn't add predictive power compared to using the Big Five alone. Using linear/logistic regression, the Big Five alone yields an average R^2 of 0.06 (or, equivalently, a correlation between predictions and actual life outcomes of 0.23), which is no different than when we also include the Jungian variables. Median R^2 values also were the same whether we used just the Big Five or used the Big Five and the Jungian variables together.

Perhaps surprisingly, the Enneagram binary framework performed better than the Jungian Type binary. Considering that Enneagram binary is composed of only a single type assignment (e.g., Type 1, or Type 2, etc.), the Enneagram's performance is even more impressive. That being said, the Enneagram still produced substantially worse predictions than the Big Five test.

Why is the Jungian test less accurate?

Part of the reason for the Jungian’s underperformance compared to the Big Five is that the Jungian test uses only four traits, compared to the Big Five's five — the notable omission being Neuroticism. To see how big of a factor this is, we tested removing Neuroticism from the Big Five, leading to a 22% drop in its predictive accuracy.

Note that, possibly to make up for this difference, some newer MBTI-style platforms introduce a fifth trait (e.g., Assertive vs. Turbulent); however, we did not include it.

Another limitation of some Jungian tests is that they put people in binary categories (e.g., E or I, S or N, etc.), unlike the Big Five, which scores traits with continuous numerical values (e.g., an Extraversion score of 84 out of 100). All the personality constructs that we studied approximately formed bell curves when we plotted their distribution, implying that categorizing people as either high or low on a trait introduces significant error because most people fall near the middle.

For instance, the distribution of scores for our Jungian "Sensing" trait is shown below in blue, with the dotted line representing a perfect bell curve:

If a person's score falls just to the left or right of the mean (and people are assigned binary categories), their letter assignment becomes unstable. Suppose Alice is assigned the letter S (Sensing) the first time she takes the test, but she’s extremely close to the mean. If she retakes the test, there’s almost a 50% chance that this second time she’ll be assigned the letter N (Intuitive). For people near the mean, answering even a single question differently can be enough to switch letters.
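
The instability near the mean can be illustrated with a short simulation. It is a sketch under assumptions rather than an analysis of the study's data: true scores are standardized and drawn from a bell curve, each sitting adds independent measurement noise with a hypothetical standard deviation of 0.4, and scores above 0 are labeled S.

```python
import random

rng = random.Random(7)
NOISE_SD = 0.4  # hypothetical measurement noise per test sitting

def letter(true_score):
    """Letter assigned on one sitting: true score plus random noise, cut at the mean."""
    observed = true_score + rng.gauss(0, NOISE_SD)
    return "S" if observed > 0 else "N"

def flip_rate(true_scores):
    """Share of people whose letter differs between two independent sittings."""
    flips = sum(letter(t) != letter(t) for t in true_scores)
    return flips / len(true_scores)

population = [rng.gauss(0, 1) for _ in range(50000)]
near_mean = [t for t in population if abs(t) < 0.1]
far_from_mean = [t for t in population if abs(t) > 1.5]

flip_near = flip_rate(near_mean)
flip_far = flip_rate(far_from_mean)
print(round(flip_near, 2), round(flip_far, 4))
```

With these settings, people very close to the mean switch letters on close to half of retakes, while people far from the mean almost never do.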

Switching from a binary to a spectrum-based approach significantly enhanced the predictive power of our Jungian test. The table below illustrates this, displaying the average and median accuracy for each framework when predicting life outcomes:

The table above shows that, without Neuroticism, the Big Five has a reduction in its mean predictive accuracy by 22% (from an R of 0.23 to 0.18). Additionally, converting the Jungian test from its typical binary use to continuous numerical scores increases its accuracy by 36% (from an R of 0.11 up to 0.15).

With both of those adjustments, the gap between the Big Five and the Jungian test narrows significantly: The Big Five without Neuroticism has an average R of 0.18, while the numerical continuous Jungian test has an average R of 0.15.

Hence, our study suggests that the Big Five outperforms Jungian (MBTI-style) tests at making predictions primarily because Jungian-style tests (1) are missing Neuroticism and (2) dichotomize people into categories rather than using continuous scores for each trait.

However, our Big Five test without Neuroticism still performed a little bit better than the continuous version of the Jungian test. These numbers are close enough that it could be a mere fluke that the Big Five performs better, but it’s also possible that the Big Five used a superior test construction approach (i.e., selecting better factors to measure).

By being designed to best capture personality variance, perhaps that also makes the Big Five better at predicting outcome variance (while personality and outcome variance are not the same, perhaps they are related). When the Big Five test was originally developed, researchers compiled an extensive list of words to describe people, like "kind", "anxious", and so on. They then analyzed which words often go together using factor analysis. For example, if "outgoing" people also described themselves as "talkative", those words were part of a bigger trait (Extraversion). This method of grouping similar words under broader traits led to the Big Five traits we know today.

How do the Jungian traits relate to the Big Five traits?

In our study, the four Jungian traits aligned with four of the Big Five traits. Extraversion in the Jungian test mirrored Big Five's Extraversion, Intuitiveness mirrored Openness, and Feeling resembled Agreeableness. Finally, the Jungian Judging trait correlated with three Big Five traits: positively with Conscientiousness and negatively with Extraversion and Openness.

It is perhaps somewhat intuitive that J, which measures preferring to structure one's life and plan ahead (rather than being spontaneous), is positively correlated with Conscientiousness (which is about being organized and orderly), negatively correlated with Extraversion (since extroverts are known to be somewhat more impulsive) and negatively correlated with Openness (since people higher in Openness like novelty).

Could it be that Jungian J is nothing more than a combination of these three Big Five traits? No, because we tried to predict Jungian J using a linear regression with the three Big Five traits as independent variables, and the R score (similar to a multi-dimensional version of correlation) was only 0.43 out of sample. So Jungian J is not merely some combination of these three — there’s more to it.

Is Big Five’s Neuroticism part of the Jungian traits?

Of all the Big Five, the one least represented in Jungian traits is Neuroticism. Neuroticism inversely correlates with Jungian Extraversion in our study (r = -0.33), showing no significant links with other traits. But this relationship seems less unusual when you consider that Neuroticism and Big Five’s Extraversion (not the Jungian one) are also negatively correlated in our study (r = -0.28). Part of this negative correlation may be driven by a common finding that people high in Neuroticism tend to be less happy than average, while people high in Extraversion tend to be more happy.

How do the Big Five traits relate to each other?

Within the Big Five framework, we found that Extraversion shows a moderate positive correlation with both Openness and Conscientiousness, but it’s negatively correlated with Neuroticism. Additionally, Openness is moderately positively correlated with Conscientiousness while also having a negative correlation with Neuroticism. Agreeableness, on the other hand, appears to have minimal connections with the other traits.

Comparing our results to a meta-analysis that examined the interrelations between the Big Five traits, we notice that our findings mostly align with the general trends in the meta-analysis. However, two significant discrepancies stand out.

First, in our study, Agreeableness shows a weaker correlation with the other traits (particularly Extraversion) than the meta-analysis. Second, we observed a weaker negative correlation between Conscientiousness and Neuroticism compared to the meta-analysis.

How do the Jungian traits relate to each other?

Examining our Jungian test results, our analysis reveals two key relationships between personality traits.

First, Extraversion (E, the opposite of Introversion) displays a moderate negative correlation with Judging (J, the opposite of Perceiving). This suggests that individuals who are more extraverted tend to be less inclined towards the structured, planned approach associated with the Judging trait.

Second, Sensing (S, the opposite of Intuition) shows a strong positive correlation with Judging (J). This suggests that those who rely more on sensory experience and concrete information (typical of Sensing types) are more likely to prefer order and organization, which are characteristics of the Judging trait.

How does the Big Five influence life outcomes?

Using data from 559 U.S. participants, this table reveals the impact of the Big Five traits on some of the life outcomes we analyzed. For instance, people with high Extraversion and Conscientiousness but low Neuroticism report greater life satisfaction:

In this analysis, we have not explored Jungian predictions because of their relatively lower predictive power compared to the Big Five. Furthermore, incorporating Jungian traits alongside the Big Five did not enhance the overall predictive accuracy of our Big Five-only model.

If you'd like to see the results for how each of the personality models we tested performed at predicting each one of the 37 life outcomes, see the PDF here.

How do people feel about their Big Five, Jungian and Enneagram results?

Why are MBTI-style tests popular? Their appeal may partly lie in their ability to make people feel good by framing personality traits positively. For instance, what the Big Five labels as lack of Agreeableness — often viewed negatively — becomes "Thinking" in our Jungian test, which sounds better. Similarly, Neuroticism, the least flattering Big Five trait, is absent from our Jungian test.

In our follow-up study comparing perceptions of different frameworks, 236 participants viewed their personality report sections (Big Five, Jungian, and Enneagram) in a randomized order. We then asked them to rate the report they just read using a 7-point Likert scale.

Participants preferred their Jungian personality assessments over the Big Five, appreciating the feel-good factor and finding them somewhat more accurate, valuable, and interesting. However, the preference on the last three (accuracy, value, and interest) was mild: none of those differences was statistically significant (p-values exceeded 0.1).

Notably, the only significant difference among the means emerged in how the reports made users feel: the Jungian test excelled at making users feel good. Yet, when it came to predicting life outcomes, the Jungian approach lagged behind the Big Five, indicating a trade-off between immediate satisfaction and statistical predictive accuracy.
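For readers who want to see how a difference in mean ratings between two reports can be checked, here is a minimal sketch of one possible approach: a sign-flip permutation test for paired ratings. This is an assumption for illustration, not necessarily the test we ran, and the ratings below are made up rather than taken from our data. The function name paired_sign_flip_test is hypothetical.

```python
import random
from statistics import mean

def paired_sign_flip_test(ratings_a, ratings_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test for paired ratings.

    Returns (observed mean difference, approximate p-value).
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    observed = mean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two reports are interchangeable,
        # so each paired difference is equally likely to be positive or negative.
        flipped = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# HYPOTHETICAL 7-point Likert ratings from ten participants (not study data).
jungian_ratings = [6, 5, 7, 6, 5, 6, 7, 5, 6, 6]
big_five_ratings = [5, 5, 6, 4, 5, 5, 6, 5, 4, 6]

observed, p_value = paired_sign_flip_test(jungian_ratings, big_five_ratings)
print(f"mean difference: {observed:.2f}, approximate p-value: {p_value:.4f}")
```

A paired test fits this design because each participant rated every report, so the comparison is made within participants rather than between separate groups.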

Let’s break down the distribution of answers for this feel-good question. When asked if the report they just read made them feel good about their personality, 10% disagreed for the Jungian report, while 19% disagreed for the Big Five. That's nearly double the dissatisfaction, indicating that the softer framing of our Jungian report was better received.
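The arithmetic behind "nearly double" can be sketched as follows. The two percentages are the ones reported above. The head counts are only rough estimates, and they rest on the assumption that all 236 follow-up participants rated both reports.

```python
# Reported shares of participants who disagreed that the report
# made them feel good about their personality (from the text above).
jungian_disagree_pct = 10
big_five_disagree_pct = 19

# Relative and absolute gaps between the two reports.
ratio = big_five_disagree_pct / jungian_disagree_pct        # 1.9, i.e. "nearly double"
gap_points = big_five_disagree_pct - jungian_disagree_pct   # 9 percentage points

# Rough head counts, ASSUMING all 236 participants rated both reports.
# The reported percentages are rounded, so these counts are approximate.
n_participants = 236
jungian_count = round(n_participants * jungian_disagree_pct / 100)
big_five_count = round(n_participants * big_five_disagree_pct / 100)

print(ratio, gap_points, jungian_count, big_five_count)
```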

Conclusion

Our study suggests that if you care about how well a personality test can predict outcomes in people's lives, then the Big Five test is superior to both the Jungian (MBTI-style) and Enneagram approaches. It also suggests that dichotomizing traits into binaries (rather than using continuous scores) substantially impairs accuracy. If you'd like to compare your results based on the Big Five, Jungian and Enneagram frameworks, you can do so using our free online test here.
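The cost of dichotomizing can be illustrated with a small simulation. This is a sketch on made-up data, not a reproduction of our analysis: it assumes a bell-shaped trait, an outcome that depends linearly on that trait plus noise, and a binary "letter" assigned by splitting the trait at its midpoint. The names pearson_r, trait, outcome, and letter are hypothetical.

```python
import random
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(42)
n = 5000

# Simulated continuous trait (bell curve) and an outcome that depends on it.
trait = [rng.gauss(0, 1) for _ in range(n)]
outcome = [t + rng.gauss(0, 1) for t in trait]

# Binary "letter": 1 if above the midpoint of the trait, else 0.
letter = [1 if t > 0 else 0 for t in trait]

r_continuous = pearson_r(trait, outcome)
r_binary = pearson_r(letter, outcome)

print(f"correlation using the continuous score: {r_continuous:.3f}")
print(f"correlation using the binary letter:    {r_binary:.3f}")
```

In this toy setup the binary version correlates less strongly with the outcome, because people just above and just below the cutoff are treated as opposites even though their underlying scores are almost the same.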

Response

After releasing this report and our article in Scientific American, the Myers-Briggs Company emailed us this letter and requested that we post it:

While the study described in the article Personality Tests Aren’t All the Same--Some Work Better Than Others was certainly interesting, it managed to miss the entire point of the Myers-Briggs Type Indicator® (MBTI®) instrument, which is not designed to predict life outcomes, but rather to give people more control over their own life outcomes by describing their personality. By asserting that the MBTI doesn’t work well, the authors--who should have noted that they have a clear conflict of interest as one of them runs a website that sells products that are competitive to the MBTI and Big 5--are essentially criticizing the instrument for not being effective at something that it was not designed for in the first place.

First, and possibly most importantly however, the article is misleading from the outset. Its subtitle and first three paragraphs clearly imply that the research is based on the MBTI assessment. In fact, the research is based not on the MBTI assessment, a questionnaire backed by significant research, but on an “MBTI-style test” of the authors’ own construction. There is absolutely no reason to conclude that findings based on this assessment bear any resemblance to findings that would have been obtained if the actual MBTI had been used. Thus, many claims made in the article should by all rights be disregarded, including:

· The assertion that Big Five being twice as accurate as MBTI-styled tests for predicting life outcomes, and the MBTI being halfway between the Zodiac and the Big Five is misleading. It implies that differences in correlation coefficients can be compared in a linear fashion, which they can not. The Zodiac correlation was essentially zero, while the correlation with the MBTI-styled instrument was statistically significant--these findings do not indicate that the MBTI is ‘...halfway between science and astrology--literally’. On the contrary, they imply that their “MBTI-styled instrument” was shown to produce some level of predictivity, while astrology did not.

· The statement “Our study suggests that MBTI-style tests may be sacrificing predictive accuracy in exchange for gratification” does not hold water. The “softer framing” of the “MBTI-style” test used in the research is, so far as can be gauged from the original article, likely due to the way in which the researchers have written their version of the reports, and cannot be held to apply to the reports produced from the MBTI assessment itself, which of course focus on possible negatives (or development needs) as well as possible positives.

So, not only does the study tell readers absolutely nothing about the MBTI because it does not use the MBTI, but those conclusions which the authors draw demonstrate poor interpretation of data. Furthermore, the study design itself is lacking, as correlation is not the most appropriate statistic for categorical variables such as personality type. The actual MBTI--a psychometrically validated instrument built on decades of research--is the world’s most popular personality assessment largely because people find it accurate and useful. By helping people understand their own tendencies and inclinations, it enables individuals and organizations to optimize their talents and abilities based on what they have to work with. Consequently, its value in work settings and applications such as conflict resolution, leadership development, team building, and numerous other areas has been demonstrated through case studies from numerous organizations and its use by the majority of the Fortune 100 as well as top universities and government agencies.

However, both The Myers-Briggs Company and The Myers-Briggs Foundation have consistently maintained that the instrument was not designed to measure:

· Pathology. It assesses normal, healthy personality differences.

· Potential. The MBTI suggests predisposition, but not predetermination. Thus, its use for predicting how people will perform in various settings has been discouraged by both of the organizations that govern its use.

One may ask, if the instrument is psychometrically validated, shouldn’t it correlate with certain outcomes? The answer to this is yes, and in fact there have been countless studies that have shown that this is indeed the case. Peer reviewed research has shown that MBTI type correlates with an incredibly wide range of aspects of life, from entrepreneurial tendencies, innovation approaches and teacher performance to more abstract experiences such as dream patterns. These are just a few examples among more than 11,000 citations of the MBTI (Center for Applications of Psychological Type’s MBTI Bibliography). Furthermore, the statement “...adding MBTI-style personality results to Big Five ones didn’t lead to predictions that were any more on the mark than Big Five ones alone” is no doubt true of this study--which is actually not based on the MBTI--but is contradicted by other research.

The contrast between the instrument’s descriptive capabilities and the decision not to use it to predict performance can be best understood in the context of its use in career development. Extensive research has shown that MBTI type correlates with occupational choice (a phenomenon described in detail in the MBTI Type Tables for Occupations, 2nd Ed.). Many occupations tend to attract certain types disproportionally. This does not mean, however, that those individuals are more likely to succeed in those careers. Rather, the value of such knowledge lies in its ability to educate individuals regarding what they are likely to experience when they enter a certain profession.

Someone entering an engineering-related field whose personality type differs from the majority of those within the field will, for example, find great value in learning about how work, learning and communication styles of their peers may differ dramatically from their own. This empowers them to adapt, be more understanding of others, and modify their own behavior in ways that make them more effective in their jobs.

While the study cited in the article criticized the MBTI’s use of binary classification, The Myers-Briggs Company and Myers and Briggs Foundation continue to maintain that this is the most appropriate format for describing personality, based on the theory upon which the instrument is founded. This theory holds that while an individual may engage in behaviors characterized by both poles within a preference pair, it is assumed that he or she naturally prefers one preference over the other. This is similar to how most people are perfectly capable of using either their right hand or left hand, but prefer one or the other for most tasks. While in reality there may be degrees of right-handed or left-handedness, describing one’s hand preference on a scale would not be as beneficial or practical as the binary classification. In fact, the MBTI type classifications are based on scales, in similar fashion to the Five Factor model, but are translated into type for this very reason.

But even within this classification-based approach, the MBTI can provide more detailed descriptions based on a “Preference Clarity Index” (PCI), which describes how clear an individual is about a particular preference. The PCI indicators include slight, moderate, clear, and very clear and are used to aid in interpretation.

And finally, it should be noted that the implication that the MBTI dimensions are a “rebranding” of Big 5 names, undertaken to make them more acceptable, is historically inaccurate. The naming of the MBTI dimensions is taken directly from Jungian theory, which predates formulation of the Big 5 by decades.

In summary, the assertion that the MBTI isn’t as good as other instruments, based on the results of a rather narrowly focused study, is misleading at best, and ignores both its stated purpose as well as the wealth of research supporting its insights (and once again, glosses over the fact that the MBTI isn’t even actually used in this study).

Response to their response

Here is the response one member of our team sent after receiving their letter:

As you know, we never claimed to test the official Myers-Briggs Type Indicator® (MBTI®) instrument. Since it is a paid commercial test, it is not easy to study directly. For the research, we designed a test based on the underlying Jungian theory and constructs and put in a substantial effort to validate it carefully before using it. If you'd like to make your commercial test available to us for free for research purposes, we could compare the MBTI-style test used in the research discussed in the article directly against the official type indicator.

The claim of our article is that, in our research, MBTI-style constructs predicted life outcomes better than astrological sun signs but less well than Big Five constructs. Furthermore, we found that a substantial proportion of this underperformance could be attributed to (1) the way that the MBTI-style test, in line with typical usage, uses categories rather than continuous scores, and (2) the lack of a measurement of "neuroticism" in MBTI-style tests.

I stand behind these findings. Nothing in your email (including the studies you sent, which I took a look at) provides evidence against these claims.

There are some errors in your email, which I'd like to correct:

“...as one of them runs a website that sells products that are competitive to the MBTI and Big 5...”

This isn't true. We offer to the public a free version of the tests used in our research. We do not sell these.

“Furthermore, the study design itself is lacking, as correlation is not the most appropriate statistic for categorical variables such as personality type.”

This concern appears to be based on a misinterpretation of the way we conducted the analysis.

“One may ask, if the instrument is psychometrically validated, shouldn’t it correlate with certain outcomes? The answer to this is yes, and in fact there have been countless studies that have shown that this is indeed the case. Peer reviewed research has shown that MBTI type correlates with an incredibly wide range of aspects of life, from entrepreneurial tendencies, innovation approaches and teacher performance to more abstract experiences such as dream patterns.”

I agree with you that a valid personality test should be able to predict outcomes. As our article explained, the MBTI-style test did predict outcomes; it just did less well than the Big Five test. We made that clear in the article. A number of people, in response to our article, even told us that our article raised their opinion of MBTI-style tests because previously, they had assumed they were pseudoscience, but the results of our research suggested that they can predict outcomes to a reasonable degree.