Shortly after Election Day 2016, ClearerThinking took part in organizing a novel research project. Inspired by our longstanding interest in making more accurate predictions, we collaborated with affiliates to set up a long-term experiment: how accurately could different groups of people predict the political events of the coming years? Our findings from the first year of the Trump presidency demonstrate the intriguing power of group forecasts.
The immediate aftermath of the 2016 elections was an exceptionally contentious moment for public discussion. Wildly differing expectations for the incoming Trump administration gave rise to equally wide-ranging predictions about the years to follow, running the full gamut from utopian optimism to apocalyptic pessimism. Our experiment was inspired in part by this hyperbolic cultural conversation. Could it be that asking people to make predictions using the precise language of probabilistic likelihoods encourages more nuance and more accurate predictions, even under highly uncertain conditions?
Here's how the experiment worked:
Three groups of people made forecasts:
The "friends" group, which consisted of roughly 20 of one study designer's acquaintances who'd expressed an interest in improving their forecasting. This group was permitted to take as much time as they needed to make their predictions.
The Good Judgment Inc. group, consisting of paid "superforecasters." Good Judgment Inc. offers a paid service through which an organization can place questions of its choice into "forecasting tournaments," which carefully score each participant's forecasting accuracy. Its "superforecasters" are individuals who have performed exceptionally well in such tournaments.
Good Judgment Inc.'s superforecasters were recruited primarily from the IARPA ACE program, a four-year government forecasting tournament. They include the best forecasters from that program, along with additional forecasters recruited from the top performers on Good Judgment Open, the Good Judgment Project's open competition site.
The Mechanical Turk group, consisting of workers from Amazon's Mechanical Turk service using our affiliated platform Positly. These workers were pre-vetted for a basic level of adeptness and interest in forecasting, and were given some simple instructions for how to approach each individual forecast. Unlike the other two groups, this group was required to make their predictions within a limited timeframe.
This group in particular was selected to test the power of aggregated forecasting – we hoped to determine whether their collective efforts would yield accurate forecasts, despite their lay status, lack of practice, and limited time to make predictions.
Each participant was asked to predict the likelihood of specific, measurable political or economic events occurring during 2017 or 2018.
Participants gave their forecasts as percentage likelihoods, à la "this event has an X% chance of occurring."
The events participants were asked about were specific and measurable, such as "Trump's average approval rating falls beneath 30%" rather than "Trump becomes extremely unpopular," or "ICE removes 500,000 or more people from the United States" rather than "Trump aggressively cracks down on illegal immigration."
The number of events that each group made predictions for varied – the Mechanical Turk group evaluated hundreds of possibilities, while the other two groups assessed far fewer.
So far, 30 of these scenarios have been conclusively resolved for calendar year 2017 (that is, we were able to determine definitively whether each of these 30 events did or did not occur). Our results thus far are based on these scenarios.
All three forecasting groups made their predictions before the Trump administration began.
We also applied an "extremizing" process to the Mechanical Turk group's collective predictions to counteract the systematic under-confidence that researchers have observed in prior forecasting studies. This procedure gives more weight to more extreme predictions, which prevents the collective forecast from being diluted by the many participants who know little or nothing about a given topic. Studies have shown that this technique tends to improve the accuracy of group forecasts.
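The post does not specify the exact extremizing formula used. One common form in the forecasting literature raises the odds of the averaged probability to a power a > 1, which is equivalent to p^a / (p^a + (1 - p)^a). The sketch below assumes that form, a simple unweighted mean as the starting aggregate, and a hypothetical exponent a = 2.5; it illustrates the general idea and is not necessarily the procedure applied in this study.

```python
def extremize(p: float, a: float = 2.5) -> float:
    """Push an aggregate probability p away from 0.5.

    Raises the odds p / (1 - p) to the power a, i.e. p**a / (p**a + (1 - p)**a).
    a > 1 extremizes; a == 1 leaves p (essentially) unchanged.
    The default a = 2.5 is a hypothetical choice, not the study's value.
    """
    if not 0.0 < p < 1.0:
        return p  # 0 and 1 are already as extreme as possible
    num = p ** a
    return num / (num + (1.0 - p) ** a)


def aggregate_and_extremize(forecasts: list[float], a: float = 2.5) -> float:
    """Average individual probabilities (unweighted), then extremize the mean."""
    mean_p = sum(forecasts) / len(forecasts)
    return extremize(mean_p, a)


# Hypothetical crowd whose individual forecasts average to 60%:
crowd = [0.6, 0.7, 0.55, 0.65, 0.5]
print(round(aggregate_and_extremize(crowd), 3))  # 0.734
```

A 60% crowd average becomes roughly 73% after extremizing, while a forecast of exactly 50% stays at 50%, so the transform only sharpens aggregates that already lean one way.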
Now that we're well over a year into the Trump administration, it's possible to start evaluating the accuracy of all three groups' predictions. Here's a short summary of the findings so far:
Most of the 2017-specific events we asked participants to forecast did NOT come to pass.
In fact, only two of the events we've been able to evaluate thus far – both dealing with Trump's approval rating – came to pass.
This result doesn't necessarily suggest that the Trump administration has been uneventful per se, but it does indicate that radical changes to America's social fabric have not occurred with the speed that some expected at the outset of the administration.
Of the three participant groups, the Good Judgment Inc. group made the most accurate forecasts (understandably, since this group consisted of professional forecasters). However, all three groups performed fairly well: each group's collective forecast leaned toward the correct outcome on every event we were able to evaluate.
We used an "average log score" to measure accuracy. Think of this measure as assigning points based on both confidence and correctness: greater confidence earns more points if you are right, but costs more points if you are wrong.
This score generally ranged from 80% to 90% for members of the "friends" group, with the worst of its 22 individuals scoring around 70%, while the Good Judgment Inc. superforecasters scored around 95%.
The median accuracy scores for each group broke down as follows:
The "friends" group: 87.3%
The Good Judgment Inc. superforecaster group: 95.47%
The Mechanical Turk group: 84.68%
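To make the scoring rule concrete, here is a minimal sketch of a natural-log score for yes/no forecasts. The post does not say how the average log score was converted into the percentages above; the conversion in the hypothetical helper score_as_percentage (exponentiating the average, which gives the geometric mean of the probabilities placed on what actually happened) is one plausible reading, not the study's confirmed method.

```python
import math


def log_score(p: float, occurred: bool) -> float:
    """Natural-log score for one yes/no forecast.

    p is the stated probability that the event occurs (strictly between 0 and 1).
    The score is the log of the probability placed on what actually happened:
    near 0 for a confident correct forecast, very negative for a confident miss.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p if occurred else 1.0 - p)


def average_log_score(probs: list[float], outcomes: list[bool]) -> float:
    """Mean log score over a set of resolved forecasts."""
    return sum(log_score(p, o) for p, o in zip(probs, outcomes)) / len(probs)


def score_as_percentage(avg_log: float) -> float:
    # Hypothetical conversion: exp(average log score) is the geometric mean
    # probability assigned to the actual outcomes, shown as a percentage.
    return 100.0 * math.exp(avg_log)


# Confidence is rewarded when right and penalized when wrong:
print(log_score(0.95, True), log_score(0.70, True))    # the 95% forecast scores higher
print(log_score(0.95, False), log_score(0.70, False))  # the 95% forecast scores lower
```

Because the log of a small probability is very negative, a single confident miss can outweigh several confident hits, which is what makes this rule punish overconfidence.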
And here are some noteworthy takeaways from the results:
Even though the outset of the Trump administration was one of the most chaotic and unpredictable political environments in recent American history, all three of these groups still managed to collectively predict a broad range of uncertain events with reasonable accuracy.
Remarkably, the Mechanical Turk group, made up of lightly vetted members of a very broad cross-section of the population, forecasted the events of 2017 with accuracy fairly similar to that of a group of highly informed laypeople with an active interest in forecasting, and not much less accurately than a group of professional forecasters.
However, the results showed that the Mechanical Turk forecasters tended to assign excessively high probabilities to events that did not occur, suggesting that they systematically overestimated the chances of unlikely events.
Even so, the Mechanical Turk collective forecast outperformed 73% of the individual members of the "friends" group.
(If you would like to see the anonymized forecasting data for the Mechanical Turk group, you can download it here. If you'd like to try the timed forecasting task these participants fulfilled for yourself, you can check it out here.)
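A figure like "beating 73% of individuals" is simply the share of individual scores that fall below the collective's score. A minimal sketch, assuming higher scores are better (as with average log scores) and that ties do not count as wins; the function name and the scores in the example are made up for illustration, not the study's data.

```python
def share_beaten(collective_score: float, individual_scores: list[float]) -> float:
    """Fraction of individuals whose score is strictly lower (worse) than the collective's.

    Assumes higher is better, as with average log scores; ties are not counted as wins.
    """
    if not individual_scores:
        raise ValueError("need at least one individual score")
    beaten = sum(1 for s in individual_scores if s < collective_score)
    return beaten / len(individual_scores)


# Made-up scores for illustration only (not the study's data):
print(share_beaten(0.85, [0.70, 0.80, 0.84, 0.88, 0.90]))  # 0.6
```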
In short, the Mechanical Turk group did much better relative to the other groups than you might expect. So, how did this happen?
It seems likely that one factor in the Mechanical Turk group's surprising accuracy was the "wisdom of the crowd" effect: the tendency for aggregated estimates or forecasts from large numbers of people to be more accurate than individual efforts. Aggregating forecasts this way can let opposing biases among individuals cancel each other out, producing a more accurate set of forecasts overall.
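With log scoring there is a neat way to see this effect. Because the logarithm is concave, the log score of a simple averaged forecast is always at least the average of the individuals' log scores (an instance of Jensen's inequality). The sketch below assumes an unweighted mean as the crowd forecast; the two forecasters and their numbers are hypothetical, chosen to have opposing biases.

```python
import math


def log_score(p: float, occurred: bool) -> float:
    """Log of the probability assigned to what actually happened (0 < p < 1)."""
    return math.log(p if occurred else 1.0 - p)


def mean_log_score(probs: list[float], outcomes: list[bool]) -> float:
    return sum(log_score(p, o) for p, o in zip(probs, outcomes)) / len(outcomes)


def crowd_vs_individuals(forecasts_by_person: list[list[float]], outcomes: list[bool]):
    """forecasts_by_person[i][j] is person i's probability that event j occurs.

    Returns (crowd_score, average_individual_score, individual_scores).
    """
    n = len(forecasts_by_person)
    individual = [mean_log_score(f, outcomes) for f in forecasts_by_person]
    # Crowd forecast: unweighted mean of everyone's probability for each event.
    crowd = [sum(f[j] for f in forecasts_by_person) / n for j in range(len(outcomes))]
    return mean_log_score(crowd, outcomes), sum(individual) / n, individual


# Hypothetical forecasters with opposing biases on three events:
outcomes = [False, False, True]
optimist = [0.6, 0.5, 0.9]     # tends to expect events to happen
pessimist = [0.1, 0.05, 0.4]   # tends to expect events not to happen
crowd_score, avg_individual, individual = crowd_vs_individuals([optimist, pessimist], outcomes)
print(round(crowd_score, 3), round(avg_individual, 3), [round(s, 3) for s in individual])
```

Here the crowd beats the average individual but not the best one, which fits the Mechanical Turk result of beating most, but not all, of the "friends" group.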
The detailed forecasting instructions we provided for Mechanical Turk participants may have made a difference as well. Each MTurker read the instructions below and consented to use them while making predictions for the study (though we've modified them slightly to be more general here). As we've discussed, they were then able to make well-reasoned and fairly accurate collective forecasts. Try using these tips yourself next time you're hoping to make an unbiased and well-reasoned probabilistic prediction:
Think probabilistically - Remember, if you say you are 50% confident that something will happen, you are saying that it would actually happen about 1 time out of every 2 times it could happen; if you say you're 99% confident that something will happen, you are saying that it would actually happen about 99 out of 100 times it could happen.
Consider both sides - Try to consider both sides of each question, including arguments both for and against it happening. For instance, if you're considering whether Obamacare will be repealed in 2019, you could consider reasons why this might be likely, and also reasons why it might not happen.
Break problems down - If a problem seems too hard to estimate, consider breaking the prediction into simpler sub-prediction problems, and then combining them together to make your prediction. For instance, if you are estimating the chance that the number of violent deaths around the world will greatly increase, you could think about different ways this could happen and estimate them separately.
Use what you know - Use all the information you can access to improve the accuracy of your forecasts. Also be sure to use your existing beliefs and what you already know about the world to help you refine your thinking.
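Two of these tips, thinking probabilistically and breaking problems down, can be checked with a little arithmetic. The sketch below is illustrative and was not part of the instructions given to participants; all function names and example numbers are hypothetical. calibration_table groups past forecasts into probability bins (ten equal-width bins is an arbitrary choice) and compares each bin's average stated probability with how often those events actually happened. The two combining functions estimate the chance that at least one of several separate pathways occurs: one assumes the pathways are independent, the other assumes they cannot happen together.

```python
import math
from collections import defaultdict


def calibration_table(probs, outcomes, n_bins=10):
    """Return (bin_low, bin_high, mean_stated_prob, observed_freq, count) per non-empty bin."""
    bins = defaultdict(list)
    for p, occurred in zip(probs, outcomes):
        # Map p in [0, 1] to a bin index 0..n_bins-1; p == 1.0 joins the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, occurred))
    table = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(1 for _, occurred in items if occurred) / len(items)
        table.append((idx / n_bins, (idx + 1) / n_bins, mean_p, freq, len(items)))
    return table


def p_at_least_one_independent(pathway_probs):
    """Chance that at least one pathway occurs, ASSUMING the pathways are independent."""
    p_none = math.prod(1.0 - p for p in pathway_probs)  # every pathway fails
    return 1.0 - p_none


def p_at_least_one_exclusive(pathway_probs):
    """Chance that one of the pathways occurs, ASSUMING they are mutually exclusive."""
    total = sum(pathway_probs)
    if total > 1.0:
        raise ValueError("mutually exclusive probabilities cannot sum above 1")
    return total


# Hypothetical track record: ten forecasts at 90%, nine of which came true.
for low, high, mean_p, freq, n in calibration_table([0.9] * 10, [True] * 9 + [False]):
    print(f"{low:.1f}-{high:.1f}: said {mean_p:.0%}, happened {freq:.0%} (n={n})")

# Hypothetical sub-estimates for three separate pathways to the same outcome:
pathways = [0.10, 0.05, 0.02]
print(round(p_at_least_one_independent(pathways), 4))  # 0.1621
print(round(p_at_least_one_exclusive(pathways), 4))    # 0.17
```

A well-calibrated forecaster's stated and observed rates roughly match in every bin. For the pathway estimates, if the pathways tend to happen together, the true chance of at least one occurring is lower than the independent estimate.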