Why there’s so much disagreement about the timeline for advanced AI
Guest Post by Sarah Hastings-Woodhouse, originally published on BlueDot Impact

Curious or Concerned About the Future of AI?
AI could help humanity or harm it. The difference will depend on how wisely we handle it. Join thousands of curious minds thinking critically about its risks and rewards, get hands-on experience with the AI tools making headlines (instead of just reading about them), and be part of the conversation about its future by joining this free online course. Additionally, earn an industry-recognized certificate that shows you’re engaged with the future of AI and helps you stand out in roles focused on AI, policy, or the public good.
Key Takeaways
💡 Many experts agree AGI (Artificial General Intelligence) is possible, but not when it will arrive. Forecasts range from a few years to decades, reflecting uncertainty about what artificial intelligence truly requires and how fast current progress can continue.
⚙️ Some experts think AGI is close because AI systems are rapidly mastering academic benchmarks, completing longer tasks, and may soon automate parts of AI research itself - especially if computing power continues to scale exponentially as it has.
⚙️ Others think AGI is further away because benchmarks capture only narrow skills, many everyday tasks remain hard for AI, an “intelligence explosion” might be impossible, and real-world progress depends on more than raw reasoning power.
🔍 The future is uncertain. If AGI doesn’t emerge by 2030, progress may plateau, but if it does, the world doesn't seem ready. Understanding these debates helps us prepare for either outcome.
Few who have closely followed the field would deny that AI progress over the past few years has been rapid.
Large Language Models (LLMs) have provided an unexpected path to an ever-broader range of AI capabilities. In 2019, OpenAI’s GPT-2 struggled to write a coherent paragraph. In 2025, LLMs write fluent essays, outcompete human experts at some graduate-level science questions, and excel at competition mathematics and coding. The most advanced multi-modal AI models now produce images and videos that are hard to distinguish from reality.
These models are impressive, but they still fall short of the north star that frontier AI companies are working towards. Artificial General Intelligence (AGI), which OpenAI describes as “a highly autonomous system that outperforms humans at most economically valuable work”, has been the ultimate ambition of AI researchers for many decades.
Most experts in the relevant tech industries agree that AGI is possible. They also agree that it will have transformative consequences. There is less consensus about what these consequences will be. Some believe AGI will usher in an age of radical abundance. Others believe it will likely lead to human extinction. One thing we can be sure of is that a post-AGI world would look very different from the one we live in today. If you’re interested in hearing more about this perspective, you might enjoy our podcast with Nate Soares (the President of the Machine Intelligence Research Institute and the co-author of the book If Anyone Builds It, Everyone Dies) titled “Will AI superintelligence kill us all?”.
So, is AGI just around the corner? Or are there still hard problems in front of us that will take decades to crack, despite the speed of recent progress? This is a subject of lively debate. Ask various groups when they think AGI will arrive and you’ll get very different answers, ranging from just a couple of years to more than two decades.
Why is this? We’ve tried to pin down some core disagreements.
The case for short timelines
Many of the people closest to frontier AI say they are expecting AGI to arrive before 2030.
Dario Amodei, CEO of Anthropic, reports being “confident” that very powerful capabilities will be achieved within 2-3 years. Sam Altman, CEO of OpenAI, has claimed that his company “knows how to build AGI”, and says he thinks we may reach an even grander goal of “superintelligence” within “thousands of days”. And Demis Hassabis, CEO of Google DeepMind, forecasts a slightly more conservative (but still near-term) 3-5 years until AGI.
Two high-profile scenarios, Situational Awareness and AI-2027, written by forecasters and former AGI company employees, make similar projections.
Here are some reasons to think that AGI might be just a few years away:
#1 Benchmarks keep saturating
The easiest argument for a short-timelines advocate to make is that – at least in the capabilities that we know how to measure – it really doesn’t look like we have far to go.
Here’s a chart showing how quickly AI capabilities have improved on various benchmarks in just the last two years:
In certain closed-ended academic benchmarks, AIs are closing in on human-expert level. Flagship models from OpenAI, Google DeepMind, and Anthropic all score over 82% on MMLU, which contains multiple-choice questions on a range of disciplines from mathematics to international law, and over 75% on GPQA, which contains graduate-level questions in STEM fields.
Benchmarks are saturating so quickly that researchers are scrambling to create new ones which will continue to challenge state-of-the-art models. For example, Humanity’s Last Exam (HLE) contains 2,500 questions contributed by over 1,000 subject-matter experts, designed to be at the frontier of human knowledge in most fields – and LLMs are already starting to make progress. OpenAI’s o3 scored 20% on HLE, compared to an 8% score from its predecessor, o1, which was released just months earlier. Its creators think it is “plausible” that an AI will achieve more than 50% on HLE by the end of 2025.
Importantly, AI models aren’t “just memorising” the answers to questions on benchmarks like GPQA and HLE – researchers maintain private test sets to make sure that solutions don’t find their way into models’ training data.
If progress continues at the pace of the last few years, it won’t be long before we struggle to come up with closed-ended questions that AI models can’t answer. At this point, all that will be missing are the properties needed to put all that raw intelligence to use, such as agency and long-term planning. Conveniently for short-timelines advocates, we have evidence of progress on these metrics too – which brings us on to our next point.
#2 AIs are able to complete longer and longer tasks
Impressive benchmark scores don’t translate neatly into real-world impact. One reason is that AI models cannot currently complete tasks over long time horizons. Even if they can solve any one step more reliably than most humans, they can’t autonomously carry out tasks that would take a person days or weeks.
But that could change soon. A recent study by METR, an organisation that develops and runs evaluations of frontier AI models, found that the length of tasks they can successfully complete is doubling every seven months:

If this trend continues, AIs will be able to carry out month-long projects by the end of the decade. The trend could even accelerate: for 2024–25 specifically, the doubling time was four months, which would predict AIs tackling month-long tasks by 2027. A rough sketch of this extrapolation is below.
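To make the extrapolation concrete, here is a back-of-the-envelope sketch in Python. The starting task length (roughly one hour at a 50% success rate in early 2025) and the definition of a “month-long” project (about 170 working hours) are illustrative assumptions, not METR’s exact figures.

```python
import math

# Back-of-the-envelope extrapolation of the METR task-length trend.
# Starting values are illustrative assumptions, not METR's exact estimates.
start_year = 2025.0          # assumed starting point for the extrapolation
start_task_hours = 1.0       # assume ~1-hour tasks at 50% success in early 2025
work_month_hours = 167.0     # ~40 h/week * ~4.2 weeks: one "month-long" project

for doubling_months in (7, 4):   # headline trend vs. the faster 2024-25 trend
    doublings_needed = math.log2(work_month_hours / start_task_hours)
    years_needed = doublings_needed * doubling_months / 12
    print(f"{doubling_months}-month doubling time -> month-long tasks around "
          f"{start_year + years_needed:.0f}")
```

With a seven-month doubling time this lands around 2029; with the faster four-month doubling time, around 2027 – matching the figures above.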
#3 AI research might be the only capability we need to automate to achieve AGI
The goal of frontier AI companies is to develop AI systems that can automate every economically valuable task. One such task is AI research itself. If we can automate – or significantly accelerate – the process of building better AI, then any remaining hurdles to AGI could be overcome soon thereafter.
If an AI company internally develops an AI system that outcompetes its top engineers at the task of advancing the AI frontier, it would face a tremendous incentive to automate a significant fraction of its own research. Automated AI researchers could work day and night without breaks, and even self-replicate to develop what Geoffrey Hinton (one of the pioneers behind the deep learning paradigm that kicked off today’s acceleration in AI capabilities) calls a “digital hive mind”.
It takes much more computing power to train a new AI model than to run one. One AI researcher estimates that we’ll be able to run millions of automated researchers in parallel by 2027, compressing years’ worth of progress into just a few days.
This could trigger what people sometimes refer to as an intelligence explosion – a recursive feedback loop where increasingly powerful AIs build their own successors. An intelligence explosion could quickly result in AI systems that are vastly more capable than humans.
AIs are already getting better at the skills needed to automate AI research. Last year, METR tested frontier models including Anthropic’s Claude 3.5 Sonnet and OpenAI’s o1-preview against over 50 human experts. The results showed that models are already outcompeting humans at AI R&D tasks over 2-hour time horizons.
#4 We could train much bigger models before 2030
Today’s most powerful AI systems are trained using three inputs – data (largely internet text), algorithms (instructions for learning from this data) and compute (the cutting-edge chips used to power the entire process).
Over the last few years, we’ve observed that the amount of compute used to train models generally correlates with how capable they are. Some have even concluded that compute is the most important ingredient in the AI development process – it’s less about clever algorithms or flashy architectures than it is just adding more and more hardware.
Epoch estimates that by 2030, it will be possible to train AI models using 10,000 times more compute than was used for OpenAI’s GPT-4. That’s around the same leap as we saw between GPT-2, which could barely produce a coherent paragraph without straying off topic, and GPT-4, which can engage in complex reasoning, generate code, pass standardized exams, and hold detailed, context-rich conversations.
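As a rough sanity check on that figure, here is a sketch of the arithmetic, assuming frontier training compute keeps growing at about 4x per year from a GPT-4 baseline of around 2e25 FLOP. Both numbers are assumptions chosen for illustration, broadly in line with Epoch’s published estimates rather than exact values.

```python
# Rough arithmetic behind the "10,000x more compute than GPT-4 by 2030" figure.
# Both inputs are illustrative assumptions, not Epoch's exact numbers.
gpt4_training_flop = 2e25   # assumed order of magnitude for GPT-4's training run
annual_growth = 4           # assumed ~4x per year growth in frontier training compute
years = 2030 - 2023         # GPT-4 was trained around 2022-23

scaleup = annual_growth ** years
print(f"~{scaleup:,}x GPT-4, i.e. roughly {gpt4_training_flop * scaleup:.0e} FLOP")
# -> ~16,384x GPT-4, roughly 3e+29 FLOP: the same ballpark as the 10,000x estimate.
```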
If (as we’ve been arguing so far) there isn’t much further to go before we either hit AGI or automate AI research, it’s hard to imagine that another 10,000x scale-up won’t get us there.
#5 People’s timelines keep getting shorter
Expert opinion is an important datapoint in the debate over AGI timelines, but it’s a fuzzy one. For one, there’s an extremely wide spectrum of opinion – which is the whole point of this article! For two, it’s not clear what qualifies someone to forecast the arrival of AGI. Many different types of expertise could provide insight, from experience building frontier models to an impressive forecasting record.
If we can’t pinpoint a single authority whose predictions we should trust, it’s tricky to know how expert opinion should inform our forecasts. But one thing we can do is look at how predictions have shifted over time. Do they appear to be converging on a particular time period?
Taking this perspective lends some credence to the short-timelines argument. In a 2023 survey of machine learning researchers, run by AI Impacts, participants thought AGI would arrive by 2047 – what would qualify by today’s standards as “long timelines”. But in the 2022 survey, that year was 2060. On the forecasting platform Metaculus, the predicted date for AGI’s development has dropped by over two decades since 2022:

The chorus of shortening timelines is loud. Take Geoffrey Hinton and Yoshua Bengio, two winners of the 2018 Turing Award for their work on deep learning, nicknamed “Godfathers of AI”. Both shortened their timelines from many decades to as little as five years after the release of ChatGPT (a watershed moment for many observers of AI progress). In a recent Substack post, former OpenAI board member Helen Toner reflects on this timeline-shrinking epidemic, and cites several examples of once-sceptics who now believe AGI could arrive within 10 years.
While there’s certainly a limit to how much we can learn from predictions (especially given the unprecedented nature of AGI), it is worth noting that it’s far easier to find examples of shortening timelines than of lengthening ones.
People who believe AI capabilities are moving at an incredible pace sometimes see those arguing the other side as constantly moving the goal posts, as depicted in this meme (by @andersonbcdefg):

The case for long timelines
Despite plenty of excitement (and alarm) about near-term AGI, not everyone is convinced. Although many industry insiders are confident in short timelines, surveys from broader sets of experts still elicit longer ones, despite the downward trend mentioned above!
Skeptics point out that LLMs still make silly errors, emphasise barriers to using AIs for real-world tasks, and doubt the plausibility of an intelligence explosion.
Here are some reasons to think AGI could still be decades away:
#1 Benchmarks only tell us about capabilities that are easy to measure
Benchmarks are best for clearly-defined tasks, where success is easily verified. AI models are particularly good at these kinds of tasks. This is especially true for more recent reasoning models such as OpenAI’s o1, which rely heavily on reinforcement learning (RL).
During RL, a model learns to maximise reward through trial and error. This has produced AI systems that are superhuman in narrow domains, like DeepMind’s AlphaGo and AlphaFold – and researchers have recently found that it works better than expected on general-purpose LLMs too. It’s much easier for an AI to learn via RL when what constitutes “good” vs “bad” performance is indisputable. If we give an AI a hill to climb, it will. This is one explanation for why reasoning models have exhibited extremely impressive capability improvements in domains like maths and coding.
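To illustrate what “indisputable” feedback looks like, here is a toy Python sketch. It is not an actual LLM training loop: the “policy” just guesses answers to an arithmetic problem, and a verifier assigns a reward of 1 only when the answer is exactly right.

```python
import random

# Toy illustration of a verifiable reward signal (not a real RL training loop).

def verifiable_reward(problem: str, answer: str) -> float:
    """Reward is indisputable: 1 if the answer is exactly correct, else 0."""
    return 1.0 if answer.strip() == str(eval(problem)) else 0.0

def toy_policy(problem: str) -> str:
    # Stand-in for a model's sampled answer: the true result plus some noise.
    return str(eval(problem) + random.choice([-1, 0, 0, 1]))

# Trial and error: sample several answers and keep the ones the verifier accepts.
problem = "17 * 24"
samples = [toy_policy(problem) for _ in range(8)]
accepted = [s for s in samples if verifiable_reward(problem, s) == 1.0]
print(f"{len(accepted)}/{len(samples)} samples earned reward")
# For a task like "write a good strategy memo", no such clean verifier exists.
```

The appeal of domains like maths and coding is precisely that such checkers exist: graded answers and unit tests can deliver an unambiguous reward at scale.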
But AGI-skeptics point out that the real world is not so tidy. Think about the day-to-day experience of doing your own job. You might receive different (or even conflicting) feedback from various people. There’s probably more than one “right” way to deliver an output. You’ll learn what works and what doesn’t by observing the real-world impacts of your work, which could be mixed.
This line of argument formed much of the pushback to the METR study that we mentioned earlier. The study was used to demonstrate that AIs are able to act over longer and longer time horizons – but it exclusively measured performance on software engineering tasks. This is precisely the kind of task we’d expect AIs to be good at! It’s not clear how well this trend will generalise to other economically valuable labour.
Real-world jobs are also not easily divided into discrete, self-contained tasks. Instead, tasks tend to be overlapping, and dependent on lots of context that’s hard to give an AI. For this reason, it’s not so easy to delegate even short tasks to AI models. LLMs are expert email-crafters, but asking one to follow up with a colleague on an earlier discussion is pretty hard when the AI doesn’t know what was discussed!
#2 Tasks that are easy for humans are hard for AIs – and vice-versa
People have historically expected that manual labour would be automated long before white-collar work. Yet in 2025, we have language models that can solve PhD-level science questions – but not robots that can reliably assemble furniture.
One person who did see this coming was computer scientist Hans Moravec. He observed in 1988 that it’s far easier to get a computer to exhibit expert-level performance at a game like checkers than to give it the basic perception and mobility of a one-year-old. This principle became known as Moravec’s Paradox: reasoning requires far less computation than sensorimotor skills.
Moravec hypothesised that skills like grasping objects or navigating around obstacles are harder to replicate because they’ve taken so much more time to emerge through the process of biological evolution. On the other hand, humans acquired abstract reasoning skills relatively recently. The older the skill, the more computational resources it requires to reverse-engineer.
Maybe this phenomenon has given us a warped perception of how capable today’s AI systems actually are. Being able to solve a PhD-level maths problem looks very impressive to us, because most humans can’t. On the other hand, loading a dishwasher seems like a trivially easy task, because most of us can. But the Moravec argument would say that this doesn’t actually tell us much about the absolute difficulty of either task. This implies that we’ve unlocked the easy AI capabilities, and the hardest part could still be in front of us.
#3 An intelligence explosion might not be possible
So far, we’ve been arguing that timelines to AGI could be long because there’s much further to go than benchmarks imply.
However, this ignores the intelligence explosion argument that we made earlier. Even if there are many hurdles we need to jump before we reach true AGI, automating AI research might mean we still get there very soon. That AIs are especially good at easily-measured tasks like maths and coding only supports this – these are precisely the skills they’ll need in order to accelerate AI progress.
But whether an intelligence explosion is actually possible is an open question. As we explained earlier, AI is trained using three inputs: data, algorithms and compute. It’s this second ingredient, algorithms, that depends on cognitive labour. This labour is performed by human researchers today, and could be performed by automated researchers in the future.
This means that the likelihood of an intelligence explosion hinges on a key question: how much algorithmic progress could a team of automated researchers make while compute and data remain static? Producing new chips and building datacentres takes time, as does gathering data (or generating synthetic data, which may become essential if we run out of human-written text altogether!).
One study found that historically, the biggest algorithmic advances have been compute-dependent – they required big scale-ups in hardware to develop and validate. If this continues to be true in the future, then an intelligence explosion looks less plausible.
Of course, this is a big if. It’s difficult to predict what lots of super-smart automated AI researchers running in parallel could achieve. They might be able to find lots of clever ways to run experiments with fixed supplies of compute, such as using scaffolding techniques to squeeze more capability out of existing models. Skeptics acknowledge that an intelligence explosion is a live possibility, but emphasise that it remains speculative.
#4 Raw intelligence might not be the main driver of discovery
In the short-timelines scenario, we develop what Dario Amodei calls a “country of geniuses in a datacenter” that goes on to transform the world by doing a lot of research and development (R&D). For example, these AI geniuses could be directed to develop new medicines, find sustainable energy solutions, or develop coordination mechanisms that help us avoid future conflict (or, in the bad scenario, AI-enhanced research is turned against us).
Whether things will really play out this way depends on what actually drives real-world discovery. Skeptics will point out that there’s much more to R&D than just very smart people thinking very hard. One piece of evidence for this is the phenomenon of simultaneous discovery – multiple people often have the same insight independently of each other, at roughly the same time. For example, Isaac Newton and Gottfried Leibniz independently developed calculus in the late 17th century, and Charles Darwin and Alfred Russel Wallace both described natural selection in the 1850s.
There are many theories for why simultaneous discovery happens. One is that culture moves faster than scientific discovery, creating a lag that many people then try to close with technical solutions or explanations at around the same time. This runs counter to the idea that the raw thinking power of a “country of geniuses in a datacenter” could, on its own, cause a massive acceleration in R&D.
People have also pointed out that the process of R&D requires a far broader set of skills than just abstract reasoning (the bread and butter of reasoning models). Human researchers direct research teams, collect evidence from experiments, and so on. Maybe this means that scientific R&D isn’t any easier to automate than anything else. This would mean we shouldn’t expect a rapid, AI-driven transformation until after a broad range of more routine jobs across the economy have been automated.
So… who’s right?
Lots of smart and qualified people have spent a long time thinking about when AGI will arrive, and come to very different conclusions. Unfortunately, there isn’t conclusive evidence either way!
We’ll know soon enough. Many believe that AGI will either arrive before 2030 or take much longer, because we probably can’t sustain the current rate of scaling past that point. In five years’ time, we could be building compute clusters that cost around $1 trillion, which is probably close to the limit of what the US economy could sustain. This suggests that if we reach 2030 without AGI, the yearly probability of its arrival starts to decrease.
There are all sorts of reasons to think that the current rate of progress could fizzle out, but we can’t be confident in them. There are enough arguments for near-term AGI to warrant taking the possibility extremely seriously – and this is a scenario that the world is drastically underprepared for. We don’t know how to ensure AI systems reliably follow our instructions. We don’t understand how they work. We don’t know how we’ll avoid the worst outcomes from developing AGI, up to and including human extinction.
AI is getting more powerful – and fast. We still don’t know how we’ll ensure that very advanced models are safe.
BlueDot Impact designed a free, 2-hour course to help you understand how AI could transform the world in just a few years. Start learning today to join the conversation.

