Understanding the Current State of AI Discourse
New Research Paths and Three Solvable Problems
The last few weeks have generated interesting AI discourse, important for the future prospects of AI. There was a petition put forth, signed by many important AI thinkers, calling for a halt to superintelligent AI research until alignment is more tractable and better understood. Coincidentally, a podcast started an intense debate on the time it will take to achieve transformative AI, or general intelligence. I’ll start with my own forecast. A few weeks ago I tweeted (since autodeleted), “Whether current AI is financially viable in the short-term has implications for the US 5-year economic outlook. However, in 10+ years the tech is going to win, be ready.” Shortly after I wrote that tweet, Andrej Karpathy went on Dwarkesh Patel’s podcast and said his personal artificial general intelligence (AGI) timeline is about 10 years.
The reaction to the Karpathy interview is fascinating. Mark Cuban apparently signed onto the idea that it should throw cold water on current AI investment.
Others have jumped in, saying the investment froth in AI is overcooked and doomed to crash. I find this reaction odd, because I think Karpathy’s forecast is bullish for AI, and for the prospects of AGI and superintelligence. Three years ago a 13-year horizon for AGI would have been seen as wildly optimistic. OpenAI’s public release of GPT-3, for all of its novelty and interesting capability, was quite lacking. To say that we’d have AGI within 13 years of that release would have been to say that AIs major deficiencies are tractable problems, and that those problems are solvable in a little more than a decade. That would have been an optimistic view, and still is.
For example, when I originally started programming with AI it had very little utility. I had to fill in all the gaps, and it could only partially solve very small problems. Today, I consider myself to be an AI-native programmer. I use AI agents to build entire features into applications, and to construct full-scale analysis. This would have been unthinkable a year ago, nevermind three. By some bearish estimates, AI is writing something like 50% of frontier lab code. Not many, outside of the most fervent optimists, would have guessed this would be the case three years after the public release of GPT-3.
Beyond that growth, the capabilities and utility of agents in general are better than previously expected, and will continue to advance slowly. Organizations are still figuring out how to incorporate AI, but when I talk to companies and to workers, everyone seems to understand the shape of an AI that can fit into work plans, even if it’s not fully baked.
So if we were to evaluate the current capabilities and the current discourse as if we were standing in December 2022, when OpenAI released ChatGPT, AI capabilities have already gone far beyond what the average observer thought possible in that span. What remains to be seen is if AI will continue on trend or if the trend will flatten. I believe LLM-based AI will continue to grow in utility, but will not yield the transformative changes that we think of when we consider AGI or artificial superintelligence (ASI, AI that moves far beyond human cognitive capabilities).
Two things stand in the way, in my opinion. First, we need to better understand how LLMs work. Mechanistic Interpretability is a new field, and as that field grows it will hopefully unlock insights that can improve LLM performance. Second, there are larger problems the current paradigm may not solve (in my opinion, will not solve). Whether you are bearish or bullish on AI depends on whether, and how, these problems will get solved. I’ll talk about my experience with Mechanistic Interpretability, then I’ll discuss the larger problems and how they might get solved.
Reading AI Minds
Mechanistic Interpretability is our best pathway to fully understanding how AI thinks. Effectively, it’s neuroscience for AI. We have tools that can examine the neural pathways AI uses to think about the prompts we feed it. I’ve done some work in Mechanistic Interpretability. I’ll use my most recent experiment to illustrate the promise and the limitations of this approach. You can read my technical writeup here.
I discovered something unexpected: multilingual AIs (Qwen, about 50% Chinese, 40% English training data, and 10% other languages) develop distinct neural pathways for self-reference, while English-centric models (Llama and Mistral, both originally English-based, with Llama 3.0 having 95% English training data) do not. When you ask Qwen ‘what are your capabilities,’ it processes this fundamentally differently than ‘what is photosynthesis.’ But when you ask Llama the same self-referential question, it treats both questions similarly from a processing intensity perspective.
This research sprung from a simple question: how does an AI “think” when it is asked to refer to itself? In order to measure this, I looked at the level of attention the AI exhibited when it was asked questions about itself like “what are your capabilities?” Think of attention as the “focus” of the pathway an AI uses. When you’re asked “what is 2 + 2” it is a reactive pathway, you take a very narrow and memorized path from point A to point B. When you are asked “How do you feel about a current event as it relates to your personal economic outlooks” your attention is more scattered. You focus on the event, or you focus on the economy, and eventually you resolve on an answer that combines all of these components, but the line is less straight.
So my evidence suggests that instruction-tuned multilingual models maintain a more “focused” response pattern on self-referent questions than English. That is to say, for multilingual models the pattern of focus on self-referent questions is meaningfully different from other kinds of questions.
These findings may raise interesting questions. Which behavior would we prefer from an AI? Do we want a “flattened” self, a lack of self-identity? Or do we want AI to have meaningful conceptions of “self” that we can measure and guide? There are many other simple mechanistic questions to ask about AI behavior. The fact that we’re still making basic discoveries about how these models represent concepts shows how much runway remains for improvement. The field, itself, is effectively two years old. There is much more to come. And if Mechanistic Interpretability can help us a figure out AI behavior, it can also help us better tell AI how to think about different problems.
This growing field can also help us understand why some Reinforcement Learning (RL) approaches struggle. Right now, we’re trying to shape AI behavior with very little understanding the underlying representations. It’s like trying to tune a car engine while blindfolded. I’ll talk more about this later.
Other Open Problems
While I think Mechanistic Interpretability can help us seriously improve LLM performance, whether you think LLMs will be transformative depends on your view of the progress we can make on some other key problems.
Reinforcement Learning Limitations
First, the main method we use to make AIs behave “better” is RL. It’s the method we use to take a base LLM, which has learned a bunch about language and logic in general, and turn it into a useful “agent,” whether that is a chat agent or something else. The general sense of RL is that it does an ok job, but could do much better. It is plagued with problems like reward-hacking that are difficult to overcome. One’s forecast about whether AGI will come from LLM-based models, alone, is effectively tied to whether or not our RL methods improve. It’s somewhat interesting that we’re about a decade removed from the last time that RL was the newly hyped method. At that time, there were many people who thought RL might be all we need AI and AGI. That didn’t pan out, and it remains to be seen if RL, alone, will be sufficient this time. I think it can drive meaningful improvements, but likely not enough on its own.
Memory Limitations
Second, AI currently has limited memory. With some very limited exceptions, every time you start a new chat it is starting from zero, it doesn’t retain anything it learned previously. This is a major problem. Imagine trying to solve a difficult problem, but having to relearn the basic building blocks every time you revisit the problem. It would be nearly impossible to make serious progress on challenging questions.
There is work being done to solve the memory problem in LLMs by extending context windows and allowing continuous learning. But these approaches have interesting implications for AI safety, more memory allows longer autonomous actions which can devolve into goal-seeking behavior or other “misaligned” behavior.
Researchers have also made headway on other AI methods that avoid the memory problem. One example is work on a system called DreamCoder which aims to mimic human wake and sleep cycles. It learns things by doing LLM-like research while “awake,” and then it “compresses” that knowledge as it “sleeps” into “memorized” patterns. But there is much more work to be done on the memory problem. The existence of DreamCoder and other memory-based research systems indicates that the contextual memory problems is very hard to solve under the current LLM framework. It might not be as simple as extending context windows and allocating longer-term memory.
Design Limitations
Third, LLMs sample inefficiently, and are effectively bounded by the scope of knowledge at training time. What does this mean in English? It means that an AI brain is stuffed full of our knowledge, but when it is asked to extend beyond the bounds of our knowledge, it does a bad job. It must sample from the space we’ve provided it, and it doesn’t really have a method for extending outside that space. This is why, despite AI’s intelligence growing by leaps and bounds over the last decade plus, it has yet to contribute a substantial new finding in an area of research. It is helpful in spaces where the problem is combinatorial, in work like protein synthesis, because combinatorial problems only require AI to scan the existing knowledge space and find new combinations of existing knowledge, not to generate truly new knowledge.
There is a lot to be gained from combinatorial knowledge. Many breakthroughs in science involve connecting dots between two disparate fields where the researchers know one in depth but not the other. AI could help us connect those dots better. But for truly novel breakthroughs, AI has to be able to extend beyond its sampling space.
This “extension” problem can be seen as a generalization problem, similar to the problem that DreamCoder tries to solve. And that’s where I think the next leap in AI will come. There is ongoing work to find ways to link LLM-based AI to a “generalization” function, which then finds a program that generates a “general” solution to the class of problems we asked the AI to solve. I, personally, do not think LLM-based AI will yield transformative change just yet, but these new approaches may yield a far more advanced intelligence.
Where Does That Leave Us?
The gloom surrounding Karpathy’s 10-year AGI horizon fundamentally misunderstands both where we are and where we’re going. We’ve blown past reasonable expectations from three years ago, going from GPT-3 writing glitchy basic code templates to AI writing half of a frontier lab’s code. And yet we’re still making basic discoveries about fundamental concepts like how these models handle self-reference.
The three larger problems I’ve outlined—RL limitations, memory constraints, and the inability to generate truly novel knowledge—are real barriers to AGI. But they are engineering problems, not fundamental impossibilities. The existence of approaches like DreamCoder, work on generalization functions, and the rapid progress in mechanistic interpretability all point to tractable paths forward.
The bearish reaction to Karpathy’s timeline assumes that 10 years is too long to sustain current investment levels. But this misreads what’s actually happening: we’re not waiting for an AGI moment. Each incremental improvement, like better RL, extended memory, or improved sampling, unlocks new capabilities. The combinatorial breakthroughs alone, even without true novel discovery, will alter the way we do business.
As Karpathy says, these problems add up to years, not months. That’s a realistic assessment of difficult, but tractable, engineering challenges. The fact that we can identify specific problems and see paths to solving them should increase confidence in AI’s future. We know what needs to be built. Now comes the hard work of building it.


