The Missing Variable in AI
The case for treating human interiority as a solvable engineering problem
I absolutely loved the Dwarkesh conversation with Ilya Sutskever. The whole discussion is brilliant and well worth listening to. One segment in particular struck me: at around the 9:30 mark they move into a discussion of emotion and value functions. These ideas are often brushed aside by serious AI researchers as fluffy concepts, but deeply understanding emotion’s impact on human reasoning might be critical for unlocking more advanced AI. There were three elements of the discussion I want to highlight:
Dwarkesh essentially says that emotions are easy to understand, and notes that it’s interesting that they are nonetheless difficult to encode.
Ilya and Dwarkesh both indicate that emotions might operate like value functions, or modifiers of a value function.
Ilya notes, citing an empirical case, that emotions are critical for reasoning.
I disagree in part and agree in part, and my disagreement is mostly focused on the idea that emotions are simple in both function and modelability. But I agree with, and want to extend, the idea that understanding the role of emotions is critical. I believe that emotions (and all internal states) are high-dimensional and complex elements of human interiority that are essential for reasoning, and that this essential element is not only missing from current AI architecture, but also can’t be approximated without some changes in approach.
I’ll outline both the intuition and market evidence for my framing, and I’ll provide a functional representation of the relationships I’m describing. Then, I’ll discuss how this relates to the “jagged frontier” of AI, and how we might solve the problem.
What Art Can Tell Us About the Dimensionality of Emotion
I don’t think emotions or emotional states are easy to understand, but I think we can be fooled into believing they are because our high-level heuristics for emotions work fairly well human-to-human. If I tell you “I’m ashamed,” you can do a reasonably good job of inferring how I feel, even if the inference is incomplete and contextless. The ease of that human-to-human shorthand creates the illusion that emotions are simple to understand, despite the fact that many of our human misalignments come about precisely when that quick-fire “understanding” misfires badly.
For suggestive proof that emotions are complex, high-dimensional, and difficult to understand, I’ll turn to markets, and specifically the market for art. People feel a need to communicate complex emotional states, so they seek representations that communicate more effectively.
So much of the artistic market is built on developing representations of emotions, or representations that impose an emotional state on the consumer. Artistic products elaborate on a relatively small set of labeled states like ‘joy’ or ‘embarrassment’. If those simple labels were sufficient, there would be no need for the wealth of diverse expressions of those states.
More specifically, if the label ‘grief’ captured the feeling completely, or even nearly completely, then there would be no need for books on the subject. They would be redundant.
The very breadth and scale of our emotional experience drives the demand for these markets. And we have a very large and longstanding market for emotive art, from which consumers tend to select art that best aligns with their internal representation of a given emotion. This implied complexity suggests we need a better model for how emotional states, and all internal states, get turned into representations. And we need to understand how those representations are decoded.
A General Formal Model of Communication
Our model is inspired by information theory, but is primarily designed to structure our thinking about communication as it relates to AI. Let’s consider a human internal state (it can be an emotion, or any other internal, felt state of being) and call it I. Now let’s say we want to communicate that state to someone else. We use an encoder function ƒ that converts I into some representation R. A representation can be written text, video, speech, art, or any other communicative artifact.
R is a lossy, discrete, and low-dimensional transformation of I. Think of it like asking someone to describe a symphony in a single sentence, or to summarize a full book in one paragraph. There is information loss in the compression.
The person receiving the communication uses some decoder function 𝑔 that processes R together with the receiver’s own internal state I, inferring the meaning of R and the necessary parts of the sender’s internal state, resulting in a decoded meaning I’.
As Ilya suggests, emotions and other internal states are critical to this decoder function. The key insight is that in order to effectively decode R, humans rely on their own internal state, I. From my earlier example, your internal state is necessary for you to reasonably and reliably infer meaning from “I’m ashamed.” There are other important properties of these relationships. For example, the encoder function is non-invertible: our representations, R, are far too sparse, because of compression, to recover sufficient information about I, so we can’t get I from R alone. That property creates two problems: (1) AI must rely on estimating 𝑔 to get any meaning out of R, and thus (2) AI has to approximate I without the tools to do so.
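To restate the pipeline in compact notation (this is just a summary of the definitions above; the sender and receiver subscripts are mine, added for clarity):

$$R = f(I_{\text{sender}}), \qquad I' = g(R,\ I_{\text{receiver}})$$

Because the encoding is a lossy compression, ƒ has no inverse: I of the sender cannot be recovered from R alone, which is why any decoder has to condition on an internal state of its own.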
When I read “I think, therefore I am,” I can’t fully recover Descartes’ internal state—but I can draw on my own experience of existential dread, my own moments of radical doubt, to partially reconstruct what might have motivated the claim. I’m using my I to approximate his. An AI reading the same sentence has no such resource.
This should prompt us to change our formulation of some open questions. We need to stop thinking about how much information survives representational transformation, and think more about what kind of information survives. We should reconsider what can be inferred from representations alone. And then we should start developing models to fill in the gaps.
The current information pipeline for AI is not only compressed, but it is structurally incomplete. AI lacks the requisite keys, human internal states, to unlock complete information from representations alone. It is trying to execute function 𝑔 without human I.
Effectively, this means that current AIs, under current architectures and specifications, can only operate as very rich classifiers; they can’t become true decoders. They are exceptional classifiers, beyond exceptional even, but I suspect true decoding will remain the missing piece of the puzzle.
We can use our framework to help us think about where current AI will continue advancing rapidly, and where it might slow or stall.
The Interiority-Dependence of Different Knowledge Domains
AI currently does well on problems in domains where the encoder function is nearly invertible. In propositional fields like math, logic, and physics, humans have made great efforts to ensure that the encoder functions are nearly lossless. The symbolic representations preserve most of the information. Furthermore, in these fields the decoder function is stable across observers. The communication of math need not be tailored to a specific audience; the teaching of math often requires tailoring, but the representation itself requires no specific modifications. We can expect AI to continue on a path to superhuman performance in these fields.
But AI might continue to struggle in domains that demand heavy social reasoning and a high degree of value judgment. It might also struggle where social reasoning and values intersect with fields like math and physics. In these cases, representational compression drives massive loss of intent and meaning, and the decoder function becomes incredibly important.
ƒ and 𝑔 Are Not Simply Functions; They Are Elements of a Distribution
We can go a step further in our formalization of the communication pipeline. In our simple example, ƒ and 𝑔 are specific functions. When we consider the scope of human communication, it becomes clear that communication isn’t accomplished by one specific function every time, but rather by draws from functional distributions. The function ƒ is an element of a distribution of communication functions F that is unique to each individual communicator. Similarly, a decoder 𝑔 is an element of a family of decoders G, also unique to each human. Humans rely on internal states to determine which functions within F and G to select.
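In the same notation as before, each act of communication is a draw from these person-specific families, conditioned on the communicator’s internal state (a compact restatement of the paragraph above, not an addition to it):

$$f \sim F_{\text{sender}}(\,\cdot \mid I_{\text{sender}}), \qquad g \sim G_{\text{receiver}}(\,\cdot \mid I_{\text{receiver}})$$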
These functions vary in ways that are intuitive to humans, but are difficult to encode. To name a few examples, communication functions depend on emotion, motivation, culture, context, social incentives, self-deception, and strategic aims.
We can better understand why humans fail to infer the internal states of other humans under these constraints. Consider motivations. We expend substantial time and effort trying to understand the motivations of other humans through their representations. This is most evident in politics, where once again I will rely on markets.
If motivation were easily inferred from representations, we would have little need for a cottage industry of experts and “insiders” who act as surrogate decoders. But that is exactly what we have: a politician says something, and a marketplace of decoders generates new sets of representations capturing what the politician “really meant”. All of these new representations are fraught, infected with the same confounders as the original representation. Other humans try to cut through the noise with internal models relying on things like emotions or inferred relationships.
But ultimately, humans can only reliably decode the motivations of other humans when they are engaged in lengthy and deep relationships. A long-married couple can decode each other’s motivations much more reliably than a stranger reading a politician’s speech transcript. Long, amicable relationships reduce the uncertainty in F and G. If you have known someone for many years, you can better understand how they communicate, which is just another way of saying you better understand which functions they typically use to encode their representations to you.
AI lacks this kind of deep history, context, and collaborative understanding process. Its uncertainty in both F and G is unconstrained. And AI’s underlying pretraining data is full of communications whose meanings are highly dependent on context like motivation and circumstance (particularly the intended audience). How can AI distinguish between a critique of a religious text motivated by non-belief and one motivated by a genuine desire to improve the religion? Even if those pointers exist in the data, they can be poorly formed, because they are muddled by the same hidden motivational layer. Our understanding of encoding and decoding functions as families of internal, purpose-built functions helps us better understand the “jagged frontier” of AI performance.
AI’s Jagged Frontier Reinterpreted
In order to think differently about the jagged frontier, we should consider two things: first, what makes the frontier expand generally; second, how current processes contribute to that expansion. The straightforward interpretation of our discussion, and our model, indicates that scaling alone will continue to produce new capabilities in fields like math, physics, and coding, where I is less relevant because the representation space is explicitly constructed to remove the influence of I on human interaction.
Reinforcement learning (RL) attempts to close the gap in other domains, but this is where I think the current consensus about frontier expansion falls short. In our communications model, RL can’t plausibly approximate all of F and G, but it might help approximate a small subset of those functions.
Reinforcement learning takes a current capability frontier, ties a lasso to one point on the line, and yanks a narrow capability into a new space compatible with some level of human performance. This creates new jaggedness, not overall expansion. I think of RL as similar to a Taylor series approximation of a mathematical function: it does a good job of approximating human internal states in a local space, but it does not generalize well.
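To make the analogy concrete, here is a toy numerical sketch. Nothing in it is specific to any real RL setup; it only illustrates the local-fit, poor-generalization behavior the analogy points at.

```python
import numpy as np

# Toy illustration of the Taylor-series analogy: a low-order local
# approximation of sin(x) around 0 is excellent near the expansion point
# (the "RL-tuned" region) and degrades rapidly farther away.

def taylor_sin(x):
    # 3rd-order Taylor expansion of sin(x) around 0: x - x^3/3!
    return x - x**3 / 6.0

for x in np.linspace(0.0, np.pi, 7):
    error = abs(np.sin(x) - taylor_sin(x))
    print(f"x = {x:4.2f}   |sin(x) - approximation| = {error:.4f}")

# Output pattern: near-zero error close to x = 0, growing quickly toward
# x = pi -- high local fidelity, poor global generalization.
```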
By modifying AI behavior, RL adjusts a local reasoning space, but the broader reasoning logic remains mostly unaltered. RL feedback is also representational, and subject to the same information loss as other representations. Further, when heavy-handed RL creates new jaggedness it risks warping the general reasoning manifold near these focal domain spaces. That interpretation of RL could explain phenomena like sycophancy.
This implies that RL improvements generate new spikes on the jagged frontier without expanding the overall frontier. In order to more broadly expand, we’ll need a smooth expansion function across the entire space. And that means we might have to model human internal states.
This is not pessimism about current architectures, which will still be immensely powerful and drive advancement in very important fields; it is simply an observation that something more is required to expand the overall reasoning manifold.
To move past this constraint, I’ll offer some approaches that might point us in the right direction, though none are complete solutions.
Paths Toward a Partial Recovery of I
I find that most people who notice the human interiority problem tend to assume it is insurmountable. I don’t think that’s the case, but I do think we need to treat it explicitly as an “internal state” modeling problem. That means we can’t treat “internal state” inference as something that might fall out of some other process.
Ilya and Dwarkesh both consistently name ‘continuous learning’ as a necessary step for AI to become AGI or ASI. I think they are likely correct, but I think the reason why is important. I see continuous learning as necessary because it enables the AI to construct a more robust state history. It moves the AI from an effectively stateless classifier (yes, with some reasoning-like behaviors) to a path-dependent reasoner. It comes from a sequencing pattern: the AI encounters information, consumes it, compresses it into some internal state, and then recalls it later on. This process may at least mimic the analogous learning process in humans closely enough to encode states that improve reasoning, even if those states aren’t, strictly speaking, emotions or other feelings.
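A deliberately minimal sketch of that sequencing pattern might look like the following. The class and method names are hypothetical illustrations, not a description of any existing system, and the “compression” is a trivial truncation; the point is the path dependence, not the mechanism.

```python
from dataclasses import dataclass, field
from typing import List

# Minimal sketch of the encounter -> compress -> recall loop described above.

@dataclass
class StatefulReasoner:
    state_history: List[str] = field(default_factory=list)

    def encounter(self, observation: str) -> None:
        # Compress the observation and fold it into the running state.
        summary = observation[:80]
        self.state_history.append(summary)

    def respond(self, query: str) -> str:
        # The response is conditioned on accumulated state, not just the
        # current query -- a stateless classifier has no such dependence.
        context = " | ".join(self.state_history[-3:])
        return f"[conditioned on: {context}] answer to: {query}"

agent = StatefulReasoner()
agent.encounter("The user described a long period of grief after a loss.")
print(agent.respond("How should I phrase this condolence note?"))
```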
I also think there may be some benefit to looking at biometric data. We need to be smart about it, and we need to understand how limited it is. But by collecting simple signals alongside text, we can at least provide the AI with a vector: a directional signal that might help differentiate between broadly different internal state ranges and subspaces. We could operationalize this by collecting biometric information from people who are consuming various texts and media, so we can measure some proxies for internal reactions. Then we can model that against the sample texts, extrapolate to similar texts, and encode an “internal state space” proxy. Biometrics would still be low-dimensional, and there are real limits to this, but it’s possible that reducing the search space of internal states is a sufficient solution.
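A hedged sketch of what that modeling step could look like follows, with random arrays standing in for real text features and biometric readings; the choice of a ridge regression here is purely illustrative, not a recommendation.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Pair text features with low-dimensional biometric proxies (e.g., heart
# rate change, skin conductance) recorded while people read those texts,
# then learn a map from text features to that proxy "internal state space".
# All data below is random stand-in data.

rng = np.random.default_rng(0)
text_features = rng.normal(size=(200, 64))    # stand-in text embeddings
biometric_proxy = rng.normal(size=(200, 3))   # stand-in biometric channels

model = Ridge(alpha=1.0)
model.fit(text_features, biometric_proxy)

# For an unseen text, the prediction is only a coarse direction in the
# internal-state space: a constraint on I, not a recovery of it.
new_text = rng.normal(size=(1, 64))
estimated_state = model.predict(new_text)
print(estimated_state.shape)  # (1, 3)
```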
It’s also possible that RL could be improved to incorporate state-based information from human data. Humans could not only provide feedback on representations, but also information about their emotional state. It could be as simple as saying “this is the preferred response” and also “this is how I felt when I wrote/read that response”.
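As a sketch of what such a state-augmented feedback record might look like (the schema and field names here are assumptions for illustration, not an existing RLHF format):

```python
from dataclasses import dataclass

# A preference record extended with the rater's self-reported internal
# state. Field names are illustrative assumptions.

@dataclass
class StateAugmentedPreference:
    prompt: str
    chosen_response: str
    rejected_response: str
    reported_state: str      # e.g., "frustrated", "amused", "moved"
    state_intensity: float   # self-rated, 0.0 to 1.0

record = StateAugmentedPreference(
    prompt="Write a note to a friend whose parent just died.",
    chosen_response="I'm so sorry. I'm here whenever you want company.",
    rejected_response="Condolences. Let me know if you need anything.",
    reported_state="moved",
    state_intensity=0.8,
)
print(record.reported_state)
```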
All three of these approaches could play a role in sufficiently approximating the internal states that are relevant to human reasoning. It’s possible we don’t have far to go down this road. A weak, directional approximation might be sufficient to drive massive AI improvements. It could be enough to go from AI with spiky usefulness to near AGI or ASI.
Interiority as an Engineering Problem
Engineering minds should take the internal state problem seriously as it applies to AI. This explicitly does not require that AI researchers solve the hard problem of consciousness, or confront mind-body dualism, or decide whether we, or AI, have souls. AI researchers only need to operationalize the idea that there is an internal nature of being, and solve for an encoding that can map some proxy for it into AI. That requires:
Recognizing that AI is always reading communications from humans that are intended for humans, with many assumptions baked in on that basis.
Recognizing that so much of human reasoning is implicit, biological, and grounded in evolution.
Defining the functional relationships where AI is inserting itself into this communication process, and solving for the missing variables.
The gap between current AI and general intelligence may not be compute or data or architecture. It may be that we’ve built extraordinarily powerful systems for manipulating representations while excluding important data about what those representations were designed to communicate. It is, in the end, a statistical bias correction: a solvable problem.

