a compression of a compression

I get a thought in my head, and it feels whole, every part of it there at once. Then I try to say it out loud, or write it down, and what comes out is thinner. Some of it doesn’t make the trip. The easy move is to call that a vocabulary problem, or to decide I’m just bad at explaining. It’s neither. It’s the medium doing what the medium does, and once you see the mechanism inside that gap, it tells you something exact about what language is and what a language model actually learns from.

I argued in “the brain drives the tool” that the LLM recombines and doesn’t originate, and that the work stays yours. This is the floor under that claim: why the originating can’t come from the LLM, put in terms of information instead of opinion.

What won’t fit in a line

Start with what language is, because the common answer is too small. Language is a representational system: it holds things, the relations between them, and the operations you can run on those relations. We notice it mostly when we use it to move something from one head into another. That move is communication, and it’s probably where language came from, social animals that we are. But the representing came to outweigh the transmitting. Language became the medium we think in.

Thinking in that medium doesn’t run in a line. It runs in parallel: several threads at once, looping back on themselves, jumping ahead, holding a dozen things in relation simultaneously. When an idea is fully present to you, before you’ve said a word of it, it’s there all at once, as a structure, not a sequence. The chess player who feels a position is dangerous before naming a single threat is having a thought with a shape, and the shape is not a list.

Communication is the other shape entirely. The instant you want that structure out of your head and into someone else’s, exactly one move is available: speech, or its frozen form, writing. Both are strictly serial. One unit after another, locked to the order of time. You can’t say two words at once. You can’t write the last line and the first line in the same instant. Whatever you want to get across has to come out single file.

So getting a thought across forces a graph into a line. That collapse is the compression, and it’s worth being precise about the kind, because every encoding loses something and saying so explains nothing. The loss that does the work here is topological. What gets destroyed is the simultaneity and the structure, the all-at-once relations between the parts. You keep the words. You lose the shape.
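A toy makes the loss concrete. The sketch below is mine, not anything taken from the research: the chess-flavored part names, the relation labels, and the `linearize` and `relations_carried` helpers are all invented for illustration. It treats a thought as parts plus labelled relations, and a line as nothing but an ordering of those parts.

```python
# Hypothetical toy "thought": parts plus the labelled relations between them.
# Every name and label here is invented for illustration.
thought = {
    "nodes": ["king", "open file", "rook", "weak pawn"],
    "edges": {
        ("king", "open file"): "exposed on",
        ("rook", "open file"): "controls",
        ("rook", "weak pawn"): "attacks",
        ("weak pawn", "king"): "shields",
    },
}

def linearize(graph, order):
    """Emit the graph's nodes one after another in the given order.

    The edges are not emitted at all: a bare sequence has nowhere to put them.
    """
    assert sorted(order) == sorted(graph["nodes"]), "order must cover every node once"
    return list(order)

def relations_carried(line):
    """The only relation a bare sequence carries: each item and its successor."""
    return set(zip(line, line[1:]))

line = linearize(thought, ["king", "open file", "rook", "weak pawn"])
print(line)
print(len(thought["edges"]), "labelled relations in the graph")
print(len(relations_carried(line)), "adjacencies in the line, none of them labelled")
```

Real sentences are kinder than this, since grammar smuggles some relations back in through word order and function words. The sketch only shows the direction of the loss: a structure related in many directions squeezed into one where the only native relation is what comes next.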

linearize /ˈlɪn.i.ə.raɪz/ verb

To arrange in a single sequence. To take structure related in many directions at once and emit it one element after another, in an order the structure never required.

Psycholinguistics has been studying this for forty years, so it isn’t a metaphor I’m reaching for. In the standard model of how speech gets produced, the mind first assembles a “preverbal message,” and then has to linearize it: impose a sequence on content that wasn’t sequential, and simplify the conceptual structure down to something the channel can carry.¹ The flattening is a named step in the machinery, not a poetic complaint about words.

The part that can’t go single file is the part that doesn’t make it onto the page, and that holds whether what was in your head was a finished structure or only a gestalt you hadn’t unfolded yet. By the time it’s a sentence, it’s a projection of something with a shape it couldn’t keep.

A compression of a compression

Follow that one step further and it lands on the machines. Every sentence ever written arrived pre-flattened. The collapse happened inside the writer, before the ink hit the page, before any token existed. What’s on the page is the shadow the thought cast on its way through the serial channel. The thought itself never left the skull.

Now train an LLM on that page. It learns the statistical structure of the shadows, billions of them, in every configuration people produce, until it can generate new ones you can’t tell from the originals. The thing it’s modeling is already flattened, though. It’s learning a compression, so what it builds is a compression of a compression.

The part that matters, and the reason scale doesn’t save you: the parallel structure, the graph, was not faintly preserved in the text waiting for a big enough LLM to recover it. It was destroyed at the source. It was gone before the data existed, deleted in a human mind that emitted a line and kept no record of the graph. Training operates on the second compression. It cannot reach back through the first, because the first happened somewhere no LLM has ever had access to, inside a person, in the moment before they spoke.
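One hedged way to put “cannot reach back” in information terms is the data processing inequality. The symbols are labels I’m introducing here, not notation from any cited source: G is the graph in the writer’s head, L is the text that reached the page, and M is whatever the model ends up learning. The sketch rests on one assumption, stated loudly: the model depends on the thought only through the text.

```latex
% Assumption: the model sees the thought only through the text,
% so G -> L -> M forms a Markov chain.
G \;\to\; L \;\to\; M
\quad\Longrightarrow\quad
I(G; M) \;\le\; I(G; L)
```

Under that assumption, nothing computed from L can carry more information about G than L itself does. Scale makes M bigger. It doesn’t raise the ceiling, which was set when the line was written.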

This claim doesn’t lean on any of the contested stuff. I don’t have to settle whether the LLM understands anything, or whether it’s conscious, or what “meaning” really is. It’s a question about where information went, and the answer is that it went nowhere. It was never recorded.

The LLM isn’t a parrot

A weaker version of this argument stops right there and concludes the LLM is a hollow mimic shuffling shadows with nothing going on inside. That conclusion is wrong, and walking past the evidence to protect a tidy story would make this essay an instance of the exact failure it’s describing. So the other side, stated as plainly as I can.

The LLM is not serial on the inside. Its attention mechanism is many-to-many: in the decoder-style models behind most chat systems, every token in the context attends to every token before it, all at the same time. Structurally, it’s a graph, not a line.
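To make “a graph, not a line” less hand-wavy, here is a minimal sketch of attention weights on a four-token context. It is a simplification under stated assumptions, not how any particular model is implemented: the two-dimensional vectors are made up, the same vectors stand in for both queries and keys (real models learn separate projections), and the `attention_weights` helper is hypothetical. The `causal` flag reflects how decoder-style language models are usually set up, where a token sees itself and earlier tokens but not later ones.

```python
import math

def attention_weights(vectors, causal=True):
    """Scaled dot-product attention weights for a toy sequence.

    Row i holds how much position i attends to each position j.
    With causal=True, position i only sees positions j <= i.
    """
    d = len(vectors[0])
    weights = []
    for i, query in enumerate(vectors):
        visible = range(i + 1) if causal else range(len(vectors))
        scores = {
            j: sum(a * b for a, b in zip(query, vectors[j])) / math.sqrt(d)
            for j in visible
        }
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        weights.append([exps.get(j, 0.0) / total for j in range(len(vectors))])
    return weights

# Hypothetical embeddings for a four-token context (the numbers are invented).
tokens = ["the", "rook", "takes", "pawn"]
vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.7, 0.3]]

for tok, row in zip(tokens, attention_weights(vectors)):
    print(f"{tok:>6}", [round(w, 2) for w in row])

# A chain has one link per step. Attention has a weighted link
# for every visible pair, self-links included.
n = len(tokens)
links = sum(1 for row in attention_weights(vectors) for w in row if w > 0)
print(links, "weighted links versus", n - 1, "in a chain")
```

Each row is one token’s weighted view of everything it is allowed to see, computed for all rows at once. That is the structural sense in which the inside of the model is not single file, even though its output is.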

And it plans. When Anthropic traced Claude writing a rhyming couplet, the LLM had already settled on the word it was rhyming toward before it wrote the line that leads there, and that end-word visibly shaped the words in between.² The ending existed before the beginning was written. Probe an LLM’s internal state at the prompt, before it has emitted a single token, and you can read off global properties of the whole answer it is about to give: roughly how long it will run, how many reasoning steps it will take, what the final answer is, how confident it is.³ The shape of the response is present before the response.

So the LLM does rebuild a graph. The parallel structure a human has to throw away in order to get a sentence out, the LLM partly reconstructs inside itself. The clean symmetry I might have wanted, graph on the inside and a line coming out and nothing more to say, is false, and it’s false in a way that favors the LLM. It’s doing more than echoing.

A graph, not the graph

Which sounds like it rescues the LLM. It doesn’t, and the reason it doesn’t is the whole point of this piece.

The LLM rebuilds a graph. It does not rebuild the graph. Serialization is many-to-one: an enormous number of different parallel structures all collapse to the same line. Two people can write the identical sentence and mean something subtly different by it, because the sentence sits downstream of two different graphs that happened to flatten the same way. The line doesn’t carry the difference. The difference is exactly what got deleted.

A many-to-one map can’t be inverted. Given the line, there is no procedure, for any intelligence, that recovers which graph produced it, because the distinguishing information isn’t underspecified in the text, it’s absent from the text. So the LLM does the only thing on offer: it reconstructs the most probable graph consistent with the line, given the ocean of other lines it has read. That reconstruction can be brilliant. It is never the original. It’s a plausible preimage, not the true one, and nothing in the data could promote it to the true one.
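The non-invertibility can be shown with a toy, offered as a sketch and nothing more. Everything in it is invented for illustration: the three “thoughts”, their relation labels, the frequencies in `population`, and the helper names. It is also more generous than reality, because it hands the guesser the true frequency of each graph, which a model trained on lines alone never gets to see.

```python
from collections import Counter

def flatten(graph):
    """Serialize: keep the parts in spoken order, drop the relations."""
    parts, _relations = graph
    return parts

# Hypothetical thoughts behind the same three words.
relieved = (("she", "left", "early"), frozenset({("left", "felt-as", "relief")}))
worried = (("she", "left", "early"), frozenset({("left", "felt-as", "worry")}))
neutral = (("she", "left", "early"), frozenset())

# Made-up counts of how often each graph was the one behind the line.
population = Counter({relieved: 2, worried: 5, neutral: 3})

def preimages(line, population):
    """Every graph in the population that flattens to this line."""
    return [g for g in population if flatten(g) == line]

def most_probable_preimage(line, population):
    """The best available guess: the commonest graph that fits the line."""
    return max(preimages(line, population), key=lambda g: population[g])

line = ("she", "left", "early")
candidates = preimages(line, population)
guess = most_probable_preimage(line, population)

print(len(candidates), "graphs flatten to", line)
print("best guess:", sorted(guess[1]))
# The guess is identical no matter which writer produced the line,
# so it misses whenever the writer was not the commonest case.
print("guess matches the relieved writer:", guess == relieved)
```

The guess is the right bet and still the wrong graph for half of this invented population. No better guessing procedure fixes that, because the line that reaches the guesser is the same in all three cases.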

This is the precise version of “trained on a compression of thinking, not thinking itself.” Said loosely, that claim falls apart the moment someone leans on it, because the honest answer is that the LLM runs something analogous to thinking, internally, structured and planned. The defensible claim is narrower and it doesn’t move: the LLM’s process is reconstructive, not originating. It rebuilds graphs from the flattened traces of other minds. It does not originate one of its own, coupled to a world, and then flatten that into a line. The direction is reversed, and the reversal is everything.⁴

Don’t hand it the deciding

The place this matters most is the place people are most eager to point it: making decisions. I keep watching smart people paste a hard call into the thing, get back ranked options and a confident recommendation, and treat that as the answer. It reads like judgment, so it ships. It isn’t, and the compression argument is only the opening move here. Two things make deciding the worst job to hand it, and the sturdier of them has nothing to do with compression at all.

The first is information. A real decision runs on inputs that were never verbal to begin with: a gut feeling that something is off, the felt weight of one value against another, calibrated unease, the tacit read on what’s actually at stake for you. That’s a different loss from the flattening. The gut and the felt weight were never language, so they were never going to show up in text at all. And the words you do produce when someone makes you explain the call are mostly a story assembled after the fact, not a readout of the process that ran.⁵ So the text record of a decision is a residue of a residue: the non-verbal drivers gone before the first word, the verbal account a tidied-up reconstruction of what’s left. An LLM trained on that learns the genre of decision write-ups. It can produce a flawless one. The fluency is the trap, because what it has is the shape of a justification, not the mechanism of a choice.

I can’t lean too hard on the picture of a finished decision getting flattened, though, because for decisions it often isn’t finished first. Sometimes the call gets made in the act of arguing it out. That doesn’t rescue the LLM. It just moves the weight onto the part of this that was never about information: the stake.

Nothing is on the line for the LLM. Part of what makes a human call answerable is that whoever makes it eats the result: gets fired, loses the money, carries the regret that sharpens the next one. The LLM eats nothing. It can’t be wrong in a way that costs it anything, so it never had the one signal that calibrates judgment, and that holds whether the deciding is a graph being flattened or a judgment being formed live under pressure. When the call goes bad, you are the one holding it. The LLM has already moved on to the next prompt.

So this is the edge of what a language model is for. It’s genuinely good at the reconstructible work, the drafting and synthesis and breadth a competent person would produce given a clear brief. Deciding isn’t on that list, and no amount of fluent output puts it there. Use it for what it’s good at. Keep the call.

The only graph in the loop

Back to where I started. In the last post I said the LLM recombines and doesn’t originate. This is the why. The originating, the parallel all-at-once structure that thinking actually is, never made it into the training data. It got flattened to a line in every person who ever wrote anything down, and the LLM learned the lines. It learned them well enough to fool you, and me, on a tired afternoon.

When the output looks like thought, what you’re looking at is a faithful reconstruction of the shadow, drawn by something genuinely good at shadows. The one thing in the loop still coupled to a graph is the brain on the other end of it.

The LLM rebuilds a graph. Never the graph.

¹ Levelt’s standard model of speech production (Speaking: From Intention to Articulation, 1989) splits the process into stages. The first assembles a preverbal message; before anything can be said, that message is linearized, sequenced, and its conceptual content simplified to fit what the linguistic channel can carry.
² Anthropic, “On the Biology of a Large Language Model” (2025). Tracing Claude 3.5 Haiku through a rhyming couplet, the researchers found the rhyme word represented before the line was generated, with the words leading up to it shaped to land on it.
³ Dong et al., “Emergent Response Planning in LLMs”, ICML 2025. Probing a model’s hidden state at the prompt recovers attributes of the entire forthcoming response, including its length, number of reasoning steps, final answer, and confidence.
⁴ Whether the thought is fully formed before the words or gets shaped in the act of finding them is a real question, and the argument doesn’t need to settle it. Sometimes articulation transcribes a finished thought; sometimes it helps build the structure as it goes. The many-to-one point holds either way: whatever was there, fully formed or half-made, the line it collapsed to can’t be inverted back into it.
⁵ Nisbett and Wilson, “Telling More Than We Can Know: Verbal Reports on Mental Processes”, Psychological Review (1977). Their review of the evidence: people have little direct introspective access to the higher-order processes behind their judgments, and the explanations they offer are plausible after-the-fact theories rather than reports of the process that actually ran.