the brain drives the tool
I’ve used AI every day for about a year now, on real work: data, code, writing, some hiring. That’s long enough to have opinions I’ll defend, so here they are. The tool is real and the gains are real, and I’m not going to pretend otherwise. But you’re the one driving, and that turns out to be the whole game.
What AI is
Strip the marketing off and AI is a pattern-completion engine. It read an absurd amount of human text and got very, very good at predicting what comes next. That’s the whole trick. The output reads like thinking because the stuff it learned from was thinking, written down. So it produces text shaped like a thought process without ever running one. Sit with that for a second, because almost every mistake I watch people make with this stuff comes back to forgetting it.
The way I think about it: the thing is an amplifier.
amplifier /ˈæm.plɪ.faɪ.ər/ noun
A tool that increases the magnitude of a signal without altering its shape. A bad input produces a louder bad output. The tool does not add fidelity.
Feed it a clear head and you get clearer output back. Feed it a muddled one and you get polished muddle, which is worse than useless, because now your bullshit has footnotes and looks authoritative. Same tool either way. Whatever you bring in is the thing that gets amplified.
It was also trained to produce text people rated as helpful, which is a different thing from text that’s correct. A confident, fluent, nicely structured answer can be flat wrong, and the LLM has no idea which of its answers are the wrong ones. Neither do you, until you go and check.
Left alone, it drifts to the average. Some researchers pushed 30,000 strategic business questions through every major LLM. Centralize or decentralize, differentiate or commoditize, that flavor of question. Same answer every time, regardless of industry or context, and the bias wouldn’t shift no matter how they reworded the prompt. They called it “trends slop”[1], which is about right.
What it’s good for
None of that makes the tool useless. I reach for it constantly. It just means you have to know what actually fits it.
What fits is anything close to what a competent person would typically produce for a given input. Standard formats, common patterns, well-trodden ground. The closer your task sits to that fat middle of the distribution, the better the tool does. Push out toward something genuinely novel and it starts to flail.
It’ll compress 200 pages into five before I’ve finished the first chapter. Hand it a pile of prose and it gives back a clean outline, or a table, or a different schema, without dropping much on the floor. When I already know what I want to say and just don’t feel like typing it, the draft lands in seconds. Point it somewhere deliberate and it surfaces adjacent ideas I’d have walked right past. And when I’m precise about role, scope, and format, it stays consistent run to run.
Every one of those has the same shape. I bring the specification, the tool fills it in from the average. The judgment is mine. The typing is its.
What brains do that AI can’t
Then there’s the other column: the stuff a brain does that the LLM structurally can’t, and most of it isn’t getting fixed by a bigger one.
It can’t decide. It’ll hand you five options, ranked, each with a confident little rationale, but the criteria, the weighting, the actual call: that’s all yours. It has no view of your constraints, your risk tolerance, the politics on your team, the thing you promised someone last Tuesday. Deciding is the job, and the job is the one part it can’t touch.
It also doesn’t know what it doesn’t know. A person who’s out of their depth can usually feel the ground going soft underfoot. The LLM writes the same self-assured prose whether it’s standing on rock or making things up wholesale, and nothing in the output tells you which. Polish hides the hole.
There’s no doubt in it, either. No regret, no nagging sense that something’s off, no mechanism where shipping a bad answer today makes tomorrow’s better. That friction between “this feels wrong” and “this is wrong” is most of what good work is made of, and it has none of it.
And you can’t pin anything on it. When the thing you shipped falls over, you own it. Nobody points at the LLM and walks off whistling, because there’s nobody home to take the blame.
Last one, and it’s the big one: it doesn’t originate, it recombines. What looks like a creative leap is interpolation between things already sitting in the training data. Useful, often. But not the same as having an idea nobody’s had yet.
MIT ran every major LLM against roughly 11,000 real job tasks. The models cleared “minimally sufficient” about 65% of the time and never once cracked 50% on “superior” work[2]. So the odds of one of these tools producing genuinely excellent output, left unattended, are worse than a coin flip. Matches my year.
Then there’s Deloitte, who billed the Australian government 290,000 USD for a welfare-compliance report and let an AI fabricate the citations. Invented quotes from a federal judge. Sources that were never written. When it all surfaced, the firm’s official position was that the recommendations still stand[3]. Obviously they fucking don’t. You don’t get to make up your evidence and keep your conclusions.
The LLM did exactly what LLMs do. It completed the pattern, produced fluent output, executed the request to the letter. The failure was upstream of the LLM, in the chair where a brain was supposed to be driving and wasn’t.
The work is yours
So here’s where I’ve landed after a year of this. A power tool in the hands of someone who knows the work makes the work better and faster. The same tool in the wrong hands just makes worse work faster. The drill doesn’t know who’s gripping it, and the LLM doesn’t either. It amplifies whatever shows up to use it, and the only variable that ends up mattering is whether there’s a brain on the other end doing the part that can’t be handed off.
No language model is a brain. The work is yours.
Footnotes
1. A 30,000-data-point study of strategic AI recommendations across major models found that all clustered to the same advice regardless of industry, context, or prompting. Researchers coined the term “trends slop.”
2. MIT performance review of major AI models on roughly 11,000 real-job tasks. Models hit “minimally sufficient” 65% of the time and never cracked 50% on “superior” work.
3. Deloitte AI-fabricated-citation scandal involving an Australian welfare-compliance report (290,000 USD) and a Canadian healthcare report (1.6 million CAD). Citations, federal-judge quotes, and an author listed on a paper she had never seen were among the fabrications.