The most AI-pilled people in the industry have quietly converged on a humbling idea: agents need a human who cares about them.

Dan Shipper put it plainly on Lenny’s Podcast. His company runs almost entirely on coding agents, and his read after a year of living in that future is not that humans are obsolete. It is the opposite. “In order for an AI agent to be useful right now, it really needs a human who cares about it.” The moment someone stops maintaining the agent, he says, “is the minute the agent is not really that useful anymore.”

He calls automation a lie. “Every time you automate something, in order to make sure the automation is working well you need a human on top of it making sure that it’s working well.” He is simultaneously, in his words, “extremely AI-pilled” and “very bullish on humans.”

I think he is right. But “a human who cares” is doing a lot of work in that sentence, and the two things it bundles together have very different futures.

Caring is two jobs. One is curation: deciding which context the agent should be working from, and keeping that current as the work moves. The other is judgment: deciding whether what the agent produced is correct, coherent, and safe to ship.

These get treated as one role because today the same person does both. That is a temporary accident, not a law. One of these jobs is mechanical and should become infrastructure. The other is the actual job, and it is not going anywhere.

Curation Is the Filing Work

Look closely at what “caring for an agent” means in practice right now, and most of it is filing.

You re-paste the same company background into a new chat. You remind the agent of a decision it made yesterday in a different tool. You explain, again, that the German rollout is opt-in only and blocked on a signed DPA. You watch it confidently follow a plan that changed last week because nobody told it the plan changed. When Shipper describes people abandoning their personal agents because the upkeep is too much, that upkeep is largely this: keeping the agent pointed at the right state.

This is real work, and it genuinely makes the agent more useful. But notice what kind of work it is. It is clerical. It is the cost of the agent not knowing what you already know. It goes stale the instant new information arrives, which is constantly, because work produces decisions, commitments, and blockers every day.

Nobody’s edge comes from being good at this. Re-pasting context is not a skill that compounds. It is a tax you pay because the context does not travel with you from one surface to the next.

So this half should be automated. Not with a bigger prompt or a longer context window, but with a system that fills itself from what you have already captured and lets you correct only the edges. The relevant decisions, the open blocker, the source the claim came from: those should assemble themselves around the job in front of the agent. The human’s role shrinks to the exceptions. Pin the thing the system missed. Drop the thing that does not belong. Confirm the one fact that looks stale.

That is the difference between curating a set and editing a set. Editing the edges of a context the system maintains is sustainable. Hand-assembling the whole context before every session is the thing people quietly give up on. A context that depends on a person remembering to feed it is a filing cabinet, not infrastructure. It will go stale, and a stale context is worse than none, because the agent follows it with full confidence.

Judgment Is the Job

Now take the other half.

Once the agent has the right context and produces something, someone has to decide whether it is any good. Does the release note match what procurement actually cleared? Does this refactor cohere with the rest of the system, or did it paper over the problem? Is this strategy memo right, or just fluent? Should this email go to the investor as written?

This is not clerical. This is the work. And it does not get easier as models improve. It gets more important, because the volume of plausible output goes up. Shipper’s own example is telling: he describes data science teams whose job has shifted from running analyses to reviewing everyone else’s confident, sometimes wrong, AI-generated analyses. The bottleneck moved from production to judgment.

Judgment is also the thing models are structurally behind on. They are trained on what was already known and rewarded for being agreeable. The interesting calls, the ones that involve saying “this is wrong even though it sounds right” or “we should rewrite this from scratch rather than patch it,” are exactly the calls a compliant model avoids. That gap does not close just because the benchmark scores rise.

So the two halves of caring diverge. Curation should fall toward zero human effort. Judgment should rise to fill the time that curation used to eat. The person stops being a clerk for the agent and becomes an editor of its output. That is a better job, and a more durable one.

As models improve, the effort spent curating context falls toward zero while the effort spent on judgment rises and becomes the job.

What This Means for the Forward-Deployed Engineer

Shipper’s most concrete prediction is that the forward-deployed engineer, the person who keeps a company’s agents working, is a real and growing role that “comes out of every agent needs a human.”

I agree the role is real. I think it is going to shed half its weight.

Today that person spends a lot of their time on plumbing: wiring the agent to the right sources, re-grounding it when it drifts, maintaining the harness, keeping context fresh. Strip out the curation half and what remains is the part that was always the point: designing the system so that people with less context can do work that used to require an expert, and exercising judgment about whether the agent’s output is correct. Less time keeping the agent fed, more time deciding what good looks like and building the guardrails that enforce it.

That is not a smaller role. It is a sharper one. The filing-clerk version of the job does not scale, because the maintenance burden grows with every new agent and every new source. The judgment-and-design version scales, because the curation underneath it is carried by infrastructure rather than by a person’s memory.

The companies that get this right will not be the ones that hire an army of people to garden agents. They will be the ones that automate the gardening and point their best people at the judgment.

Shared Context Is a Lens, Not a Grant

There is a second prediction worth taking seriously: that most companies will run on a single shared agent rather than a swarm of personal ones. Shopify has one. Ramp has one. Shipper flipped from believing in personal agents to believing the near-term winner is one super-agent per company.

A shared agent has an obvious problem. The whole company queries it, but not everyone is allowed to see everything. Shipper’s example is a data bot “that knows at the warehouse level who has permission to access what.” That permission awareness is the entire ballgame. Get it wrong and the helpful company-wide agent becomes the fastest leak you have ever built.

The clean way to think about this: a shared context is a lens, not an access grant. Putting a memory into a context that the whole team can reach should change what the agent can find, not who is allowed to see it. Every retrieval still runs inside what the requesting person is already permitted to see. Adding something to a shared context can never widen its audience. The lens decides relevance. Permissions still decide visibility, independently and underneath.

One shared work context, but each viewer only sees the memories they are already permitted to see. Adding a memory to the shared context never widens who can see it.

That separation is what makes one agent safe to point at a whole company. It is also, not coincidentally, the harder half of the engineering. Anyone can build a shared context that leaks. The work is building one that is shared and governed at the same time.

The Split Is the Strategy

If you take the curation-versus-judgment split seriously, it tells you where to invest.

Stop paying people to re-explain the company to its own tools. That work should be carried by a context layer that captures the durable pieces of work as they happen, keeps the source attached, assembles the right subset for the task, and lets the human correct only the edges. The same context should follow the work across Claude, ChatGPT, Codex, Cursor, and whatever harness wins next, because the bet is on the work state being portable, not on any one tool.

Then spend the freed-up human attention on judgment. On deciding what is correct, what coheres, what ships, and what the agent should never have suggested. That is the part that compounds, the part models stay behind on, and the part worth a person’s day.

This is the practical frame for 3ngram. It is not another place to store notes, and it is not an agent that tries to replace the human. It is the layer that carries the curation half so the company can spend its judgment where judgment actually matters.

Every agent needs a human. Just not for filing.

The system should carry the context. The human should carry the judgment.