Autonomy scales with reversibility, not trust

For a long time I framed agent permissions the wrong way: which decisions do I trust it to make. That framing has no good answer, because trust has no units.

The question I kept asking about my AI agent was a trust question. Can I trust it to commit code? To deploy? To send a message? To spend money? Each one felt like a judgment about the agent’s reliability, a dial I would turn up as it earned my confidence and down when it slipped.

The problem with that frame is that trust is not measurable. There is no number on it. So every permission decision became a gut call that could drift in either direction depending on my mood, the last thing that went wrong, how the week had gone. A system built on that is a system with no principled line anywhere.

i.The reframe: blast radius times reversibility

The reframe that fixed it: stop asking what decisions can the agent make, and ask what is the blast radius of this action, and is it reversible. Two axes, both objective.

Blast radius is how far the consequence reaches. Does it stay inside my own files, or does it touch the outside world, other people, money, public surfaces. Reversibility is whether I can undo it. Can I revert this, or is it a one-way door.

Put those together and the picture sorts itself. An action with a small blast radius that is fully reversible, the agent editing its own notes, drafting, staging, writing to a git-backed store, is safe to automate completely, because the worst case is a revert. An action with a large blast radius that is irreversible, publishing under my name, sending, spending, deleting, is the one to gate, because there is no undo and the consequence reaches other people. Neither of those calls required me to assess how much I trust the agent. They fell out of the geometry of the action.

On the lever You do not negotiate trust.You engineerrecoverability.

ii.Build the undo, and autonomy is free

Here is the move that follows, and it is the practical payoff. If reversibility is what determines whether an action can be automated, then you can manufacture autonomy by manufacturing reversibility. Build a clean undo and the autonomy falls out for free.

Autonomy scales with reversibility, not trust

That is what git is, for code: an undo so reliable that letting an agent write code is low-risk, because any change can be reverted. That is what a drafts folder is, for content: a holding pen so the agent can produce freely without anything reaching the public. That is what a staging environment is, for deploys. Each of those is an engineered undo, and each one converts a category of action from gate-it into let-it-run, not by trusting the agent more, but by making its mistakes cheap to reverse.

So the design lever for safe autonomy is not building more trust. It is building more recoverability. Every undo you add expands the zone where the agent can operate freely, because you have lowered the cost of it being wrong to near zero.

iii.Why trust is the wrong variable

Trust fails as the variable for a deeper reason than being unmeasurable. Even a fully trustworthy agent makes mistakes, because it operates on incomplete information, ambiguous instructions, and a world that surprises it. Trust, at its best, lowers the rate of mistakes. It never makes the cost of a mistake zero. And for an irreversible action, the cost is what matters, not the rate. A rare mistake on a one-way door is still catastrophic.

Reversibility attacks the cost directly. It does not care how often the agent errs. It ensures that when the agent errs inside the reversible zone, the error is undoable, full stop. That is a stronger guarantee than any amount of trust, and it is one you can actually engineer rather than hope for.

Build a clean undoand autonomyfalls out for free.

iv.Engineer the recoverability

So if you are deciding how much autonomy to give an agent, do not start with how much you trust it. Start with the actions, sort them by blast radius and reversibility, and then go build undos for everything you can. Version control, drafts, staging, holding pens, soft deletes. Each undo you add is a piece of autonomy you can grant without a leap of faith.

What is left after that, the genuinely irreversible, large-blast-radius actions, is a small set, and that small set is where the human gate belongs. You will find it is a much shorter list than your trust-based instinct wanted to gate. The agent runs free across the broad reversible middle, and you stand the one guard at the one-way doors. That is autonomy scaled by recoverability, and it holds steady no matter what kind of week you are having.

Drafted with Bishop, my AI partner.
Words picked, edited, and approved by me. Model provenance: Claude Code (Claude Opus and Sonnet)