The Abizor Manifesto for AI-Made Software Changes

The work changed.

AI agents now write a growing share of the world's software. METR measures agent capability by the length of the tasks a model can complete, where length is the time the task takes a human. It reports that this time horizon has doubled roughly every seven months across recent frontier-model history, and that its replication on SWE-bench shows an even faster trend (METR). Agents open pull requests faster than any team can review them. The hard part of shipping is no longer writing the code. It is trusting it.

The agent's word is not trust.

An agent will tell you the change is done. It sounds confident. It passes the checks. But the system that wrote the change is the worst possible judge of whether the change is right — these systems drift from intent, optimize the wrong thing, and can present work that is not what it appears to be. Confidence is not evidence. A green check is not a decision.

Execution is not authority.

The agent may execute a change. It must not also be the authority that declares the change ready to ship. Whatever decides whether a change is safe to merge has to be independent of whatever made it. This is not a feature request. It is the only arrangement that can be trusted.

Abizor is that independent decision.

For every AI pull request, Abizor produces one thing: an independent, replayable record of what the change is and whether it should move forward. The record has four parts: intent, scope, evidence, decision. It answers one question, "what is the next safe step?", with exactly one of four typed answers: allowed, blocked, needs_human, done. No advisory maybe. No chat-only approval. Blocked means blocked.
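
To make "typed answers" concrete, here is a minimal sketch in Python. It is an illustration, not Abizor's actual schema: the names Decision, DecisionRecord, and may_proceed, the field types, and the rule that only "allowed" lets a change move forward are all assumptions. Only the four record parts and the four answer values come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Decision(Enum):
    """The four typed answers named in the manifesto. Nothing else parses."""
    ALLOWED = "allowed"
    BLOCKED = "blocked"
    NEEDS_HUMAN = "needs_human"
    DONE = "done"


@dataclass(frozen=True)  # frozen: a record cannot be edited after it is made
class DecisionRecord:
    # Hypothetical field types; the manifesto names only the four parts.
    intent: str                 # what the change is meant to do
    scope: Tuple[str, ...]      # e.g. the paths the change touches
    evidence: Tuple[str, ...]   # e.g. references into the run record
    decision: Decision


def may_proceed(record: DecisionRecord) -> bool:
    # Assumption: only an explicit "allowed" moves a change forward.
    # "blocked", "needs_human", and "done" all leave no automatic next step.
    return record.decision is Decision.ALLOWED
```

Because Decision is a closed enum, Decision("maybe") raises ValueError instead of producing a fifth, advisory answer. A caller that gates merges on may_proceed cannot treat a blocked record as a soft warning.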

The record is not the agent's summary. It is generated from a verifiable run record, and the run package can be checked locally. The decision is made by Abizor — not by the agent that wrote the code.
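
One way a run package could be checked locally is by comparing content digests against a manifest. The sketch below assumes a hypothetical package layout: a manifest that maps each file name to a SHA-256 hex digest. The manifesto does not specify Abizor's real package format, so verify_package and its inputs are illustrative only.

```python
import hashlib
import hmac
from typing import Dict, List


def verify_package(files: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every digest matched.

    files    -- hypothetical: file name -> raw bytes from the run package
    manifest -- hypothetical: file name -> expected SHA-256 hex digest
    """
    problems = []
    for name in sorted(manifest):
        if name not in files:
            problems.append(f"missing: {name}")
            continue
        actual = hashlib.sha256(files[name]).hexdigest()
        # compare_digest is a constant-time string comparison
        if not hmac.compare_digest(actual, manifest[name]):
            problems.append(f"digest mismatch: {name}")
    # Files that the manifest never mentions are also reported.
    for name in sorted(files):
        if name not in manifest:
            problems.append(f"unlisted: {name}")
    return problems
```

The point of the sketch is that the check needs nothing from the agent or from a remote service: anyone holding the package and its manifest can rerun it and get the same result.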

Owned by no agent.

Abizor works with every agent — Copilot, Cursor, Claude Code, Codex, and whatever ships next — and is owned by none of them. An authority that belongs to the agent vendor judges its own work. Independence is not a tagline; it is the entire point. Abizor reads the pull request, not the agent.

What this does and doesn't claim.

An independent decision record is worth nothing if it overclaims, so we are precise. The record states what an AI change is and whether Abizor decided it was ready to move forward, backed by a verifiable run record you can check locally. It does not say the code is secure, correct, or compliant, and it does not stand in for a human's approval. We do not claim safety or compliance guarantees we cannot back with evidence. The discipline is the credibility.

The standard.

As AI writes more of the world's software, every team needs one honest answer to one question: is this AI change still the change we meant to ship, and is it safe to move forward? That answer should be independent, replayable, and the same regardless of which agent wrote the code. That is the standard we are building.

Agents execute. Abizor decides.