The Three-Gate Safety Model: Why We Run Your Tests Before Touching Your Code
Every existing AI refactoring tool generates code and hopes you catch the bugs in review. Refactron's bet is that you should be able to delete the review step for the refactors it makes, because the engine has already proved they do not break anything. The way you prove that is gates: three of them, run in sequence, each far cheaper to run than the failure it catches is to recover from. Here is how they work.
Gate 1: syntax
The syntax gate re-parses every modified file using a syntax-tree library for that language: LibCST for Python, the TypeScript Compiler API for .ts and .tsx files. It catches the dumbest failures fastest, and it runs first because it is the cheapest gate by an order of magnitude.
What it catches: a missing bracket, an unterminated f-string, a malformed import statement, a decorator that lost its closing paren during a rewrite. What it cannot catch: anything that crosses a file boundary. A syntactically valid file can still import a symbol that no longer exists.
This gate exists even though the refactorer claims to round-trip every AST it touches. Round-trip bugs happen. The cost of running a parse is milliseconds per file. The cost of writing a broken file and discovering it in CI is hours. We run the gate.
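A minimal sketch of the re-parse check, not Refactron's implementation. It uses Python's standard-library ast module as a stand-in for LibCST so the example has no dependencies, and syntax_gate is a hypothetical name.

```python
import ast


def syntax_gate(modified: dict[str, str]) -> list[str]:
    """Re-parse each modified file and return one message per failure.

    `modified` maps a file path to its post-refactor source text.
    Hypothetical helper: stdlib `ast` stands in for LibCST here.
    """
    failures = []
    for path, source in modified.items():
        try:
            ast.parse(source, filename=path)
        except SyntaxError as exc:
            failures.append(f"{path}:{exc.lineno}: {exc.msg}")
    return failures


# One valid file and one with a broken parameter list.
report = syntax_gate({
    "ok.py": "x: list[int] = [1, 2]\n",
    "broken.py": "def f(:\n    pass\n",
})
print(report or "PASS")
```

An empty list means the gate passes. Anything else names the file and line, which is all the caller needs to abort before the more expensive gates run.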
Gate 2: imports
The imports gate walks the project's import graph and verifies that every import statement, from-import, require call, and TypeScript module specifier still resolves. It walks reverse imports too — if a refactor renamed a symbol, every file that imports that symbol gets checked, not just the file the rename happened in.
This is the gate that catches the cross-file damage the syntax gate cannot see. It runs once per refactor against the shadow tree, after gate 1 passes, before the test runner spins up.
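The cross-file check can be sketched as follows, covering Python from-imports only. This is an illustration under stated assumptions, not Refactron's code: the project is modelled as a dict of module name to source, only names bound at a module's top level count as importable, and imports_gate and top_level_names are hypothetical helpers. Because the sketch re-checks every module, importers of a changed module are covered automatically; a real gate would presumably narrow the walk to reverse dependents.

```python
import ast


def top_level_names(source: str) -> set[str]:
    """Names a module binds at top level: defs, classes, assignments, imports."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names.add(node.target.id)
    return names


def imports_gate(project: dict[str, str]) -> list[str]:
    """Check every absolute `from M import N` whose M is inside the project.

    Imports from modules outside the project (stdlib, third party) and
    relative imports are ignored in this sketch.
    """
    exports = {mod: top_level_names(src) for mod, src in project.items()}
    broken = []
    for mod, src in project.items():
        for node in ast.walk(ast.parse(src)):
            if not isinstance(node, ast.ImportFrom) or node.level != 0:
                continue
            if node.module not in exports:
                continue
            for alias in node.names:
                if alias.name == "*" or alias.name in exports[node.module]:
                    continue
                if f"{node.module}.{alias.name}" in project:
                    continue  # importing a submodule, not a symbol
                broken.append(f"{mod}: cannot import {alias.name!r} from {node.module!r}")
    return broken


# Hypothetical two-module project: `app` relies on a name re-exported by `shapes`.
after_refactor = {
    "shapes": "SIDES: list[int] = [3, 4]\n",
    "app": "from shapes import List, SIDES\n",
}
print(imports_gate(after_refactor))
```

Both files in the example parse cleanly, so a syntax-only check would pass them. The failure only shows up once the importer is checked against what the imported module still defines.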
Gate 3: tests
The tests gate auto-detects the project's test runner by looking for the configuration file that defines it. For Python: pytest.ini, or a [tool.pytest] section in pyproject.toml. For TypeScript: vitest.config.ts, vitest.config.js, jest.config.ts, jest.config.js, or the equivalent fields in package.json. If no runner is detected, the gate short-circuits with an explicit message — it does not silently skip the most important gate in the system.
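Detection by configuration file can be sketched like this. The file names come from the paragraph above; everything else is an assumption for illustration: detect_runner is a hypothetical name, the pyproject.toml check is a plain substring match rather than a TOML parse, and "the equivalent fields in package.json" is read as a top-level "jest" key.

```python
import json
from pathlib import Path
from typing import Optional

# TypeScript config files named in the text, mapped to their runner.
TS_CONFIGS = {
    "vitest.config.ts": "vitest",
    "vitest.config.js": "vitest",
    "jest.config.ts": "jest",
    "jest.config.js": "jest",
}


def detect_runner(root: Path) -> Optional[str]:
    """Return "pytest", "vitest", "jest", or None if nothing is configured."""
    if (root / "pytest.ini").is_file():
        return "pytest"
    pyproject = root / "pyproject.toml"
    # Simplification: substring match instead of parsing the TOML.
    if pyproject.is_file() and "[tool.pytest" in pyproject.read_text(encoding="utf-8"):
        return "pytest"
    for name, runner in TS_CONFIGS.items():
        if (root / name).is_file():
            return runner
    package_json = root / "package.json"
    if package_json.is_file():
        pkg = json.loads(package_json.read_text(encoding="utf-8"))
        # Assumption: only a top-level "jest" key is treated as configuration.
        if "jest" in pkg:
            return "jest"
    return None
```

A None result is the short-circuit case. The caller should report it explicitly rather than treat it as a pass.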
When a runner is detected, the gate runs the suite twice. First, against the baseline — your project as it exists before any refactor — with up to three retry attempts to absorb flake. Then again, against the shadow tree after the refactor has been applied. The before-and-after split is the only honest way to attribute a failure to the refactor.
If the baseline fails, the refactor aborts. We will not pretend a refactor passed tests when the suite was already broken before we touched the code. If the baseline passes and the shadow run fails, we know with certainty that the refactor introduced the failure, and we tell you which tests started failing.
The shadow tree
The shadow tree is a hardlinked mirror of your project. On filesystems that support hardlinks (every modern Unix, NTFS) the shadow tree shares inodes with your real files, so creating it is effectively free. Most operations during verification are reads, and reads against a hardlink are reads against the original. The shadow tree only diverges from your real tree when the refactor changes a file: the change is written as a new file in the shadow tree, copy-on-write style, rather than through the link, so the original's bytes are never modified.
On filesystems that do not support hardlinks — older Windows configurations, certain network mounts, FAT — the shadow tree falls back to a full copy. Slower, but functionally equivalent.
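A minimal sketch of a hardlinked mirror with a copy fallback, using only the standard library. The function names are hypothetical, and the write helper shows one way to get the copy-on-write behaviour described above (temp file plus rename); it is an assumption, not a description of Refactron's internals. The destination is assumed to live outside the source directory.

```python
import os
import shutil
import tempfile
from pathlib import Path


def make_shadow_tree(src: Path, dst: Path) -> None:
    """Mirror `src` into `dst`, hardlinking files and copying where links fail."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = Path(dirpath).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            source = Path(dirpath) / name
            target = dst / rel / name
            try:
                os.link(source, target)       # shares the inode, no data copied
            except OSError:
                shutil.copy2(source, target)  # fallback: full copy


def write_in_shadow(target: Path, text: str) -> None:
    """Replace a shadow file without writing through the hardlink.

    Writing a fresh temp file and renaming it over `target` gives the shadow
    path a new inode, so the original file's contents never change.
    """
    fd, tmp = tempfile.mkstemp(dir=target.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp, target)
```

The rename in write_in_shadow is the important detail. Opening a hardlinked file for writing in place would modify the original as well, because both paths point at the same inode.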
The original tree is only touched at the very end, after every gate has passed.
Atomic batch write
When the three gates pass, the refactor writes its files using the write-file-atomic library and a two-phase rename. Each file is written to a temporary path, fsync'd, and renamed into place. Every file in the batch commits, or none does.
The invariant is: there is no state Refactron can leave your tree in that a git checkout cannot recover from. If Refactron crashes in the middle of a write, you do not end up with half a refactor. You end up with the pre-refactor state. This is non-negotiable.
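The write, fsync, rename sequence can be sketched in Python as a rough analogue of what write-file-atomic does per file. It is not that library's API, and atomic_batch_write and the .tmp suffix are hypothetical. The sketch guarantees that a failure while staging leaves every target untouched. A crash between renames in the second phase would need a journal or rollback step, which is omitted here.

```python
import os
from pathlib import Path


def atomic_batch_write(root: Path, files: dict[str, str]) -> None:
    """Stage every file next to its target, then rename the batch into place."""
    staged: list[tuple[Path, Path]] = []
    try:
        # Phase 1: write and fsync each temp file. Targets are not touched yet.
        for rel, text in files.items():
            target = root / rel
            tmp = target.with_name(target.name + ".tmp")
            staged.append((tmp, target))
            with open(tmp, "w", encoding="utf-8") as handle:
                handle.write(text)
                handle.flush()
                os.fsync(handle.fileno())
    except BaseException:
        # Staging failed: discard the temp files and leave the tree as it was.
        for tmp, _target in staged:
            tmp.unlink(missing_ok=True)
        raise
    # Phase 2: each os.replace is atomic for its file on the same filesystem.
    for tmp, target in staged:
        os.replace(tmp, target)
```

Staging next to the target keeps the temp file on the same filesystem as the final path, which is the condition under which the rename is atomic.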
What gate 1 alone would miss
Consider pep585_generics, one of the transforms that shipped in v0.2.3. Suppose your file foo.py uses typing.List[str] in a few places and has from typing import List at the top. The transform rewrites every List[X] to list[X] and removes the now-unused from typing import List line. The resulting foo.py is syntactically valid. Gate 1 passes.
But bar.py has from foo import List, because at some point a developer re-exported it. After the refactor, that import no longer resolves. Gate 1 cannot see this — it only parsed foo.py. Gate 2 catches it because it walks the import graph and notices that bar.py is now broken.
What gate 2 alone would miss
Consider a hypothetical transform that rewrites a SQLAlchemy query expression into a different but supposedly equivalent shape. The rewritten query has the same return type, the same module dependencies, the same imports. Gate 1 passes. Gate 2 passes.
But the rewrite reorders the WHERE clauses in a way that changes the generated SQL string, and there is a test in the suite that pins the generated SQL — perhaps a snapshot test, perhaps an explicit assertEqual on the compiled query. The test fails. Gate 3 catches what gates 1 and 2 cannot.
These gates are not theoretical. Every refactor we have shipped was caught by at least one of them during development, and every catch has informed the refusal logic of the transform that produced it.
Code: example of a gate-3 failure
Here is what a gate-3 failure looks like at the CLI:
[gate 1 / syntax] PASS 3 files
[gate 2 / imports] PASS imports resolve
[gate 3 / tests] FAIL 2 failing (was 0)
tests/test_users.py::test_serialize_with_none
tests/test_users.py::test_optional_field
Refactor aborted. Your tree is untouched.
Two tests started failing as a direct result of the refactor. Refactron reports them by name, abandons the shadow tree, and exits non-zero. Your working directory is byte-for-byte identical to what it was before you ran the command. There is no cleanup step. There is no "did it actually revert?" anxiety. The original tree was never written to.
What this costs
A typical refactor on a 10K-LOC Python project runs all three gates in eight to forty-five seconds, depending on the size of the test suite. The test gate dominates that range — gates 1 and 2 finish in under a second on projects up to that size. On larger projects, the test gate scales linearly with the suite.
We do not apologize for the cost. The alternative is a refactor that passes review and fails in staging, or worse, in production. Three gates and a thirty-second wait is the difference between "refactoring is something you do on Friday afternoon before a long weekend" and "refactoring is something you escalate to the senior engineer and put off for a quarter." The test gate is the gate that earns the deletion of the review step. Without it, every other gate is just a faster way to ship a regression.