
Research · Performance report 01

v0.2.0 · 2026-05-15

Refactron 0.2.0. A measured look at deterministic refactoring at scale.

We measured Refactron 0.2.0 on the work users actually do — analyze a tree, plan a refactor, then verify and apply it. Every number on this page is a wall-clock measurement. Every script that produced it lives in the repo.

Analyze benchmark · v0.1.0-beta.2 → v0.2.0 · synthetic fixtures

| Fixture  | Files | v0.1   | v0.2   | Δ median |
|----------|-------|--------|--------|----------|
| 10k LOC  | 448   | 1.31s  | 1.21s  | −8%      |
| 100k LOC | 4 465 | 20.58s | 11.13s | −46%     |

Authors: Om Sherikar (Founder, Refactron)

Published: 2026-05-15

Version: refactron 0.2.0

00 · Abstract

Refactron 0.2.0 analyzes a 100k-LOC tree in a median 11.13 seconds, 46% faster than 0.1.0-beta.2 (down from 20.58 seconds), with run-to-run variance compressed by 65%. On a real Python project, the full analyze → plan → apply loop, including the 3-gate verifier running pytest on a shadow tree, completes in roughly five seconds.

This report measures the cost of safety-first deterministic refactoring[3]. Every figure is wall-clock; every harness is public.

01 · Headline result

46% faster analyze on 100k LOC

vs 0.1.0-beta.2. Median dropped from 20.58s to 11.13s, and the long-tail variance compressed by 65% — predictable enough to drop into a pre-commit hook.

[Figure: run times on 100k LOC, axis 0s to 40s, for v0.1.0-beta.2 (median 20.58s) and v0.2.0 (median 11.13s).]
Figure 1. Every measured run on 100k LOC; the horizontal bar is the median. Both spread and centre collapse from v0.1 to v0.2.

02 · Methodology

Reproducible by design.

Each measurement is the wall-clock real time reported by /usr/bin/time -p, captured over five runs after a single warm-up. We report median, min, and max — never a single best case. For the apply step, the fixture is freshly copied per iteration because the command mutates the tree.

Hardware: Apple M2 · 8 cores · 8 GB

OS: macOS 26.4 · Darwin arm64

Runtime: Node 24.2 · Python 3.13

Iterations: 5 + 1 · warm-up discarded
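The reporting rule above (discard the warm-up, then report median, min, and max) can be sketched as a small helper. This is a minimal illustration, not the code in the bench scripts: the name `summarizeRuns` and the sample timings are hypothetical, and the real harness takes its samples from /usr/bin/time -p.

```typescript
// Summary of one benchmark: the three figures the report publishes per fixture.
interface RunSummary {
  median: number;
  min: number;
  max: number;
  runs: number;
}

// Drop the leading warm-up run(s), then compute median, min, and max
// over the remaining measured samples (all values in seconds).
function summarizeRuns(samples: number[], warmups: number = 1): RunSummary {
  const measured = samples.slice(warmups);
  if (measured.length === 0) {
    throw new Error("no measured runs left after discarding warm-ups");
  }
  const sorted = measured.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    median: median,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    runs: sorted.length,
  };
}

// Hypothetical samples: one warm-up followed by five measured runs.
const exampleSamples = [2.4, 1.9, 2.1, 2.0, 2.3, 1.8];
console.log(summarizeRuns(exampleSamples));
```

Reporting the median alongside min and max keeps a single slow or fast outlier from standing in for the typical run.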

03 · Results — synthetic

How fast can we walk a cold tree?

Synthetic fixtures are generated fresh per run by bench/gen-fixture.ts[1]: mixed Python and TypeScript containing every legacy pattern that Refactron's ten transforms target. This isolates the analyze step.

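| Fixture  | Files | v0.1   | v0.2   | Spread (5 runs) | Δ    |
|----------|-------|--------|--------|-----------------|------|
| 10k LOC  | 448   | 1.31s  | 1.21s  | 1.19–1.31s      | −8%  |
| 100k LOC | 4 465 | 20.58s | 11.13s | 10.56–13.14s    | −46% |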

Table 1. Median over five runs. The Spread column gives the min and max of the five measured runs. Δ is the v0.1 → v0.2 change in median.
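The Δ column can be recomputed from the two medians in each row. The sketch below assumes Δ is the relative change in median wall-clock time; the helper name `medianDeltaPercent` is hypothetical and not part of the bench scripts.

```typescript
// Relative change in median wall-clock time, in percent.
// Negative values mean the newer version is faster.
function medianDeltaPercent(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Medians copied from Table 1 (seconds).
const table1Rows = [
  { fixture: "10k LOC", v01: 1.31, v02: 1.21 },
  { fixture: "100k LOC", v01: 20.58, v02: 11.13 },
];

for (const row of table1Rows) {
  const delta = medianDeltaPercent(row.v01, row.v02);
  console.log(row.fixture + ": " + delta.toFixed(1) + "%");
}
// 10k LOC: -7.6%
// 100k LOC: -45.9%
```

Rounded to whole percentages, these give the −8% and −46% shown in the table.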

04 · The full pipeline

Real fixture, real test suite.

Synthetic numbers isolate the analyze step. Real users run the whole loop. We measure against python-legacy-mini[2] — 9 files, 189 LOC, with a pytest suite that exercises every function the transforms touch.

Figure 2. The three pipeline stages and their measured median wall-clock.

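| Step | Stage   | Command             | Median | Work                         |
|------|---------|---------------------|--------|------------------------------|
| 01   | analyze | refactron analyze . | 0.16s  | parse · 7 analyzers · score  |
| 02   | plan    | run --dry-run .     | 1.70s  | transforms · diff render     |
| 03   | apply   | run --apply .       | 3.38s  | 3-gate verify · atomic write |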

Table 2. Median wall-clock per step, five runs each, fresh fixture copy per apply. End-to-end: ~5.2s from scan to atomically-written refactor.
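The end-to-end figure is consistent with adding the three per-stage medians. The sketch below shows that arithmetic and each stage's share of the total. It assumes the ~5.2s figure is a plain sum of medians, which the report does not state explicitly; the command strings are copied as shown in the figure and the helper names are hypothetical.

```typescript
interface Stage {
  name: string;
  command: string; // as printed in the pipeline figure
  medianSeconds: number;
}

// Per-stage medians copied from Table 2.
const pipelineStages: Stage[] = [
  { name: "analyze", command: "refactron analyze .", medianSeconds: 0.16 },
  { name: "plan", command: "run --dry-run .", medianSeconds: 1.7 },
  { name: "apply", command: "run --apply .", medianSeconds: 3.38 },
];

// Sum of the per-stage medians (an approximation of the end-to-end time).
function endToEndSeconds(stages: Stage[]): number {
  return stages.reduce((sum, stage) => sum + stage.medianSeconds, 0);
}

// Fraction of the end-to-end time spent in one stage, in percent.
function sharePercent(stage: Stage, total: number): number {
  return (stage.medianSeconds / total) * 100;
}

const pipelineTotal = endToEndSeconds(pipelineStages);
console.log("end-to-end: " + pipelineTotal.toFixed(2) + "s");
for (const stage of pipelineStages) {
  console.log(stage.name + ": " + sharePercent(stage, pipelineTotal).toFixed(1) + "%");
}
```

A sum of medians is not the same as the median of full-loop runs, so treat the total as an estimate of one complete pass.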

05 · Inside the apply step

Three gates. All or nothing.

Of the 3.38-second apply budget on this fixture, roughly 3s is the test gate: at 9 files, pytest cold-start dominates. On larger projects the balance shifts: the test gate's cost is set by the run time of your suite rather than by pytest start-up, while plan and verification overhead stay roughly constant.

Figure 3. Every refactor passes three gates before any byte is written. Any failure drops to the rejected state — your tree never changes, so there is nothing to roll back.
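The all-or-nothing behaviour described above can be sketched as a sequence of checks run against a candidate copy, with the original returned untouched on the first failure. This is an illustrative in-memory model, not Refactron's implementation: the report names only the test gate, so the gate names, the `Tree` type, and `applyWithGates` are hypothetical.

```typescript
// A gate inspects a candidate tree and either passes or fails with a reason.
type GateResult = { status: "pass" } | { status: "fail"; reason: string };

interface Gate<T> {
  name: string;
  check: (candidate: T) => GateResult;
}

type Outcome<T> =
  | { status: "applied"; tree: T }
  | { status: "rejected"; tree: T; failedGate: string; reason: string };

// Run every gate against the candidate (the "shadow" copy). On the first
// failure the original is returned as-is, so there is nothing to roll back.
function applyWithGates<T>(original: T, candidate: T, gates: Gate<T>[]): Outcome<T> {
  for (const gate of gates) {
    const result = gate.check(candidate);
    if (result.status === "fail") {
      return { status: "rejected", tree: original, failedGate: gate.name, reason: result.reason };
    }
  }
  return { status: "applied", tree: candidate };
}

// Hypothetical in-memory tree: file path -> contents.
type Tree = { [path: string]: string };

const passResult: GateResult = { status: "pass" };
function failResult(reason: string): GateResult {
  return { status: "fail", reason: reason };
}

// Three stand-in gates. Real gates would parse, verify, and run the test suite.
const exampleGates: Gate<Tree>[] = [
  {
    name: "gate 1 (hypothetical: no empty files)",
    check: (t) =>
      Object.keys(t).every((p) => t[p].length > 0) ? passResult : failResult("empty file"),
  },
  {
    name: "gate 2 (hypothetical: same file set)",
    check: (t) => (Object.keys(t).length > 0 ? passResult : failResult("no files")),
  },
  {
    name: "test gate (stand-in for running pytest)",
    check: (t) =>
      Object.keys(t).every((p) => t[p].indexOf("BROKEN") === -1)
        ? passResult
        : failResult("tests failed"),
  },
];

const beforeTree: Tree = { "app.py": "x = 1" };
const afterTree: Tree = { "app.py": "x: int = 1" };
console.log(applyWithGates(beforeTree, afterTree, exampleGates).status); // "applied"
```

Because the gates only ever see the candidate, a rejected refactor leaves the original tree exactly as it was.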

06 · Discussion

What this report does and doesn't claim.

Tool comparisons ship separately

A fair head-to-head against jscodeshift, Comby, and eslint --fix is its own study — now published as research paper #02[5].

No memory profile

Wall-clock only at v0.2. Peak RSS during the 100k LOC analyze is in the next pass.

Single hardware target

Apple M2 only. Linux x86 and Windows numbers will follow when the bench moves into CI.

500k LOC isn't here

Fixture generation alone takes ~30s and pushes 8 GB. The bench script supports SIZES=500000 bash bench/run-bench.sh; we don't publish until we can run it with headroom.

07 · Reproducibility

Run it yourself.

Both bench scripts ship in the public repo. No special hardware, no proprietary fixtures, no telemetry. If your numbers come out meaningfully different on Apple Silicon, please open an issue.

Synthetic analyze (10k + 100k LOC)

git clone https://github.com/Refactron-ai/Refactron_Lib_TS
cd Refactron_Lib_TS && npm ci
bash bench/run-bench.sh

Full pipeline (python-legacy-mini)

pip install pytest libcst requests
bash bench/run-pipeline-bench.sh

References

[1] Synthetic fixture generator and timing harness. bench/gen-fixture.ts
[2] Real Python fixture with pytest suite. fixtures/python-legacy-mini
[3] Opdyke, W. F. (1992). Refactoring Object-Oriented Frameworks. PhD thesis, University of Illinois Urbana-Champaign. The foundation that behaviour-preserving refactoring rests on.
[4] Per-file parallelization PR, the source of the 46% win. Refactron_Lib_TS · PR #23
[5] Refactron vs the codemod baseline: research paper #02.