Research · Comparison report 02
2026-05-15
Refactron vs the codemod baseline. A head-to-head.
Two transforms, four other tools, identical inputs. We measured speed, coverage, and safety against the existing deterministic-codemod technology. The result is mixed in exactly the way an honest benchmark should be.
Coverage benchmark · var → const/let & format → f-string · 5 tools · identical inputs
Authors: Founder, Refactron
Published: 2026-05-15
Hardware: Apple M2
00 · Abstract
Refactron is the slowest tool we measured on var → const/let, and it is slower than LibCST on format → f-string. It is also the only tool that reaches top coverage on both transforms while never producing a single unsafe rewrite. The two pure codemod tools, which ship no verification step, finish var → const/let in under a second and write code that does not compile.
That tension is the paper. Speed without verification bought broken code in every cell where it was measured. The benchmark builds on the deterministic-refactoring tradition[3] — behaviour preservation checked, not assumed.
01 · Why this study
The engineering baseline, not the competitors.
jscodeshift and LibCST are codemod frameworks — you author codemods with them. Comby is a structural search/replace DSL. ESLint --fix is a linter's autofix. None of them is the product a team weighs Refactron against.
But they are the existing technology that performs deterministic source-to-source transformation — exactly what Refactron's engine does. A new approach earns credibility by being measured against the established one on identical inputs. This is "transform + verify versus transform only," not "our product versus theirs."
02 · Setup
Identical inputs, three axes.
Transforms: 2 (var → const/let · format → f-string)
Planted sites: 234 (126 TypeScript · 108 Python)
Runs per cell: 5 timed + 1 warm-up (discarded)
Each tool runs an equivalent codemod, written the way a competent engineer would write it and committed to the repo for audit[1]. LibCST uses Instagram's reference ConvertFormatStringCommand[2]; ESLint runs its stock prefer-const and no-var rules.
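The second transform makes a conservative tool decide, site by site, whether a rewrite is safe. The sketch below is a deliberately tiny, string-level version of that decision and assumes a very narrow safe subset: auto-numbered `{}` fields filled by bare identifiers. Anything else is left untouched. The function name `toFString` and the regex approach are illustrative assumptions. Neither Refactron nor LibCST's ConvertFormatStringCommand works on raw strings like this; both operate on a syntax tree.

```typescript
// Hypothetical, string-level sketch of a conservative format → f-string rewrite.
// Real codemods work on a syntax tree; this only illustrates the "refuse when
// unsure" decision. toFString is an illustrative name, not a real API.
const CALL = /^"([^"\\]*)"\.format\(([^()]*)\)$/;
const IDENT = /^[A-Za-z_][A-Za-z0-9_]*$/;

/** Returns the f-string, or null when the site falls outside the safe subset. */
export function toFString(expr: string): string | null {
  const call = CALL.exec(expr.trim());
  if (call === null) return null; // not a plain "...".format(...) call

  const template = call[1];
  const args = call[2]
    .split(",")
    .map((a) => a.trim())
    .filter((a) => a !== "");
  // Bare identifiers only: keyword arguments, attribute access, calls and
  // literals are all left for a smarter tool.
  if (!args.every((a) => IDENT.test(a))) return null;

  // Only auto-numbered "{}" fields; "{0}", "{x:>3}" or "{{" leave a stray brace.
  const pieces = template.split("{}");
  if (pieces.some((p) => /[{}]/.test(p))) return null;
  // Each argument must fill exactly one field, in order.
  if (pieces.length - 1 !== args.length) return null;

  let body = pieces[0];
  for (let i = 0; i < args.length; i++) {
    body += `{${args[i]}}${pieces[i + 1]}`;
  }
  return `f"${body}"`;
}
```

Every site outside that subset returns `null` and stays as written. This is roughly the trade a conservative codemod makes: a site left alone costs coverage, while a guessed rewrite risks being wrong or broken.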
Speed
Wall-clock for the whole invocation, process startup included. What a user actually waits for.
Coverage
Each planted site is classified exactly as correct, missed, wrong, or broken, matched by a stable anchor rather than by line proximity.
Safety
tsc --noEmit / py_compile plus the fixture's own test suite, run against the tool's output.
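The coverage and safety axes combine into a single per-site verdict. The sketch below shows one way to do that and rests on several assumptions. Each planted site sits on one line that ends in a marker comment. A site is "missed" if its text is unchanged and "correct" if it matches the expected rewrite. Any other change counts as "broken" if its file failed the safety check, and as "wrong" if the file passed. The `@site:` marker syntax, the ordering of these rules, and all names are hypothetical. The published harness may anchor and score sites differently.

```typescript
// Hypothetical per-site scorer. Assumes each planted site sits on one line
// ending in a marker comment such as `// @site:v12` or `# @site:f03`; the
// marker syntax and every name here are illustrative, not the published harness.
export type Verdict = "correct" | "missed" | "wrong" | "broken";

export interface PlantedSite {
  id: string; // stable anchor, independent of line numbers
  file: string;
  before: string; // the line as planted (marker stripped)
  expected: string; // the line after a correct rewrite (marker stripped)
}

const MARKER = /\s*(?:\/\/|#)\s*@site:([\w-]+)\s*$/;

/** Maps anchor id to line text (marker stripped) for one output file. */
export function extractSites(source: string): Map<string, string> {
  const sites = new Map<string, string>();
  for (const line of source.split("\n")) {
    const m = MARKER.exec(line);
    if (m !== null) sites.set(m[1], line.slice(0, m.index).trim());
  }
  return sites;
}

/** failedFiles: files whose output failed tsc --noEmit / py_compile or tests. */
export function classify(site: PlantedSite, found: Map<string, string>, failedFiles: Set<string>): Verdict {
  const after = found.get(site.id); // undefined if the tool destroyed the anchor
  if (after === site.before.trim()) return "missed";
  if (after === site.expected.trim()) return "correct";
  // Changed, but not to the expected text: did the file survive the safety check?
  return failedFiles.has(site.file) ? "broken" : "wrong";
}

/** Coverage as a percentage: correct sites over planted sites. */
export function coverage(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0;
  return (100 * verdicts.filter((v) => v === "correct").length) / verdicts.length;
}
```

The lookup is keyed by anchor, so a tool that inserts or deletes lines above a site cannot shift it onto a neighbour. That shift is the failure mode of matching by line proximity.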
03 · Results
Coverage and safety move together.
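var → const/let · TypeScript · 126 planted sites

| Tool | Median time | Coverage |
|---|---|---|
| Refactron | 5.22s | 100.0% (126/126) |
| ESLint | 0.65s | 100.0% (126/126) |
| Unguarded (jscodeshift or Comby) | 0.67s | 46.0% (58/126) |
| Unguarded (jscodeshift or Comby) | 0.29s | 47.6% (60/126) |

format → f-string · Python · 108 planted sites

| Tool | Median time | Coverage |
|---|---|---|
| Refactron | 3.76s | 99.1% (107/108) |
| LibCST | 2.68s | 57.4% (62/108) |
| Comby | 4.79s | 15.7% (17/108) |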
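Table 1. Median wall-clock time of five timed runs per cell; coverage is correctly converted sites over planted sites. Refactron and ESLint convert every var → const/let site correctly and safely, and Refactron converts 107 of 108 format sites; the pure codemod tools emit dozens of wrong rewrites that fail to compile.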
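The summary figures in Table 1 are easy to recompute from raw data. The sketch below assumes each cell records six wall-clock samples with the warm-up run first. That sample layout and both function names are assumptions. Only the site counts used below come from the table.

```typescript
// Sketch of the Table 1 arithmetic; function names are illustrative.

/** Median of the timed runs after dropping the leading warm-up sample(s). */
export function medianAfterWarmup(samples: number[], warmup: number = 1): number {
  const timed = samples.slice(warmup).sort((a, b) => a - b); // copy, then sort
  if (timed.length === 0) throw new Error("no timed runs");
  const mid = Math.floor(timed.length / 2);
  // Odd count (five runs per cell): the middle value; even: mean of the middle two.
  return timed.length % 2 === 1 ? timed[mid] : (timed[mid - 1] + timed[mid]) / 2;
}

/** Coverage in Table 1's format: one decimal place, then the raw fraction. */
export function formatCoverage(correct: number, planted: number): string {
  return `${((100 * correct) / planted).toFixed(1)}% ${correct}/${planted}`;
}
```

Applied to the table's counts, `formatCoverage(107, 108)` yields `99.1% 107/108`. The same arithmetic gives the Discussion's "~8×": 5.22 / 0.65 ≈ 8.0.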
04 · The split
Careful tools are safe. Unguarded tools are mostly fast and always broken.
Careful · safe output
Refactron (verification gate), ESLint (narrow ruleset), LibCST (conservative codemod). Every output compiles and passes tests.
Unguarded · broken output
jscodeshift and Comby transform with no verification step. Both finish var → const/let in under a second, and every cell they produced failed compilation.
Only ESLint's narrow ruleset, on the one transform it covers, was fast, high-coverage, and safe at once. Speed without verification bought broken code every time.
05 · Discussion
Where Refactron wins, and where it loses.
Wins · coverage + safety
Refactron is top-coverage on both transforms — a tie with ESLint at 100% on var → const/let, an outright win at 99.1% vs LibCST's 57.4% on format → f-string — and never emits an unsafe rewrite. No other tool here is top-coverage on both.
Loses · speed
Refactron is the slowest tool on var → const/let, about 8× slower than ESLint (5.22s vs 0.65s), and it trails LibCST on format → f-string (3.76s vs 2.68s). This is not an optimization gap to apologize for; it is the pipeline. Refactron applies a transform, re-parses every changed file, resolves every import, runs the full test suite on a shadow tree, and only then writes atomically. The 5.22s figure is the cost of that pipeline.
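The order of operations is what costs the seconds. The sketch below shows a verify-before-write gate, assuming the checks (re-parse, import resolution, test run) are injected as functions. The types and the names `applyVerified` and `writeAtomically` are hypothetical, not Refactron's API. Here the shadow tree is an in-memory map; a real gate would presumably write it to disk so that tsc, py_compile, and the test runner can read it. The atomic write uses the common write-to-temporary-then-rename pattern. That pattern replaces the target atomically on POSIX file systems when both paths are on the same volume.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical sketch of a verify-before-write gate; none of these names are
// Refactron's API. The shadow tree is an in-memory map here; a real gate would
// write it to disk so tsc, py_compile and the test runner can read it.
export type Files = Map<string, string>; // path -> contents
/** A check returns an error message, or null if the shadow tree passes. */
export type Check = (shadow: Files) => string | null;
export type Outcome =
  | { ok: true; written: string[] }
  | { ok: false; stage: string; error: string };

/** Write a temp file beside the target, then rename it over the target. */
export function writeAtomically(target: string, contents: string): void {
  const tmp = path.join(path.dirname(target), `.${path.basename(target)}.${process.pid}.tmp`);
  fs.writeFileSync(tmp, contents);
  fs.renameSync(tmp, target); // atomic replace on POSIX within one file system
}

export function applyVerified(
  original: Files,
  transform: (files: Files) => Files,
  checks: Array<[string, Check]>, // e.g. re-parse, resolve imports, run tests
  write: (file: string, contents: string) => void = writeAtomically,
): Outcome {
  const shadow = transform(new Map(original)); // the real tree is never touched
  for (const [stage, check] of checks) {
    const error = check(shadow);
    if (error !== null) return { ok: false, stage, error }; // nothing written
  }
  const written: string[] = [];
  shadow.forEach((contents, file) => {
    if (original.get(file) !== contents) {
      write(file, contents);
      written.push(file);
    }
  });
  return { ok: true, written };
}
```

Dropping the `checks` loop gives the transform-only behaviour of the unguarded tools. It saves the verification time, but nothing then stands between a wrong rewrite and the working tree.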
The honest claim is not "Refactron is fastest." It is: Refactron is the only tool measured here that reached top coverage on both transforms without ever writing broken code, and that guarantee has a price denominated in seconds.
06 · Limitations
The honest fine print.
Not a product comparison
These are frameworks and linters, not Refactron's commercial alternatives. Cursor, SonarQube, and the LLM tools belong in a separate, categorical study.
Two transforms, not ten
Refactron ships ten. These two were chosen for tool overlap, not because they are the hardest cases.
Synthetic fixtures
Ten files per transform with planted patterns. Real codebases are messier. The fixtures are published[1] so the methodology can be challenged.
Competitor codemods may not be optimal
We authored them and committed them for audit. If an expert shows us a better visitor, we rerun and republish.
07 · How this benchmark was built
It embarrassed us first.
The first run reported Refactron at 27% coverage on var → const/let. Investigation found two real bugs in Refactron's own transform — a scope-unaware reference scan and a missed AST node kind. They were fixed (27% → 100%) before publication[4]. The benchmark also caught a precision flaw in its own checker that was miscounting every tool; that was corrected too. A benchmark you publish should be one that has already embarrassed you in private.
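The first of those bugs can be pictured in miniature. The sketch below is a toy model of the var → const/let decision. It assumes declarations and later assignments have already been collected per function scope; a binding becomes `const` only if no assignment resolves to it. The scope-aware resolver walks outward to the nearest enclosing declaration. The name-only variant (`scopeAware: false`) treats any same-named assignment as a reassignment. All names are hypothetical. This illustrates the class of failure; it is not the code fixed in [4].

```typescript
// Toy model of the var → const/let decision; not Refactron's implementation.
// `var` is function-scoped, so scopes here stand for function bodies.
export interface Decl { name: string; scope: number } // a `var` declaration
export interface Write { name: string; scope: number } // `name = ...` after declaration
export type ScopeTree = Map<number, number | null>; // scope id -> parent (null = top)

/** Nearest enclosing declaration of `name`, walking outward from `from`. */
function resolve(name: string, from: number, decls: Decl[], parents: ScopeTree): Decl | undefined {
  let s: number | null = from;
  while (s !== null) {
    const scope: number = s;
    const hit = decls.find((d) => d.name === name && d.scope === scope);
    if (hit !== undefined) return hit;
    s = parents.get(scope) ?? null;
  }
  return undefined; // global or undeclared
}

/** const if nothing reassigns this binding, else let. */
export function keywordFor(
  decl: Decl,
  writes: Write[],
  decls: Decl[],
  parents: ScopeTree,
  scopeAware: boolean,
): "const" | "let" {
  const reassigned = writes.some((w) =>
    scopeAware
      ? resolve(w.name, w.scope, decls, parents) === decl // same binding?
      : w.name === decl.name, // name-only scan: any `x = ...` anywhere counts
  );
  return reassigned ? "let" : "const";
}
```

Suppose an inner function declares its own `x` and then reassigns it. The name-only scan demotes the outer `x` to `let`. The output still compiles, but it is not the expected `const`, so a per-site checker would not count that site as correct.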
Reproduce on your machine
git clone https://github.com/Refactron-ai/Refactron_Lib_TS
cd Refactron_Lib_TS && npm ci
bash bench/comparison/harness/run.sh
References
- [1] Comparison bench: fixtures, per-tool codemods, harness, raw results. github.com/Refactron-ai · bench/comparison
- [2] Instagram / LibCST. ConvertFormatStringCommand: the reference Python format codemod benchmarked here.
- [3] Opdyke, W. F. (1992). Refactoring Object-Oriented Frameworks. PhD thesis, University of Illinois Urbana-Champaign. The precondition-checking foundation that behaviour-preserving refactoring rests on.
- [4] The var_to_const_let scope-correctness fix and the printf-grammar percent converter. Refactron_Lib_TS · PR #27
- [5] Refactron 0.2.0 performance report. Research paper #01.