
Research · Comparison report 02

2026-05-15

Refactron vs the codemod baseline. A head-to-head.

Two transforms, four other tools, identical inputs. We measured speed, coverage, and safety against the existing deterministic-codemod technology. The result is mixed in exactly the way an honest benchmark should be.

Coverage benchmark · var → const/let & format → f-string · 5 tools · identical inputs

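Transform           Tool           Coverage   Safe
var → const/let     Refactron      100%       safe
var → const/let     ESLint --fix   100%       safe
var → const/let     jscodeshift    46%        fail
var → const/let     Comby          47.6%      fail
format → f-string   Refactron      99.1%      safe
format → f-string   LibCST         57.4%      safe
format → f-string   Comby          15.7%      fail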

Authors: Om Sherikar, Founder, Refactron

Published: 2026-05-15

Hardware: Apple M2

00 · Abstract

Refactron posts the slowest single result we measured: 5.22s on var → const/let, about 8× ESLint's time. It is also the only tool that is top-coverage on both transforms while never producing a single unsafe rewrite. The two pure codemod tools that ship no verification step, jscodeshift and Comby, finish var → const/let in under a second, and their output failed compilation in every cell they ran.

That tension is the paper. Unguarded transformation produced broken code in every cell where it was measured. The benchmark builds on the deterministic-refactoring tradition[3], in which behaviour preservation is checked, not assumed.

01 · Why this study

The engineering baseline, not the competitors.

jscodeshift and LibCST are codemod frameworks — you author codemods with them. Comby is a structural search/replace DSL. ESLint --fix is a linter's autofix. None of them is the product a team weighs Refactron against.

But they are the existing technology that performs deterministic source-to-source transformation — exactly what Refactron's engine does. A new approach earns credibility by being measured against the established one on identical inputs. This is "transform + verify versus transform only," not "our product versus theirs."

02 · Setup

Identical inputs, three axes.

Transforms: 2 (var → const/let · format → f-string)

Planted sites: 234 (126 TypeScript · 108 Python)

Runs per cell: 5 + 1 (warm-up discarded)

Each tool runs the equivalent codemod, authored the way a competent engineer would and committed to the repo for audit. LibCST uses Instagram's reference ConvertFormatStringCommand[2]; ESLint runs its stock prefer-const + no-var rules.

Speed

Wall-clock for the whole invocation, process startup included. What a user actually waits for.
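A minimal sketch of how such a measurement could be taken, assuming one warm-up run that is discarded and five timed runs reduced to a median. The function name `time_invocation` and the placeholder command are illustrative, not part of the published harness.

```python
import statistics
import subprocess
import sys
import time


def time_invocation(cmd, runs=5, warmups=1):
    """Return (median_seconds, samples) for `cmd` over `runs` timed runs.

    Each sample spans the whole child process, startup included.
    Warm-up runs execute normally but their timings are discarded.
    """
    samples = []
    for i in range(warmups + runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        elapsed = time.perf_counter() - start
        if i >= warmups:  # keep only the timed runs
            samples.append(elapsed)
    return statistics.median(samples), samples


if __name__ == "__main__":
    # Placeholder command; a real harness would invoke each tool's CLI here.
    median, samples = time_invocation([sys.executable, "-c", "pass"])
    print(f"median {median:.3f}s over {len(samples)} runs")
```

Timing the child process from the outside, rather than instrumenting the tool, is what keeps interpreter and runtime startup inside the number.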

Coverage

Per-site exact classification — correct, missed, wrong, broken — by stable anchor, not line proximity.
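A minimal sketch of per-site classification, assuming each planted site carries a stable anchor ID and a hand-written expected rewrite. The `Site` type, the anchor names, and the rules that separate the four labels are illustrative assumptions; the report names the labels but does not spell out their definitions. The sketch only handles Python snippets.

```python
import ast
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    anchor: str    # stable ID planted in the fixture (hypothetical naming scheme)
    original: str  # code at the site before the tool ran
    expected: str  # hand-written correct rewrite


def parses(snippet: str) -> bool:
    """True if the snippet is syntactically valid Python."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False


def classify(site: Site, actual: str) -> str:
    """Label one site. These rules are assumptions, not the harness's own."""
    if actual == site.expected:
        return "correct"
    if actual == site.original:
        return "missed"  # the tool left the site untouched
    return "wrong" if parses(actual) else "broken"


def coverage(sites, outputs):
    """Look up each output by anchor, tally labels, return percent correct."""
    counts = Counter(classify(s, outputs[s.anchor]) for s in sites)
    return counts, round(100.0 * counts["correct"] / len(sites), 1)


SITES = [
    Site("fmt-001", '"{}".format(x)', 'f"{x}"'),
    Site("fmt-002", '"{} {}".format(a, b)', 'f"{a} {b}"'),
    Site("fmt-003", '"{:>5}".format(n)', 'f"{n:>5}"'),
    Site("fmt-004", '"{}!".format(y)', 'f"{y}!"'),
]
OUTPUTS = {
    "fmt-001": 'f"{x}"',                # matches the expected rewrite
    "fmt-002": '"{} {}".format(a, b)',  # unchanged
    "fmt-003": 'f"{n>5}"',              # parses, but means something else
    "fmt-004": 'f"{y"',                 # unterminated replacement field
}
counts, pct = coverage(SITES, OUTPUTS)
print(dict(counts), pct)
```

Keying the lookup on the anchor rather than on a line number means a tool that inserts or deletes lines elsewhere in the file cannot shift a site into the wrong bucket.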

Safety

tsc --noEmit / py_compile plus the fixture's own test suite, run against the tool's output.
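The Python half of that gate can be sketched with the standard library's `py_compile`. This is an illustration under assumptions: the helper name `python_output_compiles` is hypothetical, and the TypeScript check and the fixture test suite are not shown.

```python
import os
import py_compile
import tempfile


def python_output_compiles(source: str) -> bool:
    """Return True if `source` byte-compiles, as a py_compile gate would require."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "candidate.py")
        with open(src, "w", encoding="utf-8") as fh:
            fh.write(source)
        try:
            # doraise=True surfaces a compile error as PyCompileError
            # instead of printing it to stderr.
            py_compile.compile(
                src, cfile=os.path.join(tmp, "candidate.pyc"), doraise=True
            )
        except py_compile.PyCompileError:
            return False
        return True


print(python_output_compiles('name = "x"\nprint(f"hi {name}")\n'))
print(python_output_compiles('print(f"hi {name")\n'))
```

Compiling only proves the output is syntactically valid; it takes the test suite to show the rewrite kept the program's behaviour.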

03 · Results

Coverage and safety move together.

[Figure 1: bar chart, coverage from 0% to 100%. Panel var → const/let: Refactron 100%, ESLint 100%, jscodeshift 46%, Comby 47.6%. Panel format → f-string: Refactron 99.1%, LibCST 57.4%, Comby 15.7%.]
Figure 1. Correct-rewrite coverage per tool. Bar colour encodes safety — green compiled and passed tests, red did not.

var → const/let

TypeScript · 126 planted sites

Tool           Speed   Coverage           Wrong   Safety
Refactron      5.22s   100.0% (126/126)   0       safe
ESLint --fix   0.65s   100.0% (126/126)   0       safe
jscodeshift    0.67s   46.0% (58/126)     55      fail
Comby          0.29s   47.6% (60/126)     66      fail

format → f-string

Python · 108 planted sites

Tool        Speed   Coverage          Wrong           Safety
Refactron   3.76s   99.1% (107/108)   0               safe
LibCST      2.68s   57.4% (62/108)    0               safe
Comby       4.79s   15.7% (17/108)    0 (74 broken)   fail

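Table 1. Median of five runs. Refactron and ESLint convert every var → const/let site correctly and safely, and Refactron converts 107 of 108 format → f-string sites. jscodeshift and Comby emit dozens of wrong or broken rewrites, and their output fails the safety check in every cell.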
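The percentages in Table 1 follow directly from the per-site counts, and the arithmetic is easy to recheck. The sketch below copies the counts and median times from the table; the dictionary layout and helper names are illustrative only.

```python
# (correct sites, planted sites, median seconds), copied from Table 1.
TABLE_1 = {
    ("var", "Refactron"): (126, 126, 5.22),
    ("var", "ESLint --fix"): (126, 126, 0.65),
    ("var", "jscodeshift"): (58, 126, 0.67),
    ("var", "Comby"): (60, 126, 0.29),
    ("fmt", "Refactron"): (107, 108, 3.76),
    ("fmt", "LibCST"): (62, 108, 2.68),
    ("fmt", "Comby"): (17, 108, 4.79),
}


def coverage_pct(correct: int, total: int) -> float:
    """Coverage as a percentage, rounded to one decimal like the table."""
    return round(100.0 * correct / total, 1)


def slowdown(cell, baseline) -> float:
    """How many times slower `cell` is than `baseline` within TABLE_1."""
    return TABLE_1[cell][2] / TABLE_1[baseline][2]


for (transform, tool), (correct, total, secs) in TABLE_1.items():
    print(f"{transform:4} {tool:13} {coverage_pct(correct, total):5.1f}%  {secs:.2f}s")

ratio = slowdown(("var", "Refactron"), ("var", "ESLint --fix"))
print(f"Refactron vs ESLint on var: {ratio:.1f}x")
```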

04 · The split

Careful tools are safe. Unguarded tools are mostly fast and always broken.

[Figure 2: scatter plot. X-axis: speed in wall-clock seconds, 0s to 5s. Y-axis: coverage, 0% to 100%. Points: Refactron · var, Refactron · fmt, ESLint · var, Comby · var, jscodeshift · var, LibCST · fmt, Comby · fmt. Annotation: all safe results land in one band. Legend: safe = compiles + tests pass; unsafe = failed compilation.]
Figure 2. Every measured cell, speed against coverage. Safe results (green) sit in a high-coverage band; unsafe results (red) sit below 50%.

Careful · safe output

Refactron (verification gate), ESLint (narrow ruleset), LibCST (conservative codemod). Every output compiles and passes tests.

Unguarded · broken output

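jscodeshift and Comby transform with no verification step. Both finished var → const/let in under a second (Comby took 4.79s on format → f-string), and every one of their cells failed compilation.

Only one cell in this study was fast, high-coverage, and safe at once: ESLint on var → const/let, with a ruleset narrow enough to stay safe. Every unguarded cell produced broken code, however quickly it ran.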

05 · Discussion

Where Refactron wins, and where it loses.

Wins · coverage + safety

Refactron is top-coverage on both transforms — a tie with ESLint at 100% on var → const/let, an outright win at 99.1% vs LibCST's 57.4% on format → f-string — and never emits an unsafe rewrite. No other tool here is top-coverage on both.

Loses · speed

Refactron posts the slowest time measured on var → const/let, roughly 8× ESLint's (5.22s vs 0.65s), and on format → f-string only Comby is slower. This is not an optimization gap to apologize for; it is the pipeline. Refactron applies a transform, re-parses every changed file, resolves every import, runs the full test suite on a shadow tree, then writes atomically. The 5.22s figure is that pipeline.
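The shape of a transform-then-verify pipeline can be sketched in a few lines. This is not Refactron's implementation: `apply_verified`, `transform`, and `run_tests` are hypothetical names, the sketch handles Python files only, import resolution is omitted, and atomicity here is per file (a rename via `os.replace`), not across the whole tree.

```python
import ast
import os
import shutil
import tempfile
from pathlib import Path


def apply_verified(root, transform, run_tests):
    """Apply `transform` to every .py file under `root`, committing only
    if the shadow copy re-parses and `run_tests(shadow_dir)` returns True.

    Returns the changed relative paths, or [] if verification failed.
    """
    root = Path(root)
    with tempfile.TemporaryDirectory() as tmp:
        shadow = Path(tmp) / "shadow"
        shutil.copytree(root, shadow)  # work on a copy, never on the original

        changed = []
        for path in sorted(shadow.rglob("*.py")):
            before = path.read_text(encoding="utf-8")
            after = transform(before)
            if after != before:
                path.write_text(after, encoding="utf-8")
                changed.append(path.relative_to(shadow))

        # Gate 1: every changed file must still parse.
        for rel in changed:
            try:
                ast.parse((shadow / rel).read_text(encoding="utf-8"))
            except SyntaxError:
                return []

        # Gate 2: the caller's test suite must pass against the shadow tree.
        if not run_tests(shadow):
            return []

        # Commit: stage next to each target, then rename over it.
        for rel in changed:
            target = root / rel
            staged = target.with_name(target.name + ".staged")
            shutil.copyfile(shadow / rel, staged)
            os.replace(staged, target)
        return changed
```

Every gate runs before the first byte of the real tree changes, which is where the extra seconds go: a rejected transform costs time but leaves the original files untouched.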

The honest claim is not "Refactron is fastest." It is: Refactron is the only tool measured here that reached top coverage on both transforms without ever writing broken code, and that guarantee has a price denominated in seconds.

06 · Limitations

The honest fine print.

Not a product comparison

These are frameworks and linters, not Refactron's commercial alternatives. Cursor, SonarQube, and the LLM tools belong in a separate, categorical study.

Two transforms, not ten

Refactron ships ten. These two were chosen for tool overlap, not because they are the hardest cases.

Synthetic fixtures

Ten files per transform with planted patterns. Real codebases are messier. The fixtures are published so the methodology can be challenged.

Competitor codemods may not be optimal

We authored them and committed them for audit. If an expert shows us a better visitor, we rerun and republish.

07 · How this benchmark was built

It embarrassed us first.

The first run reported Refactron at 27% coverage on var → const/let. Investigation found two real bugs in Refactron's own transform — a scope-unaware reference scan and a missed AST node kind. They were fixed (27% → 100%) before publication[4]. The benchmark also caught a precision flaw in its own checker that was miscounting every tool; that was corrected too. A benchmark you publish should be one that has already embarrassed you in private.

Reproduce on your machine

git clone https://github.com/Refactron-ai/Refactron_Lib_TS
cd Refactron_Lib_TS && npm ci
bash bench/comparison/harness/run.sh

References

[1] Comparison bench: fixtures, per-tool codemods, harness, raw results. github.com/Refactron-ai · bench/comparison
[2] Instagram / LibCST. ConvertFormatStringCommand: the reference Python format codemod benchmarked here.
[3] Opdyke, W. F. (1992). Refactoring Object-Oriented Frameworks. PhD thesis, University of Illinois Urbana-Champaign. The precondition-checking foundation that behaviour-preserving refactoring rests on.
[4] The var_to_const_let scope-correctness fix and the printf-grammar percent converter. Refactron_Lib_TS · PR #27
[5] Refactron 0.2.0 performance report: research paper #01.