Article · Jan 16, 2026
Why AI coding tools rewrite full files instead of using diffs
Cursor’s fast apply, Morph, and Aider all converge: full-file rewrites beat diffs under 400 lines. Three reasons rooted in how language models work.
If you have used Cursor, Aider, or Morph’s apply API for any non-trivial edit, you have probably noticed that these tools prefer to rewrite an entire file rather than generate a unified diff. The behaviour looks wasteful at first: more tokens, more latency, more bytes on the wire. For files under roughly 400 lines, it is the empirically correct choice.
Unified diff: a patch format that describes file changes as hunks. Each hunk opens with an @@ header giving the affected line ranges, and each line inside it is prefixed with + (added), - (removed), or a space (unchanged context). Applying a unified diff typically requires the target file to match the patch’s context lines exactly, including whitespace.
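For readers who have not looked at one recently, Python’s standard-library difflib can produce a unified diff for a one-line change. This only illustrates the format; the file name greet.py and its contents are made up.

```python
import difflib

# A made-up three-line file and an edited copy that changes one line.
old = ["def greet(name):\n", "    print('hi', name)\n", "    return None\n"]
new = ["def greet(name):\n", "    print('hello', name)\n", "    return None\n"]

# unified_diff yields two file-header lines, then one "@@" hunk header,
# then context lines (leading space), removals ("-") and additions ("+").
patch = list(difflib.unified_diff(old, new, fromfile="a/greet.py", tofile="b/greet.py"))
print("".join(patch), end="")
# --- a/greet.py
# +++ b/greet.py
# @@ -1,3 +1,3 @@
#  def greet(name):
# -    print('hi', name)
# +    print('hello', name)
#      return None
```

The header reads "lines 1-3 of the old file become lines 1-3 of the new one". Both counts include the unchanged context lines, which is why an applier has to find those lines intact in the target file.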
What the research actually shows
Starting with Aider’s issue #625 in June 2024, which found that “fully rewriting the full file outperforms aider-like diffs for files under 400 lines,” three teams published converging results within twelve months:
- Aider issue #625 (June 2024). Paul Gauthier benchmarked search-and-replace blocks against full-file rewrites on a controlled edit task. Rewrites had a measurably higher success rate under 400 lines.
- Cursor’s instant-apply post (October 2024). The Cursor team described their “fast apply” architecture: a frontier model emits the change in structured form, then a small fine-tuned model rewrites the file at roughly 1,000 tokens per second.
- Morph’s fast-apply paper (2025). Morph generalized Cursor’s pattern into a hosted API. According to Morph, any agent asking a frontier model to rewrite the file can swap in Morph and cut token usage by 50-60% and latency by 90% or more, while keeping the rewrite-style architecture.
Same conclusion, different starting points. The reasons are rooted in how language models actually work.
Why training data distribution matters
Code on GitHub is overwhelmingly stored as complete files, not as patch hunks. When a language model is pre-trained on a large code corpus, the ratio of full-file examples to valid-diff examples is roughly 1,000:1.
This strongly shapes the model’s priors. A model asked to emit a complete file is operating in territory it has seen millions of times. A model asked to emit a unified diff is operating in territory it has seen thousands of times, mostly in commits and blog posts, and almost never as the primary output of a long synthesis task.
Fabian Hertwig’s analysis and Morph’s diff-format breakdown both put unified-diff success rates at 70-80% on complex files, with the gap widening as file complexity grows. Full-file rewrites under the same conditions exceed 95%.
How tokenizer behaviour breaks diff syntax
Unified diff format is unforgiving. Every line starts with one of four characters (+, -, a space, or @). Hunk headers have the form @@ -old_start,old_count +new_start,new_count @@. Context lines must match the source file character-for-character, including whitespace.
Byte-pair encoding (BPE): the tokenization method used by Claude, GPT, and Llama. Rather than splitting text at word boundaries, BPE merges frequent character sequences into single tokens. Runs of leading whitespace are often merged into multi-space tokens or folded into the following code token, rather than being emitted one space at a time.
This strictness collides directly with BPE. When a model emits a diff hunk, it has to get the prefix character, the indentation level, and the code itself right at the same time. A drift of one space, or a whitespace token merged differently from the source, produces a patch that fails to apply entirely.
Full-file rewrites have none of this. The model emits code, the tool writes the file, and the next read is the source of truth. An error that would make a diff fail to apply instead produces slightly different code, which the test suite can catch.
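The strictness described above can be demonstrated with a deliberately minimal single-hunk applier in Python. This is a sketch, not how git apply or GNU patch are implemented (GNU patch, for example, can apply hunks at an offset or with fuzz). apply_hunk and PatchError are hypothetical names. The drifted hunk differs from the good one by a single space of indentation on one context line, and it is rejected outright.

```python
import re

# Matches "@@ -old_start[,old_count] +new_start[,new_count] @@".
# The counts are parsed by the pattern but not checked in this sketch.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class PatchError(Exception):
    """Raised when a hunk cannot be applied exactly."""


def apply_hunk(lines: list[str], hunk: list[str]) -> list[str]:
    """Apply one unified-diff hunk to `lines` (lines without trailing newlines).

    Every context (' ') and removed ('-') line must equal the file's line at
    that position exactly, whitespace included; there is no fuzz or offset search.
    """
    m = HUNK_HEADER.match(hunk[0])
    if m is None:
        raise PatchError(f"bad hunk header: {hunk[0]!r}")
    pos = max(int(m.group(1)) - 1, 0)  # 1-based start line -> 0-based index
    result = lines[:pos]
    for raw in hunk[1:]:
        if not raw:
            raise PatchError("empty hunk line (missing prefix character)")
        prefix, text = raw[0], raw[1:]
        if prefix in (" ", "-"):
            if pos >= len(lines) or lines[pos] != text:
                raise PatchError(f"context mismatch at line {pos + 1}: {text!r}")
            if prefix == " ":
                result.append(text)  # unchanged line is kept
            pos += 1
        elif prefix == "+":
            result.append(text)  # added line
        else:
            raise PatchError(f"unknown line prefix {prefix!r}")
    return result + lines[pos:]


source = ["def greet(name):", "    print('hi', name)", "    return None"]
hunk = [
    "@@ -1,3 +1,3 @@",
    " def greet(name):",
    "-    print('hi', name)",
    "+    print('hello', name)",
    "     return None",
]
print(apply_hunk(source, hunk))

# Same hunk, but the last context line has one space too few.
drifted = hunk[:-1] + ["    return None"]
try:
    apply_hunk(source, drifted)
except PatchError as err:
    print("rejected:", err)
```

Nothing about the drifted hunk is wrong as code. It fails only because one context line no longer matches byte-for-byte, which is exactly the failure a full-file rewrite cannot have.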
Why the token cost math favours rewrites under 400 lines
Under 400 lines, a frontier model rewriting the full file emits roughly 4,000-6,000 tokens. At Claude Sonnet’s list price of $15 per million output tokens, that costs roughly 6-9 cents and takes about 5-6 seconds. The diff alternative would be 200-500 tokens at under a second, but with a 20-30% chance of failing to apply.
Run the expected-cost calculation: if 25% of diffs fail and each failure needs one retry, you pay diff_cost + 0.25 × retry_cost on average. If the retry costs roughly the same as the original generation, that is 1.25x the diff cost (about 1.33x if retries can fail at the same rate). On top of that come the wall-clock retry time and the human cost of noticing the silent failure. The raw token bill still favours the diff. It is those last two costs that, under 400 lines, make the expected cost comparable to or worse than a single rewrite.
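The expected-cost arithmetic can be written out in a few lines of Python. Costs are in output tokens. The 500-token diff, 5,000-token rewrite and 25% failure rate are illustrative values picked from the ranges in this section. The overhead_per_failure parameter is a hypothetical knob that stands for retry time and human attention expressed in token-equivalents, not a measured quantity.

```python
def expected_diff_cost(diff_cost: float, fail_rate: float, retry_cost: float,
                       overhead_per_failure: float = 0.0) -> float:
    """One diff attempt, plus one retry (and any overhead) when it fails."""
    return diff_cost + fail_rate * (retry_cost + overhead_per_failure)


def expected_cost_until_success(attempt_cost: float, fail_rate: float) -> float:
    """Retry until a diff applies: attempts are geometric with mean 1 / (1 - p)."""
    return attempt_cost / (1.0 - fail_rate)


def break_even_overhead(rewrite_cost: float, diff_cost: float,
                        fail_rate: float, retry_cost: float) -> float:
    """Overhead per failure at which a diff costs as much as a full rewrite."""
    return (rewrite_cost - diff_cost) / fail_rate - retry_cost


diff, rewrite, p = 500, 5_000, 0.25
print(expected_diff_cost(diff, p, retry_cost=diff))  # 625.0
print(round(expected_cost_until_success(diff, p)))   # 667
print(break_even_overhead(rewrite, diff, p, diff))   # 17500.0
```

On tokens alone the diff stays far cheaper (625 versus 5,000). In this model a rewrite only breaks even once each failure carries about 17,500 token-equivalents of extra overhead, which is where retry time and the cost of noticing the failure come in.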
Cursor’s speculative-edits algorithm attacks this trade-off directly. It uses the original file as a deterministic draft for speculative decoding, so the small fast-apply model only has to verify tokens in unchanged regions instead of generating them one at a time. The result is full-file rewrite accuracy at close to diff-like speed.
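A toy version of greedy speculative decoding shows why the original file makes a good draft. This is a sketch under stated assumptions, not Cursor’s implementation. next_token stands in for the fast-apply model, and the oracle in the usage example already knows the edited file. The resync heuristic that re-aligns the draft after an edit is invented for illustration. A real system scores every draft token in a single batched forward pass; the loop below calls next_token once per token only to keep the logic readable.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"


def resync(original: List[str], out: List[str], cursor: int, window: int = 64) -> int:
    """Hypothetical heuristic: after the model emits a token the draft did not
    predict, look a short way ahead in the original for that token and resume
    drafting just after it. If it is not found, treat it as an insertion."""
    for p in range(cursor, min(len(original), cursor + window)):
        if original[p] == out[-1]:
            return p + 1
    return cursor


def speculative_edit(original: List[str], next_token: Callable[[List[str]], str],
                     draft_len: int = 8, max_tokens: int = 100_000) -> Tuple[List[str], int]:
    """Greedy decoding that proposes chunks of the original file as the draft.
    Returns the output tokens and the number of verification passes."""
    out: List[str] = []
    cursor, passes = 0, 0
    while len(out) < max_tokens:
        draft = original[cursor:cursor + draft_len]
        passes += 1
        accepted = 0
        for tok in draft:  # verify: keep draft tokens the model agrees with
            if next_token(out) != tok:
                break
            out.append(tok)
            accepted += 1
        cursor += accepted
        if draft and accepted == len(draft):
            continue  # whole chunk accepted; propose the next one
        tok = next_token(out)  # first disagreement: use the model's own token
        if tok == EOS:
            break
        out.append(tok)
        cursor = resync(original, out, cursor)
    return out, passes


# Usage: an oracle "model" that already knows the edited file (illustration only).
original = [f"t{i}" for i in range(100)]
target = original[:50] + ["X"] + original[51:]  # one token changed


def oracle(prefix: List[str]) -> str:
    return target[len(prefix)] if len(prefix) < len(target) else EOS


out, passes = speculative_edit(original, oracle)
print(out == target, passes)  # True 15
```

Every token that reaches out is one the model itself chose, so the result is identical to plain greedy decoding. The draft only reduces the number of passes: 15 here for a 100-token file with one changed token.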
Where the math flips
Above 400 lines, the same calculation runs the other way. A 2,000-line file is 16,000-24,000 output tokens to rewrite. At frontier pricing that is 30-50 cents and 30-60 seconds, every time you change a comment.
Diff format does not have this scaling problem. A small change to a 2,000-line file is still a small diff. The 70-80% success rate is unchanged from smaller files, because it depends on file complexity rather than file size. For large files, diff cost is much lower than rewrite cost while the failure rate holds constant.
This is why Aider, Claude Code, and most CLI-based agents still support diff-style edits as a fallback for large files, and why Cursor’s fast-apply documentation bounds the technique to small files. There is no globally best edit format. There is a file-size threshold, and 400 lines is a working approximation of where it sits.
What to do with this information
For most application-development work, full-file rewrites are the right default. Component files, route handlers, schema definitions, and most utility modules land well under 400 lines. The rewrite-style edit is more reliable.
For large monolithic files, pick tools that switch formats. Aider explicitly chooses between rewrite, search-and-replace, and udiff based on file size, and Claude Code offers both a targeted string-replacement edit tool and a full-file write tool. A tool that always uses one format will be wrong on one side of the threshold.
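A size-based switch of this kind can be sketched as a small function. This is not Aider’s or Claude Code’s actual logic. The format names, the 400-line default and the choice of fallback are assumptions based on this article’s working threshold.

```python
from enum import Enum


class EditFormat(Enum):
    WHOLE_FILE = "whole"               # model re-emits the entire file
    SEARCH_REPLACE = "search/replace"  # exact-text SEARCH/REPLACE blocks
    UNIFIED_DIFF = "udiff"             # @@ hunks with +/- lines


def choose_edit_format(source: str, threshold_lines: int = 400,
                       prefer_udiff: bool = False) -> EditFormat:
    """Hypothetical size-based policy: rewrite small files, patch large ones."""
    n_lines = len(source.splitlines())
    if n_lines < threshold_lines:
        return EditFormat.WHOLE_FILE  # below the threshold, rewrites are more reliable
    # Above it, a small patch is far cheaper than re-emitting every line.
    return EditFormat.UNIFIED_DIFF if prefer_udiff else EditFormat.SEARCH_REPLACE


print(choose_edit_format("x = 1\n" * 120).name)    # WHOLE_FILE
print(choose_edit_format("x = 1\n" * 1_200).name)  # SEARCH_REPLACE
```

Defaulting to search/replace above the threshold follows the FAQ below, which notes that this format avoids unified diff’s line-number drift. A real tool would likely also weigh the model in use and the size of the change.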
Most importantly: treat “applied 0 of N edits” as a bug. If your AI tool reports that fewer edits applied than it generated, the model intended a change that the apply step could not locate. That is a silent failure, and it is the most common cause of “the AI said it fixed it but the bug is still there.”
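One way to make partial application impossible to miss is to refuse the whole batch whenever any SEARCH text matches zero locations or several. This sketch assumes edits arrive as exact (search, replace) string pairs. Edit, ApplyError and apply_all_or_nothing are hypothetical names, and real tools layer fuzzier matching on top.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    search: str   # exact text the model expects to find in the file
    replace: str  # text to put in its place


class ApplyError(Exception):
    """Raised when any edit in a batch cannot be applied unambiguously."""


def apply_all_or_nothing(source: str, edits: list[Edit]) -> str:
    """Apply every edit in order, or none of them.

    Each SEARCH string must occur exactly once (non-overlapping count) in the
    text as it stands after the previous edits. On any failure we raise, and
    the caller's original string is left untouched.
    """
    text = source
    for i, edit in enumerate(edits, start=1):
        hits = text.count(edit.search)
        if hits != 1:
            raise ApplyError(
                f"edit {i} of {len(edits)}: SEARCH text found {hits} times; "
                "no edits were applied"
            )
        text = text.replace(edit.search, edit.replace, 1)
    return text


src = "a = 1\nb = 2\na = 1\n"
print(apply_all_or_nothing(src, [Edit("b = 2", "b = 3")]), end="")

for batch in ([Edit("b  = 2", "b = 3")],                          # whitespace drift: 0 hits
              [Edit("b = 2", "b = 3"), Edit("a = 1", "a = 0")]):  # ambiguous: 2 hits
    try:
        apply_all_or_nothing(src, batch)
    except ApplyError as err:
        print("refused:", err)
```

Raising turns "applied 1 of 2 edits" into an error the agent loop has to handle, instead of a status line that is easy to scroll past.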
Related reading
- The full series: How AI coding tools actually edit code: 6 truths from 4,000+ hours of building software
- What happens when an edit-format assumption meets a real-world platform shift: Migrating to Supabase publishable keys broke my Chrome extension
If you are a SaaS founder or product team and want a second pair of eyes on how your AI coding toolchain handles edits on your actual codebase, let’s talk.
Frequently asked questions
Why don't AI coding tools use unified diff format?
Unified diff format reaches only 70-80% accuracy on complex files. The failure modes are predictable: incorrect hunk line numbers (the LLM gets the @@ -old_start,old_count +new_start,new_count @@ header wrong), context drift (the file has changed since the LLM last saw it), anchor matching failures (the patch context matches zero locations, or several ambiguous ones), and formatting drift (whitespace changes, comment additions, or formatter runs between generation and application break a diff that was syntactically correct when emitted). Full-file rewrites avoid all four failure modes by handing the model the entire file as context and trusting it to reproduce the unchanged parts verbatim.
What is Cursor's fast apply?
Cursor's fast apply is a two-model architecture for code edits. A frontier model (Claude, GPT, etc.) generates the change in structured form, then a smaller fine-tuned model rewrites the entire file with the change applied at roughly 1,000 tokens per second. The key technique is "speculative edits," a custom speculative-decoding algorithm that uses the original file as the deterministic draft, so the small model only has to verify tokens for unchanged regions instead of regenerating them. The end result is functionally equivalent to a full-file rewrite but up to 9x faster than a frontier-model rewrite, with comparable accuracy.
When do diffs beat full-file rewrites for AI edits?
Around 400 lines is the empirically observed boundary in Aider's benchmarks (issue #625) and Cursor's internal testing. Below that, full-file rewrites have a higher success rate, and the rewrite token cost is small enough that the accuracy gain dominates. Above 400 lines, rewrite cost grows linearly with file length while a small diff stays roughly the same size and its success rate stays roughly constant. Diff format therefore wins on token economics even though its application failure rate is higher. The exact threshold varies by language and file structure, but 400 is a reasonable working number.
What is Morph's fast apply API?
Morph is a hosted "fast apply" model: a specialized small model trained specifically on the merge-an-edit-into-a-file task. The pitch is that any agent currently asking a frontier model to rewrite the full file can swap in Morph's API and cut token usage by 50-60% and latency by 90% or more. A 1,000-line file takes 1.3 seconds through Morph versus 10-12 seconds through Claude Sonnet doing a full rewrite. The accuracy stays in the same range as full-file rewrites because the architecture is the same; the specialization is in decoding speed and cost, not in the format.
How does Aider apply edits, and why is it different?
Aider uses search-and-replace blocks: the LLM emits a SEARCH section (the exact text to find) and a REPLACE section (the new text), with the surrounding context as the anchor. Aider then matches the SEARCH section against the file using a cascade of strategies (exact match, anchor-based, string similarity, Levenshtein distance) and applies the REPLACE if the confidence score passes. This avoids the line-number drift problem of unified diff format but inherits the context-drift problem when the file has changed since the LLM saw it. Aider's own benchmarks (issue #625) showed full-file rewrites beat this format under 400 lines, which is why most newer tools have moved that direction.
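A cascade of that shape (exact match first, then progressively looser comparisons gated by a confidence threshold) can be sketched with Python's difflib. This is a simplified illustration, not Aider's code. It skips the anchor-based step, uses difflib.SequenceMatcher's ratio in place of a separate Levenshtein stage, and the 0.9 threshold is an arbitrary assumption. It also refuses ambiguous matches rather than guessing.

```python
import difflib
from typing import List, Optional, Tuple


def find_block(file_lines: List[str], search_lines: List[str],
               min_ratio: float = 0.9) -> Optional[Tuple[int, str]]:
    """Locate search_lines in file_lines with progressively looser strategies.

    Returns (start index, strategy name), or None when nothing matches
    confidently or when a match is ambiguous.
    """
    n = len(search_lines)
    windows = [file_lines[i:i + n] for i in range(len(file_lines) - n + 1)]

    # 1. Exact match, whitespace included.
    exact = [i for i, w in enumerate(windows) if w == search_lines]
    if len(exact) > 1:
        return None  # ambiguous: refuse rather than guess
    if exact:
        return exact[0], "exact"

    # 2. Same lines after stripping leading/trailing whitespace.
    stripped = [s.strip() for s in search_lines]
    loose = [i for i, w in enumerate(windows) if [s.strip() for s in w] == stripped]
    if len(loose) > 1:
        return None
    if loose:
        return loose[0], "whitespace-insensitive"

    # 3. Most similar window by difflib's ratio, accepted only above min_ratio.
    target = "\n".join(search_lines)
    best_i, best_ratio = None, 0.0
    for i, w in enumerate(windows):
        ratio = difflib.SequenceMatcher(None, "\n".join(w), target).ratio()
        if ratio > best_ratio:
            best_i, best_ratio = i, ratio
    if best_i is not None and best_ratio >= min_ratio:
        return best_i, "similarity"
    return None


code = ["def f(x):", "    y = x + 1", "    return y", "", "def g():", "    return 0"]
print(find_block(code, ["    y = x + 1", "    return y"]))  # (1, 'exact')
print(find_block(code, ["   y = x + 1", "   return y"]))    # (1, 'whitespace-insensitive')
print(find_block(code, ["    y = x + 2", "    return y"]))  # (1, 'similarity')
print(find_block(code, ["class Foo:", "    pass"]))         # None
```

The third call shows the risk the FAQ describes: a similarity match can land on a block the model has misremembered. That is why the threshold matters, and why a low-confidence result should fail loudly instead of applying.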
Should I worry about which edit format my AI coding tool uses?
Yes, for two reasons. First, partial application ("applied 0 of 4 edits", or 2 of 4) is a real failure mode that AI tools can report as success, silently leaving your code unchanged or half-applied. If your tool uses a diff format, watch for this and treat it as a bug. Second, edit format affects cost and latency: full-file rewrites consume more output tokens than diffs, so on large files (>400 lines) diff-based tools will be cheaper and faster, but with a higher probability of partial-application bugs. Pick the format that matches your file sizes and your tolerance for silent failures.