Article · Jan 9, 2026
How AI coding tools edit code: 6 things from shipping software with real users
What's actually happening when your AI coding tool rewrites a whole file, misses a diff, or loads files you never mentioned.
If you have used Cursor, Claude Code, Aider, or Lovable for any serious work, you have probably hit this: a one-line change comes back as a full file rewrite. A search-and-replace reports “applied 4 edits” and silently misses two of them. A question about one file pulls in three others you never mentioned. These are not bugs or random behavior. They are the tools working exactly as designed. Here is what is actually happening.
1. Why your AI tool runs two models, not one
Most production AI coding tools are not a single model. They are two: a large frontier model that reasons about what needs to change, and a smaller, faster fine-tuned model that writes the result.
Frontier model: a large, expensive language model trained on broad data (Claude Opus, GPT-4o, Gemini 2.5 Pro are examples). Strong at reasoning. Generates around 50-200 tokens per second.
Fast-apply model: a small model fine-tuned specifically on the task of rewriting code files. Can generate thousands of tokens per second on the same hardware. Cursor calls this pattern fast apply. Morph sells it as a standalone API. Claude Code’s edit tool uses a similar architecture internally.
The reason for the split is speed. A frontier model handles a short inline edit without issue, but regenerating several hundred-line files at 100 tokens/sec is too slow to keep you in flow. The fast-apply model handles the writing at a speed that feels instant.
The frontier model decides what to change; the fast model writes it. One suggestion, two models behind it.
This is why edits sometimes feel inconsistent. The two models can disagree at the boundaries, and the fast model produces the final bytes. Most of the “it reverted my variable name” or “it lost my comment” complaints come from this seam.
2. Why AI tools rewrite whole files instead of diffs
This was the most counterintuitive finding for me, and it has the clearest evidence behind it. Given a choice between generating a diff and rewriting the entire file, AI coding tools will almost always rewrite. Under about 400 lines, the rewrite has a higher success rate.
Unified diff: the standard format Git uses to show code changes. Lines starting with
+are additions, lines starting with-are deletions, and unchanged lines provide surrounding context. The format is rigid:@@ -12,7 +12,8 @@line markers, exact whitespace, and column-perfect prefixes. One character off and the patch fails to apply entirely.
The data comes from Aider’s issue #625, where the team measured edit-application success rates across both formats. Cursor’s instant-apply model productized the same finding. Two factors explain it:
- Training data. Code on GitHub is stored as complete files, not as patch hunks. The model has seen far more full files than valid diffs, so its instincts are calibrated for files, not patches.
- Format tolerance. Unified diffs have zero margin for error. Full-file rewrites are forgiving: the model can use surrounding code to resolve ambiguity and still produce a correct result.
Above 400 lines, the math flips. Rewrite cost grows linearly with file length while diff cost stays roughly flat, so diffs start to win. This is the boundary Cursor’s fast-apply model is built around.
3. Why search-and-replace edits fail silently
If your AI tool’s only edit primitive is “find this exact string and replace it,” you are one whitespace change away from a silent failure.
This is the failure mode that GitHub Copilot community discussion #152226 keeps surfacing. The model generates a “find” string that matched the file at generation time. By the time the patch reaches disk, a comment was added or whitespace shifted. The string no longer matches. The tool reports success anyway: “applied 4 edits,” moves on.
You find the missed edits when a test fails or a bug surfaces in production.
Reliable tools handle this one of two ways: they use full-file rewrites with a fast-apply model (no string matching needed), or they validate the number of hunks applied against the model’s intent and re-prompt when the counts don’t match. Tools that do neither are a footgun in any real codebase.
4. How AI tools decide which files to load from your codebase
You cannot fit a 200-file codebase into an LLM’s context window. Every AI coding tool faces the same problem: which subset of your repo do you load for a question like “fix the bug in checkout”?
Aider’s solution is the most transparent I have seen, and it is mathematically the same algorithm Google used to rank web pages in the 1990s.
tree-sitter: an open-source parsing library that builds a full syntax tree for source code in 130-plus languages. Aider uses it to extract every symbol definition (functions, classes, variables) and every reference to those symbols across your entire repo.
Personalized PageRank: a variant of Google’s original link-ranking algorithm. Instead of ranking web pages by incoming links, it ranks files by incoming symbol references, weighted toward the symbols mentioned in the current chat.
Aider parses your repo with tree-sitter, builds a directed graph (files as nodes, symbol references as edges), then runs personalized PageRank. Files containing symbols you mentioned get a 10x weight boost. Well-named, specific identifiers get 10x. Files already open in the chat get 50x. The top-scoring files within a token budget become the model’s context.
The 1990s gave us PageRank for the web. The 2020s gave us PageRank for codebases. Same math, different graph.
5. Why closed-loop feedback beats a better model
A 2025 benchmark across major coding agents found one thing clearly: the gap between agents that can run their own tests and agents that can’t is larger than the gap between any two frontier models.
Closed-loop feedback: the AI tool can observe the result of its own edit (test pass/fail, type-checker output, lint warnings, runtime exceptions) and iterate on failures before reporting done. Open-loop means the tool writes code, says “done,” and stops. You discover the failure at runtime.
Render’s coding-agent benchmark and Artificial Analysis’s coding-agents leaderboard both show this pattern. A closed-loop model can fail on the first attempt, see why, fix it, and only stop when the loop closes green. An open-loop model has to be right on the first try, with all the failure modes above stacked against it.
This is why I run Claude Code with npm run test and npm run build wired in as commands it can invoke. Without that, even Claude Opus ships open-loop. With it, the same model converges on correct code in two or three iterations instead of requiring you to manually audit every diff.
6. Does vibe coding mean you can stop reading the code?
No. The original Karpathy framing was about lowering friction in the inner loop: describe what you want, accept the output, iterate. Nothing in that framing said stop reading the code.
The phrase has since been picked up by two opposing camps. The “vibe coding is dangerous” camp uses it to mean the operator doesn’t know what their code does, then points at the resulting failures. The “vibe coding is the future” camp uses it to mean you don’t need to understand your code, then ships the same failures and is surprised when they don’t scale.
Both camps are wrong because both treat vibe coding as a binary. It is a technique, not a discipline. The technique is having the AI write the code. The discipline (separate, non-negotiable) is reading what it wrote.
The successful vibe-coded production apps I have taken over are all maintained by people who can answer “why does this pattern exist” without consulting the original prompt. The unsuccessful ones share one failure mode: the operator stopped reading the diffs, the AI drifted across iterations, and now nobody can explain why a particular abstraction is shaped the way it is.
What this means for your workflow
- Choose closed-loop tools for anything that ships to production. Claude Code with test and build commands wired up. Cursor with terminal access. Aider with the test loop enabled. Chat-window-only copilots are fine for prototypes and unreliable for production.
- Trust full-file rewrites under 400 lines. Treat zero-hunk diffs as bugs. If your tool reports “applied 0 of 4 hunks,” that is not a no-op. It is the silent failure mode from section 3. Investigate it.
- Know which files your tool is loading and why. On Aider, read the repomap output before sending a complex request. On Claude Code, name files explicitly when the change is non-obvious. PageRank-style ranking is good, not infallible.
- Match the tool to the task. Inline tab-completion: Cursor. Whole-repo reasoning: Claude Code. Transparent file selection: Aider. Visible UI scaffolding: Lovable or Bolt. Using one tool for every job means fighting its weakest mode on every third change.
- Read the diffs. This is the only thing that separates vibe coding that ships from vibe coding that accrues debt.
Related reading
- The documentation system that makes production AI-built code maintainable long-term: How I document AI-built projects.
- A real case where skipping the diff cost half a day to recover from: Migrating to Supabase publishable keys broke my Chrome extension.
- What to audit after shipping a Lovable app quickly: How to audit a Lovable app after the BOLA disclosure.
If you are running a production codebase built with Claude Code, Cursor, or Lovable across North America, the UK and Ireland, the EU and EEA, or the ANZ region, and you want a second pair of eyes on what your tools are doing under the hood, let’s talk.
Frequently asked questions
Why do AI coding tools rewrite the whole file instead of showing a diff?
Under around 400 lines, rewriting the entire file produces fewer errors than generating a unified diff. Diff format requires exact whitespace, precise line markers, and character-perfect column prefixes. One character off and the patch fails to apply. Full-file rewrites are forgiving because the model can use surrounding context to fill gaps. Aider's issue #625 measured this directly; Cursor's fast-apply model productized the same finding. Above 400 lines the math reverses: rewrite cost grows linearly while diff cost stays flat, so diffs win again.
What is closed-loop feedback in AI coding and why does it matter?
Closed-loop means the AI tool can run your tests, see the output, and iterate on failures before it reports done. Open-loop means it writes code and stops. You find the bug at runtime. Claude Code, Aider, and Cursor agent mode all support closed-loop operation when you wire up test and build commands. A closed-loop model can be wrong on the first attempt, see the test failure, fix it, and keep iterating until the loop closes green. That self-correction ability is why the gap between closed-loop and open-loop tools exceeds the gap between any two frontier models.
How does Aider decide which files to load from my codebase?
Aider parses your repo with tree-sitter (a code parser supporting 130-plus languages), builds a directed graph where files are nodes and symbol references are edges, then runs personalized PageRank to score every file by relevance to the current question. Files whose symbols you mentioned get a 10x weight boost; files already open in the chat get 50x. The result is a ranked shortlist that fits within the model's token budget, dense with relevant code rather than arbitrary file chunks.
What is the two-model architecture in tools like Cursor and Claude Code?
Large frontier models generate around 50-200 tokens per second, which is too slow for multi-file edits. So tools like Cursor split the work: a frontier model reasons about what needs to change, then a smaller fine-tuned model writes the actual file at thousands of tokens per second. Cursor calls this fast apply; Morph sells the same pattern as an API. The fast model produces the final bytes, which is why variable names or comments sometimes disappear at the seam between the two.
Does vibe coding mean you don't need to understand the code?
No. Karpathy's original framing was about lowering friction in the inner loop: describe what you want, accept the output, iterate. Nothing in that framing said stop reading the code. Every production vibe-coded codebase I have taken over that still works is maintained by someone who can explain why the key patterns exist without consulting the original prompt. The ones that are broken share one pattern: the operator stopped reading the diffs, the AI drifted, and now nobody can explain why a particular abstraction is shaped the way it is.