A developer opened his IDE dashboard and saw "PCW: 98%" — meaning 98% of his code was supposedly written by AI. That felt off, so he ran experiments. The result? When he typed 49 characters by hand, the system credited only 46 to him. A copy-paste of his own work? Zero credit. The AI took 100% of the score.
So what actually happened?
In April 2026, William O'Connell published his findings. His company's Windsurf dashboard showed his "% new code written by Windsurf" at 98%. His gut said maybe 10–20%. So if AI was producing 49x more code than he was, why hadn't he blown through his token budget? Why hadn't he been promoted (or fired)?
Windsurf's own blog states: "Customers should expect PCW values of 85%+, often 95%+. This is not a hallucination." But once you dig into how it's computed, "not a hallucination" gets a lot harder to defend.
O'Connell tried mitmproxy first, but Windsurf encodes traffic with protobuf. Luckily the analytics dashboard exposed user_bytes, codeium_bytes, total_bytes, and percent_code_written in plain JS. From there, he uncovered five biases.
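The relationship between those counters can be sketched as below. The formula is an assumption inferred from the field names (only user_bytes, codeium_bytes, total_bytes, and percent_code_written come from the dashboard itself), and the byte counts are illustrative:

```javascript
// Hedged sketch of how percent_code_written plausibly falls out of the
// exposed counters. The formula is an assumption, not Windsurf's code.
function percentCodeWritten({ codeium_bytes, total_bytes }) {
  if (total_bytes === 0) return 0;
  // Multiply before dividing to keep integer byte counts exact.
  return (codeium_bytes * 100) / total_bytes;
}

// Illustrative numbers: 46 credited human bytes against ~2.3 KB of
// AI-attributed churn lands the dashboard on the reported 98%.
const pcw = percentCodeWritten({ codeium_bytes: 2254, total_bytes: 2300 });
console.log(pcw); // 98
```

Every bias below skews one of these counters, so small per-keystroke distortions compound directly into the headline percentage.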
Five ways Windsurf's PCW shaves down human contribution
1. Auto-closed brackets/quotes don't count as typing. Type 49 chars, get credit for 46.
2. Pasting doesn't increment user_bytes. Move your own code to another file? Zero credit.
3. Refactors get fully credited to AI. Ask AI to move a function you wrote? 100% AI.
4. Sessions reset memory. After restart, the editor forgets where each line came from.
5. "Measured at commit time" — not really. The counters move while you type.
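Biases 1 and 2 are easy to model. A toy counter (our construction to illustrate the mechanism, not Windsurf's actual implementation) that skips auto-closed characters and pastes reproduces the 49-typed, 46-credited result:

```javascript
// Toy attribution counter illustrating biases 1 and 2. This is an assumed
// mechanism for demonstration, not Windsurf's real code.
function countUserBytes(events) {
  let userBytes = 0;
  for (const ev of events) {
    // Bias 1: characters the editor auto-closed are not credited.
    if (ev.type === "keystroke" && !ev.autoClosed) userBytes += 1;
    // Bias 2: "paste" events contribute nothing, even for your own code.
  }
  return userBytes;
}

// 49 keypresses where the editor auto-closed 3 characters (e.g. ), }, "),
// followed by pasting your own function into another file.
const events = [
  ...Array.from({ length: 46 }, () => ({ type: "keystroke", autoClosed: false })),
  ...Array.from({ length: 3 }, () => ({ type: "keystroke", autoClosed: true })),
  { type: "paste", text: "function mine() { /* moved, not rewritten */ }" },
];
console.log(countUserBytes(events)); // 46
```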
The decisive test: O'Connell typed one line into human_file.js (49 chars), then asked AI to write a similar one in ai_file.js. He then pasted his own function into the AI file and asked AI to copy his function back. Result: AI was credited with more than twice as much code as he was, even though both files were almost identical in length.
Cursor's "AI Share of Committed Code" is built on cleaner git plumbing, but it has its own failure mode. O'Connell pasted a 100-line JS file and asked Cursor to convert double quotes to single quotes. AI touched 49 of 93 non-blank lines — yet Cursor reported 100% of all 100 lines as AI-authored. Different mechanism, same direction: AI share gets inflated.
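One plausible way to get that outcome is attributing at file granularity instead of line granularity. The coarse policy below is our guess at the failure mode, not Cursor's actual plumbing:

```javascript
// Two attribution policies. The per-file one is a hypothesis about how a
// quote-style change over 49 of 93 non-blank lines becomes "100% AI-authored".
function aiSharePerLine(lines) {
  const touched = lines.filter((l) => l.aiTouched).length;
  return (touched * 100) / lines.length;
}

function aiSharePerFile(lines) {
  // Coarse policy: if AI edited the file at all, credit every line to AI.
  return lines.some((l) => l.aiTouched) ? 100 : 0;
}

// 100-line file where AI touched the first 49 lines.
const file = Array.from({ length: 100 }, (_, i) => ({ aiTouched: i < 49 }));
console.log(aiSharePerLine(file)); // 49
console.log(aiSharePerFile(file)); // 100
```

The gap between the two numbers is exactly the inflation O'Connell observed.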
Why this is more than a metrics-accuracy problem
In January 2026, Anthropic's Boris Cherny posted on X that 100% of his code is now written by Claude, and "pretty much 100%" company-wide. Satya Nadella claimed 30% at Microsoft. Google: 75%. Numbers like these make great press releases — and great fundraising material for AI vendors.
Then there's the METR randomized controlled trial: 16 senior open-source developers, real PRs in their own repos. The AI-allowed group was 19% slower. The kicker? They believed they'd been 20% faster. Self-reported speedup is essentially noise.
GitClear analyzed 211 million changed lines across five years. Refactoring dropped from 25% (2021) to under 10% (2024). Copy/paste rose from 8.3% to 12.3% — the first time in the dataset's history that "pasted" lines outnumbered "moved" lines. Volume went up while the codebase measurably degraded.
| | Vendor metrics | What you should track instead |
|---|---|---|
| Unit | Bytes / lines (volume) | PR cycle time, post-merge fix rate (quality) |
| Bias direction | Inflates AI share | Neutral — validated post-merge |
| When measured | On keystroke / commit | 7–30 days after merge |
| What it justifies | "AI does it all — cut headcount" | "Where is verification debt accumulating?" |
| Legal exposure | "Most code isn't copyrightable" | Conservative human-attribution |
The damage isn't numerical accuracy — it's narrative. The sentence "90% of our code is AI" creates a gut feeling executives can't un-feel. They start asking "why do we need this many engineers?" And U.S. courts have made clear AI-generated work isn't copyrightable, so "most of our code is AI" is a legal-team nightmare too.
Even on Korean dev forums, the same Goodhart's Law warning is showing up: AI code-acceptance rates as KPIs incentivize accepting without review. And one CTO summed it up — "AI writes code in one minute, then humans spend ten reviewing it." The metric lies, but the cost shifts to review time anyway.
The PR-review checklist that catches the real signal
- Treat dashboard PCW/AI Share numbers as directional only. Even Windsurf calls it a "directional proxy." Absolute values are meaningless. Track trends within the same team, same tool, same quarter. Never compare across tools or teams.
- Read the diff layout first. AI-authored PRs often touch files they didn't need to touch (the "100 lines all AI" pattern). When the diff is unusually wide, ask "what was the actual intent?" before reviewing logic.
- Suspect tests that look too clean. METR found AI tends to write self-fulfilling tests with hardcoded values. If assertions echo input, or there's only a happy path, that's a red flag. Add one failing case and watch what breaks.
- Wire up duplicate-code detection in CI. GitClear's 4x copy-paste explosion is the fastest signal to catch. jscpd, SonarQube duplications, or even a grep script in GitHub Actions works. AI tends to rewrite similar functions instead of reusing existing ones.
- Make AI defend its own code. Adam Ferrari's pattern: feed the PR diff back to a model and ask why each change was needed and what could break. If "the author" can't explain it, you didn't save reviewer time — you deferred it.
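For the duplicate-detection item, jscpd or SonarQube is the right production tool; as a sense of what they do, here is a toy detector (our sketch, not a substitute) that flags any 3-line window appearing more than once:

```javascript
// Toy copy-paste detector in the spirit of jscpd. Flags any window of
// `windowSize` trimmed, non-blank lines that appears more than once.
function findDuplicateBlocks(files, windowSize = 3) {
  const seen = new Map(); // window text -> first "file:line" location
  const duplicates = [];
  for (const [name, text] of Object.entries(files)) {
    const lines = text.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
    for (let i = 0; i + windowSize <= lines.length; i++) {
      const key = lines.slice(i, i + windowSize).join("\n");
      if (seen.has(key)) {
        duplicates.push({ at: `${name}:${i + 1}`, firstSeen: seen.get(key) });
      } else {
        seen.set(key, `${name}:${i + 1}`);
      }
    }
  }
  return duplicates;
}

// Hypothetical file contents: b.js re-creates a block from a.js verbatim,
// the "rewrite instead of reuse" pattern GitClear measured.
const report = findDuplicateBlocks({
  "a.js": "const x = 1;\nconst y = 2;\nreturn x + y;",
  "b.js": "// AI rewrote this instead of importing it\nconst x = 1;\nconst y = 2;\nreturn x + y;",
});
console.log(report.length); // 1
```

Real tools normalize identifiers and token streams as well; even this crude version surfaces verbatim paste growth if run per PR in CI.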
One-liner for managers
When someone reports "X% of our code is AI," ask two questions: ① How is that number computed? (PCW? Cursor Share? a custom definition?) ② How have post-merge fix rate and rollback rate moved in the same quarter? If you can't answer both, don't use the number for decisions.
Going deeper
- Your AI Might be Lying to Your Boss (williamoconnell.me). O'Connell's original write-up: the full reverse-engineering of Windsurf and Cursor metrics.
- Percentage of Code Written (windsurf.com). Windsurf's official PCW explainer: 85–95% is "normal," plus six caveats.
- METR: Early-2025 AI on Experienced Developers (metr.org). RCT with 16 senior devs: 19% slowdown, 20% perceived speedup.
- GitClear AI Copilot Code Quality 2025 (gitclear.com). 211M lines analyzed: 4x copy-paste growth, refactoring down from 25% to 10%.
- Anthropic and OpenAI engineers claim 100% AI code (fortune.com). Boris Cherny and Roon say none of their code is hand-written anymore.
- Quantifying AI Coding Impact (adamferrari.substack.com). Adam Ferrari on PCW limits and alternative metrics.
- AI Code Review Reliability — Engineering Org Principles (brunch.co.kr). Korean enterprise case: AI review as a first pass, with senior review layered above it.