GPT-5 performed worse than GPT-4 at coding? Not a joke. IEEE Spectrum verified it.
What Is This?
This comes from an IEEE Spectrum analysis published in January 2026, combined with a 700+-comment Hacker News discussion. Veteran developers report that newer AI models produce lower-quality code than their predecessors.
IEEE Spectrum's key finding is "Silent Failures." Older models tended to fail loudly when they were wrong. Newer models generate code that runs without errors but produces incorrect results. Harder-to-find bugs are on the rise.
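To make "silent failure" concrete, here is a hypothetical sketch (the function and data are invented for illustration): an off-by-one loop bound that never raises an error but quietly drops the last window of data.

```python
def moving_average(xs, window):
    # Hypothetical AI-generated helper. Subtle bug: range(len(xs) - window)
    # skips the final full window, so the code runs cleanly but the result
    # is silently incomplete -- no crash, no traceback, just missing data.
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window)]

def moving_average_fixed(xs, window):
    # Correct upper bound includes the last full window.
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

print(moving_average([1, 2, 3, 4], 2))        # [1.5, 2.5]       -- looks plausible
print(moving_average_fixed([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]  -- the real answer
```

The buggy version would have been obvious if it crashed; instead it returns a shorter list that only careful review or a boundary test would catch.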
Tests showed GPT-5 underperforming GPT-4 in certain coding scenarios. A CMU team analyzing 800+ popular GitHub projects also confirmed code quality degradation after AI tool adoption.
Anthropic's own research is telling too: AI-assisted coding made experienced developers 19% slower. It's one study under specific conditions, but it challenges the assumption that AI always speeds things up.
What Changed?
| | Earlier Models (2024-early 2025) | Latest Models (late 2025-2026) |
|---|---|---|
| Failure type | Crashes/errors (visible) | Silent failures (runs fine) |
| Debug difficulty | Traceable via error messages | Logic errors, hard to trace |
| Acceptance rate | Lower but accurate code | Higher but subtly wrong code |
| Developer experience | "If it breaks, I know immediately" | "Thought it worked, results are off" |
Why is this happening? Goodhart's Law is at work. Models optimize for "code the user accepts." Since users accept code that runs, models end up optimized for "code that runs," not "code that's correct." A vicious cycle.
DORA research (Google's DevOps Research and Assessment team) raised similar concerns: over-reliance on AI tools may degrade developers' deep learning ability (human learning, not machine learning!).
Anthropic's Research Finding
Experienced developers using AI coding assistants took 19% longer to complete tasks than those without. The "AI always equals faster" assumption needs revisiting.
How to Deal With This Realistically
- Don't trust AI code blindly: "It runs" and "it's correct" are different things. Always review the logic, especially edge cases and boundary conditions.
- Increase testing: Test coverage is key to catching silent failures. Have the AI write tests too, then review the quality of those tests as well.
- Pin model versions: Newest isn't always best. If a model version works well for your project, pin that API version.
- Be specific in prompts: Instead of "write this function," try "a function that takes X, returns Y, and handles Z exceptions. TypeScript, with error handling." Specificity improves quality.
- Strengthen code review: Code review is the last line of defense, for AI and human code alike. Auto-merging AI-generated PRs is still risky.
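As a sketch of the "review logic and increase testing" advice, here is a hypothetical `paginate` helper (invented for illustration) with boundary-focused assertions, since the boundaries are exactly where "runs" and "correct" diverge:

```python
def paginate(items, page, per_page):
    # 1-indexed pages. Python slicing never raises on out-of-range
    # indices, so a wrong page silently yields [] -- exactly the kind
    # of behavior tests must pin down rather than assume.
    start = (page - 1) * per_page
    return items[start:start + per_page]

# Boundary conditions first: empty input, an exact page edge,
# a partial last page, and a page past the end.
assert paginate([], 1, 10) == []
assert paginate([1, 2, 3, 4], 2, 2) == [3, 4]  # exact page boundary
assert paginate([1, 2, 3], 2, 2) == [3]        # partial last page
assert paginate([1, 2, 3], 5, 2) == []         # past the end: silent, not an error
```

Note that the last case documents silent behavior explicitly; whether `[]` or an exception is correct is a design decision the test forces you to make.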
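For "pin model versions," one lightweight pattern (the config shape and snapshot name here are illustrative assumptions, not a specific vendor's API) is to refuse a floating default so upgrades are always deliberate:

```python
def resolve_model(config: dict) -> str:
    # Require an explicit, pinned model identifier in config.
    # Failing loudly here beats silently riding a "latest" alias
    # whose behavior can change underneath you between runs.
    model = config.get("model")
    if not model:
        raise ValueError('No "model" pinned in config; refusing a floating default.')
    return model

print(resolve_model({"model": "gpt-4-0613"}))  # a dated snapshot, not an alias
```

Pair this with a changelog entry whenever the pin is bumped, so a model upgrade is reviewed like any other dependency change.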