There was a time when "hey AI, do this for me" was all it took. That was barely two years ago. Now when an agent makes a mistake, you fix the system, not the agent — and that's just common sense. Prompt engineering to context engineering to harness engineering. Three paradigm shifts in just four years.
What Is This?
Jonas Kim (AWS Korea Data Scientist) published "Four Years of AI Agentic Patterns" and it's been making waves across the dev community. It's not just a trend summary — it reads more like a post-mortem tracking why each era failed.
Here's the core story. In 2022, we obsessed over "what should I say to the AI?" We believed that crafting the perfect prompt was everything. By 2025, the question became "what information should I feed it?" We realized that what you put in the context window mattered far more than the prompt itself. And in 2026, it became "what system should I build around it?"
The origin story is fascinating. The term "harness engineering" was coined in February 2026 by Mitchell Hashimoto, creator of Terraform. His principle was dead simple: "Every time the agent makes a mistake, engineer the environment so it can't make that specific mistake again." You don't fix the prompt. You fix the system around the agent — the rules, tools, constraints, and feedback loops.
What happened next was remarkable. Within two weeks, OpenAI, Martin Fowler (ThoughtWorks), and Ethan Mollick (Wharton professor) independently published the same conclusion. No coordination. It was a "multiple discovery" — everyone had hit the same wall.
The Core Formula
Agent = Model + Harness
A harness is "everything in the agent except the model itself." The model is the engine; the harness is the operating system. Even the most powerful engine is useless without an OS.
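In code terms, the split might look like this minimal sketch. All names here (the `Harness` fields, the `validate` hook) are illustrative assumptions, not any real framework's API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Harness:
    """Everything around the model: tools, rules, and feedback loops."""
    tools: dict[str, Callable]       # what the agent is allowed to do
    rules: list[str]                 # constraints injected into every prompt
    validate: Callable[[str], bool]  # feedback loop: accept or reject output

@dataclass
class Agent:
    model: Callable[[str], str]      # the "engine": prompt in, text out
    harness: Harness                 # the "operating system" around it

    def run(self, task: str) -> str:
        # The harness shapes what the model sees...
        prompt = "\n".join(self.harness.rules + [task])
        output = self.model(prompt)
        # ...and decides whether the result is acceptable.
        if not self.harness.validate(output):
            raise ValueError("harness rejected output")
        return output
```

The point of the decomposition: improving the agent usually means editing `rules`, `tools`, or `validate` (the harness), not the model.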
What Actually Changes?
Chad Fowler (Honeycomb CTO) nailed it in one sentence: "Rigor doesn't disappear. It relocates." When XP advocated "test code instead of design docs," and when dynamic languages shipped without compilers, critics always said rigor was being abandoned. They were wrong every time. Rigor just moved to a higher level of abstraction.
| Dimension | Prompt Engineering (2022) | Context Engineering (2025) | Harness Engineering (2026) |
|---|---|---|---|
| Core Question | "What do I say?" | "What do I show?" | "What system do I build?" |
| Metaphor | Writing an email | Managing your inbox | Designing the email system |
| Failure Mode | Blind prompting | Context pollution | Orchestration bugs |
| Key Metric | Response quality (subjective) | KV-cache hit rate | Task completion rate |
| Skills Required | Language intuition | Information architecture | System design + security |
The most critical insight: each era doesn't replace the previous one — it subsumes it. Harness engineering contains context engineering, which contains prompt engineering. "Prompt engineering is dead" is wrong. It didn't die — it got promoted into a submodule of a larger system.
Real-world results make this concrete. OpenAI built a million-line codebase with zero manually-written code. Seven engineers spent five months writing not a single line — instead, they designed the environment for stable code generation. Anthropic split their agent into a 3-agent architecture: one plans, one generates, one evaluates.
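A plan/generate/evaluate split can be sketched as a simple orchestration loop. The function signatures below are assumptions for illustration, not Anthropic's actual architecture:

```python
from typing import Callable

def orchestrate(plan: Callable[[str], list[str]],
                generate: Callable[[str], str],
                evaluate: Callable[[str], bool],
                task: str,
                max_attempts: int = 3) -> list[str]:
    """Plan once, then generate each step until the evaluator accepts it."""
    results = []
    for step in plan(task):
        for _ in range(max_attempts):
            candidate = generate(step)
            if evaluate(candidate):  # the evaluator gates every artifact
                results.append(candidate)
                break
        else:
            raise RuntimeError(f"step still failing after {max_attempts} attempts: {step}")
    return results
```

Separating the three roles means each one can be a different model, prompt, or even a deterministic program; the loop itself is part of the harness.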
How to Start Right Now
- Make AGENTS.md a Table of Contents
  Instead of a massive instruction manual, create a 100-line map. Let agents find what they need themselves. OpenAI's lesson: "Give Codex a map, not a 1,000-page manual."
- Update the Harness on Every Mistake
  When agents repeatedly run wrong commands, update AGENTS.md. When they make structural mistakes, add linters or tests. Hashimoto's core principle: "Don't blame the agent — improve the harness."
- Introduce Mechanical Enforcement
  Documentation alone isn't enough. Use custom linters and structural tests to enforce architecture rules. Make agents fix their own code to pass the linter.
- Design Rippable Harnesses
  When models improve, half your error-recovery logic becomes unnecessary. Harnesses must be rippable. Over-engineering today becomes tomorrow's technical debt.
- Always Have an Agent Running
  Hashimoto's goal: always have an agent running. Start by kicking off agents in the last 30 minutes of your workday and reviewing results the next morning.
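The "mechanical enforcement" step above can be sketched as a small structural test built on Python's standard `ast` module. The architecture rule (core code must not import the web layer) and the module names are hypothetical:

```python
import ast
from pathlib import Path

# Hypothetical rule: files in the core layer must not import the web layer.
FORBIDDEN_MODULES = {"web", "flask"}

def check_imports(path: Path) -> list[str]:
    """Return a list of violations: forbidden imports found in `path`."""
    tree = ast.parse(path.read_text())
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have module=None
        else:
            continue
        for name in names:
            if name.split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"{path}: forbidden import '{name}'")
    return violations
```

Run a check like this in CI, and the agent gets a concrete, mechanical failure to fix instead of a vague instruction it can ignore.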
Security Alert: The Lethal Trifecta
Simon Willison warns that when these three capabilities exist simultaneously, security incidents are inevitable: (1) processing untrusted external input, (2) access to sensitive data, and (3) the ability to change state. Meta's "Rule of Two" is the practical mitigation: allow an agent at most two of these at once.
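The Rule of Two reduces to a capability count. A minimal sketch, assuming the three risky capabilities are tracked as boolean flags (the names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capabilities:
    untrusted_input: bool  # processes untrusted external input
    sensitive_data: bool   # can read sensitive data
    state_change: bool     # can change state / act on the world

def rule_of_two_ok(caps: Capabilities) -> bool:
    """Meta's 'Rule of Two': allow at most two of the three risky capabilities."""
    return sum([caps.untrusted_input, caps.sensitive_data, caps.state_change]) <= 2
```

A check like this belongs in the harness, gating agent configuration before any session starts rather than relying on the model to behave.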