OpenAI's next-gen image model showed up before it was ever officially announced. GPT-Image-2 was caught testing on LM Arena under three codenames — and it looks like the long-standing problem of AI images butchering text might finally be fixed.

What Is It?

GPT-Image-2 is OpenAI's upcoming image generation model — not officially announced yet. In early April 2026, it appeared on LM Arena (the blind AI model evaluation platform) under three codenames: maskingtape-alpha, gaffertape-alpha, and packingtape-alpha. It was pulled within hours.

Developer Pieter Levels (@levelsio) was the first to identify the models, which set off a wave of community-captured outputs. Two things stood out:

  • Text rendering: Text placed inside images comes out sharp and accurate
  • World knowledge: It knows what real brands, interfaces, and objects actually look like

And the yellow tint that plagued GPT-Image-1 appears to be gone.

What Changes?

| Feature | GPT-Image-1.5 (current) | GPT-Image-2 (leaked) |
| --- | --- | --- |
| Architecture | 4o-based | Brand-new independent architecture |
| Text rendering accuracy | ~95% | 99%+ (estimated) |
| Color | Yellow tint present | Natural color, tint removed |
| Photorealism | High | Near-photographic |
| World knowledge | Good | Significantly improved (brands, UIs, handwriting, etc.) |
| Aspect ratio support | 1:1, 3:2, 2:3 | 16:9 widescreen confirmed |

Here's the thing — AI image models have always had three glaring weaknesses: garbled text, mangled hands, and inaccurate real-world objects. GPT-Image-2 appears to tackle all three at once.

What the Community Actually Made

Images generated during the blind test spread across the community — and people couldn't tell they were AI-made.

  1. IKEA store at night
    Mistaken for an actual photograph. Sign fonts, lighting, and entrance signage were all reproduced accurately.
  2. YouTube & Windows UI
    Accurate enough to pass as a real screenshot. Button text and layout matched the real thing.
  3. Medical handwritten notes
    Handwriting that looks like it came from an actual person. Previous models couldn't come close.
  4. Clock hand test
    Set a specific time, and the hands point to exactly the right position. Nano Banana Pro failed this one.
  5. Comic book panels
    Spider-Man and Batman costume details rendered accurately, with readable speech bubbles.

Why This Is a Genuine Step Change

While GPT-Image-1.5 was built on 4o (GPT-4 Omni), analysts suggest GPT-Image-2 runs on a completely new architecture. That means this isn't just an incremental upgrade — it's closer to a generational shift.

There's also relevant context: OpenAI shut down Sora on March 24, 2026. Inference costs were running $15 million a day, and the GPU resources freed up from that shutdown are believed to have been redirected to GPT-Image-2 training and inference.

Getting Started

GPT-Image-2 isn't officially out yet, but some ChatGPT users are reportedly getting it through A/B testing. Here's what you can do right now.

How to check if you're on GPT-Image-2
Add "Format 16:9" at the end of your prompt. If you get a 16:9 wide image with sharp text and no yellow tint, you're likely on the new model.

  1. Keep generating complex images
    Generate text-heavy posters, infographics, or UI screenshots in ChatGPT Images 5–15 times in a row. This reportedly increases your odds of being routed to the new model.
  2. Prep your text-rendering use cases
    Line up the tasks where text accuracy matters most: product mockups, social media cards, presentation slides.
  3. Run your own benchmark
    Use the same prompts across Nano Banana Pro, Midjourney V7, and Ideogram 3.0 to feel the difference firsthand.
  4. Plan for the API launch
    GPT-Image-1.5 runs $0.133 per high-quality 1024×1024 image via API. Given the new architecture, GPT-Image-2 may land slightly higher — around $0.15–0.20.
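The per-image rates above translate into batch budgets easily enough. Here's a minimal sketch: $0.133 is GPT-Image-1.5's current high-quality 1024×1024 rate cited above, while the GPT-Image-2 figures are the speculative $0.15–0.20 range, not official pricing.

```python
def batch_cost(images: int, price_per_image: float) -> float:
    """Total USD cost for a batch at a flat per-image rate."""
    return round(images * price_per_image, 2)

# GPT-Image-1.5 today vs. the article's speculative GPT-Image-2 range,
# for a hypothetical batch of 1,000 images.
current = batch_cost(1000, 0.133)   # $133.00
low_est = batch_cost(1000, 0.15)    # $150.00 (speculative low end)
high_est = batch_cost(1000, 0.20)   # $200.00 (speculative high end)

print(f"1,000 images: ${current} now vs. ${low_est}-${high_est} estimated")
```

In other words, even at the high end of the rumored range, a thousand-image run would cost roughly $70 more than it does today.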

Competitive Landscape at a Glance

| Model | Key strength | vs. GPT-Image-2 |
| --- | --- | --- |
| Nano Banana Pro | Google compute, early-mover advantage | Came out behind GPT-Image-2 in multiple blind tests |
| Midjourney V7 | Artistic style, strong community | Trails on photorealism and text rendering |
| FLUX Pro | Open-source, local deployment | Gap in world knowledge and complex scene handling |
| Ideogram 3.0 | Specialized in text rendering | GPT-Image-2 leads on overall capability, not just text |

Deep Dive Resources

How LM Arena Blind Testing Works

LM Arena is a platform where users compare two outputs without knowing which model made them. Because Elo scores are based on pure performance — no marketing, no branding — a high score here is real validation. OpenAI used the same approach in December 2025, testing GPT-Image-1.5 under the codenames Chestnut and Hazelnut before releasing it.
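Each blind head-to-head vote feeds a rating update along the lines of the classic Elo formula. This is a minimal sketch only: K=32 is an illustrative constant, and LM Arena's actual scoring pipeline may differ from textbook Elo.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    """Return new (r_a, r_b) after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)          # A's expected win probability
    s_a = 1.0 if a_won else 0.0             # actual outcome for A
    delta = k * (s_a - e_a)                 # zero-sum rating transfer
    return r_a + delta, r_b - delta

# Two evenly rated models: one upset vote moves each rating by 16 points.
new_a, new_b = elo_update(1500, 1500, a_won=True)
print(new_a, new_b)  # 1516.0 1484.0
```

The key property for a stealth launch: wins against highly rated opponents move the score a lot, so a strong unlabeled model climbs the leaderboard quickly even in a few hours of voting.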

Sora's Shutdown and the GPU Reallocation

Sora shut down in March 2026, with peak inference costs hitting $15 million a day. Total in-app revenue over its entire lifetime? Just $2.1 million. Sam Altman said OpenAI would redirect compute toward next-gen automation research and enterprise applications — and GPT-Image-2 looks like one of the primary beneficiaries.

Multilingual Text Rendering

Turkish-speaking users tested GPT-Image-2's ability to render non-Latin characters and reported a noticeable improvement over previous models. Similar gains are expected for Korean, Arabic, and other scripts.