Fable Knows. AI & Tech, decoded
Tools & Apps

Anthropic’s Claude Fable 5 Just Beat Pokemon FireRed Using Only Screenshots.

By Ved Vyas June 12, 2026 12 min read Updated June 16, 2026

It picked Charmander. Grinded it to level 78. Steamrolled the Elite Four.

Claude Fable 5, Anthropic’s newest frontier model, completed Pokemon FireRed from start to finish on June 9, 2026, using nothing but raw game screenshots. No maps. No RAM reading. No helper tools. Just a sequence of images and the model deciding what buttons to press.

The run was captured as a timelapse. Community analysis spotted the highlights pretty fast: a monstrous, over-leveled Charizard torching everything in its path, and at least one moment where Fable 5 burned a Revive on a level 3 Pikachu for no apparent reason, then sent that same Pikachu directly back into battle.

Tactically incoherent. Strategically victorious.

And that gap between “messy execution, real result” is exactly why the Pokemon demo matters beyond the obvious novelty.


What Actually Changed Here

Previous Claude models tried to play Pokemon too. They failed, or needed significant assistance to make any progress. The difference was scaffolding: complex helper harnesses that gave the model additional information about game state, maps, and context it couldn’t extract from the screen itself.

Fable 5 needed none of that.

According to Anthropic’s June 9 announcement, the model beat FireRed with a “minimal, vision-only harness,” reading raw screenshots the way a human player would, interpreting menus, tracking its position, and deciding on actions purely from what appeared on screen.

That sounds like a party trick. It’s not.

Vision-only game completion requires something technically demanding: the ability to reliably parse dynamic visual interfaces across a very long horizon, hold goals in mind across hours of play, and recover when things go wrong. These are the exact same skills that computer-use agents need when they operate real software through the screen. The gap between “AI can play Pokemon using only vision” and “AI can navigate your company’s internal tools by watching the screen” is smaller than most people realize.

Fable 5 also played Slay the Spire during testing. With access to persistent file-based memory, it reached the game’s final act three times more often than Opus 4.8 did. The memory scaffolding mattered more for Fable 5 than for older models, which suggests the model is actually using that context rather than just accumulating tokens.


The Real Announcement Underneath the Pokemon Story

The Pokemon run was Anthropic’s most viral demonstration. The actual announcement was bigger.

Fable 5 is a Mythos-class model. That matters because “Mythos-class” refers to a capability tier that sits above the Opus class, and the first model in that tier, Claude Mythos Preview, was released in April only to a restricted group of cyberdefenders and critical infrastructure providers. Anthropic judged it too risky for public access.

Fable 5 is the same underlying model as Mythos 5. The distinction between the two is a set of safety classifiers layered on top: separate AI systems running alongside Fable 5 that intercept certain queries and hand them off to Claude Opus 4.8 instead. When those classifiers fire, users get an Opus 4.8 response and a notification explaining what happened.

Anthropic says the classifiers trigger in fewer than 5% of sessions on average. Over 95% of Fable 5 usage runs at full Mythos-class capability. The fallback exists for three specific categories.

Cybersecurity: Mythos-class models can conduct multi-step offensive operations autonomously, finding vulnerabilities, moving through networks, and executing attack sequences without human guidance at each step. Anthropic’s internal red-team tests found that Fable 5 complied with zero harmful single-turn cyber requests across 30 different public jailbreak techniques. An external bug bounty running over 1,000 hours found no universal jailbreaks. The UK’s AI Safety Institute made partial progress toward one during an initial testing window, which Anthropic disclosed in the announcement.

Biology and chemistry: Mythos-class models outperformed dedicated protein language models on predicting how genetic modifications affect viral shell assembly, without being trained specifically for that task. Anthropic calls out adeno-associated viruses (AAVs) specifically: the same capability useful for designing gene therapies could theoretically be used to engineer dangerous pathogens. The classifiers here are deliberately broad, trading false positives for safety margin.

Distillation: Anthropic has previously identified large-scale attempts to extract Claude’s capabilities to train competing models, including from what the company describes as authoritarian countries. Requests that look like systematic capability extraction fall back to Opus 4.8.


What Fable 5 Actually Does

Software engineering is the most concrete demonstration so far.

Stripe reported that Fable 5 completed a codebase-wide migration across 50 million lines of Ruby in a single day. Anthropic’s estimate is that the same migration would have taken a full engineering team more than two months by hand. A 50-million-line codebase migration isn’t just about writing code: it requires understanding how thousands of interconnected components relate to each other, finding all the places a pattern appears, and making coherent changes across a massive surface area without breaking what’s connected.

On Cognition’s FrontierCode evaluation, which tests whether models can complete difficult coding tasks while meeting production-quality standards, Fable 5 scores highest among frontier models, even at medium effort.

On Hebbia’s Finance Benchmark for senior-level analytical reasoning, Fable 5 ranked highest among all tested models. IMC, a Dutch trading firm, reported that Fable 5 handled factual lookup, conceptual reasoning, root-cause analysis, and expected-value calculations with near-perfect results across their internal evaluations.

Vision is a separate story. Fable 5 can extract precise numbers from scientific figures, which sounds minor until you consider the workflow it unlocks: a model that reads figures accurately can engage with scientific literature directly, not just with prose summaries. For literature reviews, meta-analyses, and data recovery from published papers where the underlying datasets are unavailable, that’s a genuine capability jump.

Screenshot-to-application is the other vision capability Anthropic highlights. Show Fable 5 an interface screenshot, it reconstructs the application, including layout, components, and behavior. The combination of that with Fable 5’s software engineering performance creates a loop: screenshot a legacy tool, get working code.


The Biology Results Are the Part People Are Sleeping On

The Pokemon story got the coverage. The genomics results deserve more attention.

Mythos 5, which is Fable 5 with cybersecurity classifiers removed for vetted partners, conducted more than a week of largely autonomous genomics research. It assembled single-cell data covering millions of cells across 138 animal species, designed and trained a custom machine learning model to identify cells performing the same biological role in distantly related organisms, and produced results that Anthropic says outperformed a model recently published in Science, a top peer-reviewed journal, despite being 100 times smaller.

That last comparison is important. Size-to-performance ratios in ML research are a meaningful signal. A 100x smaller model beating a published benchmark means something about how the model is using information, not just about raw compute.

In drug design, using protein design tools and no human assistance at each step, Mythos 5 identified strong drug design candidates for 9 out of 14 protein targets, covering immune checkpoints, growth-factor signaling, neurodegeneration, muscle disease, and harder structural targets. One molecular biology hypothesis the model generated was independently confirmed by a separate lab working on the same E. coli protein mechanism.

Anthropic’s internal protein design experts reported around a 10x acceleration in aspects of the drug design process. “Matches or beats skilled human operators” is the phrase in the announcement, with the caveat that the comparison involves specific defined tasks rather than the full breadth of what a research scientist does.


Mythos 5 and Project Glasswing

Mythos 5, the unrestricted version, remains limited to Project Glasswing partners: cyberdefenders and critical infrastructure providers working in collaboration with the US government.

Glasswing launched in April with Claude Mythos Preview and has grown to roughly 150 organizations across 15 countries. Today’s release upgrades those partners from Mythos Preview to Mythos 5, at pricing Anthropic describes as less than half of what Mythos Preview cost.

Both Fable 5 and Mythos 5 are priced at $10 per million input tokens and $50 per million output tokens.

An application process for cybersecurity organizations to join the trusted access program is coming. A parallel biology track is in development, which would give access to Fable 5 with biology and chemistry classifiers removed (but cyber classifiers intact) for a small group of life science researchers.

India was not part of Glasswing’s first cycle. The Print noted that Indian agencies including CERT-In and DRDO research bodies have no current pathway to Mythos-class capabilities. The application process Anthropic is building would represent the first structured route for institutions outside the initial Western partner group.


The Safeguard Tradeoff Is Honest About Its Costs

Something worth noting: Anthropic’s framing on the classifier system is unusually direct about what it gets wrong.

The announcement acknowledges that the safeguards are “tuned conservatively” and will “sometimes catch harmless requests.” The false positive rate in the less-than-5%-of-sessions figure is real but not zero. The company says it recognizes this will frustrate some users and commits to reducing false positives as the classifiers improve.

The alternative they rejected was not releasing Fable 5 publicly at all, which they chose not to do, and releasing it without classifiers, which they also chose not to do. The medium path is a model with genuine Mythos-class capability accessible to everyone, with a routing system that occasionally misidentifies benign requests and hands them to a less capable model.

Whether that tradeoff is calibrated correctly is genuinely uncertain. The UK AISI made partial progress on a universal jailbreak during their testing window, which Anthropic disclosed rather than buried. A universal jailbreak on a system with these capabilities would be a different kind of problem than jailbreaks on previous models. Anthropic’s stated goal is not jailbreak impossibility but making jailbreaks “sufficiently slow and costly that we can detect and prevent them before they are used at scale.”

The 30-day data retention policy is part of this. Business customers using Mythos-class models are required to accept data retention for all traffic. Anthropic commits to not using this data for training, requiring deletion after 30 days, and logging all human access. The reasoning: defending against sophisticated, multi-request attacks requires being able to see what those attacks look like across sessions.


Availability

Claude Fable 5 is available on the Claude API using the model string claude-fable-5. It’s also accessible on Amazon Bedrock and GitHub Copilot.

For Pro, Max, Team, and Enterprise subscription plans: Fable 5 is included at no extra cost through June 22, 2026. After that date, usage requires purchasing credits separately. Anthropic says it intends to restore Fable 5 to standard subscription plans once server capacity stabilizes, with no firm date given.

The access pattern here is a demand management problem, not a capability access one. Anthropic expects high volume and is staging rollout to avoid degraded performance for everyone.

For context: this is a model that just got reviewed by Cursor’s CEO as “the state of the art model on CursorBench” and opened up “a class of long-horizon problems that were out of reach for earlier models.” Demand expectations seem reasonable.


What the Pokemon Run Actually Tells You

Go back to the Charizard.

The AI picked one starter, trained it past any reasonable level, and brute-forced the endgame. That’s not strategic play. Any experienced player would tell you it’s a waste of potential. But it worked, which means the model understood the goal (defeat the Elite Four), found a reliable path to that goal (overwhelming type advantage and level advantage), and executed it consistently across hours of gameplay using only screen pixels.

The tactical blunders, the Revive on the level 3 Pikachu, the suboptimal team composition, those are real. They’re also not the point. The point is: the model saw a screen, understood what it needed to do, and kept doing it until the credits rolled.

Vision-only game completion is a proxy benchmark for something the industry actually cares about: AI agents that can navigate real software without needing custom APIs, memory read access, or human-built scaffolding for every new application. The implication for computer-use agents operating on enterprise software, legacy tools, and browser-based interfaces is direct.

A level 78 Charizard won a Pokemon game. What Fable 5’s vision system does with that same capability in production workflows is the thing worth paying attention to.


Frequently Asked Questions

What is Claude Fable 5?

Claude Fable 5 is Anthropic’s newest publicly available AI model, released June 9, 2026. It’s a Mythos-class model, the highest capability tier Anthropic has ever made generally accessible, with a set of safety classifiers that redirect certain queries to Claude Opus 4.8 instead of responding at full capability.

What is the difference between Fable 5 and Mythos 5?

Same underlying model. The difference is safeguards. Fable 5 has classifiers covering cybersecurity, biology and chemistry, and model distillation. Mythos 5 has those classifiers removed in specific areas and is restricted to government-vetted partners in Project Glasswing.

How did Claude Fable 5 beat Pokemon FireRed?

Using only raw game screenshots, no maps, no memory read tools, no helper scaffolding. The model read the screen, interpreted the game state, and chose actions. Its strategy was to over-level a single Charizard and power through the Elite Four rather than build a balanced team.

What is Slay the Spire used for in Claude’s testing?

Anthropic used Slay the Spire to evaluate memory and long-context performance. With persistent file-based memory, Fable 5 reached the game’s final act three times more often than Opus 4.8, showing the model uses long-context memory more effectively than predecessor models.

What does the Stripe 50-million-line codebase migration mean?

Stripe, the payments company, reported that Fable 5 completed a migration across 50 million lines of Ruby code in a single day. Anthropic estimates the same work would have taken a full engineering team more than two months. It’s the most concrete enterprise benchmark in the announcement.

How much does Claude Fable 5 cost?

$10 per million input tokens and $50 per million output tokens. This is less than half the price of Claude Mythos Preview. Available on Pro, Max, Team, and Enterprise plans at no extra cost through June 22, 2026.

How resistant is Fable 5 to jailbreaking?

External testing found it complied with zero harmful single-turn cyber requests across 30 public jailbreak techniques. An external bug bounty ran over 1,000 hours without finding a universal jailbreak. The UK AI Safety Institute made partial progress toward one during a brief initial testing window, which Anthropic disclosed.

What is Project Glasswing?

Anthropic’s program, run in collaboration with the US government, that provides access to Mythos-class models for cybersecurity defenders and critical infrastructure providers. Currently spans roughly 150 organizations across 15 countries.

Why does the data retention policy exist?

Anthropic requires 30-day retention on all Mythos-class model traffic from business customers. The stated purpose is to detect and respond to sophisticated multi-request attacks, including novel jailbreaks and coordinated misuse patterns that only become visible across sessions. The data is not used for training and is deleted after 30 days.

What does the genomics research result mean?

Mythos 5 assembled single-cell data across 138 animal species and trained a custom machine learning model that outperformed a recently published model from the journal Science, despite being 100 times smaller. Anthropic intends to publish the full results.

Ved Vyas

Writer at Fable Knows, covering AI and the technology shaping everyday life.

Leave a Reply

Your email address will not be published. Required fields are marked *