GPT-5.6 Is Locked: 3 Powerful Models Only 20 Firms Can Use

OpenAI shipped its strongest model yet on Friday, and almost nobody can use it. GPT-5.6 launched June 26 as a limited preview locked to roughly 20 companies. The reason GPT-5.6 is locked is the actual story here, and it has very little to do with the model’s coding scores.
Here is what changed, who gets access, what the three new GPT-5.6 models cost, and why Washington is now sitting between OpenAI and its own customers. It is a strange launch. Worth understanding in full.
What GPT-5.6 actually is
GPT-5.6 is not one model. It is a family of three, and OpenAI named them after the sun, the earth, and the moon. Sol is the flagship, the most capable of the set. Terra is the middle option built for everyday work, priced to undercut the previous generation while keeping competitive performance. Luna is the cheap, fast one for high-volume jobs where you care more about speed and cost than raw intelligence.
The naming is a deliberate reset. OpenAI says the number now marks the generation, while Sol, Terra, and Luna mark durable capability tiers that can each move forward on their own schedule. A future GPT-5.7 Sol would simply be the next flagship, and you would always know roughly where any given model sits without memorizing a tangle of version numbers. It is cleaner than the GPT-5.3-Instant, GPT-5.4, GPT-5.5 ladder that came before it. Overdue, honestly.
Two new controls ship with Sol. There is a max reasoning effort that gives the model the most time to think through a hard problem, and an ultra mode that goes past what a single agent can do by spinning up subagents to split complex work in parallel. If you have used the older reasoning sliders, think of max as the deepest single-threaded setting and ultra as the model delegating to copies of itself.
The numbers OpenAI chose to show
OpenAI did not drop a full benchmark suite for GPT-5.6. The official preview announcement shared a deliberately narrow slice covering coding, biology, and cybersecurity, and promised the expanded results when the model goes wide. Read that choice as a signal. These three areas are where the gains are largest and where the safety questions are sharpest.
On coding, Sol sets a new high on Terminal-Bench 2.1, the test that measures command-line work needing planning, iteration, and tool coordination. This is the kind of multi-step agentic task that decides whether a coding assistant is actually useful in a real terminal rather than a demo. That matters for tools like Codex. Not a toy puzzle.
On biology, Sol beats GPT-5.5 on GeneBench v1, a test of long-horizon genomics and quantitative-biology analysis, and it does so while using fewer tokens than GPT-5.5 needed for its lower score. Cheaper and better on the same task is the combination that moves production decisions.
The cybersecurity results are where it gets interesting. On ExploitBench, Sol is competitive with Anthropic’s Mythos Preview while using only about a third of the output tokens. OpenAI is careful to frame this as a defender’s tool. Its line is that Sol is better at helping people find and fix vulnerabilities than at reliably running an end-to-end attack, and that the model does not cross the “Cyber Critical” threshold in its own preparedness framework. In testing against Chromium and Firefox, it found bugs and exploitation building blocks but did not autonomously produce a working full-chain exploit under the conditions tested.
That hedge, “under the conditions tested,” is doing a lot of work, and OpenAI knows it. The company admits benchmark thresholds cannot capture every way a model gets used or chained with other tools. That uncertainty is the bridge to the part of this launch that actually matters.
Why you can’t have GPT-5.6 yet
OpenAI wanted a broad release. It did not get one. At the request of the U.S. government, the company is starting with a limited GPT-5.6 preview for a small group of trusted partners whose participation has been shared with Washington. Axios put the number at around 20 companies, with OpenAI hoping to expand access the following week and reach a broad release in the coming weeks.
Sit with that for a second. The federal government is now reviewing a frontier American AI model before its own developer is allowed to sell it widely. It is vetting the buyers, shaping the timeline, and effectively deciding who gets early access to GPT-5.6 before a single broad customer can sign up. Not regulating it after a harm. Reviewing it before launch and approving the customer list. That is a different posture from anything the industry operated under a year ago, and OpenAI clearly does not love it.
The company said so directly in its announcement. It does not believe this kind of government access process should become the long-term default, arguing it keeps the best tools away from the users, developers, enterprises, and cyber defenders who need them. OpenAI framed the limited preview as a short-term step it is accepting because it sees that as the fastest path to a broad release, while it works with the administration on a cyber Executive Order framework and a repeatable process for future launches.
In plainer terms: OpenAI is cooperating while making it obvious it considers the arrangement temporary and not ideal. Sam Altman reportedly previewed GPT-5.6 with the White House over the past month, including in early-June meetings. The company expected it might need to stagger the rollout. It did not expect the government to effectively approve each customer and cap the launch at around 20 partners.
This already happened to Anthropic
If the government-gated launch feels familiar, that is because OpenAI is not the first to hit it. The same restrictions landed earlier on Anthropic’s most capable models, Fable 5 and Mythos 5, which were export-controlled and pulled from broad public access.
The meaningful shift is what OpenAI itself pointed out. Anthropic is no longer being singled out. When only one lab faced pre-release government review, you could read it as a one-off tied to that company’s specific safety posture. With OpenAI now in the same position, it looks less like a special case and more like the emerging default for any model with serious cyber capability. That is the real headline. It is bigger than GPT-5.6.
What it costs
GPT-5.6 pricing is set per million tokens, and the three tiers spread out cleanly. Sol runs $5 for input and $30 for output, Terra is half that at $2.50 input and $15 output, and Luna sits at the bottom at $1 input and $6 output. A simple ladder.
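To make the ladder concrete, here is a minimal sketch that turns the quoted per-million-token prices into a per-request cost. The function name `request_cost` and the 20,000-in / 2,000-out workload are illustrative assumptions, not anything OpenAI publishes.

```python
# USD per million tokens as quoted above: (input, output).
PRICES_PER_M = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted list prices."""
    in_price, out_price = PRICES_PER_M[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
    for tier in PRICES_PER_M:
        print(f"{tier}: ${request_cost(tier, 20_000, 2_000):.4f}")
```

For that made-up workload the request comes to about 16 cents on Sol, 8 cents on Terra, and 3.2 cents on Luna, so the gap between tiers is a constant multiple rather than something that depends on the input-to-output mix.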
For context, Sol’s $5 in / $30 out matches the GPT-5.5 pricing we broke down in our GLM-5.2 versus GPT-5.5 cost comparison, where the open-weights challenger came in at roughly a sixth of the cost. So Sol is not cheaper than the last flagship at the top end. The savings show up in the middle: Terra delivers competitive-to-GPT-5.5 performance at half the price, which is the tier most teams will actually standardize on.
GPT-5.6 also reworks prompt caching. There are now explicit cache breakpoints and a 30-minute minimum cache life, which makes caching behavior more predictable than the old implicit system. Cache writes bill at 1.25x the uncached input rate, and cache reads keep the 90% discount. If you run repetitive prompts with a stable prefix, that 30-minute floor is worth designing around.
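A rough way to see when caching pays off is to compare the cost of resending a stable prefix with and without it. The sketch below rests on two assumptions that the announcement as summarized here does not spell out: that the 90% read discount is measured against the uncached input rate, and that the first request pays the write rate while every later request inside the cache lifetime pays the read rate. The function name is hypothetical.

```python
# USD per million input tokens as quoted in the article.
INPUT_PRICE_PER_M = {"sol": 5.00, "terra": 2.50, "luna": 1.00}

CACHE_WRITE_MULTIPLIER = 1.25  # writes bill at 1.25x the uncached input rate
CACHE_READ_MULTIPLIER = 0.10   # assumption: "90% discount" means 0.10x that rate


def prefix_cost(tier: str, prefix_tokens: int, requests: int, cached: bool) -> float:
    """USD cost of sending the same prefix `requests` times.

    Assumes every request after the first arrives while the cache entry
    is still alive, so there is exactly one write and the rest are reads.
    """
    if requests < 0:
        raise ValueError("requests must be non-negative")
    if requests == 0:
        return 0.0
    base = INPUT_PRICE_PER_M[tier] * prefix_tokens / 1_000_000
    if not cached:
        return base * requests
    return base * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (requests - 1))
```

Under those assumptions a prefix that is used only once costs 25% more with caching, but the second hit already puts you ahead: 1.25 + 0.10 = 1.35 units against 2.00 uncached. In practice the saving depends on repeat requests actually landing inside the cache lifetime.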
There is one more availability note worth flagging. OpenAI plans to launch Sol on Cerebras hardware at up to 750 tokens per second in July, starting with select customers. Frontier intelligence at that speed changes which use cases are viable, especially anything interactive where latency kills the experience.
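The latency point is simple arithmetic, sketched below. The 750 tokens per second figure is the "up to" number from the announcement, so treat the result as a best case. The 100 tokens per second baseline is a made-up comparison point, not a measured figure for any model.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float = 750.0) -> float:
    """Best-case time to stream `output_tokens` at a steady decode rate.

    Ignores network latency, queueing, and time to first token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    answer_tokens = 3_000  # hypothetical long answer
    print(generation_seconds(answer_tokens))         # at the quoted peak rate
    print(generation_seconds(answer_tokens, 100.0))  # at the assumed baseline
```

At the quoted peak a 3,000-token answer streams in 4 seconds, against 30 seconds at the assumed baseline. That is roughly the difference between a tool that feels interactive and one you walk away from.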
How OpenAI is trying to keep Sol safe
The safety stack is the other half of why this launch is staggered. It is built in layers, because no single guardrail holds against a determined attacker. OpenAI trained refusals into the model for prohibited cyber assistance, including attempts to disguise intent or jailbreak it. On top of that sit real-time classifiers for cyber and biology misuse that watch output as it is generated. For higher-risk cases, generation can pause mid-stream while a larger reasoning model reviews the full conversation, and a disallowed result gets withheld before it ever reaches you.
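The pause-and-review layer is easier to picture as control flow. The toy sketch below is not OpenAI's implementation: `fast_flag` and `deep_review_allows` are hypothetical stand-ins for the real-time classifier and the larger reviewing model, and the reviewer here only sees the output so far rather than the full conversation.

```python
from typing import Callable, Iterable, List

WITHHELD = "[response withheld after review]"


def guarded_stream(
    chunks: Iterable[str],
    fast_flag: Callable[[str], bool],
    deep_review_allows: Callable[[str], bool],
) -> List[str]:
    """Deliver chunks in order, escalating to a slower review when flagged.

    fast_flag: cheap per-chunk check (stand-in for a real-time classifier).
    deep_review_allows: slower check over all text so far (stand-in for
    the larger reviewing model). Both are hypothetical.
    """
    delivered: List[str] = []
    for chunk in chunks:
        if fast_flag(chunk):
            # Generation "pauses" here while the slower reviewer decides.
            text_so_far = "".join(delivered) + chunk
            if not deep_review_allows(text_so_far):
                # The flagged chunk and everything after it is withheld.
                return delivered + [WITHHELD]
        delivered.append(chunk)
    return delivered


if __name__ == "__main__":
    # Toy checks: flag any chunk mentioning "exploit"; deny if "full-chain" appears.
    flag = lambda chunk: "exploit" in chunk
    allow = lambda text: "full-chain" not in text
    print(guarded_stream(["Patch the ", "exploit path ", "first."], flag, allow))
    print(guarded_stream(["Here is a ", "full-chain exploit"], flag, allow))
```

The design point this illustrates is cost: the cheap check runs on every chunk, while the expensive review only runs on the small fraction that gets flagged. It also shows why a flagged but legitimate request can still feel slower.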
Beyond a single chat, flagged activity can trigger account-level review across a user’s conversations and risk signals. The stated goal is to tell persistent malicious behavior apart from legitimate dual-use security work, where the same technical concepts show up in very different contexts. OpenAI is upfront that during the preview, legitimate users may hit blocks, refusals, or slower responses when generation pauses for review. Testing whether real work still flows smoothly is part of what the preview is for.
The scale of the red-teaming is the detail that stuck with me. OpenAI says it spent over 700,000 A100-equivalent GPU hours on automated red-teaming aimed at finding universal jailbreaks, the kind of attack that works across many prompts rather than one narrow case. That is more compute thrown at breaking its own model than most labs spend training a mid-sized one. Pointing that much firepower at your own system before release is a real commitment. It is the sort of number that is hard to fake or hand-wave. It also tells you how seriously OpenAI takes the cyber risk it is publicly downplaying.
What to watch next
The clock matters. Under the AI Executive Order, the administration has until August to stand up a classified process for assessing AI cyber capabilities and deciding which systems count as “covered frontier models.” Right now OpenAI and Anthropic are operating in the gap between a process that has been announced and one that actually exists. That is why this launch looks improvised: there is no settled rulebook yet, just a request and a handshake.
So the question is not really whether GPT-5.6 is good. The narrow benchmarks say it is a real step up on coding, biology, and cyber. The real question is what the rules look like in August. And whether “the government approves your customer list” becomes a normal part of shipping a frontier model in the United States. OpenAI is betting it does not. The next two months will tell us whether that bet holds.
Frequently asked questions
When will GPT-5.6 be available to everyone?
OpenAI says it is aiming for a broad GPT-5.6 release in the coming weeks, with access expanding to more companies as soon as the week after the June 26 launch. There is no firm public date, because the timeline depends partly on the government’s additional testing period and the cyber framework still being written.
What is the difference between Sol, Terra, and Luna?
Sol is the flagship and the most capable, Terra is the balanced everyday model that roughly matches the previous generation’s performance at a lower price, and Luna is the fast, low-cost option for high-volume work. The GPT-5.6 number marks the generation, and the names mark capability tiers that advance on their own schedules.
Why did the government restrict GPT-5.6?
The Trump administration cited national security concerns tied to the model’s cybersecurity capabilities and asked OpenAI to limit the rollout to a vetted partner list. The same kind of restriction previously applied to Anthropic’s Fable 5 and Mythos 5 models.
Is GPT-5.6 dangerous?
OpenAI says Sol does not cross the “Cyber Critical” threshold in its preparedness framework and is better at finding and fixing vulnerabilities than at running full attacks autonomously. It pairs the model with a layered safety stack and acknowledges the benchmarks cannot capture every possible misuse.
How much does GPT-5.6 cost?
Per million tokens: Sol is $5 input and $30 output, Terra is $2.50 input and $15 output, and Luna is $1 input and $6 output. Prompt caching reads keep a 90% discount, while cache writes cost 1.25x the uncached input rate.