Sakana Fugu: orchestration as a product, the verification stays with you

Agentic Engineering
Automation
Verification

On 22 June, Sakana AI introduced Sakana Fugu, a multi-agent system that behaves like a single model from the outside. You send a request to one endpoint, and Fugu decides for itself whether to answer directly or assemble and coordinate a team of specialised models. Model selection, delegation, verification, and synthesis all run internally. The wiring you would otherwise build yourself becomes one API call. That is the real news: not a new frontier model, but a product that sells the orchestration.

What is Sakana Fugu?

By Sakana’s own account, Fugu is itself a language model trained to call other models in an agent pool, including recursive calls to instances of itself. It is meant to know when to delegate, how the agents should communicate, and how to combine their partial results into one answer. From the outside you call one model. Inside, a coordinated system of experts does the work. Sakana points to two of its own ICLR 2026 papers, Trinity and the Conductor, as the basis for this learned orchestration.

Access runs through a single OpenAI-compatible API. That lowers the bar to almost nothing: anyone already calling a chat endpoint swaps the base URL and has an agent ensemble behind the same call. Sakana mentions a beta program with close to 500 early users and lists the product as generally available now, with subscription tiers and a pay-as-you-go plan for heavier loads.

Fugu and Fugu Ultra

There are two models. Fugu balances performance against low latency and is meant as a default for everyday work, for instance in coding tools like Codex or in interactive services. Teams with data, privacy, or compliance requirements can opt specific agents out of the pool. Fugu Ultra is tuned for maximum answer quality on hard, multi-step problems and pulls in a deeper pool of experts. Sakana names AI research, paper reproduction, security analysis, and literature and patent investigations as typical uses.

The benchmarks call for caution, and Sakana is unusually candid here. The company claims that Fugu Ultra stands shoulder-to-shoulder with leading models like Anthropic’s Fable 5 and Mythos Preview across demanding engineering, scientific, and reasoning benchmarks. All scores other than Fugu’s are reported by the respective providers, so they are not independently measured. And Fable 5 and Mythos Preview are explicitly not part of Fugu’s pool, because they are not publicly accessible. That is no accident. It is the heart of the sales story.

Why does Sakana sell this as sovereignty?

Sakana frames Fugu not as mere convenience but as a hedge against single-vendor dependency. Running critical infrastructure, finance, or governance on a single API is a real weakness. Sakana points directly at the export controls that hit Anthropic’s Fable and Mythos models. I described exactly this risk in the recall of Fable 5: a model is not a foundation when access can vanish overnight.

The promise: if a provider drops out, Fugu dynamically routes around it, because the models in the pool are swappable. That is an honest argument. But it only shifts the dependency. Instead of one model provider, you now depend on Sakana’s orchestration layer, which decides which model gets to see your request. The concentration does not disappear, it moves up a level.

What gets abstracted away leaves your sight

This is the point that interests me as a practitioner. An orchestration I build myself is work, but it is visible. I see which agent gets which task, where a step fails, how a result is checked. Reliability does not come from the single model but from the loop around it, which is what I described in loop engineering. Fugu takes that loop off my hands. With it, it takes away my view into it.

Sakana says itself that model selection, delegation, verification, and synthesis run internally, “so the complexity of a multi-agent system never reaches your code”. That is meant as comfort, and for many applications it is exactly that. But verification is not just any step. It is the one that decides right from wrong. When a provider handles it internally and reports it as passed, you no longer check whether the answer is correct, you trust that someone else checked. For a code review, a security analysis, or a patent search, that is a difference that matters.

What does this mean for you?

Fugu is a good tool for tasks where a coordinated ensemble is measurably better than a single call and where you check the result at the end anyway. That is exactly where the abstraction earns its place: you save the wiring and keep control over what counts. It gets risky when you mistake the internal verification for your own. A product that takes over the orchestration does not take over the responsibility. That stays with you, and it does not get lighter just because an API call got smaller. Treat Fugu as a very capable supplier whose output you sign off on, not as an authority that replaces your check.