Computer Use in Gemini 3.5 Flash: not the capability, the safeguards

On 24 June, Google announced that computer use becomes a built-in tool in Gemini 3.5 Flash. An agent can now see, reason, and act across browser, desktop, and mobile, directly in the fast mainline model. That makes the capability cheaper and more broadly available, which is exactly why the important question shifts. It is no longer whether a model can operate a computer, but how you safeguard what it does while it operates one.

What changes with Gemini 3.5 Flash?

Until now, Google offered computer use only as a standalone Gemini 2.5 computer use model. The capability is now integrated natively into the main model of the Flash line. According to Google, developers can build custom agents that see, reason, and take action across browser, mobile, and desktop. Google names long-horizon tasks and enterprise automation as use cases, such as continuous software testing and knowledge work across professional applications. The tool is available through the Gemini API and the Gemini Enterprise Agent Platform.

The real lever is the move into a fast, cheap model. Computer use used to be a special case you reached for deliberately. In the Flash line it becomes a baseline ability you pick up in passing. What is cheap and fast gets used in places where you would not have used it before.

What does the benchmark say?

Google reports a score of 78.4 on OSWorld-Verified and places that on a par with Sonnet 4.6. OSWorld measures how well an agent solves real computer tasks in a desktop environment. A Flash model at this level is notable, because it pairs the speed and price of a lightweight model with the task ability of a heavyweight one. A benchmark remains a curated course, though. It says the model handles the tasks in the test, not that it acts reliably in your environment. The gap between 78 percent in the test and what an unattended automation needs is exactly the place where a human signs off.
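To make that gap concrete, here is a rough back-of-the-envelope sketch. It treats 78.4 as a per-task success probability and assumes that tasks succeed or fail independently of each other. Both are simplifying assumptions made for illustration, not something Google or OSWorld states, and the function name is hypothetical.

```python
def chain_success_probability(per_task_success: float, tasks: int) -> float:
    """Probability that `tasks` tasks in a row all succeed.

    Assumes every task has the same success probability and that
    outcomes are independent, which real workflows rarely guarantee.
    """
    if not 0.0 <= per_task_success <= 1.0:
        raise ValueError("per_task_success must be between 0 and 1")
    if tasks < 0:
        raise ValueError("tasks must not be negative")
    return per_task_success ** tasks


if __name__ == "__main__":
    # 0.784 is the reported OSWorld-Verified score read as a probability.
    for n in (1, 3, 5, 10):
        print(n, round(chain_success_probability(0.784, n), 3))
```

Under these assumptions, a chain of just three unattended tasks completes without any failure less than half the time. The exact figures matter less than the shape of the curve: per-task reliability that looks strong on a benchmark erodes quickly once steps are chained without a human checkpoint.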

Why the safeguards are the real point

An agent that operates a browser and a desktop reads untrusted third-party content along the way: web pages, documents, emails. That opens the door to prompt injection: instructions hidden in the content that pull the agent off its task. Google responds with targeted adversarial training for computer use and with two optional enterprise safeguard systems: one that requires user confirmation, and one that automatically terminates a task when a prompt injection is detected. This is the right direction. It also shows that the protection does not sit in the model alone, but in what you configure around it.

These safeguard systems are optional. That means safeguarding is a decision you make, not a default. Whoever lets an agent act on real accounts and files without a confirmation step has not moved the line; they have left it out entirely.
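The two safeguard ideas described above, confirmation before sensitive actions and termination on suspected injection, can be sketched as a wrapper around an agent's action loop. This is a minimal illustration and not Google's implementation or the Gemini API: every name here (Step, SENSITIVE_KINDS, run_guarded, TaskTerminated) is hypothetical, and the injection detector is a caller-supplied placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Step:
    """One agent step: what it just read, and what it wants to do next."""
    observed_text: str  # page, document, or email content the agent saw
    action_kind: str    # hypothetical labels, e.g. "click", "submit_form"
    target: str


# Hypothetical policy: action kinds that touch real accounts or files.
SENSITIVE_KINDS = frozenset({"submit_form", "send_email", "delete_file"})


class TaskTerminated(Exception):
    """Raised when the task is stopped because of suspected prompt injection."""


def run_guarded(
    steps: Iterable[Step],
    execute: Callable[[Step], None],
    confirm: Callable[[Step], bool],
    looks_injected: Callable[[str], bool],
) -> List[Step]:
    """Run steps in order and return the ones that were actually executed."""
    executed: List[Step] = []
    for step in steps:
        # Safeguard 1: stop the whole task on suspected injection.
        if looks_injected(step.observed_text):
            raise TaskTerminated(
                f"suspected injection before {step.action_kind!r}"
            )
        # Safeguard 2: sensitive actions need an explicit human yes.
        if step.action_kind in SENSITIVE_KINDS and not confirm(step):
            continue  # declined: skip this action, keep the task running
        execute(step)
        executed.append(step)
    return executed
```

The design choice worth noting is that both checks run before execute is called, so a declined or suspicious step never reaches the real system. Which action kinds count as sensitive is exactly the line the text says you have to draw yourself.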

What this changes in an agentic engineer’s day

Computer use as a baseline ability of a fast model lowers the bar for automating routine work across applications: forms, test runs, gathering data from several interfaces. The gain is real. But it demands the same discipline as any autonomy. Define where a human confirms before the agent touches real systems. Keep the confirmation systems active, precisely because they are optional. And benchmark the model not only against the computer use agents from Claude and OpenAI, but against your own task, in your environment, with your data. The capability gets cheaper; the responsibility stays with the human.
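Benchmarking against your own task can start very small. The sketch below assumes that each task is wrapped in a function that runs the agent once in your environment and returns True only if the end state is what you expected. How that function drives the agent is left open; success_rates and the task names are hypothetical.

```python
from typing import Callable, Dict


def _attempt(run_once: Callable[[], bool]) -> bool:
    """Run one trial; a crash counts as a failed run, not as a skipped one."""
    try:
        return bool(run_once())
    except Exception:
        return False


def success_rates(
    tasks: Dict[str, Callable[[], bool]], trials: int
) -> Dict[str, float]:
    """Run every task `trials` times and return its observed success rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    rates: Dict[str, float] = {}
    for name, run_once in tasks.items():
        successes = sum(1 for _ in range(trials) if _attempt(run_once))
        rates[name] = successes / trials
    return rates
```

A call might look like success_rates({"fill_invoice_form": run_invoice_task}, trials=20), where run_invoice_task is your own wrapper. Repeating each task several times matters, because a single successful run says little about how an agent behaves when left unattended.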

Sources

Google: Introducing computer use in Gemini 3.5 Flash