
Written by AI agents, curated and verified by me.

DeepSeek V4: the API price is about to depend on the time of day

  • DeepSeek
  • Agentic Engineering
  • Automation

In late June, DeepSeek announced that V4 will officially launch in mid-July, with a one-million-token context window across the entire model lineup; TechNode reported the news on 30 June. Nothing has shipped yet: this is a date, not a model. What sits next to the date is more interesting anyway. At launch, a new pricing plan takes effect that makes API calls during peak hours twice as expensive as off-peak calls. And the API docs already carry a hard deadline: the deepseek-chat and deepseek-reasoner model IDs retire on 24 July. The real story is not about the model. It is about cost architecture.

What is announced, and what is not here yet?

The announcement covers the official release of V4 in mid-July 2026, with a one-million-token context window for every model in the lineup. The DeepSeek team also cites "further feature enhancements and performance upgrades", particularly for agentic task execution, code generation, and mathematical reasoning. These are vendor claims without published benchmarks, and the model itself is not yet available. Whether the capabilities hold up can only be checked after launch. What you can check today are the other two parts of the announcement: the price and the shutdown.

What changes about the price?

With the release, DeepSeek introduces time-based API pricing. Every day from 9 a.m. to noon and from 2 p.m. to 6 p.m., the same call costs twice the off-peak rate. As far as I know, that makes DeepSeek the first major lab to add a time-of-day surcharge rather than merely offering a batch discount. The report leaves two things open: the timezone of those windows and the base prices per million tokens. You will need both from the API docs before launch, because without a timezone, a peak window is undefined for a scheduler running in Europe.
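Any scheduler that wants to respect the surcharge needs one predicate: is this moment inside a peak window? Below is a minimal Python sketch. The two windows and the factor of two come from the announcement. Treating the windows as half-open intervals and using UTC+8 as the pricing timezone are my assumptions (the report names no timezone), so `PRICING_TZ` is a placeholder to replace once the docs say otherwise. `is_peak` and `price_multiplier` are hypothetical helpers, not part of any DeepSeek SDK.

```python
from datetime import datetime, time, timedelta, timezone

# Peak windows as stated in the announcement: 9:00-12:00 and 14:00-18:00.
# Treated as half-open intervals [start, end); the report does not say
# whether the boundaries are inclusive.
PEAK_WINDOWS = [(time(9, 0), time(12, 0)), (time(14, 0), time(18, 0))]

# ASSUMPTION: the report does not name a timezone. UTC+8 is only a
# placeholder until the API docs confirm the real one.
PRICING_TZ = timezone(timedelta(hours=8))

PEAK_MULTIPLIER = 2.0  # peak calls cost twice the off-peak rate


def is_peak(moment: datetime, pricing_tz: timezone = PRICING_TZ) -> bool:
    """Return True if a timezone-aware datetime falls inside a peak window."""
    if moment.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    local = moment.astimezone(pricing_tz).time()
    return any(start <= local < end for start, end in PEAK_WINDOWS)


def price_multiplier(moment: datetime, pricing_tz: timezone = PRICING_TZ) -> float:
    """Price relative to the off-peak rate: 1.0 off-peak, 2.0 at peak."""
    return PEAK_MULTIPLIER if is_peak(moment, pricing_tz) else 1.0


# A job scheduled at 02:00 UTC (04:00 in Berlin during summer time).
job = datetime(2026, 7, 20, 2, 0, tzinfo=timezone.utc)
print(is_peak(job))                            # True under the UTC+8 placeholder
print(is_peak(job, pricing_tz=timezone.utc))   # False if the windows were UTC
```

The last two lines show why the missing timezone matters: the same 02:00 UTC job is peak under one assumption and off-peak under the other.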

Who has to act before 24 July?

Anyone calling deepseek-chat or deepseek-reasoner today. According to the API docs, both model IDs are retired on 24 July 2026 at 15:59 UTC. They currently map to the non-thinking and thinking modes of deepseek-v4-flash, and DeepSeek names deepseek-v4-flash and deepseek-v4-pro as the successors. The migration is small, but it is not optional. Put it off and your pipelines break on a Friday afternoon in July, regardless of whether V4 ships on time.
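A small preflight check keeps the migration from slipping: map each legacy name to its successor and fail loudly once the retirement time has passed. The Python sketch below assumes you keep model names in your own configuration. The mapping mirrors what the docs say the legacy names point to today (both resolve to deepseek-v4-flash); the "thinking"/"non-thinking" strings are hypothetical labels for your own code, because how the thinking mode is selected on the new model is not covered here. Moving to deepseek-v4-pro instead is a separate quality and cost decision.

```python
from datetime import datetime, timezone
from typing import Optional

# Retirement time from the API docs: 24 July 2026, 15:59 UTC.
RETIREMENT = datetime(2026, 7, 24, 15, 59, tzinfo=timezone.utc)

# Legacy names and their current targets, per the docs. The mode strings are
# HYPOTHETICAL labels, not API parameters.
LEGACY_MODELS = {
    "deepseek-chat": ("deepseek-v4-flash", "non-thinking"),
    "deepseek-reasoner": ("deepseek-v4-flash", "thinking"),
}


def migrate(model: str) -> tuple[str, Optional[str]]:
    """Map a legacy model name to its successor; leave other names untouched."""
    return LEGACY_MODELS.get(model, (model, None))


def is_retired(model: str, now: datetime) -> bool:
    """True if `model` is a legacy name and the retirement time has passed."""
    return model in LEGACY_MODELS and now >= RETIREMENT


# Example: run this in CI against every model name your pipelines use.
for name in ["deepseek-chat", "deepseek-reasoner", "deepseek-v4-pro"]:
    print(name, "->", migrate(name))
```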

What does this mean for running agents?

Time-based pricing turns "when do your agents run" into an engineering parameter. A factor of two is not a rounding error: put long runs, evaluations, or batch jobs into the peak windows and you pay double for the same work. Agents, however, do not keep office hours. A run that starts at night and delivers a verified result in the morning costs half as much, with no change to the result. Only days ago, DeepSeek showed with DSpark that latency and cost live in how the model is served, not in the model. Now the price itself moves into that layer. It follows the same logic as agentic engineering: reliability comes from the architecture around the model. From mid-July, at DeepSeek, so does the invoice.
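The factor of two is easy to turn into a number per run. The Python sketch below estimates a run's average price multiplier and finds the next off-peak start for a job that can wait. It assumes token usage is spread evenly over the run (real agent runs are burstier), samples at one-minute resolution, and uses UTC+8 as a placeholder pricing timezone, since the report does not name one. All function names are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# Peak windows from the announcement, as half-open intervals [start, end).
PEAK_WINDOWS = [(time(9, 0), time(12, 0)), (time(14, 0), time(18, 0))]
PRICING_TZ = timezone(timedelta(hours=8))  # ASSUMPTION: placeholder timezone
PEAK_MULTIPLIER = 2.0


def _is_peak(moment: datetime) -> bool:
    """True if an aware datetime falls inside a peak window."""
    local = moment.astimezone(PRICING_TZ).time()
    return any(start <= local < end for start, end in PEAK_WINDOWS)


def effective_multiplier(start: datetime, duration: timedelta) -> float:
    """Average price multiplier for a run, ASSUMING evenly spread token usage.

    Each minute of the run is priced at 1.0 or 2.0 and the results averaged.
    """
    minutes = max(1, int(duration.total_seconds() // 60))
    total = sum(
        PEAK_MULTIPLIER if _is_peak(start + timedelta(minutes=i)) else 1.0
        for i in range(minutes)
    )
    return total / minutes


def next_off_peak(moment: datetime) -> datetime:
    """Earliest time at or after `moment` that is outside every peak window."""
    if not _is_peak(moment):
        return moment
    # Window boundaries fall on whole minutes, so stepping by minute from the
    # floored start lands exactly on the end of the current window.
    candidate = moment.replace(second=0, microsecond=0)
    while _is_peak(candidate):
        candidate += timedelta(minutes=1)
    return candidate


# A three-hour eval started at 09:00 local time runs entirely at peak.
print(effective_multiplier(datetime(2026, 7, 20, 9, 0, tzinfo=PRICING_TZ), timedelta(hours=3)))  # 2.0
# An overnight run from 22:00 to 06:00 stays off-peak.
print(effective_multiplier(datetime(2026, 7, 20, 22, 0, tzinfo=PRICING_TZ), timedelta(hours=8)))  # 1.0
# A job submitted at 10:15 local time would be deferred to 12:00.
print(next_off_peak(datetime(2026, 7, 20, 10, 15, tzinfo=PRICING_TZ)))  # 2026-07-20 12:00:00+08:00
```

Minute sampling is cruder than intersecting time intervals exactly, but it is hard to get wrong and precise enough to decide whether a batch job should wait for the window to close.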

Sources