
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


OpenAI leader debunks Responses API myths and urges developers to migrate for performance and cost, citing tool calling within chain-of-thought, higher cache utilization, and ZDR-compliant stateless usage

September 9, 2025 //  by Finnovate

Too many developers are still misinformed about the Responses API and avoid it as a result, according to Prashant Mital, Head of Applied AI at OpenAI, who set out to debunk several "myths" about the API.

Myth one: "it's not possible to do some things with responses." His response: "Responses is a superset of completions. Anything you can do with completions, you can do with responses – plus more."

Myth two: that Responses always keeps state and therefore cannot be used where the customer (or their end users and partners) must adhere to Zero Data Retention (ZDR) policies. In such setups, no user data may be stored or retained on the provider's servers after a request is processed: every interaction must be stateless, with all conversation history, reasoning traces, and other context management handled entirely on the client side and nothing persisted by the API provider. Mital countered, "You can run responses in a stateless way. Just ask it to return encrypted reasoning items, and continue handling state client-side."

Myth three, which Mital called the most serious misconception: "Model intelligence is the same regardless of whether you use completions or responses. wrong again." He explained, "Responses was built for thinking models that call tools within their chain-of-thought (CoT). Responses allows persisting the CoT between model invocations when calling tools agentically – the result is a more intelligent model, and much higher cache utilization; we saw cache rates jump from 40% to 80% on some workloads." Mital described this as "perhaps the most egregious" misunderstanding, warning that "developers don't realize how much performance they are leaving on the table.
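The stateless pattern Mital describes can be sketched as follows. This is a minimal illustration, not official sample code: the parameter names (`store`, `include`, `"reasoning.encrypted_content"`) follow the OpenAI Responses API, but `build_stateless_request` is a hypothetical helper, and the actual network call is left commented out.

```python
# Sketch of stateless (ZDR-friendly) Responses usage: the client keeps
# the full conversation itself, asks the API to return reasoning as
# encrypted items, and sends everything back on the next turn.
# Assumes OpenAI Responses API parameter names; verify against your SDK.

def build_stateless_request(model: str, history: list) -> dict:
    """Build request params for one stateless turn.

    `history` is the client-held list of prior input/output items,
    including any encrypted reasoning items returned on earlier turns.
    """
    return {
        "model": model,
        "input": history,     # full context resent every turn
        "store": False,       # nothing persisted server-side
        "include": ["reasoning.encrypted_content"],  # reasoning returned encrypted
    }

# Client-side state handling: append the user turn, send, then append
# whatever the model returns (text + encrypted reasoning) to history.
history = [{"role": "user", "content": "Summarize our ZDR obligations."}]
params = build_stateless_request("gpt-5", history)
# resp = client.responses.create(**params)            # real call omitted
# history.extend(item.to_dict() for item in resp.output)
```

The key point is that with `store=False` the provider retains nothing, so the encrypted reasoning items are the client's only way to carry the model's chain-of-thought across turns.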
It’s hard because you use LiteLLM or some custom harness you built around chat completions or whatever, but prioritizing the switch is crucial if you want GPT-5 to be maximally performant in your agents.” For teams still building on Completions, Mital’s clarification may serve as a turning point: “If you’re still on chat completions, consider switching now — you are likely leaving performance and cost-savings on the table.” The Responses API is positioned not merely as an alternative but as an evolution, designed for the more complex reasoning workloads that have emerged as AI systems take on agentic tasks. Developers weighing a migration may find that the potential efficiency gains make the decision straightforward.
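The agentic pattern behind myth three — persisting the chain-of-thought between model invocations while calling tools — can be sketched like this. The chaining field `previous_response_id` and the `function_call_output` item type follow the Responses API; `next_turn_params` and the literal IDs are hypothetical, used only for illustration.

```python
# Sketch of an agentic tool-call turn with the Responses API: feeding
# tool results back while linking to the previous response, so the
# model's chain-of-thought (and prompt cache) carries across the call.
# Field names assume the Responses API schema; verify against the docs.

def next_turn_params(model: str, tool_outputs: list, prev_id: str) -> dict:
    """Build the follow-up request after executing the model's tool calls."""
    return {
        "model": model,
        "previous_response_id": prev_id,  # links back to the prior reasoning
        "input": tool_outputs,            # only the new tool results go in
    }

# Hypothetical tool result for a tool call the model made earlier.
tool_outputs = [{
    "type": "function_call_output",
    "call_id": "call_123",        # id echoed from the model's tool call
    "output": '{"temp_c": 21}',
}]
params = next_turn_params("gpt-5", tool_outputs, "resp_abc")
# resp = client.responses.create(**params)   # real call omitted
```

With Chat Completions, by contrast, each tool-calling round starts from a re-serialized message list with no persisted reasoning, which is the gap Mital says costs developers both intelligence and cache hits.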


Category: AI & Machine Economy, Innovation Topics


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
