
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


OpenAI leader debunks Responses API myths and urges developers to migrate for performance and cost, citing tool calling within chain-of-thought, higher cache utilization, and ZDR-compliant stateless usage

September 9, 2025 //  by Finnovate

Too many developers are still misinformed about the Responses API and avoid it as a result, according to Prashant Mital, Head of Applied AI at OpenAI, who set out to debunk several "myths" about the API.

Myth one: "it's not possible to do some things with responses." His response: "Responses is a superset of completions. Anything you can do with completions, you can do with responses – plus more."

Myth two: that Responses always keeps state and therefore cannot be used where the customer (or their end users and partners) must adhere to Zero Data Retention (ZDR) policies. In such setups, no user data may be stored or retained on the provider's servers after a request is processed: every interaction must be stateless, with all conversation history, reasoning traces, and other context management handled entirely on the client side and nothing persisted by the API provider. Mital countered, "You can run responses in a stateless way. Just ask it to return encrypted reasoning items, and continue handling state client-side."

Myth three, which Mital called the most serious misconception: "Model intelligence is the same regardless of whether you use completions or responses. wrong again." He explained, "Responses was built for thinking models that call tools within their chain-of-thought (CoT). Responses allows persisting the CoT between model invocations when calling tools agentically – the result is a more intelligent model, and much higher cache utilization; we saw cache rates jump from 40% to 80% on some workloads." Mital described this as "perhaps the most egregious" misunderstanding, warning that "developers don't realize how much performance they are leaving on the table.
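The stateless pattern Mital describes can be sketched as follows. This is a minimal illustration, not official sample code: the parameter names (`store`, `include`, `"reasoning.encrypted_content"`) follow the OpenAI Responses API, but `build_stateless_request` is a hypothetical helper, and the actual network call is left commented out.

```python
# Sketch of stateless (ZDR-friendly) Responses usage: the client keeps
# the full conversation itself, asks the API to return reasoning as
# encrypted items, and sends everything back on the next turn.
# Assumes OpenAI Responses API parameter names; verify against your SDK.

def build_stateless_request(model: str, history: list) -> dict:
    """Build request params for one stateless turn.

    `history` is the client-held list of prior input/output items,
    including any encrypted reasoning items returned on earlier turns.
    """
    return {
        "model": model,
        "input": history,     # full context resent every turn
        "store": False,       # nothing persisted server-side
        "include": ["reasoning.encrypted_content"],  # reasoning returned encrypted
    }

# Client-side state handling: append the user turn, send, then append
# whatever the model returns (text + encrypted reasoning) to history.
history = [{"role": "user", "content": "Summarize our ZDR obligations."}]
params = build_stateless_request("gpt-5", history)
# resp = client.responses.create(**params)            # real call omitted
# history.extend(item.to_dict() for item in resp.output)
```

The key point is that with `store=False` the provider retains nothing, so the encrypted reasoning items are the client's only way to carry the model's chain-of-thought across turns.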
It’s hard because you use LiteLLM or some custom harness you built around chat completions or whatever, but prioritizing the switch is crucial if you want GPT-5 to be maximally performant in your agents.” For teams still building on Completions, Mital’s clarification may serve as a turning point: “If you’re still on chat completions, consider switching now — you are likely leaving performance and cost-savings on the table.” The Responses API is positioned not merely as an alternative but as an evolution, designed for the more complex reasoning workloads that have emerged as AI systems take on agentic tasks. Developers weighing a migration may find that the potential efficiency gains make the decision straightforward.
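The agentic pattern behind myth three — persisting the chain-of-thought between model invocations while calling tools — can be sketched like this. The chaining field `previous_response_id` and the `function_call_output` item type follow the Responses API; `next_turn_params` and the literal IDs are hypothetical, used only for illustration.

```python
# Sketch of an agentic tool-call turn with the Responses API: feeding
# tool results back while linking to the previous response, so the
# model's chain-of-thought (and prompt cache) carries across the call.
# Field names assume the Responses API schema; verify against the docs.

def next_turn_params(model: str, tool_outputs: list, prev_id: str) -> dict:
    """Build the follow-up request after executing the model's tool calls."""
    return {
        "model": model,
        "previous_response_id": prev_id,  # links back to the prior reasoning
        "input": tool_outputs,            # only the new tool results go in
    }

# Hypothetical tool result for a tool call the model made earlier.
tool_outputs = [{
    "type": "function_call_output",
    "call_id": "call_123",        # id echoed from the model's tool call
    "output": '{"temp_c": 21}',
}]
params = next_turn_params("gpt-5", tool_outputs, "resp_abc")
# resp = client.responses.create(**params)   # real call omitted
```

With Chat Completions, by contrast, each tool-calling round starts from a re-serialized message list with no persisted reasoning, which is the gap Mital says costs developers both intelligence and cache hits.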


Category: AI & Machine Economy, Innovation Topics


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
