
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


OpenAI’s new Codex carries out coding tasks in isolated software containers without web access, lets developers customize those development environments, and fixes bugs with an accuracy rate of 75%

May 19, 2025 //  by Finnovate

OpenAI debuted a new AI agent, Codex, that can help developers write code and fix bugs. The tool is available through a sidebar in ChatGPT’s interface: one button configures Codex to generate new code based on user instructions, while another lets it answer questions about existing code. Responses take between one and 30 minutes to generate, depending on the complexity of the request.

Codex is powered by a new AI model called codex-1, a version of o3, OpenAI’s most capable reasoning model, that has been optimized for programming tasks. OpenAI fine-tuned the model by training it on a set of real-world coding tasks spanning a range of software environments. A piece of software that runs well in one environment, such as a cloud platform, may not run as efficiently on a Linux server or a developer’s desktop, if at all. As a result, an AI model’s training dataset must include technical information about every environment it will be expected to use.

OpenAI used reinforcement learning to train codex-1, a way of developing AI models that relies on trial and error to boost output quality: when the neural network completes a task correctly, it’s given a virtual reward, while incorrect answers lead to penalties that encourage the algorithm to come up with a better approach. In a series of coding tests carried out by OpenAI, Codex achieved an accuracy rate of 75%, five percentage points better than the most capable, hardware-intensive version of o3. OpenAI’s first-generation reasoning model, o1, scored 11%.

Codex carries out coding tasks in isolated software containers that don’t have web access. According to OpenAI, the agent launches a separate container for each task. Developers can customize those development environments by uploading a text file called AGENTS.md, which may describe what programs Codex should install, how AI-generated code should be tested for bugs, and related details.
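As an illustration, an AGENTS.md for a hypothetical Python project might look like the sketch below. The section headings, commands, and tool choices here are assumptions for the sake of example, not part of any official specification:

```markdown
## Setup
- Install dependencies with `pip install -r requirements.txt`

## Testing
- Run `pytest` before proposing any change; all tests must pass
- Lint with `ruff check .` and fix any reported issues

## Conventions
- Target Python 3.11
- Follow the existing module layout under `src/`
```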
Using AGENTS.md, developers can ensure that the container in which Codex generates code is configured the same way as the production system on which the code will run. That reduces the need to modify the code before releasing it to production. Developers can monitor Codex while it’s generating code. After the tool completes a task, it provides technical data that can be used to review each step of the workflow. It’s possible to request revisions if the code doesn’t meet project requirements.


Category: Essential Guidance


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
Finnovate Research · Knyvett House · Watermans Business Park · The Causeway Staines · TW18 3BA · United Kingdom · About · Contact Us · Tel: +44-20-3070-0188
