
OpenAI has released its first production AI model to run on non-Nvidia hardware. The GPT-5.3-Codex-Spark coding model runs on chips manufactured by Cerebras and delivers output at more than 1,000 tokens per second, roughly 15 times faster than the previous iteration of the model.
For perspective, Anthropic’s Claude Opus 4.6 model, when set to its new premium fast mode, runs at about 2.5 times its standard speed of 68.2 tokens per second, or roughly 170 tokens per second. Claude Opus 4.6 is, however, a larger and more capable model overall than Spark.
Sachin Katti, who leads compute at OpenAI, commented on the collaboration. “Cerebras has been a great engineering partner, and we’re excited about adding fast inference as a new platform capability,” Katti stated.
Codex-Spark is currently available as a research preview to ChatGPT Pro subscribers, who pay $200 per month. Access is provided through the Codex application, a command-line interface, and a VS Code extension. OpenAI is also granting API access to a select group of design partners.
The model launches with a 128,000-token context window and supports only text input at this stage. This release builds upon the full GPT-5.3-Codex model that OpenAI introduced earlier in the month.
While the full model is designed for heavyweight agentic coding tasks, Spark has been optimized for speed rather than depth of knowledge. OpenAI built it as a text-only model and fine-tuned it exclusively for coding, unlike the larger GPT-5.3 version, which handles general-purpose tasks.
OpenAI says Spark surpasses the older GPT-5.1-Codex-mini on SWE-Bench Pro and Terminal-Bench 2.0, benchmarks that assess software engineering capabilities, while completing tasks in significantly less time. The company has not provided independent verification of these results.
Codex’s speed has historically drawn criticism. In a December test in which four AI coding agents were tasked with building Minesweeper clones, Codex took roughly twice as long as Anthropic’s Claude Code to produce a functional game.
The 1,000 tokens per second achieved by GPT-5.3-Codex-Spark marks a substantial advancement over any previous model served through OpenAI’s own infrastructure. Independent benchmarks from Artificial Analysis indicate that OpenAI’s fastest models running on Nvidia hardware fall well below this threshold: GPT-4o delivers approximately 147 tokens per second, o3-mini reaches about 167 tokens per second, and GPT-4o mini operates at around 52 tokens per second.
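To put the reported figures side by side, the short Python sketch below works through the arithmetic. It uses only the throughput numbers quoted in this article; the 1,000 tokens per second for Spark is treated as a lower bound, and the helper names `speedup` and `seconds_for` are illustrative, not part of any OpenAI or Anthropic tooling.

```python
# Throughput in tokens per second, as quoted in the article.
# Spark's figure is reported as "exceeding 1,000", so treat it as a floor.
CODEX_SPARK_TPS = 1000.0

reported_tps = {
    "GPT-4o": 147.0,
    "o3-mini": 167.0,
    "GPT-4o mini": 52.0,
    "Claude Opus 4.6 (standard)": 68.2,
    # Fast mode is described as about 2.5x the standard rate.
    "Claude Opus 4.6 (fast mode)": 68.2 * 2.5,
}


def speedup(baseline_tps, fast_tps=CODEX_SPARK_TPS):
    """Return how many times faster fast_tps is than baseline_tps."""
    return fast_tps / baseline_tps


def seconds_for(tokens, tps):
    """Return the time in seconds to emit `tokens` at a steady `tps`."""
    return tokens / tps


if __name__ == "__main__":
    for name, tps in reported_tps.items():
        print(f"{name}: {tps:.1f} tok/s, Spark at least {speedup(tps):.1f}x faster")
```

These ratios compare raw output rates only. They say nothing about output quality, and the article itself notes that Claude Opus 4.6 is a larger and more capable model than Spark.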



