The most useful OpenAI/Codex news in the last 48 hours is not another model launch post. It is a developer runtime update:

  • February 24, 2026: per its API changelog, OpenAI added gpt-5.3-codex to v1/responses.
  • February 23, 2026: OpenAI launched WebSocket mode for the Responses API for long-running, tool-call-heavy workflows.

If you run agentic coding systems, this is a bigger operational improvement than it looks at first glance. It reduces the friction between “Codex as a product experience” and “Codex as a programmable runtime inside your own orchestration stack.”

What changed

1) gpt-5.3-codex is now in v1/responses (Feb 24)

OpenAI’s API changelog shows a Feb 24, 2026 entry: gpt-5.3-codex was released to the Responses API.

That matters because many teams standardize agent orchestration around v1/responses for:

  • tool use
  • state chaining
  • background/long-running workflows
  • unified runtime behavior across models

On the model page, OpenAI now lists gpt-5.3-codex as supporting v1/responses (along with other endpoints) and documents developer-relevant details including:

  • 400,000 context window
  • 128,000 max output tokens
  • reasoning effort settings (low, medium, high, xhigh)
  • text input/output plus image input
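As a concrete sketch, a call to gpt-5.3-codex through the Responses API might be assembled like this. The model name and the "xhigh" effort level come from the changelog details above; the exact parameter shapes (`reasoning={"effort": ...}`, `max_output_tokens`) are assumptions to verify against the current API reference before use.

```python
# Hypothetical sketch: building kwargs for client.responses.create(...)
# in the openai Python SDK. Parameter shapes are assumptions, not confirmed.

def build_responses_request(prompt: str, effort: str = "medium") -> dict:
    """Build keyword arguments for a Responses API call to gpt-5.3-codex."""
    assert effort in {"low", "medium", "high", "xhigh"}  # documented levels
    return {
        "model": "gpt-5.3-codex",
        "input": prompt,
        "reasoning": {"effort": effort},
        "max_output_tokens": 128_000,  # documented ceiling for this model
    }

# Usage (requires OPENAI_API_KEY and network access):
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.responses.create(**build_responses_request("Refactor utils.py", "high"))
```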

2) WebSocket mode for v1/responses (Feb 23)

OpenAI’s new WebSocket mode guide introduces a persistent connection pattern for Responses API runs:

  • keep a WebSocket open to /v1/responses
  • send response.create events per turn
  • continue runs using previous_response_id
  • send only incremental input items each turn

The stated target is exactly the kind of workload Codex-style systems produce: long chains of model/tool/model/tool interactions.

OpenAI’s guide explicitly says WebSocket mode is most useful for workflows with many model-tool round trips and notes that for rollouts with 20+ tool calls, they have seen up to roughly 40% faster end-to-end execution.

Why it matters

This is a runtime architecture update, not just an API transport update.

For serious coding agents, total task time is often dominated by:

  • repeated tool-call continuations
  • state handoff overhead
  • context retransmission
  • orchestration latency between turns

WebSocket mode attacks the continuation overhead directly. Combined with gpt-5.3-codex in v1/responses, it gives teams a cleaner path to run production coding loops on OpenAI’s newer agentic runtime instead of stitching together older request/response patterns.

This narrows the gap between “Codex UX” and “Codex infrastructure”

A recurring problem in agentic coding stacks is that teams like Codex behavior in product surfaces, but their internal runtime architecture lags behind:

  • custom HTTP polling loops
  • excessive state replay
  • brittle continuation logic
  • latency that compounds across tool-heavy chains

This week’s changes reduce that gap:

  • model availability (gpt-5.3-codex) in the right API surface
  • transport/runtime path (WebSocket mode) designed for long agent loops

Implementation notes

1) WebSocket mode is best for chained tool-call workloads, not everything

OpenAI’s own guidance is specific: use WebSocket mode when your workflow has many model-tool round trips.

Good fits:

  • coding agents that repeatedly call shell/tools
  • orchestration loops with multiple function calls
  • long-running runs with incremental continuation

Less compelling:

  • short single-turn requests
  • simple tool-less calls
  • latency-insensitive batch-style tasks

This should be a routing decision, not a blanket migration.
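That routing decision can be as simple as a predicate in your dispatcher. The threshold below (10 expected tool calls) is an arbitrary placeholder for illustration, not an OpenAI recommendation; tune it against your own measurements.

```python
# Illustrative transport-routing policy: only chained, tool-heavy runs go
# over the WebSocket path. The threshold is a placeholder, not guidance.

def choose_transport(expected_tool_calls: int, latency_sensitive: bool) -> str:
    """Pick "websocket" for long tool-call chains, plain "http" otherwise."""
    if expected_tool_calls >= 10 and latency_sensitive:
        return "websocket"
    return "http"

assert choose_transport(25, True) == "websocket"   # long agent loop
assert choose_transport(1, True) == "http"         # short single-turn request
assert choose_transport(25, False) == "http"       # batch-style task
```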

2) The continuation model is fast, but connection-local

The guide’s most important implementation detail is easy to miss:

  • WebSocket mode caches only the most recent previous-response state, in connection-local memory
  • continuation is fastest when chaining from that cached most recent response

This creates real engineering consequences:

  • latency wins depend on connection continuity
  • reconnect behavior is part of correctness, not just reliability
  • state locality matters for orchestration design

If you already distribute agent turns across workers arbitrarily, you may not see the full latency benefit until you preserve socket affinity per run.
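One simple way to preserve that affinity is to partition runs across workers deterministically, so every turn of a run lands on the worker holding its open socket. This is purely illustrative; any stable partitioning scheme (consistent hashing, a run-to-worker registry) achieves the same property.

```python
# Illustrative socket-affinity scheme: hash the run ID to a worker index so
# all turns of one run reuse the same worker and its open WebSocket.
import hashlib

def worker_for_run(run_id: str, num_workers: int) -> int:
    """Deterministically map a run to one of num_workers workers."""
    digest = hashlib.sha256(run_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

# Every turn of the same run resolves to the same worker:
assert worker_for_run("run-42", 8) == worker_for_run("run-42", 8)
```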

3) store=false and ZDR compatibility is a real design lever

OpenAI explicitly documents that WebSocket mode is compatible with store=false and Zero Data Retention (ZDR) because the cached previous-response state can remain in memory on the active connection.

That is important for teams who previously assumed lower-latency chaining required persisted server-side state.

Tradeoff:

  • with store=false, if the cached response ID is unavailable after reconnect, you can hit previous_response_not_found
  • you need a fallback path (full context resend or compaction-assisted restart)

In other words: privacy-friendly and low-latency is possible, but only if your recovery logic is disciplined.
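That recovery logic can be captured in a small wrapper: try the fast cached continuation first, and fall back to resending full context when the cached ID is gone. The `send_turn` callable and the exception name below are placeholders for your transport layer, not OpenAI SDK symbols.

```python
# Sketch of the fallback discipline described above. send_turn and the
# exception class stand in for your own transport layer (placeholders).

class PreviousResponseNotFound(Exception):
    """Raised when the server no longer has the cached previous response."""

def continue_run(send_turn, new_items, previous_response_id, full_context):
    try:
        # Fast path: chain from the connection-local cached response.
        return send_turn(input=new_items, previous_response_id=previous_response_id)
    except PreviousResponseNotFound:
        # Slow but correct path: rebuild the run by resending full context.
        return send_turn(input=full_context + new_items, previous_response_id=None)
```

With store=false, this fallback is not optional: a reconnect can invalidate the cached ID at any turn boundary.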

4) No multiplexing means parallelism needs multiple sockets

The WebSocket mode guide notes:

  • one connection can receive multiple response.create messages
  • but responses run sequentially
  • no multiplexing support today

If your agent runtime runs parallel branches, plan for:

  • one socket per active chain (or per small worker pool)
  • explicit connection lifecycle management
  • backpressure and connection-limit handling

This is a runtime topology decision, not just a code change.
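Since responses on one connection run sequentially, parallel branches each need their own socket, and the pool of open sockets needs a cap. A minimal sketch of that backpressure, where `connect_ws` is a placeholder for your actual connection setup:

```python
# Illustrative socket-pool cap: parallel branches each take one socket slot,
# and extra branches wait. connect_ws is a placeholder for real setup.
import asyncio

MAX_SOCKETS = 8
_socket_slots = asyncio.Semaphore(MAX_SOCKETS)

async def run_branch(connect_ws, branch):
    """Run one agent branch on its own connection, bounded by the pool cap."""
    async with _socket_slots:          # backpressure: wait for a free slot
        ws = await connect_ws()
        try:
            return await branch(ws)    # one active chain per socket
        finally:
            await ws.close()           # explicit lifecycle management
```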

5) Compaction strategy now matters more

OpenAI’s WebSocket guide documents how WebSocket continuations interact with:

  • normal previous_response_id chaining
  • server-side compaction via context_management
  • standalone /responses/compact

This is the right moment to stop treating compaction as a “later optimization.” In long coding runs, compaction and continuation strategy are part of your latency budget and failure recovery plan.
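A concrete starting point is a compaction trigger tied to the context window. The headroom fraction and the idea of triggering on estimated token count are placeholders for illustration, not OpenAI guidance; the 400,000 figure is the documented context window for gpt-5.3-codex.

```python
# Toy compaction trigger: compact once estimated context tokens cross a
# fraction of the documented 400k window. Threshold is a placeholder.

CONTEXT_WINDOW = 400_000

def should_compact(estimated_context_tokens: int, headroom: float = 0.75) -> bool:
    """Return True when a run's context should be compacted before continuing."""
    return estimated_context_tokens >= CONTEXT_WINDOW * headroom

assert should_compact(350_000)       # near the window: compact now
assert not should_compact(100_000)   # plenty of headroom left
```

Wiring this into the continuation loop means a cache miss after reconnect can restart from a compacted context instead of a full replay.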

Where gpt-realtime-1.5 fits in this week’s picture

OpenAI’s Feb 23 changelog also records the release of gpt-realtime-1.5 (and gpt-audio-1.5) to API surfaces.

That is a separate modality lane, but it reinforces the same trend:

  • OpenAI is investing in runtime ergonomics (transport, state handling, low-latency execution paths)
  • not just model launches and benchmark updates

For teams building agent systems, this week’s signal is architectural:

the API is getting better at long-running, stateful, tool-heavy workflows.

Practical rollout pattern for Codex teams

If your coding-agent stack already uses OpenAI tools/models, a pragmatic rollout looks like this:

  1. Keep your current HTTP Responses path as the stable baseline.
  2. Add a WebSocket-mode execution path behind a feature flag.
  3. Route only tool-heavy runs (e.g., more than 10–20 expected tool calls) to WebSocket mode first.
  4. Measure end-to-end task time, failure recovery rate, and operator complexity.
  5. Expand only after reconnect and cache-miss handling are boring.

This avoids the classic mistake of shipping a faster happy path with a weaker failure path.

What to do now

If you are shipping agentic coding workflows this week:

  1. Confirm your stack can call gpt-5.3-codex via v1/responses.
  2. Prototype WebSocket mode for one long-running coding workflow.
  3. Implement recovery for previous_response_not_found and 60-minute connection expiry.
  4. Decide when to use store=true vs store=false/ZDR based on your policy model.
  5. Re-benchmark latency and accepted-change throughput before widening rollout.

The headline here is not “new benchmark number.” It is that OpenAI shipped runtime changes that make Codex-class agent workflows easier to run well.

Sources