The most useful OpenAI/Codex news in the last 48 hours is not another model launch post. It is a developer runtime update:

  • February 24, 2026: per its API changelog, OpenAI added gpt-5.3-codex to v1/responses.
  • February 23, 2026: OpenAI launched WebSocket mode for the Responses API for long-running, tool-call-heavy workflows.

If you run agentic coding systems, this is a bigger operational improvement than it looks at first glance. It reduces the friction between “Codex as a product experience” and “Codex as a programmable runtime inside your own orchestration stack.”

What changed

1) gpt-5.3-codex is now in v1/responses (Feb 24)

OpenAI’s API changelog shows a Feb 24, 2026 entry: gpt-5.3-codex was released to the Responses API.

That matters because many teams standardize agent orchestration around v1/responses for:

  • tool use
  • state chaining
  • background/long-running workflows
  • unified runtime behavior across models

On the model page, OpenAI now lists gpt-5.3-codex as supporting v1/responses (along with other endpoints) and documents developer-relevant details including:

  • 400,000 context window
  • 128,000 max output tokens
  • reasoning effort settings (low, medium, high, xhigh)
  • text input/output plus image input
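As a concrete sketch, a call to gpt-5.3-codex through the Responses API might be assembled like this. The model name and the "xhigh" effort level come from the changelog details above; the exact parameter shapes (`reasoning={"effort": ...}`, `max_output_tokens`) are assumptions to verify against the current API reference before use.

```python
# Hypothetical sketch: building kwargs for client.responses.create(...)
# in the openai Python SDK. Parameter shapes are assumptions, not confirmed.

def build_responses_request(prompt: str, effort: str = "medium") -> dict:
    """Build keyword arguments for a Responses API call to gpt-5.3-codex."""
    assert effort in {"low", "medium", "high", "xhigh"}  # documented levels
    return {
        "model": "gpt-5.3-codex",
        "input": prompt,
        "reasoning": {"effort": effort},
        "max_output_tokens": 128_000,  # documented ceiling for this model
    }

# Usage (requires OPENAI_API_KEY and network access):
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.responses.create(**build_responses_request("Refactor utils.py", "high"))
```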

2) WebSocket mode for v1/responses (Feb 23)

OpenAI’s new WebSocket mode guide introduces a persistent connection pattern for Responses API runs:

  • keep a WebSocket open to /v1/responses
  • send response.create events per turn
  • continue runs using previous_response_id
  • send only incremental input items each turn

The stated target is exactly the kind of workload Codex-style systems produce: long chains of model/tool/model/tool interactions.

OpenAI’s guide explicitly says WebSocket mode is most useful for workflows with many model-tool round trips and notes that for rollouts with 20+ tool calls, they have seen up to roughly 40% faster end-to-end execution.

Why it matters

This is a runtime architecture update, not just an API transport update.

For serious coding agents, total task time is often dominated by:

  • repeated tool-call continuations
  • state handoff overhead
  • context retransmission
  • orchestration latency between turns

WebSocket mode attacks the continuation overhead directly. Combined with gpt-5.3-codex in v1/responses, it gives teams a cleaner path to run production coding loops on OpenAI’s newer agentic runtime instead of stitching together older request/response patterns.

This narrows the gap between “Codex UX” and “Codex infrastructure”

A recurring problem in agentic coding stacks is that teams like Codex behavior in product surfaces, but their internal runtime architecture lags behind:

  • custom HTTP polling loops
  • excessive state replay
  • brittle continuation logic
  • latency that compounds across tool-heavy chains

This week’s changes reduce that gap:

  • model availability (gpt-5.3-codex) in the right API surface
  • transport/runtime path (WebSocket mode) designed for long agent loops

Implementation notes

1) WebSocket mode is best for chained tool-call workloads, not everything

OpenAI’s own guidance is specific: use WebSocket mode when your workflow has many model-tool round trips.

Good fits:

  • coding agents that repeatedly call shell/tools
  • orchestration loops with multiple function calls
  • long-running runs with incremental continuation

Less compelling:

  • short single-turn requests
  • simple tool-less calls
  • latency-insensitive batch-style tasks

This should be a routing decision, not a blanket migration.
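That routing decision can be as simple as a predicate in your dispatcher. The threshold below (10 expected tool calls) is an arbitrary placeholder for illustration, not an OpenAI recommendation; tune it against your own measurements.

```python
# Illustrative transport-routing policy: only chained, tool-heavy runs go
# over the WebSocket path. The threshold is a placeholder, not guidance.

def choose_transport(expected_tool_calls: int, latency_sensitive: bool) -> str:
    """Pick "websocket" for long tool-call chains, plain "http" otherwise."""
    if expected_tool_calls >= 10 and latency_sensitive:
        return "websocket"
    return "http"

assert choose_transport(25, True) == "websocket"   # long agent loop
assert choose_transport(1, True) == "http"         # short single-turn request
assert choose_transport(25, False) == "http"       # batch-style task
```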

2) The continuation model is fast, but connection-local

The guide’s most important implementation detail is easy to miss:

  • WebSocket mode caches only the most recent previous-response state, in connection-local memory
  • continuation is fastest when chaining from that cached most recent response

This creates real engineering consequences:

  • latency wins depend on connection continuity
  • reconnect behavior is part of correctness, not just reliability
  • state locality matters for orchestration design

If you already distribute agent turns across workers arbitrarily, you may not see the full latency benefit until you preserve socket affinity per run.
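One simple way to preserve that affinity is to partition runs across workers deterministically, so every turn of a run lands on the worker holding its open socket. This is purely illustrative; any stable partitioning scheme (consistent hashing, a run-to-worker registry) achieves the same property.

```python
# Illustrative socket-affinity scheme: hash the run ID to a worker index so
# all turns of one run reuse the same worker and its open WebSocket.
import hashlib

def worker_for_run(run_id: str, num_workers: int) -> int:
    """Deterministically map a run to one of num_workers workers."""
    digest = hashlib.sha256(run_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

# Every turn of the same run resolves to the same worker:
assert worker_for_run("run-42", 8) == worker_for_run("run-42", 8)
```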

3) store=false and ZDR compatibility is a real design lever

OpenAI explicitly documents that WebSocket mode is compatible with store=false and Zero Data Retention (ZDR) because the cached previous-response state can remain in memory on the active connection.

That is important for teams who previously assumed lower-latency chaining required persisted server-side state.

Tradeoff:

  • with store=false, if the cached response ID is unavailable after reconnect, you can hit previous_response_not_found
  • you need a fallback path (full context resend or compaction-assisted restart)

In other words: privacy-friendly and low-latency is possible, but only if your recovery logic is disciplined.
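That recovery logic can be captured in a small wrapper: try the fast cached continuation first, and fall back to resending full context when the cached ID is gone. The `send_turn` callable and the exception name below are placeholders for your transport layer, not OpenAI SDK symbols.

```python
# Sketch of the fallback discipline described above. send_turn and the
# exception class stand in for your own transport layer (placeholders).

class PreviousResponseNotFound(Exception):
    """Raised when the server no longer has the cached previous response."""

def continue_run(send_turn, new_items, previous_response_id, full_context):
    try:
        # Fast path: chain from the connection-local cached response.
        return send_turn(input=new_items, previous_response_id=previous_response_id)
    except PreviousResponseNotFound:
        # Slow but correct path: rebuild the run by resending full context.
        return send_turn(input=full_context + new_items, previous_response_id=None)
```

With store=false, this fallback is not optional: a reconnect can invalidate the cached ID at any turn boundary.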

4) No multiplexing means parallelism needs multiple sockets

The WebSocket mode guide notes:

  • one connection can receive multiple response.create messages
  • but responses run sequentially
  • no multiplexing support today

If your agent runtime runs parallel branches, plan for:

  • one socket per active chain (or per small worker pool)
  • explicit connection lifecycle management
  • backpressure and connection-limit handling

This is a runtime topology decision, not just a code change.
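Since responses on one connection run sequentially, parallel branches each need their own socket, and the pool of open sockets needs a cap. A minimal sketch of that backpressure, where `connect_ws` is a placeholder for your actual connection setup:

```python
# Illustrative socket-pool cap: parallel branches each take one socket slot,
# and extra branches wait. connect_ws is a placeholder for real setup.
import asyncio

MAX_SOCKETS = 8
_socket_slots = asyncio.Semaphore(MAX_SOCKETS)

async def run_branch(connect_ws, branch):
    """Run one agent branch on its own connection, bounded by the pool cap."""
    async with _socket_slots:          # backpressure: wait for a free slot
        ws = await connect_ws()
        try:
            return await branch(ws)    # one active chain per socket
        finally:
            await ws.close()           # explicit lifecycle management
```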

5) Compaction strategy now matters more

OpenAI’s WebSocket guide documents how WebSocket continuations interact with:

  • normal previous_response_id chaining
  • server-side compaction via context_management
  • standalone /responses/compact

This is the right moment to stop treating compaction as a “later optimization.” In long coding runs, compaction and continuation strategy are part of your latency budget and failure recovery plan.
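A concrete starting point is a compaction trigger tied to the context window. The headroom fraction and the idea of triggering on estimated token count are placeholders for illustration, not OpenAI guidance; the 400,000 figure is the documented context window for gpt-5.3-codex.

```python
# Toy compaction trigger: compact once estimated context tokens cross a
# fraction of the documented 400k window. Threshold is a placeholder.

CONTEXT_WINDOW = 400_000

def should_compact(estimated_context_tokens: int, headroom: float = 0.75) -> bool:
    """Return True when a run's context should be compacted before continuing."""
    return estimated_context_tokens >= CONTEXT_WINDOW * headroom

assert should_compact(350_000)       # near the window: compact now
assert not should_compact(100_000)   # plenty of headroom left
```

Wiring this into the continuation loop means a cache miss after reconnect can restart from a compacted context instead of a full replay.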

Where gpt-realtime-1.5 fits in this week’s picture

OpenAI’s Feb 23 changelog also records the release of gpt-realtime-1.5 (and gpt-audio-1.5) to API surfaces.

That is a separate modality lane, but it reinforces the same trend:

  • OpenAI is investing in runtime ergonomics (transport, state handling, low-latency execution paths)
  • not just model launches and benchmark updates

For teams building agent systems, this week’s signal is architectural:

the API is getting better at long-running, stateful, tool-heavy workflows.

Practical rollout pattern for Codex teams

If your coding-agent stack already uses OpenAI tools/models, a pragmatic rollout looks like this:

  1. Keep your current HTTP Responses path as the stable baseline.
  2. Add a WebSocket-mode execution path behind a feature flag.
  3. Route only tool-heavy runs (e.g., more than 10–20 expected tool calls) to WebSocket mode first.
  4. Measure end-to-end task time, failure recovery rate, and operator complexity.
  5. Expand only after reconnect and cache-miss handling are boring.

This avoids the classic mistake of shipping a faster happy path with a weaker failure path.

What to do now

If you are shipping agentic coding workflows this week:

  1. Confirm your stack can call gpt-5.3-codex via v1/responses.
  2. Prototype WebSocket mode for one long-running coding workflow.
  3. Implement recovery for previous_response_not_found and 60-minute connection expiry.
  4. Decide when to use store=true vs store=false/ZDR based on your policy model.
  5. Re-benchmark latency and accepted-change throughput before widening rollout.

The headline here is not “new benchmark number.” It is that OpenAI shipped runtime changes that make Codex-class agent workflows easier to run well.

Sources