refactor: unify empty-output retry logic into shared completionruntime package and normalize protocol adapter boundary.

This commit is contained in:
CJACK
2026-05-10 00:10:53 +08:00
parent 067cf465bb
commit 7c66742a19
32 changed files with 930 additions and 371 deletions

View File

@@ -22,6 +22,13 @@ These rules apply to all agent-made changes in this repository.
- Keep changes additive and tightly scoped to the requested feature or bugfix.
- Do not mix unrelated refactors into feature PRs unless they are required to make the change pass gates.
## Protocol Adapter Boundary
- Do not let OpenAI Chat, OpenAI Responses, Claude, Gemini, or other interface protocol formatting own shared business behavior.
- Normalize protocol-specific request shapes into the project standard request/turn model first, run shared business logic in one place, then render back to the target protocol at the boundary.
- Business logic that must stay globally consistent includes empty-output retry, thinking/reasoning handling, tool-call detection and policy, usage accounting, current-input-file injection, history persistence, file/reference handling, and completion payload assembly.
- If a behavior must differ by protocol, keep the difference as an explicit adapter/rendering concern and document why it cannot live in the shared normalized path.
## Documentation Sync
- When business logic or user-visible behavior changes, update the corresponding documentation in the same change.

View File

@@ -112,7 +112,7 @@ DS2API's current core approach is not to take the client-supplied `messages`, `tools`
- The Vercel Node streaming path is not migrated in this round and still uses the existing Node bridge / stream-tool-sieve implementation; if Node streaming semantics change later, they must be aligned with the Go canonical output semantics of `assistantturn`.
- Client-supplied thinking / reasoning switches are normalized into the downstream `thinking_enabled`. Gemini's `generationConfig.thinkingConfig.thinkingBudget` is translated into the same thinking switch; when thinking is off, even if the upstream returns `response/thinking_content`, the compatibility layer does not emit it as visible body text. If the finally resolved model name carries a `-nothinking` suffix, thinking is unconditionally forced off, taking priority over `thinking` / `reasoning` / `reasoning_effort` in the request body. When not explicitly disabled, each surface enables thinking according to the resolved DeepSeek model's default capability and exposes it in its protocol's native shape: OpenAI Chat as `reasoning_content`, OpenAI Responses as `response.reasoning.delta` / `reasoning` content, Claude as a `thinking` block / `thinking_delta`, and Gemini as a `thought: true` part.
- For OpenAI Chat / Responses non-stream finalization, if the final visible body is empty, the compatibility layer first tries to parse standalone DSML / XML tool blocks in the chain of thought as real tool calls. The streaming path performs the same fallback detection at finalization, but it never intercepts or rewrites the stream mid-flight because of chain-of-thought content; real tool detection is always based on the raw upstream text, not on the version that has already gone through visible-output scrubbing, so even though the final visible layer strips complete leaked DSML / XML `tool_calls` wrappers and suppresses all-empty-argument or invalid wrapper blocks, converting real tool calls into structured `tool_calls` / `function_call` is unaffected. The recovered result is returned as this turn's structured assistant `tool_calls` / `function_call` output rather than stuffed into the `content` text; if the client did not enable thinking / reasoning, the chain of thought is used only for detection and is never exposed as `reasoning_content` or visible body text. Only when the body is empty and the chain of thought contains no executable tool call does handling fall through to the empty-reply error path.
- Before OpenAI Chat / Responses empty-reply error handling kicks in, one internal compensating retry is performed by default: after the first upstream attempt fully finishes, if the final visible body is empty, no tool call was parsed, no tool call has already been streamed to the client, and the finish reason is not `content_filter`, the compatibility layer reuses the same `chat_session_id`, account, token, and tool policy, appends the fixed suffix `Previous reply had no visible output. Please regenerate the visible final answer or tool call now.` to the original completion `prompt`, and resubmits once. The retry follows the DeepSeek multi-turn protocol: it extracts `response_message_id` from the first upstream SSE stream and sets `parent_message_id` in the retry payload to that value, making the retry a follow-up turn of the same session rather than a disconnected root message; it also fetches PoW once more, falling back to the original PoW if the fetch fails. The retry does not re-normalize messages, create a new session, switch accounts, or insert a retry marker into the client stream; the second attempt's thinking / reasoning is appended directly after the first as normal deltas and deduplicated with overlap trim. If the second attempt is still empty, the terminal error code remains the existing `upstream_empty_output`; if any attempt hits an empty `content_filter`, no compensating retry is made and the `content_filter` error is kept. The JS Vercel runtime also sets `parent_message_id` but reuses the original PoW because it cannot call the PoW API directly.
- Before OpenAI Chat / Responses, Claude Messages, and Gemini generateContent empty-reply error handling kicks in, one internal compensating retry is performed by default: after the first upstream attempt fully finishes, if the final visible body is empty, no tool call was parsed, no tool call has already been streamed to the client, and the finish reason is not `content_filter`, the compatibility layer reuses the same `chat_session_id`, account, token, and tool policy, appends the fixed suffix `Previous reply had no visible output. Please regenerate the visible final answer or tool call now.` to the original completion `prompt`, and resubmits once. On the Go main path, non-stream retry is handled uniformly by `completionruntime.ExecuteNonStreamWithRetry` and stream retry by `completionruntime.ExecuteStreamWithRetry`; each protocol runtime is only responsible for consuming/rendering its own protocol's SSE framing. The retry follows the DeepSeek multi-turn protocol: it extracts `response_message_id` from the first upstream SSE stream and sets `parent_message_id` in the retry payload to that value, making the retry a follow-up turn of the same session rather than a disconnected root message; it also fetches PoW once more, falling back to the original PoW if the fetch fails. The retry does not re-normalize messages, create a new session, switch accounts, or insert a retry marker into the client stream; the second attempt's thinking / reasoning is appended directly after the first as normal deltas and deduplicated with overlap trim. If the second attempt still produces no output at all, the terminal error is 503 `upstream_unavailable`; if there is reasoning but no visible body or tool call, 429 `upstream_empty_output` is still returned; if any attempt hits an empty `content_filter`, no compensating retry is made and the `content_filter` error is kept. The JS Vercel runtime also sets `parent_message_id` but reuses the original PoW because it cannot call the PoW API directly.
- For non-stream OpenAI Chat / Responses, Claude Messages, and Gemini generateContent, at the final visible-body rendering stage, `[citation:N]` / `[reference:N]` markers from DeepSeek search results are replaced with the corresponding Markdown links. `citation` markers are resolved as one-based indices; `reference` markers are mapped as zero-based indices only when `[reference:0]` (whitespace allowed after the colon) appears in the same body segment, and this does not affect `citation` markers in the same segment.
- Streaming output still hides upstream-internal markers like `[citation:N]` / `[reference:N]` by default, to avoid leaking citation placeholders whose mapping is not yet complete into chunked output.
@@ -168,7 +168,7 @@ OpenAI Chat / Responses, after standardization and before the current input file, will by default
4. Merge this entire block into the system prompt.
Positive tool-call examples now demonstrate the official DSML style first: `<|DSML|tool_calls>`, `<|DSML|invoke name="...">`, `<|DSML|parameter name="...">`.
The compatibility layer still accepts the legacy plain `<tool_calls>` wrapper and tolerates several DSML tag variants, including hyphenated forms `<dsml-tool-calls>` / `<dsml-invoke>` / `<dsml-parameter>`, underscored forms `<dsml_tool_calls>` / `<dsml_invoke>` / `<dsml_parameter>`, and other prefix-separated shapes such as `<vendor|tool_calls>` / `<vendor_tool_calls>` / `<vendor - tool_calls>`; but the prompt asks the model to prefer the official DSML tags and stresses that it must not emit only the closing wrapper while omitting the opening tag. Note this is "DSML shell compatibility with XML parsing semantics inside", not a native end-to-end DSML implementation; these aliases are normalized back to the existing XML tags at the parser entry and then flow through the same parser. The parser first intercepts suspected tool wrappers outside code blocks, and only lets them through as plain text when full parsing fails or the tool semantics are invalid.
The compatibility layer still accepts the legacy plain `<tool_calls>` wrapper and tolerates several DSML tag variants, including hyphenated forms `<dsml-tool-calls>` / `<dsml-invoke>` / `<dsml-parameter>`, underscored forms `<dsml_tool_calls>` / `<dsml_invoke>` / `<dsml_parameter>`, and other prefix-separated shapes such as `<vendor|tool_calls>` / `<vendor_tool_calls>` / `<vendor - tool_calls>`; the tag-shell scan also normalizes full-width-to-ASCII drift, for example a `<tool_calls>` closing delimiter written with full-width angle brackets. But the prompt asks the model to prefer the official DSML tags and stresses that it must not emit only the closing wrapper while omitting the opening tag. Note this is "DSML shell compatibility with XML parsing semantics inside", not a native end-to-end DSML implementation; these aliases are normalized back to the existing XML tags at the parser entry and then flow through the same parser. The parser first intercepts suspected tool wrappers outside code blocks, and only lets them through as plain text when full parsing fails or the tool semantics are invalid.
Array parameters are represented with `<item>...</item>` child nodes; when a parameter body contains only item children, the Go / Node parsers restore it to an array, preventing parameters whose schema requires an array, such as `questions` / `options`, from being misparsed as an `{ "item": ... }` object. Beyond that, the parser also recovers some looser list spellings, e.g. a JSON array literal or a comma-separated sequence of JSON items, as long as they are unambiguous enough; but `<item>` remains the preferred shape. If the model wrongly wraps a complete structured XML fragment in CDATA, the compatibility layer tries to restore CDATA XML fragments in non-verbatim fields to objects / arrays while protecting verbatim fields like `content` / `command`. However, if the CDATA is just a single flat XML/HTML tag, e.g. inline markup like `<b>urgent</b>`, the compatibility layer keeps the original string and does not force it into an object / array; only CDATA fragments that clearly express structure, e.g. multiple sibling nodes, nested children, or `item` lists, trigger structured recovery. For long-text parameters like `command` / `content`, Markdown-fenced DSML / XML examples inside CDATA are protected as verbatim text; a `]]></parameter>` or `</tool_calls>` inside an example does not truncate the outer tool call, and the parser keeps waiting for the real parameter / wrapper closing tag outside the fence.
When reading DeepSeek SSE on the Go side, we no longer rely on `bufio.Scanner`'s fixed 2MiB single-line limit; when a file-writing tool returns a very long `content` on a single `data:` line, non-stream collection, stream parsing, and auto-continue passthrough all keep the complete line before entering the same tool parsing and serialization flow.
At the assistant final-response stage, if a tool parameter is explicitly declared `string` in its schema, the compatibility layer recursively converts number / bool / object / array values on that path into strings before reserializing the parsed `tool_calls` / `function_call` into OpenAI / Responses / Claude visible arguments; objects / arrays are compacted into JSON strings. This protection only applies to paths the schema explicitly declares as string, and never rewrites parameters that are genuinely `number` / `boolean` / `object` / `array`. It accommodates cases where DeepSeek outputs a structured fragment but the upstream client's tool schema strictly requires string parameters (e.g. `content`, `prompt`, `path`, `taskId`).

View File

@@ -218,7 +218,7 @@ func UpstreamEmptyOutputDetail(contentFilter bool, text, thinking string) (int,
if strings.TrimSpace(thinking) != "" {
return http.StatusTooManyRequests, "Upstream account hit a rate limit and returned reasoning without visible output.", "upstream_empty_output"
}
return http.StatusTooManyRequests, "Upstream account hit a rate limit and returned empty output.", "upstream_empty_output"
return http.StatusServiceUnavailable, "Upstream service is unavailable and returned no output.", "upstream_unavailable"
}
// ShouldRetryEmptyOutput returns true when the turn produced no visible text

View File

@@ -1,6 +1,7 @@
package assistantturn
import (
"net/http"
"testing"
"ds2api/internal/promptcompat"
@@ -70,6 +71,13 @@ func TestBuildTurnFromCollectedThinkingOnlyIsEmptyOutput(t *testing.T) {
}
}
func TestBuildTurnFromCollectedPureEmptyOutputIsUpstreamUnavailable(t *testing.T) {
turn := BuildTurnFromCollected(sse.CollectResult{}, BuildOptions{})
if turn.Error == nil || turn.Error.Status != http.StatusServiceUnavailable || turn.Error.Code != "upstream_unavailable" {
t.Fatalf("expected upstream unavailable error, got %#v", turn.Error)
}
}
func TestBuildTurnFromCollectedToolChoiceRequired(t *testing.T) {
turn := BuildTurnFromCollected(sse.CollectResult{Text: "hello"}, BuildOptions{
ToolChoice: promptcompat.ToolChoicePolicy{Mode: promptcompat.ToolChoiceRequired},

View File

@@ -90,7 +90,11 @@ func ExecuteNonStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.R
if startErr != nil {
return NonStreamResult{SessionID: start.SessionID, Payload: start.Payload}, startErr
}
stdReq = start.Request
return ExecuteNonStreamStartedWithRetry(ctx, ds, a, start, opts)
}
func ExecuteNonStreamStartedWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, start StartResult, opts Options) (NonStreamResult, *assistantturn.OutputError) {
stdReq := start.Request
maxAttempts := opts.MaxAttempts
if maxAttempts <= 0 {
maxAttempts = 3

View File

@@ -91,7 +91,7 @@ func TestExecuteNonStreamWithRetryBuildsCanonicalTurn(t *testing.T) {
func TestExecuteNonStreamWithRetryUsesParentMessageForEmptyRetry(t *testing.T) {
ds := &fakeDeepSeekCaller{responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":77,"p":"response/status","v":"FINISHED"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":77,"p":"response/thinking_content","v":"plan"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":78,"p":"response/content","v":"ok"}`),
}}
stdReq := promptcompat.StandardRequest{

View File

@@ -0,0 +1,118 @@
package completionruntime
import (
"context"
"io"
"net/http"
"strings"
"ds2api/internal/auth"
"ds2api/internal/config"
"ds2api/internal/httpapi/openai/shared"
)
type StreamRetryOptions struct {
Surface string
Stream bool
RetryEnabled bool
RetryMaxAttempts int
MaxAttempts int
UsagePrompt string
}
type StreamRetryHooks struct {
ConsumeAttempt func(resp *http.Response, allowDeferEmpty bool) (terminalWritten bool, retryable bool)
Finalize func(attempts int)
ParentMessageID func() int
OnRetry func(attempts int)
OnRetryPrompt func(prompt string)
OnRetryFailure func(status int, message, code string)
OnTerminal func(attempts int)
}
func ExecuteStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, initialResp *http.Response, payload map[string]any, pow string, opts StreamRetryOptions, hooks StreamRetryHooks) {
if hooks.ConsumeAttempt == nil {
return
}
surface := strings.TrimSpace(opts.Surface)
if surface == "" {
surface = "completion"
}
maxAttempts := opts.MaxAttempts
if maxAttempts <= 0 {
maxAttempts = 3
}
retryMax := opts.RetryMaxAttempts
if retryMax <= 0 {
retryMax = shared.EmptyOutputRetryMaxAttempts()
}
attempts := 0
currentResp := initialResp
for {
terminalWritten, retryable := hooks.ConsumeAttempt(currentResp, opts.RetryEnabled && attempts < retryMax)
if terminalWritten {
if hooks.OnTerminal != nil {
hooks.OnTerminal(attempts)
}
return
}
if !retryable || !opts.RetryEnabled || attempts >= retryMax {
if hooks.Finalize != nil {
hooks.Finalize(attempts)
}
return
}
attempts++
parentMessageID := 0
if hooks.ParentMessageID != nil {
parentMessageID = hooks.ParentMessageID()
}
config.Logger.Info("[completion_runtime_empty_retry] attempting synthetic retry", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "parent_message_id", parentMessageID)
retryPow, powErr := ds.GetPow(ctx, a, maxAttempts)
if powErr != nil {
config.Logger.Warn("[completion_runtime_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
nextResp, err := ds.CallCompletion(ctx, a, shared.ClonePayloadForEmptyOutputRetry(payload, parentMessageID), retryPow, maxAttempts)
if err != nil {
if hooks.OnRetryFailure != nil {
hooks.OnRetryFailure(http.StatusInternalServerError, "Failed to get completion.", "error")
}
config.Logger.Warn("[completion_runtime_empty_retry] retry request failed", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "error", err)
return
}
if nextResp.StatusCode != http.StatusOK {
body, readErr := io.ReadAll(nextResp.Body)
if readErr != nil {
config.Logger.Warn("[completion_runtime_empty_retry] retry error body read failed", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "error", readErr)
}
closeRetryBody(surface, nextResp.Body)
msg := strings.TrimSpace(string(body))
if msg == "" {
msg = http.StatusText(nextResp.StatusCode)
}
if hooks.OnRetryFailure != nil {
hooks.OnRetryFailure(nextResp.StatusCode, msg, "error")
}
return
}
if hooks.OnRetry != nil {
hooks.OnRetry(attempts)
}
if hooks.OnRetryPrompt != nil {
hooks.OnRetryPrompt(shared.UsagePromptWithEmptyOutputRetry(opts.UsagePrompt, attempts))
}
currentResp = nextResp
}
}
func closeRetryBody(surface string, body io.Closer) {
if body == nil {
return
}
if err := body.Close(); err != nil {
config.Logger.Warn("[completion_runtime_empty_retry] retry response body close failed", "surface", surface, "error", err)
}
}

View File

@@ -0,0 +1,62 @@
package completionruntime
import (
"context"
"io"
"net/http"
"strings"
"testing"
"ds2api/internal/auth"
"ds2api/internal/httpapi/openai/shared"
)
func TestExecuteStreamWithRetryUsesSharedRetryPayloadAndUsagePrompt(t *testing.T) {
ds := &fakeDeepSeekCaller{responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"p":"response/content","v":"ok"}`),
}}
initial := sseHTTPResponse(http.StatusOK, `data: {"response_message_id":77,"p":"response/thinking_content","v":"plan"}`)
payload := map[string]any{"prompt": "original prompt"}
attemptsSeen := 0
retryPrompt := ""
ExecuteStreamWithRetry(context.Background(), ds, &auth.RequestAuth{}, initial, payload, "pow", StreamRetryOptions{
Surface: "test.stream",
Stream: true,
RetryEnabled: true,
UsagePrompt: "original prompt",
}, StreamRetryHooks{
ConsumeAttempt: func(resp *http.Response, allowDeferEmpty bool) (bool, bool) {
defer func() {
if err := resp.Body.Close(); err != nil {
t.Fatalf("close failed: %v", err)
}
}()
_, _ = io.ReadAll(resp.Body)
attemptsSeen++
return attemptsSeen == 2, attemptsSeen == 1 && allowDeferEmpty
},
ParentMessageID: func() int {
return 77
},
OnRetryPrompt: func(prompt string) {
retryPrompt = prompt
},
})
if attemptsSeen != 2 {
t.Fatalf("expected two stream attempts, got %d", attemptsSeen)
}
if len(ds.payloads) != 1 {
t.Fatalf("expected one retry completion call, got %d", len(ds.payloads))
}
if got := ds.payloads[0]["parent_message_id"]; got != 77 {
t.Fatalf("retry parent_message_id mismatch: %#v", got)
}
if prompt, _ := ds.payloads[0]["prompt"].(string); !strings.Contains(prompt, shared.EmptyOutputRetrySuffix) {
t.Fatalf("expected retry suffix in payload prompt, got %q", prompt)
}
if !strings.Contains(retryPrompt, shared.EmptyOutputRetrySuffix) {
t.Fatalf("expected retry suffix in usage prompt, got %q", retryPrompt)
}
}

View File

@@ -145,7 +145,7 @@ func (h *Handler) handleClaudeDirectStream(w http.ResponseWriter, r *http.Reques
return
}
streamReq := start.Request
h.handleClaudeStreamRealtime(w, r, start.Response, streamReq.ResponseModel, streamReq.Messages, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, historySession)
h.handleClaudeStreamRealtimeWithRetry(w, r, a, start.Response, start.Payload, start.Pow, streamReq.ResponseModel, streamReq.Messages, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.PromptTokenText, historySession)
}
func (h *Handler) proxyViaOpenAI(w http.ResponseWriter, r *http.Request, store ConfigReader) bool {
@@ -360,3 +360,112 @@ func (h *Handler) handleClaudeStreamRealtime(w http.ResponseWriter, r *http.Requ
OnFinalize: streamRuntime.onFinalize,
})
}
func (h *Handler) handleClaudeStreamRealtimeWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, model string, messages []any, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, promptTokenText string, historySession *responsehistory.Session) {
if resp.StatusCode != http.StatusOK {
defer func() { _ = resp.Body.Close() }()
body, _ := io.ReadAll(resp.Body)
if historySession != nil {
historySession.Error(resp.StatusCode, strings.TrimSpace(string(body)), "error", "", "")
}
writeClaudeError(w, http.StatusInternalServerError, string(body))
return
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache, no-transform")
w.Header().Set("Connection", "keep-alive")
w.Header().Set("X-Accel-Buffering", "no")
rc := http.NewResponseController(w)
_, canFlush := w.(http.Flusher)
if !canFlush {
config.Logger.Warn("[claude_stream] response writer does not support flush; streaming may be buffered")
}
streamRuntime := newClaudeStreamRuntime(
w,
rc,
canFlush,
model,
messages,
thinkingEnabled,
searchEnabled,
stripReferenceMarkersEnabled(),
toolNames,
toolsRaw,
promptTokenText,
historySession,
)
streamRuntime.sendMessageStart()
completionruntime.ExecuteStreamWithRetry(r.Context(), h.DS, a, resp, payload, pow, completionruntime.StreamRetryOptions{
Surface: "claude.messages",
Stream: true,
RetryEnabled: true,
MaxAttempts: 3,
UsagePrompt: promptTokenText,
}, completionruntime.StreamRetryHooks{
ConsumeAttempt: func(currentResp *http.Response, allowDeferEmpty bool) (bool, bool) {
return h.consumeClaudeStreamAttempt(r, currentResp, streamRuntime, thinkingEnabled, allowDeferEmpty)
},
Finalize: func(_ int) {
streamRuntime.finalize("end_turn", false)
},
ParentMessageID: func() int {
return streamRuntime.responseMessageID
},
OnRetryPrompt: func(prompt string) {
streamRuntime.promptTokenText = prompt
},
OnRetryFailure: func(status int, message, code string) {
streamRuntime.sendErrorWithCode(status, strings.TrimSpace(message), code)
},
})
}
func (h *Handler) consumeClaudeStreamAttempt(r *http.Request, resp *http.Response, streamRuntime *claudeStreamRuntime, thinkingEnabled bool, allowDeferEmpty bool) (bool, bool) {
defer func() { _ = resp.Body.Close() }()
initialType := "text"
if thinkingEnabled {
initialType = "thinking"
}
finalReason := streamengine.StopReason("")
var scannerErr error
streamengine.ConsumeSSE(streamengine.ConsumeConfig{
Context: r.Context(),
Body: resp.Body,
ThinkingEnabled: thinkingEnabled,
InitialType: initialType,
KeepAliveInterval: claudeStreamPingInterval,
IdleTimeout: claudeStreamIdleTimeout,
MaxKeepAliveNoInput: claudeStreamMaxKeepaliveCnt,
}, streamengine.ConsumeHooks{
OnKeepAlive: func() {
streamRuntime.sendPing()
},
OnParsed: streamRuntime.onParsed,
OnFinalize: func(reason streamengine.StopReason, err error) {
finalReason = reason
scannerErr = err
},
})
if string(finalReason) == "upstream_error" {
if streamRuntime.history != nil {
streamRuntime.history.Error(500, streamRuntime.upstreamErr, "upstream_error", responsehistory.ThinkingForArchive(streamRuntime.rawThinking.String(), streamRuntime.toolDetectionThinking.String(), streamRuntime.thinking.String()), responsehistory.TextForArchive(streamRuntime.rawText.String(), streamRuntime.text.String()))
}
streamRuntime.sendError(streamRuntime.upstreamErr)
return true, false
}
if scannerErr != nil {
if streamRuntime.history != nil {
streamRuntime.history.Error(500, scannerErr.Error(), "error", responsehistory.ThinkingForArchive(streamRuntime.rawThinking.String(), streamRuntime.toolDetectionThinking.String(), streamRuntime.thinking.String()), responsehistory.TextForArchive(streamRuntime.rawText.String(), streamRuntime.text.String()))
}
streamRuntime.sendError(scannerErr.Error())
return true, false
}
terminalWritten := streamRuntime.finalize("end_turn", allowDeferEmpty)
if terminalWritten {
return true, false
}
return false, true
}

View File

@@ -29,9 +29,10 @@ type claudeStreamRuntime struct {
bufferToolContent bool
stripReferenceMarkers bool
messageID string
thinking strings.Builder
text strings.Builder
messageID string
thinking strings.Builder
text strings.Builder
responseMessageID int
sieve toolstream.State
rawText strings.Builder
@@ -92,6 +93,9 @@ func (s *claudeStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Parse
s.upstreamErr = parsed.ErrorMessage
return streamengine.ParsedDecision{Stop: true, StopReason: streamengine.StopReason("upstream_error")}
}
if parsed.ResponseMessageID > 0 {
s.responseMessageID = parsed.ResponseMessageID
}
if parsed.Stop {
return streamengine.ParsedDecision{Stop: true}
}

View File

@@ -22,16 +22,27 @@ func (s *claudeStreamRuntime) send(event string, v any) {
}
func (s *claudeStreamRuntime) sendError(message string) {
s.sendErrorWithCode(500, message, "internal_error")
}
func (s *claudeStreamRuntime) sendErrorWithCode(status int, message, code string) {
msg := strings.TrimSpace(message)
if msg == "" {
msg = "upstream stream error"
}
if code == "" {
code = "internal_error"
}
errType := "api_error"
if status == 429 {
errType = "rate_limit_error"
}
s.send("error", map[string]any{
"type": "error",
"error": map[string]any{
"type": "api_error",
"type": errType,
"message": msg,
"code": "internal_error",
"code": code,
"param": nil,
},
})

View File

@@ -63,13 +63,10 @@ func (s *claudeStreamRuntime) sendToolUseBlock(idx int, tc toolcall.ParsedToolCa
})
}
func (s *claudeStreamRuntime) finalize(stopReason string) {
func (s *claudeStreamRuntime) finalize(stopReason string, deferEmptyOutput bool) bool {
if s.ended {
return
return true
}
s.ended = true
s.closeThinkingBlock()
if s.bufferToolContent {
for _, evt := range toolstream.Flush(&s.sieve, s.toolNames) {
@@ -123,6 +120,7 @@ func (s *claudeStreamRuntime) finalize(stopReason string) {
RawThinking: s.rawThinking.String(),
VisibleThinking: s.thinking.String(),
DetectionThinking: s.toolDetectionThinking.String(),
ResponseMessageID: s.responseMessageID,
AlreadyEmittedCalls: s.toolCallsDetected,
AlreadyEmittedToolRaw: s.toolCallsDetected,
}, assistantturn.BuildOptions{
@@ -137,6 +135,22 @@ func (s *claudeStreamRuntime) finalize(stopReason string) {
outcome := assistantturn.FinalizeTurn(turn, assistantturn.FinalizeOptions{
AlreadyEmittedToolCalls: s.toolCallsDetected,
})
if outcome.ShouldFail {
if deferEmptyOutput {
return false
}
s.ended = true
s.closeThinkingBlock()
s.closeTextBlock()
if s.history != nil {
s.history.Error(outcome.Error.Status, outcome.Error.Message, outcome.Error.Code, responsehistory.ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking), responsehistory.TextForArchive(turn.RawText, turn.Text))
}
s.sendErrorWithCode(outcome.Error.Status, outcome.Error.Message, outcome.Error.Code)
return true
}
s.ended = true
s.closeThinkingBlock()
if s.bufferToolContent && !s.toolCallsDetected {
if len(turn.ToolCalls) > 0 {
@@ -197,6 +211,7 @@ func (s *claudeStreamRuntime) finalize(stopReason string) {
},
})
s.send("message_stop", map[string]any{"type": "message_stop"})
return true
}
func (s *claudeStreamRuntime) onFinalize(reason streamengine.StopReason, scannerErr error) {
@@ -214,5 +229,5 @@ func (s *claudeStreamRuntime) onFinalize(reason streamengine.StopReason, scanner
s.sendError(scannerErr.Error())
return
}
s.finalize("end_turn")
s.finalize("end_turn", false)
}

View File

@@ -137,7 +137,7 @@ func (h *Handler) handleGeminiDirectStream(w http.ResponseWriter, r *http.Reques
return
}
streamReq := start.Request
h.handleStreamGenerateContent(w, r, start.Response, streamReq.ResponseModel, streamReq.PromptTokenText, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, historySession)
h.handleStreamGenerateContentWithRetry(w, r, a, start.Response, start.Payload, start.Pow, streamReq.ResponseModel, streamReq.PromptTokenText, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, historySession)
}
func (h *Handler) proxyViaOpenAI(w http.ResponseWriter, r *http.Request, stream bool) bool {

View File

@@ -1,6 +1,7 @@
package gemini
import (
"context"
"encoding/json"
"io"
"net/http"
@@ -8,6 +9,8 @@ import (
"time"
"ds2api/internal/assistantturn"
"ds2api/internal/auth"
"ds2api/internal/completionruntime"
dsprotocol "ds2api/internal/deepseek/protocol"
"ds2api/internal/responsehistory"
"ds2api/internal/sse"
@@ -54,7 +57,7 @@ func (h *Handler) handleStreamGenerateContent(w http.ResponseWriter, r *http.Req
}, streamengine.ConsumeHooks{
OnParsed: runtime.onParsed,
OnFinalize: func(_ streamengine.StopReason, _ error) {
runtime.finalize()
runtime.finalize(false)
},
})
}
@@ -78,9 +81,83 @@ type geminiStreamRuntime struct {
accumulator *assistantturn.Accumulator
contentFilter bool
responseMessageID int
finalErrorStatus int
finalErrorMessage string
finalErrorCode string
history *responsehistory.Session
}
func (h *Handler) handleStreamGenerateContentWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, model, finalPrompt string, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, historySession *responsehistory.Session) {
if resp.StatusCode != http.StatusOK {
defer func() { _ = resp.Body.Close() }()
body, _ := io.ReadAll(resp.Body)
if historySession != nil {
historySession.Error(resp.StatusCode, strings.TrimSpace(string(body)), "error", "", "")
}
writeGeminiError(w, resp.StatusCode, strings.TrimSpace(string(body)))
return
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache, no-transform")
w.Header().Set("Connection", "keep-alive")
w.Header().Set("X-Accel-Buffering", "no")
rc := http.NewResponseController(w)
_, canFlush := w.(http.Flusher)
runtime := newGeminiStreamRuntime(w, rc, canFlush, model, finalPrompt, thinkingEnabled, searchEnabled, stripReferenceMarkersEnabled(), toolNames, toolsRaw, historySession)
completionruntime.ExecuteStreamWithRetry(r.Context(), h.DS, a, resp, payload, pow, completionruntime.StreamRetryOptions{
Surface: "gemini.generate_content",
Stream: true,
RetryEnabled: true,
MaxAttempts: 3,
UsagePrompt: finalPrompt,
}, completionruntime.StreamRetryHooks{
ConsumeAttempt: func(currentResp *http.Response, allowDeferEmpty bool) (bool, bool) {
return h.consumeGeminiStreamAttempt(r.Context(), currentResp, runtime, thinkingEnabled, allowDeferEmpty)
},
Finalize: func(_ int) {
runtime.finalize(false)
},
ParentMessageID: func() int {
return runtime.responseMessageID
},
OnRetryPrompt: func(prompt string) {
runtime.finalPrompt = prompt
},
OnRetryFailure: func(status int, message, _ string) {
runtime.sendErrorChunk(status, strings.TrimSpace(message))
},
})
}
func (h *Handler) consumeGeminiStreamAttempt(ctx context.Context, resp *http.Response, runtime *geminiStreamRuntime, thinkingEnabled bool, allowDeferEmpty bool) (bool, bool) {
defer func() { _ = resp.Body.Close() }()
initialType := "text"
if thinkingEnabled {
initialType = "thinking"
}
streamengine.ConsumeSSE(streamengine.ConsumeConfig{
Context: ctx,
Body: resp.Body,
ThinkingEnabled: thinkingEnabled,
InitialType: initialType,
KeepAliveInterval: time.Duration(dsprotocol.KeepAliveTimeout) * time.Second,
IdleTimeout: time.Duration(dsprotocol.StreamIdleTimeout) * time.Second,
MaxKeepAliveNoInput: dsprotocol.MaxKeepaliveCount,
}, streamengine.ConsumeHooks{
OnParsed: runtime.onParsed,
OnFinalize: func(_ streamengine.StopReason, _ error) {
},
})
terminalWritten := runtime.finalize(allowDeferEmpty)
if terminalWritten {
return true, false
}
return false, true
}
//nolint:unused // retained for native Gemini stream handling path.
func newGeminiStreamRuntime(
w http.ResponseWriter,
@@ -127,6 +204,35 @@ func (s *geminiStreamRuntime) sendChunk(payload map[string]any) {
}
}
func (s *geminiStreamRuntime) sendErrorChunk(status int, message string) {
msg := strings.TrimSpace(message)
if msg == "" {
msg = http.StatusText(status)
}
errorStatus := "INVALID_ARGUMENT"
switch status {
case http.StatusUnauthorized:
errorStatus = "UNAUTHENTICATED"
case http.StatusForbidden:
errorStatus = "PERMISSION_DENIED"
case http.StatusTooManyRequests:
errorStatus = "RESOURCE_EXHAUSTED"
case http.StatusNotFound:
errorStatus = "NOT_FOUND"
default:
if status >= 500 {
errorStatus = "INTERNAL"
}
}
s.sendChunk(map[string]any{
"error": map[string]any{
"code": status,
"message": msg,
"status": errorStatus,
},
})
}
//nolint:unused // retained for native Gemini stream handling path.
func (s *geminiStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
if !parsed.Parsed {
@@ -192,7 +298,7 @@ func (s *geminiStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Parse
}
//nolint:unused // retained for native Gemini stream handling path.
func (s *geminiStreamRuntime) finalize() {
func (s *geminiStreamRuntime) finalize(deferEmptyOutput bool) bool {
rawText, text, rawThinking, thinking, detectionThinking := s.accumulator.Snapshot()
turn := assistantturn.BuildTurnFromStreamSnapshot(assistantturn.StreamSnapshot{
RawText: rawText,
@@ -211,6 +317,19 @@ func (s *geminiStreamRuntime) finalize() {
ToolsRaw: s.toolsRaw,
})
outcome := assistantturn.FinalizeTurn(turn, assistantturn.FinalizeOptions{})
if outcome.ShouldFail {
if deferEmptyOutput {
s.finalErrorStatus = outcome.Error.Status
s.finalErrorMessage = outcome.Error.Message
s.finalErrorCode = outcome.Error.Code
return false
}
if s.history != nil {
s.history.Error(outcome.Error.Status, outcome.Error.Message, outcome.Error.Code, responsehistory.ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking), responsehistory.TextForArchive(turn.RawText, turn.Text))
}
s.sendErrorChunk(outcome.Error.Status, outcome.Error.Message)
return true
}
if s.history != nil {
s.history.Success(
http.StatusOK,
@@ -257,4 +376,5 @@ func (s *geminiStreamRuntime) finalize() {
"totalTokenCount": outcome.Usage.TotalTokens,
},
})
return true
}

View File

@@ -4,11 +4,11 @@ import (
"context"
"io"
"net/http"
"strings"
"time"
"ds2api/internal/assistantturn"
"ds2api/internal/auth"
"ds2api/internal/completionruntime"
"ds2api/internal/config"
dsprotocol "ds2api/internal/deepseek/protocol"
openaifmt "ds2api/internal/format/openai"
@@ -17,148 +17,53 @@ import (
streamengine "ds2api/internal/stream"
)
type chatNonStreamResult struct {
rawThinking string
rawText string
thinking string
toolDetectionThinking string
text string
contentFilter bool
detectedCalls int
body map[string]any
finishReason string
responseMessageID int
outputError *assistantturn.OutputError
}
func (r chatNonStreamResult) historyText() string {
return historyTextForArchive(r.rawText, r.text)
}
func (r chatNonStreamResult) historyThinking() string {
return historyThinkingForArchive(r.rawThinking, r.toolDetectionThinking, r.thinking)
}
func (h *Handler) handleNonStreamWithRetry(w http.ResponseWriter, ctx context.Context, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, completionID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, historySession *chatHistorySession) {
attempts := 0
currentResp := resp
usagePrompt := finalPrompt
accumulatedThinking := ""
accumulatedRawThinking := ""
accumulatedToolDetectionThinking := ""
for {
result, ok := h.collectChatNonStreamAttempt(w, currentResp, completionID, model, usagePrompt, thinkingEnabled, searchEnabled, toolNames, toolsRaw)
if !ok {
return
}
accumulatedThinking += sse.TrimContinuationOverlap(accumulatedThinking, result.thinking)
accumulatedRawThinking += sse.TrimContinuationOverlap(accumulatedRawThinking, result.rawThinking)
accumulatedToolDetectionThinking += sse.TrimContinuationOverlap(accumulatedToolDetectionThinking, result.toolDetectionThinking)
result.thinking = accumulatedThinking
result.rawThinking = accumulatedRawThinking
result.toolDetectionThinking = accumulatedToolDetectionThinking
detected := detectAssistantToolCalls(result.rawText, result.text, result.rawThinking, result.toolDetectionThinking, toolNames)
result.detectedCalls = len(detected.Calls)
result.body = openaifmt.BuildChatCompletionWithToolCalls(completionID, model, usagePrompt, result.thinking, result.text, detected.Calls, toolsRaw)
addRefFileTokensToUsage(result.body, refFileTokens)
result.finishReason = chatFinishReason(result.body)
if !shouldRetryChatNonStream(result, attempts) {
h.finishChatNonStreamResult(w, result, attempts, usagePrompt, refFileTokens, historySession)
return
}
attempts++
config.Logger.Info("[openai_empty_retry] attempting synthetic retry", "surface", "chat.completions", "stream", false, "retry_attempt", attempts, "parent_message_id", result.responseMessageID)
retryPow, powErr := h.DS.GetPow(ctx, a, 3)
if powErr != nil {
config.Logger.Warn("[openai_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", "chat.completions", "stream", false, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
retryPayload := clonePayloadForEmptyOutputRetry(payload, result.responseMessageID)
nextResp, err := h.DS.CallCompletion(ctx, a, retryPayload, retryPow, 3)
if err != nil {
if historySession != nil {
historySession.error(http.StatusInternalServerError, "Failed to get completion.", "error", result.historyThinking(), result.historyText())
}
writeOpenAIError(w, http.StatusInternalServerError, "Failed to get completion.")
config.Logger.Warn("[openai_empty_retry] retry request failed", "surface", "chat.completions", "stream", false, "retry_attempt", attempts, "error", err)
return
}
usagePrompt = usagePromptWithEmptyOutputRetry(usagePrompt, attempts)
currentResp = nextResp
}
}
func (h *Handler) collectChatNonStreamAttempt(w http.ResponseWriter, resp *http.Response, completionID, model, usagePrompt string, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any) (chatNonStreamResult, bool) {
if resp.StatusCode != http.StatusOK {
defer func() { _ = resp.Body.Close() }()
body, _ := io.ReadAll(resp.Body)
writeOpenAIError(w, resp.StatusCode, string(body))
return chatNonStreamResult{}, false
}
result := sse.CollectStream(resp, thinkingEnabled, true)
turn := assistantturn.BuildTurnFromCollected(result, assistantturn.BuildOptions{
Model: model,
Prompt: usagePrompt,
SearchEnabled: searchEnabled,
ToolNames: toolNames,
ToolsRaw: toolsRaw,
})
respBody := openaifmt.BuildChatCompletionWithToolCalls(completionID, model, usagePrompt, turn.Thinking, turn.Text, turn.ToolCalls, toolsRaw)
return chatNonStreamResult{
rawThinking: result.Thinking,
rawText: result.Text,
thinking: turn.Thinking,
toolDetectionThinking: result.ToolDetectionThinking,
text: turn.Text,
contentFilter: result.ContentFilter,
detectedCalls: len(turn.ToolCalls),
body: respBody,
finishReason: chatFinishReason(respBody),
responseMessageID: result.ResponseMessageID,
outputError: turn.Error,
}, true
}
func (h *Handler) finishChatNonStreamResult(w http.ResponseWriter, result chatNonStreamResult, attempts int, usagePrompt string, refFileTokens int, historySession *chatHistorySession) {
if result.detectedCalls == 0 && strings.TrimSpace(result.text) == "" {
status, message, code := upstreamEmptyOutputDetail(result.contentFilter, result.text, result.thinking)
if result.outputError != nil {
status, message, code = result.outputError.Status, result.outputError.Message, result.outputError.Code
}
if historySession != nil {
historySession.error(status, message, code, result.historyThinking(), result.historyText())
historySession.error(resp.StatusCode, string(body), "error", "", "")
}
writeOpenAIErrorWithCode(w, status, message, code)
config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "chat.completions", "stream", false, "retry_attempts", attempts, "success_source", "none", "content_filter", result.contentFilter)
writeOpenAIError(w, resp.StatusCode, string(body))
return
}
if historySession != nil {
historySession.success(http.StatusOK, result.historyThinking(), result.historyText(), result.finishReason, openaifmt.BuildChatUsageForModel("", usagePrompt, result.thinking, result.text, refFileTokens))
stdReq := promptcompat.StandardRequest{
Surface: "chat.completions",
ResponseModel: model,
PromptTokenText: finalPrompt,
FinalPrompt: finalPrompt,
RefFileTokens: refFileTokens,
Thinking: thinkingEnabled,
Search: searchEnabled,
ToolNames: toolNames,
ToolsRaw: toolsRaw,
ToolChoice: promptcompat.DefaultToolChoicePolicy(),
}
writeJSON(w, http.StatusOK, result.body)
source := "first_attempt"
if attempts > 0 {
source = "synthetic_retry"
}
config.Logger.Info("[openai_empty_retry] completed", "surface", "chat.completions", "stream", false, "retry_attempts", attempts, "success_source", source)
}
func chatFinishReason(respBody map[string]any) string {
if choices, ok := respBody["choices"].([]map[string]any); ok && len(choices) > 0 {
if fr, _ := choices[0]["finish_reason"].(string); strings.TrimSpace(fr) != "" {
return fr
retryEnabled := h != nil && h.DS != nil && emptyOutputRetryEnabled()
result, outErr := completionruntime.ExecuteNonStreamStartedWithRetry(ctx, h.DS, a, completionruntime.StartResult{
SessionID: completionID,
Payload: payload,
Pow: pow,
Response: resp,
Request: stdReq,
}, completionruntime.Options{
RetryEnabled: retryEnabled,
RetryMaxAttempts: emptyOutputRetryMaxAttempts(),
})
if outErr != nil {
if historySession != nil {
historySession.error(outErr.Status, outErr.Message, outErr.Code, historyThinkingForArchive(result.Turn.RawThinking, result.Turn.DetectionThinking, result.Turn.Thinking), historyTextForArchive(result.Turn.RawText, result.Turn.Text))
}
writeOpenAIErrorWithCode(w, outErr.Status, outErr.Message, outErr.Code)
return
}
return "stop"
}
func shouldRetryChatNonStream(result chatNonStreamResult, attempts int) bool {
return emptyOutputRetryEnabled() &&
attempts < emptyOutputRetryMaxAttempts() &&
!result.contentFilter &&
result.detectedCalls == 0 &&
strings.TrimSpace(result.text) == ""
respBody := openaifmt.BuildChatCompletionWithToolCalls(result.SessionID, model, result.Turn.Prompt, result.Turn.Thinking, result.Turn.Text, result.Turn.ToolCalls, toolsRaw)
respBody["usage"] = assistantturn.OpenAIChatUsage(result.Turn)
outcome := assistantturn.FinalizeTurn(result.Turn, assistantturn.FinalizeOptions{})
if historySession != nil {
historySession.success(http.StatusOK, historyThinkingForArchive(result.Turn.RawThinking, result.Turn.DetectionThinking, result.Turn.Thinking), historyTextForArchive(result.Turn.RawText, result.Turn.Text), outcome.FinishReason, assistantturn.OpenAIChatUsage(result.Turn))
}
writeJSON(w, http.StatusOK, respBody)
}
func (h *Handler) handleStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, completionID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, historySession *chatHistorySession) {
@@ -166,42 +71,35 @@ func (h *Handler) handleStreamWithRetry(w http.ResponseWriter, r *http.Request,
if !ok {
return
}
attempts := 0
currentResp := resp
for {
terminalWritten, retryable := h.consumeChatStreamAttempt(r, currentResp, streamRuntime, initialType, thinkingEnabled, historySession, attempts < emptyOutputRetryMaxAttempts())
if terminalWritten {
logChatStreamTerminal(streamRuntime, attempts)
return
}
if !retryable || !emptyOutputRetryEnabled() || attempts >= emptyOutputRetryMaxAttempts() {
completionruntime.ExecuteStreamWithRetry(r.Context(), h.DS, a, resp, payload, pow, completionruntime.StreamRetryOptions{
Surface: "chat.completions",
Stream: true,
RetryEnabled: emptyOutputRetryEnabled(),
RetryMaxAttempts: emptyOutputRetryMaxAttempts(),
MaxAttempts: 3,
UsagePrompt: finalPrompt,
}, completionruntime.StreamRetryHooks{
ConsumeAttempt: func(currentResp *http.Response, allowDeferEmpty bool) (bool, bool) {
return h.consumeChatStreamAttempt(r, currentResp, streamRuntime, initialType, thinkingEnabled, historySession, allowDeferEmpty)
},
Finalize: func(attempts int) {
streamRuntime.finalize("stop", false)
recordChatStreamHistory(streamRuntime, historySession)
config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "chat.completions", "stream", true, "retry_attempts", attempts, "success_source", "none")
return
}
attempts++
config.Logger.Info("[openai_empty_retry] attempting synthetic retry", "surface", "chat.completions", "stream", true, "retry_attempt", attempts, "parent_message_id", streamRuntime.responseMessageID)
retryPow, powErr := h.DS.GetPow(r.Context(), a, 3)
if powErr != nil {
config.Logger.Warn("[openai_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", "chat.completions", "stream", true, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
nextResp, err := h.DS.CallCompletion(r.Context(), a, clonePayloadForEmptyOutputRetry(payload, streamRuntime.responseMessageID), retryPow, 3)
if err != nil {
failChatStreamRetry(streamRuntime, historySession, http.StatusInternalServerError, "Failed to get completion.", "error")
config.Logger.Warn("[openai_empty_retry] retry request failed", "surface", "chat.completions", "stream", true, "retry_attempt", attempts, "error", err)
return
}
if nextResp.StatusCode != http.StatusOK {
defer func() { _ = nextResp.Body.Close() }()
body, _ := io.ReadAll(nextResp.Body)
failChatStreamRetry(streamRuntime, historySession, nextResp.StatusCode, string(body), "error")
return
}
streamRuntime.finalPrompt = usagePromptWithEmptyOutputRetry(finalPrompt, attempts)
currentResp = nextResp
}
},
ParentMessageID: func() int {
return streamRuntime.responseMessageID
},
OnRetryPrompt: func(prompt string) {
streamRuntime.finalPrompt = prompt
},
OnRetryFailure: func(status int, message, code string) {
failChatStreamRetry(streamRuntime, historySession, status, message, code)
},
OnTerminal: func(attempts int) {
logChatStreamTerminal(streamRuntime, attempts)
},
})
}
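For orientation, the control flow that `ExecuteStreamWithRetry` now owns can be sketched as a hook-driven loop. This is a simplified, self-contained sketch of the delegation pattern only: the names `streamRetryHooks` and `runStreamWithRetry` are illustrative, and the real `completionruntime` implementation additionally refreshes PoW, clones the payload with the parent message ID, and reports retry failures, which are elided here.

```go
package main

import "fmt"

// streamRetryHooks mirrors the shape of the callbacks a surface hands to the
// shared runtime: the surface consumes attempts, the runtime owns the loop.
type streamRetryHooks struct {
	consumeAttempt func(attempt int, allowDeferEmpty bool) (terminalWritten, retryable bool)
	finalize       func(attempts int)
	onTerminal     func(attempts int)
}

// runStreamWithRetry keeps consuming attempts until a terminal frame was
// written, the result is non-retryable, or the retry budget is exhausted,
// then finalizes exactly once.
func runStreamWithRetry(maxAttempts int, hooks streamRetryHooks) {
	attempts := 0
	for {
		terminalWritten, retryable := hooks.consumeAttempt(attempts, attempts < maxAttempts)
		if terminalWritten {
			hooks.onTerminal(attempts)
			return
		}
		if !retryable || attempts >= maxAttempts {
			hooks.finalize(attempts)
			hooks.onTerminal(attempts)
			return
		}
		attempts++
	}
}

func main() {
	runStreamWithRetry(3, streamRetryHooks{
		consumeAttempt: func(attempt int, _ bool) (bool, bool) {
			// In this sketch, the second attempt produces terminal output.
			return attempt == 1, true
		},
		finalize:   func(n int) { fmt.Println("finalize after", n) },
		onTerminal: func(n int) { fmt.Println("terminal after", n) },
	})
}
```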
func (h *Handler) prepareChatStreamRuntime(w http.ResponseWriter, resp *http.Response, completionID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, historySession *chatHistorySession) (*chatStreamRuntime, string, bool) {

View File

@@ -106,10 +106,6 @@ func cleanVisibleOutput(text string, stripReferenceMarkers bool) string {
return shared.CleanVisibleOutput(text, stripReferenceMarkers)
}
func upstreamEmptyOutputDetail(contentFilter bool, text, thinking string) (int, string, string) {
return shared.UpstreamEmptyOutputDetail(contentFilter, text, thinking)
}
func emptyOutputRetryEnabled() bool {
return shared.EmptyOutputRetryEnabled()
}
@@ -118,14 +114,6 @@ func emptyOutputRetryMaxAttempts() int {
return shared.EmptyOutputRetryMaxAttempts()
}
func clonePayloadForEmptyOutputRetry(payload map[string]any, parentMessageID int) map[string]any {
return shared.ClonePayloadForEmptyOutputRetry(payload, parentMessageID)
}
func usagePromptWithEmptyOutputRetry(originalPrompt string, retryAttempts int) string {
return shared.UsagePromptWithEmptyOutputRetry(originalPrompt, retryAttempts)
}
func formatIncrementalStreamToolCallDeltas(deltas []toolstream.ToolCallDelta, ids map[int]string) []map[string]any {
return shared.FormatIncrementalStreamToolCallDeltas(deltas, ids)
}
@@ -137,7 +125,3 @@ func filterIncrementalToolCallDeltasByAllowed(deltas []toolstream.ToolCallDelta,
func formatFinalStreamToolCallsWithStableIDs(calls []toolcall.ParsedToolCall, ids map[int]string, toolsRaw any) []map[string]any {
return shared.FormatFinalStreamToolCallsWithStableIDs(calls, ids, toolsRaw)
}
func detectAssistantToolCalls(rawText, visibleText, exposedThinking, detectionThinking string, toolNames []string) toolcall.ToolCallParseResult {
return shared.DetectAssistantToolCalls(rawText, visibleText, exposedThinking, detectionThinking, toolNames)
}

View File

@@ -85,8 +85,7 @@ func streamFinishReason(frames []map[string]any) string {
return ""
}
// Backward-compatible alias for historical test name used in CI logs.
func TestHandleNonStreamReturns429WhenUpstreamOutputEmpty(t *testing.T) {
func TestHandleNonStreamSingleAttemptReturns503WhenUpstreamOutputEmpty(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
`data: {"p":"response/content","v":""}`,
@@ -95,17 +94,17 @@ func TestHandleNonStreamReturns429WhenUpstreamOutputEmpty(t *testing.T) {
rec := httptest.NewRecorder()
h.handleNonStream(rec, resp, "cid-empty", "deepseek-v4-flash", "prompt", 0, false, false, nil, nil, nil)
if rec.Code != http.StatusTooManyRequests {
t.Fatalf("expected status 429 for empty upstream output, got %d body=%s", rec.Code, rec.Body.String())
if rec.Code != http.StatusServiceUnavailable {
t.Fatalf("expected status 503 for empty upstream output, got %d body=%s", rec.Code, rec.Body.String())
}
out := decodeJSONBody(t, rec.Body.String())
errObj, _ := out["error"].(map[string]any)
if asString(errObj["code"]) != "upstream_empty_output" {
t.Fatalf("expected code=upstream_empty_output, got %#v", out)
if asString(errObj["code"]) != "upstream_unavailable" {
t.Fatalf("expected code=upstream_unavailable, got %#v", out)
}
}
func TestHandleNonStreamReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutput(t *testing.T) {
func TestHandleNonStreamSingleAttemptReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutput(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
`data: {"code":"content_filter"}`,
@@ -124,7 +123,7 @@ func TestHandleNonStreamReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutp
}
}
func TestHandleNonStreamReturns429WhenUpstreamHasOnlyThinking(t *testing.T) {
func TestHandleNonStreamSingleAttemptReturns429WhenUpstreamHasOnlyThinking(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
`data: {"p":"response/thinking_content","v":"Only thinking"}`,

View File

@@ -1,26 +0,0 @@
package chat
// addRefFileTokensToUsage adds inline-uploaded file token estimates to an existing
// usage map inside a response object. This keeps the token accounting aware of file
// content that the upstream model processes but that is not part of the prompt text.
func addRefFileTokensToUsage(obj map[string]any, refFileTokens int) {
if refFileTokens <= 0 || obj == nil {
return
}
usage, ok := obj["usage"].(map[string]any)
if !ok || usage == nil {
return
}
for _, key := range []string{"input_tokens", "prompt_tokens"} {
if v, ok := usage[key]; ok {
if n, ok := v.(int); ok {
usage[key] = n + refFileTokens
}
}
}
if v, ok := usage["total_tokens"]; ok {
if n, ok := v.(int); ok {
usage["total_tokens"] = n + refFileTokens
}
}
}
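Before its removal, the helper's effect on a response object looked like the following runnable sketch (the function body is copied from the deleted file; the sample token counts are hypothetical). Its responsibility now lives in the shared `assistantturn` usage assembly.

```go
package main

import "fmt"

// addRefFileTokensToUsage bumps prompt/input and total token counts by the
// estimated tokens of inline-uploaded files, as the removed helper did.
func addRefFileTokensToUsage(obj map[string]any, refFileTokens int) {
	if refFileTokens <= 0 || obj == nil {
		return
	}
	usage, ok := obj["usage"].(map[string]any)
	if !ok || usage == nil {
		return
	}
	for _, key := range []string{"input_tokens", "prompt_tokens"} {
		if v, ok := usage[key]; ok {
			if n, ok := v.(int); ok {
				usage[key] = n + refFileTokens
			}
		}
	}
	if v, ok := usage["total_tokens"]; ok {
		if n, ok := v.(int); ok {
			usage["total_tokens"] = n + refFileTokens
		}
	}
}

func main() {
	resp := map[string]any{"usage": map[string]any{"prompt_tokens": 100, "total_tokens": 130}}
	addRefFileTokensToUsage(resp, 25)
	fmt.Println(resp["usage"].(map[string]any)["prompt_tokens"]) // 125
	fmt.Println(resp["usage"].(map[string]any)["total_tokens"])  // 155
}
```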

View File

@@ -7,6 +7,7 @@ import (
"time"
"ds2api/internal/auth"
"ds2api/internal/completionruntime"
"ds2api/internal/config"
dsprotocol "ds2api/internal/deepseek/protocol"
"ds2api/internal/promptcompat"
@@ -19,41 +20,34 @@ func (h *Handler) handleResponsesStreamWithRetry(w http.ResponseWriter, r *http.
if !ok {
return
}
attempts := 0
currentResp := resp
for {
terminalWritten, retryable := h.consumeResponsesStreamAttempt(r, currentResp, streamRuntime, initialType, thinkingEnabled, attempts < emptyOutputRetryMaxAttempts())
if terminalWritten {
logResponsesStreamTerminal(streamRuntime, attempts)
return
}
if !retryable || !emptyOutputRetryEnabled() || attempts >= emptyOutputRetryMaxAttempts() {
completionruntime.ExecuteStreamWithRetry(r.Context(), h.DS, a, resp, payload, pow, completionruntime.StreamRetryOptions{
Surface: "responses",
Stream: true,
RetryEnabled: emptyOutputRetryEnabled(),
RetryMaxAttempts: emptyOutputRetryMaxAttempts(),
MaxAttempts: 3,
UsagePrompt: finalPrompt,
}, completionruntime.StreamRetryHooks{
ConsumeAttempt: func(currentResp *http.Response, allowDeferEmpty bool) (bool, bool) {
return h.consumeResponsesStreamAttempt(r, currentResp, streamRuntime, initialType, thinkingEnabled, allowDeferEmpty)
},
Finalize: func(attempts int) {
streamRuntime.finalize("stop", false)
config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "responses", "stream", true, "retry_attempts", attempts, "success_source", "none", "error_code", streamRuntime.finalErrorCode)
return
}
attempts++
config.Logger.Info("[openai_empty_retry] attempting synthetic retry", "surface", "responses", "stream", true, "retry_attempt", attempts, "parent_message_id", streamRuntime.responseMessageID)
retryPow, powErr := h.DS.GetPow(r.Context(), a, 3)
if powErr != nil {
config.Logger.Warn("[openai_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", "responses", "stream", true, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
nextResp, err := h.DS.CallCompletion(r.Context(), a, clonePayloadForEmptyOutputRetry(payload, streamRuntime.responseMessageID), retryPow, 3)
if err != nil {
streamRuntime.failResponse(http.StatusInternalServerError, "Failed to get completion.", "error")
config.Logger.Warn("[openai_empty_retry] retry request failed", "surface", "responses", "stream", true, "retry_attempt", attempts, "error", err)
return
}
if nextResp.StatusCode != http.StatusOK {
defer func() { _ = nextResp.Body.Close() }()
body, _ := io.ReadAll(nextResp.Body)
streamRuntime.failResponse(nextResp.StatusCode, strings.TrimSpace(string(body)), "error")
return
}
streamRuntime.finalPrompt = usagePromptWithEmptyOutputRetry(finalPrompt, attempts)
currentResp = nextResp
}
},
ParentMessageID: func() int {
return streamRuntime.responseMessageID
},
OnRetryPrompt: func(prompt string) {
streamRuntime.finalPrompt = prompt
},
OnRetryFailure: func(status int, message, code string) {
streamRuntime.failResponse(status, strings.TrimSpace(message), code)
},
OnTerminal: func(attempts int) {
logResponsesStreamTerminal(streamRuntime, attempts)
},
})
}
func (h *Handler) prepareResponsesStreamRuntime(w http.ResponseWriter, resp *http.Response, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string, historySession *responsehistory.Session) (*responsesStreamRuntime, string, bool) {

View File

@@ -103,14 +103,6 @@ func emptyOutputRetryMaxAttempts() int {
return shared.EmptyOutputRetryMaxAttempts()
}
func clonePayloadForEmptyOutputRetry(payload map[string]any, parentMessageID int) map[string]any {
return shared.ClonePayloadForEmptyOutputRetry(payload, parentMessageID)
}
func usagePromptWithEmptyOutputRetry(originalPrompt string, retryAttempts int) string {
return shared.UsagePromptWithEmptyOutputRetry(originalPrompt, retryAttempts)
}
func filterIncrementalToolCallDeltasByAllowed(deltas []toolstream.ToolCallDelta, seenNames map[int]string) []toolstream.ToolCallDelta {
return shared.FilterIncrementalToolCallDeltasByAllowed(deltas, seenNames)
}

View File

@@ -397,7 +397,7 @@ func TestHandleResponsesNonStreamRequiredToolChoiceIgnoresThinkingToolPayloadWhe
}
}
func TestHandleResponsesNonStreamReturns429WhenUpstreamOutputEmpty(t *testing.T) {
func TestHandleResponsesNonStreamSingleAttemptReturns503WhenUpstreamOutputEmpty(t *testing.T) {
h := &Handler{}
rec := httptest.NewRecorder()
resp := &http.Response{
@@ -409,17 +409,17 @@ func TestHandleResponsesNonStreamReturns429WhenUpstreamOutputEmpty(t *testing.T)
}
h.handleResponsesNonStream(rec, resp, "owner-a", "resp_test", "deepseek-v4-flash", "prompt", 0, false, false, nil, nil, promptcompat.DefaultToolChoicePolicy(), "")
if rec.Code != http.StatusTooManyRequests {
t.Fatalf("expected 429 for empty upstream output, got %d body=%s", rec.Code, rec.Body.String())
if rec.Code != http.StatusServiceUnavailable {
t.Fatalf("expected 503 for empty upstream output, got %d body=%s", rec.Code, rec.Body.String())
}
out := decodeJSONBody(t, rec.Body.String())
errObj, _ := out["error"].(map[string]any)
if asString(errObj["code"]) != "upstream_empty_output" {
t.Fatalf("expected code=upstream_empty_output, got %#v", out)
if asString(errObj["code"]) != "upstream_unavailable" {
t.Fatalf("expected code=upstream_unavailable, got %#v", out)
}
}
func TestHandleResponsesNonStreamReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutput(t *testing.T) {
func TestHandleResponsesNonStreamSingleAttemptReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutput(t *testing.T) {
h := &Handler{}
rec := httptest.NewRecorder()
resp := &http.Response{
@@ -441,7 +441,7 @@ func TestHandleResponsesNonStreamReturnsContentFilterErrorWhenUpstreamFilteredWi
}
}
func TestHandleResponsesNonStreamReturns429WhenUpstreamHasOnlyThinking(t *testing.T) {
func TestHandleResponsesNonStreamSingleAttemptReturns429WhenUpstreamHasOnlyThinking(t *testing.T) {
h := &Handler{}
rec := httptest.NewRecorder()
resp := &http.Response{

View File

@@ -17,7 +17,7 @@ func UpstreamEmptyOutputDetail(contentFilter bool, text, thinking string) (int,
if thinking != "" {
return http.StatusTooManyRequests, "Upstream account hit a rate limit and returned reasoning without visible output.", "upstream_empty_output"
}
return http.StatusTooManyRequests, "Upstream account hit a rate limit and returned empty output.", "upstream_empty_output"
return http.StatusServiceUnavailable, "Upstream service is unavailable and returned no output.", "upstream_unavailable"
}
func WriteUpstreamEmptyOutputError(w http.ResponseWriter, text, thinking string, contentFilter bool) bool {

View File

@@ -274,12 +274,12 @@ func TestChatCompletionsStreamEmitsFailureFrameWhenUpstreamOutputEmpty(t *testin
}
last := frames[0]
statusCode, ok := last["status_code"].(float64)
if !ok || int(statusCode) != http.StatusTooManyRequests {
t.Fatalf("expected status_code=429, got %#v body=%s", last["status_code"], rec.Body.String())
if !ok || int(statusCode) != http.StatusServiceUnavailable {
t.Fatalf("expected status_code=503, got %#v body=%s", last["status_code"], rec.Body.String())
}
errObj, _ := last["error"].(map[string]any)
if asString(errObj["code"]) != "upstream_empty_output" {
t.Fatalf("expected code=upstream_empty_output, got %#v", last)
if asString(errObj["code"]) != "upstream_unavailable" {
t.Fatalf("expected code=upstream_unavailable, got %#v", last)
}
}
@@ -345,7 +345,7 @@ func TestChatCompletionsStreamRetriesEmptyOutputOnSameSession(t *testing.T) {
func TestChatCompletionsNonStreamRetriesThinkingOnlyOutput(t *testing.T) {
ds := &streamStatusDSSeqStub{resps: []*http.Response{
makeOpenAISSEHTTPResponse(`data: {"response_message_id":99}`, "data: [DONE]"),
makeOpenAISSEHTTPResponse(`data: {"response_message_id":99,"p":"response/thinking_content","v":"plan"}`, "data: [DONE]"),
makeOpenAISSEHTTPResponse(`data: {"p":"response/content","v":"visible"}`, "data: [DONE]"),
}}
h := &openAITestSurface{
@@ -496,7 +496,7 @@ func TestResponsesStreamRetriesThinkingOnlyOutput(t *testing.T) {
func TestResponsesNonStreamRetriesThinkingOnlyOutput(t *testing.T) {
ds := &streamStatusDSSeqStub{resps: []*http.Response{
makeOpenAISSEHTTPResponse(`data: {"response_message_id":88}`, "data: [DONE]"),
makeOpenAISSEHTTPResponse(`data: {"response_message_id":88,"p":"response/thinking_content","v":"plan"}`, "data: [DONE]"),
makeOpenAISSEHTTPResponse(`data: {"p":"response/content","v":"visible"}`, "data: [DONE]"),
}}
h := &openAITestSurface{
@@ -537,8 +537,15 @@ func TestResponsesNonStreamRetriesThinkingOnlyOutput(t *testing.T) {
if len(content) == 0 {
t.Fatalf("expected content entries, got %#v", item)
}
textEntry, _ := content[0].(map[string]any)
if asString(textEntry["type"]) != "output_text" || asString(textEntry["text"]) != "visible" {
var textEntry map[string]any
for _, entry := range content {
obj, _ := entry.(map[string]any)
if asString(obj["type"]) == "output_text" {
textEntry = obj
break
}
}
if asString(textEntry["text"]) != "visible" {
t.Fatalf("expected visible text entry, got %#v", content)
}
}

View File

@@ -641,9 +641,9 @@ function upstreamEmptyOutputDetail(contentFilter, _text, thinking) {
};
}
return {
status: 429,
message: 'Upstream account hit a rate limit and returned empty output.',
code: 'upstream_empty_output',
status: 503,
message: 'Upstream service is unavailable and returned no output.',
code: 'upstream_unavailable',
};
}

View File

@@ -1,6 +1,6 @@
'use strict';
const CDATA_PATTERN = /^<!\[CDATA\[([\s\S]*?)]]>$/i;
const CDATA_PATTERN = /^<!\[CDATA\[([\s\S]*?)]](?:>|＞)$/i;
const XML_ATTR_PATTERN = /\b([a-z0-9_:-]+)\s*=\s*("([^"]*)"|'([^']*)')/gi;
const TOOL_MARKUP_NAMES = [
{ raw: 'tool_calls', canonical: 'tool_calls' },
@@ -102,9 +102,10 @@ function updateCDATAStateLine(inCDATA, line) {
let state = inCDATA;
while (pos < lower.length) {
if (state) {
const end = lower.indexOf(']]>', pos);
const cdataEnd = findCDATAEnd(lower, pos);
const end = cdataEnd.index;
if (end < 0) return true;
pos = end + ']]>'.length;
pos = end + cdataEnd.len;
state = false;
continue;
}
@@ -252,8 +253,9 @@ function replaceDSMLToolMarkupOutsideIgnored(text) {
const tag = scanToolMarkupTagAt(raw, i);
if (tag) {
if (tag.dsmlLike) {
out += `<${tag.closing ? '/' : ''}${tag.name}${raw.slice(tag.nameEnd, tag.end + 1)}`;
if (raw[tag.end] !== '>') {
const tail = normalizeToolMarkupTagTailForXML(raw.slice(tag.nameEnd, tag.end + 1));
out += `<${tag.closing ? '/' : ''}${tag.name}${tail}`;
if (!tail.endsWith('>')) {
out += '>';
}
} else {
@@ -409,11 +411,12 @@ function findMatchingXmlEndTagOutsideCDATA(text, tag, from) {
function skipXmlIgnoredSection(lower, i) {
if (lower.startsWith('<![cdata[', i)) {
const end = lower.indexOf(']]>', i + '<![cdata['.length);
const cdataEnd = findCDATAEnd(lower, i + '<![cdata['.length);
const end = cdataEnd.index;
if (end < 0) {
return { advanced: false, blocked: true, next: i };
}
return { advanced: true, blocked: false, next: end + ']]>'.length };
return { advanced: true, blocked: false, next: end + cdataEnd.len };
}
if (lower.startsWith('<!--', i)) {
const end = lower.indexOf('-->', i + '<!--'.length);
@@ -425,6 +428,21 @@ function skipXmlIgnoredSection(lower, i) {
return { advanced: false, blocked: false, next: i };
}
function findCDATAEnd(text, from) {
const ascii = text.indexOf(']]>', from);
const fullwidth = text.indexOf(']]＞', from);
if (ascii < 0 && fullwidth < 0) {
return { index: -1, len: 0 };
}
if (ascii < 0) {
return { index: fullwidth, len: ']]＞'.length };
}
if (fullwidth < 0 || ascii < fullwidth) {
return { index: ascii, len: ']]>'.length };
}
return { index: fullwidth, len: ']]＞'.length };
}
function scanToolMarkupTagAt(text, start) {
const raw = toStringSafe(text);
if (!raw || start < 0 || start >= raw.length || raw[start] !== '<') {
@@ -442,7 +460,7 @@ function scanToolMarkupTagAt(text, start) {
const prefix = consumeToolMarkupNamePrefix(raw, lower, i);
i = prefix.next;
const dsmlLike = prefix.dsmlLike;
const { name, len } = matchToolMarkupName(lower, i, dsmlLike);
const { name, len } = matchToolMarkupName(raw, i, dsmlLike);
if (!name) {
return null;
}
@@ -541,7 +559,7 @@ function findPartialToolMarkupStart(text) {
}
const start = includeDuplicateLeadingLessThan(raw, lastLT);
const tail = raw.slice(start);
if (tail.includes('>')) {
if (tail.includes('>') || tail.includes('＞')) {
return -1;
}
return isPartialToolMarkupTagPrefix(tail) ? start : -1;
@@ -579,10 +597,10 @@ function isPartialToolMarkupTagPrefix(text) {
if (i === raw.length) {
return true;
}
if (hasToolMarkupNamePrefix(lower.slice(i))) {
if (hasToolMarkupNamePrefix(raw, i)) {
return true;
}
if ('dsml'.startsWith(lower.slice(i))) {
if (normalizedASCIITailAt(raw, i).startsWith('dsml') || 'dsml'.startsWith(normalizedASCIITailAt(raw, i))) {
return true;
}
const next = consumeToolMarkupNamePrefixOnce(raw, lower, i);
@@ -614,9 +632,11 @@ function consumeToolMarkupNamePrefixOnce(raw, lower, idx) {
if (idx < raw.length && [' ', '\t', '\r', '\n'].includes(raw[idx])) {
return { next: idx + 1, ok: true };
}
if (lower.startsWith('dsml', idx)) {
let next = idx + 'dsml'.length;
if (next < raw.length && (raw[next] === '-' || raw[next] === '_')) {
const dsml = matchNormalizedASCII(raw, idx, 'dsml');
if (dsml.ok) {
let next = idx + dsml.len;
const sep = normalizeFullwidthASCIIChar(raw[next] || '');
if (next < raw.length && (sep === '-' || sep === '_')) {
next += 1;
}
return { next, ok: true };
@@ -629,12 +649,15 @@ function consumeToolMarkupNamePrefixOnce(raw, lower, idx) {
}
function consumeArbitraryToolMarkupNamePrefix(raw, lower, idx) {
if (idx < 0 || idx >= raw.length || !isToolMarkupPrefixSegmentChar(raw[idx])) {
const first = consumeToolMarkupPrefixSegment(raw, idx);
if (!first.ok) {
return { next: idx, ok: false };
}
let j = idx + 1;
while (j < raw.length && isToolMarkupPrefixSegmentChar(raw[j])) {
j += 1;
let j = first.next;
while (j < raw.length) {
const segment = consumeToolMarkupPrefixSegment(raw, j);
if (!segment.ok) break;
j = segment.next;
}
let k = j;
while (k < raw.length && [' ', '\t', '\r', '\n'].includes(raw[k])) {
@@ -645,7 +668,7 @@ function consumeArbitraryToolMarkupNamePrefix(raw, lower, idx) {
if (next < raw.length && isToolMarkupPipe(raw[next])) {
next += 1;
ok = true;
} else if (next < raw.length && (raw[next] === '_' || raw[next] === '-')) {
} else if (next < raw.length && ['_', '-'].includes(normalizeFullwidthASCIIChar(raw[next]))) {
next += 1;
ok = true;
}
@@ -655,32 +678,41 @@ function consumeArbitraryToolMarkupNamePrefix(raw, lower, idx) {
while (next < raw.length && [' ', '\t', '\r', '\n'].includes(raw[next])) {
next += 1;
}
if (!hasToolMarkupNamePrefix(lower.slice(next))) {
if (!hasToolMarkupNamePrefix(raw, next)) {
return { next: idx, ok: false };
}
return { next, ok: true };
}
function isToolMarkupPrefixSegmentChar(ch) {
return /^[A-Za-z0-9]$/.test(ch);
function consumeToolMarkupPrefixSegment(raw, idx) {
if (idx < 0 || idx >= raw.length) {
return { next: idx, ok: false };
}
const ch = normalizeFullwidthASCIIChar(raw[idx]);
if (/^[A-Za-z0-9]$/.test(ch)) {
return { next: idx + 1, ok: true };
}
return { next: idx, ok: false };
}
function hasToolMarkupNamePrefix(lowerTail) {
function hasToolMarkupNamePrefix(raw, start) {
const tail = normalizedASCIITailAt(raw, start);
for (const name of TOOL_MARKUP_NAMES) {
if (lowerTail.startsWith(name.raw) || name.raw.startsWith(lowerTail)) {
if (tail.startsWith(name.raw) || name.raw.startsWith(tail)) {
return true;
}
}
return false;
}
function matchToolMarkupName(lower, start, dsmlLike) {
function matchToolMarkupName(raw, start, dsmlLike) {
for (const name of TOOL_MARKUP_NAMES) {
if (name.dsmlOnly && !dsmlLike) {
continue;
}
if (lower.startsWith(name.raw, start)) {
return { name: name.canonical, len: name.raw.length };
const matched = matchNormalizedASCII(raw, start, name.raw);
if (matched.ok) {
return { name: name.canonical, len: matched.len };
}
}
return { name: '', len: 0 };
@@ -690,17 +722,18 @@ function findXmlTagEnd(text, from) {
let quote = '';
for (let i = Math.max(0, from || 0); i < text.length; i += 1) {
const ch = text[i];
const normalized = normalizeFullwidthASCIIChar(ch);
if (quote) {
if (ch === quote) {
if (normalized === quote) {
quote = '';
}
continue;
}
if (ch === '"' || ch === "'") {
quote = ch;
if (normalized === '"' || normalized === "'") {
quote = normalized;
continue;
}
if (ch === '>') {
if (normalized === '>') {
return i;
}
}
@@ -711,13 +744,65 @@ function hasXmlTagBoundary(text, idx) {
if (idx >= text.length) {
return true;
}
return [' ', '\t', '\n', '\r', '>', '/'].includes(text[idx]);
return [' ', '\t', '\n', '\r', '>', '/'].includes(text[idx])
|| normalizeFullwidthASCIIChar(text[idx]) === '>';
}
function isSelfClosingXmlTag(startTag) {
return toStringSafe(startTag).trim().endsWith('/');
}
function normalizeFullwidthASCIIChar(ch) {
if (!ch) {
return ch;
}
const code = ch.charCodeAt(0);
if (code >= 0xff01 && code <= 0xff5e) {
return String.fromCharCode(code - 0xfee0);
}
return ch;
}
function normalizedASCIITailAt(raw, start) {
let out = '';
for (let i = Math.max(0, start || 0); i < raw.length; i += 1) {
const ch = normalizeFullwidthASCIIChar(raw[i]).toLowerCase();
if (ch.charCodeAt(0) > 0x7f) {
break;
}
out += ch;
}
return out;
}
function matchNormalizedASCII(raw, start, expected) {
let idx = start;
for (let j = 0; j < expected.length; j += 1) {
if (idx >= raw.length) {
return { ok: false, len: 0 };
}
const ch = normalizeFullwidthASCIIChar(raw[idx]).toLowerCase();
if (ch !== expected[j].toLowerCase()) {
return { ok: false, len: 0 };
}
idx += 1;
}
return { ok: true, len: idx - start };
}
function normalizeToolMarkupTagTailForXML(tail) {
let out = '';
for (const ch of typeof tail === 'string' ? tail : String(tail || '')) {
const normalized = normalizeFullwidthASCIIChar(ch);
if (['>', '/', '=', '"', "'"].includes(normalized)) {
out += normalized;
} else {
out += ch;
}
}
return out;
}
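The fold these helpers share is a plain code-point shift: fullwidth ASCII forms (U+FF01..U+FF5E) sit exactly 0xFEE0 above their halfwidth counterparts (U+0021..U+007E). A minimal standalone sketch of that mapping (function name hypothetical, mirroring `normalizeFullwidthASCIIChar` above):

```javascript
// Fullwidth ASCII forms (U+FF01..U+FF5E) sit exactly 0xFEE0 above their
// halfwidth counterparts (U+0021..U+007E), so one subtraction folds them.
// Anything outside that block is returned unchanged.
function foldFullwidthASCII(ch) {
  const code = ch.charCodeAt(0);
  if (code >= 0xff01 && code <= 0xff5e) {
    return String.fromCharCode(code - 0xfee0);
  }
  return ch;
}
```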
function parseMarkupInput(raw) {
const s = toStringSafe(raw).trim();
if (!s) {

View File

@@ -1,6 +1,9 @@
package toolcall
import "strings"
import (
"strings"
"unicode/utf8"
)
func normalizeDSMLToolCallMarkup(text string) (string, bool) {
if text == "" {
@@ -42,8 +45,9 @@ func rewriteDSMLToolMarkupOutsideIgnored(text string) string {
b.WriteByte('/')
}
b.WriteString(tag.Name)
b.WriteString(text[tag.NameEnd : tag.End+1])
if text[tag.End] != '>' {
tail := normalizeToolMarkupTagTailForXML(text[tag.NameEnd : tag.End+1])
b.WriteString(tail)
if !strings.HasSuffix(tail, ">") {
b.WriteByte('>')
}
i = tag.End + 1
@@ -54,3 +58,27 @@ func rewriteDSMLToolMarkupOutsideIgnored(text string) string {
}
return b.String()
}
func normalizeToolMarkupTagTailForXML(tail string) string {
if tail == "" {
return ""
}
var b strings.Builder
b.Grow(len(tail))
for i := 0; i < len(tail); {
r, size := utf8.DecodeRuneInString(tail[i:])
if r == utf8.RuneError && size == 1 {
b.WriteByte(tail[i])
i++
continue
}
switch normalizeFullwidthASCII(r) {
case '>', '/', '=', '"', '\'':
b.WriteRune(normalizeFullwidthASCII(r))
default:
b.WriteString(tail[i : i+size])
}
i += size
}
return b.String()
}
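The tag-tail pass above deliberately folds only structural characters, so fullwidth letters inside attribute values survive byte-for-byte. A JS sketch of the same selective fold (helper name hypothetical):

```javascript
// Fold each character's fullwidth form to ASCII, then keep the folded form
// only when it is XML-structural ('>', '/', '=', quotes); otherwise emit the
// original character so non-structural fullwidth text is preserved.
function foldStructuralTagTail(tail) {
  let out = '';
  for (const ch of String(tail ?? '')) {
    const code = ch.charCodeAt(0);
    const folded = code >= 0xff01 && code <= 0xff5e
      ? String.fromCharCode(code - 0xfee0)
      : ch;
    out += ['>', '/', '=', '"', "'"].includes(folded) ? folded : ch;
  }
  return out;
}
```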

View File

@@ -10,7 +10,7 @@ import (
var toolCallMarkupKVPattern = regexp.MustCompile(`(?is)<(?:[a-z0-9_:-]+:)?([a-z0-9_\-.]+)\b[^>]*>(.*?)</(?:[a-z0-9_:-]+:)?([a-z0-9_\-.]+)>`)
// cdataPattern matches a standalone CDATA section.
var cdataPattern = regexp.MustCompile(`(?is)^<!\[CDATA\[(.*?)]]>$`)
var cdataPattern = regexp.MustCompile(`(?is)^<!\[CDATA\[(.*?)]](?:>|＞)$`)
func parseMarkupKVObject(text string) map[string]any {
matches := toolCallMarkupKVPattern.FindAllStringSubmatch(strings.TrimSpace(text), -1)

View File

@@ -6,6 +6,7 @@ import (
"html"
"regexp"
"strings"
"unicode/utf8"
)
var xmlAttrPattern = regexp.MustCompile(`(?is)\b([a-z0-9_:-]+)\s*=\s*("([^"]*)"|'([^']*)')`)
@@ -214,7 +215,7 @@ func skipXMLIgnoredSection(text string, i int) (next int, advanced bool, blocked
if end < 0 {
return 0, false, true
}
return end + len("]]>"), true, false
return end + toolCDATACloseLenAt(text, end), true, false
case strings.HasPrefix(text[i:], "<!--"):
end := strings.Index(text[i+len("<!--"):], "-->")
if end < 0 {
@@ -227,15 +228,26 @@ func skipXMLIgnoredSection(text string, i int) (next int, advanced bool, blocked
}
func hasASCIIPrefixFoldAt(text string, start int, prefix string) bool {
if start < 0 || len(text)-start < len(prefix) {
return false
_, ok := matchASCIIPrefixFoldAt(text, start, prefix)
return ok
}
func matchASCIIPrefixFoldAt(text string, start int, prefix string) (int, bool) {
if start < 0 || (start >= len(text) && prefix != "") {
return 0, false
}
idx := start
for j := 0; j < len(prefix); j++ {
if asciiLower(text[start+j]) != asciiLower(prefix[j]) {
return false
if idx >= len(text) {
return 0, false
}
ch, size := normalizedASCIIAt(text, idx)
if size <= 0 || asciiLower(ch) != asciiLower(prefix[j]) {
return 0, false
}
idx += size
}
return true
return idx - start, true
}
func asciiLower(b byte) byte {
@@ -266,15 +278,14 @@ func findToolCDATAEnd(text string, from int) int {
if from < 0 || from >= len(text) {
return -1
}
const closeMarker = "]]>"
firstNonFenceEnd := -1
for searchFrom := from; searchFrom < len(text); {
rel := strings.Index(text[searchFrom:], closeMarker)
if rel < 0 {
end := indexToolCDATAClose(text, searchFrom)
if end < 0 {
break
}
end := searchFrom + rel
searchFrom = end + len(closeMarker)
closeLen := toolCDATACloseLenAt(text, end)
searchFrom = end + closeLen
if cdataOffsetIsInsideMarkdownFence(text[from:end]) {
continue
}
@@ -288,6 +299,31 @@ func findToolCDATAEnd(text string, from int) int {
return firstNonFenceEnd
}
func indexToolCDATAClose(text string, from int) int {
if from < 0 {
from = 0
}
asciiIdx := strings.Index(text[from:], "]]>")
fullIdx := strings.Index(text[from:], "]]＞")
if asciiIdx < 0 && fullIdx < 0 {
return -1
}
if asciiIdx < 0 {
return from + fullIdx
}
if fullIdx < 0 || asciiIdx < fullIdx {
return from + asciiIdx
}
return from + fullIdx
}
func toolCDATACloseLenAt(text string, idx int) int {
if strings.HasPrefix(text[idx:], "]]＞") {
return len("]]＞")
}
return len("]]>")
}
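`indexToolCDATAClose` and `toolCDATACloseLenAt` together accept either terminator and report how many bytes it spans (a fullwidth ＞ is three bytes in UTF-8, so the two closers differ in length on the Go side). A rough JS equivalent of the search, expressed in code-unit indexes rather than byte offsets (helper name hypothetical):

```javascript
// Earliest CDATA terminator: the ASCII ']]>' close or the fullwidth-drift
// ']]＞' variant, whichever occurs first; -1 when neither is present.
function indexCDATAClose(text, from) {
  const ascii = text.indexOf(']]>', from);
  const fullwidth = text.indexOf(']]＞', from);
  if (ascii < 0) return fullwidth;
  if (fullwidth < 0) return ascii;
  return Math.min(ascii, fullwidth);
}
```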
func cdataEndLooksStructural(text string, after int) bool {
for after < len(text) {
switch {
@@ -327,22 +363,29 @@ func cdataOffsetIsInsideMarkdownFence(fragment string) bool {
}
func findXMLTagEnd(text string, from int) int {
quote := byte(0)
for i := maxInt(from, 0); i < len(text); i++ {
ch := text[i]
quote := rune(0)
for i := maxInt(from, 0); i < len(text); {
r, size := utf8.DecodeRuneInString(text[i:])
if r == utf8.RuneError && size == 0 {
break
}
ch := normalizeFullwidthASCII(r)
if quote != 0 {
if ch == quote {
quote = 0
}
i += size
continue
}
if ch == '"' || ch == '\'' {
quote = ch
i += size
continue
}
if ch == '>' {
return i
return i + size - 1
}
i += size
}
return -1
}
@@ -355,7 +398,8 @@ func hasXMLTagBoundary(text string, idx int) bool {
case ' ', '\t', '\n', '\r', '>', '/':
return true
default:
return false
r, _ := utf8.DecodeRuneInString(text[idx:])
return normalizeFullwidthASCII(r) == '>'
}
}

View File

@@ -1,6 +1,9 @@
package toolcall
import "strings"
import (
"strings"
"unicode/utf8"
)
type toolMarkupNameAlias struct {
raw string
@@ -184,7 +187,7 @@ func scanToolMarkupTagAt(text string, start int) (ToolMarkupTag, bool) {
}
func IsPartialToolMarkupTagPrefix(text string) bool {
if text == "" || text[0] != '<' || strings.Contains(text, ">") {
if text == "" || text[0] != '<' || strings.Contains(text, ">") || strings.Contains(text, "＞") {
return false
}
i := 1
@@ -236,9 +239,10 @@ func consumeToolMarkupNamePrefixOnce(text string, idx int) (int, bool) {
return idx + 1, true
}
if hasASCIIPrefixFoldAt(text, idx, "dsml") {
next := idx + len("dsml")
if next < len(text) && (text[next] == '-' || text[next] == '_') {
next++
dsmlLen, _ := matchASCIIPrefixFoldAt(text, idx, "dsml")
next := idx + dsmlLen
if sep, size := normalizedASCIIAt(text, next); sep == '-' || sep == '_' {
next += size
}
return next, true
}
@@ -249,12 +253,17 @@ func consumeToolMarkupNamePrefixOnce(text string, idx int) (int, bool) {
}
func consumeArbitraryToolMarkupNamePrefix(text string, idx int) (int, bool) {
if idx < 0 || idx >= len(text) || !isToolMarkupPrefixSegmentByte(text[idx]) {
nextSegment, ok := consumeToolMarkupPrefixSegment(text, idx)
if !ok {
return idx, false
}
j := idx + 1
for j < len(text) && isToolMarkupPrefixSegmentByte(text[j]) {
j++
j := nextSegment
for {
nextSegment, ok = consumeToolMarkupPrefixSegment(text, j)
if !ok {
break
}
j = nextSegment
}
k := j
for k < len(text) && (text[k] == ' ' || text[k] == '\t' || text[k] == '\r' || text[k] == '\n') {
@@ -262,8 +271,8 @@ func consumeArbitraryToolMarkupNamePrefix(text string, idx int) (int, bool) {
}
next, ok := consumeToolMarkupPipe(text, k)
if !ok {
if k < len(text) && (text[k] == '_' || text[k] == '-') {
next = k + 1
if sep, size := normalizedASCIIAt(text, k); sep == '_' || sep == '-' {
next = k + size
ok = true
}
}
@@ -279,21 +288,32 @@ func consumeArbitraryToolMarkupNamePrefix(text string, idx int) (int, bool) {
return next, true
}
func isToolMarkupPrefixSegmentByte(b byte) bool {
return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
func consumeToolMarkupPrefixSegment(text string, idx int) (int, bool) {
ch, size := normalizedASCIIAt(text, idx)
if size <= 0 {
return idx, false
}
if (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9') {
return idx + size, true
}
return idx, false
}
func hasASCIIPartialPrefixFoldAt(text string, start int, prefix string) bool {
remain := len(text) - start
if remain <= 0 || remain > len(prefix) {
if start < 0 || start >= len(text) {
return false
}
for j := 0; j < remain; j++ {
if asciiLower(text[start+j]) != asciiLower(prefix[j]) {
idx := start
matched := 0
for matched < len(prefix) && idx < len(text) {
ch, size := normalizedASCIIAt(text, idx)
if size <= 0 || asciiLower(ch) != asciiLower(prefix[matched]) {
return false
}
idx += size
matched++
}
return true
return matched > 0 && matched < len(prefix) && idx == len(text)
}
func hasToolMarkupNamePrefix(text string, start int) bool {
@@ -313,8 +333,8 @@ func matchToolMarkupName(text string, start int, dsmlLike bool) (string, int) {
if name.dsmlOnly && !dsmlLike {
continue
}
if hasASCIIPrefixFoldAt(text, start, name.raw) {
return name.canonical, len(name.raw)
if nameLen, ok := matchASCIIPrefixFoldAt(text, start, name.raw); ok {
return name.canonical, nameLen
}
}
return "", 0
@@ -341,6 +361,29 @@ func hasToolMarkupBoundary(text string, idx int) bool {
case ' ', '\t', '\n', '\r', '>', '/':
return true
default:
return false
r, _ := utf8.DecodeRuneInString(text[idx:])
return normalizeFullwidthASCII(r) == '>'
}
}
func normalizedASCIIAt(text string, idx int) (byte, int) {
if idx < 0 || idx >= len(text) {
return 0, 0
}
r, size := utf8.DecodeRuneInString(text[idx:])
if r == utf8.RuneError && size == 0 {
return 0, 0
}
normalized := normalizeFullwidthASCII(r)
if normalized > 0x7f {
return 0, 0
}
return byte(normalized), size
}
func normalizeFullwidthASCII(r rune) rune {
if r >= '！' && r <= '～' {
return r - 0xFEE0
}
return r
}

View File

@@ -111,6 +111,27 @@ func TestParseToolCallsSupportsArbitraryPrefixedToolMarkup(t *testing.T) {
}
}
func TestParseToolCallsSupportsFullwidthDSMLShell(t *testing.T) {
text := `<tool_calls>
<invoke name="Read">
<parameter name="file_path"＞<![CDATA[/Users/aq/Desktop/myproject/Personal_Blog/README.md]]＞</parameter>
</invoke>
<invoke name="Read">
<parameter name="file_path"＞<![CDATA[/Users/aq/Desktop/myproject/Personal_Blog/index.html]]＞</parameter>
</invoke>
</tool_calls>`
calls := ParseToolCalls(text, []string{"Read"})
if len(calls) != 2 {
t.Fatalf("expected two fullwidth DSML calls, got %#v", calls)
}
if calls[0].Name != "Read" || calls[0].Input["file_path"] != "/Users/aq/Desktop/myproject/Personal_Blog/README.md" {
t.Fatalf("unexpected first fullwidth DSML call: %#v", calls[0])
}
if calls[1].Name != "Read" || calls[1].Input["file_path"] != "/Users/aq/Desktop/myproject/Personal_Blog/index.html" {
t.Fatalf("unexpected second fullwidth DSML call: %#v", calls[1])
}
}
func TestParseToolCallsIgnoresBareHyphenatedToolCallsLookalike(t *testing.T) {
text := `<tool-calls><invoke name="Bash"><parameter name="command">pwd</parameter></invoke></tool-calls>`
calls := ParseToolCalls(text, []string{"Bash"})

View File

@@ -187,9 +187,9 @@ test('vercel stream emits Go-parity empty-output failure on DONE', async () => {
const { frames } = await runMockVercelStream(['data: [DONE]\n\n']);
assert.equal(frames.length, 2);
const failed = JSON.parse(frames[0]);
assert.equal(failed.status_code, 429);
assert.equal(failed.error.type, 'rate_limit_error');
assert.equal(failed.error.code, 'upstream_empty_output');
assert.equal(failed.status_code, 503);
assert.equal(failed.error.type, 'service_unavailable_error');
assert.equal(failed.error.code, 'upstream_unavailable');
assert.equal(frames[1], '[DONE]');
});
@@ -209,6 +209,21 @@ test('vercel stream retries empty output once and keeps one terminal frame', asy
assert.match(completionBodies[1].prompt, /Previous reply had no visible output\. Please regenerate the visible final answer or tool call now\.$/);
});
test('vercel stream retries thinking-only output once', async () => {
const { frames, fetchURLs, fetchBodies } = await runMockVercelStreamSequence([
['data: {"response_message_id":42,"p":"response/thinking_content","v":"plan"}\n\n', 'data: [DONE]\n\n'],
['data: {"p":"response/content","v":"visible"}\n\n', 'data: [DONE]\n\n'],
], { thinking_enabled: true });
const parsed = frames.filter((frame) => frame !== '[DONE]').map((frame) => JSON.parse(frame));
const completionBodies = fetchBodies.filter((body) => Object.hasOwn(body, 'prompt'));
assert.equal(fetchURLs.filter((url) => url === 'https://chat.deepseek.com/api/v0/chat/completion').length, 2);
assert.equal(frames.filter((frame) => frame === '[DONE]').length, 1);
assert.equal(completionBodies[1].parent_message_id, 42);
assert.equal(parsed[0].choices[0].delta.reasoning_content, 'plan');
assert.equal(parsed[1].choices[0].delta.content, 'visible');
assert.equal(parsed[2].choices[0].finish_reason, 'stop');
});
test('vercel stream coalesces many small content deltas while keeping one choice', async () => {
const lines = Array.from({ length: 100 }, () => `data: ${JSON.stringify({ p: 'response/content', v: '字' })}\n\n`);
lines.push('data: [DONE]\n\n');

View File

@@ -112,6 +112,23 @@ test('parseToolCalls parses arbitrary-prefixed tool markup shells', () => {
}
});
test('parseToolCalls parses fullwidth DSML shell drift', () => {
const payload = `<tool_calls>
<invoke name="Read">
<parameter name="file_path"＞<![CDATA[/Users/aq/Desktop/myproject/Personal_Blog/README.md]]＞</parameter>
</invoke>
<invoke name="Read">
<parameter name="file_path"＞<![CDATA[/Users/aq/Desktop/myproject/Personal_Blog/index.html]]＞</parameter>
</invoke>
</tool_calls>`;
const calls = parseToolCalls(payload, ['Read']);
assert.equal(calls.length, 2);
assert.equal(calls[0].name, 'Read');
assert.deepEqual(calls[0].input, { file_path: '/Users/aq/Desktop/myproject/Personal_Blog/README.md' });
assert.equal(calls[1].name, 'Read');
assert.deepEqual(calls[1].input, { file_path: '/Users/aq/Desktop/myproject/Personal_Blog/index.html' });
});
test('parseToolCalls ignores bare hyphenated tool_calls lookalike', () => {
const payload = '<tool-calls><invoke name="Bash"><parameter name="command">pwd</parameter></invoke></tool-calls>';
const calls = parseToolCalls(payload, ['Bash']);