Compare commits


61 Commits
v4.4.3 ... dev

Author SHA1 Message Date
CJACK.
882d0d16ac Update VERSION 2026-05-10 23:33:52 +08:00
CJACK.
92b30932cf Merge pull request #483 from NgoQuocViet2001/ai/tool-result-leak-sanitize
fix(openai): strip leaked tool result markers
2026-05-10 23:33:16 +08:00
NgoQuocViet2001
3e935c088b fix(openai): strip leaked tool result markers 2026-05-10 22:05:33 +07:00
CJACK
3569ae136a fix webui static root path guard 2026-05-10 18:55:57 +08:00
CJACK
77a47ada4e Fix tool detection when unclosed backtick precedes tool call
Handles cases where a stray backtick opens an inline code span but is never closed.
Previously, any subsequent XML tool tag was treated as inside markdown code and ignored.
Now, tool tags are detected after an unclosed backtick, and the markdown state is reset
when the backtick is confirmed to be literal text at stream boundaries.
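The detection logic described above can be sketched as follows. This is a simplified, non-streaming illustration (function name and structure are hypothetical, not the real scanner in ds2api): a lone backtick with no closing pair is literal text, so a tool tag after it should still be detected.

```go
package main

import (
	"fmt"
	"strings"
)

// detectToolTag reports whether a tool-call opening tag appears in chunk,
// tolerating a stray, never-closed backtick before it. Simplified sketch of
// the behavior described in the commit; the real scanner is stateful and
// streaming.
func detectToolTag(chunk string) bool {
	const tag = "<tool_calls>"
	idx := strings.Index(chunk, tag)
	if idx < 0 {
		return false
	}
	// An even backtick count before the tag means any inline code span is
	// closed; an odd count means one backtick is unclosed. The old logic
	// treated the odd case as "inside markdown code" and bailed out.
	ticks := strings.Count(chunk[:idx], "`")
	if ticks%2 == 1 && strings.Contains(chunk[idx:], "`") {
		// A closing backtick after the tag: the tag really sits inside an
		// inline code span, so treat it as an example, not a call.
		return false
	}
	return true
}

func main() {
	fmt.Println(detectToolTag("stray ` then <tool_calls>"))   // true: unclosed backtick is literal
	fmt.Println(detectToolTag("example `<tool_calls>` only")) // false: genuine code span
}
```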

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-10 18:41:51 +08:00
CJACK.
8623920c89 Merge pull request #476 from CJackHwang/codex/fix-security-advisory-ghsa-rf34-c5jc-4ffw
[codex] fix security advisory and toolcall parsing issues
2026-05-10 18:06:24 +08:00
CJACK
e393110121 fix toolcall inline code and query redaction 2026-05-10 18:02:54 +08:00
CJACK
243860f6d3 bump version to 4.6.1 2026-05-10 17:02:40 +08:00
CJACK
03ea3728e7 fix security advisory issues 2026-05-10 17:01:22 +08:00
CJACK.
22a00dc667 Merge pull request #475 from CJackHwang/dev
feat: enhance DSML tool-call parsing drift tolerance and update API docs
2026-05-10 16:33:14 +08:00
CJACK
0b05915bb6 Merge branch 'pr-474' into dev
# Conflicts:
#	internal/httpapi/openai/chat/handler.go
#	internal/httpapi/openai/chat/vercel_prepare_test.go
#	internal/httpapi/openai/chat/vercel_stream.go
2026-05-10 16:28:36 +08:00
CJACK
eaeb403fda feat: align Go/Node DSML tool-call parsing drift tolerance and update API docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-10 16:17:46 +08:00
CJACK
cee8757d14 revert: replace fullwidth pipe | with halfwidth | in DSML tool markup
PR #460 introduced fullwidth pipe characters (|) in DSML tool call formatting
to improve parsing robustness, but models exposed to these fullwidth pipes in
system prompts exhibit significantly higher rates of tool output hallucinations.
Reverting to halfwidth pipes (|) drastically reduces tokenizer/perplexity-driven
hallucinations while retaining the existing confusable-hardening in the parser.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 15:18:54 +08:00
ds2api-bot
45590d6748 style: fix gofmt formatting in vercel_prepare_test.go 2026-05-10 07:08:50 +00:00
CJACK.
3beb31309f Merge pull request #473 from LoGGGG240211/codex/integrate-confusable-hardening
feat(toolcall): harden confusable candidate spans
2026-05-10 14:12:10 +08:00
Your Name
196e3c46f6 feat(toolcall): harden confusable candidate spans 2026-05-10 09:27:30 +07:00
ds2api-bot
df6859bddc fix(vercel): enable auto-delete session on Vercel stream release
The "delete current conversation" feature was not working on Vercel
deployment because the stream flow uses a separate lease mechanism.
The session_id created during prepare phase was not preserved for
deletion when the stream ends.

Changes:
- Add SessionID field to streamLease struct to preserve session_id
- Pass session_id to holdStreamLease during prepare
- Modify releaseStreamLease to return auth and session_id
- Call autoDeleteRemoteSession in handleVercelStreamRelease when
  releasing a lease with auto-delete mode enabled

Closes #vercel-auto-delete
2026-05-10 02:05:05 +00:00
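The struct change in this commit can be sketched as below. `streamLease`, `SessionID`, and `releaseStreamLease` are named in the commit; the field types and surrounding shape are assumptions for illustration only.

```go
package main

import "fmt"

// streamLease sketches the lease record described in the commit; fields
// other than the added SessionID are hypothetical.
type streamLease struct {
	Auth       string // upstream auth used for the stream
	SessionID  string // added: session_id from the prepare phase, kept for deletion
	AutoDelete bool
}

// releaseStreamLease now returns both auth and session_id so the release
// handler can call the remote auto-delete when the stream ends.
func releaseStreamLease(l *streamLease) (auth, sessionID string) {
	return l.Auth, l.SessionID
}

func main() {
	l := &streamLease{Auth: "tok", SessionID: "sess-123", AutoDelete: true}
	a, s := releaseStreamLease(l)
	fmt.Println(a, s)
	if l.AutoDelete {
		// In the real handler this would be autoDeleteRemoteSession(s).
		fmt.Println("auto-delete session:", s)
	}
}
```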
CJACK.
6a8edf96c3 Merge pull request #470 from CJackHwang/dev
feat: DSML/CDATA parsing robustness, tool-call resilience, and completion retry improvements
2026-05-10 04:57:59 +08:00
CJACK
1aa791ec3a feat: support PascalCase local-name drift in DSML tool markup parsing
Detect camelCase→PascalCase boundaries between arbitrary prefixes and fixed
local names (tool_calls/invoke/parameter), so that fused forms like
<DSmartToolCalls> are recognized without explicit separator characters.
Also add the underscore-free alias "toolcalls" as a valid DSML local name.
Includes lookalike rejection tests to ensure near-matches like
<DSmartToolCallsExtra> are not falsely accepted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-10 04:52:19 +08:00
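The boundary detection described above can be sketched as follows (a hypothetical helper, not the project's real function): PascalCase the fixed local name, require it as an exact suffix, and require the arbitrary prefix to end in lowercase so a camelCase→PascalCase boundary exists.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPascalLocalName reports whether tagName is an arbitrary prefix fused
// with a PascalCase form of the fixed DSML local name at a lower→upper
// boundary, e.g. "DSmartToolCalls" = "DSmart" + "ToolCalls". Sketch only;
// the real parser also handles separators and other drift.
func hasPascalLocalName(tagName, local string) bool {
	// PascalCase the local name: tool_calls -> ToolCalls, invoke -> Invoke.
	var b strings.Builder
	for _, part := range strings.Split(local, "_") {
		if part == "" {
			continue
		}
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	pascal := b.String()
	if !strings.HasSuffix(tagName, pascal) {
		return false // rejects near-matches like "DSmartToolCallsExtra"
	}
	prefix := strings.TrimSuffix(tagName, pascal)
	if prefix == "" {
		return false // a fused form needs an actual prefix
	}
	// The boundary must be camelCase→PascalCase: prefix ends lowercase.
	last := prefix[len(prefix)-1]
	return last >= 'a' && last <= 'z'
}

func main() {
	fmt.Println(hasPascalLocalName("DSmartToolCalls", "tool_calls"))      // true
	fmt.Println(hasPascalLocalName("DSmartToolCallsExtra", "tool_calls")) // false
}
```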
CJACK
247fc7c788 refactor: unify tool markup pipe and CDATA separator into general-purpose separator detector
Replace the hardcoded isToolMarkupPipe (matching |, |, ␂, \x02, !) and
isToolCDATAOpenSeparator (exclusion-based) with a single isToolMarkupSeparator
that treats any Unicode punctuation outside structural characters as a valid
DSML separator. This eliminates the need for a per-character allowlist — novel
separators like ※ are automatically supported without code changes. Also
removes the unused cdataPattern regexp and updates docs to use "non-structural
separator" terminology.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-10 04:24:10 +08:00
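The unified detector can be sketched as below. The exact Unicode category choices are assumptions; the commit only says "any Unicode punctuation outside structural characters", and the earlier matcher also accepted raw STX, so that case is kept explicitly.

```go
package main

import (
	"fmt"
	"unicode"
)

// isToolMarkupSeparator sketches the general-purpose detector: any Unicode
// punctuation or symbol that is not an XML-structural character counts as a
// DSML separator, so novel separators like '※' need no per-character
// allowlist. Control-character STX is kept from the old pipe matcher.
func isToolMarkupSeparator(r rune) bool {
	switch r {
	case '<', '>', '/', '=', '"', '\'':
		return false // structural characters never act as separators
	}
	if r == 0x02 { // raw STX, accepted by the previous matcher
		return true
	}
	return unicode.IsPunct(r) || unicode.IsSymbol(r)
}

func main() {
	for _, r := range []rune{'|', '|', '※', '、', '<'} {
		fmt.Printf("%q -> %v\n", r, isToolMarkupSeparator(r))
	}
}
```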
CJACK
7a28b9e265 feat: improve CDATA and DSML tag parsing robustness with support for fullwidth-bang, ideographic-comma, and extended quote/separator normalization. 2026-05-10 03:41:55 +08:00
CJACK.
7d24a08b0f Merge pull request #469 from CJackHwang/dev
feat: tool-call markup parsing resilience — CJK, arbitrary prefixes, control separators, and retry hardening
2026-05-10 02:16:13 +08:00
CJACK
61d42f8b72 feat: add support for CJK angle bracket and trailing attribute separator drift in DSML tool parsing 2026-05-10 01:54:31 +08:00
CJACK
77b6d83266 feat: expand tool-call parsing resilience, refine model alias resolution, and update API documentation 2026-05-10 01:35:43 +08:00
CJACK
740a78ad5a refactor: allow and preserve empty tool parameter values while updating sieve to release malformed XML as text 2026-05-10 01:05:18 +08:00
CJACK
ddd42e532e feat: implement managed-account rotation on 429 empty-output completion retries 2026-05-10 00:41:45 +08:00
CJACK
3cc7f469f3 feat: implement support for arbitrary tool markup prefixes and control character separators in tool sieve parsing 2026-05-10 00:19:03 +08:00
CJACK
7c66742a19 refactor: unify empty-output retry logic into shared completionruntime package and normalize protocol adapter boundary. 2026-05-10 00:10:53 +08:00
CJACK
067cf465bb feat: integrate reasoning content into assistant tool-call messages and improve tool markup parsing for prompt compatibility 2026-05-09 23:16:07 +08:00
CJACK.
dbf2bfb64f Merge pull request #466 from CJackHwang/dev
Merge pull request #465 from CJackHwang/codex/review-and-update-project-documentation

docs: clarify Vercel chat-stream supports root `/chat/completions` alias
2026-05-09 19:46:45 +08:00
CJACK.
9e9a7f1bec Merge pull request #467 from CJackHwang/codex/fix-dsml-delimiter-consistency-in-examples
fix(toolcall): unify DSML delimiter in correct examples
2026-05-09 19:37:40 +08:00
CJACK.
96691aa37a fix(toolcall): unify DSML delimiter in correct examples 2026-05-09 19:29:36 +08:00
CJACK.
a3ce8008af Merge pull request #465 from CJackHwang/codex/review-and-update-project-documentation
docs: clarify Vercel chat-stream supports root `/chat/completions` alias
2026-05-09 19:00:27 +08:00
CJACK.
23a79df687 docs: sync Vercel chat-stream path alias in docs 2026-05-09 18:59:08 +08:00
CJACK.
e251a7ee29 Merge pull request #463 from CJackHwang/codex/align-vercel-js-path-with-go-implementation-tqm8ci
Align Vercel JS CORS Vary-Origin behavior with Go
2026-05-09 18:27:39 +08:00
CJACK.
30cca7cda0 Merge pull request #462 from CJackHwang/codex/align-vercel-js-path-with-go-implementation
fix(vercel): align JS stream path guard with Go /chat/completions alias
2026-05-09 18:26:35 +08:00
CJACK.
ab163edee7 Merge pull request #461 from CJackHwang/codex/refactor-hasasciiprefixfoldat-and-hasdsmlprefix
Extract a shared ASCII partial-prefix matcher to merge duplicated DSML prefix logic
2026-05-09 18:20:31 +08:00
CJACK.
1201c3773f Align Vercel JS CORS Vary-Origin behavior with Go 2026-05-09 18:17:16 +08:00
CJACK.
595ddf52af fix(vercel): align js stream path guard with go chat alias 2026-05-09 18:16:47 +08:00
CJACK.
0adffccd46 refactor tool markup prefix folding helpers 2026-05-09 18:09:12 +08:00
CJACK.
0670d5acb4 Update VERSION 2026-05-09 17:54:26 +08:00
CJACK.
239c4faa97 Merge pull request #460 from waiwaic/main
fix(toolcall): eliminate strings.ToLower panics from Unicode case folding
2026-05-09 16:56:46 +08:00
waiwai
f33789399e fix(toolcall): correct DSML closing tag slash position
The closing tag format was <|/DSML|tag> but must be </|DSML|tag>.
The scanner's closing-tag detection checks text[1] == '/', so the
slash must come immediately after '<', before the first full-width
pipe (U+FF5C). Tags like <|/DSML|tool_calls> would not set
closing=true and would not match any tool markup name.

Files fixed:
- internal/toolcall/tool_prompt.go: all closing tags
- internal/promptcompat/prompt_build_test.go: 1 test expectation
2026-05-09 16:42:22 +08:00
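The position check described above can be reduced to a one-byte test (function name hypothetical): because the scanner looks at `text[1]`, the slash is only seen when it directly follows `<`.

```go
package main

import (
	"fmt"
	"strings"
)

// isClosingToolTag mirrors the check described in the commit: the slash
// must come immediately after '<' (text[1] == '/'), so </|DSML|tag> is a
// closing tag while <|/DSML|tag> is not. Simplified sketch.
func isClosingToolTag(text string) bool {
	return strings.HasPrefix(text, "<") && len(text) > 1 && text[1] == '/'
}

func main() {
	fmt.Println(isClosingToolTag("</|DSML|tool_calls>")) // true
	fmt.Println(isClosingToolTag("<|/DSML|tool_calls>")) // false: '/' sits after the fullwidth pipe
}
```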
waiwai
1e00e482a6 fix(toolcall): eliminate strings.ToLower panics from Unicode case folding
Replace all strings.ToLower usage with ASCII case-insensitive matching
(hasASCIIPrefixFoldAt, indexASCIIFold, hasDSMLPrefix) to prevent slice
bounds errors when Unicode characters change byte length after case
folding (e.g., Turkish İ U+0130 → i + combining dot: 2 bytes → 3 bytes).

Root cause: code created a strings.ToLower(text) copy, found byte
positions in that copy, then used those positions to slice the
original text — byte offsets that were valid in the lowercased copy
became out-of-bounds in the original when case folding changed byte
lengths.

Files changed:
- toolcalls_scan.go: remove 5 lower usages, add hasDSMLPrefix
- toolcalls_parse_markup.go: remove 3 lower usages, add indexASCIIFold
- toolcalls_markup.go: SanitizeLooseCDATA lower removal
- toolcalls_parse.go: updateCDATAStateForStrip lower removal
- tool_prompt.go: align DSML pipe characters with tool call spec
- tool_prompt_test.go: fix pre-existing test character mismatch
2026-05-09 15:05:51 +08:00
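The root cause above can be demonstrated in a few lines. The commit cites Turkish İ under full case folding; under Go's simple `strings.ToLower` mapping, Ⱥ (U+023A → U+2C65) shows the same failure class, growing from 2 to 3 bytes, so offsets found in the lowered copy misalign in the original.

```go
package main

import (
	"fmt"
	"strings"
)

// lowerLens returns the byte lengths of s and strings.ToLower(s). Some
// uppercase letters lowercase to a longer UTF-8 encoding, which is exactly
// how offsets computed on a lowered copy can run past the original.
func lowerLens(s string) (origLen, lowerLen int) {
	return len(s), len(strings.ToLower(s))
}

func main() {
	text := "Ⱥ<tool_calls>"
	lower := strings.ToLower(text)
	fmt.Println(lowerLens(text)) // 14 15

	// A byte offset found in the lowered copy is shifted in the original:
	pos := strings.Index(lower, "<tool_calls>")
	fmt.Println(lower[pos:]) // <tool_calls>
	fmt.Println(text[pos:])  // tool_calls> — misaligned; near the end of a
	// string such an offset can exceed len(text) and panic, the bug the
	// ASCII-only fold helpers eliminate.
}
```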
CJACK.
7ab5a0e66d Merge pull request #458 from CJackHwang/dev
Avoid lowercasing ignored XML tails in toolcall
2026-05-08 17:13:00 +08:00
CJACK.
410efbd70b Merge pull request #457 from NgoQuocViet2001/ai/skipxml-lower-hotpath
fix(toolcall): avoid lowercasing ignored XML tails
2026-05-08 17:05:28 +08:00
NgoQuocViet2001
7179b995bb fix(toolcall): avoid lowercasing ignored XML tails 2026-05-08 14:15:32 +07:00
CJACK.
fef3798e5e Merge pull request #453 from CJackHwang/dev
Fix character length calculation issue
2026-05-08 13:40:47 +08:00
CJACK.
00fe18b505 Update VERSION 2026-05-08 13:36:17 +08:00
CJACK.
9b746e32d8 Merge pull request #452 from waiwaic/fix/turkish-i-boundary-panic
fix(toolcall): use len(lower) not len(text) after ToLower to prevent out-of-bounds panic
2026-05-08 13:34:28 +08:00
waiwai
ace440481a refactor(toolcall): remove lower param from skipXMLIgnoredSection
The lower parameter was a footgun: callers had to keep it in sync with the
loop bound over text. Instead, skipXMLIgnoredSection now accepts only text
and constructs strings.ToLower(tail) internally for its prefix checks.

This eliminates the entire class of len(text) vs len(lower) boundary bugs
along with the min() workaround.

Also changes:
- findToolCDATAEnd: drop lower param, use text directly for closeMarker
  search (]]> is ASCII, ToLower is a no-op for it)
- cdataEndLooksStructural: drop lower param, use raw text byte comparison
- All external callers: loop bound reverts to plain len(text)

The inner tag-matching functions (findXMLStartTagOutsideCDATA,
findMatchingXMLEndTagOutsideCDATA) retain their own local lower for
HasPrefix comparisons against the target tag name, keeping concerns
properly separated.

Fixes #435.
2026-05-08 13:29:21 +08:00
CJACK.
66e0fa568f Merge pull request #449 from CJackHwang/dev
Update
2026-05-08 01:24:16 +08:00
CJACK.
fa489248bc Merge pull request #450 from CJackHwang/codex/add-json-tag-for-ollama-model-id
Add Ollama-compatible API endpoints and model capability support
2026-05-08 01:21:41 +08:00
CJACK.
657b9379ed test(docs): assert ollama show id field and document ollama endpoints 2026-05-08 01:11:35 +08:00
CJACK.
9062330104 Merge pull request #446 from dinhnn/main
feat: add Ollama API endpoints /api/version, /api/tags, /api/show for Copilot integration
2026-05-08 00:54:16 +08:00
Dinh Nguyen
d0d61a5d77 Update ollama api test 2026-05-07 14:23:12 +07:00
dinhnn
ffef451f7a Fix test typing bug 2026-05-07 13:48:03 +07:00
Dinh Nguyen
a68a79e087 Add ollama api for copilot support 2026-05-07 09:41:46 +07:00
CJACK.
c8db66615c Update VERSION 2026-05-06 13:04:16 +08:00
CJACK.
79ae9c8970 Merge pull request #436 from waiwaic/main
fix: auto-detect Vercel for chat history path
2026-05-06 13:03:30 +08:00
waiwai
2378f0fbe7 fix: auto-detect Vercel for chat history path
On Vercel, /var/task is read-only at runtime. ChatHistoryPath() now
auto-detects Vercel via IsVercel() and defaults to /tmp/chat_history.json
when no explicit DS2API_CHAT_HISTORY_PATH is set. Manual env var still
works as explicit override.
2026-05-06 11:10:14 +08:00
118 changed files with 8437 additions and 996 deletions

View File

@@ -22,6 +22,13 @@ These rules apply to all agent-made changes in this repository.
- Keep changes additive and tightly scoped to the requested feature or bugfix.
- Do not mix unrelated refactors into feature PRs unless they are required to make the change pass gates.
## Protocol Adapter Boundary
- Do not let OpenAI Chat, OpenAI Responses, Claude, Gemini, or other interface protocol formatting own shared business behavior.
- Normalize protocol-specific request shapes into the project standard request/turn model first, run shared business logic in one place, then render back to the target protocol at the boundary.
- Business logic that must stay globally consistent includes empty-output retry, thinking/reasoning handling, tool-call detection and policy, usage accounting, current-input-file injection, history persistence, file/reference handling, and completion payload assembly.
- If a behavior must differ by protocol, keep the difference as an explicit adapter/rendering concern and document why it cannot live in the shared normalized path.
## Documentation Sync
- When business logic or user-visible behavior changes, update the corresponding documentation in the same change.

View File

@@ -18,6 +18,7 @@ Docs: [Overview](README.en.md) / [Architecture](docs/ARCHITECTURE.en.md) / [Depl
- [OpenAI-Compatible API](#openai-compatible-api)
- [Claude-Compatible API](#claude-compatible-api)
- [Gemini-Compatible API](#gemini-compatible-api)
- [Ollama API](#ollama-api)
- [Admin API](#admin-api)
- [Error Payloads](#error-payloads)
- [cURL Examples](#curl-examples)
@@ -31,7 +32,7 @@ Docs: [Overview](README.en.md) / [Architecture](docs/ARCHITECTURE.en.md) / [Depl
| Base URL | `http://localhost:5001` or your deployment domain |
| Default Content-Type | `application/json` |
| Health probes | `GET /healthz`, `GET /readyz` |
| CORS | Enabled (uniformly covers `/v1/*`, `/anthropic/*`, `/v1beta/models/*`, and `/admin/*`; echoes the browser `Origin` when present, otherwise `*`; default allow-list includes `Content-Type`, `Authorization`, `X-API-Key`, `X-Ds2-Target-Account`, `X-Ds2-Source`, `X-Vercel-Protection-Bypass`, `X-Goog-Api-Key`, `Anthropic-Version`, `Anthropic-Beta`, and also accepts third-party preflight-requested headers such as `x-stainless-*`; `/v1/chat/completions` on Vercel Node Runtime matches the same behavior; internal-only `X-Ds2-Internal-Token` remains blocked) |
| CORS | Enabled (uniformly covers `/v1/*`, `/anthropic/*`, `/v1beta/models/*`, `/api/*`, and `/admin/*`; echoes the browser `Origin` when present, otherwise `*`; default allow-list includes `Content-Type`, `Authorization`, `X-API-Key`, `X-Ds2-Target-Account`, `X-Ds2-Source`, `X-Vercel-Protection-Bypass`, `X-Goog-Api-Key`, `Anthropic-Version`, `Anthropic-Beta`, and also accepts third-party preflight-requested headers such as `x-stainless-*`; `/v1/chat/completions` on Vercel Node Runtime matches the same behavior; internal-only `X-Ds2-Internal-Token` remains blocked) |
- All JSON request bodies must be valid UTF-8; malformed byte sequences are rejected on ingress with `400 invalid json`.
@@ -39,8 +40,10 @@ Docs: [Overview](README.en.md) / [Architecture](docs/ARCHITECTURE.en.md) / [Depl
- OpenAI / Claude / Gemini protocols are now mounted on one shared `chi` router tree assembled in `internal/server/router.go`.
- Adapter responsibilities are streamlined to: **request normalization → DeepSeek invocation → protocol-shaped rendering**, reducing legacy split-logic paths.
- Tool-calling semantics are aligned between Go and Node runtime: models should output the DSML shell `<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`; DS2API also accepts legacy canonical XML `<tool_calls>``<invoke name="...">``<parameter name="...">`. DSML is normalized back to XML at the parser entry, so internal parsing remains XML-based, with stream-time anti-leak filtering.
- Tool-calling semantics are aligned between Go and Node runtime: models should output the halfwidth-pipe DSML shell `<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`; DS2API also accepts DSML wrapper aliases such as `<dsml|tool_calls>` and `<|tool_calls>`, common DSML separator drift such as `<|DSML tool_calls>`, collapsed DSML local names such as `<DSMLtool_calls>`, control-separator drift such as `<DSML␂tool_calls>` / raw STX `\x02`, CJK angle bracket, fullwidth-bang / ideographic-comma separator drift, PascalCase local-name drift, and trailing attribute separator drift such as `<DSM|parameter name="command"|>...〈/DSM|parameter〉`, `<DSMLinvoke name=“Bash”>`, `<、DSML、tool_calls>`, `<DSmartToolCalls>`, or `<DSMLtool_calls※>`, arbitrary protocol prefixes such as `<proto💥tool_calls>`, and legacy canonical XML `<tool_calls>``<invoke name="...">``<parameter name="...">`. The scanner normalizes fixed local names (`tool_calls` / `invoke` / `parameter`) with non-structural separators before or after them back to XML before parsing, and also tolerates CDATA opener drift such as `<[CDATA[` / `<、[CDATA[`; only wrapped tool blocks or the narrow missing-opening-wrapper repair path enter the tool path, while bare `<invoke>` does not count as supported syntax. JSON literal parameter bodies are preserved as structured values, explicit empty or whitespace-only parameters are preserved as empty strings, malformed complete wrappers are released as plain text, and loose CDATA is narrowly repaired at final parse/flush when it can preserve a complete outer tool call.
- `Admin API` separates static config from runtime policy: `/admin/config*` for configuration state, `/admin/settings*` for runtime behavior.
- When upstream returns a thinking-only response with no visible text, the Go main path and the Vercel Node streaming path retry once in the same DeepSeek session: it appends the prompt suffix `"Previous reply had no visible output. Please regenerate the visible final answer or tool call now."` and sets `parent_message_id`. If that same-account retry would still end as `429 upstream_empty_output`, managed-account mode switches to the next available account, creates a fresh session, and retries the original payload once before returning 429.
- Citation/reference marker boundary: streaming output hides upstream `[citation:N]` / `[reference:N]` placeholders by default; non-stream output converts DeepSeek search reference markers into Markdown links.
---
@@ -83,7 +86,7 @@ Two header formats accepted:
- Token is in `config.keys` → **Managed account mode**: DS2API auto-selects an account via rotation
- Token is not in `config.keys` → **Direct token mode**: treated as a DeepSeek token directly
**Optional header**: `X-Ds2-Target-Account: <email_or_mobile>` — Pin a specific managed account; if the target account does not exist or the managed-account queue is exhausted, the request returns `429`, and current responses do not include `Retry-After`. If the account exists but login/refresh fails, the request returns the underlying `401` or upstream error.
**Optional header**: `X-Ds2-Target-Account: <email_or_mobile>` — Pin a specific managed account; if the target account does not exist or the managed-account queue is exhausted, the request returns `429`, and current responses do not include `Retry-After`. If the account exists but login/refresh fails, the request returns the underlying `401` or upstream error. Without a pinned target, managed-account completion requests try one alternate-account fresh retry before returning an empty-output 429; pinned-target requests and requests with no other available account do not switch.
Gemini-compatible clients can also send `x-goog-api-key`, `?key=`, or `?api_key=` as the caller credential source.
### Admin Endpoints (`/admin/*`)
@@ -123,6 +126,9 @@ Gemini-compatible clients can also send `x-goog-api-key`, `?key=`, or `?api_key=
| POST | `/v1beta/models/{model}:streamGenerateContent` | Business | Gemini stream |
| POST | `/v1/models/{model}:generateContent` | Business | Gemini non-stream compat path |
| POST | `/v1/models/{model}:streamGenerateContent` | Business | Gemini stream compat path |
| GET | `/api/version` | None | Ollama version endpoint |
| GET | `/api/tags` | None | Ollama model list |
| POST | `/api/show` | None | Ollama model capability query (returns `id` + `capabilities`) |
| POST | `/admin/login` | None | Admin login |
| GET | `/admin/verify` | JWT | Verify admin JWT |
| GET | `/admin/vercel/config` | Admin | Read preconfigured Vercel creds |
@@ -222,16 +228,18 @@ For `chat` / `responses` / `embeddings`, DS2API follows a wide-input/strict-outp
1. Match DeepSeek native model IDs first.
2. Then match exact keys in `model_aliases`.
3. If still unmatched, fall back by known family heuristics (`o*`, `gpt-*`, `claude-*`, etc.).
4. If still unmatched, return `invalid_request_error`.
3. If the request name ends with `-nothinking`, resolve the base alias and append the corresponding no-thinking variant.
4. If still unmatched, return `invalid_request_error`. Unknown model families are not guessed heuristically; add explicit compatibility names through `model_aliases`.
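The resolution order above can be sketched as a small resolver. The tables and the `-nothinking` rendering below are illustrative stand-ins; real IDs live in `internal/config/models.go` and `config.model_aliases`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Illustrative tables only.
var nativeModels = map[string]bool{"deepseek-v4-flash": true}
var modelAliases = map[string]string{"gpt-5.5": "deepseek-v4-flash"}

// resolveModel sketches the documented order: native IDs first, then exact
// alias keys, then the -nothinking suffix rule, else invalid_request_error.
// Unknown model families are not guessed heuristically.
func resolveModel(name string) (string, error) {
	if nativeModels[name] {
		return name, nil
	}
	if id, ok := modelAliases[name]; ok {
		return id, nil
	}
	if base, ok := strings.CutSuffix(name, "-nothinking"); ok {
		if id, err := resolveModel(base); err == nil {
			// The real mapping selects the forced no-thinking DeepSeek
			// variant; appending the suffix here is illustrative.
			return id + "-nothinking", nil
		}
	}
	return "", errors.New("invalid_request_error")
}

func main() {
	fmt.Println(resolveModel("gpt-5.5"))
	fmt.Println(resolveModel("gpt-5.5-nothinking"))
	fmt.Println(resolveModel("mystery-model"))
}
```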
Built-in aliases come from `internal/config/models.go`; `config.model_aliases` can override or add mappings at runtime. Excerpt:
- OpenAI / Codex: `gpt-4o`, `gpt-4.1`, `gpt-5`, `gpt-5.5`, `gpt-5-codex`, `gpt-5.3-codex`, `codex-mini-latest`
- OpenAI reasoning: `o1`, `o3`, `o3-deep-research`, `o4-mini`
- Claude: `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`, `claude-3-5-sonnet-latest`
- Gemini: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-pro-vision`
- Other compatibility families: `llama-*`, `qwen-*`, `mistral-*`, and `command-*` fall back through family heuristics
- Gemini: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-3.1-pro`, `gemini-3-pro`, `gemini-3-flash`, `gemini-3.1-flash-lite`, `gemini-pro-vision`
- Other exact built-in aliases: `llama-3.1-70b-instruct`, `qwen-max`
Aliases with a `-nothinking` suffix also map to the corresponding forced no-thinking DeepSeek model.
Current vision support resolves only to `deepseek-v4-vision` and does not expose a separate `vision-search` variant.
@@ -239,6 +247,8 @@ Retired historical families such as `claude-1.*`, `claude-2.*`, `claude-instant-
### `POST /v1/chat/completions`
> Path note: besides the canonical `/v1/chat/completions`, DS2API also accepts the root shortcut `/chat/completions`. On Vercel Runtime, `vercel.json` rewrites only the canonical `/v1/chat/completions` path to the Node streaming bridge; the root shortcut stays on the Go primary path. Use `/v1/chat/completions` on Vercel when real-time streaming is required.
**Headers**:
```http
@@ -250,7 +260,7 @@ Content-Type: application/json
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `model` | string | ✅ | DeepSeek native models + common aliases (`gpt-5.5`, `gpt-5.4-mini`, `gpt-5.3-codex`, `o3`, `claude-opus-4-6`, `gemini-2.5-pro`, `gemini-2.5-flash`, etc.) |
| `model` | string | ✅ | DeepSeek native models + common aliases (`gpt-5.5`, `gpt-5.4-mini`, `gpt-5.3-codex`, `o3`, `claude-opus-4-6`, `gemini-2.5-pro`, `gemini-3.1-pro`, `gemini-3-flash`, etc.); `-nothinking` suffixes force thinking / reasoning off |
| `messages` | array | ✅ | OpenAI-style messages |
| `stream` | boolean | ❌ | Default `false` |
| `tools` | array | ❌ | Function calling schema |
@@ -345,7 +355,8 @@ When `tools` is present, DS2API performs anti-leak handling:
Additional notes:
- The parser treats DSML shell tool blocks (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`) and legacy canonical XML tool blocks (`<tool_calls>` / `<invoke name="...">` / `<parameter name="...">`) as executable tool calls. DSML is normalized back to XML at the parser entry; internal parsing remains XML-based. Legacy `<tools>`, `<tool_call>`, `<tool_name>`, `<param>`, `<function_call>`, `tool_use`, antml variants, and standalone JSON `tool_calls` payloads are treated as plain text.
- The parser treats the recommended halfwidth-pipe DSML shell tool blocks (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`), DSML wrapper aliases (`<dsml|tool_calls>`, `<|tool_calls>`), common DSML separator drift (`<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`), collapsed DSML local names (`<DSMLtool_calls>` / `<DSMLinvoke>` / `<DSMLparameter>`), control-separator drift (`<DSML␂tool_calls>` / raw STX `\x02`), CJK angle bracket, fullwidth-bang / ideographic-comma separator drift, PascalCase local-name drift, and trailing attribute separator drift (`<DSM|parameter name="command"|>...〈/DSM|parameter〉` / `<DSMLinvoke name=“Bash”>` / `<、DSML、tool_calls>` / `<DSmartToolCalls>` / `<DSMLtool_calls※>`), arbitrary protocol prefixes (`<proto💥tool_calls>`), and legacy canonical XML tool blocks (`<tool_calls>` / `<invoke name="...">` / `<parameter name="...">`) as executable tool calls. These shells normalize non-structural separators back to XML first, while internal parsing remains XML-based; CDATA opener drift such as `<[CDATA[` / `<、[CDATA[` is also normalized for parameter bodies. Legacy `<tools>`, `<tool_call>`, `<tool_name>`, `<param>`, `<function_call>`, `tool_use`, antml variants, and standalone JSON `tool_calls` payloads are treated as plain text; complete but malformed wrappers are also released as plain text.
- The parser no longer drops tool calls solely because parameter values are empty; explicit empty strings or whitespace-only parameters become empty strings in structured `tool_calls`. Prompting still tells the model not to emit blank parameters, and missing/empty argument rejection belongs in the tool executor or client schema validation.
- If the final visible response text is empty but the reasoning stream contains an executable tool call, Chat / Responses emits a standard OpenAI `tool_calls` / `function_call` output during finalization. If thinking/reasoning was not enabled by the client, that reasoning text is used only for detection and is not exposed as visible text or `reasoning_content`.
- `tool_calls` shown inside fenced markdown code blocks (for example, ```json ... ```) are treated as examples, not executable calls.
@@ -617,6 +628,20 @@ Returns SSE (`text/event-stream`), each chunk as `data: <json>`:
---
## Ollama API
- `POST /api/show` request body: `{"model":"<model-id>"}`.
- Response uses lowercase `id` (not `ID`) and includes `capabilities` for Ollama-style clients and strict schemas.
Example response:
```json
{
"id": "deepseek-v4-flash",
"capabilities": ["tools", "thinking"]
}
```
## Admin API
### `POST /admin/login`
@@ -744,6 +769,7 @@ Reads runtime settings and status, including:
- `responses` / `embeddings`
- `auto_delete` (`mode`: `none` / `single` / `all`; legacy `sessions=true` is still treated as `all`)
- `current_input_file` (`enabled` defaults to `true`, plus `min_chars`)
- `thinking_injection` (`enabled` defaults to `true`, `prompt`, and `default_prompt`)
- `model_aliases`
- `env_backed`, `needs_vercel_sync`
- `toolcall` policy is fixed to `feature_match + high` and is no longer returned or editable via settings
@@ -758,6 +784,7 @@ Hot-updates runtime settings. Supported fields:
- `embeddings.provider`
- `auto_delete.mode`
- `current_input_file.enabled` / `current_input_file.min_chars`
- `thinking_injection.enabled` / `thinking_injection.prompt`
- `model_aliases`
- `toolcall` policy is fixed and is no longer writable through settings
@@ -1238,7 +1265,7 @@ Clients should handle HTTP status code plus `error` / `detail` fields.
| Code | Meaning |
| --- | --- |
| `401` | Authentication failed (invalid key/token, or expired admin JWT) |
| `429` | Too many requests (exceeded inflight + queue capacity; current responses do not include `Retry-After`) |
| `429` | Too many requests (exceeded inflight + queue capacity, or upstream thinking-only output with no visible answer; managed-account mode first tries one alternate-account fresh retry; current responses do not include `Retry-After`) |
| `503` | Model unavailable or upstream error |
---

API.md
View File

@@ -18,6 +18,7 @@
- [OpenAI-Compatible API](#openai-兼容接口)
- [Claude-Compatible API](#claude-兼容接口)
- [Gemini-Compatible API](#gemini-兼容接口)
- [Ollama-Compatible API](#ollama-兼容接口)
- [Admin API](#admin-接口)
- [Error Payload Format](#错误响应格式)
- [cURL Examples](#curl-示例)
@@ -31,7 +32,7 @@
| Base URL | `http://localhost:5001` or your deployment domain |
| Default Content-Type | `application/json` |
| Health probes | `GET /healthz`, `GET /readyz` |
| CORS | Enabled (uniformly covers `/v1/*`, `/anthropic/*`, `/v1beta/models/*`, and `/admin/*`; echoes the browser `Origin` when present, otherwise `*`; default allow-list includes `Content-Type`, `Authorization`, `X-API-Key`, `X-Ds2-Target-Account`, `X-Ds2-Source`, `X-Vercel-Protection-Bypass`, `X-Goog-Api-Key`, `Anthropic-Version`, `Anthropic-Beta`, and also accepts third-party preflight-requested headers such as `x-stainless-*`; the Node Runtime for `/v1/chat/completions` on Vercel matches the same behavior; the internal-only header `X-Ds2-Internal-Token` remains blocked) |
| CORS | Enabled (uniformly covers `/v1/*`, `/anthropic/*`, `/v1beta/models/*`, `/api/*`, and `/admin/*`; echoes the browser `Origin` when present, otherwise `*`; default allow-list includes `Content-Type`, `Authorization`, `X-API-Key`, `X-Ds2-Target-Account`, `X-Ds2-Source`, `X-Vercel-Protection-Bypass`, `X-Goog-Api-Key`, `Anthropic-Version`, `Anthropic-Beta`, and also accepts third-party preflight-requested headers such as `x-stainless-*`; the Node Runtime for `/v1/chat/completions` on Vercel matches the same behavior; the internal-only header `X-Ds2-Internal-Token` remains blocked) |
- All JSON request bodies must be valid UTF-8; malformed byte sequences are rejected on ingress with `400 invalid json`.
@@ -39,9 +40,9 @@
- OpenAI / Claude / Gemini 三套协议已统一挂在同一 `chi` 路由树上,由 `internal/server/router.go` 负责装配。
- 适配器层职责收敛为:**请求归一化 → DeepSeek 调用 → 协议形态渲染**,减少历史版本中“同能力多处实现”的分叉。
- Tool Calling 的解析策略在 Go 与 Node Runtime 间保持一致:推荐模型输出 DSML 外壳 `<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`;兼容层也接受 DSML wrapper 别名 `<dsml|tool_calls>``<|tool_calls>``<tool_calls>`常见 DSML 分隔符漏写形态(如 `<|DSML tool_calls>`)、`DSML` 与工具标签名黏连的常见 typo`<DSMLtool_calls>`),以及旧式 canonical XML `<tool_calls>``<invoke name="...">``<parameter name="...">`。实现上采用窄容错结构扫描:只有 `tool_calls` wrapper 或可修复的缺失 opening wrapper 会进入工具路径,裸 `<invoke>` 不计为已支持语法;流式场景继续执行防泄漏筛分。若参数体本身是合法 JSON 字面量(如 `123``true``null`、数组或对象),会按结构化值输出,不再一律当作字符串;若 CDATA 偶发漏闭合,则会在最终 parse / flush 恢复阶段做窄修复,尽量保住已完整包裹的外层工具调用。
- Tool Calling 的解析策略在 Go 与 Node Runtime 间保持一致:推荐模型输出半角管道符 DSML 外壳 `<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`;兼容层也接受 DSML wrapper 别名 `<dsml|tool_calls>``<|tool_calls>`、常见 DSML 分隔符漏写形态(如 `<|DSML tool_calls>`)、`DSML` 与工具标签名黏连的常见 typo`<DSMLtool_calls>`、控制分隔符漂移(如 `<DSML␂tool_calls>` / 原始 STX `\x02`、CJK 尖括号、全角感叹号、顿号、PascalCase 本地名、弯引号属性值与属性尾部分隔符漂移(如 `<DSM|parameter name="command"|>...〈/DSM|parameter〉` / `<DSMLinvoke name=“Bash”>` / `<、DSML、tool_calls>` / `<DSmartToolCalls>` / `<DSMLtool_calls※>`)、任意协议前缀壳(如 `<proto💥tool_calls>`,以及旧式 canonical XML `<tool_calls>``<invoke name="...">``<parameter name="...">`。实现上采用结构扫描:只要固定本地标签名是 `tool_calls` / `invoke` / `parameter`标签名前或标签名后的非结构性分隔符会在解析入口归一化CDATA 开头也会容错 `<[CDATA[` / `<、[CDATA[` 这类分隔符漂移;只`tool_calls` wrapper 或可修复的缺失 opening wrapper 会进入工具路径,裸 `<invoke>` 不计为已支持语法;流式场景继续执行防泄漏筛分。若参数体本身是合法 JSON 字面量(如 `123``true``null`、数组或对象),会按结构化值输出,不再一律当作字符串;显式空字符串和纯空白参数会结构化保留为空字符串,是否拒绝缺参由工具执行侧决定;完整但 malformed 的 wrapper 会作为普通文本释放,不会吞掉或伪造成工具调用;若 CDATA 偶发漏闭合,则会在最终 parse / flush 恢复阶段做窄修复,尽量保住已完整包裹的外层工具调用。
- The `Admin API` separates configuration from runtime policy: `/admin/config*` manages static configuration, `/admin/settings*` manages runtime behavior.
- When the upstream returns a thinking-only response (the model produced a reasoning chain but no visible text), non-stream completions automatically retry once: a multi-turn follow-up appends the prompt suffix `"Previous reply had no visible output. Please regenerate the visible final answer or tool call now."` and sets `parent_message_id` so the model regenerates within the same DeepSeek session; at most 1 retry.
- When the upstream returns a thinking-only response (the model produced a reasoning chain but no visible text), both the Go main path and the Vercel Node streaming path first retry once automatically: a multi-turn follow-up appends the prompt suffix `"Previous reply had no visible output. Please regenerate the visible final answer or tool call now."` and sets `parent_message_id` so the model regenerates within the same DeepSeek session; at most 1 same-account retry. If the request would still return `429 upstream_empty_output` after the same-account retry, managed-account mode automatically switches to the next available account before returning 429, creates a fresh session, and retries once more with the original payload.
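The retry ladder above can be sketched as follows. This is a hypothetical outline (the names `complete_with_retries`, `run_completion`, and the account list shape are made up for illustration; the real logic lives in `internal/completionruntime`):

```python
RETRY_SUFFIX = ("Previous reply had no visible output. "
                "Please regenerate the visible final answer or tool call now.")

def complete_with_retries(accounts, run_completion):
    """run_completion(account, follow_up) returns visible text, or "" when the
    upstream produced a thinking-only (no visible output) response."""
    primary = accounts[0]
    out = run_completion(primary, follow_up=None)
    if out:
        return out
    # 1) Same-account compensation retry: follow-up turn in the same session.
    out = run_completion(primary, follow_up=RETRY_SUFFIX)
    if out:
        return out
    # 2) Managed-account mode: one fresh retry on the next available account
    #    (new session, original payload) before surfacing the final 429.
    for alt in accounts[1:2]:
        out = run_completion(alt, follow_up=None)
        if out:
            return out
    raise RuntimeError("429 upstream_empty_output")
```

Note the account switch happens at most once, and only when another account is available.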
- Citation-marker handling boundary: streaming output hides upstream-internal placeholders such as `[citation:N]` / `[reference:N]` by default; non-stream output converts DeepSeek search citation markers into Markdown reference links by default.
---
@@ -85,7 +86,7 @@ Vercel 一键部署可先只填 `DS2API_ADMIN_KEY`,部署后在 `/admin` 导
- If the token is in `config.keys` → **managed-account mode**, accounts are selected by automatic rotation
- If the token is not in `config.keys` → **direct-token mode**, it is used as a DeepSeek token directly
**Optional request header** `X-Ds2-Target-Account: <email_or_mobile>` — pins a specific managed account; if the target account does not exist, or the managed-account queue is exhausted, the business request returns `429`, currently without a `Retry-After` header. If the account exists but login/refresh fails, the corresponding `401` or upstream error is returned.
**Optional request header** `X-Ds2-Target-Account: <email_or_mobile>` — pins a specific managed account; if the target account does not exist, or the managed-account queue is exhausted, the business request returns `429`, currently without a `Retry-After` header. If the account exists but login/refresh fails, the corresponding `401` or upstream error is returned. When no target account is pinned, a managed-account-mode empty-output completion 429 first tries one fresh retry on another available account; when a target account is pinned, or no other account is available, no account switch happens.
Gemini-compatible clients can also use `x-goog-api-key`, `?key=`, or `?api_key=` as the credential source.
### Admin Endpoints (`/admin/*`)
@@ -125,6 +126,9 @@ Gemini 兼容客户端还可以使用 `x-goog-api-key`、`?key=` 或 `?api_key=`
| POST | `/v1beta/models/{model}:streamGenerateContent` | Business | Gemini streaming |
| POST | `/v1/models/{model}:generateContent` | Business | Gemini non-stream compatibility path |
| POST | `/v1/models/{model}:streamGenerateContent` | Business | Gemini streaming compatibility path |
| GET | `/api/version` | None | Ollama version endpoint |
| GET | `/api/tags` | None | Ollama model list |
| POST | `/api/show` | None | Ollama single-model capability query (returns `id` and `capabilities`) |
| POST | `/admin/login` | None | Admin login |
| GET | `/admin/verify` | JWT | Verify admin JWT |
| GET | `/admin/vercel/config` | Admin | Read Vercel pre-config |
@@ -168,12 +172,12 @@ Gemini 兼容客户端还可以使用 `x-goog-api-key`、`?key=` 或 `?api_key=`
| GET | `/admin/chat-history/{id}` | Admin | View a single server-side conversation record |
| DELETE | `/admin/chat-history/{id}` | Admin | Delete a single server-side conversation record |
| PUT | `/admin/chat-history/settings` | Admin | Update the conversation-record retention count |
Server-side records are essentially archives of DeepSeek upstream responses: generation endpoints that talk to DeepSeek directly (OpenAI Chat, OpenAI Responses, Claude Messages, Gemini GenerateContent, etc.) write to the record list after receiving the upstream response and before each protocol's back-translation/trimming; the list is shown in descending order of request creation time, and streaming requests keep refreshing status and detail while generating. Requests issued from the WebUI "API test" page also enter these records.
| GET | `/admin/version` | Admin | Query the current version and the latest Release |
OpenAI `/v1/*` remains the canonical path. For clients that only configure the DS2API root URL, the same OpenAI handlers are also exposed via root shortcut routes: `/models`, `/models/{id}`, `/chat/completions`, `/responses`, `/responses/{response_id}`, `/embeddings`, `/files`, `/files/{file_id}`.
---
## Health Check
@@ -227,16 +231,15 @@ OpenAI `/v1/*` 仍是规范路径。对于只配置 DS2API 根地址的客户端
1. Match DeepSeek native models first.
2. Then match exact `model_aliases` mappings.
3. If the requested name ends with `-nothinking`, the corresponding no-thinking variant is applied on top of the finally resolved canonical model.
4. On a miss, fall back by model-family rules (e.g. `o*`, `gpt-*`, `claude-*`).
5. If still unmatched, return `invalid_request_error`.
4. On a miss, return `invalid_request_error`. There is no longer a heuristic fallback for unknown model families; configure new compatibility names explicitly via `model_aliases`.
The current built-in default aliases come from `internal/config/models.go`; `config.model_aliases` overrides or supplements same-name mappings at runtime. Excerpt:
- OpenAI / Codex: `gpt-4o`, `gpt-4.1`, `gpt-5`, `gpt-5.5`, `gpt-5-codex`, `gpt-5.3-codex`, `codex-mini-latest`
- OpenAI reasoning: `o1`, `o3`, `o3-deep-research`, `o4-mini`
- Claude: `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`, `claude-3-5-sonnet-latest`
- Gemini: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-pro-vision`
- Other compatible families: `llama-*`, `qwen-*`, `mistral-*`, `command-*` fall back by family heuristics
- Gemini: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-3.1-pro`, `gemini-3-pro`, `gemini-3-flash`, `gemini-3.1-flash-lite`, `gemini-pro-vision`
- Other built-in exact aliases: `llama-3.1-70b-instruct`, `qwen-max`
Appending the `-nothinking` suffix to any of the above aliases also maps to the corresponding forced no-thinking version.
Vision capability currently corresponds only to `deepseek-v4-vision` / `deepseek-v4-vision-nothinking`; no separate `vision-search` variant is resolved.
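The resolution order above amounts to a small exact lookup with suffix handling. A minimal sketch, using a made-up native set and alias table (the real table lives in `internal/config/models.go`):

```python
NATIVE = {"deepseek-v4", "deepseek-v4-vision"}               # illustrative native set
ALIASES = {"gpt-5.5": "deepseek-v4", "o3": "deepseek-v4"}    # illustrative aliases

def resolve_model(name: str) -> str:
    # Step 3: peel the -nothinking suffix and re-apply it after resolution.
    nothinking = name.endswith("-nothinking")
    base = name[: -len("-nothinking")] if nothinking else name
    if base in NATIVE:            # Step 1: native models first
        resolved = base
    elif base in ALIASES:         # Step 2: exact alias mappings
        resolved = ALIASES[base]
    else:                         # Step 4: no family heuristics any more
        raise ValueError("invalid_request_error: unknown model %r" % name)
    return resolved + ("-nothinking" if nothinking else "")
```

Unknown names fail fast instead of being guessed, which keeps alias behavior explicit and configurable.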
@@ -245,6 +248,8 @@ OpenAI `/v1/*` 仍是规范路径。对于只配置 DS2API 根地址的客户端
### `POST /v1/chat/completions`
> Path note: besides the canonical path `/v1/chat/completions`, the root shortcut alias `/chat/completions` is also supported. On the Vercel Runtime, `vercel.json` rewrites only the canonical `/v1/chat/completions` path to the Node streaming bridge; the root shortcut alias stays on the Go main path. Use `/v1/chat/completions` on Vercel when real-time streaming is needed.
**Request headers**
```http
@@ -256,7 +261,7 @@ Content-Type: application/json
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | ✅ | Supports DeepSeek native models + common aliases (`gpt-5.5`, `gpt-5.4-mini`, `gpt-5.3-codex`, `o3`, `claude-opus-4-6`, `claude-sonnet-4-6`, `gemini-2.5-pro`, `gemini-2.5-flash`, etc.); a `-nothinking` suffix in the model name forces thinking / reasoning off |
| `model` | string | ✅ | Supports DeepSeek native models + common aliases (`gpt-5.5`, `gpt-5.4-mini`, `gpt-5.3-codex`, `o3`, `claude-opus-4-6`, `claude-sonnet-4-6`, `gemini-2.5-pro`, `gemini-3.1-pro`, `gemini-3-flash`, etc.); a `-nothinking` suffix in the model name forces thinking / reasoning off |
| `messages` | array | ✅ | OpenAI-style message array |
| `stream` | boolean | ❌ | Defaults to `false` |
| `tools` | array | ❌ | Function Calling definitions |
@@ -352,9 +357,10 @@ data: [DONE]
Additional notes:
- In **non-code-block context**, tool payloads are recognized by their features and produce executable tool calls even when mixed with ordinary text; surrounding plain text still passes through.
- The parser currently treats the DSML shell (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`), DSML wrapper aliases (`<dsml|tool_calls>`, `<|tool_calls>`, `<tool_calls>`), common DSML separator omissions (such as `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`), common glued `DSML` typos (`<DSMLtool_calls>` / `<DSMLinvoke>` / `<DSMLparameter>`), and legacy canonical XML tool blocks (`<tool_calls>` / `<invoke name="...">` / `<parameter name="...">`) as executable calls; DSML is first normalized back to XML, and internal parsing keeps XML semantics. Legacy `<tools>`, `<tool_call>`, `<tool_name>`, `<param>`, `<function_call>`, `tool_use`, antml-style variants, and standalone JSON `tool_calls` fragments are all treated as plain text by default.
- The parser currently treats the recommended halfwidth-pipe DSML shell (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`), DSML wrapper aliases (`<dsml|tool_calls>`, `<|tool_calls>`), common DSML separator omissions (such as `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`), common glued `DSML` typos (`<DSMLtool_calls>` / `<DSMLinvoke>` / `<DSMLparameter>`), control-separator drift (such as `<DSML␂tool_calls>` / raw STX `\x02`), CJK angle brackets, fullwidth exclamation marks, ideographic commas, PascalCase local names, curly-quote attribute values and attribute-trailing separator drift (such as `<DSM|parameter name="command"|>...〈/DSM|parameter〉` / `<DSMLinvoke name=“Bash”>` / `<、DSML、tool_calls>` / `<DSmartToolCalls>` / `<DSMLtool_calls※>`), arbitrary protocol-prefix shells (such as `<proto💥tool_calls>`), and legacy canonical XML tool blocks (`<tool_calls>` / `<invoke name="...">` / `<parameter name="...">`) as executable calls; these non-structural separator shells are first normalized back to XML, and internal parsing keeps XML semantics; CDATA openers also tolerate `<[CDATA[` / `<、[CDATA[`. Legacy `<tools>`, `<tool_call>`, `<tool_name>`, `<param>`, `<function_call>`, `tool_use`, antml-style variants, and standalone JSON `tool_calls` fragments are all treated as plain text by default; complete but malformed wrappers are likewise released as plain text.
- The parsing layer does not drop a tool call because a parameter value is empty; explicit empty strings or whitespace-only parameters enter structured `tool_calls` as empty strings. Prompting asks the model not to emit blank parameters; rejecting missing/empty arguments is the job of the tool executor or client-side schema validation.
- When the final visible body is empty but the reasoning chain contains an executable tool call, Chat / Responses emit standard OpenAI `tool_calls` / `function_call` output at finalization; if the client did not enable thinking / reasoning, that reasoning chain is used only for detection and is never exposed as visible text or `reasoning_content`.
- `tool_calls` inside Markdown fenced code blocks (e.g. ```json ... ```) are treated as example text only and are never executed.
- `tool_calls` inside Markdown fenced code blocks (e.g. ```json ... ```) and inline code spans (e.g. `` `<tool_calls>...</tool_calls>` ``) are treated as example text only and are never executed.
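The code-context exclusion above can be sketched by stripping Markdown code regions before detection. This is an illustrative simplification (function names are made up; the real sieve is stream-aware and handles unclosed backticks, which this sketch does not):

```python
import re

def strip_markdown_code(text: str) -> str:
    """Remove fenced blocks and inline code spans so tool-call detection
    only sees non-code context (examples inside code stay inert)."""
    text = re.sub(r"```.*?```", "", text, flags=re.S)  # fenced code blocks
    text = re.sub(r"`[^`\n]*`", "", text)              # inline code spans
    return text

def has_executable_tool_call(text: str) -> bool:
    # Only tool markup surviving outside code regions counts as executable.
    return "<tool_calls>" in strip_markdown_code(text)
```

A ```json fence or a backtick span around the markup is enough to keep it as example text.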
---
@@ -628,6 +634,20 @@ data: {"type":"message_stop"}
---
## Ollama-Compatible Endpoints
- `POST /api/show` request body: `{"model":"<model-id>"}`
- Response fields use lowercase `id` (not `ID`) and include a `capabilities` array, aligning with Ollama-style clients and strict schemas.
Example response:
```json
{
"id": "deepseek-v4-flash",
"capabilities": ["tools", "thinking"]
}
```
## Admin Endpoints
### `POST /admin/login`
@@ -755,6 +775,7 @@ data: {"type":"message_stop"}
- `responses` / `embeddings`
- `auto_delete` (`mode`: `none` / `single` / `all`; legacy `sessions=true` is still treated as `all`)
- `current_input_file` (`enabled` defaults to `true`; `min_chars`)
- `thinking_injection` (`enabled` defaults to `true`; `prompt`, `default_prompt`)
- `model_aliases`
- `env_backed`, `needs_vercel_sync`
- The `toolcall` policy is fixed at `feature_match + high` and is no longer returned or modified via settings
@@ -769,6 +790,7 @@ data: {"type":"message_stop"}
- `embeddings.provider`
- `auto_delete.mode`
- `current_input_file.enabled` / `current_input_file.min_chars`
- `thinking_injection.enabled` / `thinking_injection.prompt`
- `model_aliases`
- The `toolcall` policy is fixed and is no longer a writable field
@@ -1251,7 +1273,7 @@ Gemini 路由使用 Google 风格错误结构:
| Status | Description |
| --- | --- |
| `401` | Authentication failure (invalid key/token, or expired Admin JWT) |
| `429` | Too many requests (over the concurrency limit + waiting queue; no `Retry-After` header currently) |
| `429` | Too many requests (over the concurrency limit + waiting queue, or still no visible output after an upstream thinking-only response; managed-account mode first tries one account-switch fresh retry; no `Retry-After` header currently) |
| `503` | Model unavailable or upstream failure |
---

View File

@@ -134,7 +134,8 @@ flowchart LR
| OpenAI compatible | `GET /v1/models`, `GET /v1/models/{id}`, `POST /v1/chat/completions`, `POST /v1/responses`, `GET /v1/responses/{response_id}`, `POST /v1/embeddings`, `POST /v1/files`, `GET /v1/files/{file_id}` |
| Claude compatible | `GET /anthropic/v1/models`, `POST /anthropic/v1/messages`, `POST /anthropic/v1/messages/count_tokens` (plus shortcut paths `/v1/messages`, `/messages`) |
| Gemini compatible | `POST /v1beta/models/{model}:generateContent`, `POST /v1beta/models/{model}:streamGenerateContent` (plus `/v1/models/{model}:*` paths) |
| Unified CORS compatibility | `/v1/*`, `/anthropic/*`, `/v1beta/models/*`, and `/admin/*` share one CORS policy; on Vercel, the Node Runtime for `/v1/chat/completions` mirrors the same allow rules, minimizing third-party preflight header restrictions |
| Ollama compatible | `GET /api/version`, `GET /api/tags`, `POST /api/show` |
| Unified CORS compatibility | `/v1/*`, `/anthropic/*`, `/v1beta/models/*`, `/api/*`, and `/admin/*` share one CORS policy; on Vercel, the Node Runtime for `/v1/chat/completions` mirrors the same allow rules, minimizing third-party preflight header restrictions |
| Multi-account rotation | Auto token refresh, email/mobile dual login |
| Concurrency queue control | Per-account in-flight limit + waiting queue, dynamically computed recommended concurrency |
| DeepSeek PoW | Pure Go high-performance solver (DeepSeekHashV1), ms-level response |
@@ -195,11 +196,11 @@ OpenAI `/v1/*` 仍是推荐的规范路径;同时支持 `/models`、`/chat/com
- Point `ANTHROPIC_BASE_URL` directly at the DS2API root URL (e.g. `http://127.0.0.1:5001`); Claude Code requests `/v1/messages?beta=true`.
- `ANTHROPIC_API_KEY` must match an entry in `keys` from `config.json`; keeping both a regular key and an `sk-ant-*` style key improves compatibility with different clients' validation habits.
- If your system sets a proxy, configure `NO_PROXY=127.0.0.1,localhost,<your_host_ip>` for the DS2API address to keep local loopback requests from being intercepted by the proxy.
- If tool calls are rendered as text and not executed, first check whether the model output uses the recommended DSML tool block: `<|DSML|tool_calls><|DSML|invoke name="..."><|DSML|parameter name="...">...`. The compatibility layer also accepts legacy canonical XML: `<tool_calls><invoke name="..."><parameter name="...">...`; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use`, or standalone JSON `tool_calls` fragments are not executed.
- If tool calls are rendered as text and not executed, first check whether the model output uses the recommended halfwidth-pipe DSML tool block: `<|DSML|tool_calls><|DSML|invoke name="..."><|DSML|parameter name="...">...`. The compatibility layer also accepts legacy canonical XML: `<tool_calls><invoke name="..."><parameter name="...">...`; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use`, or standalone JSON `tool_calls` fragments are not executed and are treated as plain text.
### Gemini Endpoints
The Gemini adapter maps model names via `model_aliases` or built-in rules to DeepSeek native models, supports both `generateContent` and `streamGenerateContent` call patterns, and fully supports Tool Calling (`functionDeclarations` → `functionCall` output). If the Gemini model name has a `-nothinking` suffix, e.g. `gemini-2.5-pro-nothinking`, it maps to the corresponding forced no-thinking model.
The Gemini adapter maps model names via `model_aliases` or exact built-in aliases to DeepSeek native models (covering common names such as `gemini-2.5-*`, `gemini-3*`, and `gemini-pro-vision`), supports both `generateContent` and `streamGenerateContent` call patterns, and fully supports Tool Calling (`functionDeclarations` → `functionCall` output). If the Gemini model name has a `-nothinking` suffix, e.g. `gemini-2.5-pro-nothinking`, it maps to the corresponding forced no-thinking model.
## Quick Start
@@ -295,13 +296,13 @@ cp config.example.json config.json
base64 < config.json | tr -d '\n'
```
> **Streaming note**: `/v1/chat/completions` on Vercel defaults to `api/chat-stream.js` (Node Runtime) to guarantee real-time SSE. Auth, account selection, and session/PoW preparation are still done by the Go internal prepare endpoint; the streaming response (including `tools`) is assembled on Node with Go-aligned output assembly and anti-leak handling. Although only OpenAI chat streaming goes through Node here, the CORS allow policy stays aligned with the Go main router, uniformly covering third-party preflight scenarios.
> **Streaming note**: OpenAI Chat streaming on Vercel is handled by `api/chat-stream.js` (Node Runtime), but `vercel.json` rewrites only the canonical `/v1/chat/completions` path to Node; the root shortcut alias `/chat/completions` stays on the Go main path. Auth, account selection, and session/PoW preparation are still done by the Go internal prepare endpoint; the streaming response (including `tools`) is assembled on Node with Go-aligned output assembly and anti-leak handling. Use `/v1/chat/completions` on Vercel when real-time streaming is needed.
For detailed deployment instructions, see the [Deployment Guide](docs/DEPLOY.md).
### Option 4: Run from Source Locally
**Prerequisites**: Go 1.26+; Node.js `20.19+` or `22.12+` (only when building the WebUI); also make sure `npm` is available, npm 10+ recommended
**Prerequisites**: Go 1.26+; Node.js `20.19+` or `22.12+` (only when building the WebUI; CI / Docker builds use Node 24); also make sure `npm` is available, npm 10+ recommended
```bash
# 1. Clone the repository
@@ -320,7 +321,7 @@ go run ./cmd/ds2api
The server actually binds to `0.0.0.0:5001`, so devices on the same LAN can usually reach it via your private IP as well.
> **WebUI auto-build**: on first local startup, if `static/admin` is missing, DS2API automatically tries `npm ci` (only when dependencies are missing) and `npm run build -- --outDir static/admin --emptyOutDir` (requires local Node.js and npm). You can also build manually: `./scripts/build-webui.sh`
> **WebUI auto-build**: on first local startup, if the WebUI static directory is missing, DS2API automatically tries `npm ci --prefix webui` (only when dependencies are missing) and `npm run build --prefix webui -- --outDir static/admin --emptyOutDir` (requires local Node.js and npm; the static directory can be overridden with `DS2API_STATIC_ADMIN_DIR`). You can also build manually: `./scripts/build-webui.sh`
## Configuration
@@ -350,6 +351,7 @@ go run ./cmd/ds2api
Optional request header `X-Ds2-Target-Account`: pin a specific managed account (value is an email or mobile number).
If the pinned account does not exist, or the managed-account queue is currently full, the request returns `429`; the `429` currently carries no `Retry-After` header. If the account exists but login/refresh fails, the corresponding auth error is returned.
When no target account is pinned, if a completion would still end as `429 upstream_empty_output` after the same-account compensation retry for an upstream thinking-only empty output, managed-account mode automatically switches to the next available account, creates a fresh session, and retries the original payload once.
Gemini routes can also use `x-goog-api-key`, or `?key=` / `?api_key=` as caller credentials when no auth header is present.
## Concurrency Model
@@ -363,19 +365,21 @@ Gemini 路由还可以使用 `x-goog-api-key`,或在没有认证头时使用 `
- When in-flight slots are full, requests enter a waiting queue and are **not immediately 429'd**
- `429 Too Many Requests` is returned only after the total capacity is exceeded; current responses carry no `Retry-After`
- Empty-output completion 429s first get a same-account compensation retry; managed-account mode additionally switches to another available account for one fresh retry before the final 429
- `GET /admin/queue/status` returns real-time concurrency status
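The admission rules above can be sketched with two counters. A minimal sketch (class and method names are made up; the real implementation in `internal/account` is concurrent and per-account):

```python
class AccountQueue:
    """Fill in-flight slots first, then the waiting queue; only past
    total capacity does a request get a 429."""

    def __init__(self, max_inflight: int, max_queue: int):
        self.max_inflight, self.max_queue = max_inflight, max_queue
        self.inflight = self.queued = 0

    def admit(self) -> str:
        if self.inflight < self.max_inflight:
            self.inflight += 1
            return "run"       # executes immediately
        if self.queued < self.max_queue:
            self.queued += 1
            return "wait"      # parked in the waiting queue, not 429'd
        return "429"           # total capacity exceeded; no Retry-After header

    def release(self) -> None:
        if self.queued:        # promote a waiting request into the freed slot
            self.queued -= 1
        else:
            self.inflight -= 1
```

With `max_inflight=2` and `max_queue=1`, the fourth concurrent request is the first to see a 429.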
## Tool Call Adaptation
When a request carries `tools`, DS2API performs anti-leak handling and structured translation:
1. Executable toolcall recognition is enabled only in **non-code context** (code-block examples do not trigger it by default)
2. The parsing layer currently treats the DSML shell as the recommended executable call: `<|DSML|tool_calls>` → `<|DSML|invoke name="...">` → `<|DSML|parameter name="...">`; legacy canonical XML `<tool_calls>` → `<invoke name="...">` → `<parameter name="...">` is also supported. DSML is only a shell alias and internal parsing keeps XML semantics; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use` / antml variants, and standalone JSON `tool_calls` fragments are treated as plain text
1. Executable toolcall recognition is enabled only in **non-Markdown-code context** (examples in fenced code blocks and inline code spans do not trigger it by default)
2. The parsing layer currently treats the halfwidth-pipe DSML shell as the recommended executable call: `<|DSML|tool_calls>` → `<|DSML|invoke name="...">` → `<|DSML|parameter name="...">`; legacy canonical XML `<tool_calls>` → `<invoke name="...">` → `<parameter name="...">` is also supported, plus a range of DSML prefix/separator drift. DSML is only a shell alias and internal parsing keeps XML semantics; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use` / antml variants, and standalone JSON `tool_calls` fragments are treated as plain text, and complete but malformed wrappers are also released as plain text
3. `responses` streaming strictly uses official item lifecycle events (`response.output_item.*`, `response.content_part.*`, `response.function_call_arguments.*`)
4. `responses` supports and enforces `tool_choice` (`auto`/`none`/`required`/forced function); `required` violations return `422` for non-stream and `response.failed` for stream
5. Tool calls are returned in whatever protocol the client requested (OpenAI/Claude/Gemini native shapes); the model side is primarily constrained to emit canonical XML, which the compatibility layer then translates
> Note: the current parser layer prioritizes "parse successfully whenever possible"; every well-formed XML tool call passes, with no tool-name allow-list filtering.
> The parsing layer preserves explicit empty strings or whitespace-only parameters; prompting asks the model not to emit blank parameters, and rejecting missing/empty arguments belongs to the tool executor or client-side schema validation.
>
> To evaluate the approach of "wrapping tool calls as XML before feeding the model", see: `docs/toolcall-semantics.md`.

View File

@@ -131,7 +131,8 @@ For the full module-by-module architecture and directory responsibilities, see [
| OpenAI compatible | `GET /v1/models`, `GET /v1/models/{id}`, `POST /v1/chat/completions`, `POST /v1/responses`, `GET /v1/responses/{response_id}`, `POST /v1/embeddings`, `POST /v1/files`, `GET /v1/files/{file_id}` |
| Claude compatible | `GET /anthropic/v1/models`, `POST /anthropic/v1/messages`, `POST /anthropic/v1/messages/count_tokens` (plus shortcut paths `/v1/messages`, `/messages`) |
| Gemini compatible | `POST /v1beta/models/{model}:generateContent`, `POST /v1beta/models/{model}:streamGenerateContent` (plus `/v1/models/{model}:*` paths) |
| Unified CORS compatibility | `/v1/*`, `/anthropic/*`, `/v1beta/models/*`, and `/admin/*` share one CORS policy; on Vercel, the Node Runtime for `/v1/chat/completions` mirrors the same relaxed preflight behavior for third-party clients |
| Ollama compatible | `GET /api/version`, `GET /api/tags`, `POST /api/show` |
| Unified CORS compatibility | `/v1/*`, `/anthropic/*`, `/v1beta/models/*`, `/api/*`, and `/admin/*` share one CORS policy; on Vercel, the Node Runtime for `/v1/chat/completions` mirrors the same relaxed preflight behavior for third-party clients |
| Multi-account rotation | Auto token refresh, email/mobile dual login |
| Concurrency control | Per-account in-flight limit + waiting queue, dynamic recommended concurrency |
| DeepSeek PoW | Pure Go high-performance solver (DeepSeekHashV1), ms-level response |
@@ -184,11 +185,11 @@ Besides the primary aliases above, `/anthropic/v1/models` also returns Claude 4.
- Set `ANTHROPIC_BASE_URL` to the DS2API root URL (for example `http://127.0.0.1:5001`). Claude Code sends requests to `/v1/messages?beta=true`.
- `ANTHROPIC_API_KEY` must match an entry in `keys` from `config.json`. Keeping both a regular key and an `sk-ant-*` style key improves client compatibility.
- If your environment has proxy variables, set `NO_PROXY=127.0.0.1,localhost,<your_host_ip>` for DS2API to avoid proxy interception of local traffic.
- If tool calls are rendered as plain text and not executed, first verify the model output uses the recommended DSML block: `<|DSML|tool_calls><|DSML|invoke name="..."><|DSML|parameter name="...">...`. DS2API also accepts legacy canonical XML: `<tool_calls><invoke name="..."><parameter name="...">...`; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use`, or standalone JSON `tool_calls` are not executed.
- If tool calls are rendered as plain text and not executed, first verify the model output uses the recommended halfwidth-pipe DSML block: `<|DSML|tool_calls><|DSML|invoke name="..."><|DSML|parameter name="...">...`. DS2API also accepts legacy canonical XML: `<tool_calls><invoke name="..."><parameter name="...">...`; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use`, or standalone JSON `tool_calls` are not executed and stay plain text.
### Gemini Endpoint
The Gemini adapter maps model names to DeepSeek native models via `model_aliases` or built-in heuristics, supporting both `generateContent` and `streamGenerateContent` call patterns with full Tool Calling support (`functionDeclarations``functionCall` output).
The Gemini adapter maps model names to DeepSeek native models via `model_aliases` or exact built-in aliases (covering common `gemini-2.5-*`, `gemini-3*`, and `gemini-pro-vision` names), supporting both `generateContent` and `streamGenerateContent` call patterns with full Tool Calling support (`functionDeclarations``functionCall` output). If the Gemini model name has a `-nothinking` suffix, such as `gemini-2.5-pro-nothinking`, it maps to the corresponding forced no-thinking model.
## Quick Start
@@ -283,13 +284,13 @@ Recommended: convert `config.json` to Base64 locally, then paste into `DS2API_CO
base64 < config.json | tr -d '\n'
```
> **Streaming note**: `/v1/chat/completions` on Vercel is routed to `api/chat-stream.js` (Node Runtime) for real-time SSE. Auth, account selection, and session/PoW preparation are still handled by the Go internal prepare endpoint; streaming output (including `tools`) is assembled on Node with Go-aligned anti-leak handling. This is the only interface family currently routed through Node, and its CORS allow behavior is kept aligned with the Go router so third-party preflight handling stays unified.
> **Streaming note**: OpenAI Chat streaming on Vercel is routed to `api/chat-stream.js` (Node Runtime), but `vercel.json` rewrites only the canonical `/v1/chat/completions` path to Node; the root shortcut `/chat/completions` stays on the Go main path. Auth, account selection, and session/PoW preparation are still handled by the Go internal prepare endpoint; streaming output (including `tools`) is assembled on Node with Go-aligned anti-leak handling. Use `/v1/chat/completions` on Vercel when real-time streaming is required.
For detailed deployment instructions, see the [Deployment Guide](docs/DEPLOY.en.md).
### Option 4: Local Run
**Prerequisites**: Go 1.26+, Node.js `20.19+` or `22.12+` (only if building WebUI locally)
**Prerequisites**: Go 1.26+, Node.js `20.19+` or `22.12+` (only if building WebUI locally; CI / Docker builds use Node 24), and npm available; npm 10+ is recommended
```bash
# 1. Clone
@@ -308,7 +309,7 @@ Default local URL: `http://127.0.0.1:5001`
The server actually binds to `0.0.0.0:5001`, so devices on the same LAN can usually reach it through your private IP as well.
> **WebUI auto-build**: On first local startup, if `static/admin` is missing, DS2API will auto-run `npm ci` (only when dependencies are missing) and `npm run build -- --outDir static/admin --emptyOutDir` (requires Node.js). You can also build manually: `./scripts/build-webui.sh`
> **WebUI auto-build**: On first local startup, if the WebUI static directory is missing, DS2API auto-runs `npm ci --prefix webui` (only when dependencies are missing) and `npm run build --prefix webui -- --outDir static/admin --emptyOutDir` (requires Node.js; `DS2API_STATIC_ADMIN_DIR` can override the static directory). You can also build manually: `./scripts/build-webui.sh`
## Configuration
@@ -336,6 +337,7 @@ For business endpoints (`/v1/*`, `/anthropic/*`, Gemini routes), DS2API supports
| **Direct token** | If the token is not in `config.keys`, DS2API treats it as a DeepSeek token directly |
Optional header `X-Ds2-Target-Account`: Pin a specific managed account (value is email or mobile).
When no target account is pinned, if a completion would end as `429 upstream_empty_output` after the same-account empty-output retry, managed-account mode switches to the next available account, creates a fresh session, and retries the original payload once.
Gemini routes also accept `x-goog-api-key`, or `?key=` / `?api_key=` when no auth header is present.
## Concurrency Model
@@ -348,7 +350,8 @@ Queue limit = DS2API_ACCOUNT_MAX_QUEUE (default = recommended concurrency)
```
- When inflight slots are full, requests enter a waiting queue — **no immediate 429**
- 429 is returned only when total load exceeds inflight + queue capacity
- 429 is returned only when total load exceeds inflight + queue capacity; current responses do not include `Retry-After`
- Completion empty-output 429s first get the same-account compensation retry; managed-account mode also tries one alternate-account fresh retry before returning the final 429
- `GET /admin/queue/status` returns real-time concurrency state
## Tool Call Adaptation
@@ -356,12 +359,13 @@ Queue limit = DS2API_ACCOUNT_MAX_QUEUE (default = recommended concurrency)
When `tools` is present in the request, DS2API performs anti-leak handling:
1. Toolcall feature matching is enabled only in **non-code-block context** (fenced examples are ignored)
2. The parser now treats the DSML shell as the recommended executable tool-calling syntax: `<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`; it also accepts legacy canonical XML `<tool_calls>``<invoke name="...">``<parameter name="...">`. DSML is a shell alias and internal parsing remains XML-based; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use`, antml variants, and standalone JSON `tool_calls` payloads are treated as plain text
2. The parser treats the halfwidth-pipe DSML shell as the recommended executable tool-calling syntax: `<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`; it also accepts legacy canonical XML `<tool_calls>``<invoke name="...">``<parameter name="...">`, plus common DSML prefix/separator drift. DSML is a shell alias and internal parsing remains XML-based; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use`, antml variants, and standalone JSON `tool_calls` payloads are treated as plain text, and complete but malformed wrappers are released as plain text too
3. `responses` streaming strictly uses official item lifecycle events (`response.output_item.*`, `response.content_part.*`, `response.function_call_arguments.*`)
4. `responses` supports and enforces `tool_choice` (`auto`/`none`/`required`/forced function); `required` violations return `422` for non-stream and `response.failed` for stream
5. The output protocol follows the client request (OpenAI / Claude / Gemini native shapes); model-side prompting can prefer XML, and the compatibility layer handles the protocol-specific translation
> Note: the current parser still prioritizes “parse successfully whenever possible”; hard allow-list rejection for undeclared tool names is not enabled yet.
> Explicit empty strings or whitespace-only parameters are preserved by the parser; prompting tells the model not to emit blank parameters, and missing/empty argument rejection belongs in the tool executor or client schema validation.
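The parameter-decoding rule above can be sketched in a few lines. This is an assumed helper for illustration, not the actual Go implementation: JSON-literal bodies become structured values, while anything else, including the empty string, is preserved as a string:

```python
import json

def decode_parameter(raw: str):
    """Decode a <parameter> body: valid JSON literals (numbers, booleans, null,
    arrays, objects) become structured values; everything else, including
    explicit empty or whitespace-only bodies, stays a string."""
    stripped = raw.strip()
    if stripped == "":
        return ""              # preserved as an empty string, never dropped
    try:
        return json.loads(stripped)
    except ValueError:
        return raw             # plain text arguments pass through unchanged
```

Rejecting an empty `command` (or any missing argument) stays the executor's decision, since the parser never drops the call.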
## Local Dev Packet Capture

View File

@@ -1 +1 @@
4.4.3
4.6.2

View File

@@ -27,7 +27,7 @@ ds2api/
│ ├── claudeconv/ # Claude message conversion helpers
│ ├── compat/ # Compatibility and regression helpers
│ ├── assistantturn/ # Upstream output to canonical assistant turn / stream event semantics
│ ├── completionruntime/ # Shared Go DeepSeek completion startup, non-stream collection, and retry
│ ├── completionruntime/ # Shared Go DeepSeek completion startup, collection, empty-output/account-switch retry
│ ├── config/ # Config loading/validation/hot reload
│ ├── deepseek/ # DeepSeek upstream client/protocol/transport
│ │ ├── client/ # Login/session/completion/upload/delete calls
@@ -41,6 +41,7 @@ ds2api/
│ │ ├── admin/ # Admin API root assembly and resource packages
│ │ ├── claude/ # Claude HTTP protocol adapter
│ │ ├── gemini/ # Gemini HTTP protocol adapter
│ │ ├── ollama/ # Ollama-compatible model/capability query endpoints
│ │ ├── openai/ # OpenAI HTTP surface
│ │ │ ├── chat/ # Chat Completions execution entrypoint
│ │ │ ├── responses/ # Responses API and response store
@@ -57,6 +58,7 @@ ds2api/
│ ├── prompt/ # Prompt composition
│ ├── promptcompat/ # API request -> DeepSeek web-chat plain-text compatibility
│ ├── rawsample/ # Raw sample read/write and management
│ ├── responsehistory/ # DeepSeek upstream response archive and session snapshots
│ ├── server/ # Router and middleware assembly
│ │ └── data/ # Router/runtime helper data
│ ├── sse/ # SSE parsing utilities
@@ -188,10 +190,11 @@ flowchart LR
- `internal/server`: router tree + middlewares (health, protocol routes, Admin/WebUI).
- `internal/httpapi/openai/*`: OpenAI HTTP surface split into chat, responses, files, embeddings, history, and shared packages; chat/responses share the promptcompat, stream, and toolcall semantics.
- `internal/httpapi/{claude,gemini}`: protocol adapters that normalize into the same prompt compatibility semantics; normal direct paths must share DeepSeek session/PoW/completion execution through `completionruntime`, while `translatorcliproxy` is reserved for Vercel prepare/release, missing-backend fallback, and regression tests.
- `internal/httpapi/ollama`: Ollama-compatible model list and capability query endpoints.
- `internal/httpapi/requestbody`: shared HTTP body reading, JSON pre-validation, and UTF-8 error helpers across protocol adapters.
- `internal/promptcompat`: compatibility core for turning OpenAI/Claude/Gemini requests into DeepSeek web-chat plain-text context.
- `internal/assistantturn`: Go output-side canonical semantics, converting DeepSeek SSE collection results and stream finalization state into assistant turns and centralizing thinking, tool call, citation, usage, stop/error behavior.
- `internal/completionruntime`: shared Go completion execution helpers for DeepSeek session/PoW/call startup, non-stream collection, and empty-output retry; streaming paths use it to start upstream requests, continue to use `internal/stream` for real-time consumption, and use `assistantturn` during finalization.
- `internal/completionruntime`: shared Go completion execution helpers for DeepSeek session/PoW/call startup, non-stream collection, empty-output retry, and one managed-account fresh retry before a final 429; streaming paths use it to start upstream requests, continue to use `internal/stream` for real-time consumption, and use `assistantturn` during finalization.
- `internal/translatorcliproxy`: bridge compatibility layer for Claude/Gemini and OpenAI shape translation; it is not the main business protocol conversion center.
- `internal/deepseek/{client,protocol,transport}`: upstream requests, sessions, PoW adaptation, protocol constants, and transport details.
- `internal/js/chat-stream` + `api/chat-stream.js`: Vercel Node streaming bridge; Go prepare/release owns auth, account lease, and completion payload assembly, while Node relays real-time SSE with Go-aligned finalization and tool sieve semantics.
@@ -199,6 +202,7 @@ flowchart LR
- `internal/toolcall` + `internal/toolstream`: DSML shell compatibility plus canonical XML tool-call parsing and anti-leak sieve; DSML is normalized back to XML at the entrypoint, and internal parsing remains XML-based.
- `internal/httpapi/admin/*`: Admin API root assembly plus auth/accounts/config/settings/proxies/rawsamples/vercel/history/devcapture/version resource packages.
- `internal/chathistory`: server-side conversation history persistence, pagination, detail lookup, and retention policy.
- `internal/responsehistory`: DeepSeek upstream response archive, saving assistant text, thinking, raw tool-call fragments, and streaming detail before protocol rendering/trimming.
- `internal/config`: config loading/validation + runtime settings hot-reload.
- `internal/account`: managed account pool, inflight slots, waiting queue.
- `internal/textclean`: text cleanup helpers, e.g. stripping `[reference: N]` markers.

View File

@@ -27,7 +27,7 @@ ds2api/
│ ├── claudeconv/ # Claude message format conversion helpers
│ ├── compat/ # Compatibility helpers and regression support
│ ├── assistantturn/ # Semantic layer from upstream output to unified assistant turn / stream events
│ ├── completionruntime/ # Shared Go main-path DeepSeek completion startup, non-stream collection, and retry
│ ├── completionruntime/ # Shared Go main-path DeepSeek completion startup, collection, empty-output/account-switch retry
│ ├── config/ # Config loading, validation, hot reload
│ ├── deepseek/ # DeepSeek upstream client/protocol/transport
│ │ ├── client/ # Upstream calls: login, session, completion, upload/delete
@@ -41,6 +41,7 @@ ds2api/
│ │ ├── admin/ # Admin API root assembly and resource sub-packages
│ │ ├── claude/ # Claude HTTP protocol adapter
│ │ ├── gemini/ # Gemini HTTP protocol adapter
│ │ ├── ollama/ # Ollama-compatible model/capability query endpoints
│ │ ├── openai/ # OpenAI HTTP surface
│ │ │ ├── chat/ # Chat Completions execution entrypoint
│ │ │ ├── responses/ # Responses API and response store
@@ -57,6 +58,7 @@ ds2api/
│ ├── prompt/ # Prompt composition
│ ├── promptcompat/ # API request → DeepSeek web-chat plain-text context compatibility layer
│ ├── rawsample/ # Raw sample read/write and management
│ ├── responsehistory/ # DeepSeek upstream response archive and session snapshots
│ ├── server/ # Router and middleware assembly
│ │ └── data/ # Router/runtime helper data
│ ├── sse/ # SSE parsing utilities
@@ -188,10 +190,11 @@ flowchart LR
- `internal/server`: router tree and middleware mounting (health checks, protocol entrypoints, Admin/WebUI).
- `internal/httpapi/openai/*`: OpenAI HTTP surface split into chat, responses, files, embeddings, history, and shared packages; chat/responses share the core promptcompat, stream, and toolcall semantics.
- `internal/httpapi/{claude,gemini}`: protocol input/output adapters normalized into the same prompt compatibility semantics; normal direct paths must share DeepSeek session/PoW/completion execution through `completionruntime`, while `translatorcliproxy` is reserved for Vercel prepare/release, missing-backend fallback, and regression tests.
- `internal/httpapi/ollama`: Ollama-compatible model list and capability query entrypoints.
- `internal/httpapi/requestbody`: cross-protocol request-body reading, JSON pre-decode validation, and UTF-8 error-handling helpers.
- `internal/promptcompat`: compatibility core for turning OpenAI/Claude/Gemini requests into DeepSeek web-chat plain-text context.
- `internal/assistantturn`: the unified Go output-side semantic layer, converting DeepSeek SSE collection results and stream finalization state into assistant turns and centralizing thinking, tool call, citation, usage, and stop/error semantics.
- `internal/completionruntime`: shared completion execution helpers for the Go surfaces, responsible for DeepSeek session/PoW/call startup, non-stream collection, and empty-output retry; streaming paths reuse it to start upstream requests, continue to use `internal/stream` for real-time consumption, and plug into `assistantturn` during finalization.
- `internal/completionruntime`: shared completion execution helpers for the Go surfaces, responsible for DeepSeek session/PoW/call startup, non-stream collection, empty-output retry, and the managed-account single account-switch fresh retry before a final 429; streaming paths reuse it to start upstream requests, continue to use `internal/stream` for real-time consumption, and plug into `assistantturn` during finalization.
- `internal/translatorcliproxy`: bridge compatibility layer for Claude/Gemini ↔ OpenAI shape translation; not the main business protocol conversion center.
- `internal/deepseek/{client,protocol,transport}`: upstream requests, sessions, PoW adaptation, protocol constants, and the transport layer.
- `internal/js/chat-stream` + `api/chat-stream.js`: the Vercel Node streaming bridge; Go prepare/release manages auth, account leases, and the completion payload, while the Node side relays real-time SSE and keeps Go-aligned finalization and tool sieve semantics.
@@ -199,6 +202,7 @@ flowchart LR
- `internal/toolcall` + `internal/toolstream`:DSML 外壳兼容与 canonical XML 工具调用解析、防泄漏筛分;DSML 会在入口归一化回 XML,内部仍按 XML 语义解析。
- `internal/httpapi/admin/*`:Admin API 根装配与 auth/accounts/config/settings/proxies/rawsamples/vercel/history/devcapture/version 等资源子包。
- `internal/chathistory`:服务器端对话记录持久化、分页、单条详情和保留策略。
- `internal/responsehistory`:DeepSeek 上游响应归档,会在协议回译/裁剪前保存 assistant text、thinking、tool-call 原始片段和流式详情。
- `internal/config`:配置加载、校验、运行时 settings 热更新。
- `internal/account`:托管账号池、并发槽位、等待队列。
- `internal/textclean`:文本清洗,移除 `[reference: N]` 标记等噪声。

View File

@@ -9,8 +9,8 @@ Thanks for your interest in contributing to DS2API!
### Prerequisites
- Go 1.26+
- Node.js `20.19+` or `22.12+` (for WebUI development)
- npm (bundled with Node.js)
- Node.js `20.19+` or `22.12+` (for WebUI development; CI / Docker builds use Node 24)
- npm (bundled with Node.js; 10+ recommended)
### Backend Development

View File

@@ -9,8 +9,8 @@
### 前置要求
- Go 1.26+
- Node.js `20.19+` 或 `22.12+`(WebUI 开发时)
- npm(随 Node.js 提供)
- Node.js `20.19+` 或 `22.12+`(WebUI 开发时;CI / Docker 构建使用 Node 24)
- npm(随 Node.js 提供,建议 10+)
### 后端开发

View File

@@ -39,8 +39,8 @@ Recommended order when choosing a deployment method:
| Dependency | Minimum Version | Notes |
| --- | --- | --- |
| Go | 1.26+ | Build backend |
| Node.js | `20.19+` or `22.12+` | Only needed to build WebUI locally |
| npm | Bundled with Node.js | Install WebUI dependencies |
| Node.js | `20.19+` or `22.12+` (CI / Docker builds use Node 24) | Only needed to build WebUI locally |
| npm | Bundled with Node.js; 10+ recommended | Install WebUI dependencies |
Config source (choose one):
@@ -299,6 +299,8 @@ VERCEL_TEAM_ID=team_xxxxxxxxxxxx # optional for personal accounts
| `DS2API_VERCEL_INTERNAL_SECRET` | Hybrid streaming internal auth | Falls back to `DS2API_ADMIN_KEY` |
| `DS2API_VERCEL_STREAM_LEASE_TTL_SECONDS` | Stream lease TTL | `900` |
| `DS2API_RAW_STREAM_SAMPLE_ROOT` | Raw stream sample root for saving/reading samples | `tests/raw_stream_samples` |
| `DS2API_STATIC_ADMIN_DIR` | WebUI static asset directory | `static/admin` |
| `DS2API_AUTO_BUILD_WEBUI` | Whether local startup auto-builds missing WebUI assets (`1/true/yes/on` or `0/false/no/off`) | Enabled outside Vercel |
| `VERCEL_TOKEN` | Vercel sync token | — |
| `VERCEL_PROJECT_ID` | Vercel project ID | — |
| `VERCEL_TEAM_ID` | Vercel team ID | — |
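The accepted boolean spellings for `DS2API_AUTO_BUILD_WEBUI` documented above can be sketched as a small helper (the function name is illustrative, not the repository's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// parseBoolEnv mirrors the documented accepted values:
// 1/true/yes/on enable, 0/false/no/off disable;
// anything else falls back to the default.
func parseBoolEnv(raw string, def bool) bool {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "1", "true", "yes", "on":
		return true
	case "0", "false", "no", "off":
		return false
	}
	return def
}

func main() {
	fmt.Println(parseBoolEnv("YES", false)) // spellings are matched case-insensitively here
}
```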
@@ -321,7 +323,7 @@ Request ──────┐
```
- **Go entry**: `api/index.go` (Serverless Go)
- **Stream entry**: `api/chat-stream.js` (Node Runtime for real-time SSE)
- **Stream entry**: `api/chat-stream.js` (Node Runtime for real-time SSE; `vercel.json` rewrites only the canonical `/v1/chat/completions` path here, while the root shortcut `/chat/completions` stays on the Go entry)
- **Routing**: `vercel.json`
- **Build command**: `npm ci --prefix webui && npm run build --prefix webui` (automatic)
@@ -438,7 +440,7 @@ Default local access URL: `http://127.0.0.1:5001`; the server actually binds to
### 4.2 WebUI Build
On first local startup, if `static/admin/` is missing, DS2API will automatically attempt to build the WebUI (requires Node.js/npm; when dependencies are missing it runs `npm ci` first, then `npm run build -- --outDir static/admin --emptyOutDir`).
On first local startup, if the WebUI static directory is missing, DS2API automatically attempts to build it (requires Node.js/npm; when dependencies are missing it runs `npm ci --prefix webui`, then `npm run build --prefix webui -- --outDir <static-dir> --emptyOutDir`). The default static directory is `static/admin/`, and `DS2API_STATIC_ADMIN_DIR` can override it.
Manual build:

View File

@@ -4,7 +4,7 @@
本指南基于当前 Go 代码库,详细说明各种部署方式。
本页导航:[文档总索引](./README.md)|[架构说明](./ARCHITECTURE.md)|[接口文档](../API.md)|[测试指南](./TESTING.md)
本页导航:[文档总索引](./README.md)|[架构说明](./ARCHITECTURE.md)|[接口文档](../API.md)|[测试指南](./TESTING.md)
---
@@ -39,8 +39,8 @@
| 依赖 | 最低版本 | 说明 |
| --- | --- | --- |
| Go | 1.26+ | 编译后端 |
| Node.js | `20.19+` 或 `22.12+` | 仅在需要本地构建 WebUI 时 |
| npm | 随 Node.js 提供 | 安装 WebUI 依赖 |
| Node.js | `20.19+` 或 `22.12+`(CI / Docker 构建使用 Node 24) | 仅在需要本地构建 WebUI 时 |
| npm | 随 Node.js 提供,建议 10+ | 安装 WebUI 依赖 |
配置来源(任选其一):
@@ -299,6 +299,8 @@ VERCEL_TEAM_ID=team_xxxxxxxxxxxx # 个人账号可留空
| `DS2API_VERCEL_INTERNAL_SECRET` | 混合流式内部鉴权 | 回退用 `DS2API_ADMIN_KEY` |
| `DS2API_VERCEL_STREAM_LEASE_TTL_SECONDS` | 流式 lease TTL | `900` |
| `DS2API_RAW_STREAM_SAMPLE_ROOT` | raw stream 样本保存/读取根目录 | `tests/raw_stream_samples` |
| `DS2API_STATIC_ADMIN_DIR` | WebUI 静态资源目录 | `static/admin` |
| `DS2API_AUTO_BUILD_WEBUI` | 本地启动时是否自动构建缺失的 WebUI(`1/true/yes/on` 或 `0/false/no/off`) | 非 Vercel 默认开启 |
| `VERCEL_TOKEN` | Vercel 同步 token | — |
| `VERCEL_PROJECT_ID` | Vercel 项目 ID | — |
| `VERCEL_TEAM_ID` | Vercel 团队 ID | — |
@@ -331,7 +333,7 @@ api/index.go api/chat-stream.js
```
- **入口文件**:`api/index.go`(Serverless Go)
- **流式入口**:`api/chat-stream.js`(Node Runtime,保证实时 SSE)
- **流式入口**:`api/chat-stream.js`(Node Runtime,保证实时 SSE;`vercel.json` 仅把规范路径 `/v1/chat/completions` 重写到这里,根路径快捷别名 `/chat/completions` 仍走 Go 入口)
- **路由重写**:`vercel.json`
- **构建命令**:`npm ci --prefix webui && npm run build --prefix webui`(自动执行)
@@ -448,7 +450,7 @@ go run ./cmd/ds2api
### 4.2 WebUI 构建
本地首次启动时,若 `static/admin/` 不存在,服务会自动尝试构建 WebUI(需要 Node.js/npm;缺依赖时会先执行 `npm ci`,再执行 `npm run build -- --outDir static/admin --emptyOutDir`)
本地首次启动时,若 WebUI 静态目录不存在,服务会自动尝试构建 WebUI(需要 Node.js/npm;缺依赖时会先执行 `npm ci --prefix webui`,再执行 `npm run build --prefix webui -- --outDir <静态目录> --emptyOutDir`)。默认静态目录为 `static/admin/`,可用 `DS2API_STATIC_ADMIN_DIR` 覆盖。
你也可以手动构建:

View File

@@ -81,7 +81,7 @@ Tool call 问题优先跑:
```bash
go test -v ./internal/toolcall ./internal/toolstream -count=1
node --test tests/node/stream-tool-sieve.test.js tests/node/chat-stream.test.js
./tests/scripts/run-unit-node.sh
```
## 5. 测试选择

View File

@@ -75,7 +75,7 @@ npm run build --prefix webui
1. **Preflight 检查**
- `go test ./... -count=1`(单元测试)
- `./tests/scripts/check-node-split-syntax.sh`(Node 拆分模块语法门禁)
- `node --test tests/node/stream-tool-sieve.test.js tests/node/chat-stream.test.js tests/node/js_compat_test.js`
- `node --test --test-concurrency=1 tests/node/stream-tool-sieve.test.js tests/node/chat-stream.test.js tests/node/chat-history-utils.test.js tests/node/js_compat_test.js`
- `npm run build --prefix webui`(WebUI 构建检查)
2. **隔离启动**:复制 `config.json` 到临时目录,启动独立服务进程
@@ -203,10 +203,10 @@ go test ./...
```bash
# 运行 tool calls 相关测试(推荐用于调试 tool call 解析问题)
go test -v -run 'TestParseToolCalls|TestRepair' ./internal/toolcall/
go test -v -run 'TestParseToolCalls|TestProcessToolSieve|TestRepair' ./internal/toolcall ./internal/toolstream
# 运行单个测试用例
go test -v -run TestParseToolCallsWithDeepSeekHallucination ./internal/toolcall/
go test -v -run TestParseToolCallsAllowsAllEmptyParameterPayload ./internal/toolcall
# 运行 format 相关测试
go test -v ./internal/format/...
@@ -221,23 +221,23 @@ go test -v ./internal/httpapi/openai/...
```bash
# 1. 运行 tool calls 相关的所有测试
go test -v -run 'TestParseToolCalls|TestRepair' ./internal/toolcall/
go test -v -run 'TestParseToolCalls|TestProcessToolSieve|TestRepair' ./internal/toolcall ./internal/toolstream
# 2. 查看测试输出中的详细调试信息
go test -v -run TestParseToolCallsWithDeepSeekHallucination ./internal/toolcall/ 2>&1
go test -v -run TestProcessToolSieveReleasesMalformedExecutableXMLBlock ./internal/toolstream 2>&1
# 3. 检查具体测试用例的修复效果
# 测试用例位于 internal/toolcall/toolcalls_test.go,包含:
# - TestParseToolCallsWithDeepSeekHallucination: DeepSeek 典型幻觉输出
# 重点测试位于 internal/toolcall/toolcalls_test.go 与 internal/toolstream/tool_sieve_xml_test.go,包含:
# - TestParseToolCallsAllowsAllEmptyParameterPayload: 空参数结构化保留
# - TestProcessToolSieveReleasesMalformedExecutableXMLBlock: malformed XML wrapper 释放为文本
# - TestRepairLooseJSONWithNestedObjects: 嵌套对象的方括号修复
# - TestParseToolCallsWithMixedWindowsPaths: Windows 路径处理
```
### 运行 Node.js 测试
```bash
# 运行 Node 测试
node --test tests/node/stream-tool-sieve.test.js
node --test --test-concurrency=1 tests/node/stream-tool-sieve.test.js tests/node/chat-stream.test.js tests/node/chat-history-utils.test.js tests/node/js_compat_test.js
# 或使用脚本
./tests/scripts/run-unit-node.sh

View File

@@ -89,7 +89,7 @@ DS2API 当前的核心思路,不是把客户端传来的 `messages`、`tools`
"chat_session_id": "session-id",
"model_type": "default",
"parent_message_id": null,
"prompt": "<begin▁of▁sentence>...",
"prompt": "<|begin▁of▁sentence|>...",
"ref_file_ids": [
"file-history",
"file-systemprompt",
@@ -111,8 +111,8 @@ DS2API 当前的核心思路,不是把客户端传来的 `messages`、`tools`
- OpenAI Chat / Responses 原生走统一 OpenAI 标准化与 DeepSeek payload 组装;Claude / Gemini 会尽量复用 OpenAI prompt/tool 语义,其中 Gemini 直接复用 `promptcompat.BuildOpenAIPromptForAdapter`。Go 主服务新增 `completionruntime` 启动层,统一执行 DeepSeek session/PoW/call;输出侧新增 `assistantturn` 语义层:非流式 OpenAI Chat / Responses / Claude / Gemini 会把 DeepSeek SSE 收集结果先归一成同一份 assistant turn,再分别渲染成各协议原生外形;流式 OpenAI Chat / Responses / Claude / Gemini 继续保持各协议实时 SSE framing,但最终收尾的 tool fallback、schema 归一、usage、empty-output / content-filter 错误语义同样由 `assistantturn` 判定。Claude / Gemini 的常规 Go 主路径不再依赖内部 `httptest` 转发到 OpenAI handler;`translatorcliproxy` 仅保留用于 Vercel bridge、后端缺失 fallback 和回归测试,不作为主业务协议转换中心。
- Vercel Node 流式路径本轮不迁移,仍使用现有 Node bridge / stream-tool-sieve 实现;后续若变更 Node 流式语义,需要按 `assistantturn` 的 Go canonical 输出语义同步对齐。
- 客户端传入的 thinking / reasoning 开关会被归一到下游 `thinking_enabled`。Gemini `generationConfig.thinkingConfig.thinkingBudget` 会翻译成同一套 thinking 开关;关闭时即使上游返回 `response/thinking_content`,兼容层也不会把它当作可见正文输出。若最终解析出的模型名带 `-nothinking` 后缀,则会无条件强制关闭 thinking,优先级高于请求体中的 `thinking` / `reasoning` / `reasoning_effort`。未显式关闭时,各 surface 会按解析后的 DeepSeek 模型默认能力开启 thinking,并用各自协议的原生形态暴露:OpenAI Chat 为 `reasoning_content`,OpenAI Responses 为 `response.reasoning.delta` / `reasoning` content,Claude 为 `thinking` block / `thinking_delta`,Gemini 为 `thought: true` part。
- 对 OpenAI Chat / Responses 的非流式收尾,如果最终可见正文为空,兼容层会优先尝试把思维链中的独立 DSML / XML 工具块当作真实工具调用解析出来。流式链路也会在收尾阶段做同样的 fallback 检测,但不会因为思维链内容去中途拦截或改写流式输出;真正的工具识别始终基于原始上游文本,而不是基于“已经做过可见输出清洗”的版本,因此即使最终可见层会剥离完整 leaked DSML / XML `tool_calls` wrapper、并抑制全空参数或无效 wrapper 块,也不会影响真实工具调用转成结构化 `tool_calls` / `function_call`。补发结果会作为本轮 assistant 的结构化 `tool_calls` / `function_call` 输出返回,而不是塞进 `content` 文本;如果客户端没有开启 thinking / reasoning,思维链只用于检测,不会作为 `reasoning_content` 或可见正文暴露。只有正文为空且思维链里也没有可执行工具调用时,才继续按空回复错误处理。
- OpenAI Chat / Responses 的空回复错误处理之前,会默认做一次内部补偿重试:第一次上游完整结束后,如果最终可见正文为空、没有解析到工具调用、也没有已经向客户端流式发出工具调用,并且终止原因不是 `content_filter`,兼容层会复用同一个 `chat_session_id`、账号、token 与工具策略,把原始 completion `prompt` 追加固定后缀 `Previous reply had no visible output. Please regenerate the visible final answer or tool call now.` 后重新提交一次。重试遵循 DeepSeek 多轮对话协议:从第一次上游 SSE 流中提取 `response_message_id`,并在重试 payload 中设置 `parent_message_id` 为该值,使重试成为同一会话的后续轮次而非断裂的根消息;同时重新获取一次 PoW,若 PoW 获取失败则回退到原始 PoW。该重试不会重新标准化消息、不会新建 session、不会切换账号,也不会向流式客户端插入重试标记;第二次 thinking / reasoning 会按正常增量直接接到第一次之后,并继续使用 overlap trim 去重。若第二次仍为空,终端错误码仍保持现有 `upstream_empty_output`;若任一尝试触发空 `content_filter`,不做补偿重试并保持 `content_filter` 错误。JS Vercel 运行时同样设置 `parent_message_id`,但因无法直接调用 PoW API 而复用原始 PoW。
- 对 OpenAI Chat / Responses 的非流式收尾,如果最终可见正文为空,兼容层会优先尝试把思维链中的独立 DSML / XML 工具块当作真实工具调用解析出来。流式链路也会在收尾阶段做同样的 fallback 检测,但不会因为思维链内容去中途拦截或改写流式输出;真正的工具识别始终基于原始上游文本,而不是基于“已经做过可见输出清洗”的版本;最终可见层会剥离已经成功解析成工具调用的完整 leaked DSML / XML `tool_calls` wrapper;如果遇到完整 wrapper 但内部形态不符合可执行工具调用语义(例如 `<param>` 这类 malformed XML 工具壳),流式 sieve 会把该块作为普通文本释放,而不是吞掉或伪造成工具调用。补发结果会作为本轮 assistant 的结构化 `tool_calls` / `function_call` 输出返回,而不是塞进 `content` 文本;如果客户端没有开启 thinking / reasoning,思维链只用于检测,不会作为 `reasoning_content` 或可见正文暴露。只有正文为空且思维链里也没有可执行工具调用时,才继续按空回复错误处理。
- OpenAI Chat / Responses、Claude Messages、Gemini generateContent 的空回复错误处理之前,会默认做一次内部补偿重试:第一次上游完整结束后,如果最终可见正文为空、没有解析到工具调用、也没有已经向客户端流式发出工具调用,并且终止原因不是 `content_filter`,兼容层会复用同一个 `chat_session_id`、账号、token 与工具策略,把原始 completion `prompt` 追加固定后缀 `Previous reply had no visible output. Please regenerate the visible final answer or tool call now.` 后重新提交一次。Go 主路径的非流式重试由 `completionruntime.ExecuteNonStreamWithRetry` 统一处理;流式重试由 `completionruntime.ExecuteStreamWithRetry` 统一处理,各协议 runtime 只负责消费/渲染本协议 SSE framing。重试遵循 DeepSeek 多轮对话协议:从第一次上游 SSE 流中提取 `response_message_id`,并在重试 payload 中设置 `parent_message_id` 为该值,使重试成为同一会话的后续轮次而非断裂的根消息;同时重新获取一次 PoW,若 PoW 获取失败则回退到原始 PoW。该同账号重试不会重新标准化消息、不会新建 session,也不会向流式客户端插入重试标记;第二次 thinking / reasoning 会按正常增量直接接到第一次之后,并继续使用 overlap trim 去重。若同账号补偿重试后即将返回 429 `upstream_empty_output`,并且当前是托管账号模式,runtime 会在返回 429 前切换到下一个可用账号,新建 `chat_session_id`,使用原始 completion payload 再做一次 fresh retry;该切号重试不携带空回复 prompt 后缀,也不设置上一账号的 `parent_message_id`。如果 current input file 已触发,切号前会在新账号上重新上传同一份 `DS2API_HISTORY.txt`(以及需要时的 `DS2API_TOOLS.txt`),并用新账号可见的 file_id 替换自动生成的旧 file_id;客户端原本传入的其他文件引用保持不变。如果没有可切换账号,或切号后的 fresh retry 仍没有可见正文或工具调用,则继续按原错误返回:无任何输出为 503 `upstream_unavailable`,有 reasoning 但没有可见正文或工具调用为 429 `upstream_empty_output`;若任一尝试触发空 `content_filter`,不做补偿重试并保持 `content_filter` 错误。Vercel Node 流式路径通过 Go 内部 prepare / pow / switch 端点获取初始 payload、重试 PoW 和切号 fresh retry payload,因此同样会重新上传 current-input 自动文件并替换为新账号 file_id。
- 非流式 OpenAI Chat / Responses、Claude Messages、Gemini generateContent 在最终可见正文渲染阶段,会把 DeepSeek 搜索返回中的 `[citation:N]` / `[reference:N]` 标记替换成对应 Markdown 链接。`citation` 标记按一基序号解析;`reference` 标记只有在同一段正文中出现 `[reference:0]`(允许冒号后有空格)时才按零基序号映射,并且不会影响同段正文里的 `citation` 标记。
- 流式输出仍默认隐藏 `[citation:N]` / `[reference:N]` 这类上游内部标记,避免分片输出中泄漏尚未完成映射的引用占位符。
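上面一基 `[citation:N]` 到 Markdown 链接的映射可以用一个极简 Go 草图示意(`renderCitations` 为说明用的假设函数名,非仓库真实实现,也未覆盖 `[reference:N]` 的零基分支):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var citationRe = regexp.MustCompile(`\[citation:\s*(\d+)\]`)

// renderCitations 把一基序号的 [citation:N] 替换成对应的 Markdown 链接;
// 越界序号按原样保留,避免产出悬空链接。
func renderCitations(text string, links []string) string {
	return citationRe.ReplaceAllStringFunc(text, func(m string) string {
		idx, _ := strconv.Atoi(citationRe.FindStringSubmatch(m)[1])
		if idx >= 1 && idx <= len(links) {
			return fmt.Sprintf("[%d](%s)", idx, links[idx-1])
		}
		return m
	})
}

func main() {
	fmt.Println(renderCitations("见 [citation:1] 与 [citation:9]", []string{"https://example.com/a"}))
}
```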
@@ -135,14 +135,14 @@ OpenAI Chat / Responses 在标准化后、current input file 之前,会默认
最终 prompt 使用 DeepSeek 风格角色标记:
- `<begin▁of▁sentence>`
- `<System>`
- `<User>`
- `<Assistant>`
- `<Tool>`
- `<end▁of▁instructions>`
- `<end▁of▁sentence>`
- `<end▁of▁toolresults>`
- `<|begin▁of▁sentence|>`
- `<|System|>`
- `<|User|>`
- `<|Assistant|>`
- `<|Tool|>`
- `<|end▁of▁instructions|>`
- `<|end▁of▁sentence|>`
- `<|end▁of▁toolresults|>`
实现位置:
[internal/prompt/messages.go](../internal/prompt/messages.go)
@@ -165,16 +165,17 @@ OpenAI Chat / Responses 在标准化后、current input file 之前,会默认
1. 把每个 tool 的名称、描述、参数 schema 序列化成文本。
2. 拼成 `You have access to these tools:` 大段说明。
3. 再附上统一的 DSML tool call 外壳格式约束。
4. 把这整段内容并入 system prompt
4. 普通直传请求会把“工具描述 + 格式约束”一起并入 system prompt;如果 `current_input_file` 触发,则工具描述/schema 会单独上传成 `DS2API_TOOLS.txt`,live prompt 和 system tool 格式提示都会明确要求模型把 `DS2API_TOOLS.txt` 当作可调用工具和参数 schema 的权威来源。
工具调用正例现在优先示范官方 DSML 风格:`<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`
兼容层仍接受旧式纯 `<tool_calls>` wrapper,并会容错若干 DSML 标签变体,包括短横线形式 `<dsml-tool-calls>` / `<dsml-invoke>` / `<dsml-parameter>`,但提示词会优先要求模型输出官方 DSML 标签,并强调不能只输出 closing wrapper 而漏掉 opening tag。需要注意这是“兼容 DSML 外壳,内部仍以 XML 解析语义为准”,不是原生 DSML 全链路实现;DSML 标签会在解析入口归一化回现有 XML 标签后继续走同一套 parser。
工具调用正例现在优先示范半角管道符 DSML 风格:`<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`
兼容层仍接受旧式纯 `<tool_calls>` wrapper,并会容错若干 DSML 标签变体,包括短横线形式 `<dsml-tool-calls>` / `<dsml-invoke>` / `<dsml-parameter>`、下划线形式 `<dsml_tool_calls>` / `<dsml_invoke>` / `<dsml_parameter>`,以及其他前缀分隔形态如 `<vendor|tool_calls>` / `<vendor_tool_calls>` / `<vendor - tool_calls>`;标签壳扫描还会把全角 ASCII 漂移归一化,例如 `<|tool_calls>` 与全角 `>` 结束符,也会容错 CJK 尖括号、全角感叹号或顿号分隔符、弯引号属性值、PascalCase 本地名和属性尾部分隔符漂移,例如 `<DSM|parameter name="command"|>...〈/DSM|parameter〉`、`<DSMLinvoke name=“Bash”>`、`<、DSML、tool_calls>`、`<DSmartToolCalls>`、`<DSMLtool_calls※>`。更一般地,Go / Node tag 扫描以固定本地标签名 `tool_calls` / `invoke` / `parameter` 为准,标签名前或标签名后的非结构性协议分隔符都会在解析入口剥离,例如 `<DSML␂tool_calls>`、`<proto💥tool_calls>` 这类控制符或非 ASCII 分隔符漂移也会归一化回现有 XML 标签后继续走同一套 parser;结构性字符如 `<` / `>` / `/` / `=` / 引号、空白和 ASCII 字母数字不会被当作这类分隔符。进入现有 DSML rewrite / XML parse 之前,Go / Node 还会先对“已经识别成工具标签壳的 candidate span”做一次窄 canonicalization:只折叠 wrapper / `invoke` / `parameter` / `name` / `CDATA` / `DSML` 及其壳层分隔符里的 confusable 字符,清理零宽 / BOM / 控制类干扰,并把引号、空白、dash / underscore 变体等统一回可解析的工具语法。这个阶段不会广义改写普通正文、参数内容、Markdown 行内 code span、CDATA 里的示例文本或其他非工具 XML。CDATA 开头也使用同一类扫描式容错,`<![CDATA[` / `<[CDATA[` / `<、[CDATA[` 都会作为参数原文容器处理。但提示词会优先要求模型输出官方 DSML 标签,并强调不能只输出 closing wrapper 而漏掉 opening tag。需要注意这是“兼容 DSML 外壳,内部仍以 XML 解析语义为准”,不是原生 DSML 全链路实现。解析器会先截获非 Markdown 代码上下文中的疑似工具 wrapper,完整解析失败或工具语义无效时再按普通文本放行。
数组参数使用 `<item>...</item>` 子节点表示;当某个参数体只包含 item 子节点时,Go / Node 解析器会把它还原成数组,避免 `questions` / `options` 这类 schema 中要求 array 的参数被误解析成 `{ "item": ... }` 对象。除此之外,解析器还会回收一些更松散的列表写法,例如 JSON array 字面量或逗号分隔的 JSON 项序列,只要它们足够明确;但 `<item>` 仍然是首选形态。若模型把完整结构化 XML fragment 误包进 CDATA,兼容层会在保护 `content` / `command` 等原文字段的前提下,尝试把非原文字段中的 CDATA XML fragment 还原成 object / array。不过如果 CDATA 只是单个平面的 XML/HTML 标签,例如 `<b>urgent</b>` 这种行内标记,兼容层会保留原始字符串,不会强行升成 object / array;只有明显表示结构的 CDATA 片段,例如多兄弟节点、嵌套子节点或 `item` 列表,才会触发结构化恢复。对 `command` / `content` 等长文本参数,CDATA 内部的 Markdown fenced DSML / XML 示例会作为原文保护;示例里的 `]]></parameter>` 或 `</tool_calls>` 不会截断外层工具调用,解析器会继续等待围栏外真正的参数 / wrapper 结束标签。
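“参数体只包含 item 子节点时还原成数组”的判定可以草拟如下(正则版仅作示意,函数名为假设;真实实现基于 XML 解析树):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var itemRe = regexp.MustCompile(`(?s)<item>(.*?)</item>`)

// itemsToArray:当参数体只由 <item> 子节点组成时还原成数组;
// 否则返回 nil,表示按原语义处理。
func itemsToArray(body string) []string {
	trimmed := strings.TrimSpace(body)
	matches := itemRe.FindAllStringSubmatch(trimmed, -1)
	if len(matches) == 0 {
		return nil
	}
	// 去掉所有 <item> 节点后必须只剩空白,才认定“只包含 item 子节点”
	if strings.TrimSpace(itemRe.ReplaceAllString(trimmed, "")) != "" {
		return nil
	}
	out := make([]string, 0, len(matches))
	for _, m := range matches {
		out = append(out, strings.TrimSpace(m[1]))
	}
	return out
}

func main() {
	fmt.Println(itemsToArray("<item>a</item><item>b</item>"))
}
```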
Go 侧读取 DeepSeek SSE 时不再依赖 `bufio.Scanner` 的固定 2MiB 单行上限;当写文件类工具把很长的 `content` 放在单个 `data:` 行里返回时,非流式收集、流式解析和 auto-continue 透传都会保留完整行,再进入同一套工具解析与序列化流程。
在 assistant 最终回包阶段,如果某个 tool 参数在声明 schema 中明确是 `string`,兼容层会在把解析后的 `tool_calls` / `function_call` 重新序列化成 OpenAI / Responses / Claude 可见参数前,递归把该路径上的 number / bool / object / array 统一转成字符串;其中 object / array 会压成紧凑 JSON 字符串。这个保护只对 schema 明确声明为 string 的路径生效,不会改写本来就是 `number` / `boolean` / `object` / `array` 的参数。这样可以兼容 DeepSeek 输出了结构化片段、但上游客户端工具 schema 又严格要求字符串参数的场景(例如 `content`、`prompt`、`path`、`taskId` 等)。
工具 schema 的权威来源始终是**当前请求实际携带的 schema**,而不是同名工具在其他 runtime(Claude Code / OpenCode / Codex 等)里的默认印象。兼容层现在会同时兼容 OpenAI 风格 `function.parameters`、直接工具对象上的 `parameters` / `input_schema`、以及 camelCase 的 `inputSchema` / `schema`,并在最终输出阶段按这份请求内 schema 决定是保留 array/object,还是仅对明确声明为 `string` 的路径做字符串化。该规则同样适用于 Claude 的流式收尾和 Vercel Node 流式 tool-call formatter,避免不同 runtime 因 schema shape 差异而出现同名工具参数类型漂移。
正例中的工具名只会来自当前请求实际声明的工具;如果当前请求没有足够的已知工具形态,就省略对应的单工具、多工具或嵌套示例,避免把不可用工具名写进 prompt。
对执行类工具,脚本内容必须进入执行参数本身:`Bash` / `execute_command` 使用 `command`,`exec_command` 使用 `cmd`;不要把脚本示范成 `path` / `content` 文件写入参数。
工具提示词也会明确要求模型按本次调用实际需要填写参数,禁止输出 placeholder、空字符串或纯空白参数;如果必填参数未知,应先追问用户或正常文字回复,而不是输出空工具壳。对 `Bash` / `execute_command` 这类 shell 工具,命令或脚本必须写入 `command` 参数。解析层仍会把空字符串参数结构化返回;是否拒绝空 `command` 由后续工具执行侧 / 客户端 schema 校验决定。
如果当前请求声明了 `Read` / `read_file` 这类读取工具,兼容层会额外注入一条 read-tool cache guard:当读取结果只表示“文件未变更 / 已在历史中 / 请引用先前上下文 / 没有正文内容”时,模型必须把它视为内容不可用,不能反复调用同一个无正文读取,应改为请求完整正文读取能力或向用户说明需要重新提供文件内容。这个约束只缓解客户端缓存返回空内容导致的死循环;DS2API 不会也无法凭空恢复客户端本地文件正文。
OpenAI 路径实现:
@@ -205,6 +206,10 @@ assistant 的 reasoning 会变成一个显式标签块:
然后再接可见回答正文。
对最终返回给客户端的 assistant 轮次,reasoning 不会因为本轮输出了工具调用而被丢弃。OpenAI Chat 会在同一个 assistant message 上同时返回 `reasoning_content` 和 `tool_calls`;OpenAI Responses 会先返回一个包含 `reasoning` content 的 assistant message item,再返回后续 `function_call` item;Claude / Gemini 也会在各自原生 thinking / thought 结构后继续返回 tool_use / functionCall。
对进入后续 prompt / `DS2API_HISTORY.txt` 的历史轮次,兼容层也会把同一轮工具调用前的 reasoning 绑定到 assistant tool call 历史上。OpenAI Chat 原生 `reasoning_content + tool_calls` 会直接保留;OpenAI Responses 若以 `reasoning` message item 后接 `function_call` item 的形式回放历史,会在归一化时合并为同一个 assistant 历史块;Claude 的 `thinking` block 会绑定到后续 `tool_use`,Gemini 的 `thought: true` part 会绑定到后续 `functionCall`。最终 prompt 中的顺序固定为 `[reasoning_content]...[/reasoning_content]`,再接 DSML tool call 外壳。
### 7.2 历史 tool_calls 保留方式
assistant 历史 `tool_calls` 不会保留成 OpenAI 原生 JSON,而会转成 prompt 可见的 DSML 外壳:
@@ -217,8 +222,10 @@ assistant 历史 `tool_calls` 不会保留成 OpenAI 原生 JSON而会转成
</|DSML|tool_calls>
```
如果客户端历史里没有结构化 `tool_calls` 字段、却把一个可独立解析的 assistant 工具块放进了普通 `content`,兼容层会在写入后续 prompt 前先按工具调用解析它,再重渲染为规范 DSML 历史外壳。这样可以避免一次 malformed 工具块未被结构化保存后,作为普通 assistant 文本回灌,继续污染后续模型的 few-shot 工具格式。
解析层同时兼容旧式纯 XML 形态:`<tool_calls>` / `<invoke>` / `<parameter>`。两者都会先归一到现有 XML 解析语义;其他旧格式都会作为普通文本保留,不会作为可执行调用语法。
例外是 parser 会对一个非常窄的模型失误做修复:如果 assistant 输出了 `<invoke ...>` ... `</tool_calls>`(或 DSML 对应标签),但漏掉最前面的 opening wrapper,解析阶段会补回 wrapper 后再尝试识别。
例外是 parser 会对一个非常窄的模型失误做修复:如果 assistant 输出了 `<invoke ...>` ... `</tool_calls>`(或 DSML 对应标签),但漏掉最前面的 opening wrapper,解析阶段会在 wrapper-confidence 足够高时补回 wrapper 后再尝试识别。这里的 wrapper-confidence 指 scanner 已经识别出白名单工具壳结构,剩余失败只像壳层结构漂移,而不是语义上接近但不在白名单内的 near-miss 标签名。修复成功时,wrapper 后面的 suffix prose 会继续保留在可见文本里;修复失败时,该块仍按普通文本处理。
这件事很重要,因为它决定了:
@@ -230,7 +237,7 @@ assistant 历史 `tool_calls` 不会保留成 OpenAI 原生 JSON而会转成
### 7.3 tool result 保留方式
tool / function role 的结果会作为 `<Tool>...<end▁of▁toolresults>` 进入 prompt。
tool / function role 的结果会作为 `<|Tool|>...<|end▁of▁toolresults|>` 进入 prompt。
如果 tool content 为空,当前会补成字符串 `"null"`,避免整个 tool turn 丢失。
@@ -271,7 +278,7 @@ OpenAI 的文件上传现在不再是“只传文件本体”的通用路径,
兼容层现在只保留 `current_input_file` 这一种拆分方式;旧的 `history_split` 配置字段已移除,读取旧配置时会忽略它且不会再写回。
- `current_input_file` 默认开启;它在统一 completion runtime 入口全局生效,用于把“完整上下文”合并进 `DS2API_HISTORY.txt` 上下文文件。当最新 user turn 的纯文本长度达到 `current_input_file.min_chars`(默认 `0`),runtime 会上传一个文件名为 `DS2API_HISTORY.txt` 的上下文文件。文件内容会先经过各协议入口的标准化,再序列化成按轮次编号的 `DS2API_HISTORY.txt` 风格 transcript,带有 `# DS2API_HISTORY.txt` 标题和 `=== N. ROLE ===` 分段;live prompt 中则会给出一个 continuation 语气的 user 消息,引导模型从 `DS2API_HISTORY.txt` 的最新状态继续推进,并直接回答最新请求,避免把任务拉回起点。
- `current_input_file` 默认开启;它在统一 completion runtime 入口全局生效,用于把“完整上下文”合并进 `DS2API_HISTORY.txt` 上下文文件。当最新 user turn 的纯文本长度达到 `current_input_file.min_chars`(默认 `0`),runtime 会上传一个文件名为 `DS2API_HISTORY.txt` 的上下文文件。文件内容会先经过各协议入口的标准化,再序列化成按轮次编号的 `DS2API_HISTORY.txt` 风格 transcript,带有 `# DS2API_HISTORY.txt` 标题和 `=== N. ROLE ===` 分段;如果当前请求声明了可用工具,还会把工具名称、描述和参数 schema 单独上传成 `DS2API_TOOLS.txt`,带有 `# DS2API_TOOLS.txt` 标题。live prompt 中则会给出一个 continuation 语气的 user 消息,引导模型从 `DS2API_HISTORY.txt` 的最新状态继续推进,并在有工具文件时明确可用工具 schema 位于 `DS2API_TOOLS.txt`;system prompt 也会在统一 DSML 工具格式约束前说明 `DS2API_TOOLS.txt` 是可调用工具和 schema 的权威来源,同时保留本轮工具选择策略,避免把任务拉回起点。
- 如果 `current_input_file.enabled=false`,请求会直接透传,不上传任何拆分上下文文件。
- 即使触发 `current_input_file` 后 live prompt 被缩短,对客户端回包里的上下文 token 统计,仍会沿用**拆分前的完整 prompt 语义**做计数,而不是按缩短后的占位 prompt 计算;否则会把真实上下文显著算小。
@@ -284,7 +291,7 @@ OpenAI 的文件上传现在不再是“只传文件本体”的通用路径,
- 全局 completion runtime 应用点:
[internal/completionruntime/nonstream.go](../internal/completionruntime/nonstream.go)
当前输入转文件启用并触发时,上传文件真实文件名是 `DS2API_HISTORY.txt`,文件内容是完整 `messages` 上下文;它会使用 OpenAI-compatible 的消息/transcript 序列化规则和 DeepSeek 角色标记,再按轮次编号成 `DS2API_HISTORY.txt` 风格的 transcript,不再注入文件边界标签。
当前输入转文件启用并触发时,上传的历史文件真实文件名是 `DS2API_HISTORY.txt`,文件内容是完整 `messages` 上下文;它会使用 OpenAI-compatible 的消息/transcript 序列化规则和 DeepSeek 角色标记,再按轮次编号成 `DS2API_HISTORY.txt` 风格的 transcript,不再注入文件边界标签。
```text
[uploaded filename]: DS2API_HISTORY.txt
@@ -304,7 +311,21 @@ Prior conversation history and tool progress.
...
```
开启后,请求的 live prompt 不再直接内联完整上下文,而是保留一个 user role 的短提示,提示模型基于已提供上下文直接回答最新请求;上传后的 `file_id` 会进入 `ref_file_ids`
如果当前请求带有工具,runtime 同时上传 `DS2API_TOOLS.txt`:
```text
[uploaded filename]: DS2API_TOOLS.txt
# DS2API_TOOLS.txt
Available tool descriptions and parameter schemas for this request.
You have access to these tools:
Tool: ...
Description: ...
Parameters: ...
```
开启后,请求的 live prompt 不再直接内联完整上下文,也不再内联大段工具 schema;它保留一个 user role 的短提示,提示模型基于已提供上下文直接回答最新请求,并在有工具时引用 `DS2API_TOOLS.txt`。上传后的 `DS2API_HISTORY.txt` file_id 会排在 `ref_file_ids` 最前;如果存在 `DS2API_TOOLS.txt`,它的 file_id 紧随其后;客户端已有的其他 file_id 保持在后面。上下文 token 统计会包含上传的历史文件、工具文件和 live prompt。自动生成的 current-input 文件引用会被记录为 runtime 状态;如果托管账号模式切号 fresh retry,runtime 会重新上传这些自动文件,而不是把上一账号的 file_id 交给新账号。
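`ref_file_ids` 的排序约定可以草拟如下(函数签名为示意):

```go
package main

import "fmt"

// buildRefFileIDs:DS2API_HISTORY.txt 最前,DS2API_TOOLS.txt(如有)
// 紧随其后,客户端已有的 file_id 保持在后。
func buildRefFileIDs(historyID, toolsID string, clientIDs []string) []string {
	out := []string{historyID}
	if toolsID != "" {
		out = append(out, toolsID)
	}
	return append(out, clientIDs...)
}

func main() {
	fmt.Println(buildRefFileIDs("file-ds2api-history", "file-ds2api-tools", []string{"file-other-attachment"}))
}
```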
## 10. 各协议入口的差异
@@ -314,7 +335,7 @@ Prior conversation history and tool progress.
- `developer` 会映射到 `system`
- Responses `instructions` 会 prepend 为 system message
- `tools` 会注入 system prompt
- 普通直传时 `tools` 会注入 system prompt;`current_input_file` 触发时,工具描述/schema 会拆成 `DS2API_TOOLS.txt`,system prompt 保留格式/策略规则,并明确要求模型从 `DS2API_TOOLS.txt` 获取可调用工具和 schema
- `attachments` / `input_file` / inline 文件会进入 `ref_file_ids`
- current input file 在统一 completion runtime 入口全局生效
@@ -324,7 +345,7 @@ Prior conversation history and tool progress.
- top-level `system` 优先作为系统提示
- `tool_use` / `tool_result` 会被转换成统一的 assistant/tool 历史语义
- `tools` 同样会被并进 system prompt
- 普通直传时 `tools` 同样会被并进 system prompt;`current_input_file` 触发时会沿用统一的 `DS2API_TOOLS.txt` 拆分上传路径
- 常规执行通过 `internal/httpapi/claude/handler_messages.go` 转到 OpenAI chat 路径,模型 alias 会先解析成 DeepSeek 原生模型
- 当前代码里没有像 OpenAI 那样完整的 `ref_file_ids` 附件链路
@@ -334,7 +355,7 @@ Prior conversation history and tool progress.
- `systemInstruction`、`contents.parts`、`functionCall`、`functionResponse` 会先归一
- tools 会转成 OpenAI 风格 function schema
- prompt 构建复用 OpenAI 的 `promptcompat.BuildOpenAIPromptForAdapter`
- prompt 构建复用 OpenAI 的 `promptcompat.BuildOpenAIPromptForAdapter`;`current_input_file` 触发时也会使用统一的 `DS2API_TOOLS.txt` 拆分上传路径
- 未识别的非文本 part 会被安全序列化进 prompt并对二进制/疑似 base64 内容做省略或截断处理
也就是说Gemini 在“最终 prompt 语义”上,尽量和 OpenAI 保持一致。
@@ -353,9 +374,10 @@ Prior conversation history and tool progress.
```json
{
"prompt": "<begin▁of▁sentence><System>原 system / developer\n\nYou have access to these tools: ...<end▁of▁instructions><User>Continue from the latest state in the attached DS2API_HISTORY.txt context. Treat it as the current working state and answer the latest user request directly.<Assistant>",
"prompt": "<|begin▁of▁sentence|><|System|>原 system / developer\n\nTOOL CALL FORMAT — FOLLOW EXACTLY: ...<|end▁of▁instructions|><|User|>Continue from the latest state in the attached DS2API_HISTORY.txt context. Treat it as the current working state and answer the latest user request directly. Available tool descriptions and parameter schemas are attached in DS2API_TOOLS.txt; use only those tools and follow the tool-call format rules in this prompt.<|Assistant|>",
"ref_file_ids": [
"file-current-input-ignore",
"file-ds2api-history",
"file-ds2api-tools",
"file-systemprompt",
"file-other-attachment"
],
@@ -419,7 +441,8 @@ Prior conversation history and tool progress.
如果改的是 tool call 相关兼容语义,还应同时检查:
- `go test ./internal/toolcall/...`
- `node --test tests/node/stream-tool-sieve.test.js`
- `go test ./internal/toolstream/...`
- `./tests/scripts/run-unit-node.sh`
## 14. 文档同步约定

View File

@@ -6,7 +6,7 @@
## 1) 当前可执行格式
当前版本推荐模型输出 DSML 外壳:
当前版本推荐模型输出半角管道符 DSML 外壳:
```xml
<|DSML|tool_calls>
@@ -39,8 +39,11 @@
兼容修复:
- 如果模型漏掉 opening wrapper,但后面仍输出了一个或多个 invoke 并以 closing wrapper 收尾,Go 解析链路会在解析前补回缺失的 opening wrapper。
- Go / Node 解析层不再枚举每一种 DSML typo。它会把工具标签名前的 `DSML`、管道符 `|` / `|`、空白、重复 leading `<` 视为可容忍的协议噪声,然后只匹配固定本地标签名 `tool_calls` / `invoke` / `parameter`。例如 `<DSML|tool_calls>`、`<<|DSML|tool_calls>`、`<|DSML tool_calls>`、`<DSMLtool_calls>`、`<<DSML|DSML|tool_calls>` 都会归一化;相似但非固定标签名(如 `tool_calls_extra`)仍按普通文本处理。
- 如果模型在固定工具标签名后多输出一个尾部管道符,例如 `<|DSML|tool_calls|` / `<|DSML|invoke|` / `<|DSML|parameter|`,兼容层会把这个尾部 `|` 当作异常标签终止符并补齐缺失的 `>`;如果后面已经有 `>`,也会消费这个多余 `|` 后再归一化。
- 在进入现有 DSML rewrite / XML parse 之前,Go / Node 都会先做一次非常窄的 candidate-span canonicalization:只处理已经被 scanner 识别为工具标签壳的 wrapper / `invoke` / `parameter` / `name` / `CDATA` / `DSML` 及其结构分隔符;这里会移除零宽 / BOM / 控制类干扰字符,并把 `<`、`>`、`/`、`|`、`=`、引号、Unicode 空白、常见 dash / underscore 变体这类工具语法外壳符号折回 ASCII 语义。
- Go / Node 解析层不再枚举每一种 DSML typo。它以固定本地标签名 `tool_calls` / `invoke` / `parameter` 为准,把标签名前的任意协议前缀壳视为可容忍噪声,并继续兼容半角管道符、全角感叹号 `!`、顿号 `、`、空白、重复 leading `<`、可视控制符 `␂`、原始 STX `\x02`、非 ASCII 分隔符、CJK 尖括号 `〈` / `〉`、弯引号属性值、PascalCase 本地名等漂移。例如 `<DSML|tool_calls>`、`<<|DSML|tool_calls>`、`<|DSML tool_calls>`、`<DSMLtool_calls>`、`<DSmartToolCalls>`、`<<DSML|DSML|tool_calls>`、`<DSML␂tool_calls>`、`<proto💥tool_calls>`、`<DSM|tool_calls>...〈/DSM|tool_calls〉`、`<DSMLtool_calls>...</DSMLtool_calls>`、`<、DSML、tool_calls>...<、/DSML、tool_calls>` 都会归一化;相似但非固定标签名(如 `tool_calls_extra` / `ToolCallsExtra`)仍按普通文本处理。
- 这个 candidate-span canonicalization 不会对普通 prose、参数正文、CDATA 内容或嵌套的非工具 XML 做广义 Unicode 归一化。也就是说,参数里的示例 `<invοke>`、普通聊天文本里的 confusable 单词、或其他非工具壳 XML 片段都保持原样;只有真正落在工具标签壳上的 whitelist 关键字和结构符号会被折叠。
- 如果模型在固定工具标签名后多输出一个非结构性分隔符,例如 `<|DSML|tool_calls|` / `<|DSML|invoke|` / `<|DSML|parameter|` / `<DSMLtool_calls※>`,或在带属性标签的结束符前多输出一个尾部分隔符(如 `<DSM|parameter name="command"|>`),兼容层会把这个尾部分隔符当作异常标签终止符并补齐或归一化;如果后面已经有 `>` / `〉`,也会消费这个多余分隔符后再归一化。结构性字符如 `<` / `>` / `/` / `=` / 引号、空白和 ASCII 字母数字不会被当作这类分隔符。
- “缺失 opening wrapper”的修复只会在 wrapper-confidence 足够高时触发:scanner 必须已经识别出白名单工具壳结构(wrapper / invoke / parameter / `name=` 等),且剩余失败看起来只是壳层结构问题。相似但不在白名单内的 near-miss 标签名,或缺少足够 wrapper 证据的 malformed 片段,仍会按普通文本透传。
- 这是一个针对常见模型失误的窄修复,不改变推荐输出格式;prompt 仍要求模型直接输出完整 DSML 外壳。
- `<invoke ...>` / `<parameter ...>` 不会被当成“已支持的工具语法”;只有 `tool_calls` wrapper 或可修复的缺失 opening wrapper 才会进入工具调用路径。
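“以固定本地标签名为准、剥离前缀噪声”的归一化思路可以用一个极简正则草图示意(仅覆盖 ASCII 噪声;真实扫描还处理全角、CJK 尖括号、控制符等漂移,函数名为假设):

```go
package main

import (
	"fmt"
	"regexp"
)

// toolTagRe:一个或多个 '<',任意非结构性噪声,可选 DSML 前缀,
// 然后是固定本地标签名;\b 保证 tool_calls_extra 这类 near-miss 不匹配。
var toolTagRe = regexp.MustCompile(`<+[^<>a-zA-Z]*(?:[Dd][Ss][Mm][Ll][^<>a-zA-Z]*)?(tool_calls|invoke|parameter)\b`)

// canonicalizeToolTag 把噪声壳归一成规范 XML 开标签名。
func canonicalizeToolTag(s string) string {
	return toolTagRe.ReplaceAllString(s, "<$1")
}

func main() {
	fmt.Println(canonicalizeToolTag("<|DSML|tool_calls>"))
	fmt.Println(canonicalizeToolTag("<<|DSML|tool_calls>"))
}
```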
@@ -54,15 +57,17 @@
在流式链路中(Go / Node 一致):
- DSML `<|DSML|tool_calls>` wrapper、短横线形式(`<dsml-tool-calls>` / `<dsml-invoke>` / `<dsml-parameter>`)、基于固定本地标签名的 DSML 噪声容错形态、尾部管道符形态(如 `<|DSML|tool_calls|`)和 canonical `<tool_calls>` wrapper 都会进入结构化捕获
- DSML `<|DSML|tool_calls>` wrapper、短横线形式(`<dsml-tool-calls>` / `<dsml-invoke>` / `<dsml-parameter>`)、基于固定本地标签名的 DSML 噪声容错形态、尾部非结构性分隔符形态(如 `<|DSML|tool_calls|` / `<DSMLtool_calls※>`)和 canonical `<tool_calls>` wrapper 都会进入结构化捕获
- 如果流里直接从 invoke 开始,但后面补上了 closing wrapper,Go 流式筛分也会按缺失 opening wrapper 的修复路径尝试恢复
- 已识别成功的工具调用不会再次回流到普通文本
- 不符合新格式的块不会执行,并继续按原样文本透传
- fenced code block(反引号 `` ``` `` 和波浪线 `~~~`)中的 XML 示例始终按普通文本处理
- 如果一个 confusable / 漂移过的工具壳在 candidate-span canonicalization + repair 后仍能形成有效工具调用,wrapper 后面的 suffix prose 会继续按普通文本输出;如果 canonicalization 后仍不满足 wrapper-confidence 或 XML 语义,整块就作为普通文本释放,不会半吞半漏。
- fenced code block(反引号 `` ``` `` 和波浪线 `~~~`)以及 Markdown inline code span(例如 `` `<tool_calls>...</tool_calls>` ``)中的 XML 示例始终按普通文本处理
- 支持嵌套围栏(如 4 反引号嵌套 3 反引号)和 CDATA 内围栏保护
-`command` / `content` 等长文本参数CDATA 内部如果包含 Markdown fenced DSML / XML 示例,即使示例里出现 `]]></parameter>` / `</tool_calls>` 这类看起来像外层结束标签的片段,也会继续按参数原文保留,直到真正位于围栏外的外层结束标签
- CDATA 开头也按扫描式识别,除了标准 `<![CDATA[`,还会接受 `<[CDATA[``<、[CDATA[` 这类分隔符漂移,并统一还原为原文字段内容。
- 如果模型把 `<![CDATA[` 打开后却没有闭合,流式扫描阶段仍会保守地继续缓冲,不会误把 CDATA 里的示例 XML 当成真实工具调用;在最终 parse / flush 恢复阶段,会对这类 loose CDATA 做窄修复,尽量保住外层已完整包裹的真实工具调用
- 当文本中 mention 了某种标签名(如 `<dsml|tool_calls>` 或 Markdown inline code 里的 `<|DSML|tool_calls>`)而后面紧跟真正工具调用时,sieve 会跳过不可解析的 mention 候选并继续匹配后续真实工具块,不会因 mention 导致工具调用丢失,也不会截断 mention 后的正文
- 当文本中 mention 了某种标签名(如 `<dsml|tool_calls>` 或 Markdown inline code 里的 `<|DSML|tool_calls>`)而后面紧跟真正工具调用时,sieve 会跳过不可解析的 mention 候选并继续匹配后续真实工具块;行内 code span 中即使出现完整 `<tool_calls>...</tool_calls>` 示例也不会执行,不会因 mention 导致工具调用丢失,也不会截断 mention 后的正文
- Go 侧 SSE 读取不再使用 `bufio.Scanner` 的固定 token 上限;单个 `data:` 行中包含很长的写文件参数时,非流式收集、流式解析与 auto-continue 透传都应保留完整行,再交给 tool parser 处理
另外,`<parameter>` 的值如果本身是合法 JSON 字面量,也会按结构化值解析,而不是一律保留为字符串。例如 `123``true``null``[1,2]``{"a":1}` 都会还原成对应的 number / boolean / null / array / object。
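“合法 JSON 字面量按结构化值解析,否则保留字符串”的规则可以草拟如下(函数名为假设):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeParameterValue:123、true、null、[1,2]、{"a":1} 等合法 JSON
// 字面量还原成对应结构化值;其余原文按字符串保留。
func decodeParameterValue(raw string) any {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err == nil {
		return v
	}
	return raw
}

func main() {
	fmt.Println(decodeParameterValue(`[1,2]`))
	fmt.Println(decodeParameterValue(`hello`))
}
```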
@@ -78,11 +83,16 @@
- `rejectedByPolicy`:当前固定为 `false`
- `rejectedToolNames`:当前固定为空数组
解析层不会因为参数值为空而丢弃工具调用。若模型输出了显式空字符串或纯空白参数,它们会按空字符串进入结构化 `tool_calls`;是否拒绝缺参或空命令应由后续工具执行侧 / 客户端 schema 校验决定。Prompt 层仍会要求模型不要主动输出空参数。
完整的 DSML / XML wrapper 只有在成功解析出有效 `invoke name`,并且参数节点(如存在)符合 `parameter` 语义后,才会变成结构化工具调用;真正的零参数工具调用仍然有效。如果 wrapper 完整但内部不是可执行工具调用形态(例如使用 `<param>`、缺少有效 `invoke name`、或其他 malformed XML 工具壳),流式 sieve 会把原始 wrapper 作为普通文本释放,不会吞掉内容,也不会生成空的工具调用。
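“解析出有效 `invoke name` 才产出结构化调用,否则整块按文本释放”的取舍可以草拟如下(返回值形态为示意;真实 sieve 还有流式缓冲、wrapper 扫描与参数校验):

```go
package main

import (
	"fmt"
	"regexp"
)

var invokeRe = regexp.MustCompile(`<invoke\s+name="([^"]+)"`)

// sieveWrapper:wrapper 内能解析出有效 invoke name 时返回工具名;
// 否则把整块作为普通文本释放,不生成空的工具调用。
func sieveWrapper(block string) (toolName string, plainText string) {
	if m := invokeRe.FindStringSubmatch(block); m != nil {
		return m[1], ""
	}
	return "", block
}

func main() {
	name, _ := sieveWrapper(`<tool_calls><invoke name="Bash"></invoke></tool_calls>`)
	fmt.Println(name)
	_, text := sieveWrapper(`<tool_calls><param>x</param></tool_calls>`)
	fmt.Println(text)
}
```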
## 5) 落地建议
1. Prompt 里只示范 DSML 外壳语法。
2. 上游客户端应直接输出完整 DSML 外壳;DS2API 兼容旧式 canonical XML,并只对“closing tag 在、opening tag 漏掉”的常见失误做窄修复,不会泛化接受其他旧格式。
3. 不要依赖 parser 做安全控制;执行器侧仍应做工具名和参数校验
3. 模型只有在知道本次调用所需参数值时才应输出工具调用;不要输出 placeholder、空字符串或纯空白参数。对 `Bash` / `execute_command`,实际命令必须在 `command` 参数里
4. 不要依赖 parser 做安全控制;执行器侧仍应做工具名和参数校验。
## 6) 回归验证
@@ -90,17 +100,19 @@
```bash
go test -v -run 'TestParseToolCalls|TestProcessToolSieve' ./internal/toolcall ./internal/toolstream ./internal/httpapi/openai/...
node --test tests/node/stream-tool-sieve.test.js
./tests/scripts/run-unit-node.sh
```
重点覆盖:
- DSML `<|DSML|tool_calls>` wrapper 正常解析
- legacy canonical `<tool_calls>` wrapper 正常解析
- 固定本地标签名的 DSML 噪声容错形态(如 `<DSML|tool_calls>`、`<<|DSML|tool_calls>`、`<|DSML tool_calls>`、`<DSMLtool_calls>`、`<<DSML|DSML|tool_calls>`)正常解析
- 固定本地标签名的 DSML 噪声容错形态(如 `<DSML|tool_calls>`、`<<|DSML|tool_calls>`、`<|DSML tool_calls>`、`<DSMLtool_calls>`、`<DSmartToolCalls>`、`<<DSML|DSML|tool_calls>`、`<DSM|tool_calls>...〈/DSM|tool_calls〉`、`<DSMLtool_calls>...</DSMLtool_calls>`)正常解析
- 混搭标签(DSML wrapper + canonical inner)归一化后正常解析
- 波浪线围栏 `~~~` 内的示例不执行
- 嵌套围栏4 反引号嵌套 3 反引号)内的示例不执行
- Markdown 行内 code span 内的完整工具调用示例不执行
- 文本 mention 标签名后紧跟真正工具调用的场景(含同一 wrapper 变体)
- 空参数结构化保留;malformed executable-looking XML wrapper 作为文本释放
- 非兼容内容按普通文本透传
- 代码块示例不执行

View File

@@ -218,7 +218,7 @@ func UpstreamEmptyOutputDetail(contentFilter bool, text, thinking string) (int,
if strings.TrimSpace(thinking) != "" {
return http.StatusTooManyRequests, "Upstream account hit a rate limit and returned reasoning without visible output.", "upstream_empty_output"
}
return http.StatusTooManyRequests, "Upstream account hit a rate limit and returned empty output.", "upstream_empty_output"
return http.StatusServiceUnavailable, "Upstream service is unavailable and returned no output.", "upstream_unavailable"
}
// ShouldRetryEmptyOutput returns true when the turn produced no visible text

View File

@@ -1,6 +1,7 @@
package assistantturn
import (
"net/http"
"testing"
"ds2api/internal/promptcompat"
@@ -70,6 +71,13 @@ func TestBuildTurnFromCollectedThinkingOnlyIsEmptyOutput(t *testing.T) {
}
}
func TestBuildTurnFromCollectedPureEmptyOutputIsUpstreamUnavailable(t *testing.T) {
turn := BuildTurnFromCollected(sse.CollectResult{}, BuildOptions{})
if turn.Error == nil || turn.Error.Status != http.StatusServiceUnavailable || turn.Error.Code != "upstream_unavailable" {
t.Fatalf("expected upstream unavailable error, got %#v", turn.Error)
}
}
func TestBuildTurnFromCollectedToolChoiceRequired(t *testing.T) {
turn := BuildTurnFromCollected(sse.CollectResult{Text: "hello"}, BuildOptions{
ToolChoice: promptcompat.ToolChoicePolicy{Mode: promptcompat.ToolChoiceRequired},

View File

@@ -241,6 +241,36 @@ func TestSwitchAccountSkipsLoginFailureAndContinues(t *testing.T) {
}
}
func TestSwitchAccountRespectsPinnedTargetAccount(t *testing.T) {
t.Setenv("DS2API_CONFIG_JSON", `{
"keys":["managed-key"],
"accounts":[
{"email":"acc1@test.com","token":"t1"},
{"email":"acc2@test.com","token":"t2"}
]
}`)
store := config.LoadStore()
pool := account.NewPool(store)
r := NewResolver(store, pool, func(_ context.Context, _ config.Account) (string, error) {
return "new-token", nil
})
req, _ := http.NewRequest("POST", "/", nil)
req.Header.Set("Authorization", "Bearer managed-key")
req.Header.Set("X-Ds2-Target-Account", "acc1@test.com")
a, err := r.Determine(req)
if err != nil {
t.Fatalf("determine failed: %v", err)
}
defer r.Release(a)
if r.SwitchAccount(context.Background(), a) {
t.Fatal("expected switch to be disabled for pinned target account")
}
if a.AccountID != "acc1@test.com" {
t.Fatalf("expected pinned account to remain selected, got %q", a.AccountID)
}
}
// ─── Release edge cases ─────────────────────────────────────────────
func TestReleaseNilAuth(t *testing.T) {

View File

@@ -28,6 +28,7 @@ type RequestAuth struct {
DeepSeekToken string
CallerID string
AccountID string
TargetAccount string
Account config.Account
TriedAccounts map[string]bool
resolver *Resolver
@@ -99,6 +100,7 @@ func (r *Resolver) acquireManagedRequestAuth(ctx context.Context, callerID, targ
UseConfigToken: true,
CallerID: callerID,
AccountID: acc.Identifier(),
TargetAccount: target,
Account: acc,
TriedAccounts: tried,
resolver: r,
@@ -185,6 +187,9 @@ func (r *Resolver) SwitchAccount(ctx context.Context, a *RequestAuth) bool {
if !a.UseConfigToken {
return false
}
if strings.TrimSpace(a.TargetAccount) != "" {
return false
}
if a.TriedAccounts == nil {
a.TriedAccounts = map[string]bool{}
}
@@ -208,6 +213,13 @@ func (r *Resolver) SwitchAccount(ctx context.Context, a *RequestAuth) bool {
}
}
func (a *RequestAuth) SwitchAccount(ctx context.Context) bool {
if a == nil || a.resolver == nil {
return false
}
return a.resolver.SwitchAccount(ctx, a)
}
func (r *Resolver) Release(a *RequestAuth) {
if a == nil || !a.UseConfigToken || a.AccountID == "" {
return

View File

@@ -90,7 +90,11 @@ func ExecuteNonStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.R
if startErr != nil {
return NonStreamResult{SessionID: start.SessionID, Payload: start.Payload}, startErr
}
stdReq = start.Request
return ExecuteNonStreamStartedWithRetry(ctx, ds, a, start, opts)
}
func ExecuteNonStreamStartedWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, start StartResult, opts Options) (NonStreamResult, *assistantturn.OutputError) {
stdReq := start.Request
maxAttempts := opts.MaxAttempts
if maxAttempts <= 0 {
maxAttempts = 3
@@ -100,6 +104,7 @@ func ExecuteNonStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.R
pow := start.Pow
attempts := 0
accountSwitchAttempted := false
currentResp := start.Response
usagePrompt := stdReq.PromptTokenText
accumulatedThinking := ""
@@ -108,6 +113,24 @@ func ExecuteNonStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.R
for {
turn, outErr := collectAttempt(currentResp, stdReq, usagePrompt, opts)
if outErr != nil {
if canRetryOnAlternateAccount(ctx, a, outErr, opts.RetryEnabled, &accountSwitchAttempted) {
switched, switchErr := startStandardCompletionOnAlternateAccount(ctx, ds, a, stdReq, opts, maxAttempts)
if switchErr != nil {
return NonStreamResult{SessionID: sessionID, Payload: payload, Attempts: attempts}, switchErr
}
if switched.Response != nil {
config.Logger.Info("[completion_runtime_account_switch_retry] retrying after 429", "surface", stdReq.Surface, "stream", false, "account", a.AccountID)
sessionID = switched.SessionID
payload = switched.Payload
pow = switched.Pow
currentResp = switched.Response
usagePrompt = stdReq.PromptTokenText
accumulatedThinking = ""
accumulatedRawThinking = ""
accumulatedToolDetectionThinking = ""
continue
}
}
return NonStreamResult{SessionID: sessionID, Payload: payload, Attempts: attempts}, outErr
}
accumulatedThinking += sse.TrimContinuationOverlap(accumulatedThinking, turn.Thinking)
@@ -130,6 +153,24 @@ func ExecuteNonStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.R
retryMax = shared.EmptyOutputRetryMaxAttempts()
}
if !opts.RetryEnabled || !assistantturn.ShouldRetryEmptyOutput(turn, attempts, retryMax) {
if canRetryOnAlternateAccount(ctx, a, turn.Error, opts.RetryEnabled, &accountSwitchAttempted) {
switched, switchErr := startStandardCompletionOnAlternateAccount(ctx, ds, a, stdReq, opts, maxAttempts)
if switchErr != nil {
return NonStreamResult{SessionID: sessionID, Payload: payload, Turn: turn, Attempts: attempts}, switchErr
}
if switched.Response != nil {
config.Logger.Info("[completion_runtime_account_switch_retry] retrying after 429", "surface", stdReq.Surface, "stream", false, "account", a.AccountID)
sessionID = switched.SessionID
payload = switched.Payload
pow = switched.Pow
currentResp = switched.Response
usagePrompt = stdReq.PromptTokenText
accumulatedThinking = ""
accumulatedRawThinking = ""
accumulatedToolDetectionThinking = ""
continue
}
}
return NonStreamResult{SessionID: sessionID, Payload: payload, Turn: turn, Attempts: attempts}, turn.Error
}
@@ -150,6 +191,54 @@ func ExecuteNonStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.R
}
}
func canRetryOnAlternateAccount(ctx context.Context, a *auth.RequestAuth, outErr *assistantturn.OutputError, retryEnabled bool, attempted *bool) bool {
if outErr == nil || outErr.Status != http.StatusTooManyRequests {
return false
}
if !retryEnabled || attempted == nil || *attempted {
return false
}
if a == nil || !a.UseConfigToken {
return false
}
*attempted = true
return a.SwitchAccount(ctx)
}
func startStandardCompletionOnAlternateAccount(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, stdReq promptcompat.StandardRequest, opts Options, maxAttempts int) (StartResult, *assistantturn.OutputError) {
var prepErr *assistantturn.OutputError
stdReq, prepErr = reuploadCurrentInputFileForAccount(ctx, ds, a, stdReq, opts)
if prepErr != nil {
return StartResult{Request: stdReq}, prepErr
}
sessionID, err := ds.CreateSession(ctx, a, maxAttempts)
if err != nil {
return StartResult{}, authOutputError(a)
}
pow, err := ds.GetPow(ctx, a, maxAttempts)
if err != nil {
return StartResult{SessionID: sessionID}, &assistantturn.OutputError{Status: http.StatusUnauthorized, Message: "Failed to get PoW (invalid token or unknown error).", Code: "error"}
}
payload := stdReq.CompletionPayload(sessionID)
resp, err := ds.CallCompletion(ctx, a, payload, pow, maxAttempts)
if err != nil {
return StartResult{SessionID: sessionID, Payload: payload, Pow: pow}, &assistantturn.OutputError{Status: http.StatusInternalServerError, Message: "Failed to get completion.", Code: "error"}
}
return StartResult{SessionID: sessionID, Payload: payload, Pow: pow, Response: resp, Request: stdReq}, nil
}
func reuploadCurrentInputFileForAccount(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, stdReq promptcompat.StandardRequest, opts Options) (promptcompat.StandardRequest, *assistantturn.OutputError) {
if opts.CurrentInputFile == nil || !stdReq.CurrentInputFileApplied {
return stdReq, nil
}
out, err := (history.Service{Store: opts.CurrentInputFile, DS: ds}).ReuploadAppliedCurrentInputFile(ctx, a, stdReq)
if err != nil {
status, message := history.MapError(err)
return out, &assistantturn.OutputError{Status: status, Message: message, Code: "error"}
}
return out, nil
}
func collectAttempt(resp *http.Response, stdReq promptcompat.StandardRequest, usagePrompt string, opts Options) (assistantturn.Turn, *assistantturn.OutputError) {
defer func() {
if err := resp.Body.Close(); err != nil {

View File

@@ -7,7 +7,9 @@ import (
"strings"
"testing"
"ds2api/internal/account"
"ds2api/internal/auth"
"ds2api/internal/config"
dsclient "ds2api/internal/deepseek/client"
"ds2api/internal/promptcompat"
)
@@ -16,6 +18,8 @@ type fakeDeepSeekCaller struct {
responses []*http.Response
payloads []map[string]any
uploads []dsclient.UploadFileRequest
completionAccounts []string
sessionByAccount bool
}
type currentInputRuntimeConfig struct{}
@@ -23,7 +27,10 @@ type currentInputRuntimeConfig struct{}
func (currentInputRuntimeConfig) CurrentInputFileEnabled() bool { return true }
func (currentInputRuntimeConfig) CurrentInputFileMinChars() int { return 0 }
func (f *fakeDeepSeekCaller) CreateSession(context.Context, *auth.RequestAuth, int) (string, error) {
func (f *fakeDeepSeekCaller) CreateSession(_ context.Context, a *auth.RequestAuth, _ int) (string, error) {
if f.sessionByAccount && a != nil && a.AccountID != "" {
return "session-" + a.AccountID, nil
}
return "session-1", nil
}
@@ -31,13 +38,19 @@ func (f *fakeDeepSeekCaller) GetPow(context.Context, *auth.RequestAuth, int) (st
return "pow", nil
}
func (f *fakeDeepSeekCaller) UploadFile(_ context.Context, _ *auth.RequestAuth, req dsclient.UploadFileRequest, _ int) (*dsclient.UploadFileResult, error) {
func (f *fakeDeepSeekCaller) UploadFile(_ context.Context, a *auth.RequestAuth, req dsclient.UploadFileRequest, _ int) (*dsclient.UploadFileResult, error) {
f.uploads = append(f.uploads, req)
if a != nil && a.AccountID != "" {
return &dsclient.UploadFileResult{ID: "file-runtime-" + a.AccountID}, nil
}
return &dsclient.UploadFileResult{ID: "file-runtime-1"}, nil
}
func (f *fakeDeepSeekCaller) CallCompletion(_ context.Context, _ *auth.RequestAuth, payload map[string]any, _ string, _ int) (*http.Response, error) {
func (f *fakeDeepSeekCaller) CallCompletion(_ context.Context, a *auth.RequestAuth, payload map[string]any, _ string, _ int) (*http.Response, error) {
f.payloads = append(f.payloads, payload)
if a != nil {
f.completionAccounts = append(f.completionAccounts, a.AccountID)
}
if len(f.responses) == 0 {
return sseHTTPResponse(http.StatusOK, `data: {"p":"response/content","v":"fallback"}`), nil
}
@@ -89,9 +102,132 @@ func TestExecuteNonStreamWithRetryBuildsCanonicalTurn(t *testing.T) {
}
}
func TestExecuteNonStreamWithRetrySwitchesManagedAccountBeforeFinal429(t *testing.T) {
t.Setenv("DS2API_CONFIG_JSON", `{
"keys":["managed-key"],
"accounts":[
{"email":"acc1@test.com","password":"pwd"},
{"email":"acc2@test.com","password":"pwd"}
]
}`)
store := config.LoadStore()
resolver := auth.NewResolver(store, account.NewPool(store), func(_ context.Context, acc config.Account) (string, error) {
return "token-" + acc.Identifier(), nil
})
req, _ := http.NewRequest(http.MethodPost, "/", nil)
req.Header.Set("Authorization", "Bearer managed-key")
a, err := resolver.Determine(req)
if err != nil {
t.Fatalf("determine failed: %v", err)
}
defer resolver.Release(a)
ds := &fakeDeepSeekCaller{
sessionByAccount: true,
responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":11,"p":"response/thinking_content","v":"first empty"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":12,"p":"response/thinking_content","v":"retry empty"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":21,"p":"response/content","v":"ok from second account"}`),
},
}
stdReq := promptcompat.StandardRequest{
Surface: "test",
ResponseModel: "deepseek-v4-flash",
PromptTokenText: "prompt",
FinalPrompt: "final prompt",
Thinking: true,
}
result, outErr := ExecuteNonStreamWithRetry(context.Background(), ds, a, stdReq, Options{RetryEnabled: true})
if outErr != nil {
t.Fatalf("unexpected output error after account switch retry: %#v", outErr)
}
if result.Turn.Text != "ok from second account" {
t.Fatalf("text mismatch after switch retry: %q", result.Turn.Text)
}
if result.SessionID != "session-acc2@test.com" {
t.Fatalf("expected switched account session, got %q", result.SessionID)
}
wantAccounts := []string{"acc1@test.com", "acc1@test.com", "acc2@test.com"}
if len(ds.completionAccounts) != len(wantAccounts) {
t.Fatalf("completion account count mismatch: got %v want %v", ds.completionAccounts, wantAccounts)
}
for i, want := range wantAccounts {
if ds.completionAccounts[i] != want {
t.Fatalf("completion account %d = %q want %q (all=%v)", i, ds.completionAccounts[i], want, ds.completionAccounts)
}
}
if got := ds.payloads[2]["chat_session_id"]; got != "session-acc2@test.com" {
t.Fatalf("switched payload session mismatch: %#v", got)
}
if prompt, _ := ds.payloads[2]["prompt"].(string); strings.Contains(prompt, "Previous reply had no visible output") {
t.Fatalf("expected fresh switched-account prompt without empty-output suffix, got %q", prompt)
}
}
func TestExecuteNonStreamWithRetryReuploadsCurrentInputFileAfterAccountSwitch(t *testing.T) {
t.Setenv("DS2API_CONFIG_JSON", `{
"keys":["managed-key"],
"accounts":[
{"email":"acc1@test.com","password":"pwd"},
{"email":"acc2@test.com","password":"pwd"}
]
}`)
store := config.LoadStore()
resolver := auth.NewResolver(store, account.NewPool(store), func(_ context.Context, acc config.Account) (string, error) {
return "token-" + acc.Identifier(), nil
})
req, _ := http.NewRequest(http.MethodPost, "/", nil)
req.Header.Set("Authorization", "Bearer managed-key")
a, err := resolver.Determine(req)
if err != nil {
t.Fatalf("determine failed: %v", err)
}
defer resolver.Release(a)
ds := &fakeDeepSeekCaller{
sessionByAccount: true,
responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":11,"p":"response/thinking_content","v":"first empty"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":12,"p":"response/thinking_content","v":"retry empty"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":21,"p":"response/content","v":"ok from second account"}`),
},
}
stdReq := promptcompat.StandardRequest{
Surface: "test",
RequestedModel: "deepseek-v4-flash",
ResolvedModel: "deepseek-v4-flash",
ResponseModel: "deepseek-v4-flash",
Messages: []any{
map[string]any{"role": "user", "content": "large current input"},
},
PromptTokenText: "large current input",
FinalPrompt: "large current input",
Thinking: true,
}
result, outErr := ExecuteNonStreamWithRetry(context.Background(), ds, a, stdReq, Options{
RetryEnabled: true,
CurrentInputFile: currentInputRuntimeConfig{},
})
if outErr != nil {
t.Fatalf("unexpected output error after account switch retry: %#v", outErr)
}
if result.Turn.Text != "ok from second account" {
t.Fatalf("text mismatch after switch retry: %q", result.Turn.Text)
}
if len(ds.uploads) != 2 {
t.Fatalf("expected current input file uploaded once per account, got %d", len(ds.uploads))
}
refIDs, _ := ds.payloads[2]["ref_file_ids"].([]any)
if len(refIDs) != 1 || refIDs[0] != "file-runtime-acc2@test.com" {
t.Fatalf("expected switched account ref_file_ids to use reuploaded file, got %#v", ds.payloads[2]["ref_file_ids"])
}
}
func TestExecuteNonStreamWithRetryUsesParentMessageForEmptyRetry(t *testing.T) {
ds := &fakeDeepSeekCaller{responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":77,"p":"response/status","v":"FINISHED"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":77,"p":"response/thinking_content","v":"plan"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":78,"p":"response/content","v":"ok"}`),
}}
stdReq := promptcompat.StandardRequest{

View File

@@ -0,0 +1,190 @@
package completionruntime
import (
"context"
"io"
"net/http"
"strings"
"ds2api/internal/assistantturn"
"ds2api/internal/auth"
"ds2api/internal/config"
"ds2api/internal/httpapi/openai/history"
"ds2api/internal/httpapi/openai/shared"
"ds2api/internal/promptcompat"
)
type StreamRetryOptions struct {
Surface string
Stream bool
RetryEnabled bool
RetryMaxAttempts int
MaxAttempts int
UsagePrompt string
Request promptcompat.StandardRequest
CurrentInputFile history.CurrentInputConfigReader
}
type StreamRetryHooks struct {
ConsumeAttempt func(resp *http.Response, allowDeferEmpty bool) (terminalWritten bool, retryable bool)
Finalize func(attempts int)
ParentMessageID func() int
OnRetry func(attempts int)
OnRetryPrompt func(prompt string)
OnRetryFailure func(status int, message, code string)
OnAccountSwitch func(sessionID string)
OnTerminal func(attempts int)
}
func ExecuteStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, initialResp *http.Response, payload map[string]any, pow string, opts StreamRetryOptions, hooks StreamRetryHooks) {
if hooks.ConsumeAttempt == nil {
return
}
surface := strings.TrimSpace(opts.Surface)
if surface == "" {
surface = "completion"
}
maxAttempts := opts.MaxAttempts
if maxAttempts <= 0 {
maxAttempts = 3
}
retryMax := opts.RetryMaxAttempts
if retryMax <= 0 {
retryMax = shared.EmptyOutputRetryMaxAttempts()
}
attempts := 0
accountSwitchAttempted := false
currentResp := initialResp
currentPayload := clonePayload(payload)
for {
allowAccountSwitch := opts.RetryEnabled && attempts >= retryMax && !accountSwitchAttempted && a != nil && a.UseConfigToken
terminalWritten, retryable := hooks.ConsumeAttempt(currentResp, opts.RetryEnabled && (attempts < retryMax || allowAccountSwitch))
if terminalWritten {
if hooks.OnTerminal != nil {
hooks.OnTerminal(attempts)
}
return
}
if !retryable || !opts.RetryEnabled {
if hooks.Finalize != nil {
hooks.Finalize(attempts)
}
return
}
if attempts >= retryMax {
if canRetryOnAlternateAccount(ctx, a, &assistantturn.OutputError{Status: http.StatusTooManyRequests}, opts.RetryEnabled, &accountSwitchAttempted) {
switched, switchErr := startPayloadCompletionOnAlternateAccount(ctx, ds, a, payload, opts, maxAttempts)
if switchErr != nil {
if hooks.OnRetryFailure != nil {
hooks.OnRetryFailure(switchErr.Status, switchErr.Message, switchErr.Code)
}
return
}
if switched.Response != nil {
config.Logger.Info("[completion_runtime_account_switch_retry] retrying after 429", "surface", surface, "stream", opts.Stream, "account", a.AccountID)
currentResp = switched.Response
currentPayload = switched.Payload
pow = switched.Pow
if hooks.OnAccountSwitch != nil {
hooks.OnAccountSwitch(switched.SessionID)
}
if hooks.OnRetryPrompt != nil {
hooks.OnRetryPrompt(opts.UsagePrompt)
}
continue
}
}
if hooks.Finalize != nil {
hooks.Finalize(attempts)
}
return
}
attempts++
parentMessageID := 0
if hooks.ParentMessageID != nil {
parentMessageID = hooks.ParentMessageID()
}
config.Logger.Info("[completion_runtime_empty_retry] attempting synthetic retry", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "parent_message_id", parentMessageID)
retryPow, powErr := ds.GetPow(ctx, a, maxAttempts)
if powErr != nil {
config.Logger.Warn("[completion_runtime_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
nextResp, err := ds.CallCompletion(ctx, a, shared.ClonePayloadForEmptyOutputRetry(currentPayload, parentMessageID), retryPow, maxAttempts)
if err != nil {
if hooks.OnRetryFailure != nil {
hooks.OnRetryFailure(http.StatusInternalServerError, "Failed to get completion.", "error")
}
config.Logger.Warn("[completion_runtime_empty_retry] retry request failed", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "error", err)
return
}
if nextResp.StatusCode != http.StatusOK {
body, readErr := io.ReadAll(nextResp.Body)
if readErr != nil {
config.Logger.Warn("[completion_runtime_empty_retry] retry error body read failed", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "error", readErr)
}
closeRetryBody(surface, nextResp.Body)
msg := strings.TrimSpace(string(body))
if msg == "" {
msg = http.StatusText(nextResp.StatusCode)
}
if hooks.OnRetryFailure != nil {
hooks.OnRetryFailure(nextResp.StatusCode, msg, "error")
}
return
}
if hooks.OnRetry != nil {
hooks.OnRetry(attempts)
}
if hooks.OnRetryPrompt != nil {
hooks.OnRetryPrompt(shared.UsagePromptWithEmptyOutputRetry(opts.UsagePrompt, attempts))
}
currentResp = nextResp
}
}
func startPayloadCompletionOnAlternateAccount(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, payload map[string]any, opts StreamRetryOptions, maxAttempts int) (StartResult, *assistantturn.OutputError) {
sessionID, err := ds.CreateSession(ctx, a, maxAttempts)
if err != nil {
return StartResult{}, authOutputError(a)
}
pow, err := ds.GetPow(ctx, a, maxAttempts)
if err != nil {
return StartResult{SessionID: sessionID}, &assistantturn.OutputError{Status: http.StatusUnauthorized, Message: "Failed to get PoW (invalid token or unknown error).", Code: "error"}
}
nextPayload := clonePayload(payload)
if opts.CurrentInputFile != nil && opts.Request.CurrentInputFileApplied {
stdReq, prepErr := reuploadCurrentInputFileForAccount(ctx, ds, a, opts.Request, Options{CurrentInputFile: opts.CurrentInputFile})
if prepErr != nil {
return StartResult{SessionID: sessionID}, prepErr
}
nextPayload = stdReq.CompletionPayload(sessionID)
}
nextPayload["chat_session_id"] = sessionID
delete(nextPayload, "parent_message_id")
resp, err := ds.CallCompletion(ctx, a, nextPayload, pow, maxAttempts)
if err != nil {
return StartResult{SessionID: sessionID, Payload: nextPayload, Pow: pow}, &assistantturn.OutputError{Status: http.StatusInternalServerError, Message: "Failed to get completion.", Code: "error"}
}
return StartResult{SessionID: sessionID, Payload: nextPayload, Pow: pow, Response: resp}, nil
}
func clonePayload(payload map[string]any) map[string]any {
clone := make(map[string]any, len(payload))
for k, v := range payload {
clone[k] = v
}
return clone
}
func closeRetryBody(surface string, body io.Closer) {
if body == nil {
return
}
if err := body.Close(); err != nil {
config.Logger.Warn("[completion_runtime_empty_retry] retry response body close failed", "surface", surface, "error", err)
}
}

View File

@@ -0,0 +1,150 @@
package completionruntime
import (
"context"
"io"
"net/http"
"strings"
"testing"
"ds2api/internal/account"
"ds2api/internal/auth"
"ds2api/internal/config"
"ds2api/internal/httpapi/openai/shared"
)
func TestExecuteStreamWithRetryUsesSharedRetryPayloadAndUsagePrompt(t *testing.T) {
ds := &fakeDeepSeekCaller{responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"p":"response/content","v":"ok"}`),
}}
initial := sseHTTPResponse(http.StatusOK, `data: {"response_message_id":77,"p":"response/thinking_content","v":"plan"}`)
payload := map[string]any{"prompt": "original prompt"}
attemptsSeen := 0
retryPrompt := ""
ExecuteStreamWithRetry(context.Background(), ds, &auth.RequestAuth{}, initial, payload, "pow", StreamRetryOptions{
Surface: "test.stream",
Stream: true,
RetryEnabled: true,
UsagePrompt: "original prompt",
}, StreamRetryHooks{
ConsumeAttempt: func(resp *http.Response, allowDeferEmpty bool) (bool, bool) {
defer func() {
if err := resp.Body.Close(); err != nil {
t.Fatalf("close failed: %v", err)
}
}()
_, _ = io.ReadAll(resp.Body)
attemptsSeen++
return attemptsSeen == 2, attemptsSeen == 1 && allowDeferEmpty
},
ParentMessageID: func() int {
return 77
},
OnRetryPrompt: func(prompt string) {
retryPrompt = prompt
},
})
if attemptsSeen != 2 {
t.Fatalf("expected two stream attempts, got %d", attemptsSeen)
}
if len(ds.payloads) != 1 {
t.Fatalf("expected one retry completion call, got %d", len(ds.payloads))
}
if got := ds.payloads[0]["parent_message_id"]; got != 77 {
t.Fatalf("retry parent_message_id mismatch: %#v", got)
}
if prompt, _ := ds.payloads[0]["prompt"].(string); !strings.Contains(prompt, shared.EmptyOutputRetrySuffix) {
t.Fatalf("expected retry suffix in payload prompt, got %q", prompt)
}
if !strings.Contains(retryPrompt, shared.EmptyOutputRetrySuffix) {
t.Fatalf("expected retry suffix in usage prompt, got %q", retryPrompt)
}
}
func TestExecuteStreamWithRetrySwitchesManagedAccountBeforeFinal429(t *testing.T) {
t.Setenv("DS2API_CONFIG_JSON", `{
"keys":["managed-key"],
"accounts":[
{"email":"acc1@test.com","password":"pwd"},
{"email":"acc2@test.com","password":"pwd"}
]
}`)
store := config.LoadStore()
resolver := auth.NewResolver(store, account.NewPool(store), func(_ context.Context, acc config.Account) (string, error) {
return "token-" + acc.Identifier(), nil
})
req, _ := http.NewRequest(http.MethodPost, "/", nil)
req.Header.Set("Authorization", "Bearer managed-key")
a, err := resolver.Determine(req)
if err != nil {
t.Fatalf("determine failed: %v", err)
}
defer resolver.Release(a)
ds := &fakeDeepSeekCaller{
sessionByAccount: true,
responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":12,"p":"response/thinking_content","v":"retry empty"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":21,"p":"response/content","v":"ok from second account"}`),
},
}
initial := sseHTTPResponse(http.StatusOK, `data: {"response_message_id":11,"p":"response/thinking_content","v":"first empty"}`)
payload := map[string]any{"prompt": "original prompt", "chat_session_id": "session-acc1@test.com"}
attemptsSeen := 0
switchedSession := ""
ExecuteStreamWithRetry(context.Background(), ds, a, initial, payload, "pow", StreamRetryOptions{
Surface: "test.stream",
Stream: true,
RetryEnabled: true,
RetryMaxAttempts: 1,
UsagePrompt: "original prompt",
}, StreamRetryHooks{
ConsumeAttempt: func(resp *http.Response, allowDeferEmpty bool) (bool, bool) {
defer func() {
if err := resp.Body.Close(); err != nil {
t.Fatalf("close failed: %v", err)
}
}()
body, _ := io.ReadAll(resp.Body)
attemptsSeen++
if strings.Contains(string(body), "ok from second account") {
return true, false
}
if !allowDeferEmpty {
t.Fatalf("expected empty attempt %d to be deferred before final 429", attemptsSeen)
}
return false, true
},
ParentMessageID: func() int {
return 11 + attemptsSeen
},
OnAccountSwitch: func(sessionID string) {
switchedSession = sessionID
},
})
if attemptsSeen != 3 {
t.Fatalf("expected three stream attempts, got %d", attemptsSeen)
}
if switchedSession != "session-acc2@test.com" {
t.Fatalf("expected switched session id, got %q", switchedSession)
}
wantAccounts := []string{"acc1@test.com", "acc2@test.com"}
if len(ds.completionAccounts) != len(wantAccounts) {
t.Fatalf("completion accounts mismatch: got %v want %v", ds.completionAccounts, wantAccounts)
}
for i, want := range wantAccounts {
if ds.completionAccounts[i] != want {
t.Fatalf("completion account %d = %q want %q (all=%v)", i, ds.completionAccounts[i], want, ds.completionAccounts)
}
}
if got := ds.payloads[1]["chat_session_id"]; got != "session-acc2@test.com" {
t.Fatalf("switched payload session mismatch: %#v", got)
}
if prompt, _ := ds.payloads[1]["prompt"].(string); strings.Contains(prompt, shared.EmptyOutputRetrySuffix) {
t.Fatalf("expected switched-account prompt without empty-output suffix, got %q", prompt)
}
}

View File

@@ -1,6 +1,9 @@
package config
import "strings"
import (
"strings"
"time"
)
type ModelInfo struct {
ID string `json:"id"`
@@ -9,6 +12,16 @@ type ModelInfo struct {
OwnedBy string `json:"owned_by"`
Permission []any `json:"permission,omitempty"`
}
type OllamaModelInfo struct {
Name string `json:"name"`
Model string `json:"model"`
Size int64 `json:"size"`
ModifiedAt string `json:"modified_at"`
}
type OllamaCapabilitiesModelInfo struct {
ID string `json:"id"`
Capabilities []string `json:"capabilities"`
}
type ModelAliasReader interface {
ModelAliases() map[string]string
@@ -24,8 +37,21 @@ var deepSeekBaseModels = []ModelInfo{
{ID: "deepseek-v4-vision", Object: "model", Created: 1677610602, OwnedBy: "deepseek", Permission: []any{}},
}
var DeepSeekModels = appendNoThinkingVariants(deepSeekBaseModels)
var OllamaCapabilitiesModels = []OllamaCapabilitiesModelInfo{
{ID: "deepseek-v4-flash", Capabilities: []string{"tools", "thinking"}},
{ID: "deepseek-v4-pro", Capabilities: []string{"tools", "thinking"}},
{ID: "deepseek-v4-flash-search", Capabilities: []string{"tools", "thinking"}},
{ID: "deepseek-v4-pro-search", Capabilities: []string{"tools", "thinking"}},
{ID: "deepseek-v4-vision", Capabilities: []string{"tools", "thinking", "vision"}},
{ID: "deepseek-v4-flash-nothinking", Capabilities: []string{"tools"}},
{ID: "deepseek-v4-pro-nothinking", Capabilities: []string{"tools"}},
{ID: "deepseek-v4-flash-search-nothinking", Capabilities: []string{"tools"}},
{ID: "deepseek-v4-pro-search-nothinking", Capabilities: []string{"tools"}},
{ID: "deepseek-v4-vision-nothinking", Capabilities: []string{"tools", "vision"}},
}
var DeepSeekModels = appendNoThinkingVariants(deepSeekBaseModels)
var OllamaModels = mapToOllamaModels(DeepSeekModels)
var claudeBaseModels = []ModelInfo{
// Current aliases
{ID: "claude-opus-4-6", Object: "model", Created: 1715635200, OwnedBy: "anthropic"},
@@ -247,6 +273,23 @@ func OpenAIModelByID(store ModelAliasReader, id string) (ModelInfo, bool) {
return ModelInfo{}, false
}
func OllamaModelsResponse() map[string]any {
return map[string]any{"models": OllamaModels}
}
func OllamaModelByID(store ModelAliasReader, id string) (OllamaCapabilitiesModelInfo, bool) {
canonical, ok := ResolveModel(store, id)
if !ok {
return OllamaCapabilitiesModelInfo{}, false
}
for _, model := range OllamaCapabilitiesModels {
if model.ID == canonical {
return model, true
}
}
return OllamaCapabilitiesModelInfo{}, false
}
func ClaudeModelsResponse() map[string]any {
resp := map[string]any{"object": "list", "data": ClaudeModels}
if len(ClaudeModels) > 0 {
@@ -270,6 +313,23 @@ func appendNoThinkingVariants(models []ModelInfo) []ModelInfo {
}
return out
}
func mapToOllamaModels(models []ModelInfo) []OllamaModelInfo {
out := make([]OllamaModelInfo, 0, len(models))
for _, model := range models {
var modifiedAt string
if model.Created > 0 {
modifiedAt = time.Unix(model.Created, 0).Format(time.RFC3339)
}
ollamaModel := OllamaModelInfo{
Name: model.ID,
Model: model.ID,
Size: 0,
ModifiedAt: modifiedAt,
}
out = append(out, ollamaModel)
}
return out
}
func splitNoThinkingModel(model string) (string, bool) {
model = lower(strings.TrimSpace(model))

View File

@@ -58,6 +58,11 @@ func RawStreamSampleRoot() string {
}
func ChatHistoryPath() string {
// On Vercel, /var/task is read-only at runtime. If no explicit path is set,
// default to /tmp/chat_history.json (the only writable directory).
if IsVercel() && strings.TrimSpace(os.Getenv("DS2API_CHAT_HISTORY_PATH")) == "" {
return "/tmp/chat_history.json"
}
return ResolvePath("DS2API_CHAT_HISTORY_PATH", "data/chat_history.json")
}

View File

@@ -5,9 +5,7 @@ import (
"context"
dsprotocol "ds2api/internal/deepseek/protocol"
"encoding/json"
"errors"
"net/http"
"time"
"ds2api/internal/auth"
"ds2api/internal/config"
@@ -15,39 +13,33 @@ import (
)
func (c *Client) CallCompletion(ctx context.Context, a *auth.RequestAuth, payload map[string]any, powResp string, maxAttempts int) (*http.Response, error) {
if maxAttempts <= 0 {
maxAttempts = c.maxRetries
}
_ = maxAttempts
clients := c.requestClientsForAuth(ctx, a)
headers := c.authHeaders(a.DeepSeekToken)
headers["x-ds-pow-response"] = powResp
captureSession := c.capture.Start("deepseek_completion", dsprotocol.DeepSeekCompletionURL, a.AccountID, payload)
attempts := 0
for attempts < maxAttempts {
resp, err := c.streamPost(ctx, clients.stream, dsprotocol.DeepSeekCompletionURL, headers, payload)
resp, err := c.streamPostOnce(ctx, clients.stream, dsprotocol.DeepSeekCompletionURL, headers, payload)
if err != nil {
attempts++
time.Sleep(time.Second)
continue
return nil, err
}
if captureSession != nil {
resp.Body = captureSession.WrapBody(resp.Body, resp.StatusCode)
}
if resp.StatusCode == http.StatusOK {
if captureSession != nil {
resp.Body = captureSession.WrapBody(resp.Body, resp.StatusCode)
}
resp = c.wrapCompletionWithAutoContinue(ctx, a, payload, powResp, resp)
}
return resp, nil
}
if captureSession != nil {
resp.Body = captureSession.WrapBody(resp.Body, resp.StatusCode)
}
_ = resp.Body.Close()
attempts++
time.Sleep(time.Second)
}
return nil, errors.New("completion failed")
}
func (c *Client) streamPost(ctx context.Context, doer trans.Doer, url string, headers map[string]string, payload any) (*http.Response, error) {
return c.streamPostWithFallback(ctx, doer, url, headers, payload, true)
}
func (c *Client) streamPostOnce(ctx context.Context, doer trans.Doer, url string, headers map[string]string, payload any) (*http.Response, error) {
return c.streamPostWithFallback(ctx, doer, url, headers, payload, false)
}
func (c *Client) streamPostWithFallback(ctx context.Context, doer trans.Doer, url string, headers map[string]string, payload any, allowFallback bool) (*http.Response, error) {
b, err := json.Marshal(payload)
if err != nil {
return nil, err
@@ -63,6 +55,7 @@ func (c *Client) streamPost(ctx context.Context, doer trans.Doer, url string, he
}
resp, err := doer.Do(req)
if err != nil {
if allowFallback {
config.Logger.Warn("[deepseek] fingerprint stream request failed, fallback to std transport", "url", url, "error", err)
req2, reqErr := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(b))
if reqErr != nil {
@@ -73,5 +66,7 @@ func (c *Client) streamPost(ctx context.Context, doer trans.Doer, url string, he
}
return clients.fallbackS.Do(req2)
}
return nil, err
}
return resp, nil
}

View File

@@ -0,0 +1,36 @@
package client
import (
"context"
"errors"
"net/http"
"testing"
"ds2api/internal/auth"
)
func TestCallCompletionDoesNotFallbackForNonIdempotentCompletion(t *testing.T) {
var fallbackCalled bool
client := &Client{
stream: doerFunc(func(*http.Request) (*http.Response, error) {
return nil, errors.New("ambiguous completion write failure")
}),
fallbackS: &http.Client{Transport: roundTripperFunc(func(*http.Request) (*http.Response, error) {
fallbackCalled = true
return &http.Response{StatusCode: http.StatusOK}, nil
})},
}
_, err := client.CallCompletion(
context.Background(),
&auth.RequestAuth{DeepSeekToken: "token"},
map[string]any{"prompt": "hello"},
"pow",
3,
)
if err == nil {
t.Fatal("expected completion error")
}
if fallbackCalled {
t.Fatal("completion fallback should not be called for a non-idempotent request")
}
}

View File

@@ -95,11 +95,7 @@ func (c *Client) UploadFile(ctx context.Context, a *auth.RequestAuth, req Upload
resp, err := c.doUpload(ctx, clients.regular, clients.fallback, dsprotocol.DeepSeekUploadFileURL, headers, body)
if err != nil {
config.Logger.Warn("[upload_file] request error", "error", err, "account", a.AccountID, "filename", filename)
powHeader = ""
lastFailureKind = FailureUnknown
lastFailureMessage = err.Error()
attempts++
continue
return nil, err
}
if captureSession != nil {
resp.Body = captureSession.WrapBody(resp.Body, resp.StatusCode)
@@ -201,7 +197,7 @@ func escapeMultipartFilename(filename string) string {
return filename
}
func (c *Client) doUpload(ctx context.Context, doer trans.Doer, fallback trans.Doer, url string, headers map[string]string, body []byte) (*http.Response, error) {
func (c *Client) doUpload(ctx context.Context, doer trans.Doer, _ trans.Doer, url string, headers map[string]string, body []byte) (*http.Response, error) {
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
if err != nil {
return nil, err
@@ -213,15 +209,7 @@ func (c *Client) doUpload(ctx context.Context, doer trans.Doer, fallback trans.D
if err == nil {
return resp, nil
}
config.Logger.Warn("[deepseek] fingerprint upload request failed, fallback to std transport", "url", url, "error", err)
req2, reqErr := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
if reqErr != nil {
return nil, reqErr
}
for k, v := range headers {
req2.Header.Set(k, v)
}
return fallback.Do(req2)
return nil, err
}
func extractUploadFileResult(resp map[string]any) *UploadFileResult {

View File

@@ -6,6 +6,7 @@ import (
"encoding/base64"
"encoding/hex"
"encoding/json"
"errors"
"io"
"net/http"
"strings"
@@ -39,6 +40,31 @@ func TestBuildUploadMultipartBodyOmitsPurposeAndIncludesFilePart(t *testing.T) {
}
}
func TestDoUploadDoesNotFallbackForNonIdempotentUpload(t *testing.T) {
var fallbackCalled bool
client := &Client{}
_, err := client.doUpload(
context.Background(),
doerFunc(func(req *http.Request) (*http.Response, error) {
_, _ = io.ReadAll(req.Body)
return nil, errors.New("ambiguous upload write failure")
}),
doerFunc(func(*http.Request) (*http.Response, error) {
fallbackCalled = true
return &http.Response{StatusCode: http.StatusOK, Header: make(http.Header), Body: io.NopCloser(strings.NewReader("{}"))}, nil
}),
dsprotocol.DeepSeekUploadFileURL,
map[string]string{"Content-Type": "multipart/form-data"},
[]byte("body"),
)
if err == nil {
t.Fatal("expected upload error")
}
if fallbackCalled {
t.Fatal("upload fallback should not be called for a non-idempotent request")
}
}
func TestExtractUploadFileResultSupportsNestedShapes(t *testing.T) {
got := extractUploadFileResult(map[string]any{
"data": map[string]any{

View File

@@ -21,6 +21,18 @@ func BuildResponseObjectWithToolCalls(responseID, model, finalPrompt, finalThink
output := make([]any, 0, 2)
if len(detected) > 0 {
exposedOutputText = ""
if strings.TrimSpace(finalThinking) != "" {
output = append(output, map[string]any{
"type": "message",
"id": "msg_" + strings.ReplaceAll(uuid.NewString(), "-", ""),
"role": "assistant",
"status": "completed",
"content": []any{map[string]any{
"type": "reasoning",
"text": finalThinking,
}},
})
}
output = append(output, toResponsesFunctionCallItems(detected, toolsRaw)...)
} else {
content := make([]any, 0, 2)

View File

@@ -85,12 +85,24 @@ func TestBuildResponseObjectPromotesToolCallFromThinkingWhenTextEmpty(t *testing
)
output, _ := obj["output"].([]any)
if len(output) != 1 {
t.Fatalf("expected one output item, got %#v", obj["output"])
if len(output) != 2 {
t.Fatalf("expected reasoning message plus function_call output, got %#v", obj["output"])
}
first, _ := output[0].(map[string]any)
if first["type"] != "function_call" {
t.Fatalf("expected function_call output, got %#v", first["type"])
if first["type"] != "message" {
t.Fatalf("expected reasoning message output first, got %#v", first["type"])
}
content, _ := first["content"].([]any)
if len(content) != 1 {
t.Fatalf("expected reasoning content, got %#v", first["content"])
}
block0, _ := content[0].(map[string]any)
if block0["type"] != "reasoning" {
t.Fatalf("expected reasoning block, got %#v", block0["type"])
}
second, _ := output[1].(map[string]any)
if second["type"] != "function_call" {
t.Fatalf("expected function_call output, got %#v", second["type"])
}
}

View File
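With the change above, a tool-call turn that carries thinking now yields a two-item `output` array: a reasoning-bearing assistant message first, then the `function_call` item. A minimal sketch of that shape, with field names taken from the diff and illustrative values (the `search_web` call is a hypothetical example, not from this hunk):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Reasoning message first, then the function_call item, matching the
	// ordering the updated test asserts on output[0] and output[1].
	output := []any{
		map[string]any{
			"type":   "message",
			"role":   "assistant",
			"status": "completed",
			"content": []any{map[string]any{
				"type": "reasoning",
				"text": "need live search before answering",
			}},
		},
		map[string]any{
			"type": "function_call",
			"name": "search_web",
		},
	}
	b, _ := json.MarshalIndent(map[string]any{"output": output}, "", "  ")
	fmt.Println(string(b))
}
```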

@@ -93,7 +93,11 @@ func (d *claudeCurrentInputDS) GetPow(context.Context, *auth.RequestAuth, int) (
func (d *claudeCurrentInputDS) UploadFile(_ context.Context, _ *auth.RequestAuth, req dsclient.UploadFileRequest, _ int) (*dsclient.UploadFileResult, error) {
d.uploads = append(d.uploads, req)
return &dsclient.UploadFileResult{ID: "file-claude-history"}, nil
id := "file-claude-history"
if len(d.uploads) > 1 {
id = "file-claude-tools"
}
return &dsclient.UploadFileResult{ID: id}, nil
}
func (d *claudeCurrentInputDS) CallCompletion(_ context.Context, _ *auth.RequestAuth, payload map[string]any, _ string, _ int) (*http.Response, error) {
@@ -156,3 +160,47 @@ func TestClaudeDirectAppliesCurrentInputFile(t *testing.T) {
t.Fatalf("expected persisted message to match upstream continuation prompt, got %#v", full.Messages)
}
}
func TestClaudeCurrentInputFileUploadsToolsSeparately(t *testing.T) {
ds := &claudeCurrentInputDS{}
h := &Handler{
Store: mockClaudeConfig{aliases: map[string]string{"claude-sonnet-4-6": "deepseek-v4-flash"}},
Auth: claudeCurrentInputAuth{},
DS: ds,
}
reqBody := `{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"hello from claude"}],"tools":[{"name":"search","description":"Search docs","input_schema":{"type":"object"}}],"max_tokens":1024}`
req := httptest.NewRequest(http.MethodPost, "/v1/messages", strings.NewReader(reqBody))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
h.Messages(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
if len(ds.uploads) != 2 {
t.Fatalf("expected history and tools uploads, got %d", len(ds.uploads))
}
if ds.uploads[0].Filename != "DS2API_HISTORY.txt" || ds.uploads[1].Filename != "DS2API_TOOLS.txt" {
t.Fatalf("unexpected upload filenames: %#v", ds.uploads)
}
historyText := string(ds.uploads[0].Data)
if strings.Contains(historyText, "You have access to these tools") || strings.Contains(historyText, "Description: Search docs") {
t.Fatalf("history transcript should not embed tool descriptions, got %q", historyText)
}
toolsText := string(ds.uploads[1].Data)
if !strings.Contains(toolsText, "# DS2API_TOOLS.txt") || !strings.Contains(toolsText, "Tool: search") || !strings.Contains(toolsText, "Description: Search docs") {
t.Fatalf("expected tools transcript to include tool schema, got %q", toolsText)
}
refIDs, _ := ds.payload["ref_file_ids"].([]any)
if len(refIDs) < 2 || refIDs[0] != "file-claude-history" || refIDs[1] != "file-claude-tools" {
t.Fatalf("expected history and tools ref ids first, got %#v", ds.payload["ref_file_ids"])
}
prompt, _ := ds.payload["prompt"].(string)
if !strings.Contains(prompt, "DS2API_TOOLS.txt") || !strings.Contains(prompt, "TOOL CALL FORMAT") {
t.Fatalf("expected live prompt to reference tools file and retain format instructions, got %q", prompt)
}
if strings.Contains(prompt, "Description: Search docs") {
t.Fatalf("live prompt should not inline tool descriptions, got %q", prompt)
}
}

View File

@@ -145,7 +145,7 @@ func (h *Handler) handleClaudeDirectStream(w http.ResponseWriter, r *http.Reques
return
}
streamReq := start.Request
h.handleClaudeStreamRealtime(w, r, start.Response, streamReq.ResponseModel, streamReq.Messages, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, historySession)
h.handleClaudeStreamRealtimeWithRetry(w, r, a, start.Response, start.Payload, start.Pow, streamReq, streamReq.ResponseModel, streamReq.Messages, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.PromptTokenText, historySession)
}
func (h *Handler) proxyViaOpenAI(w http.ResponseWriter, r *http.Request, store ConfigReader) bool {
@@ -360,3 +360,114 @@ func (h *Handler) handleClaudeStreamRealtime(w http.ResponseWriter, r *http.Requ
OnFinalize: streamRuntime.onFinalize,
})
}
func (h *Handler) handleClaudeStreamRealtimeWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow string, stdReq promptcompat.StandardRequest, model string, messages []any, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, promptTokenText string, historySession *responsehistory.Session) {
if resp.StatusCode != http.StatusOK {
defer func() { _ = resp.Body.Close() }()
body, _ := io.ReadAll(resp.Body)
if historySession != nil {
historySession.Error(resp.StatusCode, strings.TrimSpace(string(body)), "error", "", "")
}
writeClaudeError(w, http.StatusInternalServerError, string(body))
return
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache, no-transform")
w.Header().Set("Connection", "keep-alive")
w.Header().Set("X-Accel-Buffering", "no")
rc := http.NewResponseController(w)
_, canFlush := w.(http.Flusher)
if !canFlush {
config.Logger.Warn("[claude_stream] response writer does not support flush; streaming may be buffered")
}
streamRuntime := newClaudeStreamRuntime(
w,
rc,
canFlush,
model,
messages,
thinkingEnabled,
searchEnabled,
stripReferenceMarkersEnabled(),
toolNames,
toolsRaw,
promptTokenText,
historySession,
)
streamRuntime.sendMessageStart()
completionruntime.ExecuteStreamWithRetry(r.Context(), h.DS, a, resp, payload, pow, completionruntime.StreamRetryOptions{
Surface: "claude.messages",
Stream: true,
RetryEnabled: true,
MaxAttempts: 3,
UsagePrompt: promptTokenText,
Request: stdReq,
CurrentInputFile: h.Store,
}, completionruntime.StreamRetryHooks{
ConsumeAttempt: func(currentResp *http.Response, allowDeferEmpty bool) (bool, bool) {
return h.consumeClaudeStreamAttempt(r, currentResp, streamRuntime, thinkingEnabled, allowDeferEmpty)
},
Finalize: func(_ int) {
streamRuntime.finalize("end_turn", false)
},
ParentMessageID: func() int {
return streamRuntime.responseMessageID
},
OnRetryPrompt: func(prompt string) {
streamRuntime.promptTokenText = prompt
},
OnRetryFailure: func(status int, message, code string) {
streamRuntime.sendErrorWithCode(status, strings.TrimSpace(message), code)
},
})
}
func (h *Handler) consumeClaudeStreamAttempt(r *http.Request, resp *http.Response, streamRuntime *claudeStreamRuntime, thinkingEnabled bool, allowDeferEmpty bool) (bool, bool) {
defer func() { _ = resp.Body.Close() }()
initialType := "text"
if thinkingEnabled {
initialType = "thinking"
}
finalReason := streamengine.StopReason("")
var scannerErr error
streamengine.ConsumeSSE(streamengine.ConsumeConfig{
Context: r.Context(),
Body: resp.Body,
ThinkingEnabled: thinkingEnabled,
InitialType: initialType,
KeepAliveInterval: claudeStreamPingInterval,
IdleTimeout: claudeStreamIdleTimeout,
MaxKeepAliveNoInput: claudeStreamMaxKeepaliveCnt,
}, streamengine.ConsumeHooks{
OnKeepAlive: func() {
streamRuntime.sendPing()
},
OnParsed: streamRuntime.onParsed,
OnFinalize: func(reason streamengine.StopReason, err error) {
finalReason = reason
scannerErr = err
},
})
if string(finalReason) == "upstream_error" {
if streamRuntime.history != nil {
streamRuntime.history.Error(500, streamRuntime.upstreamErr, "upstream_error", responsehistory.ThinkingForArchive(streamRuntime.rawThinking.String(), streamRuntime.toolDetectionThinking.String(), streamRuntime.thinking.String()), responsehistory.TextForArchive(streamRuntime.rawText.String(), streamRuntime.text.String()))
}
streamRuntime.sendError(streamRuntime.upstreamErr)
return true, false
}
if scannerErr != nil {
if streamRuntime.history != nil {
streamRuntime.history.Error(500, scannerErr.Error(), "error", responsehistory.ThinkingForArchive(streamRuntime.rawThinking.String(), streamRuntime.toolDetectionThinking.String(), streamRuntime.thinking.String()), responsehistory.TextForArchive(streamRuntime.rawText.String(), streamRuntime.text.String()))
}
streamRuntime.sendError(scannerErr.Error())
return true, false
}
terminalWritten := streamRuntime.finalize("end_turn", allowDeferEmpty)
if terminalWritten {
return true, false
}
return false, true
}

View File

@@ -101,6 +101,43 @@ func TestNormalizeClaudeMessagesToolUseToAssistantToolCalls(t *testing.T) {
}
}
func TestNormalizeClaudeMessagesPreservesThinkingOnToolUseHistory(t *testing.T) {
msgs := []any{
map[string]any{
"role": "assistant",
"content": []any{
map[string]any{"type": "thinking", "thinking": "need live search before answering"},
map[string]any{
"type": "tool_use",
"id": "call_1",
"name": "search_web",
"input": map[string]any{"query": "latest"},
},
},
},
}
got := normalizeClaudeMessages(msgs)
if len(got) != 1 {
t.Fatalf("expected one normalized tool-call message, got %#v", got)
}
m := got[0].(map[string]any)
if m["reasoning_content"] != "need live search before answering" {
t.Fatalf("expected thinking preserved as reasoning_content, got %#v", m)
}
tc, _ := m["tool_calls"].([]any)
if len(tc) != 1 {
t.Fatalf("expected one tool call, got %#v", m["tool_calls"])
}
prompt := buildClaudePromptTokenText(got, true)
if !containsStr(prompt, "[reasoning_content]\nneed live search before answering\n[/reasoning_content]") {
t.Fatalf("expected thinking in prompt history, got %q", prompt)
}
if !containsStr(prompt, `<|DSML|invoke name="search_web">`) {
t.Fatalf("expected tool call in prompt history, got %q", prompt)
}
}
func TestNormalizeClaudeMessagesDoesNotPromoteUserToolUse(t *testing.T) {
msgs := []any{
map[string]any{

View File

@@ -25,14 +25,21 @@ func normalizeClaudeMessages(messages []any) []any {
switch content := msg["content"].(type) {
case []any:
textParts := make([]string, 0, len(content))
pendingThinking := ""
flushText := func() {
if len(textParts) == 0 {
return
}
out = append(out, map[string]any{
message := map[string]any{
"role": role,
"content": strings.Join(textParts, "\n"),
})
}
if role == "assistant" && strings.TrimSpace(pendingThinking) != "" {
message["reasoning_content"] = pendingThinking
message["content"] = prependClaudeReasoningForPrompt(pendingThinking, safeStringValue(message["content"]))
pendingThinking = ""
}
out = append(out, message)
textParts = textParts[:0]
}
for _, block := range content {
@@ -46,10 +53,29 @@ func normalizeClaudeMessages(messages []any) []any {
if t, ok := b["text"].(string); ok {
textParts = append(textParts, t)
}
case "thinking":
if role == "assistant" {
if thinking := extractClaudeThinkingBlockText(b); thinking != "" {
if pendingThinking == "" {
pendingThinking = thinking
} else {
pendingThinking += "\n" + thinking
}
}
continue
}
if raw := strings.TrimSpace(formatClaudeUnknownBlockForPrompt(b)); raw != "" {
textParts = append(textParts, raw)
}
case "tool_use":
if role == "assistant" {
flushText()
if toolMsg := normalizeClaudeToolUseToAssistant(b, state); toolMsg != nil {
if strings.TrimSpace(pendingThinking) != "" {
toolMsg["reasoning_content"] = pendingThinking
toolMsg["content"] = prependClaudeReasoningForPrompt(pendingThinking, safeStringValue(toolMsg["content"]))
pendingThinking = ""
}
out = append(out, toolMsg)
}
continue
@@ -69,6 +95,13 @@ func normalizeClaudeMessages(messages []any) []any {
}
}
flushText()
if role == "assistant" && strings.TrimSpace(pendingThinking) != "" {
out = append(out, map[string]any{
"role": "assistant",
"reasoning_content": pendingThinking,
"content": formatClaudeReasoningForPrompt(pendingThinking),
})
}
default:
copied := cloneMap(msg)
out = append(out, copied)
@@ -77,6 +110,39 @@ func normalizeClaudeMessages(messages []any) []any {
return out
}
func prependClaudeReasoningForPrompt(reasoning, content string) string {
reasoning = strings.TrimSpace(reasoning)
content = strings.TrimSpace(content)
if reasoning == "" {
return content
}
block := formatClaudeReasoningForPrompt(reasoning)
if content == "" {
return block
}
return block + "\n\n" + content
}
func formatClaudeReasoningForPrompt(reasoning string) string {
reasoning = strings.TrimSpace(reasoning)
if reasoning == "" {
return ""
}
return "[reasoning_content]\n" + reasoning + "\n[/reasoning_content]"
}
func extractClaudeThinkingBlockText(block map[string]any) string {
if block == nil {
return ""
}
for _, key := range []string{"thinking", "text", "content"} {
if text := strings.TrimSpace(safeStringValue(block[key])); text != "" {
return text
}
}
return ""
}
func buildClaudeToolPrompt(tools []any) string {
toolSchemas := make([]string, 0, len(tools))
names := make([]string, 0, len(tools))

View File
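The helpers above serialize preserved assistant thinking into the prompt transcript as a `[reasoning_content]` block ahead of the visible content. A standalone re-implementation of the same formatting, for illustration only:

```go
package main

import (
	"fmt"
	"strings"
)

// formatReasoning mirrors the [reasoning_content] wrapper used by
// formatClaudeReasoningForPrompt in the diff above.
func formatReasoning(reasoning string) string {
	reasoning = strings.TrimSpace(reasoning)
	if reasoning == "" {
		return ""
	}
	return "[reasoning_content]\n" + reasoning + "\n[/reasoning_content]"
}

// prependReasoning places the reasoning block before the visible content,
// separated by a blank line when both are present.
func prependReasoning(reasoning, content string) string {
	block := formatReasoning(reasoning)
	content = strings.TrimSpace(content)
	switch {
	case block == "":
		return content
	case content == "":
		return block
	default:
		return block + "\n\n" + content
	}
}

func main() {
	fmt.Println(prependReasoning("need live search", "Here is the answer."))
}
```

This is what the new test checks for when it expects `[reasoning_content]\nneed live search before answering\n[/reasoning_content]` inside the rebuilt prompt history.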

@@ -52,7 +52,7 @@ func normalizeClaudeRequest(store ConfigReader, req map[string]any) (claudeNorma
RequestedModel: strings.TrimSpace(model),
ResolvedModel: dsModel,
ResponseModel: strings.TrimSpace(model),
Messages: payload["messages"].([]any),
Messages: normalizedMessages,
PromptTokenText: finalPrompt,
ToolsRaw: toolsRequested,
FinalPrompt: finalPrompt,

View File

@@ -32,6 +32,7 @@ type claudeStreamRuntime struct {
messageID string
thinking strings.Builder
text strings.Builder
responseMessageID int
sieve toolstream.State
rawText strings.Builder
@@ -92,6 +93,9 @@ func (s *claudeStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Parse
s.upstreamErr = parsed.ErrorMessage
return streamengine.ParsedDecision{Stop: true, StopReason: streamengine.StopReason("upstream_error")}
}
if parsed.ResponseMessageID > 0 {
s.responseMessageID = parsed.ResponseMessageID
}
if parsed.Stop {
return streamengine.ParsedDecision{Stop: true}
}

View File

@@ -22,16 +22,27 @@ func (s *claudeStreamRuntime) send(event string, v any) {
}
func (s *claudeStreamRuntime) sendError(message string) {
s.sendErrorWithCode(500, message, "internal_error")
}
func (s *claudeStreamRuntime) sendErrorWithCode(status int, message, code string) {
msg := strings.TrimSpace(message)
if msg == "" {
msg = "upstream stream error"
}
if code == "" {
code = "internal_error"
}
errType := "api_error"
if status == 429 {
errType = "rate_limit_error"
}
s.send("error", map[string]any{
"type": "error",
"error": map[string]any{
"type": "api_error",
"type": errType,
"message": msg,
"code": "internal_error",
"code": code,
"param": nil,
},
})

View File

@@ -63,13 +63,10 @@ func (s *claudeStreamRuntime) sendToolUseBlock(idx int, tc toolcall.ParsedToolCa
})
}
func (s *claudeStreamRuntime) finalize(stopReason string) {
func (s *claudeStreamRuntime) finalize(stopReason string, deferEmptyOutput bool) bool {
if s.ended {
return
return true
}
s.ended = true
s.closeThinkingBlock()
if s.bufferToolContent {
for _, evt := range toolstream.Flush(&s.sieve, s.toolNames) {
@@ -123,6 +120,7 @@ func (s *claudeStreamRuntime) finalize(stopReason string) {
RawThinking: s.rawThinking.String(),
VisibleThinking: s.thinking.String(),
DetectionThinking: s.toolDetectionThinking.String(),
ResponseMessageID: s.responseMessageID,
AlreadyEmittedCalls: s.toolCallsDetected,
AlreadyEmittedToolRaw: s.toolCallsDetected,
}, assistantturn.BuildOptions{
@@ -137,6 +135,22 @@ func (s *claudeStreamRuntime) finalize(stopReason string) {
outcome := assistantturn.FinalizeTurn(turn, assistantturn.FinalizeOptions{
AlreadyEmittedToolCalls: s.toolCallsDetected,
})
if outcome.ShouldFail {
if deferEmptyOutput {
return false
}
s.ended = true
s.closeThinkingBlock()
s.closeTextBlock()
if s.history != nil {
s.history.Error(outcome.Error.Status, outcome.Error.Message, outcome.Error.Code, responsehistory.ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking), responsehistory.TextForArchive(turn.RawText, turn.Text))
}
s.sendErrorWithCode(outcome.Error.Status, outcome.Error.Message, outcome.Error.Code)
return true
}
s.ended = true
s.closeThinkingBlock()
if s.bufferToolContent && !s.toolCallsDetected {
if len(turn.ToolCalls) > 0 {
@@ -197,6 +211,7 @@ func (s *claudeStreamRuntime) finalize(stopReason string) {
},
})
s.send("message_stop", map[string]any{"type": "message_stop"})
return true
}
func (s *claudeStreamRuntime) onFinalize(reason streamengine.StopReason, scannerErr error) {
@@ -214,5 +229,5 @@ func (s *claudeStreamRuntime) onFinalize(reason streamengine.StopReason, scanner
s.sendError(scannerErr.Error())
return
}
s.finalize("end_turn")
s.finalize("end_turn", false)
}

View File

@@ -44,14 +44,20 @@ func geminiMessagesFromRequest(req map[string]any) []any {
}
textParts := make([]string, 0, len(parts))
pendingThinking := ""
flushText := func() {
if len(textParts) == 0 {
return
}
out = append(out, map[string]any{
msg := map[string]any{
"role": role,
"content": strings.Join(textParts, "\n"),
})
}
if role == "assistant" && strings.TrimSpace(pendingThinking) != "" {
msg["reasoning_content"] = pendingThinking
pendingThinking = ""
}
out = append(out, msg)
textParts = textParts[:0]
}
@@ -61,6 +67,14 @@ func geminiMessagesFromRequest(req map[string]any) []any {
continue
}
if text := strings.TrimSpace(asString(part["text"])); text != "" {
if role == "assistant" && isGeminiThoughtPart(part) {
if pendingThinking == "" {
pendingThinking = text
} else {
pendingThinking += "\n" + text
}
continue
}
textParts = append(textParts, text)
continue
}
@@ -75,7 +89,7 @@ func geminiMessagesFromRequest(req map[string]any) []any {
}
}
lastToolCallIDByName[strings.ToLower(name)] = callID
out = append(out, map[string]any{
msg := map[string]any{
"role": "assistant",
"tool_calls": []any{
map[string]any{
@@ -87,7 +101,12 @@ func geminiMessagesFromRequest(req map[string]any) []any {
},
},
},
})
}
if strings.TrimSpace(pendingThinking) != "" {
msg["reasoning_content"] = pendingThinking
pendingThinking = ""
}
out = append(out, msg)
}
continue
}
@@ -132,10 +151,29 @@ func geminiMessagesFromRequest(req map[string]any) []any {
}
}
flushText()
if role == "assistant" && strings.TrimSpace(pendingThinking) != "" {
out = append(out, map[string]any{
"role": "assistant",
"reasoning_content": pendingThinking,
})
}
}
return out
}
func isGeminiThoughtPart(part map[string]any) bool {
if part == nil {
return false
}
if v, ok := part["thought"].(bool); ok {
return v
}
if v, ok := part["thoughtSignature"].(string); ok && strings.TrimSpace(v) != "" {
return true
}
return false
}
func normalizeGeminiSystemInstruction(raw any) string {
switch v := raw.(type) {
case string:

View File
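`isGeminiThoughtPart` above treats a part as model thinking when its `thought` flag is true, or when a non-empty `thoughtSignature` is present. A standalone copy of that detection rule run over sample parts (the sample texts are illustrative, not from the repo):

```go
package main

import (
	"fmt"
	"strings"
)

// isThoughtPart re-states the rule from the diff: an explicit thought
// flag, or a non-empty thoughtSignature, marks a Gemini part as model
// thinking rather than visible text.
func isThoughtPart(part map[string]any) bool {
	if part == nil {
		return false
	}
	if v, ok := part["thought"].(bool); ok {
		return v
	}
	if v, ok := part["thoughtSignature"].(string); ok && strings.TrimSpace(v) != "" {
		return true
	}
	return false
}

func main() {
	parts := []map[string]any{
		{"text": "plan the search", "thought": true},
		{"text": "final answer"},
		{"text": "signed step", "thoughtSignature": "abc"},
	}
	for _, p := range parts {
		fmt.Println(p["text"], "->", isThoughtPart(p))
	}
}
```

Parts that match are diverted into `reasoning_content` by the normalization above instead of being joined into the visible message text.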

@@ -1,6 +1,7 @@
package gemini
import (
"ds2api/internal/promptcompat"
"strings"
"testing"
)
@@ -53,6 +54,46 @@ func TestGeminiMessagesFromRequestPreservesFunctionRoundtrip(t *testing.T) {
}
}
func TestGeminiMessagesFromRequestPreservesThoughtOnFunctionCallHistory(t *testing.T) {
req := map[string]any{
"contents": []any{
map[string]any{
"role": "model",
"parts": []any{
map[string]any{"text": "need current state before answering", "thought": true},
map[string]any{
"functionCall": map[string]any{
"id": "call_g1",
"name": "search_web",
"args": map[string]any{"query": "ai"},
},
},
},
},
},
}
got := geminiMessagesFromRequest(req)
if len(got) != 1 {
t.Fatalf("expected one normalized message, got %#v", got)
}
assistant, _ := got[0].(map[string]any)
if assistant["reasoning_content"] != "need current state before answering" {
t.Fatalf("expected thought preserved as reasoning_content, got %#v", assistant)
}
tc, _ := assistant["tool_calls"].([]any)
if len(tc) != 1 {
t.Fatalf("expected one tool call, got %#v", assistant["tool_calls"])
}
prompt, _ := promptcompat.BuildOpenAIPromptForAdapter(got, nil, "", true)
if !strings.Contains(prompt, "[reasoning_content]\nneed current state before answering\n[/reasoning_content]") {
t.Fatalf("expected thought in prompt history, got %q", prompt)
}
if !strings.Contains(prompt, `<|DSML|invoke name="search_web">`) {
t.Fatalf("expected tool call in prompt history, got %q", prompt)
}
}
func TestGeminiMessagesFromRequestPreservesUnknownPartAsRawJSONText(t *testing.T) {
req := map[string]any{
"contents": []any{

View File

@@ -137,7 +137,7 @@ func (h *Handler) handleGeminiDirectStream(w http.ResponseWriter, r *http.Reques
return
}
streamReq := start.Request
h.handleStreamGenerateContent(w, r, start.Response, streamReq.ResponseModel, streamReq.PromptTokenText, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, historySession)
h.handleStreamGenerateContentWithRetry(w, r, a, start.Response, start.Payload, start.Pow, streamReq, streamReq.ResponseModel, streamReq.PromptTokenText, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, historySession)
}
func (h *Handler) proxyViaOpenAI(w http.ResponseWriter, r *http.Request, stream bool) bool {

View File

@@ -1,6 +1,7 @@
package gemini
import (
"context"
"encoding/json"
"io"
"net/http"
@@ -8,7 +9,10 @@ import (
"time"
"ds2api/internal/assistantturn"
"ds2api/internal/auth"
"ds2api/internal/completionruntime"
dsprotocol "ds2api/internal/deepseek/protocol"
"ds2api/internal/promptcompat"
"ds2api/internal/responsehistory"
"ds2api/internal/sse"
streamengine "ds2api/internal/stream"
@@ -54,7 +58,7 @@ func (h *Handler) handleStreamGenerateContent(w http.ResponseWriter, r *http.Req
}, streamengine.ConsumeHooks{
OnParsed: runtime.onParsed,
OnFinalize: func(_ streamengine.StopReason, _ error) {
runtime.finalize()
runtime.finalize(false)
},
})
}
@@ -78,9 +82,85 @@ type geminiStreamRuntime struct {
accumulator *assistantturn.Accumulator
contentFilter bool
responseMessageID int
finalErrorStatus int
finalErrorMessage string
finalErrorCode string
history *responsehistory.Session
}
func (h *Handler) handleStreamGenerateContentWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow string, stdReq promptcompat.StandardRequest, model, finalPrompt string, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, historySession *responsehistory.Session) {
if resp.StatusCode != http.StatusOK {
defer func() { _ = resp.Body.Close() }()
body, _ := io.ReadAll(resp.Body)
if historySession != nil {
historySession.Error(resp.StatusCode, strings.TrimSpace(string(body)), "error", "", "")
}
writeGeminiError(w, resp.StatusCode, strings.TrimSpace(string(body)))
return
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache, no-transform")
w.Header().Set("Connection", "keep-alive")
w.Header().Set("X-Accel-Buffering", "no")
rc := http.NewResponseController(w)
_, canFlush := w.(http.Flusher)
runtime := newGeminiStreamRuntime(w, rc, canFlush, model, finalPrompt, thinkingEnabled, searchEnabled, stripReferenceMarkersEnabled(), toolNames, toolsRaw, historySession)
completionruntime.ExecuteStreamWithRetry(r.Context(), h.DS, a, resp, payload, pow, completionruntime.StreamRetryOptions{
Surface: "gemini.generate_content",
Stream: true,
RetryEnabled: true,
MaxAttempts: 3,
UsagePrompt: finalPrompt,
Request: stdReq,
CurrentInputFile: h.Store,
}, completionruntime.StreamRetryHooks{
ConsumeAttempt: func(currentResp *http.Response, allowDeferEmpty bool) (bool, bool) {
return h.consumeGeminiStreamAttempt(r.Context(), currentResp, runtime, thinkingEnabled, allowDeferEmpty)
},
Finalize: func(_ int) {
runtime.finalize(false)
},
ParentMessageID: func() int {
return runtime.responseMessageID
},
OnRetryPrompt: func(prompt string) {
runtime.finalPrompt = prompt
},
OnRetryFailure: func(status int, message, _ string) {
runtime.sendErrorChunk(status, strings.TrimSpace(message))
},
})
}
func (h *Handler) consumeGeminiStreamAttempt(ctx context.Context, resp *http.Response, runtime *geminiStreamRuntime, thinkingEnabled bool, allowDeferEmpty bool) (bool, bool) {
defer func() { _ = resp.Body.Close() }()
initialType := "text"
if thinkingEnabled {
initialType = "thinking"
}
streamengine.ConsumeSSE(streamengine.ConsumeConfig{
Context: ctx,
Body: resp.Body,
ThinkingEnabled: thinkingEnabled,
InitialType: initialType,
KeepAliveInterval: time.Duration(dsprotocol.KeepAliveTimeout) * time.Second,
IdleTimeout: time.Duration(dsprotocol.StreamIdleTimeout) * time.Second,
MaxKeepAliveNoInput: dsprotocol.MaxKeepaliveCount,
}, streamengine.ConsumeHooks{
OnParsed: runtime.onParsed,
OnFinalize: func(_ streamengine.StopReason, _ error) {
},
})
terminalWritten := runtime.finalize(allowDeferEmpty)
if terminalWritten {
return true, false
}
return false, true
}
//nolint:unused // retained for native Gemini stream handling path.
func newGeminiStreamRuntime(
w http.ResponseWriter,
@@ -127,6 +207,35 @@ func (s *geminiStreamRuntime) sendChunk(payload map[string]any) {
}
}
func (s *geminiStreamRuntime) sendErrorChunk(status int, message string) {
msg := strings.TrimSpace(message)
if msg == "" {
msg = http.StatusText(status)
}
errorStatus := "INVALID_ARGUMENT"
switch status {
case http.StatusUnauthorized:
errorStatus = "UNAUTHENTICATED"
case http.StatusForbidden:
errorStatus = "PERMISSION_DENIED"
case http.StatusTooManyRequests:
errorStatus = "RESOURCE_EXHAUSTED"
case http.StatusNotFound:
errorStatus = "NOT_FOUND"
default:
if status >= 500 {
errorStatus = "INTERNAL"
}
}
s.sendChunk(map[string]any{
"error": map[string]any{
"code": status,
"message": msg,
"status": errorStatus,
},
})
}
//nolint:unused // retained for native Gemini stream handling path.
func (s *geminiStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
if !parsed.Parsed {
@@ -192,7 +301,7 @@ func (s *geminiStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Parse
}
//nolint:unused // retained for native Gemini stream handling path.
-func (s *geminiStreamRuntime) finalize() {
+func (s *geminiStreamRuntime) finalize(deferEmptyOutput bool) bool {
rawText, text, rawThinking, thinking, detectionThinking := s.accumulator.Snapshot()
turn := assistantturn.BuildTurnFromStreamSnapshot(assistantturn.StreamSnapshot{
RawText: rawText,
@@ -211,6 +320,19 @@ func (s *geminiStreamRuntime) finalize() {
ToolsRaw: s.toolsRaw,
})
outcome := assistantturn.FinalizeTurn(turn, assistantturn.FinalizeOptions{})
if outcome.ShouldFail {
if deferEmptyOutput {
s.finalErrorStatus = outcome.Error.Status
s.finalErrorMessage = outcome.Error.Message
s.finalErrorCode = outcome.Error.Code
return false
}
if s.history != nil {
s.history.Error(outcome.Error.Status, outcome.Error.Message, outcome.Error.Code, responsehistory.ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking), responsehistory.TextForArchive(turn.RawText, turn.Text))
}
s.sendErrorChunk(outcome.Error.Status, outcome.Error.Message)
return true
}
if s.history != nil {
s.history.Success(
http.StatusOK,
@@ -257,4 +379,5 @@ func (s *geminiStreamRuntime) finalize() {
"totalTokenCount": outcome.Usage.TotalTokens,
},
})
return true
}

View File

@@ -67,7 +67,11 @@ func (m *testGeminiDS) GetPow(_ context.Context, _ *auth.RequestAuth, _ int) (st
//nolint:unused // reserved test double for native Gemini DS-call path coverage.
func (m *testGeminiDS) UploadFile(_ context.Context, _ *auth.RequestAuth, req dsclient.UploadFileRequest, _ int) (*dsclient.UploadFileResult, error) {
m.uploadCalls = append(m.uploadCalls, req)
-return &dsclient.UploadFileResult{ID: "file-gemini-history"}, nil
+id := "file-gemini-history"
+if len(m.uploadCalls) > 1 {
+id = "file-gemini-tools"
+}
+return &dsclient.UploadFileResult{ID: id}, nil
}
//nolint:unused // reserved test double for native Gemini DS-call path coverage.
@@ -201,6 +205,57 @@ func TestGeminiDirectAppliesCurrentInputFile(t *testing.T) {
}
}
func TestGeminiCurrentInputFileUploadsToolsSeparately(t *testing.T) {
ds := &testGeminiDS{
resp: makeGeminiUpstreamResponse(`data: {"p":"response/content","v":"ok"}`),
}
h := &Handler{
Store: testGeminiConfig{},
Auth: testGeminiAuth{},
DS: ds,
}
reqBody := `{
"contents":[{"role":"user","parts":[{"text":"run code"}]}],
"tools":[{"functionDeclarations":[{"name":"eval_javascript","description":"eval","parameters":{"type":"object","properties":{"code":{"type":"string"}}}}]}]
}`
req := httptest.NewRequest(http.MethodPost, "/v1beta/models/gemini-2.5-pro:generateContent", strings.NewReader(reqBody))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r := chi.NewRouter()
RegisterRoutes(r, h)
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
if len(ds.uploadCalls) != 2 {
t.Fatalf("expected history and tools uploads, got %d", len(ds.uploadCalls))
}
if ds.uploadCalls[0].Filename != "DS2API_HISTORY.txt" || ds.uploadCalls[1].Filename != "DS2API_TOOLS.txt" {
t.Fatalf("unexpected upload filenames: %#v", ds.uploadCalls)
}
historyText := string(ds.uploadCalls[0].Data)
if strings.Contains(historyText, "Description: eval") {
t.Fatalf("history transcript should not embed tool descriptions, got %q", historyText)
}
toolsText := string(ds.uploadCalls[1].Data)
if !strings.Contains(toolsText, "# DS2API_TOOLS.txt") || !strings.Contains(toolsText, "Tool: eval_javascript") || !strings.Contains(toolsText, "Description: eval") {
t.Fatalf("expected tools transcript to include Gemini tool schema, got %q", toolsText)
}
refIDs, _ := ds.payloads[0]["ref_file_ids"].([]any)
if len(refIDs) < 2 || refIDs[0] != "file-gemini-history" || refIDs[1] != "file-gemini-tools" {
t.Fatalf("expected history and tools ref ids first, got %#v", ds.payloads[0]["ref_file_ids"])
}
prompt, _ := ds.payloads[0]["prompt"].(string)
if !strings.Contains(prompt, "DS2API_TOOLS.txt") || !strings.Contains(prompt, "TOOL CALL FORMAT") {
t.Fatalf("expected live prompt to reference tools file and retain format instructions, got %q", prompt)
}
if strings.Contains(prompt, "Description: eval") {
t.Fatalf("live prompt should not inline tool descriptions, got %q", prompt)
}
}
func TestGeminiRoutesRegistered(t *testing.T) {
h := &Handler{
Store: testGeminiConfig{},

View File

@@ -0,0 +1,58 @@
package ollama
import (
"ds2api/internal/config"
"ds2api/internal/util"
"encoding/json"
"github.com/go-chi/chi/v5"
"log/slog"
"net/http"
)
var WriteJSON = util.WriteJSON
type ConfigReader interface {
ModelAliases() map[string]string
}
type Handler struct {
Store ConfigReader
}
type OllamaModelRequest struct {
Model string `json:"model"`
}
func RegisterRoutes(r chi.Router, h *Handler) {
r.Get("/api/version", h.GetVersion)
r.Get("/api/tags", h.ListOllamaModels)
r.Post("/api/show", h.GetOllamaModel)
}
func (h *Handler) GetVersion(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte(`{"version":"0.23.1"}`))
}
func (h *Handler) ListOllamaModels(w http.ResponseWriter, r *http.Request) {
WriteJSON(w, http.StatusOK, config.OllamaModelsResponse())
}
func (h *Handler) GetOllamaModel(w http.ResponseWriter, r *http.Request) {
var payload OllamaModelRequest
if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
http.Error(w, "Invalid JSON body: "+err.Error(), http.StatusBadRequest)
return
}
defer func() {
if err := r.Body.Close(); err != nil {
slog.Warn("[ollama] failed to close request body", "error", err)
}
}()
modelID := payload.Model
model, ok := config.OllamaModelByID(h.Store, modelID)
if !ok {
http.Error(w, "Model not found.", http.StatusNotFound)
return
}
WriteJSON(w, http.StatusOK, model)
}

View File

@@ -0,0 +1,127 @@
package ollama
import (
"encoding/json"
"github.com/go-chi/chi/v5"
"net/http"
"net/http/httptest"
"strings"
"testing"
)
type ollamaTestSurface struct {
Store ConfigReader
handler *Handler
}
func (h *ollamaTestSurface) apiHandler() *Handler {
if h.handler == nil {
h.handler = &Handler{Store: h.Store}
}
return h.handler
}
func registerOllamaTestRoutes(r chi.Router, h *ollamaTestSurface) {
r.Get("/api/version", h.apiHandler().GetVersion)
r.Get("/api/tags", h.apiHandler().ListOllamaModels)
r.Post("/api/show", h.apiHandler().GetOllamaModel)
}
func TestGetOllamaVersionRoute(t *testing.T) {
h := &ollamaTestSurface{}
r := chi.NewRouter()
registerOllamaTestRoutes(r, h)
req := httptest.NewRequest(http.MethodGet, "/api/version", nil)
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
}
func TestGetOllamaModelsRoute(t *testing.T) {
h := &ollamaTestSurface{}
r := chi.NewRouter()
registerOllamaTestRoutes(r, h)
req := httptest.NewRequest(http.MethodGet, "/api/tags", nil)
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
}
func TestGetOllamaModelRoute(t *testing.T) {
h := &ollamaTestSurface{}
r := chi.NewRouter()
registerOllamaTestRoutes(r, h)
t.Run("direct", func(t *testing.T) {
body := `{"model":"deepseek-v4-flash"}`
req := httptest.NewRequest(http.MethodPost, "/api/show", strings.NewReader(body))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
var payload map[string]any
if err := json.Unmarshal(rec.Body.Bytes(), &payload); err != nil {
t.Fatalf("expected valid json body, got err=%v body=%s", err, rec.Body.String())
}
if _, ok := payload["id"]; !ok {
t.Fatalf("expected response to have lowercase id field, body=%s", rec.Body.String())
}
if _, ok := payload["ID"]; ok {
t.Fatalf("expected response not to expose uppercase ID field, body=%s", rec.Body.String())
}
})
t.Run("direct_nothinking", func(t *testing.T) {
body := `{"model":"deepseek-v4-flash-nothinking"}`
req := httptest.NewRequest(http.MethodPost, "/api/show", strings.NewReader(body))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
})
t.Run("direct_expert", func(t *testing.T) {
body := `{"model":"deepseek-v4-pro"}`
req := httptest.NewRequest(http.MethodPost, "/api/show", strings.NewReader(body))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
})
t.Run("direct_vision", func(t *testing.T) {
body := `{"model":"deepseek-v4-vision"}`
req := httptest.NewRequest(http.MethodPost, "/api/show", strings.NewReader(body))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
})
}
func TestGetOllamaModelRouteNotFound(t *testing.T) {
h := &ollamaTestSurface{}
r := chi.NewRouter()
registerOllamaTestRoutes(r, h)
body := `{"model":"not-exists"}`
req := httptest.NewRequest(http.MethodPost, "/api/show", strings.NewReader(body))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusNotFound {
t.Fatalf("expected 404, got %d body=%s", rec.Code, rec.Body.String())
}
}

View File

@@ -4,11 +4,11 @@ import (
"context"
"io"
"net/http"
"strings"
"time"
"ds2api/internal/assistantturn"
"ds2api/internal/auth"
"ds2api/internal/completionruntime"
"ds2api/internal/config"
dsprotocol "ds2api/internal/deepseek/protocol"
openaifmt "ds2api/internal/format/openai"
@@ -17,191 +17,96 @@ import (
streamengine "ds2api/internal/stream"
)
type chatNonStreamResult struct {
rawThinking string
rawText string
thinking string
toolDetectionThinking string
text string
contentFilter bool
detectedCalls int
body map[string]any
finishReason string
responseMessageID int
outputError *assistantturn.OutputError
}
func (r chatNonStreamResult) historyText() string {
return historyTextForArchive(r.rawText, r.text)
}
func (r chatNonStreamResult) historyThinking() string {
return historyThinkingForArchive(r.rawThinking, r.toolDetectionThinking, r.thinking)
}
func (h *Handler) handleNonStreamWithRetry(w http.ResponseWriter, ctx context.Context, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, completionID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, historySession *chatHistorySession) {
attempts := 0
currentResp := resp
usagePrompt := finalPrompt
accumulatedThinking := ""
accumulatedRawThinking := ""
accumulatedToolDetectionThinking := ""
for {
result, ok := h.collectChatNonStreamAttempt(w, currentResp, completionID, model, usagePrompt, thinkingEnabled, searchEnabled, toolNames, toolsRaw)
if !ok {
return
}
accumulatedThinking += sse.TrimContinuationOverlap(accumulatedThinking, result.thinking)
accumulatedRawThinking += sse.TrimContinuationOverlap(accumulatedRawThinking, result.rawThinking)
accumulatedToolDetectionThinking += sse.TrimContinuationOverlap(accumulatedToolDetectionThinking, result.toolDetectionThinking)
result.thinking = accumulatedThinking
result.rawThinking = accumulatedRawThinking
result.toolDetectionThinking = accumulatedToolDetectionThinking
detected := detectAssistantToolCalls(result.rawText, result.text, result.rawThinking, result.toolDetectionThinking, toolNames)
result.detectedCalls = len(detected.Calls)
result.body = openaifmt.BuildChatCompletionWithToolCalls(completionID, model, usagePrompt, result.thinking, result.text, detected.Calls, toolsRaw)
addRefFileTokensToUsage(result.body, refFileTokens)
result.finishReason = chatFinishReason(result.body)
if !shouldRetryChatNonStream(result, attempts) {
h.finishChatNonStreamResult(w, result, attempts, usagePrompt, refFileTokens, historySession)
return
}
attempts++
config.Logger.Info("[openai_empty_retry] attempting synthetic retry", "surface", "chat.completions", "stream", false, "retry_attempt", attempts, "parent_message_id", result.responseMessageID)
retryPow, powErr := h.DS.GetPow(ctx, a, 3)
if powErr != nil {
config.Logger.Warn("[openai_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", "chat.completions", "stream", false, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
retryPayload := clonePayloadForEmptyOutputRetry(payload, result.responseMessageID)
nextResp, err := h.DS.CallCompletion(ctx, a, retryPayload, retryPow, 3)
if err != nil {
if historySession != nil {
historySession.error(http.StatusInternalServerError, "Failed to get completion.", "error", result.historyThinking(), result.historyText())
}
writeOpenAIError(w, http.StatusInternalServerError, "Failed to get completion.")
config.Logger.Warn("[openai_empty_retry] retry request failed", "surface", "chat.completions", "stream", false, "retry_attempt", attempts, "error", err)
return
}
usagePrompt = usagePromptWithEmptyOutputRetry(usagePrompt, attempts)
currentResp = nextResp
}
}
func (h *Handler) collectChatNonStreamAttempt(w http.ResponseWriter, resp *http.Response, completionID, model, usagePrompt string, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any) (chatNonStreamResult, bool) {
if resp.StatusCode != http.StatusOK {
defer func() { _ = resp.Body.Close() }()
body, _ := io.ReadAll(resp.Body)
writeOpenAIError(w, resp.StatusCode, string(body))
return chatNonStreamResult{}, false
}
result := sse.CollectStream(resp, thinkingEnabled, true)
turn := assistantturn.BuildTurnFromCollected(result, assistantturn.BuildOptions{
Model: model,
Prompt: usagePrompt,
SearchEnabled: searchEnabled,
ToolNames: toolNames,
ToolsRaw: toolsRaw,
})
respBody := openaifmt.BuildChatCompletionWithToolCalls(completionID, model, usagePrompt, turn.Thinking, turn.Text, turn.ToolCalls, toolsRaw)
return chatNonStreamResult{
rawThinking: result.Thinking,
rawText: result.Text,
thinking: turn.Thinking,
toolDetectionThinking: result.ToolDetectionThinking,
text: turn.Text,
contentFilter: result.ContentFilter,
detectedCalls: len(turn.ToolCalls),
body: respBody,
finishReason: chatFinishReason(respBody),
responseMessageID: result.ResponseMessageID,
outputError: turn.Error,
}, true
}
func (h *Handler) finishChatNonStreamResult(w http.ResponseWriter, result chatNonStreamResult, attempts int, usagePrompt string, refFileTokens int, historySession *chatHistorySession) {
if result.detectedCalls == 0 && strings.TrimSpace(result.text) == "" {
status, message, code := upstreamEmptyOutputDetail(result.contentFilter, result.text, result.thinking)
if result.outputError != nil {
status, message, code = result.outputError.Status, result.outputError.Message, result.outputError.Code
}
if historySession != nil {
-historySession.error(status, message, code, result.historyThinking(), result.historyText())
+historySession.error(resp.StatusCode, string(body), "error", "", "")
}
-writeOpenAIErrorWithCode(w, status, message, code)
-config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "chat.completions", "stream", false, "retry_attempts", attempts, "success_source", "none", "content_filter", result.contentFilter)
+writeOpenAIError(w, resp.StatusCode, string(body))
return
}
stdReq := promptcompat.StandardRequest{
Surface: "chat.completions",
ResponseModel: model,
PromptTokenText: finalPrompt,
FinalPrompt: finalPrompt,
RefFileTokens: refFileTokens,
Thinking: thinkingEnabled,
Search: searchEnabled,
ToolNames: toolNames,
ToolsRaw: toolsRaw,
ToolChoice: promptcompat.DefaultToolChoicePolicy(),
}
retryEnabled := h != nil && h.DS != nil && emptyOutputRetryEnabled()
result, outErr := completionruntime.ExecuteNonStreamStartedWithRetry(ctx, h.DS, a, completionruntime.StartResult{
SessionID: completionID,
Payload: payload,
Pow: pow,
Response: resp,
Request: stdReq,
}, completionruntime.Options{
RetryEnabled: retryEnabled,
RetryMaxAttempts: emptyOutputRetryMaxAttempts(),
})
if outErr != nil {
if historySession != nil {
-historySession.success(http.StatusOK, result.historyThinking(), result.historyText(), result.finishReason, openaifmt.BuildChatUsageForModel("", usagePrompt, result.thinking, result.text, refFileTokens))
+historySession.error(outErr.Status, outErr.Message, outErr.Code, historyThinkingForArchive(result.Turn.RawThinking, result.Turn.DetectionThinking, result.Turn.Thinking), historyTextForArchive(result.Turn.RawText, result.Turn.Text))
}
-writeJSON(w, http.StatusOK, result.body)
-source := "first_attempt"
-if attempts > 0 {
-source = "synthetic_retry"
+writeOpenAIErrorWithCode(w, outErr.Status, outErr.Message, outErr.Code)
+return
}
config.Logger.Info("[openai_empty_retry] completed", "surface", "chat.completions", "stream", false, "retry_attempts", attempts, "success_source", source)
respBody := openaifmt.BuildChatCompletionWithToolCalls(result.SessionID, model, result.Turn.Prompt, result.Turn.Thinking, result.Turn.Text, result.Turn.ToolCalls, toolsRaw)
respBody["usage"] = assistantturn.OpenAIChatUsage(result.Turn)
outcome := assistantturn.FinalizeTurn(result.Turn, assistantturn.FinalizeOptions{})
if historySession != nil {
historySession.success(http.StatusOK, historyThinkingForArchive(result.Turn.RawThinking, result.Turn.DetectionThinking, result.Turn.Thinking), historyTextForArchive(result.Turn.RawText, result.Turn.Text), outcome.FinishReason, assistantturn.OpenAIChatUsage(result.Turn))
}
writeJSON(w, http.StatusOK, respBody)
}
func chatFinishReason(respBody map[string]any) string {
if choices, ok := respBody["choices"].([]map[string]any); ok && len(choices) > 0 {
if fr, _ := choices[0]["finish_reason"].(string); strings.TrimSpace(fr) != "" {
return fr
}
}
return "stop"
}
func shouldRetryChatNonStream(result chatNonStreamResult, attempts int) bool {
return emptyOutputRetryEnabled() &&
attempts < emptyOutputRetryMaxAttempts() &&
!result.contentFilter &&
result.detectedCalls == 0 &&
strings.TrimSpace(result.text) == ""
}
-func (h *Handler) handleStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, completionID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, historySession *chatHistorySession) {
+func (h *Handler) handleStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, completionID string, sessionIDRef *string, stdReq promptcompat.StandardRequest, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, historySession *chatHistorySession) {
streamRuntime, initialType, ok := h.prepareChatStreamRuntime(w, resp, completionID, model, finalPrompt, refFileTokens, thinkingEnabled, searchEnabled, toolNames, toolsRaw, toolChoice, historySession)
if !ok {
return
}
attempts := 0
currentResp := resp
for {
terminalWritten, retryable := h.consumeChatStreamAttempt(r, currentResp, streamRuntime, initialType, thinkingEnabled, historySession, attempts < emptyOutputRetryMaxAttempts())
if terminalWritten {
logChatStreamTerminal(streamRuntime, attempts)
return
}
if !retryable || !emptyOutputRetryEnabled() || attempts >= emptyOutputRetryMaxAttempts() {
completionruntime.ExecuteStreamWithRetry(r.Context(), h.DS, a, resp, payload, pow, completionruntime.StreamRetryOptions{
Surface: "chat.completions",
Stream: true,
RetryEnabled: emptyOutputRetryEnabled(),
RetryMaxAttempts: emptyOutputRetryMaxAttempts(),
MaxAttempts: 3,
UsagePrompt: finalPrompt,
Request: stdReq,
CurrentInputFile: h.Store,
}, completionruntime.StreamRetryHooks{
ConsumeAttempt: func(currentResp *http.Response, allowDeferEmpty bool) (bool, bool) {
return h.consumeChatStreamAttempt(r, currentResp, streamRuntime, initialType, thinkingEnabled, historySession, allowDeferEmpty)
},
Finalize: func(attempts int) {
streamRuntime.finalize("stop", false)
recordChatStreamHistory(streamRuntime, historySession)
config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "chat.completions", "stream", true, "retry_attempts", attempts, "success_source", "none")
return
}
attempts++
config.Logger.Info("[openai_empty_retry] attempting synthetic retry", "surface", "chat.completions", "stream", true, "retry_attempt", attempts, "parent_message_id", streamRuntime.responseMessageID)
retryPow, powErr := h.DS.GetPow(r.Context(), a, 3)
if powErr != nil {
config.Logger.Warn("[openai_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", "chat.completions", "stream", true, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
nextResp, err := h.DS.CallCompletion(r.Context(), a, clonePayloadForEmptyOutputRetry(payload, streamRuntime.responseMessageID), retryPow, 3)
if err != nil {
failChatStreamRetry(streamRuntime, historySession, http.StatusInternalServerError, "Failed to get completion.", "error")
config.Logger.Warn("[openai_empty_retry] retry request failed", "surface", "chat.completions", "stream", true, "retry_attempt", attempts, "error", err)
return
}
if nextResp.StatusCode != http.StatusOK {
defer func() { _ = nextResp.Body.Close() }()
body, _ := io.ReadAll(nextResp.Body)
failChatStreamRetry(streamRuntime, historySession, nextResp.StatusCode, string(body), "error")
return
}
streamRuntime.finalPrompt = usagePromptWithEmptyOutputRetry(finalPrompt, attempts)
currentResp = nextResp
},
ParentMessageID: func() int {
return streamRuntime.responseMessageID
},
OnRetryPrompt: func(prompt string) {
streamRuntime.finalPrompt = prompt
},
OnRetryFailure: func(status int, message, code string) {
failChatStreamRetry(streamRuntime, historySession, status, message, code)
},
OnAccountSwitch: func(sessionID string) {
if sessionIDRef != nil {
*sessionIDRef = sessionID
}
},
OnTerminal: func(attempts int) {
logChatStreamTerminal(streamRuntime, attempts)
},
})
}
func (h *Handler) prepareChatStreamRuntime(w http.ResponseWriter, resp *http.Response, completionID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, historySession *chatHistorySession) (*chatStreamRuntime, string, bool) {

View File

@@ -33,6 +33,8 @@ type Handler struct {
type streamLease struct {
Auth *auth.RequestAuth
Standard promptcompat.StandardRequest
SessionID string
ExpiresAt time.Time
}
@@ -106,10 +108,6 @@ func cleanVisibleOutput(text string, stripReferenceMarkers bool) string {
return shared.CleanVisibleOutput(text, stripReferenceMarkers)
}
func upstreamEmptyOutputDetail(contentFilter bool, text, thinking string) (int, string, string) {
return shared.UpstreamEmptyOutputDetail(contentFilter, text, thinking)
}
func emptyOutputRetryEnabled() bool {
return shared.EmptyOutputRetryEnabled()
}
@@ -118,14 +116,6 @@ func emptyOutputRetryMaxAttempts() int {
return shared.EmptyOutputRetryMaxAttempts()
}
func clonePayloadForEmptyOutputRetry(payload map[string]any, parentMessageID int) map[string]any {
return shared.ClonePayloadForEmptyOutputRetry(payload, parentMessageID)
}
func usagePromptWithEmptyOutputRetry(originalPrompt string, retryAttempts int) string {
return shared.UsagePromptWithEmptyOutputRetry(originalPrompt, retryAttempts)
}
func formatIncrementalStreamToolCallDeltas(deltas []toolstream.ToolCallDelta, ids map[int]string) []map[string]any {
return shared.FormatIncrementalStreamToolCallDeltas(deltas, ids)
}
@@ -137,7 +127,3 @@ func filterIncrementalToolCallDeltasByAllowed(deltas []toolstream.ToolCallDelta,
func formatFinalStreamToolCallsWithStableIDs(calls []toolcall.ParsedToolCall, ids map[int]string, toolsRaw any) []map[string]any {
return shared.FormatFinalStreamToolCallsWithStableIDs(calls, ids, toolsRaw)
}
func detectAssistantToolCalls(rawText, visibleText, exposedThinking, detectionThinking string, toolNames []string) toolcall.ToolCallParseResult {
return shared.DetectAssistantToolCalls(rawText, visibleText, exposedThinking, detectionThinking, toolNames)
}

View File

@@ -28,6 +28,10 @@ func (h *Handler) ChatCompletions(w http.ResponseWriter, r *http.Request) {
h.handleVercelStreamPow(w, r)
return
}
if isVercelStreamSwitchRequest(r) {
h.handleVercelStreamSwitch(w, r)
return
}
if isVercelStreamPrepareRequest(r) {
h.handleVercelStreamPrepare(w, r)
return
@@ -114,7 +118,7 @@ func (h *Handler) ChatCompletions(w http.ResponseWriter, r *http.Request) {
}
streamReq := start.Request
refFileTokens := streamReq.RefFileTokens
-h.handleStreamWithRetry(w, r, a, start.Response, start.Payload, start.Pow, sessionID, streamReq.ResponseModel, streamReq.PromptTokenText, refFileTokens, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.ToolChoice, historySession)
+h.handleStreamWithRetry(w, r, a, start.Response, start.Payload, start.Pow, sessionID, &sessionID, streamReq, streamReq.ResponseModel, streamReq.PromptTokenText, refFileTokens, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.ToolChoice, historySession)
}
func (h *Handler) autoDeleteRemoteSession(ctx context.Context, a *auth.RequestAuth, sessionID string) {

View File

@@ -85,8 +85,7 @@ func streamFinishReason(frames []map[string]any) string {
return ""
}
-// Backward-compatible alias for historical test name used in CI logs.
-func TestHandleNonStreamReturns429WhenUpstreamOutputEmpty(t *testing.T) {
+func TestHandleNonStreamSingleAttemptReturns503WhenUpstreamOutputEmpty(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
`data: {"p":"response/content","v":""}`,
@@ -95,17 +94,17 @@ func TestHandleNonStreamReturns429WhenUpstreamOutputEmpty(t *testing.T) {
rec := httptest.NewRecorder()
h.handleNonStream(rec, resp, "cid-empty", "deepseek-v4-flash", "prompt", 0, false, false, nil, nil, nil)
-if rec.Code != http.StatusTooManyRequests {
-t.Fatalf("expected status 429 for empty upstream output, got %d body=%s", rec.Code, rec.Body.String())
+if rec.Code != http.StatusServiceUnavailable {
+t.Fatalf("expected status 503 for empty upstream output, got %d body=%s", rec.Code, rec.Body.String())
}
out := decodeJSONBody(t, rec.Body.String())
errObj, _ := out["error"].(map[string]any)
-if asString(errObj["code"]) != "upstream_empty_output" {
-t.Fatalf("expected code=upstream_empty_output, got %#v", out)
+if asString(errObj["code"]) != "upstream_unavailable" {
+t.Fatalf("expected code=upstream_unavailable, got %#v", out)
}
}
-func TestHandleNonStreamReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutput(t *testing.T) {
+func TestHandleNonStreamSingleAttemptReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutput(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
`data: {"code":"content_filter"}`,
@@ -124,7 +123,7 @@ func TestHandleNonStreamReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutp
}
}
-func TestHandleNonStreamReturns429WhenUpstreamHasOnlyThinking(t *testing.T) {
+func TestHandleNonStreamSingleAttemptReturns429WhenUpstreamHasOnlyThinking(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
`data: {"p":"response/thinking_content","v":"Only thinking"}`,

View File

@@ -1,26 +0,0 @@
package chat
// addRefFileTokensToUsage adds inline-uploaded file token estimates to an existing
// usage map inside a response object. This keeps the token accounting aware of file
// content that the upstream model processes but that is not part of the prompt text.
func addRefFileTokensToUsage(obj map[string]any, refFileTokens int) {
if refFileTokens <= 0 || obj == nil {
return
}
usage, ok := obj["usage"].(map[string]any)
if !ok || usage == nil {
return
}
for _, key := range []string{"input_tokens", "prompt_tokens"} {
if v, ok := usage[key]; ok {
if n, ok := v.(int); ok {
usage[key] = n + refFileTokens
}
}
}
if v, ok := usage["total_tokens"]; ok {
if n, ok := v.(int); ok {
usage["total_tokens"] = n + refFileTokens
}
}
}

View File

@@ -2,6 +2,7 @@ package chat
import (
"context"
"fmt"
"io"
"net/http"
"strings"
@@ -148,8 +149,12 @@ func (m *inlineUploadDSStub) UploadFile(ctx context.Context, _ *auth.RequestAuth
if m.uploadErr != nil {
return nil, m.uploadErr
}
+id := "file-inline-1"
+if len(m.uploadCalls) > 1 {
+id = "file-inline-" + fmt.Sprint(len(m.uploadCalls))
+}
return &dsclient.UploadFileResult{
-ID: "file-inline-1",
+ID: id,
Filename: req.Filename,
Bytes: int64(len(req.Data)),
Status: "uploaded",

View File

@@ -1,6 +1,7 @@
package chat
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
@@ -8,8 +9,11 @@ import (
"testing"
"time"
"ds2api/internal/account"
"ds2api/internal/auth"
"ds2api/internal/config"
dsclient "ds2api/internal/deepseek/client"
"ds2api/internal/promptcompat"
)
func TestIsVercelStreamPrepareRequest(t *testing.T) {
@@ -64,14 +68,16 @@ func TestVercelInternalSecret(t *testing.T) {
func TestStreamLeaseLifecycle(t *testing.T) {
h := &Handler{}
-leaseID := h.holdStreamLease(&auth.RequestAuth{UseConfigToken: false})
+leaseID := h.holdStreamLease(&auth.RequestAuth{UseConfigToken: false}, promptcompat.StandardRequest{}, "test-session-id")
if leaseID == "" {
t.Fatalf("expected non-empty lease id")
}
-if ok := h.releaseStreamLease(leaseID); !ok {
+if lease, ok := h.releaseStreamLease(leaseID); !ok {
t.Fatalf("expected lease release success")
+} else if lease.SessionID != "test-session-id" {
+t.Fatalf("expected released session id, got %q", lease.SessionID)
}
-if ok := h.releaseStreamLease(leaseID); ok {
+if _, ok := h.releaseStreamLease(leaseID); ok {
t.Fatalf("expected duplicate release to fail")
}
}
@@ -141,6 +147,243 @@ func TestHandleVercelStreamPrepareAppliesCurrentInputFile(t *testing.T) {
}
}
func TestHandleVercelStreamPrepareUsesHalfwidthDSMLToolPrompt(t *testing.T) {
t.Setenv("VERCEL", "1")
t.Setenv("DS2API_VERCEL_INTERNAL_SECRET", "stream-secret")
h := &Handler{
Store: mockOpenAIConfig{},
Auth: streamStatusAuthStub{},
DS: &inlineUploadDSStub{},
}
reqBody, _ := json.Marshal(map[string]any{
"model": "deepseek-v4-flash",
"messages": []any{
map[string]any{"role": "user", "content": "search docs"},
},
"tools": []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "search",
"description": "search docs",
"parameters": map[string]any{
"type": "object",
"properties": map[string]any{
"query": map[string]any{"type": "string"},
},
"required": []any{"query"},
},
},
},
},
"stream": true,
})
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions?__stream_prepare=1", strings.NewReader(string(reqBody)))
req.Header.Set("Authorization", "Bearer direct-token")
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Ds2-Internal-Token", "stream-secret")
rec := httptest.NewRecorder()
h.handleVercelStreamPrepare(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
var body map[string]any
if err := json.NewDecoder(rec.Body).Decode(&body); err != nil {
t.Fatalf("decode failed: %v", err)
}
finalPrompt, _ := body["final_prompt"].(string)
payload, _ := body["payload"].(map[string]any)
payloadPrompt, _ := payload["prompt"].(string)
for label, promptText := range map[string]string{"final_prompt": finalPrompt, "payload.prompt": payloadPrompt} {
if !strings.Contains(promptText, "<|DSML|tool_calls>") || !strings.Contains(promptText, "Tag punctuation alphabet: ASCII < > / = \" plus the halfwidth pipe |.") {
t.Fatalf("expected %s to contain halfwidth DSML tool instructions, got %q", label, promptText)
}
if strings.Contains(promptText, "\uff5c") || strings.Contains(promptText, "full"+"width vertical bar") {
t.Fatalf("expected %s not to contain legacy pipe guidance, got %q", label, promptText)
}
}
toolNames, _ := body["tool_names"].([]any)
if len(toolNames) != 1 || toolNames[0] != "search" {
t.Fatalf("expected prepared tool names to align with request tools, got %#v", body["tool_names"])
}
}
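The test above checks the prepared prompt from both sides: the DSML tag must use the ASCII pipe and no fullwidth pipe may survive anywhere. A standalone illustration of that check (the helper is hypothetical, not repo code):

```go
package main

import (
	"fmt"
	"strings"
)

// usesHalfwidthDSMLPipe asserts what the test does: the prompt carries the
// DSML tool-calls tag built with the ASCII pipe "|" (U+007C) and contains
// no fullwidth pipe "｜" (U+FF5C) anywhere. Hypothetical helper.
func usesHalfwidthDSMLPipe(prompt string) bool {
	return strings.Contains(prompt, "<|DSML|tool_calls>") &&
		!strings.ContainsRune(prompt, '\uff5c')
}

func main() {
	fmt.Println(usesHalfwidthDSMLPipe("... <|DSML|tool_calls> ..."))           // true
	fmt.Println(usesHalfwidthDSMLPipe("... <\uff5cDSML\uff5ctool_calls> ...")) // false
}
```

The two code points render almost identically in many fonts, which is why the test matches the escaped `\uff5c` rather than relying on visual inspection.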
type vercelReleaseAutoDeleteDSStub struct {
resp *http.Response
deleteCallCount int
deletedSessionID string
deletedToken string
deleteErr error
events *[]string
}
func (m *vercelReleaseAutoDeleteDSStub) CreateSession(_ context.Context, _ *auth.RequestAuth, _ int) (string, error) {
return "session-id", nil
}
func (m *vercelReleaseAutoDeleteDSStub) GetPow(_ context.Context, _ *auth.RequestAuth, _ int) (string, error) {
return "pow", nil
}
func (m *vercelReleaseAutoDeleteDSStub) UploadFile(_ context.Context, _ *auth.RequestAuth, _ dsclient.UploadFileRequest, _ int) (*dsclient.UploadFileResult, error) {
return &dsclient.UploadFileResult{ID: "file-id", Filename: "file.txt", Bytes: 1, Status: "uploaded"}, nil
}
func (m *vercelReleaseAutoDeleteDSStub) CallCompletion(_ context.Context, _ *auth.RequestAuth, _ map[string]any, _ string, _ int) (*http.Response, error) {
return m.resp, nil
}
func (m *vercelReleaseAutoDeleteDSStub) DeleteSessionForToken(_ context.Context, token string, sessionID string) (*dsclient.DeleteSessionResult, error) {
if m.events != nil {
*m.events = append(*m.events, "delete")
}
m.deleteCallCount++
m.deletedSessionID = sessionID
m.deletedToken = token
if m.deleteErr != nil {
return nil, m.deleteErr
}
return &dsclient.DeleteSessionResult{SessionID: sessionID, Success: true}, nil
}
func (m *vercelReleaseAutoDeleteDSStub) DeleteAllSessionsForToken(_ context.Context, _ string) error {
return nil
}
type vercelReleaseAuthStub struct {
events *[]string
}
func (a *vercelReleaseAuthStub) Determine(_ *http.Request) (*auth.RequestAuth, error) {
return &auth.RequestAuth{DeepSeekToken: "test-token", AccountID: "test-account"}, nil
}
func (a *vercelReleaseAuthStub) DetermineCaller(_ *http.Request) (*auth.RequestAuth, error) {
return &auth.RequestAuth{DeepSeekToken: "test-token", AccountID: "test-account"}, nil
}
func (a *vercelReleaseAuthStub) Release(_ *auth.RequestAuth) {
if a.events != nil {
*a.events = append(*a.events, "release")
}
}
func TestHandleVercelStreamReleaseTriggersAutoDelete(t *testing.T) {
t.Setenv("VERCEL", "1")
t.Setenv("DS2API_VERCEL_INTERNAL_SECRET", "stream-secret")
events := []string{}
ds := &vercelReleaseAutoDeleteDSStub{events: &events}
h := &Handler{
Store: mockOpenAIConfig{
autoDeleteMode: "single",
},
Auth: &vercelReleaseAuthStub{events: &events},
DS: ds,
}
leaseID := h.holdStreamLease(&auth.RequestAuth{DeepSeekToken: "test-token", AccountID: "test-account"}, promptcompat.StandardRequest{}, "session-to-delete")
if leaseID == "" {
t.Fatalf("expected non-empty lease id")
}
reqBody := map[string]any{"lease_id": leaseID}
reqJSON, _ := json.Marshal(reqBody)
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions?__stream_release=1", strings.NewReader(string(reqJSON)))
req.Header.Set("X-Ds2-Internal-Token", "stream-secret")
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
h.handleVercelStreamRelease(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
if ds.deleteCallCount != 1 {
t.Fatalf("expected auto delete call count=1, got %d", ds.deleteCallCount)
}
if ds.deletedSessionID != "session-to-delete" {
t.Fatalf("expected deleted session id=session-to-delete, got %q", ds.deletedSessionID)
}
if got, want := strings.Join(events, ","), "delete,release"; got != want {
t.Fatalf("expected auto-delete before auth release, got %s", got)
}
}
func TestHandleVercelStreamPrepareUploadsToolsSeparately(t *testing.T) {
t.Setenv("VERCEL", "1")
t.Setenv("DS2API_VERCEL_INTERNAL_SECRET", "stream-secret")
ds := &inlineUploadDSStub{}
h := &Handler{
Store: mockOpenAIConfig{currentInputEnabled: true},
Auth: streamStatusAuthStub{},
DS: ds,
}
reqBody, _ := json.Marshal(map[string]any{
"model": "deepseek-v4-flash",
"messages": []any{
map[string]any{"role": "user", "content": "search docs"},
},
"tools": []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "search",
"description": "search docs",
"parameters": map[string]any{"type": "object"},
},
},
},
"stream": true,
})
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions?__stream_prepare=1", strings.NewReader(string(reqBody)))
req.Header.Set("Authorization", "Bearer direct-token")
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Ds2-Internal-Token", "stream-secret")
rec := httptest.NewRecorder()
h.handleVercelStreamPrepare(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
if len(ds.uploadCalls) != 2 {
t.Fatalf("expected history and tools uploads, got %d", len(ds.uploadCalls))
}
if ds.uploadCalls[0].Filename != "DS2API_HISTORY.txt" || ds.uploadCalls[1].Filename != "DS2API_TOOLS.txt" {
t.Fatalf("unexpected upload filenames: %#v", ds.uploadCalls)
}
if strings.Contains(string(ds.uploadCalls[0].Data), "Description: search docs") {
t.Fatalf("history transcript should not embed tool descriptions, got %q", string(ds.uploadCalls[0].Data))
}
var body map[string]any
if err := json.NewDecoder(rec.Body).Decode(&body); err != nil {
t.Fatalf("decode failed: %v", err)
}
finalPrompt, _ := body["final_prompt"].(string)
payload, _ := body["payload"].(map[string]any)
payloadPrompt, _ := payload["prompt"].(string)
for label, promptText := range map[string]string{"final_prompt": finalPrompt, "payload.prompt": payloadPrompt} {
if !strings.Contains(promptText, "DS2API_TOOLS.txt") || !strings.Contains(promptText, "TOOL CALL FORMAT") {
t.Fatalf("expected %s to reference tools file and retain tool instructions, got %q", label, promptText)
}
if strings.Contains(promptText, "Description: search docs") {
t.Fatalf("expected %s not to inline tool descriptions, got %q", label, promptText)
}
}
refIDs, _ := payload["ref_file_ids"].([]any)
if len(refIDs) < 2 || refIDs[0] != "file-inline-1" || refIDs[1] != "file-inline-2" {
t.Fatalf("expected history and tools ref ids first, got %#v", payload["ref_file_ids"])
}
}
func TestHandleVercelStreamPrepareMapsCurrentInputFileManagedAuthFailureTo401(t *testing.T) {
t.Setenv("VERCEL", "1")
t.Setenv("DS2API_VERCEL_INTERNAL_SECRET", "stream-secret")
@@ -176,3 +419,88 @@ func TestHandleVercelStreamPrepareMapsCurrentInputFileManagedAuthFailureTo401(t
t.Fatalf("expected managed auth error message, got %s", rec.Body.String())
}
}
func TestHandleVercelStreamSwitchReuploadsCurrentInputFile(t *testing.T) {
t.Setenv("VERCEL", "1")
t.Setenv("DS2API_VERCEL_INTERNAL_SECRET", "stream-secret")
t.Setenv("DS2API_CONFIG_JSON", `{
"keys":["managed-key"],
"accounts":[
{"email":"acc1@test.com","password":"pwd"},
{"email":"acc2@test.com","password":"pwd"}
]
}`)
store := config.LoadStore()
resolver := auth.NewResolver(store, account.NewPool(store), func(_ context.Context, acc config.Account) (string, error) {
return "token-" + acc.Identifier(), nil
})
authReq := httptest.NewRequest(http.MethodPost, "/", nil)
authReq.Header.Set("Authorization", "Bearer managed-key")
a, err := resolver.Determine(authReq)
if err != nil {
t.Fatalf("determine failed: %v", err)
}
defer resolver.Release(a)
ds := &inlineUploadDSStub{}
h := &Handler{
Store: mockOpenAIConfig{currentInputEnabled: true},
Auth: resolver,
DS: ds,
}
stdReq := promptcompat.StandardRequest{
RequestedModel: "deepseek-v4-flash",
ResolvedModel: "deepseek-v4-flash",
ResponseModel: "deepseek-v4-flash",
FinalPrompt: "Continue from the latest state in the attached DS2API_HISTORY.txt context. Available tool descriptions and parameter schemas are attached in DS2API_TOOLS.txt; use only those tools and follow the tool-call format rules in this prompt.",
PromptTokenText: "# DS2API_HISTORY.txt\n\n=== 1. USER ===\nhello\n\n# DS2API_TOOLS.txt\nAvailable tool descriptions and parameter schemas for this request.\n\nYou have access to these tools:\n\nTool: search\nDescription: search docs\nParameters: {\"type\":\"object\"}\n",
HistoryText: "# DS2API_HISTORY.txt\n\n=== 1. USER ===\nhello\n",
CurrentInputFileApplied: true,
CurrentInputFileID: "file-old",
CurrentToolsFileID: "file-old-tools",
ToolsRaw: []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "search",
"description": "search docs",
"parameters": map[string]any{"type": "object"},
},
},
},
RefFileIDs: []string{"file-old", "file-old-tools", "client-file"},
Thinking: true,
}
leaseID := h.holdStreamLease(a, stdReq, "")
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions?__stream_switch=1", strings.NewReader(`{"lease_id":"`+leaseID+`"}`))
req.Header.Set("X-Ds2-Internal-Token", "stream-secret")
rec := httptest.NewRecorder()
h.handleVercelStreamSwitch(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
if len(ds.uploadCalls) != 2 {
t.Fatalf("expected current input and tools reupload on switched account, got %d", len(ds.uploadCalls))
}
if ds.uploadCalls[0].Filename != "DS2API_HISTORY.txt" || ds.uploadCalls[1].Filename != "DS2API_TOOLS.txt" {
t.Fatalf("unexpected reupload filenames: %#v", ds.uploadCalls)
}
var body map[string]any
if err := json.NewDecoder(rec.Body).Decode(&body); err != nil {
t.Fatalf("decode failed: %v", err)
}
if body["deepseek_token"] != "token-acc2@test.com" {
t.Fatalf("expected switched account token, got %#v", body["deepseek_token"])
}
payload, _ := body["payload"].(map[string]any)
refIDs, _ := payload["ref_file_ids"].([]any)
if len(refIDs) != 3 || refIDs[0] != "file-inline-1" || refIDs[1] != "file-inline-2" || refIDs[2] != "client-file" {
t.Fatalf("expected reuploaded current input ref plus client ref, got %#v", payload["ref_file_ids"])
}
promptText, _ := payload["prompt"].(string)
if !strings.Contains(promptText, "DS2API_TOOLS.txt") {
t.Fatalf("expected switched payload prompt to retain tools file reference, got %q", promptText)
}
}

@@ -11,6 +11,7 @@ import (
"ds2api/internal/auth"
"ds2api/internal/config"
"ds2api/internal/httpapi/openai/history"
"ds2api/internal/promptcompat"
"ds2api/internal/util"
@@ -96,7 +97,7 @@ func (h *Handler) handleVercelStreamPrepare(w http.ResponseWriter, r *http.Reque
}
payload := stdReq.CompletionPayload(sessionID)
-leaseID := h.holdStreamLease(a)
+leaseID := h.holdStreamLease(a, stdReq, sessionID)
if leaseID == "" {
writeOpenAIError(w, http.StatusInternalServerError, "failed to create stream lease")
return
@@ -140,10 +141,17 @@ func (h *Handler) handleVercelStreamRelease(w http.ResponseWriter, r *http.Reque
writeOpenAIError(w, http.StatusBadRequest, "lease_id is required")
return
}
-if !h.releaseStreamLease(leaseID) {
+lease, ok := h.releaseStreamLease(leaseID)
+if !ok {
writeOpenAIError(w, http.StatusNotFound, "stream lease not found")
return
}
if h.Auth != nil && lease.Auth != nil {
defer h.Auth.Release(lease.Auth)
}
if lease.Auth != nil {
h.autoDeleteRemoteSession(r.Context(), lease.Auth, lease.SessionID)
}
writeJSON(w, http.StatusOK, map[string]any{"success": true})
}
@@ -185,6 +193,80 @@ func (h *Handler) handleVercelStreamPow(w http.ResponseWriter, r *http.Request)
})
}
func (h *Handler) handleVercelStreamSwitch(w http.ResponseWriter, r *http.Request) {
if !config.IsVercel() {
http.NotFound(w, r)
return
}
h.sweepExpiredStreamLeases()
internalSecret := vercelInternalSecret()
internalToken := strings.TrimSpace(r.Header.Get("X-Ds2-Internal-Token"))
if internalSecret == "" || subtle.ConstantTimeCompare([]byte(internalToken), []byte(internalSecret)) != 1 {
writeOpenAIError(w, http.StatusUnauthorized, "unauthorized internal request")
return
}
var req map[string]any
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeOpenAIError(w, http.StatusBadRequest, "invalid json")
return
}
leaseID, _ := req["lease_id"].(string)
leaseID = strings.TrimSpace(leaseID)
if leaseID == "" {
writeOpenAIError(w, http.StatusBadRequest, "lease_id is required")
return
}
lease, ok := h.lookupStreamLease(leaseID)
if !ok || lease.Auth == nil {
writeOpenAIError(w, http.StatusNotFound, "stream lease not found or expired")
return
}
a := lease.Auth
if !a.UseConfigToken || !a.SwitchAccount(r.Context()) {
writeOpenAIErrorWithCode(w, http.StatusTooManyRequests, "Upstream account hit a rate limit and returned reasoning without visible output.", "upstream_empty_output")
return
}
stdReq := lease.Standard
var err error
if stdReq.CurrentInputFileApplied {
stdReq, err = (history.Service{Store: h.Store, DS: h.DS}).ReuploadAppliedCurrentInputFile(r.Context(), a, stdReq)
if err != nil {
status, message := mapCurrentInputFileError(err)
writeOpenAIError(w, status, message)
return
}
}
sessionID, err := h.DS.CreateSession(r.Context(), a, 3)
if err != nil {
writeOpenAIError(w, http.StatusUnauthorized, "Account token is invalid. Please re-login the account in admin.")
return
}
powHeader, err := h.DS.GetPow(r.Context(), a, 3)
if err != nil {
writeOpenAIError(w, http.StatusUnauthorized, "Failed to get PoW (invalid token or unknown error).")
return
}
if strings.TrimSpace(a.DeepSeekToken) == "" {
writeOpenAIError(w, http.StatusUnauthorized, "Account token is invalid. Please re-login the account in admin.")
return
}
h.updateStreamLeaseState(leaseID, stdReq, sessionID)
writeJSON(w, http.StatusOK, map[string]any{
"session_id": sessionID,
"lease_id": leaseID,
"model": stdReq.ResponseModel,
"final_prompt": stdReq.FinalPrompt,
"thinking_enabled": stdReq.Thinking,
"search_enabled": stdReq.Search,
"tool_names": stdReq.ToolNames,
"deepseek_token": a.DeepSeekToken,
"pow_header": powHeader,
"payload": stdReq.CompletionPayload(sessionID),
})
}
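The internal-token guard above rejects when the configured secret is empty and compares in constant time. A minimal, self-contained sketch of that check (the helper name is illustrative, not the repo's code):

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// secretsMatch mirrors the guard in handleVercelStreamSwitch: an unset
// secret always rejects, and the comparison is constant-time so the check
// does not leak how many leading bytes matched. Illustrative only.
func secretsMatch(configured, presented string) bool {
	if configured == "" {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(presented), []byte(configured)) == 1
}

func main() {
	fmt.Println(secretsMatch("stream-secret", "stream-secret")) // true
	fmt.Println(secretsMatch("stream-secret", "stream-secreT")) // false
	fmt.Println(secretsMatch("", ""))                           // false
}
```

Note that `subtle.ConstantTimeCompare` returns 0 immediately for unequal lengths, so only same-length comparisons are constant-time; that matches how the handler uses it.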
func isVercelStreamPrepareRequest(r *http.Request) bool {
if r == nil {
return false
@@ -206,6 +288,13 @@ func isVercelStreamPowRequest(r *http.Request) bool {
return strings.TrimSpace(r.URL.Query().Get("__stream_pow")) == "1"
}
func isVercelStreamSwitchRequest(r *http.Request) bool {
if r == nil {
return false
}
return strings.TrimSpace(r.URL.Query().Get("__stream_switch")) == "1"
}
func vercelInternalSecret() string {
if v := strings.TrimSpace(os.Getenv("DS2API_VERCEL_INTERNAL_SECRET")); v != "" {
return v
@@ -216,7 +305,7 @@ func vercelInternalSecret() string {
return "admin"
}
-func (h *Handler) holdStreamLease(a *auth.RequestAuth) string {
+func (h *Handler) holdStreamLease(a *auth.RequestAuth, stdReq promptcompat.StandardRequest, sessionID string) string {
if a == nil {
return ""
}
@@ -234,6 +323,8 @@ func (h *Handler) holdStreamLease(a *auth.RequestAuth) string {
leaseID := newLeaseID()
h.streamLeases[leaseID] = streamLease{
Auth: a,
Standard: stdReq,
SessionID: sessionID,
ExpiresAt: now.Add(ttl),
}
h.leaseMu.Unlock()
@@ -241,24 +332,48 @@ func (h *Handler) holdStreamLease(a *auth.RequestAuth) string {
return leaseID
}
-func (h *Handler) lookupStreamLeaseAuth(leaseID string) *auth.RequestAuth {
+func (h *Handler) lookupStreamLease(leaseID string) (streamLease, bool) {
leaseID = strings.TrimSpace(leaseID)
if leaseID == "" {
-return nil
+return streamLease{}, false
}
h.leaseMu.Lock()
lease, ok := h.streamLeases[leaseID]
h.leaseMu.Unlock()
if !ok || time.Now().After(lease.ExpiresAt) {
return streamLease{}, false
}
return lease, true
}
func (h *Handler) lookupStreamLeaseAuth(leaseID string) *auth.RequestAuth {
lease, ok := h.lookupStreamLease(leaseID)
if !ok {
return nil
}
return lease.Auth
}
-func (h *Handler) releaseStreamLease(leaseID string) bool {
+func (h *Handler) updateStreamLeaseState(leaseID string, stdReq promptcompat.StandardRequest, sessionID string) {
leaseID = strings.TrimSpace(leaseID)
if leaseID == "" {
-return false
+return
}
h.leaseMu.Lock()
defer h.leaseMu.Unlock()
lease, ok := h.streamLeases[leaseID]
if !ok {
return
}
lease.Standard = stdReq
lease.SessionID = sessionID
h.streamLeases[leaseID] = lease
}
func (h *Handler) releaseStreamLease(leaseID string) (streamLease, bool) {
leaseID = strings.TrimSpace(leaseID)
if leaseID == "" {
return streamLease{}, false
}
h.leaseMu.Lock()
@@ -271,12 +386,9 @@ func (h *Handler) releaseStreamLease(leaseID string) bool {
h.releaseExpiredAuths(expired)
if !ok {
-return false
+return streamLease{}, false
}
-if h.Auth != nil {
-h.Auth.Release(lease.Auth)
-}
-return true
+return lease, true
}
func (h *Handler) popExpiredLeasesLocked(now time.Time) []*auth.RequestAuth {

@@ -103,7 +103,7 @@ func TestNormalizeOpenAIResponsesRequestAlwaysAcceptsWideInput(t *testing.T) {
if out.Surface != "openai_responses" {
t.Fatalf("unexpected surface: %q", out.Surface)
}
-if !strings.Contains(out.FinalPrompt, "<User>hi") {
+if !strings.Contains(out.FinalPrompt, "<|User|>hi") {
t.Fatalf("unexpected final prompt: %q", out.FinalPrompt)
}
}

@@ -4,6 +4,7 @@ import (
"context"
"encoding/json"
"errors"
+"fmt"
"net/http"
"net/http/httptest"
"strings"
@@ -41,8 +42,12 @@ func (m *inlineUploadDSStub) UploadFile(ctx context.Context, _ *auth.RequestAuth
if m.uploadErr != nil {
return nil, m.uploadErr
}
id := "file-inline-1"
if len(m.uploadCalls) > 1 {
id = "file-inline-" + fmt.Sprint(len(m.uploadCalls))
}
return &dsclient.UploadFileResult{
-ID: "file-inline-1",
+ID: id,
Filename: req.Filename,
Bytes: int64(len(req.Data)),
Status: "uploaded",

@@ -15,6 +15,7 @@ import (
const (
currentInputFilename = promptcompat.CurrentInputContextFilename
currentToolsFilename = promptcompat.CurrentToolsContextFilename
currentInputContentType = "text/plain; charset=utf-8"
currentInputPurpose = "assistants"
)
@@ -50,6 +51,7 @@ func (s Service) ApplyCurrentInputFile(ctx context.Context, a *auth.RequestAuth,
if strings.TrimSpace(fileText) == "" {
return stdReq, errors.New("current user input file produced empty transcript")
}
toolsText, _ := promptcompat.BuildOpenAIToolsContextTranscript(stdReq.ToolsRaw, stdReq.ToolChoice)
modelType := "default"
if resolvedType, ok := config.GetModelType(stdReq.ResolvedModel); ok {
modelType = resolvedType
@@ -69,21 +71,98 @@ func (s Service) ApplyCurrentInputFile(ctx context.Context, a *auth.RequestAuth,
return stdReq, errors.New("upload current user input file returned empty file id")
}
toolFileID := ""
if strings.TrimSpace(toolsText) != "" {
result, err := s.DS.UploadFile(ctx, a, dsclient.UploadFileRequest{
Filename: currentToolsFilename,
ContentType: currentInputContentType,
Purpose: currentInputPurpose,
ModelType: modelType,
Data: []byte(toolsText),
}, 3)
if err != nil {
return stdReq, fmt.Errorf("upload current tools file: %w", err)
}
toolFileID = strings.TrimSpace(result.ID)
if toolFileID == "" {
return stdReq, errors.New("upload current tools file returned empty file id")
}
}
messages := []any{
map[string]any{
"role": "user",
-"content": currentInputFilePrompt(),
+"content": currentInputFilePrompt(toolFileID != ""),
},
}
stdReq.Messages = messages
stdReq.HistoryText = fileText
stdReq.CurrentInputFileApplied = true
-stdReq.RefFileIDs = prependUniqueRefFileID(stdReq.RefFileIDs, fileID)
-stdReq.FinalPrompt, stdReq.ToolNames = promptcompat.BuildOpenAIPrompt(messages, stdReq.ToolsRaw, "", stdReq.ToolChoice, stdReq.Thinking)
+stdReq.CurrentInputFileID = fileID
+stdReq.CurrentToolsFileID = toolFileID
+stdReq.RefFileIDs = prependUniqueRefFileIDs(stdReq.RefFileIDs, fileID, toolFileID)
+stdReq.FinalPrompt, stdReq.ToolNames = promptcompat.BuildOpenAIPromptWithToolInstructionsOnly(messages, stdReq.ToolsRaw, "", stdReq.ToolChoice, stdReq.Thinking)
// Token accounting must reflect the actual downstream context:
-// the uploaded DS2API_HISTORY.txt file content + the continuation live prompt.
-stdReq.PromptTokenText = fileText + "\n" + stdReq.FinalPrompt
+// uploaded context files + the continuation live prompt.
+tokenParts := []string{fileText}
+if strings.TrimSpace(toolsText) != "" {
+tokenParts = append(tokenParts, toolsText)
+}
+tokenParts = append(tokenParts, stdReq.FinalPrompt)
+stdReq.PromptTokenText = strings.Join(tokenParts, "\n")
return stdReq, nil
}
func (s Service) ReuploadAppliedCurrentInputFile(ctx context.Context, a *auth.RequestAuth, stdReq promptcompat.StandardRequest) (promptcompat.StandardRequest, error) {
if !stdReq.CurrentInputFileApplied || s.DS == nil || a == nil {
return stdReq, nil
}
fileText := strings.TrimSpace(stdReq.HistoryText)
if fileText == "" {
return stdReq, nil
}
modelType := "default"
if resolvedType, ok := config.GetModelType(stdReq.ResolvedModel); ok {
modelType = resolvedType
}
result, err := s.DS.UploadFile(ctx, a, dsclient.UploadFileRequest{
Filename: currentInputFilename,
ContentType: currentInputContentType,
Purpose: currentInputPurpose,
ModelType: modelType,
Data: []byte(stdReq.HistoryText),
}, 3)
if err != nil {
return stdReq, fmt.Errorf("upload current user input file: %w", err)
}
fileID := strings.TrimSpace(result.ID)
if fileID == "" {
return stdReq, errors.New("upload current user input file returned empty file id")
}
toolsText, _ := promptcompat.BuildOpenAIToolsContextTranscript(stdReq.ToolsRaw, stdReq.ToolChoice)
toolFileID := ""
if strings.TrimSpace(toolsText) != "" {
result, err := s.DS.UploadFile(ctx, a, dsclient.UploadFileRequest{
Filename: currentToolsFilename,
ContentType: currentInputContentType,
Purpose: currentInputPurpose,
ModelType: modelType,
Data: []byte(toolsText),
}, 3)
if err != nil {
return stdReq, fmt.Errorf("upload current tools file: %w", err)
}
toolFileID = strings.TrimSpace(result.ID)
if toolFileID == "" {
return stdReq, errors.New("upload current tools file returned empty file id")
}
}
stdReq.RefFileIDs = replaceGeneratedCurrentInputRefs(stdReq.RefFileIDs, stdReq.CurrentInputFileID, stdReq.CurrentToolsFileID, fileID, toolFileID)
stdReq.CurrentInputFileID = fileID
stdReq.CurrentToolsFileID = toolFileID
return stdReq, nil
}
@@ -106,23 +185,62 @@ func latestUserInputForFile(messages []any) (int, string) {
return -1, ""
}
-func currentInputFilePrompt() string {
-return "Continue from the latest state in the attached DS2API_HISTORY.txt context. Treat it as the current working state and answer the latest user request directly."
+func currentInputFilePrompt(hasToolsFile bool) string {
+prompt := "Continue from the latest state in the attached DS2API_HISTORY.txt context. Treat it as the current working state and answer the latest user request directly."
+if hasToolsFile {
+prompt += " Available tool descriptions and parameter schemas are attached in DS2API_TOOLS.txt; use only those tools and follow the tool-call format rules in this prompt."
+}
+return prompt
}
-func prependUniqueRefFileID(existing []string, fileID string) []string {
-fileID = strings.TrimSpace(fileID)
-if fileID == "" {
-return existing
-}
-out := make([]string, 0, len(existing)+1)
-out = append(out, fileID)
-for _, id := range existing {
-trimmed := strings.TrimSpace(id)
-if trimmed == "" || strings.EqualFold(trimmed, fileID) {
-continue
-}
-out = append(out, trimmed)
-}
-return out
-}
+func prependUniqueRefFileIDs(existing []string, fileIDs ...string) []string {
+out := make([]string, 0, len(existing)+len(fileIDs))
+seen := map[string]struct{}{}
+for _, fileID := range fileIDs {
+trimmed := strings.TrimSpace(fileID)
+if trimmed == "" {
+continue
+}
+key := strings.ToLower(trimmed)
+if _, ok := seen[key]; ok {
+continue
+}
+out = append(out, trimmed)
+seen[key] = struct{}{}
+}
+for _, id := range existing {
+trimmed := strings.TrimSpace(id)
+if trimmed == "" {
+continue
+}
+key := strings.ToLower(trimmed)
+if _, ok := seen[key]; ok {
+continue
+}
+out = append(out, trimmed)
+seen[key] = struct{}{}
+}
+return out
+}
func replaceGeneratedCurrentInputRefs(existing []string, oldHistoryID, oldToolsID, newHistoryID, newToolsID string) []string {
filtered := make([]string, 0, len(existing))
old := map[string]struct{}{}
for _, id := range []string{oldHistoryID, oldToolsID} {
trimmed := strings.ToLower(strings.TrimSpace(id))
if trimmed != "" {
old[trimmed] = struct{}{}
}
}
for _, id := range existing {
trimmed := strings.TrimSpace(id)
if trimmed == "" {
continue
}
if _, ok := old[strings.ToLower(trimmed)]; ok {
continue
}
filtered = append(filtered, trimmed)
}
return prependUniqueRefFileIDs(filtered, newHistoryID, newToolsID)
}

@@ -380,6 +380,79 @@ func TestApplyCurrentInputFileUploadsFullContextFile(t *testing.T) {
}
}
func TestApplyCurrentInputFileUploadsToolsContextSeparately(t *testing.T) {
ds := &inlineUploadDSStub{}
h := &openAITestSurface{
Store: mockOpenAIConfig{
currentInputEnabled: true,
currentInputMin: 0,
},
DS: ds,
}
req := map[string]any{
"model": "deepseek-v4-flash",
"messages": historySplitTestMessages(),
"tools": []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "search",
"description": "search docs",
"parameters": map[string]any{
"type": "object",
},
},
},
},
}
stdReq, err := promptcompat.NormalizeOpenAIChatRequest(h.Store, req, "")
if err != nil {
t.Fatalf("normalize failed: %v", err)
}
out, err := h.applyCurrentInputFile(context.Background(), &auth.RequestAuth{DeepSeekToken: "token"}, stdReq)
if err != nil {
t.Fatalf("apply current input file failed: %v", err)
}
if len(ds.uploadCalls) != 2 {
t.Fatalf("expected history and tools uploads, got %d", len(ds.uploadCalls))
}
if ds.uploadCalls[0].Filename != "DS2API_HISTORY.txt" {
t.Fatalf("expected first upload to be DS2API_HISTORY.txt, got %q", ds.uploadCalls[0].Filename)
}
if ds.uploadCalls[1].Filename != "DS2API_TOOLS.txt" {
t.Fatalf("expected second upload to be DS2API_TOOLS.txt, got %q", ds.uploadCalls[1].Filename)
}
historyText := string(ds.uploadCalls[0].Data)
if strings.Contains(historyText, "You have access to these tools") || strings.Contains(historyText, "Description: search docs") {
t.Fatalf("history transcript should not embed tool descriptions, got %q", historyText)
}
toolsText := string(ds.uploadCalls[1].Data)
for _, want := range []string{"# DS2API_TOOLS.txt", "Tool: search", "Description: search docs", `Parameters: {"type":"object"}`} {
if !strings.Contains(toolsText, want) {
t.Fatalf("expected tools transcript to contain %q, got %q", want, toolsText)
}
}
if strings.Contains(toolsText, "TOOL CALL FORMAT") {
t.Fatalf("tools transcript should not duplicate tool format instructions, got %q", toolsText)
}
if !strings.Contains(out.FinalPrompt, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") || !strings.Contains(out.FinalPrompt, "DS2API_TOOLS.txt") {
t.Fatalf("expected live prompt to reference both context files, got %q", out.FinalPrompt)
}
if !strings.Contains(out.FinalPrompt, "TOOL CALL FORMAT") || !strings.Contains(out.FinalPrompt, "Remember: The ONLY valid way to use tools") {
t.Fatalf("expected live prompt to retain tool format instructions, got %q", out.FinalPrompt)
}
if strings.Contains(out.FinalPrompt, "You have access to these tools") || strings.Contains(out.FinalPrompt, "Description: search docs") || strings.Contains(out.FinalPrompt, "Parameters:") {
t.Fatalf("expected live prompt to omit tool descriptions after tools upload, got %q", out.FinalPrompt)
}
if len(out.RefFileIDs) < 2 || out.RefFileIDs[0] != "file-inline-1" || out.RefFileIDs[1] != "file-inline-2" {
t.Fatalf("expected history and tools file ids first, got %#v", out.RefFileIDs)
}
if !strings.Contains(out.PromptTokenText, "# DS2API_HISTORY.txt") || !strings.Contains(out.PromptTokenText, "# DS2API_TOOLS.txt") || !strings.Contains(out.PromptTokenText, "Description: search docs") {
t.Fatalf("expected prompt token text to include uploaded history and tools content, got %q", out.PromptTokenText)
}
}
func TestApplyCurrentInputFileCarriesHistoryText(t *testing.T) {
ds := &inlineUploadDSStub{}
h := &openAITestSurface{
@@ -537,6 +610,69 @@ func TestResponsesCurrentInputFileUploadsContextAndKeepsNeutralPrompt(t *testing
}
}
func TestResponsesCurrentInputFileUploadsToolsSeparately(t *testing.T) {
ds := &inlineUploadDSStub{}
h := &openAITestSurface{
Store: mockOpenAIConfig{
currentInputEnabled: true,
},
Auth: streamStatusAuthStub{},
DS: ds,
}
r := chi.NewRouter()
registerOpenAITestRoutes(r, h)
reqBody, _ := json.Marshal(map[string]any{
"model": "deepseek-v4-flash",
"messages": historySplitTestMessages(),
"tools": []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "search",
"description": "search docs",
"parameters": map[string]any{"type": "object"},
},
},
},
"stream": false,
})
req := httptest.NewRequest(http.MethodPost, "/v1/responses", strings.NewReader(string(reqBody)))
req.Header.Set("Authorization", "Bearer direct-token")
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
if len(ds.uploadCalls) != 2 {
t.Fatalf("expected history and tools uploads, got %d", len(ds.uploadCalls))
}
if ds.uploadCalls[0].Filename != "DS2API_HISTORY.txt" || ds.uploadCalls[1].Filename != "DS2API_TOOLS.txt" {
t.Fatalf("unexpected upload filenames: %#v", ds.uploadCalls)
}
historyText := string(ds.uploadCalls[0].Data)
if strings.Contains(historyText, "Description: search docs") {
t.Fatalf("history transcript should not embed tool descriptions, got %q", historyText)
}
toolsText := string(ds.uploadCalls[1].Data)
if !strings.Contains(toolsText, "# DS2API_TOOLS.txt") || !strings.Contains(toolsText, "Tool: search") || !strings.Contains(toolsText, "Description: search docs") {
t.Fatalf("expected tools transcript to include schema, got %q", toolsText)
}
promptText, _ := ds.completionReq["prompt"].(string)
if !strings.Contains(promptText, "DS2API_TOOLS.txt") || !strings.Contains(promptText, "TOOL CALL FORMAT") {
t.Fatalf("expected live prompt to reference tools file and retain format instructions, got %q", promptText)
}
if strings.Contains(promptText, "Description: search docs") {
t.Fatalf("live prompt should not inline tool descriptions, got %q", promptText)
}
refIDs, _ := ds.completionReq["ref_file_ids"].([]any)
if len(refIDs) < 2 || refIDs[0] != "file-inline-1" || refIDs[1] != "file-inline-2" {
t.Fatalf("expected history and tools ref ids first, got %#v", ds.completionReq["ref_file_ids"])
}
}
func TestChatCompletionsCurrentInputFileMapsManagedAuthFailureTo401(t *testing.T) {
ds := &inlineUploadDSStub{
uploadErr: &dsclient.RequestFailure{Op: "upload file", Kind: dsclient.FailureManagedUnauthorized, Message: "expired token"},

@@ -19,21 +19,65 @@ func TestSanitizeLeakedOutputRemovesLeakedWireToolCallAndResult(t *testing.T) {
}
func TestSanitizeLeakedOutputRemovesStandaloneMetaMarkers(t *testing.T) {
-raw := "A<| end_of_sentence |><| Assistant |>B<| end_of_thinking |>C<end▁of▁thinking>D<end▁of▁sentence>E<| end_of_toolresults |>F<end▁of▁instructions>G"
+raw := "A<| end_of_sentence |><| Assistant |>B<| end_of_thinking |>C<|end▁of▁thinking|>D<|end▁of▁sentence|>E<| end_of_toolresults |>F<|end▁of▁instructions|>G"
got := sanitizeLeakedOutput(raw)
if got != "ABCDEFG" {
t.Fatalf("unexpected sanitize result for meta markers: %q", got)
}
}
func TestSanitizeLeakedOutputRemovesFullwidthDelimitedMetaMarkers(t *testing.T) {
fw := "\uff5c"
raw := "A<" + fw + "end▁of▁sentence" + fw + ">B<" + fw + " Assistant " + fw + ">C<" + fw + "end_of_toolresults" + fw + ">D"
got := sanitizeLeakedOutput(raw)
if got != "ABCD" {
t.Fatalf("unexpected sanitize result for fullwidth-delimited meta markers: %q", got)
}
}
func TestSanitizeLeakedOutputRemovesAssistantEndOfToolCallsMarkers(t *testing.T) {
fw := "\uff5c"
raw := "A<|Assistant_END_OF_TOOL_CALLS|>B<" + fw + "Assistant▁END▁OF▁TOOL_CALLS" + fw + ">C<|end_of_tool_calls|>D"
got := sanitizeLeakedOutput(raw)
if got != "ABCD" {
t.Fatalf("unexpected sanitize result for assistant end-of-tool-calls markers: %q", got)
}
}
func TestSanitizeLeakedOutputRemovesFullToolResultSection(t *testing.T) {
fw := "\uff5c"
raw := "开始<" + fw + "Tool" + fw + ">[{\"content\":\"openjdk version 21\"}]<" + fw + "end▁of▁toolresults" + fw + ">结束"
got := sanitizeLeakedOutput(raw)
if got != "开始结束" {
t.Fatalf("unexpected sanitize result for leaked tool result section: %q", got)
}
}
func TestSanitizeLeakedOutputRemovesThinkAndBosMarkers(t *testing.T) {
raw := "A<think>B</think>C<begin▁of▁sentence>D<| begin_of_sentence |>E<begin_of_sentence>F"
raw := "A<think>B</think>C<|begin▁of▁sentence|>D<| begin_of_sentence |>E<|begin_of_sentence|>F"
got := sanitizeLeakedOutput(raw)
if got != "ABCDEF" {
t.Fatalf("unexpected sanitize result for think/BOS markers: %q", got)
}
}
func TestSanitizeLeakedOutputRemovesThoughtMarkers(t *testing.T) {
raw := "A<|▁of▁thought|>B<| of_thought |>C<| begin_of_thought |>D<| end_of_thought |>E"
got := sanitizeLeakedOutput(raw)
if got != "ABCDE" {
t.Fatalf("unexpected sanitize result for leaked thought markers: %q", got)
}
}
func TestSanitizeLeakedOutputRemovesFullwidthDelimitedBosAndThoughtMarkers(t *testing.T) {
fw := "\uff5c"
raw := "A<" + fw + "begin▁of▁sentence" + fw + ">B<" + fw + "▁of▁thought" + fw + ">C<" + fw + " begin_of_thought " + fw + ">D"
got := sanitizeLeakedOutput(raw)
if got != "ABCD" {
t.Fatalf("unexpected sanitize result for fullwidth-delimited BOS/thought markers: %q", got)
}
}
func TestSanitizeLeakedOutputRemovesDanglingThinkBlock(t *testing.T) {
raw := "Answer prefix<think>internal reasoning that never closes"
got := sanitizeLeakedOutput(raw)
@@ -43,7 +87,7 @@ func TestSanitizeLeakedOutputRemovesDanglingThinkBlock(t *testing.T) {
}
func TestSanitizeLeakedOutputRemovesCompleteDSMLToolCallWrapper(t *testing.T) {
raw := "前置文本\n<DSMLtool_calls>\n<DSMLinvoke name=\"Bash\">\n<DSMLparameter name=\"command\"></DSMLparameter>\n</DSMLinvoke>\n</DSMLtool_calls>\n后置文本"
raw := "前置文本\n<|DSML|tool_calls>\n<|DSML|invoke name=\"Bash\">\n<|DSML|parameter name=\"command\"></|DSML|parameter>\n</|DSML|invoke>\n</|DSML|tool_calls>\n后置文本"
got := sanitizeLeakedOutput(raw)
if got != "前置文本\n\n后置文本" {
t.Fatalf("unexpected sanitize result for leaked dsml wrapper: %q", got)

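The halfwidth/fullwidth delimiter handling these tests exercise can be sketched with a single RE2 character class. The pattern below is a simplified stand-in for the real `leakedMetaMarkerPattern`, covering only a subset of the token names; the names and structure are illustrative, not the shipped regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// metaMarker matches DeepSeek-style special tokens delimited by either the
// halfwidth pipe '|' or the fullwidth U+FF5C variant, with either '_' or the
// U+2581 separator between words. Simplified subset of the tested patterns.
var metaMarker = regexp.MustCompile(`(?i)<[|\x{FF5C}]\s*(?:assistant|tool|end[_▁]of[_▁](?:sentence|thinking|tool[_▁]?results|instructions))\s*[|\x{FF5C}]>`)

// stripMeta removes every matching marker from the text.
func stripMeta(s string) string {
	return metaMarker.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(stripMeta("A<| end_of_sentence |>B<｜end▁of▁thinking｜>C")) // ABC
}
```

Because both delimiter variants live in one character class, a single `ReplaceAllString` pass handles halfwidth, fullwidth, and mixed-delimiter leaks alike.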
View File

@@ -7,6 +7,7 @@ import (
"time"
"ds2api/internal/auth"
"ds2api/internal/completionruntime"
"ds2api/internal/config"
dsprotocol "ds2api/internal/deepseek/protocol"
"ds2api/internal/promptcompat"
@@ -14,46 +15,41 @@ import (
streamengine "ds2api/internal/stream"
)
func (h *Handler) handleResponsesStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string, historySession *responsehistory.Session) {
func (h *Handler) handleResponsesStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, owner, responseID string, stdReq promptcompat.StandardRequest, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string, historySession *responsehistory.Session) {
streamRuntime, initialType, ok := h.prepareResponsesStreamRuntime(w, resp, owner, responseID, model, finalPrompt, refFileTokens, thinkingEnabled, searchEnabled, toolNames, toolsRaw, toolChoice, traceID, historySession)
if !ok {
return
}
attempts := 0
currentResp := resp
for {
terminalWritten, retryable := h.consumeResponsesStreamAttempt(r, currentResp, streamRuntime, initialType, thinkingEnabled, attempts < emptyOutputRetryMaxAttempts())
if terminalWritten {
logResponsesStreamTerminal(streamRuntime, attempts)
return
}
if !retryable || !emptyOutputRetryEnabled() || attempts >= emptyOutputRetryMaxAttempts() {
completionruntime.ExecuteStreamWithRetry(r.Context(), h.DS, a, resp, payload, pow, completionruntime.StreamRetryOptions{
Surface: "responses",
Stream: true,
RetryEnabled: emptyOutputRetryEnabled(),
RetryMaxAttempts: emptyOutputRetryMaxAttempts(),
MaxAttempts: 3,
UsagePrompt: finalPrompt,
Request: stdReq,
CurrentInputFile: h.Store,
}, completionruntime.StreamRetryHooks{
ConsumeAttempt: func(currentResp *http.Response, allowDeferEmpty bool) (bool, bool) {
return h.consumeResponsesStreamAttempt(r, currentResp, streamRuntime, initialType, thinkingEnabled, allowDeferEmpty)
},
Finalize: func(attempts int) {
streamRuntime.finalize("stop", false)
config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "responses", "stream", true, "retry_attempts", attempts, "success_source", "none", "error_code", streamRuntime.finalErrorCode)
return
}
attempts++
config.Logger.Info("[openai_empty_retry] attempting synthetic retry", "surface", "responses", "stream", true, "retry_attempt", attempts, "parent_message_id", streamRuntime.responseMessageID)
retryPow, powErr := h.DS.GetPow(r.Context(), a, 3)
if powErr != nil {
config.Logger.Warn("[openai_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", "responses", "stream", true, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
nextResp, err := h.DS.CallCompletion(r.Context(), a, clonePayloadForEmptyOutputRetry(payload, streamRuntime.responseMessageID), retryPow, 3)
if err != nil {
streamRuntime.failResponse(http.StatusInternalServerError, "Failed to get completion.", "error")
config.Logger.Warn("[openai_empty_retry] retry request failed", "surface", "responses", "stream", true, "retry_attempt", attempts, "error", err)
return
}
if nextResp.StatusCode != http.StatusOK {
defer func() { _ = nextResp.Body.Close() }()
body, _ := io.ReadAll(nextResp.Body)
streamRuntime.failResponse(nextResp.StatusCode, strings.TrimSpace(string(body)), "error")
return
}
streamRuntime.finalPrompt = usagePromptWithEmptyOutputRetry(finalPrompt, attempts)
currentResp = nextResp
}
},
ParentMessageID: func() int {
return streamRuntime.responseMessageID
},
OnRetryPrompt: func(prompt string) {
streamRuntime.finalPrompt = prompt
},
OnRetryFailure: func(status int, message, code string) {
streamRuntime.failResponse(status, strings.TrimSpace(message), code)
},
OnTerminal: func(attempts int) {
logResponsesStreamTerminal(streamRuntime, attempts)
},
})
}
func (h *Handler) prepareResponsesStreamRuntime(w http.ResponseWriter, resp *http.Response, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string, historySession *responsehistory.Session) (*responsesStreamRuntime, string, bool) {

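The refactor above moves the retry loop into `completionruntime.ExecuteStreamWithRetry` and parameterizes the surface-specific behavior through hooks. A minimal sketch of that shape, with illustrative names (the real options also carry the request, PoW, and payload cloning):

```go
package main

import "fmt"

// StreamRetryHooks holds the surface-specific callbacks; modeled loosely on
// the ExecuteStreamWithRetry refactor above, names illustrative only.
type StreamRetryHooks struct {
	// ConsumeAttempt reports (terminalWritten, retryable) for one attempt.
	ConsumeAttempt func(attempt int, allowDeferEmpty bool) (bool, bool)
	// OnTerminal runs once when the loop ends, with the retry count.
	OnTerminal func(attempts int)
}

// executeWithRetry drives attempts until a terminal frame is written or the
// output is not retryable / the attempt budget is spent.
func executeWithRetry(maxAttempts int, h StreamRetryHooks) {
	for attempt := 0; ; attempt++ {
		terminal, retryable := h.ConsumeAttempt(attempt, attempt < maxAttempts)
		if terminal || !retryable || attempt >= maxAttempts {
			h.OnTerminal(attempt)
			return
		}
		// Otherwise: refresh PoW, clone the payload with the parent message
		// ID, and issue a synthetic retry (elided in this sketch).
	}
}

func main() {
	executeWithRetry(2, StreamRetryHooks{
		ConsumeAttempt: func(attempt int, _ bool) (bool, bool) {
			return attempt == 1, true // succeed on the second attempt
		},
		OnTerminal: func(attempts int) { fmt.Println("attempts:", attempts) },
	})
}
```

Centralizing the loop this way lets the chat-completions and responses surfaces share identical retry semantics while keeping stream-runtime wiring in the hooks.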
View File

@@ -103,14 +103,6 @@ func emptyOutputRetryMaxAttempts() int {
return shared.EmptyOutputRetryMaxAttempts()
}
func clonePayloadForEmptyOutputRetry(payload map[string]any, parentMessageID int) map[string]any {
return shared.ClonePayloadForEmptyOutputRetry(payload, parentMessageID)
}
func usagePromptWithEmptyOutputRetry(originalPrompt string, retryAttempts int) string {
return shared.UsagePromptWithEmptyOutputRetry(originalPrompt, retryAttempts)
}
func filterIncrementalToolCallDeltasByAllowed(deltas []toolstream.ToolCallDelta, seenNames map[int]string) []toolstream.ToolCallDelta {
return shared.FilterIncrementalToolCallDeltasByAllowed(deltas, seenNames)
}

View File

@@ -138,7 +138,7 @@ func (h *Handler) Responses(w http.ResponseWriter, r *http.Request) {
streamReq := start.Request
refFileTokens := streamReq.RefFileTokens
h.handleResponsesStreamWithRetry(w, r, a, start.Response, start.Payload, start.Pow, owner, responseID, streamReq.ResponseModel, streamReq.PromptTokenText, refFileTokens, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.ToolChoice, traceID, historySession)
h.handleResponsesStreamWithRetry(w, r, a, start.Response, start.Payload, start.Pow, owner, responseID, streamReq, streamReq.ResponseModel, streamReq.PromptTokenText, refFileTokens, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.ToolChoice, traceID, historySession)
}
func (h *Handler) handleResponsesNonStream(w http.ResponseWriter, resp *http.Response, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string) {

View File

@@ -81,6 +81,22 @@ func (s *responsesStreamRuntime) buildCompletedResponseObject(finalThinking, fin
},
},
})
} else if len(calls) > 0 && strings.TrimSpace(finalThinking) != "" {
indexed = append(indexed, indexedItem{
index: s.ensureMessageOutputIndex(),
item: map[string]any{
"id": s.ensureMessageItemID(),
"type": "message",
"role": "assistant",
"status": "completed",
"content": []map[string]any{
{
"type": "reasoning",
"text": finalThinking,
},
},
},
})
} else if len(calls) == 0 {
content := make([]map[string]any, 0, 2)
if finalThinking != "" {

View File

@@ -397,7 +397,7 @@ func TestHandleResponsesNonStreamRequiredToolChoiceIgnoresThinkingToolPayloadWhe
}
}
func TestHandleResponsesNonStreamReturns429WhenUpstreamOutputEmpty(t *testing.T) {
func TestHandleResponsesNonStreamSingleAttemptReturns503WhenUpstreamOutputEmpty(t *testing.T) {
h := &Handler{}
rec := httptest.NewRecorder()
resp := &http.Response{
@@ -409,17 +409,17 @@ func TestHandleResponsesNonStreamReturns429WhenUpstreamOutputEmpty(t *testing.T)
}
h.handleResponsesNonStream(rec, resp, "owner-a", "resp_test", "deepseek-v4-flash", "prompt", 0, false, false, nil, nil, promptcompat.DefaultToolChoicePolicy(), "")
if rec.Code != http.StatusTooManyRequests {
t.Fatalf("expected 429 for empty upstream output, got %d body=%s", rec.Code, rec.Body.String())
if rec.Code != http.StatusServiceUnavailable {
t.Fatalf("expected 503 for empty upstream output, got %d body=%s", rec.Code, rec.Body.String())
}
out := decodeJSONBody(t, rec.Body.String())
errObj, _ := out["error"].(map[string]any)
if asString(errObj["code"]) != "upstream_empty_output" {
t.Fatalf("expected code=upstream_empty_output, got %#v", out)
if asString(errObj["code"]) != "upstream_unavailable" {
t.Fatalf("expected code=upstream_unavailable, got %#v", out)
}
}
func TestHandleResponsesNonStreamReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutput(t *testing.T) {
func TestHandleResponsesNonStreamSingleAttemptReturnsContentFilterErrorWhenUpstreamFilteredWithoutOutput(t *testing.T) {
h := &Handler{}
rec := httptest.NewRecorder()
resp := &http.Response{
@@ -441,7 +441,7 @@ func TestHandleResponsesNonStreamReturnsContentFilterErrorWhenUpstreamFilteredWi
}
}
func TestHandleResponsesNonStreamReturns429WhenUpstreamHasOnlyThinking(t *testing.T) {
func TestHandleResponsesNonStreamSingleAttemptReturns429WhenUpstreamHasOnlyThinking(t *testing.T) {
h := &Handler{}
rec := httptest.NewRecorder()
resp := &http.Response{

View File

@@ -10,18 +10,30 @@ import (
var emptyJSONFencePattern = regexp.MustCompile("(?is)```json\\s*```")
var leakedToolCallArrayPattern = regexp.MustCompile(`(?is)\[\{\s*"function"\s*:\s*\{[\s\S]*?\}\s*,\s*"id"\s*:\s*"call[^"]*"\s*,\s*"type"\s*:\s*"function"\s*}\]`)
var leakedToolResultBlobPattern = regexp.MustCompile(`(?is)<\s*\|\s*tool\s*\|\s*>\s*\{[\s\S]*?"tool_call_id"\s*:\s*"call[^"]*"\s*}`)
var leakedToolResultOpenMarkerPattern = regexp.MustCompile(`(?is)<[\|\x{ff5c}]\s*tool\s*[\|\x{ff5c}]>`)
var leakedToolResultCloseMarkerPattern = regexp.MustCompile(`(?is)<[\|\x{ff5c}]\s*end[_▁]of[_▁]tool[_▁]?results\s*[\|\x{ff5c}]>`)
var leakedToolResultSectionPattern = regexp.MustCompile(`(?is)<[\|\x{ff5c}]\s*tool\s*[\|\x{ff5c}]>[\s\S]*?<[\|\x{ff5c}]\s*end[_▁]of[_▁]tool[_▁]?results\s*[\|\x{ff5c}]>`)
var leakedThinkTagPattern = regexp.MustCompile(`(?is)</?\s*think\s*>`)
// leakedBOSMarkerPattern matches DeepSeek BOS markers in BOTH forms:
// - ASCII underscore: <begin_of_sentence>
// - U+2581 variant: <begin▁of▁sentence>
var leakedBOSMarkerPattern = regexp.MustCompile(`(?i)<[\|]\s*begin[_▁]of[_▁]sentence\s*[\|]>`)
// leakedBOSMarkerPattern matches DeepSeek BOS markers with halfwidth or
// legacy U+FF5C fullwidth delimiters:
// - ASCII underscore: <|begin_of_sentence|>
// - U+2581 variant: <|begin▁of▁sentence|>
var leakedBOSMarkerPattern = regexp.MustCompile(`(?i)<[\|\x{ff5c}]\s*begin[_▁]of[_▁]sentence\s*[\|\x{ff5c}]>`)
// leakedMetaMarkerPattern matches the remaining DeepSeek special tokens in BOTH forms:
// - ASCII underscore: <end_of_sentence>, <end_of_toolresults>, <end_of_instructions>
// - U+2581 variant: <end▁of▁sentence>, <end▁of▁toolresults>, <end▁of▁instructions>
var leakedMetaMarkerPattern = regexp.MustCompile(`(?i)<[\|]\s*(?:assistant|tool|end[_▁]of[_▁]sentence|end[_▁]of[_▁]thinking|end[_▁]of[_▁]toolresults|end[_▁]of[_▁]instructions)\s*[\|]>`)
// leakedThoughtMarkerPattern matches leaked thought control markers in both
// explicit and compact forms:
// - ASCII underscore: <| of_thought |>, <| begin_of_thought |>
// - U+2581 variant: <|▁of▁thought|>, <|begin▁of▁thought|>
var leakedThoughtMarkerPattern = regexp.MustCompile(`(?i)<[\|\x{ff5c}]\s*(?:begin[_▁])?[_▁]*of[_▁]thought\s*[\|\x{ff5c}]>`)
// leakedMetaMarkerPattern matches the remaining DeepSeek special tokens with
// halfwidth or legacy U+FF5C fullwidth delimiters:
// - ASCII underscore: <|end_of_sentence|>, <|end_of_toolresults|>, <|end_of_instructions|>
// - U+2581 variant: <|end▁of▁sentence|>, <|end▁of▁toolresults|>, <|end▁of▁instructions|>
// - compound assistant markers: <|Assistant_END_OF_TOOL_CALLS|>
var leakedMetaMarkerPattern = regexp.MustCompile(`(?i)<[\|\x{ff5c}]\s*(?:assistant(?:[_▁]end[_▁]of[_▁]tool[_▁]?calls)?|tool|end[_▁]of[_▁]sentence|end[_▁]of[_▁]thinking|end[_▁]of[_▁]thought|end[_▁]of[_▁]tool[_▁]?results|end[_▁]of[_▁]tool[_▁]?calls|end[_▁]of[_▁]instructions)\s*[\|\x{ff5c}]>`)
// leakedAgentXMLBlockPatterns catch agent-style XML blocks that leak through
// when the sieve fails to capture them. These are applied only to complete
@@ -44,16 +56,52 @@ func sanitizeLeakedOutput(text string) string {
}
out := emptyJSONFencePattern.ReplaceAllString(text, "")
out = leakedToolCallArrayPattern.ReplaceAllString(out, "")
out = leakedToolResultSectionPattern.ReplaceAllString(out, "")
out = leakedToolResultBlobPattern.ReplaceAllString(out, "")
out = stripDanglingThinkSuffix(out)
out = leakedThinkTagPattern.ReplaceAllString(out, "")
out = leakedBOSMarkerPattern.ReplaceAllString(out, "")
out = leakedThoughtMarkerPattern.ReplaceAllString(out, "")
out = leakedMetaMarkerPattern.ReplaceAllString(out, "")
out = stripLeakedToolCallWrapperBlocks(out)
out = sanitizeLeakedAgentXMLBlocks(out)
return out
}
func stripLeakedToolResultSectionsDelta(text string, inside *bool) string {
if text == "" || inside == nil {
return text
}
var b strings.Builder
pos := 0
for pos < len(text) {
if *inside {
loc := leakedToolResultCloseMarkerPattern.FindStringIndex(text[pos:])
if loc == nil {
return b.String()
}
*inside = false
pos += loc[1]
continue
}
loc := leakedToolResultOpenMarkerPattern.FindStringIndex(text[pos:])
if loc == nil {
b.WriteString(text[pos:])
break
}
start := pos + loc[0]
openEnd := pos + loc[1]
b.WriteString(text[pos:start])
closeLoc := leakedToolResultCloseMarkerPattern.FindStringIndex(text[openEnd:])
if closeLoc == nil {
*inside = true
break
}
pos = openEnd + closeLoc[1]
}
return b.String()
}
func stripLeakedToolCallWrapperBlocks(text string) string {
if text == "" {
return text

View File

@@ -16,6 +16,9 @@ type StreamAccumulator struct {
ToolDetectionThinking strings.Builder
RawText strings.Builder
Text strings.Builder
thinkingToolResultSectionOpen bool
textToolResultSectionOpen bool
}
type StreamPartDelta struct {
@@ -69,7 +72,8 @@ func (a *StreamAccumulator) applyThinkingPart(text string) StreamPartDelta {
if !a.ThinkingEnabled || rawTrimmed == "" {
return delta
}
cleanedText := CleanVisibleOutput(rawTrimmed, a.StripReferenceMarkers)
visibleCandidate := stripLeakedToolResultSectionsDelta(rawTrimmed, &a.thinkingToolResultSectionOpen)
cleanedText := CleanVisibleOutput(visibleCandidate, a.StripReferenceMarkers)
if cleanedText == "" {
return delta
}
@@ -89,11 +93,15 @@ func (a *StreamAccumulator) applyTextPart(text string) StreamPartDelta {
}
a.RawText.WriteString(rawTrimmed)
delta := StreamPartDelta{Type: "text", RawText: rawTrimmed}
if a.SearchEnabled && sse.IsCitation(rawTrimmed) {
visibleCandidate := stripLeakedToolResultSectionsDelta(rawTrimmed, &a.textToolResultSectionOpen)
if visibleCandidate == "" {
return delta
}
if a.SearchEnabled && sse.IsCitation(visibleCandidate) {
delta.CitationOnly = true
return delta
}
cleanedText := CleanVisibleOutput(rawTrimmed, a.StripReferenceMarkers)
cleanedText := CleanVisibleOutput(visibleCandidate, a.StripReferenceMarkers)
trimmed := sse.TrimContinuationOverlapFromBuilder(&a.Text, cleanedText)
if trimmed == "" {
return delta

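The accumulator changes above hinge on `stripLeakedToolResultSectionsDelta` carrying an open-section flag between calls, so a tool-result section split across SSE chunks is still removed. A self-contained sketch with plain string markers (the real code matches the regex patterns instead):

```go
package main

import (
	"fmt"
	"strings"
)

// stripSections drops everything between openTok and closeTok, carrying the
// in-section state across chunk boundaries via *inside. Plain-string stand-in
// for the regex-based stripLeakedToolResultSectionsDelta.
func stripSections(chunk string, inside *bool) string {
	const openTok, closeTok = "<|Tool|>", "<|end_of_toolresults|>"
	var b strings.Builder
	for chunk != "" {
		if *inside {
			i := strings.Index(chunk, closeTok)
			if i < 0 {
				return b.String() // still inside the section; drop the rest
			}
			*inside = false
			chunk = chunk[i+len(closeTok):]
			continue
		}
		i := strings.Index(chunk, openTok)
		if i < 0 {
			b.WriteString(chunk)
			break
		}
		b.WriteString(chunk[:i]) // keep text before the opener
		*inside = true
		chunk = chunk[i+len(openTok):]
	}
	return b.String()
}

func main() {
	inside := false
	out := stripSections("visible:<|Tool|>secret", &inside)
	out += stripSections(`[{"content":"secret"}]`, &inside)
	out += stripSections("<|end_of_toolresults|> after", &inside)
	fmt.Println(out) // visible: after
}
```

Keeping one flag per channel (text vs. thinking), as the accumulator does, ensures a section opened in one delta stream never suppresses output in the other.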
View File

@@ -96,6 +96,87 @@ func TestStreamAccumulatorSuppressesCitationTextWhenSearchEnabled(t *testing.T)
}
}
func TestStreamAccumulatorStripsToolResultSectionAcrossTextChunks(t *testing.T) {
acc := StreamAccumulator{StripReferenceMarkers: true}
first := acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "text", Text: "visible:<|Tool|>"}},
})
second := acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "text", Text: `[{"content":"secret","tool_call_id":"call_123"}]`}},
})
third := acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "text", Text: "<|end_of_toolresults|> after"}},
})
if got := acc.RawText.String(); got != `visible:<|Tool|>[{"content":"secret","tool_call_id":"call_123"}]<|end_of_toolresults|> after` {
t.Fatalf("raw text = %q", got)
}
if got := acc.Text.String(); got != "visible: after" {
t.Fatalf("visible text = %q", got)
}
if !first.ContentSeen || !second.ContentSeen || !third.ContentSeen {
t.Fatalf("expected all chunks to mark upstream content")
}
if got := first.Parts[0].VisibleText; got != "visible:" {
t.Fatalf("first visible delta = %q", got)
}
if got := second.Parts[0].VisibleText; got != "" {
t.Fatalf("payload visible delta = %q", got)
}
if got := third.Parts[0].VisibleText; got != " after" {
t.Fatalf("closing visible delta = %q", got)
}
}
func TestStreamAccumulatorStripsFullwidthToolResultSectionAcrossTextChunks(t *testing.T) {
acc := StreamAccumulator{StripReferenceMarkers: true}
acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "text", Text: "x<Tool>"}},
})
acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "text", Text: `{"content":"secret"}`}},
})
acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "text", Text: "<end▁of▁toolresults>y"}},
})
if got := acc.Text.String(); got != "xy" {
t.Fatalf("visible text = %q", got)
}
}
func TestStreamAccumulatorStripsToolResultSectionAcrossThinkingChunks(t *testing.T) {
acc := StreamAccumulator{ThinkingEnabled: true, StripReferenceMarkers: true}
acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "thinking", Text: "thought <|Tool|>"}},
})
payload := acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "thinking", Text: `[{"content":"secret"}]`}},
})
acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "thinking", Text: "<|end_of_toolresults|>resumes"}},
})
if got := acc.RawThinking.String(); got != `thought <|Tool|>[{"content":"secret"}]<|end_of_toolresults|>resumes` {
t.Fatalf("raw thinking = %q", got)
}
if got := acc.Thinking.String(); got != "thought resumes" {
t.Fatalf("visible thinking = %q", got)
}
if got := payload.Parts[0].VisibleText; got != "" {
t.Fatalf("payload visible delta = %q", got)
}
}
func TestStreamAccumulatorStripsInlineCitationAndReferenceMarkers(t *testing.T) {
acc := StreamAccumulator{SearchEnabled: true, StripReferenceMarkers: true}
result := acc.Apply(sse.LineResult{

View File

@@ -17,7 +17,7 @@ func UpstreamEmptyOutputDetail(contentFilter bool, text, thinking string) (int,
if thinking != "" {
return http.StatusTooManyRequests, "Upstream account hit a rate limit and returned reasoning without visible output.", "upstream_empty_output"
}
return http.StatusTooManyRequests, "Upstream account hit a rate limit and returned empty output.", "upstream_empty_output"
return http.StatusServiceUnavailable, "Upstream service is unavailable and returned no output.", "upstream_unavailable"
}
func WriteUpstreamEmptyOutputError(w http.ResponseWriter, text, thinking string, contentFilter bool) bool {

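The new mapping keeps 429 for reasoning-only output but reclassifies a truly empty body as upstream unavailability. A sketch of that decision, with the content-filter branch (handled earlier in the real function, not shown in this hunk) elided:

```go
package main

import (
	"fmt"
	"net/http"
)

// emptyOutputDetail mirrors the updated classification: reasoning without
// visible output still reads as a rate limit, while truly empty output maps
// to 503 upstream_unavailable. Content-filter handling is elided.
func emptyOutputDetail(thinking string) (int, string) {
	if thinking != "" {
		return http.StatusTooManyRequests, "upstream_empty_output"
	}
	return http.StatusServiceUnavailable, "upstream_unavailable"
}

func main() {
	status, code := emptyOutputDetail("")
	fmt.Println(status, code) // 503 upstream_unavailable
}
```

Distinguishing the two lets clients back off on 429 (the account is throttled) but fail over or alert on 503 (the upstream produced nothing at all).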
View File

@@ -274,12 +274,12 @@ func TestChatCompletionsStreamEmitsFailureFrameWhenUpstreamOutputEmpty(t *testin
}
last := frames[0]
statusCode, ok := last["status_code"].(float64)
if !ok || int(statusCode) != http.StatusTooManyRequests {
t.Fatalf("expected status_code=429, got %#v body=%s", last["status_code"], rec.Body.String())
if !ok || int(statusCode) != http.StatusServiceUnavailable {
t.Fatalf("expected status_code=503, got %#v body=%s", last["status_code"], rec.Body.String())
}
errObj, _ := last["error"].(map[string]any)
if asString(errObj["code"]) != "upstream_empty_output" {
t.Fatalf("expected code=upstream_empty_output, got %#v", last)
if asString(errObj["code"]) != "upstream_unavailable" {
t.Fatalf("expected code=upstream_unavailable, got %#v", last)
}
}
@@ -345,7 +345,7 @@ func TestChatCompletionsStreamRetriesEmptyOutputOnSameSession(t *testing.T) {
func TestChatCompletionsNonStreamRetriesThinkingOnlyOutput(t *testing.T) {
ds := &streamStatusDSSeqStub{resps: []*http.Response{
makeOpenAISSEHTTPResponse(`data: {"response_message_id":99}`, "data: [DONE]"),
makeOpenAISSEHTTPResponse(`data: {"response_message_id":99,"p":"response/thinking_content","v":"plan"}`, "data: [DONE]"),
makeOpenAISSEHTTPResponse(`data: {"p":"response/content","v":"visible"}`, "data: [DONE]"),
}}
h := &openAITestSurface{
@@ -496,7 +496,7 @@ func TestResponsesStreamRetriesThinkingOnlyOutput(t *testing.T) {
func TestResponsesNonStreamRetriesThinkingOnlyOutput(t *testing.T) {
ds := &streamStatusDSSeqStub{resps: []*http.Response{
makeOpenAISSEHTTPResponse(`data: {"response_message_id":88}`, "data: [DONE]"),
makeOpenAISSEHTTPResponse(`data: {"response_message_id":88,"p":"response/thinking_content","v":"plan"}`, "data: [DONE]"),
makeOpenAISSEHTTPResponse(`data: {"p":"response/content","v":"visible"}`, "data: [DONE]"),
}}
h := &openAITestSurface{
@@ -537,8 +537,15 @@ func TestResponsesNonStreamRetriesThinkingOnlyOutput(t *testing.T) {
if len(content) == 0 {
t.Fatalf("expected content entries, got %#v", item)
}
textEntry, _ := content[0].(map[string]any)
if asString(textEntry["type"]) != "output_text" || asString(textEntry["text"]) != "visible" {
var textEntry map[string]any
for _, entry := range content {
obj, _ := entry.(map[string]any)
if asString(obj["type"]) == "output_text" {
textEntry = obj
break
}
}
if asString(textEntry["text"]) != "visible" {
t.Fatalf("expected visible text entry, got %#v", content)
}
}

View File

@@ -19,13 +19,15 @@ const BLOCKED_CORS_REQUEST_HEADERS = new Set([
function setCorsHeaders(res, req) {
const origin = asString(readHeader(req, 'origin'));
res.setHeader('Access-Control-Allow-Origin', origin || '*');
if (origin) {
addVaryHeader(res, 'Origin');
}
res.setHeader('Access-Control-Allow-Methods', 'GET, POST, OPTIONS, PUT, DELETE');
res.setHeader('Access-Control-Max-Age', '600');
res.setHeader(
'Access-Control-Allow-Headers',
buildCORSAllowHeaders(req),
);
addVaryHeader(res, 'Origin');
addVaryHeader(res, 'Access-Control-Request-Headers');
if (asString(readHeader(req, 'access-control-request-private-network')).toLowerCase() === 'true') {
res.setHeader('Access-Control-Allow-Private-Network', 'true');

View File

@@ -85,6 +85,33 @@ async function fetchStreamPow(req, leaseID) {
};
}
async function fetchStreamSwitch(req, leaseID) {
const url = buildInternalGoURL(req);
url.searchParams.set('__stream_switch', '1');
const upstream = await fetch(url.toString(), {
method: 'POST',
headers: buildInternalGoHeaders(req, { withInternalToken: true, withContentType: true }),
body: Buffer.from(JSON.stringify({ lease_id: leaseID })),
});
const text = await upstream.text();
let body = {};
try {
body = JSON.parse(text || '{}');
} catch (_err) {
body = {};
}
return {
ok: upstream.ok,
status: upstream.status,
contentType: upstream.headers.get('content-type') || 'application/json',
text,
body,
};
}
function relayPreparedFailure(res, prep) {
if (prep.status === 401 && looksLikeVercelAuthPage(prep.text)) {
writeOpenAIError(
@@ -223,6 +250,7 @@ module.exports = {
readRawBody,
fetchStreamPrepare,
fetchStreamPow,
fetchStreamSwitch,
relayPreparedFailure,
safeReadText,
buildInternalGoURL,

View File

@@ -88,7 +88,7 @@ function isVercelRuntime() {
function isNodeStreamSupportedPath(rawURL) {
const path = extractPathname(rawURL);
return path === '/v1/chat/completions';
return path === '/v1/chat/completions' || path === '/chat/completions';
}
function extractPathname(rawURL) {

View File

@@ -7,6 +7,10 @@ const {
SKIP_EXACT_PATHS,
} = require('../shared/deepseek-constants');
const LEAKED_BOS_MARKER_PATTERN = /<[\|\uFF5C]\s*begin[_▁]of[_▁]sentence\s*[\|\uFF5C]>/gi;
const LEAKED_THOUGHT_MARKER_PATTERN = /<[\|\uFF5C]\s*(?:begin[_▁])?[_▁]*of[_▁]thought\s*[\|\uFF5C]>/gi;
const LEAKED_META_MARKER_PATTERN = /<[\|\uFF5C]\s*(?:assistant|tool|end[_▁]of[_▁]sentence|end[_▁]of[_▁]thinking|end[_▁]of[_▁]thought|end[_▁]of[_▁]toolresults|end[_▁]of[_▁]instructions)\s*[\|\uFF5C]>/gi;
function stripThinkTags(text) {
@@ -621,7 +625,11 @@ function stripReferenceMarkersText(text) {
if (!text) {
return text;
}
return text.replace(/\[(?:citation|reference):\s*\d+\]/gi, '');
return text
.replace(/\[(?:citation|reference):\s*\d+\]/gi, '')
.replace(LEAKED_BOS_MARKER_PATTERN, '')
.replace(LEAKED_THOUGHT_MARKER_PATTERN, '')
.replace(LEAKED_META_MARKER_PATTERN, '');
}
function asString(v) {

View File

@@ -25,6 +25,7 @@ const {
isAbortError,
fetchStreamPrepare,
fetchStreamPow,
fetchStreamSwitch,
relayPreparedFailure,
createLeaseReleaser,
} = require('./http_internal');
@@ -46,11 +47,11 @@ async function handleVercelStream(req, res, rawBody, payload) {
}
const model = asString(prep.body.model) || asString(payload.model);
const sessionID = asString(prep.body.session_id) || `chatcmpl-${Date.now()}`;
const responseID = asString(prep.body.session_id) || `chatcmpl-${Date.now()}`;
const leaseID = asString(prep.body.lease_id);
const deepseekToken = asString(prep.body.deepseek_token);
let deepseekToken = asString(prep.body.deepseek_token);
const initialPowHeader = asString(prep.body.pow_header);
const completionPayload = prep.body.payload && typeof prep.body.payload === 'object' ? prep.body.payload : null;
let completionPayload = prep.body.payload && typeof prep.body.payload === 'object' ? prep.body.payload : null;
const finalPrompt = asString(prep.body.final_prompt);
const thinkingEnabled = toBool(prep.body.thinking_enabled);
const searchEnabled = toBool(prep.body.search_enabled);
@@ -133,13 +134,14 @@ async function handleVercelStream(req, res, rawBody, payload) {
}
};
const fetchCompletion = (bodyPayload) => fetchDeepSeekStream(DEEPSEEK_COMPLETION_URL, bodyPayload, currentPowHeader);
let activeDeepSeekSessionID = responseID;
const fetchContinue = async (messageID) => {
const powHeader = await refreshPowHeader('continue');
if (!powHeader) {
return null;
}
return fetchDeepSeekStream(DEEPSEEK_CONTINUE_URL, {
chat_session_id: sessionID,
chat_session_id: activeDeepSeekSessionID,
message_id: messageID,
fallback_to_resume: true,
}, powHeader);
@@ -185,7 +187,7 @@ async function handleVercelStream(req, res, rawBody, payload) {
let ended = false;
const { sendFrame, sendDeltaFrame } = createChatCompletionEmitter({
res,
sessionID,
sessionID: responseID,
created,
model,
isClosed: () => clientClosed,
@@ -242,7 +244,7 @@ async function handleVercelStream(req, res, rawBody, payload) {
}
ended = true;
sendFrame({
id: sessionID,
id: responseID,
object: 'chat.completion.chunk',
created,
model,
@@ -261,7 +263,7 @@ async function handleVercelStream(req, res, rawBody, payload) {
const processStream = async (initialResponse, allowDeferEmpty) => {
let currentResponse = initialResponse;
let continueState = createContinueState(sessionID);
let continueState = createContinueState(activeDeepSeekSessionID);
let continueRounds = 0;
// eslint-disable-next-line no-constant-condition
while (true) {
@@ -412,13 +414,39 @@ async function handleVercelStream(req, res, rawBody, payload) {
};
let retryAttempts = 0;
let accountSwitchAttempted = false;
// eslint-disable-next-line no-constant-condition
while (true) {
const processed = await processStream(completionRes, retryAttempts < EMPTY_OUTPUT_RETRY_MAX_ATTEMPTS);
const allowDeferEmpty = retryAttempts < EMPTY_OUTPUT_RETRY_MAX_ATTEMPTS || !accountSwitchAttempted;
const processed = await processStream(completionRes, allowDeferEmpty);
if (processed.terminal) {
return;
}
if (!processed.retryable || retryAttempts >= EMPTY_OUTPUT_RETRY_MAX_ATTEMPTS) {
if (!processed.retryable) {
await finish('stop');
return;
}
if (retryAttempts >= EMPTY_OUTPUT_RETRY_MAX_ATTEMPTS) {
if (!accountSwitchAttempted) {
accountSwitchAttempted = true;
const switched = await fetchStreamSwitch(req, leaseID);
if (switched.ok && switched.body && switched.body.payload && typeof switched.body.payload === 'object') {
completionPayload = switched.body.payload;
deepseekToken = asString(switched.body.deepseek_token) || deepseekToken;
currentPowHeader = asString(switched.body.pow_header) || currentPowHeader;
activeDeepSeekSessionID = asString(switched.body.session_id) || activeDeepSeekSessionID;
usagePrompt = finalPrompt;
completionRes = await fetchCompletion(completionPayload);
if (completionRes === null) {
return;
}
if (!completionRes.ok || !completionRes.body) {
await finish('stop');
return;
}
continue;
}
}
await finish('stop');
return;
}
@@ -641,9 +669,9 @@ function upstreamEmptyOutputDetail(contentFilter, _text, thinking) {
};
}
return {
status: 429,
message: 'Upstream account hit a rate limit and returned empty output.',
code: 'upstream_empty_output',
status: 503,
message: 'Upstream service is unavailable and returned no output.',
code: 'upstream_unavailable',
};
}

View File

@@ -7,6 +7,9 @@ const {
parseMarkupToolCalls,
stripFencedCodeBlocks,
containsToolCallWrapperSyntaxOutsideIgnored,
normalizeDSMLToolCallMarkup,
hasRepairableXMLToolCallsWrapper,
indexToolCDATAOpen,
sanitizeLooseCDATA,
} = require('./parse_payload');
@@ -37,19 +40,23 @@ function parseToolCalls(text, toolNames) {
function parseToolCallsDetailed(text, toolNames) {
const result = emptyParseResult();
const normalized = toStringSafe(text);
if (!normalized) {
const raw = toStringSafe(text);
if (!raw) {
return result;
}
result.sawToolCallSyntax = looksLikeToolCallSyntax(normalized);
if (shouldSkipToolCallParsingForCodeFenceExample(normalized)) {
if (shouldSkipToolCallParsingForCodeFenceExample(raw)) {
return result;
}
const normalized = normalizeDSMLToolCallMarkup(stripFencedCodeBlocks(raw).trim());
if (!normalized.ok || !normalized.text) {
return result;
}
result.sawToolCallSyntax = looksLikeToolCallSyntax(normalized.text) || hasRepairableXMLToolCallsWrapper(normalized.text);
// XML markup parsing only.
let parsed = parseMarkupToolCalls(normalized);
if (parsed.length === 0 && normalized.toLowerCase().includes('<![cdata[')) {
const recovered = sanitizeLooseCDATA(normalized);
if (recovered !== normalized) {
let parsed = parseMarkupToolCalls(normalized.text);
if (parsed.length === 0 && indexToolCDATAOpen(normalized.text, 0) >= 0) {
const recovered = sanitizeLooseCDATA(normalized.text);
if (recovered !== normalized.text) {
parsed = parseMarkupToolCalls(recovered);
}
}
@@ -70,19 +77,23 @@ function parseStandaloneToolCalls(text, toolNames) {
function parseStandaloneToolCallsDetailed(text, toolNames) {
const result = emptyParseResult();
const trimmed = toStringSafe(text);
if (!trimmed) {
const raw = toStringSafe(text);
if (!raw) {
return result;
}
result.sawToolCallSyntax = looksLikeToolCallSyntax(trimmed);
if (shouldSkipToolCallParsingForCodeFenceExample(trimmed)) {
if (shouldSkipToolCallParsingForCodeFenceExample(raw)) {
return result;
}
const normalized = normalizeDSMLToolCallMarkup(stripFencedCodeBlocks(raw).trim());
if (!normalized.ok || !normalized.text) {
return result;
}
result.sawToolCallSyntax = looksLikeToolCallSyntax(normalized.text) || hasRepairableXMLToolCallsWrapper(normalized.text);
// XML markup parsing only.
let parsed = parseMarkupToolCalls(trimmed);
if (parsed.length === 0 && trimmed.toLowerCase().includes('<![cdata[')) {
const recovered = sanitizeLooseCDATA(trimmed);
if (recovered !== trimmed) {
let parsed = parseMarkupToolCalls(normalized.text);
if (parsed.length === 0 && indexToolCDATAOpen(normalized.text, 0) >= 0) {
const recovered = sanitizeLooseCDATA(normalized.text);
if (recovered !== normalized.text) {
parsed = parseMarkupToolCalls(recovered);
}
}
@@ -113,9 +124,10 @@ function filterToolCallsDetailed(parsed, toolNames) {
if (!tc || !tc.name) {
continue;
}
const input = tc.input && typeof tc.input === 'object' && !Array.isArray(tc.input) ? tc.input : {};
calls.push({
name: tc.name,
input: tc.input && typeof tc.input === 'object' && !Array.isArray(tc.input) ? tc.input : {},
input,
});
}
return { calls, rejectedToolNames: [] };

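The hunk above retries markup parsing after loosening malformed CDATA markers. A minimal standalone sketch of that fallback pattern follows; `parseInvokes` and `sanitizeLooseCDATA` here are simplified stand-ins, not the project's real implementations:

```javascript
// Simplified stand-in: accept only well-formed <![CDATA[...]]> payloads.
function parseInvokes(text) {
  const m = text.match(/<parameter name="(\w+)"><!\[CDATA\[([\s\S]*?)\]\]><\/parameter>/);
  return m ? [{ name: m[1], value: m[2] }] : [];
}

// Repair a common model drift: "<[CDATA[" emitted without the "!".
function sanitizeLooseCDATA(text) {
  return text.replace(/<\[CDATA\[/g, '<![CDATA[');
}

// Parse strictly first; only if that yields nothing and CDATA-like text is
// present, sanitize and retry, mirroring the diff's recovery flow.
function parseWithCDATAFallback(text) {
  let parsed = parseInvokes(text);
  if (parsed.length === 0 && text.toLowerCase().includes('cdata')) {
    const recovered = sanitizeLooseCDATA(text);
    if (recovered !== text) {
      parsed = parseInvokes(recovered);
    }
  }
  return parsed;
}
```

The strict-then-recover order matters: well-formed input never pays the sanitize cost, and the retry only fires when sanitizing actually changed the text.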
File diff suppressed because it is too large

View File

@@ -1,5 +1,5 @@
'use strict';
const { parseToolCalls } = require('./parse');
const { parseToolCallsDetailed } = require('./parse');
const {
findToolMarkupTagOutsideIgnored,
findMatchingToolMarkupClose,
@@ -27,19 +27,30 @@ function consumeXMLToolCapture(captured, toolNames, trimWrappingJSONFence) {
const xmlBlock = captured.slice(openTag.start, closeTag.end + 1);
const prefixPart = captured.slice(0, openTag.start);
const suffixPart = captured.slice(closeTag.end + 1);
const parsed = parseToolCalls(xmlBlock, toolNames);
if (Array.isArray(parsed) && parsed.length > 0) {
const parsed = parseToolCallsDetailed(xmlBlock, toolNames);
if (Array.isArray(parsed.calls) && parsed.calls.length > 0) {
const trimmedFence = trimWrappingJSONFence(prefixPart, suffixPart);
if (!best || openTag.start < best.start) {
best = {
start: openTag.start,
prefix: trimmedFence.prefix,
calls: parsed,
calls: parsed.calls,
suffix: trimmedFence.suffix,
};
}
break;
}
if (parsed.sawToolCallSyntax) {
if (!rejected || openTag.start < rejected.start) {
rejected = {
start: openTag.start,
prefix: prefixPart + xmlBlock,
suffix: suffixPart,
};
}
searchFrom = openTag.end + 1;
continue;
}
if (!rejected || openTag.start < rejected.start) {
rejected = {
start: openTag.start,
@@ -69,16 +80,19 @@ function consumeXMLToolCapture(captured, toolNames, trimWrappingJSONFence) {
const xmlBlock = '<tool_calls>' + captured.slice(invokeTag.start, closeTag.end + 1);
const prefixPart = captured.slice(0, invokeTag.start);
const suffixPart = captured.slice(closeTag.end + 1);
const parsed = parseToolCalls(xmlBlock, toolNames);
if (Array.isArray(parsed) && parsed.length > 0) {
const parsed = parseToolCallsDetailed(xmlBlock, toolNames);
if (Array.isArray(parsed.calls) && parsed.calls.length > 0) {
const trimmedFence = trimWrappingJSONFence(prefixPart, suffixPart);
return {
ready: true,
prefix: trimmedFence.prefix,
calls: parsed,
calls: parsed.calls,
suffix: trimmedFence.suffix,
};
}
if (parsed.sawToolCallSyntax) {
return { ready: true, prefix: prefixPart + captured.slice(invokeTag.start, closeTag.end + 1), calls: [], suffix: suffixPart };
}
return { ready: true, prefix: prefixPart + captured.slice(invokeTag.start, closeTag.end + 1), calls: [], suffix: suffixPart };
}
}
@@ -100,6 +114,39 @@ function hasOpenXMLToolTag(captured) {
return false;
}
function shouldKeepBareInvokeCapture(captured) {
const invokeTag = findFirstToolTag(captured, 0, 'invoke', false);
if (!invokeTag) {
return false;
}
const wrapperOpen = findFirstToolTag(captured, 0, 'tool_calls', false);
if (wrapperOpen && wrapperOpen.start <= invokeTag.start) {
return false;
}
const closeTag = findFirstToolTag(captured, invokeTag.start + 1, 'tool_calls', true);
if (closeTag && closeTag.start > invokeTag.start) {
return true;
}
const startEnd = invokeTag.end;
if (startEnd < 0) {
return true;
}
const body = captured.slice(startEnd + 1);
const trimmedBody = body.replace(/^[ \t\r\n]+/, '');
if (!trimmedBody) {
return true;
}
const invokeCloseTag = findFirstToolTag(captured, startEnd + 1, 'invoke', true);
if (invokeCloseTag) {
return captured.slice(invokeCloseTag.end + 1).trim() === '';
}
const paramTag = findFirstToolTag(body, 0, 'parameter', false);
if (paramTag && body.slice(0, paramTag.start).trim() === '') {
return true;
}
return trimmedBody.startsWith('{') || trimmedBody.startsWith('[');
}
function findFirstToolTag(text, from, name, closing) {
for (let pos = Math.max(0, from || 0); pos < text.length;) {
const tag = findToolMarkupTagOutsideIgnored(text, pos);
@@ -117,5 +164,6 @@ function findFirstToolTag(text, from, name, closing) {
module.exports = {
consumeXMLToolCapture,
hasOpenXMLToolTag,
shouldKeepBareInvokeCapture,
findPartialXMLToolTagStart: findPartialToolMarkupStart,
};

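The new `shouldKeepBareInvokeCapture` decides whether a bare invoke tag seen mid-stream should be held for more data instead of being released as plain text. A simplified sketch of that decision, using literal `<invoke>` tags in place of the project's tag-scanning helpers:

```javascript
// Decide whether a bare <invoke ...> capture (no wrapper seen yet) should be
// held back while more of the stream arrives. Simplified stand-in for the
// diff's shouldKeepBareInvokeCapture.
function shouldHoldBareInvoke(captured) {
  const open = captured.match(/<invoke\b[^>]*>/);
  if (!open) return false;
  const body = captured
    .slice(open.index + open[0].length)
    .replace(/^[ \t\r\n]+/, '');
  if (!body) return true; // nothing after the open tag yet: keep waiting
  const closeIdx = body.indexOf('</invoke>');
  if (closeIdx >= 0) {
    // Close tag arrived: hold only while nothing trails it.
    return body.slice(closeIdx + '</invoke>'.length).trim() === '';
  }
  if (body.startsWith('<parameter')) return true; // parameters still streaming
  // JSON-style arguments still streaming in.
  return body.startsWith('{') || body.startsWith('[');
}
```

Holding is the conservative choice: releasing a half-received invoke block would leak tool markup into user-visible text, which is exactly what the diff is guarding against.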
View File

@@ -12,6 +12,7 @@ const {
const {
consumeXMLToolCapture: consumeXMLToolCaptureImpl,
hasOpenXMLToolTag,
shouldKeepBareInvokeCapture,
findPartialXMLToolTagStart,
} = require('./sieve-xml');
function processToolSieveChunk(state, chunk, toolNames) {
@@ -69,10 +70,17 @@ function processToolSieveChunk(state, chunk, toolNames) {
break;
}
const start = findToolSegmentStart(state, pending);
if (start === HOLD_TOOL_SEGMENT_START) {
break;
}
if (start >= 0) {
const prefix = pending.slice(0, start);
if (prefix) {
const resetMarkdownSpan = shouldResetUnclosedMarkdownPrefix(state, prefix, pending.slice(start));
noteText(state, prefix);
if (resetMarkdownSpan) {
state.markdownCodeSpanTicks = 0;
}
events.push({ type: 'text', text: prefix });
}
state.pending = '';
@@ -97,6 +105,10 @@ function flushToolSieve(state, toolNames) {
return [];
}
const events = processToolSieveChunk(state, '', toolNames);
if (state.pending && Number.isInteger(state.markdownCodeSpanTicks) && state.markdownCodeSpanTicks > 0) {
state.markdownCodeSpanTicks = 0;
events.push(...processToolSieveChunk(state, '', toolNames));
}
if (Array.isArray(state.pendingToolCalls) && state.pendingToolCalls.length > 0) {
events.push({ type: 'tool_calls', calls: state.pendingToolCalls });
state.pendingToolRaw = '';
@@ -163,6 +175,15 @@ function splitSafeContentForToolDetection(state, s) {
if (insideCodeFenceWithState(state, text.slice(0, xmlIdx))) {
return [text, ''];
}
const markdown = markdownCodeSpanStateAt(state, text.slice(0, xmlIdx));
if (markdown.ticks > 0) {
if (markdownCodeSpanCloses(text.slice(xmlIdx), markdown.ticks)) {
return [text, ''];
}
if (markdown.fromPrior) {
return ['', text];
}
}
if (xmlIdx > 0) {
return [text.slice(0, xmlIdx), text.slice(xmlIdx)];
}
@@ -171,6 +192,8 @@ function splitSafeContentForToolDetection(state, s) {
return [text, ''];
}
const HOLD_TOOL_SEGMENT_START = -2;
function findToolSegmentStart(state, s) {
if (!s) {
return -1;
@@ -181,11 +204,96 @@ function findToolSegmentStart(state, s) {
if (!tag) {
return -1;
}
if (!insideCodeFenceWithState(state, s.slice(0, tag.start))) {
if (insideCodeFenceWithState(state, s.slice(0, tag.start))) {
offset = tag.end + 1;
continue;
}
const markdown = markdownCodeSpanStateAt(state, s.slice(0, tag.start));
if (markdown.ticks === 0) {
return tag.start;
}
if (markdownCodeSpanCloses(s.slice(tag.start), markdown.ticks)) {
offset = tag.end + 1;
continue;
}
if (markdown.fromPrior) {
return HOLD_TOOL_SEGMENT_START;
}
return tag.start;
}
}
function markdownCodeSpanStateAt(state, text) {
const raw = typeof text === 'string' ? text : '';
let ticks = state && Number.isInteger(state.markdownCodeSpanTicks) ? state.markdownCodeSpanTicks : 0;
let fromPrior = ticks > 0;
for (let i = 0; i < raw.length;) {
if (raw[i] !== '`') {
i += 1;
continue;
}
const run = countBacktickRun(raw, i);
if (ticks === 0) {
if (run >= 3 && atMarkdownFenceLineStart(raw, i)) {
i += run;
continue;
}
if (state && insideCodeFenceWithState(state, raw.slice(0, i))) {
i += run;
continue;
}
ticks = run;
fromPrior = false;
} else if (run === ticks) {
ticks = 0;
fromPrior = false;
}
i += run;
}
return { ticks, fromPrior };
}
function markdownCodeSpanCloses(text, ticks) {
const raw = typeof text === 'string' ? text : '';
if (!Number.isInteger(ticks) || ticks <= 0) {
return false;
}
for (let i = 0; i < raw.length;) {
if (raw[i] !== '`') {
i += 1;
continue;
}
const run = countBacktickRun(raw, i);
if (run === ticks) {
return true;
}
i += run;
}
return false;
}
function shouldResetUnclosedMarkdownPrefix(state, prefix, suffix) {
const markdown = markdownCodeSpanStateAt(state, prefix);
return markdown.ticks > 0 && !markdown.fromPrior && !markdownCodeSpanCloses(suffix, markdown.ticks);
}
function countBacktickRun(text, start) {
let count = 0;
while (start + count < text.length && text[start + count] === '`') {
count += 1;
}
return count;
}
function atMarkdownFenceLineStart(text, idx) {
for (let i = idx - 1; i >= 0; i -= 1) {
const ch = text[i];
if (ch === ' ' || ch === '\t') {
continue;
}
return ch === '\n' || ch === '\r';
}
return true;
}
function consumeToolCapture(state, toolNames) {
@@ -203,6 +311,9 @@ function consumeToolCapture(state, toolNames) {
if (hasOpenXMLToolTag(captured)) {
return { ready: false, prefix: '', calls: [], suffix: '' };
}
if (shouldKeepBareInvokeCapture(captured)) {
return { ready: false, prefix: '', calls: [], suffix: '' };
}
// No XML tool tags detected — release captured content as text.
return {

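The core of the unclosed-backtick fix is a state machine that tracks inline code-span backtick runs across streaming chunks. A self-contained sketch of that mechanism (the real implementation additionally skips triple-backtick fences and fenced-block interiors):

```javascript
// Count consecutive backticks starting at `start`.
function countBacktickRun(text, start) {
  let n = 0;
  while (start + n < text.length && text[start + n] === '`') n += 1;
  return n;
}

// Advance the open-span state across one chunk. `openTicks` is 0 when outside
// any inline code span, or the width of the backtick run that opened one.
function advanceCodeSpanTicks(openTicks, chunk) {
  let ticks = openTicks;
  for (let i = 0; i < chunk.length; ) {
    if (chunk[i] !== '`') { i += 1; continue; }
    const run = countBacktickRun(chunk, i);
    if (ticks === 0) {
      ticks = run;           // a run opens a span
    } else if (run === ticks) {
      ticks = 0;             // only an equal-width run closes it
    }
    i += run;
  }
  return ticks;
}
```

Because the carried value is the run width rather than a boolean, a `` ` `` inside a ``` `` ``-delimited span does not spuriously close it, which matches CommonMark's equal-length matching rule for code spans.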
View File

@@ -9,6 +9,7 @@ function createToolSieveState() {
codeFencePendingTicks: 0,
codeFencePendingTildes: 0,
codeFenceLineStart: true,
markdownCodeSpanTicks: 0,
pendingToolRaw: '',
pendingToolCalls: [],
disableDeltas: false,
@@ -35,6 +36,7 @@ function noteText(state, text) {
if (!state || !hasMeaningfulText(text)) {
return;
}
updateMarkdownCodeSpanState(state, text);
updateCodeFenceState(state, text);
}
@@ -64,6 +66,68 @@ function insideCodeFenceWithState(state, text) {
return simulated.stack.length > 0;
}
function insideMarkdownCodeSpanWithState(state, text) {
if (!state) {
return simulateMarkdownCodeSpanTicks(null, 0, text) > 0;
}
const ticks = Number.isInteger(state.markdownCodeSpanTicks) ? state.markdownCodeSpanTicks : 0;
return simulateMarkdownCodeSpanTicks(state, ticks, text) > 0;
}
function updateMarkdownCodeSpanState(state, text) {
if (!state || !hasMeaningfulText(text)) {
return;
}
const ticks = Number.isInteger(state.markdownCodeSpanTicks) ? state.markdownCodeSpanTicks : 0;
state.markdownCodeSpanTicks = simulateMarkdownCodeSpanTicks(state, ticks, text);
}
function simulateMarkdownCodeSpanTicks(state, initialTicks, text) {
const raw = typeof text === 'string' ? text : '';
let ticks = Number.isInteger(initialTicks) ? initialTicks : 0;
for (let i = 0; i < raw.length;) {
if (raw[i] !== '`') {
i += 1;
continue;
}
const run = countBacktickRun(raw, i);
if (ticks === 0) {
if (run >= 3 && atMarkdownFenceLineStart(raw, i)) {
i += run;
continue;
}
if (state && insideCodeFenceWithState(state, raw.slice(0, i))) {
i += run;
continue;
}
ticks = run;
} else if (run === ticks) {
ticks = 0;
}
i += run;
}
return ticks;
}
function countBacktickRun(text, start) {
let count = 0;
while (start + count < text.length && text[start + count] === '`') {
count += 1;
}
return count;
}
function atMarkdownFenceLineStart(text, idx) {
for (let i = idx - 1; i >= 0; i -= 1) {
const ch = text[i];
if (ch === ' ' || ch === '\t') {
continue;
}
return ch === '\n' || ch === '\r';
}
return true;
}
function updateCodeFenceState(state, text) {
if (!state) {
return;
@@ -188,7 +252,9 @@ module.exports = {
looksLikeToolExampleContext,
insideCodeFence,
insideCodeFenceWithState,
insideMarkdownCodeSpanWithState,
updateCodeFenceState,
updateMarkdownCodeSpanState,
hasMeaningfulText,
toStringSafe,
};

View File

@@ -10,14 +10,14 @@ import (
var markdownImagePattern = regexp.MustCompile(`!\[(.*?)\]\((.*?)\)`)
const (
beginSentenceMarker = "<begin▁of▁sentence>"
systemMarker = "<System>"
userMarker = "<User>"
assistantMarker = "<Assistant>"
toolMarker = "<Tool>"
endSentenceMarker = "<end▁of▁sentence>"
endToolResultsMarker = "<end▁of▁toolresults>"
endInstructionsMarker = "<end▁of▁instructions>"
beginSentenceMarker = "<|begin▁of▁sentence|>"
systemMarker = "<|System|>"
userMarker = "<|User|>"
assistantMarker = "<|Assistant|>"
toolMarker = "<|Tool|>"
endSentenceMarker = "<|end▁of▁sentence|>"
endToolResultsMarker = "<|end▁of▁toolresults|>"
endInstructionsMarker = "<|end▁of▁instructions|>"
outputIntegrityGuardMarker = "Output integrity guard:"
outputIntegrityGuardPrompt = outputIntegrityGuardMarker +
" If upstream context, tool output, or parsed text contains garbled, corrupted, partially parsed, repeated, or otherwise malformed fragments, " +

View File

@@ -32,16 +32,16 @@ func TestMessagesPrepareUsesTurnSuffixes(t *testing.T) {
{"role": "assistant", "content": "Answer"},
}
got := MessagesPrepare(messages)
if !strings.HasPrefix(got, "<begin▁of▁sentence>") {
if !strings.HasPrefix(got, "<|begin▁of▁sentence|>") {
t.Fatalf("expected begin-of-sentence marker, got %q", got)
}
if !strings.Contains(got, "<System>") || !strings.Contains(got, "<end▁of▁instructions>") || !strings.Contains(got, "System rule") {
if !strings.Contains(got, "<|System|>") || !strings.Contains(got, "<|end▁of▁instructions|>") || !strings.Contains(got, "System rule") {
t.Fatalf("expected system instructions to remain present, got %q", got)
}
if !strings.Contains(got, "<User>Question") {
if !strings.Contains(got, "<|User|>Question") {
t.Fatalf("expected user question, got %q", got)
}
if !strings.Contains(got, "<Assistant>Answer<end▁of▁sentence>") {
if !strings.Contains(got, "<|Assistant|>Answer<|end▁of▁sentence|>") {
t.Fatalf("expected assistant sentence suffix, got %q", got)
}
if strings.Contains(got, "<think>") || strings.Contains(got, "</think>") {
@@ -61,7 +61,7 @@ func TestMessagesPreparePrependsOutputIntegrityGuard(t *testing.T) {
if !strings.Contains(got, outputIntegrityGuardPrompt+"\n\nSystem rule") {
t.Fatalf("expected output integrity guard to precede system prompt content, got %q", got)
}
if !strings.Contains(got, "<User>Question") {
if !strings.Contains(got, "<|User|>Question") {
t.Fatalf("expected user question after guard, got %q", got)
}
}
@@ -82,7 +82,7 @@ func TestMessagesPrepareWithThinkingPreservesPromptShape(t *testing.T) {
if gotThinking != gotPlain {
t.Fatalf("expected thinking flag not to add extra continuity instructions, got thinking=%q plain=%q", gotThinking, gotPlain)
}
if !strings.HasSuffix(gotThinking, "<Assistant>") {
if !strings.HasSuffix(gotThinking, "<|Assistant|>") {
t.Fatalf("expected assistant suffix, got %q", gotThinking)
}
}

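The marker fix above restores the `<|...|>` delimiters around the DeepSeek-style role tokens. The transcript shape the tests assert can be sketched as follows; the marker strings come from the diff, while the assembly function itself is illustrative:

```javascript
// Role markers as corrected in the diff (note the <|...|> delimiters and the
// U+2581 separators inside the sentence markers).
const MARKERS = {
  begin: '<|begin▁of▁sentence|>',
  system: '<|System|>',
  user: '<|User|>',
  assistant: '<|Assistant|>',
  endSentence: '<|end▁of▁sentence|>',
  endInstructions: '<|end▁of▁instructions|>',
};

// Illustrative assembly only; the real MessagesPrepare handles more cases.
function renderTranscript(systemText, turns) {
  let out = MARKERS.begin + MARKERS.system + systemText + MARKERS.endInstructions;
  for (const t of turns) {
    out += t.role === 'user'
      ? MARKERS.user + t.content
      : MARKERS.assistant + t.content + MARKERS.endSentence;
  }
  return out + MARKERS.assistant; // open an assistant turn for generation
}
```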
View File

@@ -16,6 +16,15 @@ var promptXMLTextEscaper = strings.NewReplacer(
var promptXMLNamePattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_.:-]*$`)
const (
promptDSMLToolCallsOpen = "<|DSML|tool_calls>"
promptDSMLToolCallsClose = "</|DSML|tool_calls>"
promptDSMLInvokeOpen = "<|DSML|invoke"
promptDSMLInvokeClose = "</|DSML|invoke>"
promptDSMLParameterOpen = "<|DSML|parameter"
promptDSMLParameterClose = "</|DSML|parameter>"
)
// FormatToolCallsForPrompt renders a tool_calls slice into the prompt-visible
// invoke/parameter history block used across adapters.
func FormatToolCallsForPrompt(raw any) string {
@@ -38,7 +47,7 @@ func FormatToolCallsForPrompt(raw any) string {
if len(blocks) == 0 {
return ""
}
return "<|DSML|tool_calls>\n" + strings.Join(blocks, "\n") + "\n</|DSML|tool_calls>"
return promptDSMLToolCallsOpen + "\n" + strings.Join(blocks, "\n") + "\n" + promptDSMLToolCallsClose
}
// StringifyToolCallArguments normalizes tool arguments into a compact string
@@ -94,12 +103,12 @@ func formatToolCallForPrompt(call map[string]any) string {
parameters := formatToolCallParametersForPrompt(argsRaw)
if parameters == "" {
return ` <|DSML|invoke name="` + escapeXMLAttribute(name) + `"></|DSML|invoke>`
return ` ` + promptDSMLInvokeOpen + ` name="` + escapeXMLAttribute(name) + `">` + promptDSMLInvokeClose
}
return " <|DSML|invoke name=\"" + escapeXMLAttribute(name) + "\">\n" +
return " " + promptDSMLInvokeOpen + " name=\"" + escapeXMLAttribute(name) + "\">\n" +
parameters + "\n" +
" </|DSML|invoke>"
" " + promptDSMLInvokeClose
}
func formatToolCallParametersForPrompt(raw any) string {
@@ -113,7 +122,7 @@ func formatToolCallParametersForPrompt(raw any) string {
if strings.TrimSpace(fallback) == "" {
return ""
}
return " <|DSML|parameter name=\"content\">" + renderPromptXMLText(fallback) + "</|DSML|parameter>"
return " " + promptDSMLParameterOpen + " name=\"content\">" + renderPromptXMLText(fallback) + promptDSMLParameterClose
}
func renderPromptToolParameters(value any, indent string) (string, bool) {
@@ -149,9 +158,9 @@ func renderPromptToolParameters(value any, indent string) (string, bool) {
}
return strings.Join(lines, "\n"), true
case string:
return indent + `<|DSML|parameter name="content">` + renderPromptXMLText(v) + `</|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="content">` + renderPromptXMLText(v) + promptDSMLParameterClose, true
default:
return indent + `<|DSML|parameter name="value">` + renderPromptXMLText(fmt.Sprint(v)) + `</|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="value">` + renderPromptXMLText(fmt.Sprint(v)) + promptDSMLParameterClose, true
}
}
@@ -162,29 +171,29 @@ func renderPromptParameterNode(name string, value any, indent string) (string, b
}
switch v := value.(type) {
case nil:
return indent + `<|DSML|parameter name="` + escapeXMLAttribute(trimmedName) + `"></|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="` + escapeXMLAttribute(trimmedName) + `">` + promptDSMLParameterClose, true
case map[string]any:
body, ok := renderPromptToolXMLBody(v, indent+" ")
if !ok {
return "", false
}
if strings.TrimSpace(body) == "" {
return indent + `<|DSML|parameter name="` + escapeXMLAttribute(trimmedName) + `"></|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="` + escapeXMLAttribute(trimmedName) + `">` + promptDSMLParameterClose, true
}
return indent + `<|DSML|parameter name="` + escapeXMLAttribute(trimmedName) + "\">\n" + body + "\n" + indent + `</|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="` + escapeXMLAttribute(trimmedName) + "\">\n" + body + "\n" + indent + promptDSMLParameterClose, true
case []any:
body, ok := renderPromptToolXMLArray(v, indent+" ")
if !ok {
return "", false
}
if strings.TrimSpace(body) == "" {
return indent + `<|DSML|parameter name="` + escapeXMLAttribute(trimmedName) + `"></|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="` + escapeXMLAttribute(trimmedName) + `">` + promptDSMLParameterClose, true
}
return indent + `<|DSML|parameter name="` + escapeXMLAttribute(trimmedName) + "\">\n" + body + "\n" + indent + `</|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="` + escapeXMLAttribute(trimmedName) + "\">\n" + body + "\n" + indent + promptDSMLParameterClose, true
case string:
return indent + `<|DSML|parameter name="` + escapeXMLAttribute(trimmedName) + `">` + renderPromptXMLText(v) + `</|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="` + escapeXMLAttribute(trimmedName) + `">` + renderPromptXMLText(v) + promptDSMLParameterClose, true
default:
return indent + `<|DSML|parameter name="` + escapeXMLAttribute(trimmedName) + `">` + renderPromptXMLText(fmt.Sprint(v)) + `</|DSML|parameter>`, true
return indent + promptDSMLParameterOpen + ` name="` + escapeXMLAttribute(trimmedName) + `">` + renderPromptXMLText(fmt.Sprint(v)) + promptDSMLParameterClose, true
}
}

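The Go refactor above hoists the DSML markup strings into named constants. The rendering it performs can be sketched in JavaScript like so; escaping and nested values are simplified relative to the real renderer:

```javascript
const OPEN = '<|DSML|tool_calls>';
const CLOSE = '</|DSML|tool_calls>';

// Minimal attribute escaping; the real code escapes a wider character set.
function escapeAttr(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

// Render a tool_calls history entry into the prompt-visible DSML block.
function formatToolCallsForPrompt(calls) {
  const blocks = calls.map(({ name, input }) => {
    const params = Object.entries(input || {})
      .map(([k, v]) =>
        `    <|DSML|parameter name="${escapeAttr(k)}">${String(v)}</|DSML|parameter>`)
      .join('\n');
    return params
      ? `  <|DSML|invoke name="${escapeAttr(name)}">\n${params}\n  </|DSML|invoke>`
      : `  <|DSML|invoke name="${escapeAttr(name)}"></|DSML|invoke>`;
  });
  return blocks.length ? `${OPEN}\n${blocks.join('\n')}\n${CLOSE}` : '';
}
```

Centralizing the markup strings as constants, as the diff does, means a future delimiter change touches one place instead of a dozen string literals.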
View File

@@ -4,6 +4,7 @@ import (
"strings"
"ds2api/internal/prompt"
"ds2api/internal/toolcall"
)
const assistantReasoningLabel = "reasoning_content"
@@ -62,6 +63,9 @@ func buildAssistantContentForPrompt(msg map[string]any) string {
reasoning = strings.TrimSpace(extractOpenAIReasoningContentFromMessage(msg["content"]))
}
toolHistory := prompt.FormatToolCallsForPrompt(msg["tool_calls"])
if toolHistory == "" {
content = normalizeAssistantToolMarkupContentForPrompt(content)
}
parts := make([]string, 0, 3)
if reasoning != "" {
parts = append(parts, formatPromptLabeledBlock(assistantReasoningLabel, reasoning))
@@ -82,6 +86,40 @@ func buildAssistantContentForPrompt(msg map[string]any) string {
}
}
func normalizeAssistantToolMarkupContentForPrompt(content string) string {
trimmed := strings.TrimSpace(content)
if trimmed == "" || !isStandaloneAssistantToolMarkupBlock(trimmed) {
return content
}
parsed := toolcall.ParseStandaloneToolCallsDetailed(trimmed, nil)
if len(parsed.Calls) == 0 {
return content
}
raw := make([]any, 0, len(parsed.Calls))
for _, call := range parsed.Calls {
raw = append(raw, map[string]any{
"name": call.Name,
"input": call.Input,
})
}
if formatted := prompt.FormatToolCallsForPrompt(raw); formatted != "" {
return formatted
}
return content
}
func isStandaloneAssistantToolMarkupBlock(trimmed string) bool {
tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(trimmed, 0)
if !ok || tag.Start != 0 || tag.Closing || tag.Name != "tool_calls" {
return false
}
closeTag, ok := toolcall.FindMatchingToolMarkupClose(trimmed, tag)
if !ok {
return false
}
return strings.TrimSpace(trimmed[closeTag.End+1:]) == ""
}
func normalizeOpenAIReasoningContentForPrompt(v any) string {
switch x := v.(type) {
case string:

View File

@@ -263,6 +263,42 @@ func TestNormalizeOpenAIMessagesForPrompt_AssistantNilContentDoesNotInjectNullLi
}
}
func TestNormalizeOpenAIMessagesForPrompt_CanonicalizesStandaloneAssistantToolMarkupContent(t *testing.T) {
raw := []any{
map[string]any{
"role": "assistant",
"content": `<DSMLtool_calls>
<DSMLinvoke name=“Bash”>
<DSMLparameter name=“command”><[CDATA[lsof -i :4321 -t]]></DSMLparameter>
<DSMLparameter name=“description”><[CDATA[Verify port 4321 is free]]></DSMLparameter>
</DSMLinvoke>
</DSMLtool_calls>`,
},
}
normalized := NormalizeOpenAIMessagesForPrompt(raw, "")
if len(normalized) != 1 {
t.Fatalf("expected one normalized assistant message, got %#v", normalized)
}
content, _ := normalized[0]["content"].(string)
for _, want := range []string{
"<|DSML|tool_calls>",
`<|DSML|invoke name="Bash">`,
`<|DSML|parameter name="command"><![CDATA[lsof -i :4321 -t]]></|DSML|parameter>`,
`<|DSML|parameter name="description"><![CDATA[Verify port 4321 is free]]></|DSML|parameter>`,
"</|DSML|tool_calls>",
} {
if !strings.Contains(content, want) {
t.Fatalf("expected canonicalized assistant tool markup to contain %q, got %q", want, content)
}
}
for _, bad := range []string{"<DSML", "tool_calls", "“", "”"} {
if strings.Contains(content, bad) {
t.Fatalf("expected malformed assistant tool markup to be removed from prompt history, found %q in %q", bad, content)
}
}
}
func TestNormalizeOpenAIMessagesForPrompt_DeveloperRoleMapsToSystem(t *testing.T) {
raw := []any{
map[string]any{"role": "developer", "content": "必须先走工具调用"},

View File

@@ -9,10 +9,22 @@ func buildOpenAIFinalPrompt(messagesRaw []any, toolsRaw any, traceID string, thi
}
func BuildOpenAIPrompt(messagesRaw []any, toolsRaw any, traceID string, toolPolicy ToolChoicePolicy, thinkingEnabled bool) (string, []string) {
return buildOpenAIPrompt(messagesRaw, toolsRaw, traceID, toolPolicy, thinkingEnabled, true)
}
func BuildOpenAIPromptWithToolInstructionsOnly(messagesRaw []any, toolsRaw any, traceID string, toolPolicy ToolChoicePolicy, thinkingEnabled bool) (string, []string) {
return buildOpenAIPrompt(messagesRaw, toolsRaw, traceID, toolPolicy, thinkingEnabled, false)
}
func buildOpenAIPrompt(messagesRaw []any, toolsRaw any, traceID string, toolPolicy ToolChoicePolicy, thinkingEnabled bool, includeToolDescriptions bool) (string, []string) {
messages := NormalizeOpenAIMessagesForPrompt(messagesRaw, traceID)
toolNames := []string{}
if tools, ok := toolsRaw.([]any); ok && len(tools) > 0 {
if includeToolDescriptions {
messages, toolNames = injectToolPrompt(messages, tools, toolPolicy)
} else {
messages, toolNames = injectToolPromptInstructionsOnly(messages, tools, toolPolicy)
}
}
return prompt.MessagesPrepareWithThinking(messages, thinkingEnabled), toolNames
}

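The `includeToolDescriptions` flag above switches between inlining tool schemas and deferring them to an attached DS2API_TOOLS.txt. A rough sketch of that split; the wording and format-instruction text here are illustrative, not the project's exact prompt strings:

```javascript
// Build either the full inline tool prompt or the instructions-only variant
// that points the model at the external tools file.
function buildToolPrompt(tools, includeDescriptions) {
  const descriptions = tools
    .map(t => `Tool: ${t.name}\nDescription: ${t.description}`)
    .join('\n\n');
  // Format instructions must stay in the live prompt in both variants.
  const instructions = 'TOOL CALL FORMAT: emit <|DSML|tool_calls> markup only.';
  if (includeDescriptions) {
    return `${descriptions}\n\n${instructions}`;
  }
  return 'Tool descriptions and schemas are attached in DS2API_TOOLS.txt; '
    + 'use only tools and parameters listed there.\n\n'
    + instructions;
}
```

The point of the split, as the tests check, is that schemas can be externalized while the call-format rules remain inline, so the model always knows how to emit a call even when descriptions live in the attachment.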
View File

@@ -88,6 +88,67 @@ func TestBuildOpenAIFinalPrompt_VercelPreparePathKeepsFinalAnswerInstruction(t *
}
}
func TestBuildOpenAIPromptWithToolInstructionsOnlyOmitsSchemas(t *testing.T) {
messages := []any{
map[string]any{"role": "system", "content": "You are helpful"},
map[string]any{"role": "user", "content": "请调用工具"},
}
tools := []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "search",
"description": "search docs",
"parameters": map[string]any{
"type": "object",
},
},
},
}
finalPrompt, toolNames := BuildOpenAIPromptWithToolInstructionsOnly(messages, tools, "", DefaultToolChoicePolicy(), false)
if len(toolNames) != 1 || toolNames[0] != "search" {
t.Fatalf("unexpected tool names: %#v", toolNames)
}
if strings.Contains(finalPrompt, "You have access to these tools") || strings.Contains(finalPrompt, "Description: search docs") || strings.Contains(finalPrompt, "Parameters:") {
t.Fatalf("tool descriptions should be externalized, got: %q", finalPrompt)
}
if !strings.Contains(finalPrompt, "Treat DS2API_TOOLS.txt as the authoritative list of callable tools and schemas") {
t.Fatalf("expected instructions-only prompt to point model at tools file, got: %q", finalPrompt)
}
if !strings.Contains(finalPrompt, "TOOL CALL FORMAT") || !strings.Contains(finalPrompt, "Remember: The ONLY valid way to use tools") {
t.Fatalf("expected tool format instructions to remain in live prompt, got: %q", finalPrompt)
}
}
func TestBuildOpenAIToolsContextTranscriptContainsOnlyDescriptions(t *testing.T) {
tools := []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "search",
"description": "search docs",
"parameters": map[string]any{
"type": "object",
},
},
},
}
transcript, toolNames := BuildOpenAIToolsContextTranscript(tools, DefaultToolChoicePolicy())
if len(toolNames) != 1 || toolNames[0] != "search" {
t.Fatalf("unexpected tool names: %#v", toolNames)
}
for _, want := range []string{"# DS2API_TOOLS.txt", "You have access to these tools", "Tool: search", "Description: search docs", `Parameters: {"type":"object"}`} {
if !strings.Contains(transcript, want) {
t.Fatalf("expected tools transcript to contain %q, got: %q", want, transcript)
}
}
if strings.Contains(transcript, "TOOL CALL FORMAT") || strings.Contains(transcript, "<|DSML|tool_calls>") {
t.Fatalf("tools transcript should not duplicate format instructions, got: %q", transcript)
}
}
func TestBuildOpenAIFinalPromptPrependsOutputIntegrityGuard(t *testing.T) {
messages := []any{
map[string]any{"role": "system", "content": "You are helpful"},

View File

@@ -1,6 +1,9 @@
package promptcompat
import "testing"
import (
"strings"
"testing"
)
func TestNormalizeResponsesInputItemPreservesAssistantReasoningContent(t *testing.T) {
item := map[string]any{
@@ -48,3 +51,44 @@ func TestNormalizeResponsesInputItemAssistantMessageWithReasoningBlocks(t *testi
t.Fatalf("expected content blocks preserved, got %#v", got["content"])
}
}
func TestNormalizeResponsesInputArrayMergesReasoningMessageIntoFunctionCallHistory(t *testing.T) {
input := []any{
map[string]any{
"type": "message",
"role": "assistant",
"content": []any{
map[string]any{"type": "reasoning", "text": "need fresh docs before answering"},
},
},
map[string]any{
"type": "function_call",
"call_id": "call_search",
"name": "search_web",
"arguments": `{"query":"docs"}`,
},
}
got := NormalizeResponsesInputAsMessages(input)
if len(got) != 1 {
t.Fatalf("expected reasoning and function_call merged into one assistant message, got %#v", got)
}
msg, _ := got[0].(map[string]any)
if msg["role"] != "assistant" {
t.Fatalf("expected assistant message, got %#v", msg)
}
if msg["reasoning_content"] != "need fresh docs before answering" {
t.Fatalf("expected reasoning_content on tool-call message, got %#v", msg)
}
toolCalls, _ := msg["tool_calls"].([]any)
if len(toolCalls) != 1 {
t.Fatalf("expected one tool call, got %#v", msg["tool_calls"])
}
history := BuildOpenAIHistoryTranscript(got)
if !strings.Contains(history, "[reasoning_content]\nneed fresh docs before answering\n[/reasoning_content]") {
t.Fatalf("expected reasoning in history transcript, got %q", history)
}
if !strings.Contains(history, `<|DSML|invoke name="search_web">`) {
t.Fatalf("expected tool call in history transcript, got %q", history)
}
}

View File

@@ -61,19 +61,52 @@ func normalizeResponsesInputArray(items []any) []any {
out := make([]any, 0, len(items))
callNameByID := map[string]string{}
fallbackParts := make([]string, 0, len(items))
pendingAssistantReasoning := ""
flushFallback := func() {
if len(fallbackParts) == 0 {
return
}
if pendingAssistantReasoning != "" {
out = append(out, map[string]any{"role": "assistant", "reasoning_content": pendingAssistantReasoning})
pendingAssistantReasoning = ""
}
out = append(out, map[string]any{"role": "user", "content": strings.Join(fallbackParts, "\n")})
fallbackParts = fallbackParts[:0]
}
flushPendingReasoning := func() {
if pendingAssistantReasoning == "" {
return
}
out = append(out, map[string]any{"role": "assistant", "reasoning_content": pendingAssistantReasoning})
pendingAssistantReasoning = ""
}
for _, item := range items {
switch x := item.(type) {
case map[string]any:
if msg := normalizeResponsesInputItemWithState(x, callNameByID); msg != nil {
if reasoning := assistantReasoningOnlyContent(msg); reasoning != "" {
if pendingAssistantReasoning == "" {
pendingAssistantReasoning = reasoning
} else {
pendingAssistantReasoning += "\n" + reasoning
}
continue
}
if isAssistantToolCallMessage(msg) && pendingAssistantReasoning != "" {
if strings.TrimSpace(normalizeOpenAIReasoningContentForPrompt(msg["reasoning_content"])) == "" {
msg["reasoning_content"] = pendingAssistantReasoning
}
pendingAssistantReasoning = ""
} else {
flushPendingReasoning()
}
flushFallback()
if isAssistantToolCallMessage(msg) && len(out) > 0 {
if merged := mergeResponsesAssistantToolCalls(out[len(out)-1], msg); merged {
continue
}
}
out = append(out, msg)
continue
}
@@ -86,9 +119,55 @@ func normalizeResponsesInputArray(items []any) []any {
}
}
}
flushPendingReasoning()
flushFallback()
if len(out) == 0 {
return nil
}
return out
}
func assistantReasoningOnlyContent(msg map[string]any) string {
if !isAssistantMessage(msg) || isAssistantToolCallMessage(msg) {
return ""
}
if _, hasContent := msg["content"]; hasContent {
normalizedContent := strings.TrimSpace(NormalizeOpenAIContentForPrompt(msg["content"]))
reasoningFromContent := strings.TrimSpace(extractOpenAIReasoningContentFromMessage(msg["content"]))
if normalizedContent != "" && normalizedContent != reasoningFromContent {
return ""
}
if reasoningFromContent != "" {
return reasoningFromContent
}
}
return strings.TrimSpace(normalizeOpenAIReasoningContentForPrompt(msg["reasoning_content"]))
}
func isAssistantMessage(msg map[string]any) bool {
return strings.EqualFold(strings.TrimSpace(asString(msg["role"])), "assistant")
}
func isAssistantToolCallMessage(msg map[string]any) bool {
if !isAssistantMessage(msg) {
return false
}
toolCalls, ok := msg["tool_calls"].([]any)
return ok && len(toolCalls) > 0
}
func mergeResponsesAssistantToolCalls(prev any, next map[string]any) bool {
prevMsg, ok := prev.(map[string]any)
if !ok || !isAssistantToolCallMessage(prevMsg) || !isAssistantToolCallMessage(next) {
return false
}
prevCalls, _ := prevMsg["tool_calls"].([]any)
nextCalls, _ := next["tool_calls"].([]any)
prevMsg["tool_calls"] = append(prevCalls, nextCalls...)
if strings.TrimSpace(normalizeOpenAIReasoningContentForPrompt(prevMsg["reasoning_content"])) == "" {
if reasoning := strings.TrimSpace(normalizeOpenAIReasoningContentForPrompt(next["reasoning_content"])); reasoning != "" {
prevMsg["reasoning_content"] = reasoning
}
}
return true
}

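The normalization above folds a reasoning-only assistant message into the assistant tool-call message that follows it. A compact JavaScript sketch of the same pass, with simplified message shapes (field names follow the diff's `reasoning_content` convention):

```javascript
// Fold assistant reasoning-only messages into the next assistant tool-call
// message; otherwise emit them as standalone reasoning messages.
function mergeReasoningIntoToolCalls(items) {
  const out = [];
  let pending = '';
  const flushPending = () => {
    if (pending) out.push({ role: 'assistant', reasoning_content: pending });
    pending = '';
  };
  for (const msg of items) {
    const isAssistant = msg.role === 'assistant';
    const hasCalls = Array.isArray(msg.tool_calls) && msg.tool_calls.length > 0;
    if (isAssistant && !hasCalls && msg.reasoning_content && !msg.content) {
      // Hold the reasoning until we see what follows it.
      pending = pending ? pending + '\n' + msg.reasoning_content : msg.reasoning_content;
      continue;
    }
    if (isAssistant && hasCalls && pending && !msg.reasoning_content) {
      out.push({ ...msg, reasoning_content: pending });
      pending = '';
      continue;
    }
    flushPending();
    out.push(msg);
  }
  flushPending(); // trailing reasoning with no tool call still survives
  return out;
}
```

The trailing flush matters: reasoning that never meets a tool call is preserved as its own message rather than silently dropped, matching the diff's `flushPendingReasoning` calls at loop exit.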
View File

@@ -11,6 +11,8 @@ type StandardRequest struct {
HistoryText string
PromptTokenText string
CurrentInputFileApplied bool
CurrentInputFileID string
CurrentToolsFileID string
ToolsRaw any
FinalPrompt string
ToolNames []string

View File

@@ -9,10 +9,52 @@ import (
"ds2api/internal/toolcall"
)
const CurrentToolsContextFilename = "DS2API_TOOLS.txt"
const toolsTranscriptTitle = "# DS2API_TOOLS.txt"
const toolsTranscriptSummary = "Available tool descriptions and parameter schemas for this request."
type toolPromptParts struct {
Descriptions string
Instructions string
Names []string
}
func injectToolPrompt(messages []map[string]any, tools []any, policy ToolChoicePolicy) ([]map[string]any, []string) {
return injectToolPromptWithDescriptions(messages, tools, policy, true)
}
func injectToolPromptInstructionsOnly(messages []map[string]any, tools []any, policy ToolChoicePolicy) ([]map[string]any, []string) {
return injectToolPromptWithDescriptions(messages, tools, policy, false)
}
func injectToolPromptWithDescriptions(messages []map[string]any, tools []any, policy ToolChoicePolicy, includeDescriptions bool) ([]map[string]any, []string) {
if policy.IsNone() {
return messages, nil
}
parts := buildToolPromptParts(tools, policy)
if parts.Instructions == "" {
return messages, parts.Names
}
toolPrompt := parts.Instructions
if includeDescriptions && parts.Descriptions != "" {
toolPrompt = parts.Descriptions + "\n\n" + toolPrompt
} else if !includeDescriptions && parts.Descriptions != "" {
toolPrompt = "Available tool descriptions and parameter schemas are attached in DS2API_TOOLS.txt. Treat DS2API_TOOLS.txt as the authoritative list of callable tools and schemas; use only tools and parameters listed there.\n\n" + toolPrompt
}
for i := range messages {
if messages[i]["role"] == "system" {
old, _ := messages[i]["content"].(string)
messages[i]["content"] = strings.TrimSpace(old + "\n\n" + toolPrompt)
return messages, parts.Names
}
}
messages = append([]map[string]any{{"role": "system", "content": toolPrompt}}, messages...)
return messages, parts.Names
}
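The system-message placement rule in the function above can be sketched in isolation. `injectPrompt` is a hypothetical simplification that keeps only the two branches: append to an existing system message, or prepend a new one.

```go
package main

import (
	"fmt"
	"strings"
)

// injectPrompt appends the tool prompt to the first system message if one
// exists; otherwise it prepends a fresh system message carrying the prompt.
func injectPrompt(messages []map[string]any, prompt string) []map[string]any {
	for i := range messages {
		if messages[i]["role"] == "system" {
			old, _ := messages[i]["content"].(string)
			messages[i]["content"] = strings.TrimSpace(old + "\n\n" + prompt)
			return messages
		}
	}
	return append([]map[string]any{{"role": "system", "content": prompt}}, messages...)
}

func main() {
	out := injectPrompt([]map[string]any{{"role": "user", "content": "hi"}}, "TOOLS")
	fmt.Println(out[0]["role"], out[0]["content"])
}
```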
func buildToolPromptParts(tools []any, policy ToolChoicePolicy) toolPromptParts {
toolSchemas := make([]string, 0, len(tools))
names := make([]string, 0, len(tools))
isAllowed := func(name string) bool {
@@ -44,29 +86,47 @@ func injectToolPrompt(messages []map[string]any, tools []any, policy ToolChoiceP
toolSchemas = append(toolSchemas, fmt.Sprintf("Tool: %s\nDescription: %s\nParameters: %s", name, desc, string(b)))
}
if len(toolSchemas) == 0 {
return messages, names
return toolPromptParts{Names: names}
}
toolPrompt := "You have access to these tools:\n\n" + strings.Join(toolSchemas, "\n\n") + "\n\n" + toolcall.BuildToolCallInstructions(names)
descriptions := "You have access to these tools:\n\n" + strings.Join(toolSchemas, "\n\n")
instructions := toolcall.BuildToolCallInstructions(names)
if hasReadLikeTool(names) {
toolPrompt += "\n\nRead-tool cache guard: If a Read/read_file-style tool result says the file is unchanged, already available in history, should be referenced from previous context, or otherwise provides no file body, treat that result as missing content. Do not repeatedly call the same read request for that missing body. Request a full-content read if the tool supports it, or tell the user that the file contents need to be provided again."
instructions += "\n\nRead-tool cache guard: If a Read/read_file-style tool result says the file is unchanged, already available in history, should be referenced from previous context, or otherwise provides no file body, treat that result as missing content. Do not repeatedly call the same read request for that missing body. Request a full-content read if the tool supports it, or tell the user that the file contents need to be provided again."
}
if policy.Mode == ToolChoiceRequired {
toolPrompt += "\n7) For this response, you MUST call at least one tool from the allowed list."
instructions += "\n7) For this response, you MUST call at least one tool from the allowed list."
}
if policy.Mode == ToolChoiceForced && strings.TrimSpace(policy.ForcedName) != "" {
toolPrompt += "\n7) For this response, you MUST call exactly this tool name: " + strings.TrimSpace(policy.ForcedName)
toolPrompt += "\n8) Do not call any other tool."
instructions += "\n7) For this response, you MUST call exactly this tool name: " + strings.TrimSpace(policy.ForcedName)
instructions += "\n8) Do not call any other tool."
}
return toolPromptParts{
Descriptions: descriptions,
Instructions: instructions,
Names: names,
}
}
for i := range messages {
if messages[i]["role"] == "system" {
old, _ := messages[i]["content"].(string)
messages[i]["content"] = strings.TrimSpace(old + "\n\n" + toolPrompt)
return messages, names
func BuildOpenAIToolsContextTranscript(toolsRaw any, policy ToolChoicePolicy) (string, []string) {
if policy.IsNone() {
return "", nil
}
tools, ok := toolsRaw.([]any)
if !ok || len(tools) == 0 {
return "", nil
}
messages = append([]map[string]any{{"role": "system", "content": toolPrompt}}, messages...)
return messages, names
parts := buildToolPromptParts(tools, policy)
if strings.TrimSpace(parts.Descriptions) == "" {
return "", parts.Names
}
var b strings.Builder
b.WriteString(toolsTranscriptTitle)
b.WriteString("\n")
b.WriteString(toolsTranscriptSummary)
b.WriteString("\n\n")
b.WriteString(parts.Descriptions)
b.WriteString("\n")
return b.String(), parts.Names
}
func hasReadLikeTool(names []string) bool {

View File

@@ -6,6 +6,7 @@ import (
"fmt"
"log"
"net/http"
"net/url"
"os"
"runtime"
"strings"
@@ -22,6 +23,7 @@ import (
"ds2api/internal/httpapi/admin"
"ds2api/internal/httpapi/claude"
"ds2api/internal/httpapi/gemini"
"ds2api/internal/httpapi/ollama"
"ds2api/internal/httpapi/openai/chat"
"ds2api/internal/httpapi/openai/embeddings"
"ds2api/internal/httpapi/openai/files"
@@ -68,6 +70,7 @@ func NewApp() (*App, error) {
claudeHandler := &claude.Handler{Store: store, Auth: resolver, DS: dsClient, OpenAI: chatHandler, ChatHistory: chatHistoryStore}
geminiHandler := &gemini.Handler{Store: store, Auth: resolver, DS: dsClient, OpenAI: chatHandler, ChatHistory: chatHistoryStore}
adminHandler := &admin.Handler{Store: store, Pool: pool, DS: dsClient, OpenAI: chatHandler, ChatHistory: chatHistoryStore}
ollamaHandler := &ollama.Handler{Store: store}
webuiHandler := webui.NewHandler()
r := chi.NewRouter()
@@ -112,6 +115,7 @@ func NewApp() (*App, error) {
r.Post("/embeddings", embeddingsHandler.Embeddings)
claude.RegisterRoutes(r, claudeHandler)
gemini.RegisterRoutes(r, geminiHandler)
ollama.RegisterRoutes(r, ollamaHandler)
r.Route("/admin", func(ar chi.Router) {
admin.RegisterRoutes(ar, adminHandler)
})
@@ -157,6 +161,16 @@ func (f *filteredLogFormatter) NewLogEntry(r *http.Request) middleware.LogEntry
return noopLogEntry{}
}
}
if r != nil && r.URL != nil {
if redacted, changed := redactSensitiveQueryParams(r.URL); changed {
cloned := *r
clonedURL := *r.URL
clonedURL.RawQuery = redacted
cloned.URL = &clonedURL
cloned.RequestURI = clonedURL.RequestURI()
return f.base.NewLogEntry(&cloned)
}
}
return f.base.NewLogEntry(r)
}
@@ -166,6 +180,86 @@ func (noopLogEntry) Write(_ int, _ int, _ http.Header, _ time.Duration, _ interf
func (noopLogEntry) Panic(_ interface{}, _ []byte) {}
func redactSensitiveQueryParams(u *url.URL) (string, bool) {
if u == nil || u.RawQuery == "" {
return "", false
}
values, err := url.ParseQuery(u.RawQuery)
if err != nil {
return redactSensitiveRawQueryParams(u.RawQuery)
}
changed := false
for name, vals := range values {
if !isSensitiveQueryParam(name) {
continue
}
for i := range vals {
vals[i] = "REDACTED"
}
values[name] = vals
changed = true
}
if !changed {
return "", false
}
return values.Encode(), true
}
func redactSensitiveRawQueryParams(rawQuery string) (string, bool) {
if rawQuery == "" {
return "", false
}
var b strings.Builder
b.Grow(len(rawQuery))
changed := false
start := 0
for i := 0; i <= len(rawQuery); i++ {
if i < len(rawQuery) && rawQuery[i] != '&' && rawQuery[i] != ';' {
continue
}
segment := rawQuery[start:i]
b.WriteString(redactSensitiveRawQuerySegment(segment, &changed))
if i < len(rawQuery) {
b.WriteByte(rawQuery[i])
}
start = i + 1
}
if !changed {
return "", false
}
return b.String(), true
}
func redactSensitiveRawQuerySegment(segment string, changed *bool) string {
if segment == "" {
return segment
}
name := segment
valueStart := -1
if eq := strings.IndexByte(segment, '='); eq >= 0 {
name = segment[:eq]
valueStart = eq + 1
}
decodedName, err := url.QueryUnescape(name)
if err != nil {
decodedName = name
}
if !isSensitiveQueryParam(decodedName) {
return segment
}
if changed != nil {
*changed = true
}
if valueStart < 0 {
return name + "=REDACTED"
}
return segment[:valueStart] + "REDACTED"
}
func isSensitiveQueryParam(name string) bool {
return strings.EqualFold(name, "key") || strings.EqualFold(name, "api_key")
}
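The well-formed-query path of the redaction logic above can be exercised as a self-contained sketch. `redactQuery` mirrors `redactSensitiveQueryParams` but omits the byte-level fallback used when `url.ParseQuery` fails; note that `url.Values.Encode` re-emits parameters in sorted key order.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// redactQuery replaces every value of the sensitive keys ("key", "api_key",
// case-insensitive) with REDACTED and re-encodes the query string.
func redactQuery(rawQuery string) string {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return rawQuery // real code falls back to raw segment scanning here
	}
	for name, vals := range values {
		if strings.EqualFold(name, "key") || strings.EqualFold(name, "api_key") {
			for i := range vals {
				vals[i] = "REDACTED"
			}
			values[name] = vals
		}
	}
	return values.Encode() // keys come back sorted alphabetically
}

func main() {
	fmt.Println(redactQuery("key=secret&alt=sse"))
	// → alt=sse&key=REDACTED
}
```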
var defaultCORSAllowHeaders = []string{
"Content-Type",
"Authorization",

View File

@@ -0,0 +1,104 @@
package server
import (
"bytes"
"log"
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"github.com/go-chi/chi/v5/middleware"
)
func TestFilteredLogFormatterRedactsSensitiveQueryParams(t *testing.T) {
var buf bytes.Buffer
formatter := &filteredLogFormatter{
base: &middleware.DefaultLogFormatter{
Logger: log.New(&buf, "", 0),
NoColor: true,
},
}
req := httptest.NewRequest(
http.MethodPost,
"/v1beta/models/gemini-2.5-pro:generateContent?key=caller-secret&api_key=second-secret&alt=sse",
nil,
)
entry := formatter.NewLogEntry(req)
entry.Write(http.StatusOK, 0, http.Header{}, time.Millisecond, nil)
got := buf.String()
for _, secret := range []string{"caller-secret", "second-secret"} {
if strings.Contains(got, secret) {
t.Fatalf("log line contains sensitive query value %q: %s", secret, got)
}
}
if !strings.Contains(got, "key=REDACTED") || !strings.Contains(got, "api_key=REDACTED") {
t.Fatalf("log line did not include redacted sensitive params: %s", got)
}
if !strings.Contains(got, "alt=sse") {
t.Fatalf("log line did not preserve non-sensitive query param: %s", got)
}
if req.URL.RawQuery != "key=caller-secret&api_key=second-secret&alt=sse" {
t.Fatalf("request was mutated, RawQuery = %q", req.URL.RawQuery)
}
}
func TestFilteredLogFormatterRedactsSensitiveQueryParamsWhenMalformed(t *testing.T) {
tests := []struct {
name string
target string
secrets []string
redacted []string
preserved []string
}{
{
name: "semicolon separator",
target: "/v1beta/models/gemini-2.5-pro:generateContent?key=caller-secret;alt=sse",
secrets: []string{"caller-secret"},
redacted: []string{"key=REDACTED"},
preserved: []string{"alt=sse"},
},
{
name: "bad escape in sensitive value",
target: "/v1beta/models/gemini-2.5-pro:generateContent?api_key=second-secret%ZZ",
secrets: []string{"second-secret"},
redacted: []string{"api_key=REDACTED"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var buf bytes.Buffer
formatter := &filteredLogFormatter{
base: &middleware.DefaultLogFormatter{
Logger: log.New(&buf, "", 0),
NoColor: true,
},
}
req := httptest.NewRequest(http.MethodPost, tt.target, nil)
entry := formatter.NewLogEntry(req)
entry.Write(http.StatusOK, 0, http.Header{}, time.Millisecond, nil)
got := buf.String()
for _, secret := range tt.secrets {
if strings.Contains(got, secret) {
t.Fatalf("log line contains sensitive query value %q: %s", secret, got)
}
}
for _, want := range tt.redacted {
if !strings.Contains(got, want) {
t.Fatalf("log line missing redacted query %q: %s", want, got)
}
}
for _, want := range tt.preserved {
if !strings.Contains(got, want) {
t.Fatalf("log line missing preserved query %q: %s", want, got)
}
}
})
}
}

View File

@@ -64,3 +64,44 @@ func TestStripFencedCodeBlocks_InlineBackticksNotFence(t *testing.T) {
t.Fatalf("expected Before/After, got %q", got)
}
}
func TestParseToolCalls_IgnoresMarkdownDocumentationExamples(t *testing.T) {
text := "The parser supports multiple tool-call formats.\n\n" +
"The entry function `ParseToolCalls(text, availableToolNames)` returns a list of calls.\n\n" +
"The core flow parses the XML-format `<tool_calls>` / `<invoke>` markers.\n\n" +
"### Standard XML structure\n" +
"```xml\n" +
"<tool_calls>\n" +
" <invoke name=\"read_file\">\n" +
" <parameter name=\"path\">config.json</parameter>\n" +
" </invoke>\n" +
"</tool_calls>\n" +
"```\n\n" +
"DSML style looks like `<invoke name=\"tool\">...</invoke>`, and a `<tool_calls>` wrapper may also be mentioned.\n"
got := ParseToolCallsDetailed(text, []string{"read_file"})
if len(got.Calls) != 0 {
t.Fatalf("markdown documentation examples should not parse as tool calls, got %#v", got.Calls)
}
}
func TestParseToolCalls_IgnoresInlineMarkdownToolCallExample(t *testing.T) {
text := "Example: `<tool_calls><invoke name=\"read_file\"><parameter name=\"path\">README.md</parameter></invoke></tool_calls>`"
got := ParseToolCallsDetailed(text, []string{"read_file"})
if len(got.Calls) != 0 {
t.Fatalf("inline markdown tool example should not parse as tool calls, got %#v", got.Calls)
}
}
func TestParseToolCalls_PreservesBackticksInsideToolParameters(t *testing.T) {
text := "<tool_calls><invoke name=\"Bash\"><parameter name=\"command\">echo `date`</parameter></invoke></tool_calls>"
got := ParseToolCallsDetailed(text, []string{"Bash"})
if len(got.Calls) != 1 {
t.Fatalf("expected one tool call, got %#v", got.Calls)
}
if got.Calls[0].Input["command"] != "echo `date`" {
t.Fatalf("expected command backticks preserved, got %#v", got.Calls[0].Input["command"])
}
}

View File

@@ -21,15 +21,19 @@ RULES:
1) Use the <|DSML|tool_calls> wrapper format.
2) Put one or more <|DSML|invoke> entries under a single <|DSML|tool_calls> root.
3) Put the tool name in the invoke name attribute: <|DSML|invoke name="TOOL_NAME">.
3a) Tag punctuation alphabet: ASCII < > / = " plus the halfwidth pipe |.
4) All string values must use <![CDATA[...]]>, even short ones. This includes code, scripts, file contents, prompts, paths, names, and queries.
5) Every top-level argument must be a <|DSML|parameter name="ARG_NAME">...</|DSML|parameter> node.
6) Objects use nested XML elements inside the parameter body. Arrays may repeat <item> children.
7) Numbers, booleans, and null stay plain text.
8) Use only the parameter names in the tool schema. Do not invent fields.
9) Do NOT wrap XML in markdown fences. Do NOT output explanations, role markers, or internal monologue.
10) If you call a tool, the first non-whitespace characters of that tool block must be exactly <|DSML|tool_calls>.
11) Never omit the opening <|DSML|tool_calls> tag, even if you already plan to close with </|DSML|tool_calls>.
12) Compatibility note: the runtime also accepts the legacy XML tags <tool_calls> / <invoke> / <parameter>, but prefer the DSML-prefixed form above.
9) Fill parameters with the actual values required for this call. Do not emit placeholder, blank, or whitespace-only parameters.
10) If a required parameter value is unknown, ask the user or answer normally instead of outputting an empty tool call.
11) For shell tools such as Bash / execute_command, the command/script must be inside the command parameter. Never call them with an empty command.
12) Do NOT wrap XML in markdown fences. Do NOT output explanations, role markers, or internal monologue.
13) If you call a tool, the first non-whitespace characters of that tool block must be exactly <|DSML|tool_calls>.
14) Never omit the opening <|DSML|tool_calls> tag, even if you already plan to close with </|DSML|tool_calls>.
15) Compatibility note: the runtime also accepts the legacy XML tags <tool_calls> / <invoke> / <parameter>, but prefer the DSML-prefixed form above.
PARAMETER SHAPES:
- string => <|DSML|parameter name="x"><![CDATA[value]]></|DSML|parameter>
@@ -48,9 +52,14 @@ Wrong 2 — Markdown code fences:
Wrong 3 — missing opening wrapper:
<|DSML|invoke name="TOOL_NAME">...</|DSML|invoke>
</|DSML|tool_calls>
Wrong 4 — empty parameters:
<|DSML|tool_calls>
<|DSML|invoke name="Bash">
<|DSML|parameter name="command"></|DSML|parameter>
</|DSML|invoke>
</|DSML|tool_calls>
Remember: The ONLY valid way to use tools is the <|DSML|tool_calls>...</|DSML|tool_calls> block at the end of your response.
` + buildCorrectToolExamples(toolNames)
}
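Rule 13 of the prompt above (the block must open with the exact DSML wrapper) can be checked mechanically. `startsWithDSMLWrapper` is a hypothetical helper, not part of the diff, illustrating the anchoring rule the instructions enforce.

```go
package main

import (
	"fmt"
	"strings"
)

// startsWithDSMLWrapper reports whether the first non-whitespace characters
// of a candidate tool block are exactly the DSML opening wrapper tag.
func startsWithDSMLWrapper(block string) bool {
	return strings.HasPrefix(strings.TrimSpace(block), "<|DSML|tool_calls>")
}

func main() {
	block := "\n<|DSML|tool_calls>\n<|DSML|invoke name=\"Bash\">...\n"
	fmt.Println(startsWithDSMLWrapper(block))
}
```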

View File

@@ -119,6 +119,33 @@ func TestBuildToolCallInstructions_AnchorsMissingOpeningWrapperFailureMode(t *te
}
}
func TestBuildToolCallInstructions_RejectsEmptyParametersInPrompt(t *testing.T) {
out := BuildToolCallInstructions([]string{"Bash"})
for _, want := range []string{
"Do not emit placeholder, blank, or whitespace-only parameters.",
"If a required parameter value is unknown, ask the user or answer normally instead of outputting an empty tool call.",
"Never call them with an empty command.",
"Wrong 4 — empty parameters",
} {
if !strings.Contains(out, want) {
t.Fatalf("expected empty-parameter instruction %q, got: %s", want, out)
}
}
}
func TestBuildToolCallInstructions_UsesPositiveTagPunctuationAlphabet(t *testing.T) {
out := BuildToolCallInstructions([]string{"Bash"})
want := `Tag punctuation alphabet: ASCII < > / = " plus the halfwidth pipe |.`
if !strings.Contains(out, want) {
t.Fatalf("expected positive tag punctuation alphabet %q, got: %s", want, out)
}
for _, bad := range []string{"lookalike", "substitute", "", "〈", "〉", "“", "”", "、"} {
if strings.Contains(out, bad) {
t.Fatalf("tool prompt should not include negative punctuation examples %q, got: %s", bad, out)
}
}
}
func findInvokeBlocks(text, name string) []string {
open := `<|DSML|invoke name="` + name + `">`
remaining := text

View File

@@ -1,4 +1,691 @@
package toolcall
// toolcalls_candidates.go is reserved for tool-call candidate helper logic.
// It exists to satisfy the refactor line gate target list.
import (
"strings"
"unicode"
"unicode/utf8"
)
type canonicalToolMarkupAttr struct {
Key string
Value string
}
func canonicalizeToolCallCandidateSpans(text string) string {
if text == "" {
return ""
}
var b strings.Builder
b.Grow(len(text))
for i := 0; i < len(text); {
next, advanced, blocked := skipXMLIgnoredSection(text, i)
if blocked {
b.WriteString(text[i:])
break
}
if advanced {
b.WriteString(text[i:next])
i = next
continue
}
if end, ok := markdownCodeSpanEnd(text, i); ok {
b.WriteString(text[i:end])
i = end
continue
}
tag, ok := scanToolMarkupTagAt(text, i)
if !ok {
b.WriteByte(text[i])
i++
continue
}
b.WriteString(canonicalizeRecognizedToolMarkupTag(text[tag.Start:tag.End+1], tag))
i = tag.End + 1
}
return b.String()
}
func canonicalizeRecognizedToolMarkupTag(raw string, tag ToolMarkupTag) string {
if raw == "" {
return raw
}
idx := 0
if delimLen := xmlTagStartDelimiterLenAt(raw, idx); delimLen > 0 {
idx += delimLen
}
for {
idx = skipToolMarkupIgnorables(raw, idx)
if delimLen := xmlTagStartDelimiterLenAt(raw, idx); delimLen > 0 {
idx += delimLen
continue
}
break
}
idx = skipToolMarkupIgnorables(raw, idx)
if tag.Closing {
if next, ok := consumeToolMarkupClosingSlash(raw, idx); ok {
idx = next
}
}
idx, _ = consumeToolMarkupNamePrefix(raw, idx)
afterName, ok := consumeToolKeyword(raw, idx, rawNameForTag(tag))
if !ok {
afterName = idx
}
attrs := parseCanonicalToolMarkupAttrs(raw, afterName)
var b strings.Builder
b.Grow(len(raw) + 8)
b.WriteByte('<')
if tag.Closing {
b.WriteByte('/')
}
if tag.DSMLLike {
b.WriteString("|DSML|")
}
b.WriteString(tag.Name)
for _, attr := range attrs {
if attr.Key == "" {
continue
}
b.WriteByte(' ')
b.WriteString(attr.Key)
b.WriteString(`="`)
b.WriteString(quoteCanonicalXMLAttrValue(attr.Value))
b.WriteByte('"')
}
if tag.SelfClosing {
b.WriteByte('/')
}
b.WriteByte('>')
return b.String()
}
func rawNameForTag(tag ToolMarkupTag) string {
for _, name := range toolMarkupNames {
if name.canonical == tag.Name {
return name.raw
}
}
return tag.Name
}
func parseCanonicalToolMarkupAttrs(raw string, idx int) []canonicalToolMarkupAttr {
if raw == "" || idx >= len(raw) {
return nil
}
var out []canonicalToolMarkupAttr
for idx < len(raw) {
idx = skipToolMarkupIgnorables(raw, idx)
if idx >= len(raw) {
break
}
if spacingLen := toolMarkupWhitespaceLikeLenAt(raw, idx); spacingLen > 0 {
idx += spacingLen
continue
}
if xmlTagEndDelimiterLenAt(raw, idx) > 0 {
break
}
if next, ok := consumeToolMarkupPipe(raw, idx); ok {
idx = next
continue
}
if next, ok := consumeToolMarkupClosingSlash(raw, idx); ok {
idx = next
continue
}
keyStart := idx
for idx < len(raw) {
idx = skipToolMarkupIgnorables(raw, idx)
if idx >= len(raw) {
break
}
if spacingLen := toolMarkupWhitespaceLikeLenAt(raw, idx); spacingLen > 0 {
break
}
if toolMarkupEqualsLenAt(raw, idx) > 0 || xmlTagEndDelimiterLenAt(raw, idx) > 0 {
break
}
if _, ok := consumeToolMarkupPipe(raw, idx); ok {
break
}
if _, ok := consumeToolMarkupClosingSlash(raw, idx); ok {
break
}
_, size := utf8.DecodeRuneInString(raw[idx:])
if size <= 0 {
idx++
} else {
idx += size
}
}
keyEnd := idx
key := normalizeCanonicalToolAttrKey(raw[keyStart:keyEnd])
idx = skipToolMarkupIgnorables(raw, idx)
for {
spacingLen := toolMarkupWhitespaceLikeLenAt(raw, idx)
if spacingLen == 0 {
break
}
idx += spacingLen
idx = skipToolMarkupIgnorables(raw, idx)
}
if eqLen := toolMarkupEqualsLenAt(raw, idx); eqLen > 0 {
idx += eqLen
} else {
continue
}
idx = skipToolMarkupIgnorables(raw, idx)
for {
spacingLen := toolMarkupWhitespaceLikeLenAt(raw, idx)
if spacingLen == 0 {
break
}
idx += spacingLen
idx = skipToolMarkupIgnorables(raw, idx)
}
if key == "" {
_, size := utf8.DecodeRuneInString(raw[idx:])
if size <= 0 {
idx++
} else {
idx += size
}
continue
}
value := ""
if quote, quoteLen := xmlQuotePairAt(raw, idx); quoteLen > 0 {
valueStart := idx + quoteLen
idx = valueStart
for idx < len(raw) {
if closeLen := xmlQuoteCloseDelimiterLenAt(raw, idx, quote); closeLen > 0 {
value = raw[valueStart:idx]
idx += closeLen
break
}
_, size := utf8.DecodeRuneInString(raw[idx:])
if size <= 0 {
idx++
} else {
idx += size
}
}
} else {
valueStart := idx
for idx < len(raw) {
if spacingLen := toolMarkupWhitespaceLikeLenAt(raw, idx); spacingLen > 0 {
break
}
if xmlTagEndDelimiterLenAt(raw, idx) > 0 || toolMarkupEqualsLenAt(raw, idx) > 0 {
break
}
if _, ok := consumeToolMarkupPipe(raw, idx); ok {
break
}
if _, ok := consumeToolMarkupClosingSlash(raw, idx); ok {
break
}
_, size := utf8.DecodeRuneInString(raw[idx:])
if size <= 0 {
idx++
} else {
idx += size
}
}
value = raw[valueStart:idx]
}
out = append(out, canonicalToolMarkupAttr{
Key: key,
Value: value,
})
}
return out
}
func normalizeCanonicalToolAttrKey(raw string) string {
trimmed := strings.TrimSpace(removeToolMarkupIgnorables(raw))
if trimmed == "" {
return ""
}
if next, ok := consumeToolKeyword(trimmed, 0, "name"); ok {
if skipToolMarkupIgnorables(trimmed, next) == len(trimmed) {
return "name"
}
}
return ""
}
func quoteCanonicalXMLAttrValue(raw string) string {
if raw == "" {
return ""
}
return strings.ReplaceAll(raw, `"`, "&quot;")
}
func removeToolMarkupIgnorables(raw string) string {
if raw == "" {
return ""
}
var b strings.Builder
b.Grow(len(raw))
for i := 0; i < len(raw); {
if ignorableLen := toolMarkupIgnorableLenAt(raw, i); ignorableLen > 0 {
i += ignorableLen
continue
}
r, size := utf8.DecodeRuneInString(raw[i:])
if size <= 0 {
b.WriteByte(raw[i])
i++
continue
}
b.WriteRune(r)
i += size
}
return b.String()
}
func skipToolMarkupIgnorables(text string, idx int) int {
for idx < len(text) {
if ignorableLen := toolMarkupIgnorableLenAt(text, idx); ignorableLen > 0 {
idx += ignorableLen
continue
}
break
}
return idx
}
func toolMarkupIgnorableLenAt(text string, idx int) int {
if idx < 0 || idx >= len(text) {
return 0
}
r, size := utf8.DecodeRuneInString(text[idx:])
if size <= 0 {
return 0
}
if unicode.Is(unicode.Cf, r) {
return size
}
if unicode.IsControl(r) && !unicode.IsSpace(r) {
return size
}
return 0
}
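The "ignorable" predicate above drops Unicode format characters (category Cf, e.g. zero-width spaces and joiners) plus non-whitespace controls before tag text is compared. A minimal sketch of the same filter, using a hypothetical `stripIgnorables` helper:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripIgnorables removes format runes (Unicode Cf) and non-whitespace
// control runes, leaving ordinary text and whitespace intact.
func stripIgnorables(s string) string {
	var b strings.Builder
	for _, r := range s {
		if unicode.Is(unicode.Cf, r) {
			continue // e.g. U+200B ZERO WIDTH SPACE, U+200D ZWJ
		}
		if unicode.IsControl(r) && !unicode.IsSpace(r) {
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(stripIgnorables("na\u200Bme")) // zero-width space removed
}
```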
func toolMarkupEqualsLenAt(text string, idx int) int {
idx = skipToolMarkupIgnorables(text, idx)
if idx < 0 || idx >= len(text) {
return 0
}
switch {
case text[idx] == '=':
return 1
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], "﹦"):
return len("﹦")
case strings.HasPrefix(text[idx:], "꞊"):
return len("꞊")
default:
return 0
}
}
func toolMarkupDashLenAt(text string, idx int) int {
idx = skipToolMarkupIgnorables(text, idx)
if idx < 0 || idx >= len(text) {
return 0
}
switch {
case text[idx] == '-':
return 1
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], "—"):
return len("—")
case strings.HasPrefix(text[idx:], "―"):
return len("―")
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], "﹣"):
return len("﹣")
case strings.HasPrefix(text[idx:], ""):
return len("")
default:
return 0
}
}
func toolMarkupUnderscoreLenAt(text string, idx int) int {
idx = skipToolMarkupIgnorables(text, idx)
if idx < 0 || idx >= len(text) {
return 0
}
switch {
case text[idx] == '_':
return 1
case strings.HasPrefix(text[idx:], "_"):
return len("_")
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], ""):
return len("")
default:
return 0
}
}
func consumeToolKeyword(text string, idx int, keyword string) (int, bool) {
next := idx
for i := 0; i < len(keyword); i++ {
next = skipToolMarkupIgnorables(text, next)
if next >= len(text) {
return idx, false
}
target := asciiLower(keyword[i])
switch target {
case '_':
if underscoreLen := toolMarkupUnderscoreLenAt(text, next); underscoreLen > 0 {
next += underscoreLen
continue
}
return idx, false
case '-':
if dashLen := toolMarkupDashLenAt(text, next); dashLen > 0 {
next += dashLen
continue
}
return idx, false
default:
r, size := utf8.DecodeRuneInString(text[next:])
if size <= 0 {
return idx, false
}
folded, ok := foldToolKeywordRune(r)
if !ok || folded != target {
return idx, false
}
next += size
}
}
return next, true
}
func foldToolKeywordRune(r rune) (byte, bool) {
if r >= '' && r <= '' {
r = r - '' + 'A'
}
if r >= '' && r <= '' {
r = r - '' + 'a'
}
r = unicode.ToLower(r)
switch r {
case 'a', 'c', 'd', 'e', 'i', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'v':
return byte(r), true
case 'а', 'Α', 'α':
return 'a', true
case 'с', 'С', 'ϲ', 'Ϲ':
return 'c', true
case 'ԁ', '':
return 'd', true
case 'е', 'Е', 'Ε', 'ε':
return 'e', true
case 'і', 'І', 'Ι', 'ι', 'ı':
return 'i', true
case 'к', 'К', 'Κ', 'κ':
return 'k', true
case '':
return 'l', true
case 'м', 'М', 'Μ', 'μ':
return 'm', true
case 'ո':
return 'n', true
case 'о', 'О', 'Ο', 'ο':
return 'o', true
case 'р', 'Р', 'Ρ', 'ρ':
return 'p', true
case 'ѕ', 'Ѕ':
return 's', true
case 'т', 'Т', 'Τ', 'τ':
return 't', true
case 'ν', 'Ν', 'ѵ', '':
return 'v', true
default:
return 0, false
}
}
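The keyword-folding idea above (mapping Cyrillic/Greek lookalikes onto their ASCII targets so `name`, `invoke`, etc. still match) can be sketched with a tiny subset of the table. `foldRune` and `matchesKeyword` are hypothetical reductions of `foldToolKeywordRune` / `consumeToolKeyword`, without the ignorable-skipping and dash/underscore handling.

```go
package main

import "fmt"

// foldRune maps a few homoglyphs onto ASCII; a real table covers many more.
func foldRune(r rune) (byte, bool) {
	switch r {
	case 'а': // Cyrillic а (U+0430)
		return 'a', true
	case 'е': // Cyrillic е (U+0435)
		return 'e', true
	case 'о': // Cyrillic о (U+043E)
		return 'o', true
	case 'n', 'a', 'm', 'e':
		return byte(r), true
	}
	return 0, false
}

// matchesKeyword folds each rune and compares it to the ASCII keyword.
func matchesKeyword(s, keyword string) bool {
	i := 0
	for _, r := range s {
		if i >= len(keyword) {
			return false
		}
		f, ok := foldRune(r)
		if !ok || f != keyword[i] {
			return false
		}
		i++
	}
	return i == len(keyword)
}

func main() {
	fmt.Println(matchesKeyword("nаme", "name")) // Cyrillic а inside
}
```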
func toolMarkupWhitespaceLikeLenAt(text string, idx int) int {
idx = skipToolMarkupIgnorables(text, idx)
if idx < 0 || idx >= len(text) {
return 0
}
switch text[idx] {
case ' ', '\t', '\n', '\r':
return 1
}
if strings.HasPrefix(text[idx:], "▁") {
return len("▁")
}
r, size := utf8.DecodeRuneInString(text[idx:])
if size > 0 && unicode.IsSpace(r) {
return size
}
return 0
}
func consumeToolMarkupPipe(text string, idx int) (int, bool) {
idx = skipToolMarkupIgnorables(text, idx)
if idx >= len(text) {
return idx, false
}
switch {
case text[idx] == '|':
return idx + 1, true
case strings.HasPrefix(text[idx:], "│"):
return idx + len("│"), true
case strings.HasPrefix(text[idx:], ""):
return idx + len(""), true
case strings.HasPrefix(text[idx:], "❘"):
return idx + len("❘"), true
case strings.HasPrefix(text[idx:], "ǀ"):
return idx + len("ǀ"), true
case strings.HasPrefix(text[idx:], ""):
return idx + len(""), true
default:
return idx, false
}
}
func consumeToolMarkupClosingSlash(text string, idx int) (int, bool) {
idx = skipToolMarkupIgnorables(text, idx)
if idx >= len(text) {
return idx, false
}
switch {
case text[idx] == '/':
return idx + 1, true
case strings.HasPrefix(text[idx:], ""):
return idx + len(""), true
case strings.HasPrefix(text[idx:], ""):
return idx + len(""), true
case strings.HasPrefix(text[idx:], ""):
return idx + len(""), true
case strings.HasPrefix(text[idx:], ""):
return idx + len(""), true
default:
return idx, false
}
}
func xmlTagStartDelimiterLenAt(text string, idx int) int {
idx = skipToolMarkupIgnorables(text, idx)
if idx < 0 || idx >= len(text) {
return 0
}
switch {
case text[idx] == '<':
return 1
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], "﹤"):
return len("﹤")
case strings.HasPrefix(text[idx:], "〈"):
return len("〈")
default:
return 0
}
}
func xmlTagEndDelimiterLenAt(text string, idx int) int {
idx = skipToolMarkupIgnorables(text, idx)
if idx < 0 || idx >= len(text) {
return 0
}
switch {
case text[idx] == '>':
return 1
case strings.HasPrefix(text[idx:], ""):
return len("")
case strings.HasPrefix(text[idx:], "﹥"):
return len("﹥")
case strings.HasPrefix(text[idx:], "〉"):
return len("〉")
default:
return 0
}
}
func xmlTagEndDelimiterLenEndingAt(text string, end int) int {
if end < 0 || end >= len(text) {
return 0
}
if text[end] == '>' {
return 1
}
if end+1 >= len("") && text[end+1-len(""):end+1] == "" {
return len("")
}
return 0
}
func xmlQuotePairAt(text string, idx int) (string, int) {
idx = skipToolMarkupIgnorables(text, idx)
if idx < 0 || idx >= len(text) {
return "", 0
}
switch {
case text[idx] == '"':
return `"`, 1
case text[idx] == '\'':
return `'`, 1
case strings.HasPrefix(text[idx:], "“"):
return "”", len("“")
case strings.HasPrefix(text[idx:], ""):
return "", len("")
case strings.HasPrefix(text[idx:], ""):
return "", len("")
case strings.HasPrefix(text[idx:], ""):
return "", len("")
case strings.HasPrefix(text[idx:], "„"):
return "”", len("„")
case strings.HasPrefix(text[idx:], "‟"):
return "”", len("‟")
default:
return "", 0
}
}
func xmlQuoteCloseDelimiterLenAt(text string, idx int, quote string) int {
if quote == "" || idx < 0 || idx >= len(text) {
return 0
}
if strings.HasPrefix(text[idx:], quote) {
return len(quote)
}
return 0
}
func hasRepairableXMLToolCallsWrapper(text string) bool {
if strings.TrimSpace(text) == "" {
return false
}
if _, ok := firstToolMarkupTagByName(text, "tool_calls", false); ok {
return false
}
invokeTag, ok := firstToolMarkupTagByName(text, "invoke", false)
if !ok {
return false
}
closeTag, ok := lastToolMarkupTagByName(text, "tool_calls", true)
if !ok {
return false
}
return invokeTag.Start < closeTag.Start
}
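The repair check above fires when an `<invoke>` appears and a closing `</tool_calls>` follows it, but no opening `<tool_calls>` exists. A sketch of the same condition, with plain `strings.Index` standing in for the lookalike-tolerant tag scanner used by the real code:

```go
package main

import (
	"fmt"
	"strings"
)

// repairable reports whether the text looks like a tool-call block whose
// opening <tool_calls> wrapper was dropped but whose close survived.
func repairable(text string) bool {
	if strings.Contains(text, "<tool_calls>") {
		return false // opening wrapper already present, nothing to repair
	}
	inv := strings.Index(text, "<invoke")
	end := strings.LastIndex(text, "</tool_calls>")
	return inv >= 0 && end > inv
}

func main() {
	fmt.Println(repairable(`<invoke name="Bash">x</invoke></tool_calls>`))
}
```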
func toolCDATAOpenLenAt(text string, idx int) int {
start := skipToolMarkupIgnorables(text, idx)
ltLen := xmlTagStartDelimiterLenAt(text, start)
if ltLen == 0 {
return 0
}
pos := start + ltLen
for skipped := 0; skipped <= 4 && pos < len(text); skipped++ {
pos = skipToolMarkupIgnorables(text, pos)
if pos >= len(text) {
return 0
}
if text[pos] == '[' {
pos++
next, ok := consumeToolKeyword(text, pos, "cdata")
if !ok {
return 0
}
pos = skipToolMarkupIgnorables(text, next)
if pos >= len(text) || text[pos] != '[' {
return 0
}
pos++
return pos - idx
}
r, size := utf8.DecodeRuneInString(text[pos:])
if size <= 0 || !isToolMarkupSeparator(r) {
return 0
}
pos += size
}
return 0
}
func indexToolCDATAOpen(text string, start int) int {
for i := maxInt(start, 0); i < len(text); i++ {
if toolCDATAOpenLenAt(text, i) > 0 {
return i
}
}
return -1
}
func findTrailingToolCDATACloseStart(text string) int {
for i := len(text) - 1; i >= 0; i-- {
if closeLen := toolCDATACloseLenAt(text, i); closeLen > 0 && i+closeLen == len(text) {
return i
}
}
return -1
}

View File

@@ -1,27 +1,29 @@
package toolcall
import "strings"
import (
"strings"
)
func normalizeDSMLToolCallMarkup(text string) (string, bool) {
if text == "" {
return "", true
}
hasAliasLikeMarkup, _ := ContainsToolMarkupSyntaxOutsideIgnored(text)
if !hasAliasLikeMarkup {
return text, true
canonicalized := canonicalizeToolCallCandidateSpans(text)
hasDSMLLikeMarkup, hasCanonicalMarkup := ContainsToolMarkupSyntaxOutsideIgnored(canonicalized)
if !hasDSMLLikeMarkup && !hasCanonicalMarkup {
return canonicalized, true
}
return rewriteDSMLToolMarkupOutsideIgnored(text), true
return rewriteDSMLToolMarkupOutsideIgnored(canonicalized), true
}
func rewriteDSMLToolMarkupOutsideIgnored(text string) string {
if text == "" {
return ""
}
lower := strings.ToLower(text)
var b strings.Builder
b.Grow(len(text))
for i := 0; i < len(text); {
next, advanced, blocked := skipXMLIgnoredSection(text, lower, i)
next, advanced, blocked := skipXMLIgnoredSection(text, i)
if blocked {
b.WriteString(text[i:])
break
@@ -31,27 +33,30 @@ func rewriteDSMLToolMarkupOutsideIgnored(text string) string {
             i = next
             continue
         }
+        if end, ok := markdownCodeSpanEnd(text, i); ok {
+            b.WriteString(text[i:end])
+            i = end
+            continue
+        }
         tag, ok := scanToolMarkupTagAt(text, i)
         if !ok {
             b.WriteByte(text[i])
             i++
             continue
         }
         if tag.DSMLLike {
             b.WriteByte('<')
             if tag.Closing {
                 b.WriteByte('/')
             }
             b.WriteString(tag.Name)
-            b.WriteString(text[tag.NameEnd : tag.End+1])
-            if text[tag.End] != '>' {
-                b.WriteByte('>')
-            }
+            if delimLen := xmlTagEndDelimiterLenEndingAt(text, tag.End); delimLen > 0 {
+                b.WriteString(text[tag.NameEnd : tag.End+1-delimLen])
+                b.WriteByte('>')
+            } else {
+                b.WriteString(text[tag.NameEnd : tag.End+1])
+                if text[tag.End] != '>' {
+                    b.WriteByte('>')
+                }
+            }
             i = tag.End + 1
             continue
         }
         b.WriteString(text[tag.Start : tag.End+1])
         i = tag.End + 1
     }
     return b.String()
 }


@@ -9,9 +9,6 @@ import (
 var toolCallMarkupKVPattern = regexp.MustCompile(`(?is)<(?:[a-z0-9_:-]+:)?([a-z0-9_\-.]+)\b[^>]*>(.*?)</(?:[a-z0-9_:-]+:)?([a-z0-9_\-.]+)>`)

-// cdataPattern matches a standalone CDATA section.
-var cdataPattern = regexp.MustCompile(`(?is)^<!\[CDATA\[(.*?)]]>$`)

 func parseMarkupKVObject(text string) map[string]any {
     matches := toolCallMarkupKVPattern.FindAllStringSubmatch(strings.TrimSpace(text), -1)
     if len(matches) == 0 {
@@ -108,11 +105,14 @@ func extractRawTagValue(inner string) string {
 func extractStandaloneCDATA(inner string) (string, bool) {
     trimmed := strings.TrimSpace(inner)
-    if cdataMatches := cdataPattern.FindStringSubmatch(trimmed); len(cdataMatches) >= 2 {
-        return cdataMatches[1], true
-    }
-    if strings.HasPrefix(strings.ToLower(trimmed), "<![cdata[") {
-        return trimmed[len("<![CDATA["):], true
+    if openLen := toolCDATAOpenLenAt(trimmed, 0); openLen > 0 {
+        if closeStart := findTrailingToolCDATACloseStart(trimmed); closeStart >= openLen {
+            return trimmed[openLen:closeStart], true
+        }
+        if end := findToolCDATAEnd(trimmed, openLen); end >= 0 {
+            return trimmed[openLen:end], true
+        }
+        return trimmed[openLen:], true
     }
     return "", false
 }
@@ -145,26 +145,22 @@ func SanitizeLooseCDATA(text string) string {
         return ""
     }
-    lower := strings.ToLower(text)
-    const openMarker = "<![cdata["
-    const closeMarker = "]]>"
     var b strings.Builder
     b.Grow(len(text))
     changed := false
     pos := 0
     for pos < len(text) {
-        startRel := strings.Index(lower[pos:], openMarker)
-        if startRel < 0 {
+        start := indexToolCDATAOpen(text, pos)
+        if start < 0 {
             b.WriteString(text[pos:])
             break
         }
-        start := pos + startRel
-        contentStart := start + len(openMarker)
+        openLen := toolCDATAOpenLenAt(text, start)
+        contentStart := start + openLen
         b.WriteString(text[pos:start])
-        if endRel := strings.Index(lower[contentStart:], closeMarker); endRel >= 0 {
-            end := contentStart + endRel + len(closeMarker)
+        if endRel := findToolCDATAEnd(text, contentStart); endRel >= 0 {
+            end := endRel + toolCDATACloseLenAt(text, endRel)
             b.WriteString(text[start:end])
             pos = end
             continue


@@ -53,7 +53,6 @@ func parseToolCallsDetailedXMLOnly(text string) ToolCallParseResult {
     if trimmed == "" {
         return result
     }
-    result.SawToolCallSyntax = looksLikeToolCallSyntax(trimmed)
     trimmed = stripFencedCodeBlocks(trimmed)
     trimmed = strings.TrimSpace(trimmed)
     if trimmed == "" {
@@ -64,8 +63,9 @@ func parseToolCallsDetailedXMLOnly(text string) ToolCallParseResult {
     if !ok {
         return result
     }
+    result.SawToolCallSyntax = looksLikeToolCallSyntax(normalized) || hasRepairableXMLToolCallsWrapper(normalized)
     parsed := parseXMLToolCalls(normalized)
-    if len(parsed) == 0 && strings.Contains(strings.ToLower(normalized), "<![cdata[") {
+    if len(parsed) == 0 && indexToolCDATAOpen(normalized, 0) >= 0 {
         recovered := SanitizeLooseCDATA(normalized)
         if recovered != normalized {
             parsed = parseXMLToolCalls(recovered)
@@ -92,45 +92,11 @@ func filterToolCallsDetailed(parsed []ParsedToolCall) ([]ParsedToolCall, []strin
         if tc.Input == nil {
             tc.Input = map[string]any{}
         }
-        if len(tc.Input) > 0 && !toolCallInputHasMeaningfulValue(tc.Input) {
-            continue
-        }
         out = append(out, tc)
     }
     return out, nil
 }

-func toolCallInputHasMeaningfulValue(v any) bool {
-    switch x := v.(type) {
-    case nil:
-        return false
-    case string:
-        return strings.TrimSpace(x) != ""
-    case map[string]any:
-        if len(x) == 0 {
-            return false
-        }
-        for _, child := range x {
-            if toolCallInputHasMeaningfulValue(child) {
-                return true
-            }
-        }
-        return false
-    case []any:
-        if len(x) == 0 {
-            return false
-        }
-        for _, child := range x {
-            if toolCallInputHasMeaningfulValue(child) {
-                return true
-            }
-        }
-        return false
-    default:
-        return true
-    }
-}

 func looksLikeToolCallSyntax(text string) bool {
     hasDSML, hasCanonical := ContainsToolCallWrapperSyntaxOutsideIgnored(text)
     return hasDSML || hasCanonical
@@ -187,8 +153,31 @@ func stripFencedCodeBlocks(text string) string {
     return b.String()
 }

+func markdownCodeSpanEnd(text string, start int) (int, bool) {
+    if start < 0 || start >= len(text) || text[start] != '`' {
+        return start, false
+    }
+    count := countLeadingFenceChars(text[start:], '`')
+    if count == 0 {
+        return start, false
+    }
+    search := start + count
+    for search < len(text) {
+        if text[search] != '`' {
+            search++
+            continue
+        }
+        run := countLeadingFenceChars(text[search:], '`')
+        if run == count {
+            return search + run, true
+        }
+        search += run
+    }
+    return start, false
+}

 func cdataStartsBeforeFence(line string) bool {
-    cdataIdx := strings.Index(strings.ToLower(line), "<![cdata[")
+    cdataIdx := indexToolCDATAOpen(line, 0)
     if cdataIdx < 0 {
         return false
     }
@@ -212,17 +201,19 @@ func firstFenceMarkerIndex(line string) int {
 }

 func updateCDATAStateForStrip(inCDATA bool, cdataFenceMarker, line string) (bool, string) {
-    lower := strings.ToLower(line)
     pos := 0
     state := inCDATA
     fenceMarker := cdataFenceMarker
     lineForFence := line
     if !state {
-        start := strings.Index(lower[pos:], "<![cdata[")
+        start := indexToolCDATAOpen(line, pos)
         if start < 0 {
             return false, ""
         }
-        pos += start + len("<![cdata[")
+        pos = start + toolCDATAOpenLenAt(line, start)
+        if pos > len(line) {
+            pos = len(line)
+        }
         state = true
         lineForFence = line[pos:]
     }
@@ -239,24 +230,37 @@ func updateCDATAStateForStrip(inCDATA bool, cdataFenceMarker, line string) (bool
         fenceMarker = ""
     }
-    for pos < len(lower) {
-        end := strings.Index(lower[pos:], "]]>")
-        if end < 0 {
+    for pos < len(line) {
+        endPos := -1
+        closeLen := 0
+        for search := pos; search < len(line); search++ {
+            if foundLen := toolCDATACloseLenAt(line, search); foundLen > 0 {
+                endPos = search
+                closeLen = foundLen
+                break
+            }
+        }
+        if endPos < 0 {
             return true, fenceMarker
         }
-        endPos := pos + end
-        pos = endPos + len("]]>")
+        pos = endPos + closeLen
+        if pos > len(line) {
+            pos = len(line)
+        }
         if fenceMarker != "" {
             continue
         }
-        if cdataEndLooksStructural(lower, pos) || strings.TrimSpace(lower[pos:]) == "" {
+        if cdataEndLooksStructural(line, pos) || strings.TrimSpace(line[pos:]) == "" {
             state = false
-            for pos < len(lower) {
-                start := strings.Index(lower[pos:], "<![cdata[")
+            for pos < len(line) {
+                start := indexToolCDATAOpen(line, pos)
                 if start < 0 {
                     return false, ""
                 }
-                pos += start + len("<![cdata[")
+                pos = start + toolCDATAOpenLenAt(line, start)
+                if pos > len(line) {
+                    pos = len(line)
+                }
                 state = true
                 trimmedTail := strings.TrimLeft(line[pos:], " \t")
                 if marker, ok := parseFenceOpen(trimmedTail); ok {
