Fix stream compatibility and vision model exposure

2026-05-07 01:45:27 +08:00 · 2026-04-29 20:23:13 +08:00
parent d7e071b24a
commit 241334c658
42 changed files with 603 additions and 157 deletions
--- a/API.en.md
+++ b/API.en.md
@@ -199,8 +199,7 @@ No auth required. Returns the currently supported DeepSeek native model list.
    {"id": "deepseek-v4-pro", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
    {"id": "deepseek-v4-flash-search", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
    {"id": "deepseek-v4-pro-search", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
-    {"id": "deepseek-v4-vision", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
-    {"id": "deepseek-v4-vision-search", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []}
+    {"id": "deepseek-v4-vision", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []}
  ]
 }
 ```
@@ -224,6 +223,8 @@ Built-in aliases come from `internal/config/models.go`; `config.model_aliases` c
 - Gemini: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-pro-vision`
 - Other compatibility families: `llama-*`, `qwen-*`, `mistral-*`, and `command-*` fall back through family heuristics

+Current vision support resolves only to `deepseek-v4-vision` and does not expose a separate `vision-search` variant.
+
 Retired historical families such as `claude-1.*`, `claude-2.*`, `claude-instant-*`, and `gpt-3.5*` are explicitly rejected.

 ### `POST /v1/chat/completions`
--- a/API.md
+++ b/API.md
@@ -204,9 +204,7 @@ Gemini 兼容客户端还可以使用 `x-goog-api-key`、`?key=` 或 `?api_key=`
    {"id": "deepseek-v4-pro-search", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
    {"id": "deepseek-v4-pro-search-nothinking", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
    {"id": "deepseek-v4-vision", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
-    {"id": "deepseek-v4-vision-nothinking", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
-    {"id": "deepseek-v4-vision-search", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []},
-    {"id": "deepseek-v4-vision-search-nothinking", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []}
+    {"id": "deepseek-v4-vision-nothinking", "object": "model", "created": 1677610602, "owned_by": "deepseek", "permission": []}
  ]
 }
 ```
@@ -232,6 +230,7 @@ Gemini 兼容客户端还可以使用 `x-goog-api-key`、`?key=` 或 `?api_key=`
 - 其他兼容族：`llama-*`、`qwen-*`、`mistral-*`、`command-*` 会按家族启发式回退

 上述 alias 若在请求名后追加 `-nothinking` 后缀，也会映射到对应的强制关闭 thinking 版本。
+当前视觉能力仅对应 `deepseek-v4-vision` / `deepseek-v4-vision-nothinking`，不会解析出独立的 `vision-search` 变体。

 退役历史模型（如 `claude-1.*`、`claude-2.*`、`claude-instant-*`、`gpt-3.5*`）会被显式拒绝。

--- a/README.MD
+++ b/README.MD
@@ -158,10 +158,9 @@ flowchart LR
 | expert | `deepseek-v4-pro-search-nothinking` | 永久关闭，不受请求参数影响 | ✅ |
 | vision | `deepseek-v4-vision` | 默认开启，可由请求参数控制 | ❌ |
 | vision | `deepseek-v4-vision-nothinking` | 永久关闭，不受请求参数影响 | ❌ |
-| vision | `deepseek-v4-vision-search` | 默认开启，可由请求参数控制 | ✅ |
-| vision | `deepseek-v4-vision-search-nothinking` | 永久关闭，不受请求参数影响 | ✅ |

 除原生模型外，也支持常见 alias 输入（如 `gpt-4.1`、`gpt-5`、`gpt-5-codex`、`o3`、`claude-*`、`gemini-*` 等），但 `/v1/models` 返回的是规范化后的 DeepSeek 原生模型 ID。若 alias 名本身追加 `-nothinking` 后缀，也会映射到对应的强制关思考模型。完整 alias 行为以 [API.md](API.md#模型-alias-解析策略) 和 `config.example.json` 为准。
+当前上游视觉模型只暴露 `vision` 通道，不提供独立的联网搜索视觉变体。

 ### Claude 接口（`GET /anthropic/v1/models`）

@@ -316,7 +315,7 @@ go run ./cmd/ds2api
 - `runtime`：账号并发、队列与 token 刷新策略，可通过 Admin Settings 热更新。
 - `auto_delete.mode`：请求结束后的远端会话清理策略，支持 `none` / `single` / `all`。
 - `history_split`：旧轮次拆分字段，已废弃并忽略，仅保留兼容旧配置。
- `current_input_file`：唯一生效的独立拆分策略；默认开启且阈值为 `0`，触发时将完整上下文合并上传为隐藏上下文文件。
+- `current_input_file`：唯一生效的独立拆分策略；默认开启且阈值为 `0`，触发时将完整上下文合并上传为 `history.txt` 上下文文件。
 - 如果关闭 `current_input_file`，请求会直接透传，不上传拆分上下文文件。
 - `thinking_injection`：默认开启；在最新 user 消息末尾追加思考增强提示词，提高高强度推理与工具调用前的思考稳定性；`prompt` 留空时使用内置默认提示词。

--- a/README.en.md
+++ b/README.en.md
@@ -150,9 +150,9 @@ For the full module-by-module architecture and directory responsibilities, see [
 | default | `deepseek-v4-flash-search` | enabled by default, request-controlled | ✅ |
 | expert | `deepseek-v4-pro-search` | enabled by default, request-controlled | ✅ |
 | vision | `deepseek-v4-vision` | enabled by default, request-controlled | ❌ |
-| vision | `deepseek-v4-vision-search` | enabled by default, request-controlled | ✅ |

 Besides native IDs, DS2API also accepts common aliases as input (for example `gpt-4.1`, `gpt-5`, `gpt-5-codex`, `o3`, `claude-*`, `gemini-*`), but `/v1/models` returns normalized DeepSeek native model IDs. The complete alias behavior is documented in [API.en.md](API.en.md#model-alias-resolution) and `config.example.json`.
+Current upstream vision support exposes only the `vision` lane and does not provide a separate search-enabled vision variant.

 ### Claude Endpoint (`GET /anthropic/v1/models`)

@@ -304,7 +304,7 @@ Common fields:
 - `runtime`: account concurrency, queueing, and token refresh behavior, hot-reloadable via Admin Settings.
 - `auto_delete.mode`: remote session cleanup after each request, supporting `none` / `single` / `all`.
 - `history_split`: legacy multi-turn history split field, now ignored and kept only for backward-compatible config loading.
- `current_input_file`: the only active split mode; it is enabled by default and uploads the full context as a hidden context file once the character threshold is reached.
+- `current_input_file`: the only active split mode; it is enabled by default and uploads the full context as a `history.txt` context file once the character threshold is reached.
 - If you turn off `current_input_file`, requests pass through directly without uploading any split context file.

 For the full environment variable list, see [docs/DEPLOY.en.md](docs/DEPLOY.en.md). For auth behavior, see [API.en.md](API.en.md#authentication).
--- a/docs/DeepSeekSSE行为结构说明-2026-04-05.md
+++ b/docs/DeepSeekSSE行为结构说明-2026-04-05.md
@@ -309,7 +309,7 @@ parse SSE block
 - 新模型可能增加新的 `p` 路径。
 - 新版本可能增加新的 fragment.type。
 - `CONTENT_FILTER` 的终态模板内容可能变化。
- 自动续写相关状态（如 `INCOMPLETE` / `AUTO_CONTINUE`）当前主要来自实测与实现兼容逻辑，后续字段形态仍可能变化。
+- 自动续写相关状态（如 `INCOMPLETE` / `AUTO_CONTINUE`）当前主要来自实测与实现兼容逻辑，后续字段形态仍可能变化。当前实现不会仅因早期 `WIP` 状态就自动继续；只有显式 `INCOMPLETE` 或 `auto_continue` 信号才会触发 continue。
 - 解析器应当对未知字段、未知路径、未知事件保持容忍。

 如果你要把这份说明用于实际开发，建议同时保留原始流样本、回放脚本和回归测试，不要只依赖本文。
--- a/docs/prompt-compatibility.md
+++ b/docs/prompt-compatibility.md
@@ -102,7 +102,7 @@ DS2API 当前的核心思路，不是把客户端传来的 `messages`、`tools`
 - 但 DeepSeek 远端本身支持同一 `chat_session_id` 的跨轮次持续对话。2026-04-27 已用项目内现有 DeepSeek client 做过一次不改业务代码的双轮实测：同一 `chat_session_id` 下，第 1 轮返回 `request_message_id=1` / `response_message_id=2` / 文本 `SESSION_TEST_ONE`；第 2 轮重新获取一次 PoW，并发送 `parent_message_id=2` 后，成功返回 `request_message_id=3` / `response_message_id=4` / 文本 `SESSION_TEST_TWO`。这说明“同远端会话持续聊天”能力存在，且每轮需要携带正确的 parent/message 链接信息，同时重新获取对应轮次可用的 PoW。
 - OpenAI Chat / Responses 原生走统一 OpenAI 标准化与 DeepSeek payload 组装；Claude / Gemini 会尽量复用 OpenAI prompt/tool 语义，其中 Gemini 直接复用 `promptcompat.BuildOpenAIPromptForAdapter`，Claude 消息接口在可代理场景会转换为 OpenAI chat 形态再执行。
 - 客户端传入的 thinking / reasoning 开关会被归一到下游 `thinking_enabled`。Gemini `generationConfig.thinkingConfig.thinkingBudget` 会翻译成同一套 thinking 开关；关闭时即使上游返回 `response/thinking_content`，兼容层也不会把它当作可见正文输出。若最终解析出的模型名带 `-nothinking` 后缀，则会无条件强制关闭 thinking，优先级高于请求体中的 `thinking` / `reasoning` / `reasoning_effort`。Claude surface 在流式请求且未显式声明 `thinking` 时，仍按 Anthropic 语义默认关闭；但在非流式代理场景，兼容层会内部开启一次下游 thinking，用于捕获“正文为空、工具调用落在 thinking 里”的情况，随后在回包前剥离用户不可见的 thinking block。
- 对 OpenAI Chat / Responses 的非流式收尾，如果最终可见正文为空，兼容层会优先尝试把思维链中的独立 DSML / XML 工具块当作真实工具调用解析出来。流式链路也会在收尾阶段做同样的 fallback 检测，但不会因为思维链内容去中途拦截或改写流式输出；thinking / reasoning 增量仍按原样先发，只有在结束收尾时才可能补发最终工具调用结果。补发结果会作为本轮 assistant 的结构化 `tool_calls` / `function_call` 输出返回，而不是塞进 `content` 文本；如果客户端没有开启 thinking / reasoning，思维链只用于检测，不会作为 `reasoning_content` 或可见正文暴露。只有正文为空且思维链里也没有可执行工具调用时，才继续按空回复错误处理。
+- 对 OpenAI Chat / Responses 的非流式收尾，如果最终可见正文为空，兼容层会优先尝试把思维链中的独立 DSML / XML 工具块当作真实工具调用解析出来。流式链路也会在收尾阶段做同样的 fallback 检测，但不会因为思维链内容去中途拦截或改写流式输出；真正的工具识别始终基于原始上游文本，而不是基于“已经做过可见输出清洗”的版本，因此即使最终可见层会剥离完整 leaked DSML / XML `tool_calls` wrapper、并抑制全空参数或无效 wrapper 块，也不会影响真实工具调用转成结构化 `tool_calls` / `function_call`。补发结果会作为本轮 assistant 的结构化 `tool_calls` / `function_call` 输出返回，而不是塞进 `content` 文本；如果客户端没有开启 thinking / reasoning，思维链只用于检测，不会作为 `reasoning_content` 或可见正文暴露。只有正文为空且思维链里也没有可执行工具调用时，才继续按空回复错误处理。
 - OpenAI Chat / Responses 的空回复错误处理之前会默认做一次内部补偿重试：第一次上游完整结束后，如果最终可见正文为空、没有解析到工具调用、也没有已经向客户端流式发出工具调用，并且终止原因不是 `content_filter`，兼容层会复用同一个 `chat_session_id`、账号、token 与工具策略，把原始 completion `prompt` 追加固定后缀 `Previous reply had no visible output. Please regenerate the visible final answer or tool call now.` 后重新提交一次。重试遵循 DeepSeek 多轮对话协议：从第一次上游 SSE 流中提取 `response_message_id`，并在重试 payload 中设置 `parent_message_id` 为该值，使重试成为同一会话的后续轮次而非断裂的根消息；同时重新获取一次 PoW（若 PoW 获取失败则回退到原始 PoW）。该重试不会重新标准化消息、不会新建 session、不会切换账号，也不会向流式客户端插入重试标记；第二次 thinking / reasoning 会按正常增量直接接到第一次之后，并继续使用 overlap trim 去重。若第二次仍为空，终端错误码仍保持现有 `upstream_empty_output`；若任一尝试触发空 `content_filter`，不做补偿重试并保持 `content_filter` 错误。JS Vercel 运行时同样设置 `parent_message_id`，但因无法直接调用 PoW API 而复用原始 PoW。

 - OpenAI Chat / Responses 在最终可见正文渲染阶段，会把 DeepSeek 搜索返回中的 `[citation:N]` / `[reference:N]` 标记替换成对应 Markdown 链接。`citation` 标记按一基序号解析；`reference` 标记只有在同一段正文中出现 `[reference:0]`（允许冒号后有空格）时才按零基序号映射，并且不会影响同段正文里的 `citation` 标记。
@@ -246,7 +246,7 @@ OpenAI 文件相关实现：

 兼容层现在只保留 `current_input_file` 这一种拆分方式；旧的 `history_split` 已废弃，只保留为兼容旧配置的字段，不再参与请求处理。

- `current_input_file` 默认开启；它用于把“完整上下文”合并进隐藏上下文文件。当最新 user turn 的纯文本长度达到 `current_input_file.min_chars`（默认 `0`）时，兼容层会上传一个文件名为 `IGNORE.txt` 的上下文文件，并在 live prompt 中只保留一个中性的 user 消息要求模型直接回答最新请求，不再暴露文件名或要求模型读取本地文件。
+- `current_input_file` 默认开启；它用于把“完整上下文”合并进 `history.txt` 上下文文件。当最新 user turn 的纯文本长度达到 `current_input_file.min_chars`（默认 `0`）时，兼容层会上传一个文件名为 `history.txt` 的上下文文件，并在 live prompt 中只保留一个中性的 user 消息要求模型直接回答最新请求，不再暴露文件名或要求模型读取本地文件。
 - 如果 `current_input_file.enabled=false`，请求会直接透传，不上传任何拆分上下文文件。
 - 旧的 `history_split.enabled` / `history_split.trigger_after_turns` 会被读取进配置对象以保持兼容，但不会触发拆分上传，也不会影响 `current_input_file` 的默认开启。

@@ -259,15 +259,15 @@ OpenAI 文件相关实现：
 - 旧历史拆分兼容壳：
  [internal/httpapi/openai/history/history_split.go](../internal/httpapi/openai/history/history_split.go)

-当前输入转文件启用并触发时，上传文件的真实文件名是 `IGNORE.txt`，文件内容是完整 `messages` 上下文；它仍会先用 OpenAI 消息标准化和 DeepSeek 角色标记序列化，再包进 `IGNORE` 文件边界里：
+当前输入转文件启用并触发时，上传文件的真实文件名是 `history.txt`，文件内容是完整 `messages` 上下文；它仍会先用 OpenAI 消息标准化和 DeepSeek 角色标记序列化，再包进 `history.txt` 文件边界里：

 ```text
-[uploaded filename]: IGNORE.txt
+[uploaded filename]: history.txt
 [file content end]

 <｜begin▁of▁sentence｜><｜System｜>...<｜User｜>...<｜Assistant｜>...<｜Tool｜>...<｜User｜>...

-[file name]: IGNORE
+[file name]: history.txt
 [file content begin]
 ```

@@ -335,7 +335,7 @@ OpenAI 文件相关实现：

 - 大部分结构化语义被压进 `prompt`
 - 文件保持文件
- 需要时把完整上下文拆进隐藏上下文文件
+- 需要时把完整上下文拆进 `history.txt` 上下文文件

 ## 12. 修改时必须同步本文档的场景

@@ -348,7 +348,7 @@ OpenAI 文件相关实现：
 - tool result 注入方式变更
 - tool prompt 模板或 tool_choice 约束变更
 - inline 文件上传 / 文件引用收集规则变更
- current input file 触发条件、上传格式、`IGNORE` 包装格式变更
+- current input file 触发条件、上传格式、`history.txt` 包装格式变更
 - 旧 `history_split` 兼容逻辑的读取、忽略或退化行为变更
 - completion payload 字段语义变更
 - Claude / Gemini 对这套统一语义的复用关系变更
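
Reviewer note on `docs/prompt-compatibility.md`: the document states that `[citation:N]` markers are resolved as one-based indices, while `[reference:N]` markers are resolved as zero-based only when `[reference:0]` (optionally with a space after the colon) appears in the same text. A minimal sketch of that rule follows. The function names, the `urls` slice, the `[N](url)` link format, the one-based fallback for `reference`, and leaving out-of-range markers untouched are all assumptions of this sketch, not the project's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var (
	citationRe  = regexp.MustCompile(`\[citation:\s*(\d+)\]`)
	referenceRe = regexp.MustCompile(`\[reference:\s*(\d+)\]`)
	zeroRefRe   = regexp.MustCompile(`\[reference:\s*0\]`)
)

// replaceMarkers rewrites every marker matched by re into a Markdown link.
// base is the index of the first source: 1 for one-based, 0 for zero-based.
func replaceMarkers(text string, re *regexp.Regexp, base int, urls []string) string {
	return re.ReplaceAllStringFunc(text, func(m string) string {
		n, err := strconv.Atoi(re.FindStringSubmatch(m)[1])
		if err != nil {
			return m
		}
		idx := n - base
		if idx < 0 || idx >= len(urls) {
			return m // assumption: unknown markers are left as-is
		}
		return fmt.Sprintf("[%d](%s)", n, urls[idx])
	})
}

// LinkMarkers (hypothetical name) applies the citation/reference rule.
func LinkMarkers(text string, urls []string) string {
	refBase := 1 // assumption: reference markers default to one-based
	if zeroRefRe.MatchString(text) {
		refBase = 0 // a [reference:0] in the same text switches to zero-based
	}
	// Citation markers stay one-based regardless of the reference base.
	text = replaceMarkers(text, citationRe, 1, urls)
	return replaceMarkers(text, referenceRe, refBase, urls)
}

func main() {
	urls := []string{"https://a.example", "https://b.example"}
	fmt.Println(LinkMarkers("see [citation:1] and [reference: 0]", urls))
}
```

The citation pass runs first so that the reference base never influences how `citation` markers are numbered.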
--- a/internal/chathistory/store.go
+++ b/internal/chathistory/store.go
@@ -14,6 +14,7 @@ import (
 	"github.com/google/uuid"

 	"ds2api/internal/config"
+	"ds2api/internal/util"
 )

 const (
@@ -610,8 +611,8 @@ func buildPreview(item Entry) string {
 	if candidate == "" {
 		candidate = strings.TrimSpace(item.UserInput)
 	}
-	if len(candidate) > defaultPreviewAt {
-		return candidate[:defaultPreviewAt] + "..."
+	if truncated, ok := util.TruncateRunes(candidate, defaultPreviewAt); ok {
+		return truncated + "..."
 	}
 	return candidate
 }
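
Reviewer note on `internal/chathistory/store.go`: the old `candidate[:defaultPreviewAt]` sliced by bytes, which can cut a multi-byte character in half. The body of `util.TruncateRunes` is not part of this excerpt, so the following is only a sketch of a helper with the same call shape, assuming it returns the truncated string plus `true` when truncation happened.

```go
package main

import "fmt"

// TruncateRunes is a hypothetical stand-in for util.TruncateRunes.
// It returns the first limit runes of s and true when s has more than
// limit runes; otherwise it returns s unchanged and false.
func TruncateRunes(s string, limit int) (string, bool) {
	if limit < 0 {
		limit = 0
	}
	count := 0
	for i := range s { // i is the byte offset at which each rune starts
		if count == limit {
			return s[:i], true // cut exactly on a rune boundary
		}
		count++
	}
	return s, false
}

func main() {
	preview, ok := TruncateRunes("😀😀😀", 2)
	fmt.Printf("%q truncated=%v\n", preview, ok)
}
```

Counting runes rather than bytes also means the preview limit is expressed in characters, which matches what `TestBuildPreviewPreservesUTF8MB4Characters` asserts.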
--- a/internal/chathistory/store_test.go
+++ b/internal/chathistory/store_test.go
@@ -8,6 +8,7 @@ import (
 	"strings"
 	"sync"
 	"testing"
+	"unicode/utf8"
 )

 func blockDetailDir(t *testing.T, detailDir string) func() {
@@ -105,6 +106,17 @@ func TestStoreCreatesAndPersistsEntries(t *testing.T) {
 	}
 }

+func TestBuildPreviewPreservesUTF8MB4Characters(t *testing.T) {
+	long := strings.Repeat("😀", defaultPreviewAt+1)
+	preview := buildPreview(Entry{Content: long})
+	if !utf8.ValidString(preview) {
+		t.Fatalf("expected valid utf-8 preview, got %q", preview)
+	}
+	if preview != strings.Repeat("😀", defaultPreviewAt)+"..." {
+		t.Fatalf("unexpected preview: %q", preview)
+	}
+}
+
 func TestStoreTrimsToConfiguredLimit(t *testing.T) {
 	path := filepath.Join(t.TempDir(), "chat_history.json")
 	store := New(path)
--- a/internal/config/config_edge_test.go
+++ b/internal/config/config_edge_test.go
@@ -79,13 +79,20 @@ func TestGetModelConfigDeepSeekExpertReasonerSearch(t *testing.T) {
 	}
 }

-func TestGetModelConfigDeepSeekVisionReasonerSearch(t *testing.T) {
-	thinking, search, ok := GetModelConfig("deepseek-v4-vision-search")
+func TestGetModelConfigDeepSeekVision(t *testing.T) {
+	thinking, search, ok := GetModelConfig("deepseek-v4-vision")
 	if !ok {
-		t.Fatal("expected ok for deepseek-v4-vision-search")
+		t.Fatal("expected ok for deepseek-v4-vision")
 	}
-	if !thinking || !search {
-		t.Fatalf("expected both true, got thinking=%v search=%v", thinking, search)
+	if !thinking || search {
+		t.Fatalf("expected thinking=true search=false, got thinking=%v search=%v", thinking, search)
+	}
+}
+
+func TestGetModelConfigDeepSeekVisionSearchUnsupported(t *testing.T) {
+	_, _, ok := GetModelConfig("deepseek-v4-vision-search")
+	if ok {
+		t.Fatal("expected deepseek-v4-vision-search to be unsupported")
 	}
 }

@@ -748,18 +755,16 @@ func TestOpenAIModelsResponse(t *testing.T) {
 		t.Fatal("expected non-empty models list")
 	}
 	expected := map[string]bool{
-		"deepseek-v4-flash":                    false,
-		"deepseek-v4-flash-nothinking":         false,
-		"deepseek-v4-pro":                      false,
-		"deepseek-v4-pro-nothinking":           false,
-		"deepseek-v4-flash-search":             false,
-		"deepseek-v4-flash-search-nothinking":  false,
-		"deepseek-v4-pro-search":               false,
-		"deepseek-v4-pro-search-nothinking":    false,
-		"deepseek-v4-vision":                   false,
-		"deepseek-v4-vision-nothinking":        false,
-		"deepseek-v4-vision-search":            false,
-		"deepseek-v4-vision-search-nothinking": false,
+		"deepseek-v4-flash":                   false,
+		"deepseek-v4-flash-nothinking":        false,
+		"deepseek-v4-pro":                     false,
+		"deepseek-v4-pro-nothinking":          false,
+		"deepseek-v4-flash-search":            false,
+		"deepseek-v4-flash-search-nothinking": false,
+		"deepseek-v4-pro-search":              false,
+		"deepseek-v4-pro-search-nothinking":   false,
+		"deepseek-v4-vision":                  false,
+		"deepseek-v4-vision-nothinking":       false,
 	}
 	for _, model := range data {
 		if _, ok := expected[model.ID]; ok {
--- a/internal/config/model_alias_test.go
+++ b/internal/config/model_alias_test.go
@@ -144,10 +144,17 @@ func TestResolveModelCustomAliasToExpert(t *testing.T) {

 func TestResolveModelCustomAliasToVision(t *testing.T) {
 	got, ok := ResolveModel(mockModelAliasReader{
-		"my-vision-model": "deepseek-v4-vision-search",
+		"my-vision-model": "deepseek-v4-vision",
 	}, "my-vision-model")
-	if !ok || got != "deepseek-v4-vision-search" {
-		t.Fatalf("expected alias -> deepseek-v4-vision-search, got ok=%v model=%q", ok, got)
+	if !ok || got != "deepseek-v4-vision" {
+		t.Fatalf("expected alias -> deepseek-v4-vision, got ok=%v model=%q", ok, got)
+	}
+}
+
+func TestResolveModelHeuristicVisionIgnoresSearchSuffix(t *testing.T) {
+	got, ok := ResolveModel(nil, "gemini-vision-search")
+	if !ok || got != "deepseek-v4-vision" {
+		t.Fatalf("expected heuristic vision alias to resolve without search variant, got ok=%v model=%q", ok, got)
 	}
 }

--- a/internal/config/models.go
+++ b/internal/config/models.go
@@ -22,7 +22,6 @@ var deepSeekBaseModels = []ModelInfo{
 	{ID: "deepseek-v4-flash-search", Object: "model", Created: 1677610602, OwnedBy: "deepseek", Permission: []any{}},
 	{ID: "deepseek-v4-pro-search", Object: "model", Created: 1677610602, OwnedBy: "deepseek", Permission: []any{}},
 	{ID: "deepseek-v4-vision", Object: "model", Created: 1677610602, OwnedBy: "deepseek", Permission: []any{}},
-	{ID: "deepseek-v4-vision-search", Object: "model", Created: 1677610602, OwnedBy: "deepseek", Permission: []any{}},
 }

 var DeepSeekModels = appendNoThinkingVariants(deepSeekBaseModels)
@@ -67,7 +66,7 @@ func GetModelConfig(model string) (thinking bool, search bool, ok bool) {
 	switch baseModel {
 	case "deepseek-v4-flash", "deepseek-v4-pro", "deepseek-v4-vision":
 		return !noThinking, false, true
-	case "deepseek-v4-flash-search", "deepseek-v4-pro-search", "deepseek-v4-vision-search":
+	case "deepseek-v4-flash-search", "deepseek-v4-pro-search":
 		return !noThinking, true, true
 	default:
 		return false, false, false
@@ -81,7 +80,7 @@ func GetModelType(model string) (modelType string, ok bool) {
 		return "default", true
 	case "deepseek-v4-pro", "deepseek-v4-pro-search":
 		return "expert", true
-	case "deepseek-v4-vision", "deepseek-v4-vision-search":
+	case "deepseek-v4-vision":
 		return "vision", true
 	default:
 		return "", false
@@ -359,8 +358,6 @@ func resolveCanonicalModel(aliases map[string]string, model string) (string, boo
 	useSearch := strings.Contains(model, "search")

 	switch {
-	case useVision && useSearch:
-		return "deepseek-v4-vision-search", true
 	case useVision:
 		return "deepseek-v4-vision", true
 	case useReasoner && useSearch:
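
Reviewer note on `internal/config/models.go`: the net effect of these hunks is that `deepseek-v4-vision-search` is no longer a known model, and a heuristic alias containing both "vision" and "search" now resolves to plain `deepseek-v4-vision`. The sketch below mirrors the post-commit `GetModelConfig` switch. Deriving the base model by trimming a `-nothinking` suffix and detecting vision with a substring match are assumptions, since those parts of the file are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// modelConfig mirrors the post-commit switch in GetModelConfig.
// Assumption: the base model is the ID with "-nothinking" trimmed.
func modelConfig(model string) (thinking, search, ok bool) {
	base := strings.TrimSuffix(model, "-nothinking")
	noThinking := base != model
	switch base {
	case "deepseek-v4-flash", "deepseek-v4-pro", "deepseek-v4-vision":
		return !noThinking, false, true
	case "deepseek-v4-flash-search", "deepseek-v4-pro-search":
		return !noThinking, true, true
	default:
		return false, false, false
	}
}

// resolveVision (hypothetical) covers only the vision branch of the
// family heuristic: a "search" hint no longer changes the result.
func resolveVision(model string) (string, bool) {
	if strings.Contains(strings.ToLower(model), "vision") {
		return "deepseek-v4-vision", true
	}
	return "", false
}

func main() {
	ids := []string{
		"deepseek-v4-vision",
		"deepseek-v4-vision-nothinking",
		"deepseek-v4-vision-search",
	}
	for _, id := range ids {
		thinking, search, ok := modelConfig(id)
		fmt.Printf("%-30s thinking=%v search=%v ok=%v\n", id, thinking, search, ok)
	}
	resolved, ok := resolveVision("gemini-vision-search")
	fmt.Println(resolved, ok)
}
```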
--- a/internal/deepseek/client/client_continue.go
+++ b/internal/deepseek/client/client_continue.go
@@ -7,6 +7,7 @@ import (
 	dsprotocol "ds2api/internal/deepseek/protocol"
 	"encoding/json"
 	"errors"
+	"fmt"
 	"io"
 	"net/http"
 	"strings"
@@ -27,7 +28,7 @@ type continueState struct {
 }

 // wrapCompletionWithAutoContinue wraps the completion response body so that
-// if the upstream indicates the response is incomplete (WIP / INCOMPLETE /
+// if the upstream indicates the response is incomplete (INCOMPLETE /
 // AUTO_CONTINUE), ds2api will automatically call the DeepSeek continue
 // endpoint and splice the continuation SSE stream onto the original.
 // The caller sees a single, seamless SSE stream.
@@ -176,12 +177,12 @@ func (s *continueState) observe(data string) {
 	}
 	// Path-based status: {"p": "response/status", "v": "FINISHED"}
 	if p, _ := chunk["p"].(string); p == "response/status" {
-		if status, _ := chunk["v"].(string); status != "" {
-			s.lastStatus = strings.TrimSpace(status)
-			if strings.EqualFold(s.lastStatus, "FINISHED") {
-				s.finished = true
-			}
-		}
+		s.setStatus(asString(chunk["v"]))
+	}
+	if p, _ := chunk["p"].(string); p == "response" {
+		s.observeBatchPatches("response", chunk["v"])
+	} else {
+		s.observeBatchPatches("", chunk["v"])
 	}
 	// Nested v.response
 	v, _ := chunk["v"].(map[string]any)
@@ -189,12 +190,7 @@ func (s *continueState) observe(data string) {
 		if id := intFrom(response["message_id"]); id > 0 {
 			s.responseMessageID = id
 		}
-		if status, _ := response["status"].(string); status != "" {
-			s.lastStatus = strings.TrimSpace(status)
-			if strings.EqualFold(s.lastStatus, "FINISHED") {
-				s.finished = true
-			}
-		}
+		s.setStatus(asString(response["status"]))
 		if autoContinue, ok := response["auto_continue"].(bool); ok && autoContinue {
 			s.lastStatus = "AUTO_CONTINUE"
 		}
@@ -205,18 +201,56 @@ func (s *continueState) observe(data string) {
 			if id := intFrom(response["message_id"]); id > 0 {
 				s.responseMessageID = id
 			}
-			if status, _ := response["status"].(string); status != "" {
-				s.lastStatus = strings.TrimSpace(status)
-				if strings.EqualFold(s.lastStatus, "FINISHED") {
-					s.finished = true
-				}
-			}
+			s.setStatus(asString(response["status"]))
 		}
 	}
 }

-// shouldContinue returns true when the upstream indicates the response is
-// not yet finished and we have enough information to issue a continue request.
+func (s *continueState) observeBatchPatches(parentPath string, raw any) {
+	if s == nil {
+		return
+	}
+	patches, ok := raw.([]any)
+	if !ok {
+		return
+	}
+	for _, patch := range patches {
+		m, ok := patch.(map[string]any)
+		if !ok {
+			continue
+		}
+		path := strings.TrimSpace(asString(m["p"]))
+		if path == "" {
+			continue
+		}
+		fullPath := path
+		if parent := strings.Trim(strings.TrimSpace(parentPath), "/"); parent != "" && !strings.Contains(path, "/") {
+			fullPath = parent + "/" + path
+		}
+		switch strings.Trim(strings.TrimSpace(fullPath), "/") {
+		case "response/status", "status", "response/quasi_status", "quasi_status":
+			s.setStatus(asString(m["v"]))
+		}
+	}
+}
+
+func (s *continueState) setStatus(status string) {
+	if s == nil {
+		return
+	}
+	normalized := strings.TrimSpace(status)
+	if normalized == "" {
+		return
+	}
+	s.lastStatus = normalized
+	if strings.EqualFold(normalized, "FINISHED") || strings.EqualFold(normalized, "CONTENT_FILTER") {
+		s.finished = true
+	}
+}
+
+// shouldContinue returns true when the upstream explicitly indicates the
+// response is incomplete and we have enough information to issue a continue
+// request. Plain WIP is not sufficient because normal streams begin in WIP.
 func (s *continueState) shouldContinue() bool {
 	if s == nil {
 		return false
@@ -225,7 +259,7 @@ func (s *continueState) shouldContinue() bool {
 		return false
 	}
 	switch strings.ToUpper(strings.TrimSpace(s.lastStatus)) {
-	case "WIP", "INCOMPLETE", "AUTO_CONTINUE":
+	case "INCOMPLETE", "AUTO_CONTINUE":
 		return true
 	default:
 		return false
@@ -241,3 +275,12 @@ func (s *continueState) prepareForNextRound() {
 	s.finished = false
 	s.lastStatus = ""
 }
+
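+// asString returns v when it is a string and "" when it is nil (a missing
+// JSON field), so setStatus never records the literal "<nil>".
+func asString(v any) string {
+	if s, ok := v.(string); ok || v == nil {
+		return s
+	}
+	return strings.TrimSpace(strings.ReplaceAll(strings.TrimSpace(fmt.Sprint(v)), "\u0000", ""))
+}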
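
Reviewer note on `internal/deepseek/client/client_continue.go`: the key behavioral change is that a plain `WIP` status no longer triggers a continue request, while `FINISHED` and `CONTENT_FILTER` both mark the round as terminal. A reduced sketch of that state machine is shown below. The type name `statusTracker` is hypothetical, and the real `shouldContinue` has additional preconditions (elided in the hunk) that are omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// statusTracker is a hypothetical, reduced version of continueState.
type statusTracker struct {
	lastStatus string
	finished   bool
}

// setStatus records a non-empty status and marks terminal states.
func (t *statusTracker) setStatus(status string) {
	normalized := strings.TrimSpace(status)
	if normalized == "" {
		return
	}
	t.lastStatus = normalized
	if strings.EqualFold(normalized, "FINISHED") || strings.EqualFold(normalized, "CONTENT_FILTER") {
		t.finished = true
	}
}

// shouldContinue is true only for an explicit continuation signal.
// Plain WIP is not enough because every normal stream starts in WIP.
func (t *statusTracker) shouldContinue() bool {
	if t.finished {
		return false
	}
	switch strings.ToUpper(t.lastStatus) {
	case "INCOMPLETE", "AUTO_CONTINUE":
		return true
	default:
		return false
	}
}

func main() {
	t := &statusTracker{}
	for _, s := range []string{"WIP", "INCOMPLETE", "FINISHED"} {
		t.setStatus(s)
		fmt.Printf("after %-10s shouldContinue=%v\n", s, t.shouldContinue())
	}
}
```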
--- a/internal/deepseek/client/client_continue_test.go
+++ b/internal/deepseek/client/client_continue_test.go
@@ -8,6 +8,7 @@ import (
 	"io"
 	"net/http"
 	"strings"
+	"sync/atomic"
 	"testing"

 	"ds2api/internal/auth"
@@ -124,6 +125,90 @@ func TestCallCompletionAutoContinueThreadsPowHeader(t *testing.T) {
 	}
 }

+func TestAutoContinueDoesNotTriggerOnPlainWIPWithoutExplicitContinuationSignal(t *testing.T) {
+	initialBody := strings.Join([]string{
+		`data: {"response_message_id":321,"v":{"response":{"message_id":321,"status":"WIP","auto_continue":false}}}`,
+		`data: [DONE]`,
+	}, "\n") + "\n"
+
+	var continueCalls atomic.Int32
+	body := newAutoContinueBody(context.Background(), io.NopCloser(strings.NewReader(initialBody)), "session-123", 8, func(context.Context, string, int) (*http.Response, error) {
+		continueCalls.Add(1)
+		return nil, errors.New("continue should not have been called")
+	})
+	defer func() { _ = body.Close() }()
+
+	out, err := io.ReadAll(body)
+	if err != nil {
+		t.Fatalf("read body failed: %v", err)
+	}
+	if continueCalls.Load() != 0 {
+		t.Fatalf("expected no continue calls, got %d", continueCalls.Load())
+	}
+	if !bytes.Contains(out, []byte(`"status":"WIP"`)) || !bytes.Contains(out, []byte(`data: [DONE]`)) {
+		t.Fatalf("expected original body to pass through unchanged, got=%s", string(out))
+	}
+}
+
+func TestAutoContinueTriggersOnResponseBatchQuasiStatusIncomplete(t *testing.T) {
+	initialBody := strings.Join([]string{
+		`data: {"response_message_id":321,"v":{"response":{"message_id":321,"status":"WIP","auto_continue":false}}}`,
+		`data: {"p":"response","o":"BATCH","v":[{"p":"accumulated_token_usage","v":2413},{"p":"quasi_status","v":"INCOMPLETE"}]}`,
+		`data: [DONE]`,
+	}, "\n") + "\n"
+
+	var continueCalls atomic.Int32
+	body := newAutoContinueBody(context.Background(), io.NopCloser(strings.NewReader(initialBody)), "session-123", 8, func(context.Context, string, int) (*http.Response, error) {
+		continueCalls.Add(1)
+		return &http.Response{
+			StatusCode: http.StatusOK,
+			Header:     make(http.Header),
+			Body: io.NopCloser(strings.NewReader(
+				`data: {"response_message_id":322,"p":"response/status","v":"FINISHED"}` + "\n" +
+					`data: [DONE]` + "\n",
+			)),
+		}, nil
+	})
+	defer func() { _ = body.Close() }()
+
+	out, err := io.ReadAll(body)
+	if err != nil {
+		t.Fatalf("read body failed: %v", err)
+	}
+	if continueCalls.Load() != 1 {
+		t.Fatalf("expected exactly one continue call, got %d", continueCalls.Load())
+	}
+	if !bytes.Contains(out, []byte(`"quasi_status","v":"INCOMPLETE"`)) || !bytes.Contains(out, []byte(`"v":"FINISHED"`)) {
+		t.Fatalf("expected continued output to include initial and final rounds, got=%s", string(out))
+	}
+}
+
+func TestAutoContinueDoesNotTriggerWhenResponseBatchQuasiStatusFinished(t *testing.T) {
+	initialBody := strings.Join([]string{
+		`data: {"response_message_id":321,"v":{"response":{"message_id":321,"status":"WIP","auto_continue":false}}}`,
+		`data: {"p":"response","o":"BATCH","v":[{"p":"accumulated_token_usage","v":2413},{"p":"quasi_status","v":"FINISHED"}]}`,
+		`data: [DONE]`,
+	}, "\n") + "\n"
+
+	var continueCalls atomic.Int32
+	body := newAutoContinueBody(context.Background(), io.NopCloser(strings.NewReader(initialBody)), "session-123", 8, func(context.Context, string, int) (*http.Response, error) {
+		continueCalls.Add(1)
+		return nil, errors.New("continue should not have been called")
+	})
+	defer func() { _ = body.Close() }()
+
+	out, err := io.ReadAll(body)
+	if err != nil {
+		t.Fatalf("read body failed: %v", err)
+	}
+	if continueCalls.Load() != 0 {
+		t.Fatalf("expected no continue calls, got %d", continueCalls.Load())
+	}
+	if !bytes.Contains(out, []byte(`"quasi_status","v":"FINISHED"`)) || !bytes.Contains(out, []byte(`data: [DONE]`)) {
+		t.Fatalf("expected original finished body to pass through unchanged, got=%s", string(out))
+	}
+}
+
 type failingOrCompletionDoer struct {
 	completionResp *http.Response
 }
--- a/internal/devcapture/store.go
+++ b/internal/devcapture/store.go
@@ -10,6 +10,8 @@ import (
 	"sync"
 	"time"

+	"ds2api/internal/util"
+
 	"github.com/google/uuid"
 )

@@ -194,7 +196,8 @@ func (c *captureBody) append(chunk string) {
 	}
 	remain := maxLen - current
 	if len(chunk) > remain {
-		c.buf.WriteString(chunk[:remain])
+		truncated, _ := util.TruncateUTF8Bytes(chunk, remain)
+		c.buf.WriteString(truncated)
 		c.truncated = true
 		return
 	}
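
Reviewer note on `internal/devcapture/store.go`: here the limit is a byte budget, so the cut has to move backwards to the nearest rune boundary instead of counting runes. The body of `util.TruncateUTF8Bytes` is not shown in this excerpt. The sketch below assumes it returns the truncated string and a flag indicating whether anything was cut.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// TruncateUTF8Bytes is a hypothetical stand-in for util.TruncateUTF8Bytes.
// It keeps at most maxBytes bytes of s without splitting a UTF-8 sequence.
func TruncateUTF8Bytes(s string, maxBytes int) (string, bool) {
	if maxBytes < 0 {
		maxBytes = 0
	}
	if len(s) <= maxBytes {
		return s, false
	}
	cut := maxBytes
	// Back up while the byte at the cut point is a continuation byte,
	// so the kept prefix always ends on a complete rune.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut], true
}

func main() {
	out, truncated := TruncateUTF8Bytes("😀xy", 5)
	fmt.Printf("%q truncated=%v valid=%v\n", out, truncated, utf8.ValidString(out))
}
```

With a 5-byte budget the 4-byte emoji and one ASCII byte fit, which is the case `TestWrapBodyTruncatesUTF8WithoutBreakingRune` exercises. The result may be shorter than the budget when the budget lands inside a multi-byte character.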
--- a/internal/devcapture/store_test.go
+++ b/internal/devcapture/store_test.go
@@ -4,6 +4,7 @@ import (
 	"io"
 	"strings"
 	"testing"
+	"unicode/utf8"
 )

 func TestNewFromEnvDefaults(t *testing.T) {
@@ -82,3 +83,28 @@ func TestWrapBodyTruncatesByLimit(t *testing.T) {
 		t.Fatalf("expected account id, got %q", items[0].AccountID)
 	}
 }
+
+func TestWrapBodyTruncatesUTF8WithoutBreakingRune(t *testing.T) {
+	s := &Store{enabled: true, limit: 5, maxBodyBytes: 5}
+	session := s.Start("test", "http://x", "acc1", map[string]any{"x": 1})
+	if session == nil {
+		t.Fatal("expected session")
+	}
+	rc := session.WrapBody(io.NopCloser(strings.NewReader("😀xy")), 200)
+	_, _ = io.ReadAll(rc)
+	_ = rc.Close()
+
+	items := s.Snapshot()
+	if len(items) != 1 {
+		t.Fatalf("expected 1 item, got %d", len(items))
+	}
+	if !utf8.ValidString(items[0].ResponseBody) {
+		t.Fatalf("expected valid utf-8 response body, got %q", items[0].ResponseBody)
+	}
+	if items[0].ResponseBody != "😀x" {
+		t.Fatalf("expected rune-safe truncation, got %q", items[0].ResponseBody)
+	}
+	if !items[0].ResponseTruncated {
+		t.Fatal("expected truncated flag true")
+	}
+}
--- a/internal/httpapi/admin/rawsamples/handler_raw_samples.go
+++ b/internal/httpapi/admin/rawsamples/handler_raw_samples.go
@@ -15,6 +15,7 @@ import (
 	"ds2api/internal/devcapture"
 	adminshared "ds2api/internal/httpapi/admin/shared"
 	"ds2api/internal/rawsample"
+	"ds2api/internal/util"
 )

 type captureChain struct {
@@ -479,10 +480,13 @@ func previewCaptureChainResponse(chain captureChain) string {

 func previewText(text string, limit int) string {
 	text = strings.TrimSpace(text)
-	if limit <= 0 || len(text) <= limit {
+	if limit <= 0 {
 		return text
 	}
-	return text[:limit] + "..."
+	if truncated, ok := util.TruncateRunes(text, limit); ok {
+		return truncated + "..."
+	}
+	return text
 }

 func captureChainHasTruncatedResponse(chain captureChain) bool {
--- a/internal/httpapi/admin/rawsamples/handler_raw_samples_test.go
+++ b/internal/httpapi/admin/rawsamples/handler_raw_samples_test.go
@@ -10,6 +10,7 @@ import (
 	"path/filepath"
 	"strings"
 	"testing"
+	"unicode/utf8"

 	"ds2api/internal/devcapture"
 )
@@ -231,6 +232,16 @@ func TestCombineCaptureBodiesPreservesOrderAndSeparators(t *testing.T) {
 	}
 }

+func TestPreviewTextPreservesUTF8MB4Characters(t *testing.T) {
+	preview := previewText(strings.Repeat("😀", 281), 280)
+	if !utf8.ValidString(preview) {
+		t.Fatalf("expected valid utf-8 preview, got %q", preview)
+	}
+	if preview != strings.Repeat("😀", 280)+"..." {
+		t.Fatalf("unexpected preview: %q", preview)
+	}
+}
+
 func TestQueryRawSampleCapturesGroupsBySessionAndMatchesQuestion(t *testing.T) {
 	devcapture.Global().Clear()
 	defer devcapture.Global().Clear()
--- a/internal/httpapi/openai/chat/chat_history_test.go
+++ b/internal/httpapi/openai/chat/chat_history_test.go
@@ -310,8 +310,8 @@ func TestChatCompletionsCurrentInputFilePersistsNeutralPrompt(t *testing.T) {
 	if len(ds.uploadCalls) != 1 {
 		t.Fatalf("expected current input upload to happen, got %d", len(ds.uploadCalls))
 	}
-	if ds.uploadCalls[0].Filename != "IGNORE.txt" {
-		t.Fatalf("expected IGNORE.txt upload, got %q", ds.uploadCalls[0].Filename)
+	if ds.uploadCalls[0].Filename != "history.txt" {
+		t.Fatalf("expected history.txt upload, got %q", ds.uploadCalls[0].Filename)
 	}
 	if full.HistoryText != string(ds.uploadCalls[0].Data) {
 		t.Fatalf("expected uploaded current input file to be persisted in history text")
--- a/internal/httpapi/openai/chat/chat_stream_runtime.go
+++ b/internal/httpapi/openai/chat/chat_stream_runtime.go
@@ -36,8 +36,10 @@ type chatStreamRuntime struct {
 	toolSieve             toolstream.State
 	streamToolCallIDs     map[int]string
 	streamToolNames       map[int]string
+	rawThinking           strings.Builder
 	thinking              strings.Builder
 	toolDetectionThinking strings.Builder
+	rawText               strings.Builder
 	text                  strings.Builder
 	responseMessageID     int

@@ -141,7 +143,7 @@ func (s *chatStreamRuntime) finalize(finishReason string, deferEmptyOutput bool)
 	finalText := cleanVisibleOutput(s.text.String(), s.stripReferenceMarkers)
 	s.finalThinking = finalThinking
 	s.finalText = finalText
-	detected := detectAssistantToolCalls(finalText, finalThinking, finalToolDetectionThinking, s.toolNames)
+	detected := detectAssistantToolCalls(s.rawText.String(), s.rawThinking.String(), finalToolDetectionThinking, s.toolNames)
 	if len(detected.Calls) > 0 && !s.toolCallsDoneEmitted {
 		finishReason = "tool_calls"
 		delta := map[string]any{
@@ -186,7 +188,7 @@ func (s *chatStreamRuntime) finalize(finishReason string, deferEmptyOutput bool)
 				continue
 			}
 			cleaned := cleanVisibleOutput(evt.Content, s.stripReferenceMarkers)
-			if cleaned == "" {
+			if cleaned == "" || (s.searchEnabled && sse.IsCitation(cleaned)) {
 				continue
 			}
 			delta := map[string]any{
@@ -263,21 +265,22 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
 		}
 	}
 	for _, p := range parsed.Parts {
-		cleanedText := cleanVisibleOutput(p.Text, s.stripReferenceMarkers)
-		if s.searchEnabled && sse.IsCitation(cleanedText) {
-			continue
-		}
-		if cleanedText == "" {
-			continue
-		}
-		contentSeen = true
 		delta := map[string]any{}
 		if !s.firstChunkSent {
 			delta["role"] = "assistant"
 			s.firstChunkSent = true
 		}
 		if p.Type == "thinking" {
+			rawTrimmed := sse.TrimContinuationOverlap(s.rawThinking.String(), p.Text)
+			if rawTrimmed != "" {
+				s.rawThinking.WriteString(rawTrimmed)
+				contentSeen = true
+			}
 			if s.thinkingEnabled {
+				cleanedText := cleanVisibleOutput(rawTrimmed, s.stripReferenceMarkers)
+				if cleanedText == "" {
+					continue
+				}
 				trimmed := sse.TrimContinuationOverlap(s.thinking.String(), cleanedText)
 				if trimmed == "" {
 					continue
@@ -286,15 +289,27 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
 				delta["reasoning_content"] = trimmed
 			}
 		} else {
-			trimmed := sse.TrimContinuationOverlap(s.text.String(), cleanedText)
-			if trimmed == "" {
+			rawTrimmed := sse.TrimContinuationOverlap(s.rawText.String(), p.Text)
+			if rawTrimmed == "" {
 				continue
 			}
-			s.text.WriteString(trimmed)
+			s.rawText.WriteString(rawTrimmed)
+			contentSeen = true
+			cleanedText := cleanVisibleOutput(rawTrimmed, s.stripReferenceMarkers)
+			if s.searchEnabled && sse.IsCitation(cleanedText) {
+				continue
+			}
+			trimmed := sse.TrimContinuationOverlap(s.text.String(), cleanedText)
+			if trimmed != "" {
+				s.text.WriteString(trimmed)
+			}
 			if !s.bufferToolContent {
+				if trimmed == "" {
+					continue
+				}
 				delta["content"] = trimmed
 			} else {
-				events := toolstream.ProcessChunk(&s.toolSieve, trimmed, s.toolNames)
+				events := toolstream.ProcessChunk(&s.toolSieve, rawTrimmed, s.toolNames)
 				for _, evt := range events {
 					if len(evt.ToolCallDeltas) > 0 {
 						if !s.emitEarlyToolDeltas {
@@ -335,7 +350,7 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
 					}
 					if evt.Content != "" {
 						cleaned := cleanVisibleOutput(evt.Content, s.stripReferenceMarkers)
-						if cleaned == "" {
+						if cleaned == "" || (s.searchEnabled && sse.IsCitation(cleaned)) {
 							continue
 						}
 						contentDelta := map[string]any{
--- a/internal/httpapi/openai/chat/empty_retry_runtime.go
+++ b/internal/httpapi/openai/chat/empty_retry_runtime.go
@@ -16,6 +16,8 @@ import (
 )

 type chatNonStreamResult struct {
+	rawThinking           string
+	rawText               string
 	thinking              string
 	toolDetectionThinking string
 	text                  string
@@ -31,6 +33,7 @@ func (h *Handler) handleNonStreamWithRetry(w http.ResponseWriter, ctx context.Co
 	currentResp := resp
 	usagePrompt := finalPrompt
 	accumulatedThinking := ""
+	accumulatedRawThinking := ""
 	accumulatedToolDetectionThinking := ""
 	for {
 		result, ok := h.collectChatNonStreamAttempt(w, currentResp, completionID, model, usagePrompt, thinkingEnabled, searchEnabled, toolNames, toolsRaw)
@@ -38,10 +41,12 @@ func (h *Handler) handleNonStreamWithRetry(w http.ResponseWriter, ctx context.Co
 			return
 		}
 		accumulatedThinking += sse.TrimContinuationOverlap(accumulatedThinking, result.thinking)
+		accumulatedRawThinking += sse.TrimContinuationOverlap(accumulatedRawThinking, result.rawThinking)
 		accumulatedToolDetectionThinking += sse.TrimContinuationOverlap(accumulatedToolDetectionThinking, result.toolDetectionThinking)
 		result.thinking = accumulatedThinking
+		result.rawThinking = accumulatedRawThinking
 		result.toolDetectionThinking = accumulatedToolDetectionThinking
-		detected := detectAssistantToolCalls(result.text, result.thinking, result.toolDetectionThinking, toolNames)
+		detected := detectAssistantToolCalls(result.rawText, result.rawThinking, result.toolDetectionThinking, toolNames)
 		result.detectedCalls = len(detected.Calls)
 		result.body = openaifmt.BuildChatCompletionWithToolCalls(completionID, model, usagePrompt, result.thinking, result.text, detected.Calls, toolsRaw)
 		result.finishReason = chatFinishReason(result.body)
@@ -82,16 +87,17 @@ func (h *Handler) collectChatNonStreamAttempt(w http.ResponseWriter, resp *http.
 	result := sse.CollectStream(resp, thinkingEnabled, true)
 	stripReferenceMarkers := h.compatStripReferenceMarkers()
 	finalThinking := cleanVisibleOutput(result.Thinking, stripReferenceMarkers)
-	finalToolDetectionThinking := cleanVisibleOutput(result.ToolDetectionThinking, stripReferenceMarkers)
 	finalText := cleanVisibleOutput(result.Text, stripReferenceMarkers)
 	if searchEnabled {
 		finalText = replaceCitationMarkersWithLinks(finalText, result.CitationLinks)
 	}
-	detected := detectAssistantToolCalls(finalText, finalThinking, finalToolDetectionThinking, toolNames)
+	detected := detectAssistantToolCalls(result.Text, result.Thinking, result.ToolDetectionThinking, toolNames)
 	respBody := openaifmt.BuildChatCompletionWithToolCalls(completionID, model, usagePrompt, finalThinking, finalText, detected.Calls, toolsRaw)
 	return chatNonStreamResult{
+		rawThinking:           result.Thinking,
+		rawText:               result.Text,
 		thinking:              finalThinking,
-		toolDetectionThinking: finalToolDetectionThinking,
+		toolDetectionThinking: result.ToolDetectionThinking,
 		text:                  finalText,
 		contentFilter:         result.ContentFilter,
 		detectedCalls:         len(detected.Calls),
--- a/internal/httpapi/openai/chat/handler_chat.go
+++ b/internal/httpapi/openai/chat/handler_chat.go
@@ -162,12 +162,11 @@ func (h *Handler) handleNonStream(w http.ResponseWriter, resp *http.Response, co

 	stripReferenceMarkers := h.compatStripReferenceMarkers()
 	finalThinking := cleanVisibleOutput(result.Thinking, stripReferenceMarkers)
-	finalToolDetectionThinking := cleanVisibleOutput(result.ToolDetectionThinking, stripReferenceMarkers)
 	finalText := cleanVisibleOutput(result.Text, stripReferenceMarkers)
 	if searchEnabled {
 		finalText = replaceCitationMarkersWithLinks(finalText, result.CitationLinks)
 	}
-	detected := detectAssistantToolCalls(finalText, finalThinking, finalToolDetectionThinking, toolNames)
+	detected := detectAssistantToolCalls(result.Text, result.Thinking, result.ToolDetectionThinking, toolNames)
 	if shouldWriteUpstreamEmptyOutputError(finalText) && len(detected.Calls) == 0 {
 		status, message, code := upstreamEmptyOutputDetail(result.ContentFilter, finalText, finalThinking)
 		if historySession != nil {
--- a/internal/httpapi/openai/chat/handler_toolcall_test.go
+++ b/internal/httpapi/openai/chat/handler_toolcall_test.go
@@ -291,20 +291,16 @@ func TestHandleStreamPromotesThinkingToolCallsOnFinalizeWithoutMidstreamIntercep
 	if !streamHasToolCallsDelta(frames) {
 		t.Fatalf("expected tool_calls delta from finalize fallback, body=%s", rec.Body.String())
 	}
-	reasoningSeen := false
 	for _, frame := range frames {
 		choices, _ := frame["choices"].([]any)
 		for _, item := range choices {
 			choice, _ := item.(map[string]any)
 			delta, _ := choice["delta"].(map[string]any)
 			if asString(delta["reasoning_content"]) != "" {
-				reasoningSeen = true
+				t.Fatalf("did not expect leaked reasoning_content markup, body=%s", rec.Body.String())
 			}
 		}
 	}
-	if !reasoningSeen {
-		t.Fatalf("expected reasoning_content to stream before finalize fallback, body=%s", rec.Body.String())
-	}
 	if streamFinishReason(frames) != "tool_calls" {
 		t.Fatalf("expected finish_reason=tool_calls, body=%s", rec.Body.String())
 	}
--- a/internal/httpapi/openai/history/current_input_file.go
+++ b/internal/httpapi/openai/history/current_input_file.go
@@ -13,7 +13,7 @@ import (
 )

 const (
-	currentInputFilename    = "IGNORE.txt"
+	currentInputFilename    = promptcompat.CurrentInputContextFilename
 	currentInputContentType = "text/plain; charset=utf-8"
 	currentInputPurpose     = "assistants"
 )
--- a/internal/httpapi/openai/history_split_test.go
+++ b/internal/httpapi/openai/history_split_test.go
@@ -79,7 +79,7 @@ func TestBuildOpenAICurrentInputContextTranscriptUsesInjectedFileWrapper(t *test
 	if !strings.Contains(transcript, "<|DSML|tool_calls>") {
 		t.Fatalf("expected tool calls preserved, got %q", transcript)
 	}
-	if !strings.HasSuffix(transcript, "\n[file name]: IGNORE\n[file content begin]\n") {
+	if !strings.HasSuffix(transcript, "\n[file name]: history.txt\n[file content begin]\n") {
 		t.Fatalf("expected injected file wrapper suffix, got %q", transcript)
 	}
 }
@@ -274,7 +274,7 @@ func TestApplyCurrentInputFileUploadsFirstTurnWithInjectedWrapper(t *testing.T)
 		t.Fatalf("expected 1 current input upload, got %d", len(ds.uploadCalls))
 	}
 	upload := ds.uploadCalls[0]
-	if upload.Filename != "IGNORE.txt" {
+	if upload.Filename != "history.txt" {
 		t.Fatalf("unexpected upload filename: %q", upload.Filename)
 	}
 	uploadedText := string(upload.Data)
@@ -287,13 +287,13 @@ func TestApplyCurrentInputFileUploadsFirstTurnWithInjectedWrapper(t *testing.T)
 	if !strings.Contains(uploadedText, promptcompat.ThinkingInjectionMarker) {
 		t.Fatalf("expected thinking injection in current input file, got %q", uploadedText)
 	}
-	if !strings.HasSuffix(uploadedText, "\n[file name]: IGNORE\n[file content begin]\n") {
+	if !strings.HasSuffix(uploadedText, "\n[file name]: history.txt\n[file content begin]\n") {
 		t.Fatalf("expected injected file wrapper suffix, got %q", uploadedText)
 	}
 	if strings.Contains(out.FinalPrompt, "first turn content that is long enough") {
 		t.Fatalf("expected current input text to be replaced in live prompt, got %s", out.FinalPrompt)
 	}
-	if strings.Contains(out.FinalPrompt, "CURRENT_USER_INPUT.txt") || strings.Contains(out.FinalPrompt, "IGNORE.txt") || strings.Contains(out.FinalPrompt, "Read that file") {
+	if strings.Contains(out.FinalPrompt, "CURRENT_USER_INPUT.txt") || strings.Contains(out.FinalPrompt, "history.txt") || strings.Contains(out.FinalPrompt, "Read that file") {
 		t.Fatalf("expected live prompt not to instruct file reads, got %s", out.FinalPrompt)
 	}
 	if !strings.Contains(out.FinalPrompt, "Answer the latest user request directly.") {
@@ -335,8 +335,8 @@ func TestApplyCurrentInputFileUploadsFullContextFile(t *testing.T) {
 		t.Fatalf("expected one current input upload, got %d", len(ds.uploadCalls))
 	}
 	upload := ds.uploadCalls[0]
-	if upload.Filename != "IGNORE.txt" {
-		t.Fatalf("expected IGNORE.txt upload, got %q", upload.Filename)
+	if upload.Filename != "history.txt" {
+		t.Fatalf("expected history.txt upload, got %q", upload.Filename)
 	}
 	uploadedText := string(upload.Data)
 	for _, want := range []string{"system instructions", "first user turn", "hidden reasoning", "tool result", "latest user turn", promptcompat.ThinkingInjectionMarker} {
@@ -344,7 +344,7 @@ func TestApplyCurrentInputFileUploadsFullContextFile(t *testing.T) {
 			t.Fatalf("expected full context file to contain %q, got %q", want, uploadedText)
 		}
 	}
-	if strings.Contains(out.FinalPrompt, "first user turn") || strings.Contains(out.FinalPrompt, "latest user turn") || strings.Contains(out.FinalPrompt, "CURRENT_USER_INPUT.txt") || strings.Contains(out.FinalPrompt, "IGNORE.txt") || strings.Contains(out.FinalPrompt, "Read that file") {
+	if strings.Contains(out.FinalPrompt, "first user turn") || strings.Contains(out.FinalPrompt, "latest user turn") || strings.Contains(out.FinalPrompt, "CURRENT_USER_INPUT.txt") || strings.Contains(out.FinalPrompt, "history.txt") || strings.Contains(out.FinalPrompt, "Read that file") {
 		t.Fatalf("expected live prompt to use only a neutral continuation instruction, got %s", out.FinalPrompt)
 	}
 	if !strings.Contains(out.FinalPrompt, "Answer the latest user request directly.") {
@@ -411,15 +411,15 @@ func TestChatCompletionsCurrentInputFileUploadsContextAndKeepsNeutralPrompt(t *t
 		t.Fatalf("expected 1 upload call, got %d", len(ds.uploadCalls))
 	}
 	upload := ds.uploadCalls[0]
-	if upload.Filename != "IGNORE.txt" {
+	if upload.Filename != "history.txt" {
 		t.Fatalf("unexpected upload filename: %q", upload.Filename)
 	}
 	if upload.Purpose != "assistants" {
 		t.Fatalf("unexpected purpose: %q", upload.Purpose)
 	}
 	historyText := string(upload.Data)
-	if !strings.Contains(historyText, "[file content end]") || !strings.Contains(historyText, "[file name]: IGNORE") {
-		t.Fatalf("expected injected IGNORE wrapper, got %s", historyText)
+	if !strings.Contains(historyText, "[file content end]") || !strings.Contains(historyText, "[file name]: history.txt") {
+		t.Fatalf("expected injected history.txt wrapper, got %s", historyText)
 	}
 	if !strings.Contains(historyText, "latest user turn") {
 		t.Fatalf("expected full context to include latest turn, got %s", historyText)
--- a/internal/httpapi/openai/leaked_output_sanitize_test.go
+++ b/internal/httpapi/openai/leaked_output_sanitize_test.go
@@ -42,6 +42,14 @@ func TestSanitizeLeakedOutputRemovesDanglingThinkBlock(t *testing.T) {
 	}
 }

+func TestSanitizeLeakedOutputRemovesCompleteDSMLToolCallWrapper(t *testing.T) {
+	raw := "前置文本\n<｜DSML｜tool_calls>\n<｜DSML｜invoke name=\"Bash\">\n<｜DSML｜parameter name=\"command\"></｜DSML｜parameter>\n</｜DSML｜invoke>\n</｜DSML｜tool_calls>\n后置文本"
+	got := sanitizeLeakedOutput(raw)
+	if got != "前置文本\n\n后置文本" {
+		t.Fatalf("unexpected sanitize result for leaked dsml wrapper: %q", got)
+	}
+}
+
 func TestSanitizeLeakedOutputRemovesAgentXMLLeaks(t *testing.T) {
 	raw := "Done.<attempt_completion><result>Some final answer</result></attempt_completion>"
 	got := sanitizeLeakedOutput(raw)
--- a/internal/httpapi/openai/responses/empty_retry_runtime.go
+++ b/internal/httpapi/openai/responses/empty_retry_runtime.go
@@ -18,6 +18,8 @@ import (
 )

 type responsesNonStreamResult struct {
+	rawThinking           string
+	rawText               string
 	thinking              string
 	toolDetectionThinking string
 	text                  string
@@ -32,6 +34,7 @@ func (h *Handler) handleResponsesNonStreamWithRetry(w http.ResponseWriter, ctx c
 	currentResp := resp
 	usagePrompt := finalPrompt
 	accumulatedThinking := ""
+	accumulatedRawThinking := ""
 	accumulatedToolDetectionThinking := ""
 	for {
 		result, ok := h.collectResponsesNonStreamAttempt(w, currentResp, responseID, model, usagePrompt, thinkingEnabled, searchEnabled, toolNames, toolsRaw)
@@ -39,10 +42,12 @@ func (h *Handler) handleResponsesNonStreamWithRetry(w http.ResponseWriter, ctx c
 			return
 		}
 		accumulatedThinking += sse.TrimContinuationOverlap(accumulatedThinking, result.thinking)
+		accumulatedRawThinking += sse.TrimContinuationOverlap(accumulatedRawThinking, result.rawThinking)
 		accumulatedToolDetectionThinking += sse.TrimContinuationOverlap(accumulatedToolDetectionThinking, result.toolDetectionThinking)
 		result.thinking = accumulatedThinking
+		result.rawThinking = accumulatedRawThinking
 		result.toolDetectionThinking = accumulatedToolDetectionThinking
-		result.parsed = detectAssistantToolCalls(result.text, result.thinking, result.toolDetectionThinking, toolNames)
+		result.parsed = detectAssistantToolCalls(result.rawText, result.rawThinking, result.toolDetectionThinking, toolNames)
 		result.body = openaifmt.BuildResponseObjectWithToolCalls(responseID, model, usagePrompt, result.thinking, result.text, result.parsed.Calls, toolsRaw)

 		if !shouldRetryResponsesNonStream(result, attempts) {
@@ -78,16 +83,17 @@ func (h *Handler) collectResponsesNonStreamAttempt(w http.ResponseWriter, resp *
 	result := sse.CollectStream(resp, thinkingEnabled, false)
 	stripReferenceMarkers := h.compatStripReferenceMarkers()
 	sanitizedThinking := cleanVisibleOutput(result.Thinking, stripReferenceMarkers)
-	toolDetectionThinking := cleanVisibleOutput(result.ToolDetectionThinking, stripReferenceMarkers)
 	sanitizedText := cleanVisibleOutput(result.Text, stripReferenceMarkers)
 	if searchEnabled {
 		sanitizedText = replaceCitationMarkersWithLinks(sanitizedText, result.CitationLinks)
 	}
-	textParsed := detectAssistantToolCalls(sanitizedText, sanitizedThinking, toolDetectionThinking, toolNames)
+	textParsed := detectAssistantToolCalls(result.Text, result.Thinking, result.ToolDetectionThinking, toolNames)
 	responseObj := openaifmt.BuildResponseObjectWithToolCalls(responseID, model, usagePrompt, sanitizedThinking, sanitizedText, textParsed.Calls, toolsRaw)
 	return responsesNonStreamResult{
+		rawThinking:           result.Thinking,
+		rawText:               result.Text,
 		thinking:              sanitizedThinking,
-		toolDetectionThinking: toolDetectionThinking,
+		toolDetectionThinking: result.ToolDetectionThinking,
 		text:                  sanitizedText,
 		contentFilter:         result.ContentFilter,
 		parsed:                textParsed,
--- a/internal/httpapi/openai/responses/responses_handler.go
+++ b/internal/httpapi/openai/responses/responses_handler.go
@@ -131,12 +131,11 @@ func (h *Handler) handleResponsesNonStream(w http.ResponseWriter, resp *http.Res
 	result := sse.CollectStream(resp, thinkingEnabled, true)
 	stripReferenceMarkers := h.compatStripReferenceMarkers()
 	sanitizedThinking := cleanVisibleOutput(result.Thinking, stripReferenceMarkers)
-	toolDetectionThinking := cleanVisibleOutput(result.ToolDetectionThinking, stripReferenceMarkers)
 	sanitizedText := cleanVisibleOutput(result.Text, stripReferenceMarkers)
 	if searchEnabled {
 		sanitizedText = replaceCitationMarkersWithLinks(sanitizedText, result.CitationLinks)
 	}
-	textParsed := detectAssistantToolCalls(sanitizedText, sanitizedThinking, toolDetectionThinking, toolNames)
+	textParsed := detectAssistantToolCalls(result.Text, result.Thinking, result.ToolDetectionThinking, toolNames)
 	if len(textParsed.Calls) == 0 && writeUpstreamEmptyOutputError(w, sanitizedText, sanitizedThinking, result.ContentFilter) {
 		return
 	}
--- a/internal/httpapi/openai/responses/responses_stream_runtime_core.go
+++ b/internal/httpapi/openai/responses/responses_stream_runtime_core.go
@@ -36,8 +36,10 @@ type responsesStreamRuntime struct {
 	toolCallsDoneEmitted bool

 	sieve                 toolstream.State
+	rawThinking           strings.Builder
 	thinking              strings.Builder
 	toolDetectionThinking strings.Builder
+	rawText               strings.Builder
 	text                  strings.Builder
 	visibleText           strings.Builder
 	responseMessageID     int
@@ -141,15 +143,14 @@ func (s *responsesStreamRuntime) finalize(finishReason string, deferEmptyOutput
 	s.finalErrorStatus = 0
 	s.finalErrorMessage = ""
 	s.finalErrorCode = ""
-	finalThinking := s.thinking.String()
-	finalToolDetectionThinking := s.toolDetectionThinking.String()
-	finalText := cleanVisibleOutput(s.text.String(), s.stripReferenceMarkers)
-
 	if s.bufferToolContent {
 		s.processToolStreamEvents(toolstream.Flush(&s.sieve, s.toolNames), true, true)
 	}

-	textParsed := detectAssistantToolCalls(finalText, finalThinking, finalToolDetectionThinking, s.toolNames)
+	finalThinking := s.thinking.String()
+	finalToolDetectionThinking := s.toolDetectionThinking.String()
+	finalText := cleanVisibleOutput(s.text.String(), s.stripReferenceMarkers)
+	textParsed := detectAssistantToolCalls(s.rawText.String(), s.rawThinking.String(), finalToolDetectionThinking, s.toolNames)
 	detected := textParsed.Calls
 	s.logToolPolicyRejections(textParsed)

@@ -227,18 +228,19 @@ func (s *responsesStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Pa
 		}
 	}
 	for _, p := range parsed.Parts {
-		cleanedText := cleanVisibleOutput(p.Text, s.stripReferenceMarkers)
-		if cleanedText == "" {
-			continue
-		}
-		if p.Type != "thinking" && s.searchEnabled && sse.IsCitation(cleanedText) {
-			continue
-		}
-		contentSeen = true
 		if p.Type == "thinking" {
+			rawTrimmed := sse.TrimContinuationOverlap(s.rawThinking.String(), p.Text)
+			if rawTrimmed != "" {
+				s.rawThinking.WriteString(rawTrimmed)
+				contentSeen = true
+			}
 			if !s.thinkingEnabled {
 				continue
 			}
+			cleanedText := cleanVisibleOutput(rawTrimmed, s.stripReferenceMarkers)
+			if cleanedText == "" {
+				continue
+			}
 			trimmed := sse.TrimContinuationOverlap(s.thinking.String(), cleanedText)
 			if trimmed == "" {
 				continue
@@ -248,16 +250,28 @@ func (s *responsesStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Pa
 			continue
 		}

-		trimmed := sse.TrimContinuationOverlap(s.text.String(), cleanedText)
-		if trimmed == "" {
+		rawTrimmed := sse.TrimContinuationOverlap(s.rawText.String(), p.Text)
+		if rawTrimmed == "" {
 			continue
 		}
-		s.text.WriteString(trimmed)
+		s.rawText.WriteString(rawTrimmed)
+		contentSeen = true
+		cleanedText := cleanVisibleOutput(rawTrimmed, s.stripReferenceMarkers)
+		if s.searchEnabled && sse.IsCitation(cleanedText) {
+			continue
+		}
+		trimmed := sse.TrimContinuationOverlap(s.text.String(), cleanedText)
+		if trimmed != "" {
+			s.text.WriteString(trimmed)
+		}
 		if !s.bufferToolContent {
+			if trimmed == "" {
+				continue
+			}
 			s.emitTextDelta(trimmed)
 			continue
 		}
-		s.processToolStreamEvents(toolstream.ProcessChunk(&s.sieve, trimmed, s.toolNames), true, true)
+		s.processToolStreamEvents(toolstream.ProcessChunk(&s.sieve, rawTrimmed, s.toolNames), true, true)
 	}

 	return streamengine.ParsedDecision{ContentSeen: contentSeen}
--- a/internal/httpapi/openai/responses/responses_stream_runtime_events.go
+++ b/internal/httpapi/openai/responses/responses_stream_runtime_events.go
@@ -4,6 +4,7 @@ import (
 	"encoding/json"

 	openaifmt "ds2api/internal/format/openai"
+	"ds2api/internal/sse"
 	"ds2api/internal/toolstream"
 )

@@ -43,7 +44,10 @@ func (s *responsesStreamRuntime) sendDone() {
 func (s *responsesStreamRuntime) processToolStreamEvents(events []toolstream.Event, emitContent bool, resetAfterToolCalls bool) {
 	for _, evt := range events {
 		if emitContent && evt.Content != "" {
-			s.emitTextDelta(evt.Content)
+			cleaned := cleanVisibleOutput(evt.Content, s.stripReferenceMarkers)
+			if cleaned != "" && !(s.searchEnabled && sse.IsCitation(cleaned)) {
+				s.emitTextDelta(cleaned)
+			}
 		}
 		if len(evt.ToolCallDeltas) > 0 {
 			if !s.emitEarlyToolDeltas {
--- a/internal/httpapi/openai/responses/responses_stream_test.go
+++ b/internal/httpapi/openai/responses/responses_stream_test.go
@@ -254,8 +254,8 @@ func TestHandleResponsesStreamPromotesThinkingToolCallsOnFinalizeWithoutMidstrea
 	h.handleResponsesStream(rec, req, resp, "owner-a", "resp_test", "deepseek-v4-pro", "prompt", true, false, []string{"read_file"}, nil, promptcompat.DefaultToolChoicePolicy(), "")

 	body := rec.Body.String()
-	if !strings.Contains(body, "event: response.reasoning.delta") {
-		t.Fatalf("expected reasoning delta in stream body, got %s", body)
+	if strings.Contains(body, "event: response.reasoning.delta") {
+		t.Fatalf("did not expect leaked reasoning delta in stream body, got %s", body)
 	}
 	if !strings.Contains(body, "event: response.function_call_arguments.done") {
 		t.Fatalf("expected finalize fallback function call event, got %s", body)
--- a/internal/httpapi/openai/shared/leaked_output_sanitize.go
+++ b/internal/httpapi/openai/shared/leaked_output_sanitize.go
@@ -3,6 +3,8 @@ package shared
 import (
 	"regexp"
 	"strings"
+
+	"ds2api/internal/toolcall"
 )

 var emptyJSONFencePattern = regexp.MustCompile("(?is)```json\\s*```")
@@ -47,10 +49,42 @@ func sanitizeLeakedOutput(text string) string {
 	out = leakedThinkTagPattern.ReplaceAllString(out, "")
 	out = leakedBOSMarkerPattern.ReplaceAllString(out, "")
 	out = leakedMetaMarkerPattern.ReplaceAllString(out, "")
+	out = stripLeakedToolCallWrapperBlocks(out)
 	out = sanitizeLeakedAgentXMLBlocks(out)
 	return out
 }

+func stripLeakedToolCallWrapperBlocks(text string) string {
+	if text == "" {
+		return text
+	}
+	var b strings.Builder
+	pos := 0
+	for pos < len(text) {
+		tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(text, pos)
+		if !ok {
+			b.WriteString(text[pos:])
+			break
+		}
+		if tag.Start > pos {
+			b.WriteString(text[pos:tag.Start])
+		}
+		if tag.Closing || tag.Name != "tool_calls" {
+			b.WriteString(text[tag.Start : tag.End+1])
+			pos = tag.End + 1
+			continue
+		}
+		closeTag, ok := toolcall.FindMatchingToolMarkupClose(text, tag)
+		if !ok {
+			b.WriteString(text[tag.Start : tag.End+1])
+			pos = tag.End + 1
+			continue
+		}
+		pos = closeTag.End + 1
+	}
+	return b.String()
+}
+
 func stripDanglingThinkSuffix(text string) string {
 	matches := leakedThinkTagPattern.FindAllStringIndex(text, -1)
 	if len(matches) == 0 {
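Review note on `stripLeakedToolCallWrapperBlocks`: the loop drops a complete `tool_calls` wrapper and keeps an opening tag that never closes. The sketch below reproduces only that control flow. It is an assumption-laden stand-in: it matches the literal tags `<tool_calls>` and `</tool_calls>` with `strings.Index` instead of the real `toolcall.FindToolMarkupTagOutsideIgnored` and `toolcall.FindMatchingToolMarkupClose` helpers, so it does not handle DSML variants, ignored regions, or nesting. The name `stripWrapperBlocks` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stripWrapperBlocks is a hypothetical, simplified stand-in for the function
// added in this diff. It only recognises the literal canonical tags.
func stripWrapperBlocks(text string) string {
	const openTag, closeTag = "<tool_calls>", "</tool_calls>"
	var b strings.Builder
	pos := 0
	for pos < len(text) {
		rel := strings.Index(text[pos:], openTag)
		if rel < 0 {
			// No further wrapper: keep the remainder unchanged.
			b.WriteString(text[pos:])
			break
		}
		start := pos + rel
		b.WriteString(text[pos:start])
		bodyStart := start + len(openTag)
		end := strings.Index(text[bodyStart:], closeTag)
		if end < 0 {
			// Unclosed wrapper: keep the opening tag and continue scanning.
			b.WriteString(openTag)
			pos = bodyStart
			continue
		}
		// Complete wrapper: skip the whole block, including the closing tag.
		pos = bodyStart + end + len(closeTag)
	}
	return b.String()
}

func main() {
	complete := "before\n<tool_calls><invoke name=\"Bash\"></invoke></tool_calls>\nafter"
	fmt.Printf("%q\n", stripWrapperBlocks(complete)) // "before\n\nafter"
	fmt.Printf("%q\n", stripWrapperBlocks("dangling <tool_calls> text"))
}
```

The first call mirrors the shape asserted by `TestSanitizeLeakedOutputRemovesCompleteDSMLToolCallWrapper`: the text on both sides of the wrapper survives, and so do the newlines around it.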
--- a/internal/js/chat-stream/vercel_stream_impl.js
+++ b/internal/js/chat-stream/vercel_stream_impl.js
@@ -513,6 +513,11 @@ function observeContinueState(state, chunk) {
  if (chunk.p === 'response/status') {
    setContinueStatus(state, asString(chunk.v));
  }
+  if (chunk.p === 'response') {
+    observeContinueBatchPatches(state, 'response', chunk.v);
+  } else {
+    observeContinueBatchPatches(state, '', chunk.v);
+  }
  const response = chunk.v && typeof chunk.v === 'object' ? chunk.v.response : null;
  if (response && typeof response === 'object') {
    const id = numberValue(response.message_id);
@@ -534,13 +539,43 @@ function observeContinueState(state, chunk) {
  }
 }

+function observeContinueBatchPatches(state, parentPath, raw) {
+  if (!state || !Array.isArray(raw)) {
+    return;
+  }
+  for (const patch of raw) {
+    if (!patch || typeof patch !== 'object') {
+      continue;
+    }
+    const path = asString(patch.p).trim();
+    if (!path) {
+      continue;
+    }
+    let fullPath = path;
+    const parent = asString(parentPath).trim().replace(/^\/+|\/+$/g, '');
+    if (parent && !path.includes('/')) {
+      fullPath = `${parent}/${path}`;
+    }
+    switch (fullPath.replace(/^\/+|\/+$/g, '')) {
+      case 'response/status':
+      case 'status':
+      case 'response/quasi_status':
+      case 'quasi_status':
+        setContinueStatus(state, asString(patch.v));
+        break;
+      default:
+        break;
+    }
+  }
+}
+
 function setContinueStatus(state, status) {
  const normalized = asString(status).trim();
  if (!normalized) {
    return;
  }
  state.lastStatus = normalized;
-  if (normalized.toUpperCase() === 'FINISHED') {
+  if (['FINISHED', 'CONTENT_FILTER'].includes(normalized.toUpperCase())) {
    state.finished = true;
  }
 }
@@ -549,7 +584,7 @@ function shouldAutoContinue(state) {
  if (!state || state.finished || !state.sessionID || state.responseMessageID <= 0) {
    return false;
  }
-  return ['WIP', 'INCOMPLETE', 'AUTO_CONTINUE'].includes(asString(state.lastStatus).trim().toUpperCase());
+  return ['INCOMPLETE', 'AUTO_CONTINUE'].includes(asString(state.lastStatus).trim().toUpperCase());
 }

 function numberValue(v) {
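Review note on the continue-state change in `vercel_stream_impl.js`: the batch-patch handler joins a parent path onto slash-free child paths, trims surrounding slashes, and then treats `CONTENT_FILTER` as terminal while no longer auto-continuing on `WIP`. The Go sketch below restates just those rules for comparison. All function names are hypothetical, and the real `shouldAutoContinue` also checks `finished`, `sessionID`, and `responseMessageID`, which are omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// fullPatchPath mirrors the path normalisation in observeContinueBatchPatches:
// a child path without any slash is prefixed with the parent path.
func fullPatchPath(parent, path string) string {
	path = strings.TrimSpace(path)
	parent = strings.Trim(strings.TrimSpace(parent), "/")
	if parent != "" && !strings.Contains(path, "/") {
		path = parent + "/" + path
	}
	return strings.Trim(path, "/")
}

// isStatusPath lists the normalised paths that update the continue status.
func isStatusPath(p string) bool {
	switch p {
	case "response/status", "status", "response/quasi_status", "quasi_status":
		return true
	}
	return false
}

// isTerminal reports whether a status marks the stream as finished.
func isTerminal(status string) bool {
	switch strings.ToUpper(strings.TrimSpace(status)) {
	case "FINISHED", "CONTENT_FILTER":
		return true
	}
	return false
}

// wantsContinue covers only the status part of shouldAutoContinue.
func wantsContinue(status string) bool {
	switch strings.ToUpper(strings.TrimSpace(status)) {
	case "INCOMPLETE", "AUTO_CONTINUE":
		return true
	}
	return false
}

func main() {
	p := fullPatchPath("response", "status")
	fmt.Println(p, isStatusPath(p))            // response/status true
	fmt.Println(isTerminal("content_filter")) // true
	fmt.Println(wantsContinue("WIP"))         // false
}
```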
--- a/internal/promptcompat/history_transcript.go
+++ b/internal/promptcompat/history_transcript.go
@@ -7,7 +7,7 @@ import (
 	"ds2api/internal/prompt"
 )

-const historySplitInjectedFilename = "IGNORE"
+const CurrentInputContextFilename = "history.txt"

 func BuildOpenAIHistoryTranscript(messages []any) string {
 	return buildOpenAIInjectedFileTranscript(messages)
@@ -32,5 +32,5 @@ func buildOpenAIInjectedFileTranscript(messages []any) string {
 	if transcript == "" {
 		return ""
 	}
-	return fmt.Sprintf("[file content end]\n\n%s\n\n[file name]: %s\n[file content begin]\n", transcript, historySplitInjectedFilename)
+	return fmt.Sprintf("[file content end]\n\n%s\n\n[file name]: %s\n[file content begin]\n", transcript, CurrentInputContextFilename)
 }
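Review note on the rename to `CurrentInputContextFilename`: the updated tests assert the suffix `"\n[file name]: history.txt\n[file content begin]\n"`. The sketch below reuses the format string from this hunk to show the resulting wrapper. The names `contextFilename` and `wrapTranscript` are hypothetical stand-ins, and the transcript text is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// contextFilename mirrors the value of CurrentInputContextFilename in the diff.
const contextFilename = "history.txt"

// wrapTranscript applies the same format string as
// buildOpenAIInjectedFileTranscript; an empty transcript yields "".
func wrapTranscript(transcript string) string {
	if transcript == "" {
		return ""
	}
	return fmt.Sprintf("[file content end]\n\n%s\n\n[file name]: %s\n[file content begin]\n",
		transcript, contextFilename)
}

func main() {
	out := wrapTranscript("user: hello")
	fmt.Printf("%q\n", out)
	// The suffix checked by the updated tests.
	fmt.Println(strings.HasSuffix(out, "\n[file name]: history.txt\n[file content begin]\n")) // true
}
```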
--- a/internal/promptcompat/standard_request_test.go
+++ b/internal/promptcompat/standard_request_test.go
@@ -13,7 +13,7 @@ func TestStandardRequestCompletionPayloadSetsModelTypeFromResolvedModel(t *testi
 		{name: "default", model: "deepseek-v4-flash", thinking: false, search: false, modelType: "default"},
 		{name: "default_nothinking", model: "deepseek-v4-flash-nothinking", thinking: false, search: false, modelType: "default"},
 		{name: "expert", model: "deepseek-v4-pro", thinking: true, search: false, modelType: "expert"},
-		{name: "vision", model: "deepseek-v4-vision-search", thinking: false, search: true, modelType: "vision"},
+		{name: "vision", model: "deepseek-v4-vision", thinking: true, search: false, modelType: "vision"},
 	}

 	for _, tc := range tests {
--- a/internal/toolcall/toolcalls_parse.go
+++ b/internal/toolcall/toolcalls_parse.go
@@ -92,11 +92,45 @@ func filterToolCallsDetailed(parsed []ParsedToolCall) ([]ParsedToolCall, []strin
 		if tc.Input == nil {
 			tc.Input = map[string]any{}
 		}
+		if len(tc.Input) > 0 && !toolCallInputHasMeaningfulValue(tc.Input) {
+			continue
+		}
 		out = append(out, tc)
 	}
 	return out, nil
 }

+func toolCallInputHasMeaningfulValue(v any) bool {
+	switch x := v.(type) {
+	case nil:
+		return false
+	case string:
+		return strings.TrimSpace(x) != ""
+	case map[string]any:
+		if len(x) == 0 {
+			return false
+		}
+		for _, child := range x {
+			if toolCallInputHasMeaningfulValue(child) {
+				return true
+			}
+		}
+		return false
+	case []any:
+		if len(x) == 0 {
+			return false
+		}
+		for _, child := range x {
+			if toolCallInputHasMeaningfulValue(child) {
+				return true
+			}
+		}
+		return false
+	default:
+		return true
+	}
+}
+
 func looksLikeToolCallSyntax(text string) bool {
 	hasDSML, hasCanonical := ContainsToolCallWrapperSyntaxOutsideIgnored(text)
 	return hasDSML || hasCanonical
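Review note on `toolCallInputHasMeaningfulValue`: the filter skips a call only when its input is non-empty and every value in it is blank, so an explicit zero-argument call is still kept. The sketch below copies the recursive check under the hypothetical name `hasMeaningfulValue` and adds a hypothetical `keepCall` helper that restates the `len(tc.Input) > 0 && !...` guard from the hunk.

```go
package main

import (
	"fmt"
	"strings"
)

// hasMeaningfulValue follows the recursion in the diff: blank strings, nil,
// and empty containers are not meaningful; any other scalar is.
func hasMeaningfulValue(v any) bool {
	switch x := v.(type) {
	case nil:
		return false
	case string:
		return strings.TrimSpace(x) != ""
	case map[string]any:
		for _, child := range x {
			if hasMeaningfulValue(child) {
				return true
			}
		}
		return false
	case []any:
		for _, child := range x {
			if hasMeaningfulValue(child) {
				return true
			}
		}
		return false
	default:
		// Numbers and booleans count as meaningful, including 0 and false.
		return true
	}
}

// keepCall restates the guard in filterToolCallsDetailed (hypothetical helper).
func keepCall(input map[string]any) bool {
	return len(input) == 0 || hasMeaningfulValue(input)
}

func main() {
	fmt.Println(keepCall(map[string]any{}))                                    // true
	fmt.Println(keepCall(map[string]any{"command": "", "description": "   "})) // false
	fmt.Println(keepCall(map[string]any{"command": "pwd"}))                    // true
	fmt.Println(keepCall(map[string]any{"timeout": 0}))                        // true
}
```

These cases line up with the two tests added in `toolcalls_test.go`: an all-empty parameter payload is rejected, while `<invoke name="noop"></invoke>` remains a valid call.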
--- a/internal/toolcall/toolcalls_test.go
+++ b/internal/toolcall/toolcalls_test.go
@@ -323,6 +323,28 @@ func TestParseToolCallsDetailedMarksToolCallsSyntax(t *testing.T) {
 	}
 }

+func TestParseToolCallsRejectsAllEmptyParameterPayload(t *testing.T) {
+	text := `<tool_calls><invoke name="Bash"><parameter name="command"></parameter><parameter name="description">   </parameter><parameter name="timeout"></parameter></invoke></tool_calls>`
+	res := ParseToolCallsDetailed(text, []string{"Bash"})
+	if !res.SawToolCallSyntax {
+		t.Fatalf("expected tool syntax to be detected, got %#v", res)
+	}
+	if len(res.Calls) != 0 {
+		t.Fatalf("expected all-empty payload to be rejected, got %#v", res.Calls)
+	}
+}
+
+func TestParseToolCallsPreservesExplicitZeroArgToolCall(t *testing.T) {
+	text := `<tool_calls><invoke name="noop"></invoke></tool_calls>`
+	res := ParseToolCallsDetailed(text, []string{"noop"})
+	if len(res.Calls) != 1 {
+		t.Fatalf("expected zero-arg tool call to remain valid, got %#v", res.Calls)
+	}
+	if len(res.Calls[0].Input) != 0 {
+		t.Fatalf("expected empty input map for zero-arg tool call, got %#v", res.Calls[0].Input)
+	}
+}
+
 func TestParseToolCallsSupportsInlineJSONToolObject(t *testing.T) {
 	text := `<tool_calls><invoke name="Bash">{"input":{"command":"pwd","description":"show cwd"}}</invoke></tool_calls>`
 	calls := ParseToolCalls(text, []string{"bash"})
--- a/internal/toolstream/tool_sieve_xml.go
+++ b/internal/toolstream/tool_sieve_xml.go
@@ -44,14 +44,21 @@ func consumeXMLToolCapture(captured string, toolNames []string) (prefix string,
 		xmlBlock := captured[tag.Start : closeTag.End+1]
 		prefixPart := captured[:tag.Start]
 		suffixPart := captured[closeTag.End+1:]
-		parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
-		if len(parsed) > 0 {
+		parsed := toolcall.ParseStandaloneToolCallsDetailed(xmlBlock, toolNames)
+		if len(parsed.Calls) > 0 {
 			prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
 			if best == nil || tag.Start < best.start {
-				best = &candidate{start: tag.Start, prefix: prefixPart, calls: parsed, suffix: suffixPart}
+				best = &candidate{start: tag.Start, prefix: prefixPart, calls: parsed.Calls, suffix: suffixPart}
 			}
 			break
 		}
+		if parsed.SawToolCallSyntax {
+			if rejected == nil || tag.Start < rejected.start {
+				rejected = &rejectedBlock{start: tag.Start, prefix: prefixPart, suffix: suffixPart}
+			}
+			searchFrom = tag.End + 1
+			continue
+		}
 		if rejected == nil || tag.Start < rejected.start {
 			rejected = &rejectedBlock{start: tag.Start, prefix: prefixPart + xmlBlock, suffix: suffixPart}
 		}
@@ -75,10 +82,13 @@ func consumeXMLToolCapture(captured string, toolNames []string) (prefix string,
 				xmlBlock := "<tool_calls>" + captured[invokeTag.Start:closeTag.End+1]
 				prefixPart := captured[:invokeTag.Start]
 				suffixPart := captured[closeTag.End+1:]
-				parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
-				if len(parsed) > 0 {
+				parsed := toolcall.ParseStandaloneToolCallsDetailed(xmlBlock, toolNames)
+				if len(parsed.Calls) > 0 {
 					prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
-					return prefixPart, parsed, suffixPart, true
+					return prefixPart, parsed.Calls, suffixPart, true
+				}
+				if parsed.SawToolCallSyntax {
+					return prefixPart, nil, suffixPart, true
 				}
 				return prefixPart + captured[invokeTag.Start:closeTag.End+1], nil, suffixPart, true
 			}
--- a/internal/toolstream/tool_sieve_xml_test.go
+++ b/internal/toolstream/tool_sieve_xml_test.go
@@ -288,7 +288,7 @@ func TestProcessToolSieveNonToolXMLKeepsSuffixForToolParsing(t *testing.T) {
 	}
 }

-func TestProcessToolSievePassesThroughMalformedExecutableXMLBlock(t *testing.T) {
+func TestProcessToolSieveSuppressesMalformedExecutableXMLBlock(t *testing.T) {
 	var state State
 	chunk := `<tool_calls><invoke name="read_file"><param>{"path":"README.md"}</param></invoke></tool_calls>`
 	events := ProcessChunk(&state, chunk, []string{"read_file"})
@@ -302,10 +302,39 @@ func TestProcessToolSievePassesThroughMalformedExecutableXMLBlock(t *testing.T)
 	}

 	if toolCalls != 0 {
-		t.Fatalf("expected malformed executable-looking XML to stay text, got %d events=%#v", toolCalls, events)
+		t.Fatalf("expected malformed executable-looking XML not to become a tool call, got %d events=%#v", toolCalls, events)
 	}
-	if textContent.String() != chunk {
-		t.Fatalf("expected malformed executable-looking XML to pass through unchanged, got %q", textContent.String())
+	if textContent.Len() != 0 {
+		t.Fatalf("expected malformed executable-looking XML to be suppressed, got %q", textContent.String())
+	}
+}
+
+func TestProcessToolSieveSuppressesAllEmptyDSMLToolBlock(t *testing.T) {
+	var state State
+	chunk := strings.Join([]string{
+		`<|DSML|tool_calls>`,
+		`<|DSML|invoke name="Bash">`,
+		`<|DSML|parameter name="command"></|DSML|parameter>`,
+		`<|DSML|parameter name="description">   </|DSML|parameter>`,
+		`<|DSML|parameter name="timeout"></|DSML|parameter>`,
+		`</|DSML|invoke>`,
+		`</|DSML|tool_calls>`,
+	}, "\n")
+	events := ProcessChunk(&state, chunk, []string{"Bash"})
+	events = append(events, Flush(&state, []string{"Bash"})...)
+
+	var textContent strings.Builder
+	toolCalls := 0
+	for _, evt := range events {
+		textContent.WriteString(evt.Content)
+		toolCalls += len(evt.ToolCalls)
+	}
+
+	if toolCalls != 0 {
+		t.Fatalf("expected all-empty DSML block not to produce tool calls, got %d events=%#v", toolCalls, events)
+	}
+	if textContent.Len() != 0 {
+		t.Fatalf("expected all-empty DSML block not to leak as text, got %q", textContent.String())
 	}
 }

--- a/internal/util/text.go
+++ b/internal/util/text.go
@@ -0,0 +1,46 @@
+package util
+
+import "unicode/utf8"
+
+// TruncateRunes trims text to at most limit runes; the bool reports truncation, and a negative limit disables it.
+func TruncateRunes(text string, limit int) (string, bool) {
+	if limit < 0 {
+		return text, false
+	}
+	if limit == 0 {
+		return "", text != ""
+	}
+
+	count := 0
+	for i := range text {
+		if count == limit {
+			return text[:i], true
+		}
+		count++
+	}
+	return text, false
+}
+
+// TruncateUTF8Bytes trims text to at most limit bytes, backing up to a rune boundary so that
+// no UTF-8 sequence is split; the bool reports truncation, and a negative limit disables it.
+func TruncateUTF8Bytes(text string, limit int) (string, bool) {
+	if limit < 0 {
+		return text, false
+	}
+	if len(text) <= limit {
+		return text, false
+	}
+	if limit == 0 {
+		return "", true
+	}
+
+	raw := []byte(text)
+	cut := limit
+	if cut > len(raw) {
+		cut = len(raw)
+	}
+	for cut > 0 && cut < len(raw) && !utf8.RuneStart(raw[cut]) {
+		cut--
+	}
+	return string(raw[:cut]), true
+}
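A minimal sketch of how the two helpers above differ on multi-byte input. The function bodies are copied from the hunk so that the example runs on its own; the `main` function and the sample string are illustrative assumptions, not part of the commit.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// TruncateRunes is copied from internal/util/text.go in this commit.
func TruncateRunes(text string, limit int) (string, bool) {
	if limit < 0 {
		return text, false
	}
	if limit == 0 {
		return "", text != ""
	}

	count := 0
	for i := range text {
		if count == limit {
			return text[:i], true
		}
		count++
	}
	return text, false
}

// TruncateUTF8Bytes is copied from internal/util/text.go in this commit.
func TruncateUTF8Bytes(text string, limit int) (string, bool) {
	if limit < 0 {
		return text, false
	}
	if len(text) <= limit {
		return text, false
	}
	if limit == 0 {
		return "", true
	}

	raw := []byte(text)
	cut := limit
	if cut > len(raw) {
		cut = len(raw)
	}
	for cut > 0 && cut < len(raw) && !utf8.RuneStart(raw[cut]) {
		cut--
	}
	return string(raw[:cut]), true
}

func main() {
	// Hypothetical sample: "h", a two-byte "é", then "llo" (5 runes, 6 bytes).
	s := "h\u00e9llo"

	// A limit of 2 runes keeps "h" and "é".
	byRunes, runesCut := TruncateRunes(s, 2)
	fmt.Printf("%q %v\n", byRunes, runesCut)

	// A limit of 2 bytes would land inside "é", so the cut backs up to 1 byte.
	byBytes, bytesCut := TruncateUTF8Bytes(s, 2)
	fmt.Printf("%q %v\n", byBytes, bytesCut)
}
```

The rune-based variant suits limits expressed in characters, while the byte-based variant suits size budgets and may return fewer bytes than the limit when the boundary falls inside a multi-byte sequence.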
--- a/webui/src/features/apiTester/ApiTesterContainer.jsx
+++ b/webui/src/features/apiTester/ApiTesterContainer.jsx
@@ -11,9 +11,7 @@ function describeModel(t, modelID) {
    const noThinking = modelID.endsWith('-nothinking')

    let description = t('apiTester.models.generic')
-    if (modelID.includes('vision-search')) {
-        description = t('apiTester.models.visionSearch')
-    } else if (modelID.includes('vision')) {
+    if (modelID.includes('vision')) {
        description = t('apiTester.models.vision')
    } else if (modelID.includes('pro-search')) {
        description = t('apiTester.models.proSearch')
--- a/webui/src/locales/en.json
+++ b/webui/src/locales/en.json
@@ -224,7 +224,6 @@
            "flashSearch": "v4 Flash (with search)",
            "proSearch": "v4 Pro (with search)",
            "vision": "v4 Vision (thinking on by default)",
-            "visionSearch": "v4 Vision (with search)",
            "generic": "Compatible model",
            "noThinking": "thinking forced off"
        },
@@ -395,7 +394,7 @@
        "thinkingInjectionPromptHelp": "Leave empty to use the built-in default prompt shown as the input placeholder.",
        "currentInputFileTitle": "Independent Split",
        "currentInputFileEnabled": "Independent split (by size)",
-        "currentInputFileDesc": "Enabled by default. Once the character threshold is reached, upload the full context as a hidden context file.",
+        "currentInputFileDesc": "Enabled by default. Once the character threshold is reached, upload the full context as a history.txt context file.",
        "currentInputFileMinChars": "Current input threshold (characters)",
        "currentInputFileHelp": "Default is 0, which uses independent split for any non-empty input.",
        "compatibilityTitle": "Compatibility",
--- a/webui/src/locales/zh.json
+++ b/webui/src/locales/zh.json
@@ -224,7 +224,6 @@
            "flashSearch": "v4 Flash（带搜索）",
            "proSearch": "v4 Pro（带搜索）",
            "vision": "v4 Vision（默认开启思考）",
-            "visionSearch": "v4 Vision（带搜索）",
            "generic": "兼容模型",
            "noThinking": "强制关闭思考"
        },
@@ -395,7 +394,7 @@
        "thinkingInjectionPromptHelp": "留空时使用内置默认提示词；默认内容会显示在输入框占位文本中。",
        "currentInputFileTitle": "独立拆分",
        "currentInputFileEnabled": "独立拆分（按量）",
-        "currentInputFileDesc": "默认开启。达到字符阈值后，将完整上下文上传为隐藏上下文文件。",
+        "currentInputFileDesc": "默认开启。达到字符阈值后，将完整上下文上传为 history.txt 上下文文件。",
        "currentInputFileMinChars": "当前输入阈值（字符数）",
        "currentInputFileHelp": "默认 0，表示只要有输入就会使用独立拆分。",
        "compatibilityTitle": "兼容性设置",