Compare commits

...

17 Commits

Author SHA1 Message Date
CJACK
0bca6e2cee feat: implement context cancellation handling for chat and response stream runtimes to ensure clean termination without retries 2026-05-01 23:20:46 +08:00
CJACK.
934b40e572 Merge pull request #392 from wyv202011y/fix/timeout-and-context-cancel
fix: increase stream timeout constants for large-context models; guar…
2026-05-01 23:17:31 +08:00
CJACK
dd5a0c5213 refactor: update and standardize current input file continuation prompt instructions 2026-05-01 22:27:59 +08:00
CJACK
43402e7a26 refactor: rename history file constant from HISTORY.txt to DS2API_HISTORY.txt across codebase and tests 2026-05-01 22:05:45 +08:00
CJACK.
6373c001f5 Merge pull request #391 from BigUncle/fix/vercel-admin-history-rewrite
Fix: add missing Vercel rewrite rules for admin API routes
2026-05-01 21:45:14 +08:00
BigUncle
3430322e81 docs: add Vercel chat history read-only filesystem troubleshooting 2026-05-01 21:17:52 +08:00
CJACK
df1cfac9bc refactor: replace history transcript format with numbered sections and rename upload file to HISTORY.txt 2026-05-01 21:15:17 +08:00
706e68de23 fix: increase stream timeout constants for large-context models; guard against context-cancelled double-recording
- Increase StreamIdleTimeout from 90s to 300s and MaxKeepaliveCount from 10 to 40
  to prevent premature stream termination with DeepSeek V4 Pro (~50K token contexts)
- Add r.Context().Err() check after ConsumeSSE in empty_retry_runtime (chat + responses)
  to prevent historySession.error() from overwriting historySession.stopped()
  when the request context is cancelled

References:
- MaxKeepaliveCount=10 creates a 50s no-content timeout that kills the stream
  before DeepSeek V4 Pro can produce its first token with large contexts
- Hermes Agent reports 'No response from provider for 180s' because the
  underlying SSE connection was already terminated by ds2api at 50s
- Context cancellation path: OnContextDone -> stopped(), then finalize()
  with empty output -> retry -> error() overwrites stopped()
2026-05-01 21:11:36 +08:00
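A minimal sketch of the guard this commit describes (`consumeSSE`, `historySession`, and the handler shape are hypothetical stand-ins, not the actual ds2api API): checking `r.Context().Err()` after the SSE consumer returns distinguishes a client cancellation from a genuinely empty result, so the `stopped()` record is never overwritten by the retry path's `error()`.

```go
package sketch

import (
	"context"
	"net/http"
)

// Hypothetical stand-ins for the runtime pieces the commit message names;
// the real types live inside ds2api and are not reproduced here.
type historySession struct{}

func (s *historySession) error(msg string)   {}
func (s *historySession) success(out string) {}

func consumeSSE(ctx context.Context) string { return "" }

func handleStream(w http.ResponseWriter, r *http.Request, session *historySession) {
	out := consumeSSE(r.Context())
	// Context cancelled: OnContextDone already recorded stopped(), so return
	// before the empty-output retry path can overwrite that record with error().
	if r.Context().Err() != nil {
		return
	}
	if out == "" {
		session.error("empty output") // genuinely empty result; retrying is safe
		return
	}
	session.success(out)
}
```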
BigUncle
83b4c7bcad fix: add missing Vercel rewrite rules for admin API routes
/admin/chat-history, /admin/proxies, /admin/dev/raw-samples,
and /admin/dev/captures were falling through to the SPA fallback
(/admin/index.html), causing "Unexpected token '<'" JSON parse
errors on the frontend.
2026-05-01 20:50:12 +08:00
CJACK.
445c95a4f2 Merge pull request #379 from CJackHwang/dev
Merge pull request #377 from CJackHwang/codex/run-all-tests-and-fix-failures

Fix failing current-input token accounting test
2026-05-01 16:12:17 +08:00
CJACK
0a6ef8e3f2 fix: remove bufio.Scanner 2MiB line limit for SSE; support quasi_status direct patch
Replace bufio.Scanner with bufio.NewReaderSize + ReadBytes('\n') across all
SSE read paths to preserve long single-line data (e.g. write_file content).
Add quasi_status and auto_continue handling as direct path-based patches in
both Go continue observer and Node vercel_stream_impl, mirroring existing
batch-patch logic. Add 2MiB+ line throughput tests at every SSE layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 15:45:17 +08:00
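A hedged sketch of the read-loop pattern this commit describes (the helper name and signature are illustrative): `bufio.Scanner` aborts with `bufio.ErrTooLong` once a single line outgrows its buffer cap, while `bufio.Reader.ReadBytes` grows as needed and hands back the entire line.

```go
package sketch

import (
	"bufio"
	"io"
)

// readSSELines preserves arbitrarily long single lines, unlike a capped Scanner.
func readSSELines(r io.Reader, onLine func([]byte)) error {
	reader := bufio.NewReaderSize(r, 64*1024)
	for {
		line, err := reader.ReadBytes('\n')
		if len(line) > 0 {
			onLine(line) // deliver even a final line with no trailing newline
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}
```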
CJACK
fd0ec29991 refactor: generalize DSML tag parsing to tolerate model noise; split tiktoken by build tags
Replace hardcoded DSML typo variant lists in Go/Node tool call parsers with
generalized prefix consumption that tolerates repeated leading <, repeated DSML
prefix noise, and trailing pipe terminators. Split tiktoken-dependent token
counting into a build-tagged file for non-cgo platform compatibility. Add /data
directory to Dockerfile for bind-mount permissions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 15:17:11 +08:00
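An illustrative sketch of the build-tag split mentioned here; the file names, package name, and the `cgo` constraint are assumptions, not the repo's actual layout:

```go
// tokens_tiktoken.go — built only when the constraint is satisfied.

//go:build cgo

package tokens

// CountTokens would invoke the tiktoken encoder on these builds; the body
// here is a stand-in so the sketch compiles.
func CountTokens(text string) int { return (len(text) + 3) / 4 }
```

```go
// tokens_fallback.go — keeps non-cgo platforms compiling and linking.

//go:build !cgo

package tokens

// CountTokens approximates roughly four characters per token.
func CountTokens(text string) int { return (len(text) + 3) / 4 }
```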
CJACK
2671298439 fix: coalesce small stream deltas to prevent character swallowing; add read-tool cache guard
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 13:53:27 +08:00
CJACK
92e321fe2c Fix the character-swallowing issue 2026-05-01 01:31:48 +08:00
CJACK.
fca8c01397 Merge pull request #385 from ouqiting/fix_chat_history
fix: content being overwritten and left empty
2026-05-01 00:21:33 +08:00
ouqiting
667e1e3710 fix: content being overwritten and left empty 2026-04-30 21:36:47 +08:00
CJACK.
94c1acace5 Merge pull request #369 from CJackHwang/dev
Merge pull request #368 from CJackHwang/codex/fix-review-issues-for-pr-#364

Restore thinking fallback for tool-call detection and drop history.txt wrapper tags
2026-04-29 23:42:50 +08:00
65 changed files with 1952 additions and 489 deletions

1
.gitignore vendored
View File

@@ -29,6 +29,7 @@ yarn.lock
pnpm-lock.yaml
# Build artifacts
dist/
*.tsbuildinfo
.cache/
.parcel-cache/

View File

@@ -29,7 +29,7 @@ WORKDIR /app
RUN apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates \
&& groupadd -r ds2api && useradd -r -g ds2api -d /app -s /sbin/nologin ds2api \
- && mkdir -p /app/data && chown -R ds2api:ds2api /app \
+ && mkdir -p /app/data /data && chown -R ds2api:ds2api /app /data \
&& rm -rf /var/lib/apt/lists/*
COPY --from=busybox-tools /bin/busybox /usr/local/bin/busybox
EXPOSE 5001

View File

@@ -247,6 +247,7 @@ docker-compose logs -f
The default `docker-compose.yml` maps host port `6011` to container port `5001`. If you want `5001` exposed directly, set `DS2API_HOST_PORT=5001` (or adjust the `ports` mapping manually).
It also mounts `./config.json` to `/data/config.json` in the container and sets `DS2API_CONFIG_PATH=/data/config.json` by default, so runtime token persistence does not fail on a read-only `/app`.
The image pre-creates `/data` and grants it to the non-root `ds2api` user; if you bind-mount a single file, make sure the host `config.json` is readable and writable by the container user, for example `chmod 644 config.json`.
Update the image: `docker-compose up -d --build`
@@ -317,7 +318,7 @@ go run ./cmd/ds2api
- `runtime`: account concurrency, queueing, and token refresh policy, hot-reloadable via Admin Settings.
- `auto_delete.mode`: remote session cleanup after each request, supporting `none` / `single` / `all`.
- `history_split`: legacy turn-splitting field, deprecated and ignored; kept only for loading old configs.
- - `current_input_file`: the only active split mode; enabled by default with a threshold of `0`, and when triggered it merges the full context into an uploaded `history.txt` context file.
+ - `current_input_file`: the only active split mode; enabled by default with a threshold of `0`, and when triggered it merges the full context into an uploaded `DS2API_HISTORY.txt` context file.
- If `current_input_file` is turned off, requests pass through directly without uploading a split context file.
- `thinking_injection`: enabled by default; appends a thinking-enhancement prompt to the end of the latest user message to stabilize reasoning before heavy inference and tool calls; when `prompt` is left empty, the built-in default prompt is used.

View File

@@ -306,7 +306,7 @@ Common fields:
- `runtime`: account concurrency, queueing, and token refresh behavior, hot-reloadable via Admin Settings.
- `auto_delete.mode`: remote session cleanup after each request, supporting `none` / `single` / `all`.
- `history_split`: legacy multi-turn history split field, now ignored and kept only for backward-compatible config loading.
- - `current_input_file`: the only active split mode; it is enabled by default and uploads the full context as a `history.txt` context file once the character threshold is reached.
+ - `current_input_file`: the only active split mode; it is enabled by default and uploads the full context as a `DS2API_HISTORY.txt` context file once the character threshold is reached.
- If you turn off `current_input_file`, requests pass through directly without uploading any split context file.
For the full environment variable list, see [docs/DEPLOY.en.md](docs/DEPLOY.en.md). For auth behavior, see [API.en.md](API.en.md#authentication).

View File

@@ -1 +1 @@
- 4.2.0
+ 4.2.1

View File

@@ -131,6 +131,7 @@ docker-compose logs -f
The default `docker-compose.yml` directly uses `ghcr.io/cjackhwang/ds2api:latest` and maps host port `6011` to container port `5001`. If you want `5001` exposed directly, set `DS2API_HOST_PORT=5001` (or adjust the `ports` mapping).
The compose template also defaults to `DS2API_CONFIG_PATH=/data/config.json` with `./config.json:/data/config.json` mounted, so deployments avoid read-only `/app` persistence issues by default.
The image pre-creates `/data` and grants it to the non-root `ds2api` user. If you bind-mount a single host file, make sure `config.json` is readable/writable by the container user, for example with `chmod 644 config.json`; otherwise Linux UID/GID mismatches can still cause `open /data/config.json: permission denied`.
Compatibility note: when `DS2API_CONFIG_PATH` is unset and runtime base dir is `/app`, newer versions prefer `/data/config.json`; if that file is missing but legacy `/app/config.json` exists, DS2API automatically falls back to the legacy path to avoid post-upgrade config loss.
If you want a pinned version instead of `latest`, you can also pull a specific tag directly:
@@ -270,6 +271,7 @@ VERCEL_TEAM_ID=team_xxxxxxxxxxxx # optional for personal accounts
| `VERCEL_TOKEN` | Vercel sync token | — |
| `VERCEL_PROJECT_ID` | Vercel project ID | — |
| `VERCEL_TEAM_ID` | Vercel team ID | — |
| `DS2API_CHAT_HISTORY_PATH` | Chat history storage path (must be set to `/tmp/chat_history.json` on Vercel, otherwise unavailable due to read-only filesystem) | `data/chat_history.json` |
| `DS2API_VERCEL_PROTECTION_BYPASS` | Deployment protection bypass for internal Node→Go calls | — |
### 3.4 Vercel Architecture
@@ -359,6 +361,22 @@ If API responses return Vercel HTML `Authentication Required`:
- **Option B**: Add `x-vercel-protection-bypass` header to requests
- **Option C**: Set `VERCEL_AUTOMATION_BYPASS_SECRET` (or `DS2API_VERCEL_PROTECTION_BYPASS`) for internal Node→Go calls
#### Chat History Unavailable (read-only file system)
```text
create chat history dir: mkdir /var/task/data: read-only file system
```
**Cause**: Vercel Serverless functions have a read-only filesystem (`/var/task`). Chat history fails because it cannot create directories there.
**Fix**: Add the following in Vercel Project Settings → Environment Variables:
```text
DS2API_CHAT_HISTORY_PATH=/tmp/chat_history.json
```
`/tmp` is the only writable directory in Vercel Serverless. Data is ephemeral (not persisted across cold starts), but the feature works within a single instance lifetime.
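A sketch of the lookup this setting implies (the helper name is ours; ds2api's actual resolution code is not shown in this diff):

```go
package sketch

import "os"

// chatHistoryPath: env override first, repo-relative default otherwise.
// On Vercel the override must point into /tmp, the only writable directory.
func chatHistoryPath() string {
	if p := os.Getenv("DS2API_CHAT_HISTORY_PATH"); p != "" {
		return p
	}
	return "data/chat_history.json"
}
```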
### 3.6 Build Artifacts Not Committed
- `static/admin` directory is not in Git

View File

@@ -131,6 +131,7 @@ docker-compose logs -f
The default `docker-compose.yml` uses `ghcr.io/cjackhwang/ds2api:latest` directly and maps host port `6011` to container port `5001`. If you want `5001` exposed directly, set `DS2API_HOST_PORT=5001` (or adjust the `ports` mapping manually).
The compose template also defaults to `DS2API_CONFIG_PATH=/data/config.json` and mounts `./config.json:/data/config.json`, avoiding config persistence problems on a read-only `/app`.
The image pre-creates `/data` and grants it to the non-root `ds2api` user; if you bind-mount a single file, make sure the host `config.json` is at least readable/writable by the container user, for example `chmod 644 config.json`; otherwise a Linux UID/GID mismatch can still produce `open /data/config.json: permission denied`.
Compatibility note: when `DS2API_CONFIG_PATH` is unset and the runtime base directory is `/app`, newer versions prefer `/data/config.json`; if that file is missing but a legacy `/app/config.json` exists, the legacy path is read automatically to avoid post-upgrade "config loss".
To pin a version instead of `latest`, you can also pull a specific tag directly:
@@ -270,6 +271,7 @@ VERCEL_TEAM_ID=team_xxxxxxxxxxxx # optional for personal accounts
| `VERCEL_TOKEN` | Vercel sync token | — |
| `VERCEL_PROJECT_ID` | Vercel project ID | — |
| `VERCEL_TEAM_ID` | Vercel team ID | — |
| `DS2API_CHAT_HISTORY_PATH` | Chat history storage path (must be set to `/tmp/chat_history.json` on Vercel, otherwise unavailable due to the read-only filesystem) | `data/chat_history.json` |
| `DS2API_VERCEL_PROTECTION_BYPASS` | Deployment protection bypass secret (internal Node→Go calls) | — |
### 3.3 Runtime Behavior Configuration (set via the Admin API)
@@ -369,6 +371,22 @@ No Output Directory named "public" found after the Build completed.
- **Option B**: add the `x-vercel-protection-bypass` header to requests
- **Option C**: set `VERCEL_AUTOMATION_BYPASS_SECRET` (or `DS2API_VERCEL_PROTECTION_BYPASS`); affects internal Node→Go calls only
#### Chat History Unavailable (read-only file system)
```text
create chat history dir: mkdir /var/task/data: read-only file system
```
**Cause**: the filesystem of Vercel Serverless functions (`/var/task`) is read-only, so chat history fails when it tries to create a directory there.
**Fix**: add the following in Vercel Project Settings → Environment Variables:
```text
DS2API_CHAT_HISTORY_PATH=/tmp/chat_history.json
```
`/tmp` is the only writable directory in the Vercel Serverless environment. Data is not persisted across cold starts (ephemeral), but the feature works within a single instance lifetime.
### 3.6 Build Artifacts Are Not Committed
- the `static/admin` directory is not in Git

View File

@@ -156,10 +156,12 @@ OpenAI Chat / Responses will, after normalization and before the current input file step, by default
Positive tool-call examples now demonstrate the official DSML style first: `<|DSML|tool_calls>`, `<|DSML|invoke name="...">`, `<|DSML|parameter name="...">`.
The compatibility layer still accepts the legacy plain `<tool_calls>` wrapper, but the prompt asks the model to emit the official DSML tags first and stresses that it must not output only the closing wrapper while omitting the opening tag. Note that this is "a DSML-compatible shell with XML parsing semantics inside", not a native end-to-end DSML implementation: DSML tags are normalized back to the existing XML tags at the parser entry and then flow through the same parser.
Array parameters are represented with `<item>...</item>` child nodes; when a parameter body contains only item children, the Go / Node parsers restore it to an array, so parameters whose schema requires an array (such as `questions` / `options`) are not misparsed as an `{ "item": ... }` object. If the model mistakenly wraps a complete structured XML fragment in CDATA, the compatibility layer tries to restore CDATA XML fragments in non-verbatim fields back to objects / arrays while protecting verbatim fields such as `content` / `command`. If the CDATA is just a single flat XML/HTML tag, such as the inline markup `<b>urgent</b>`, the layer keeps the original string and does not force it into an object / array; only CDATA fragments that clearly express structure, such as multiple sibling nodes, nested children, or `item` lists, trigger structured recovery.
When reading DeepSeek SSE, the Go side no longer relies on `bufio.Scanner`'s fixed 2 MiB single-line cap; when a file-writing tool returns a very long `content` on a single `data:` line, non-stream collection, stream parsing, and auto-continue passthrough all keep the full line before entering the shared tool parsing and serialization flow.
At the assistant final-response stage, if a tool parameter is explicitly declared `string` in its schema, the compatibility layer recursively converts number / bool / object / array values along that path into strings before re-serializing the parsed `tool_calls` / `function_call` into the OpenAI / Responses / Claude-visible arguments; object / array values are compacted into JSON strings. This protection only applies to paths explicitly declared string in the schema and does not rewrite parameters that really are `number` / `boolean` / `object` / `array`. It covers the case where DeepSeek outputs a structured fragment while the upstream client's tool schema strictly requires string parameters (for example `content`, `prompt`, `path`, `taskId`).
The authoritative source for a tool's schema is always **the schema carried by the current request**, not a default impression of the same-named tool in another runtime (Claude Code / OpenCode / Codex, etc.). The compatibility layer now accepts OpenAI-style `function.parameters`, `parameters` / `input_schema` on the tool object itself, and camelCase `inputSchema` / `schema`, and at the final output stage uses this in-request schema to decide whether to keep arrays/objects or to stringify only the paths explicitly declared `string`. The same rule applies to Claude's stream finalization and the Vercel Node streaming tool-call formatter, so same-named tools do not drift in parameter types across runtimes because of schema shape differences.
Tool names in positive examples only come from tools actually declared by the current request; when the current request lacks enough known tool shapes, the corresponding single-tool, multi-tool, or nested examples are omitted, so unavailable tool names never end up in the prompt.
For execution-type tools, script content must go into the execution parameter itself: `Bash` / `execute_command` use `command`, and `exec_command` uses `cmd`; do not demonstrate scripts as `path` / `content` file-write parameters.
If the current request declares a read tool such as `Read` / `read_file`, the compatibility layer additionally injects a read-tool cache guard: when a read result only says "file unchanged / already in history / refer to earlier context / no body content", the model must treat the content as unavailable, must not repeat the same bodyless read, and should instead request full-body read capability or tell the user the file content needs to be provided again. This constraint only mitigates the dead loop caused by client caches returning empty content; DS2API cannot and does not recover client-local file bodies out of thin air.
OpenAI path implementation:
[internal/promptcompat/tool_prompt.go](../internal/promptcompat/tool_prompt.go)
@@ -247,7 +249,7 @@ OpenAI file-related implementations:
The compatibility layer now keeps `current_input_file` as the only split mode; the old `history_split` is deprecated, kept only as a backward-compatible config field, and no longer participates in request handling.
- - `current_input_file` is enabled by default; it merges the "full context" into a `history.txt` context file. When the plain-text length of the latest user turn reaches `current_input_file.min_chars` (default `0`), the compatibility layer uploads a context file named `history.txt` and keeps only a neutral user message in the live prompt that asks the model to answer the latest request directly, without exposing the filename or asking the model to read local files
+ - `current_input_file` is enabled by default; it merges the "full context" into a `DS2API_HISTORY.txt` context file. When the plain-text length of the latest user turn reaches `current_input_file.min_chars` (default `0`), the compatibility layer uploads a context file named `DS2API_HISTORY.txt`. The file content is normalized as OpenAI messages first and then serialized into a turn-numbered `DS2API_HISTORY.txt`-style transcript (with a `# DS2API_HISTORY.txt` header and `=== N. ROLE ===` sections); the live prompt instead carries a continuation-toned user message that steers the model to continue from the latest state in `DS2API_HISTORY.txt` and answer the latest request directly, instead of dragging the task back to the start
- If `current_input_file.enabled=false`, requests pass through directly without uploading any split context file.
- The old `history_split.enabled` / `history_split.trigger_after_turns` are still read into the config object for compatibility, but they neither trigger split uploads nor affect `current_input_file` being enabled by default.
- Even when the live prompt is shortened after `current_input_file` triggers, the context token count reported back to the client still follows the **pre-split full prompt semantics** rather than the shortened placeholder prompt; otherwise the real context would be counted significantly too small.
@@ -261,11 +263,24 @@ OpenAI file-related implementations:
- Legacy history-split compatibility shell:
[internal/httpapi/openai/history/history_split.go](../internal/httpapi/openai/history/history_split.go)
- When current-input-to-file is enabled and triggered, the real uploaded filename is `history.txt` and the file content is the full `messages` context; it is still normalized as OpenAI messages and serialized with DeepSeek role markers first, then uploaded directly as the plain-text content of `history.txt` (no file boundary tags injected):
+ When current-input-to-file is enabled and triggered, the real uploaded filename is `DS2API_HISTORY.txt` and the file content is the full `messages` context; it is still normalized as OpenAI messages and serialized with DeepSeek role markers first, then turn-numbered into a `DS2API_HISTORY.txt`-style transcript (no file boundary tags injected):
```text
- [uploaded filename]: history.txt
- <begin▁of▁sentence><System>...<User>...<Assistant>...<Tool>...<User>...
+ [uploaded filename]: DS2API_HISTORY.txt
+ # DS2API_HISTORY.txt
+ Prior conversation history and tool progress.
+ === 1. SYSTEM ===
+ ...
+ === 2. USER ===
+ ...
+ === 3. ASSISTANT ===
+ ...
+ === 4. TOOL ===
+ ...
```
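A compact sketch that reproduces the numbered-section shape above (the function and type names are illustrative, not the repo's actual API):

```go
package sketch

import (
	"fmt"
	"strings"
)

type message struct{ Role, Content string }

// buildHistoryTranscript emits the header, description line, and
// "=== N. ROLE ===" sections shown in the example transcript.
func buildHistoryTranscript(msgs []message) string {
	var b strings.Builder
	b.WriteString("# DS2API_HISTORY.txt\n")
	b.WriteString("Prior conversation history and tool progress.\n")
	for i, m := range msgs {
		fmt.Fprintf(&b, "=== %d. %s ===\n%s\n", i+1, strings.ToUpper(m.Role), m.Content)
	}
	return b.String()
}
```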
Once enabled, the request's live prompt no longer inlines the full context directly; it keeps a short user-role prompt telling the model to answer the latest request directly from the provided context; the uploaded `file_id` goes into `ref_file_ids`:
@@ -317,7 +332,7 @@ OpenAI file-related implementations:
```json
{
"prompt": "<begin▁of▁sentence><System>原 system / developer\n\nYou have access to these tools: ...<end▁of▁instructions><User>The current request and prior conversation context have already been provided. Answer the latest user request directly.<Assistant>",
"prompt": "<begin▁of▁sentence><System>原 system / developer\n\nYou have access to these tools: ...<end▁of▁instructions><User>Continue from the latest state in the attached DS2API_HISTORY.txt context. Treat it as the current working state and answer the latest user request directly.<Assistant>",
"ref_file_ids": [
"file-current-input-ignore",
"file-systemprompt",
@@ -332,7 +347,7 @@ OpenAI file-related implementations:
- most structured semantics are compressed into `prompt`
- files stay files
- - when needed, the full context is split into a `history.txt` context file
+ - when needed, the full context is split into a `DS2API_HISTORY.txt` context file and turn-numbered into a transcript
## 12. Changes That Must Be Synced to This Document
@@ -345,7 +360,7 @@ OpenAI file-related implementations:
- changes to tool result injection
- changes to the tool prompt template or `tool_choice` constraints
- changes to inline file upload / file reference collection rules
- - changes to current input file trigger conditions, upload format, or the `history.txt` wrapper format
+ - changes to current input file trigger conditions, upload format, or the `DS2API_HISTORY.txt` transcript structure
- changes to how the `history_split` compatibility logic is read, ignored, or degraded
- changes to completion payload field semantics
- changes to how Claude / Gemini reuse this unified semantics

View File

@@ -26,7 +26,7 @@
</tool_calls>
```
- This is not a native end-to-end DSML implementation. DSML serves only as a prompt shell and a parser-entry alias; before entering the parser it is normalized into `<tool_calls>` / `<invoke>` / `<parameter>`, and the existing XML parsing semantics still govern internally.
+ This is not a native end-to-end DSML implementation. DSML mainly lets the model deliberately emit a protocol marker, isolating it from ordinary XML semantics; before entering the parser it is normalized by fixed local tag name into `<tool_calls>` / `<invoke>` / `<parameter>`, and the existing XML parsing semantics still govern internally.
Constraints:
@@ -39,7 +39,8 @@
Compatibility fixes:
- If the model omits the opening wrapper but still emits one or more invokes and closes with the closing wrapper, the Go parsing path restores the missing opening wrapper before parsing.
- - If the model drops a DSML tag's `|` separator to a space (e.g. `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`, or the no-leading-pipe form `<DSML tool_calls>`), glues `DSML` directly onto the tool tag name (e.g. `<DSMLtool_calls>` / `<DSMLinvoke>` / `<DSMLparameter>`), or writes the leading pipe as a fullwidth bar (e.g. `<DSML|tool_calls>` / `<DSML|invoke>` / `<DSML|parameter>`), Go / Node normalize within the fixed tool tag name set; similar but non-tool tag names (like `tool_calls_extra`) are still treated as plain text.
+ - The Go / Node parsing layer no longer enumerates every DSML typo. It treats `DSML`, the pipes `|` / `|`, whitespace, and repeated leading `<` in front of the tool tag name as tolerable protocol noise, then matches only the fixed local tag names `tool_calls` / `invoke` / `parameter` (see the sketch after this list). For example `<DSML|tool_calls>`, `<<|DSML|tool_calls>`, `<|DSML tool_calls>`, `<DSMLtool_calls>`, and `<<DSML|DSML|tool_calls>` all normalize; similar but non-fixed tag names (like `tool_calls_extra`) are still treated as plain text.
+ - If the model emits an extra trailing pipe after the fixed tool tag name, e.g. `<|DSML|tool_calls|` / `<|DSML|invoke|` / `<|DSML|parameter|`, the compatibility layer treats the trailing `|` as a malformed tag terminator and supplies the missing `>`; if a `>` already follows, the extra `|` is consumed before normalizing.
- This is a narrow fix for common model mistakes and does not change the recommended output format; the prompt still asks the model to emit the complete DSML shell directly.
- A bare `<invoke ...>` / `<parameter ...>` is not treated as "supported tool syntax"; only the `tool_calls` wrapper, or a repairable missing opening wrapper, enters the tool-call path.
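A hedged sketch of the prefix-consumption idea in the noise-tolerance fix above (simplified: no attribute handling and no trailing-pipe repair; names are ours, not the parser's):

```go
package sketch

import "strings"

var localTags = []string{"tool_calls", "invoke", "parameter"}

// normalizeDSMLTag consumes tolerated protocol noise before the tag name,
// then accepts only the fixed local tag names.
func normalizeDSMLTag(raw string) (string, bool) {
	s := strings.TrimPrefix(raw, "<")
	for {
		switch {
		case strings.HasPrefix(s, "<"): // repeated leading '<'
			s = s[1:]
		case strings.HasPrefix(s, "|"): // pipe noise
			s = s[1:]
		case strings.HasPrefix(s, "|"): // fullwidth pipe noise
			s = strings.TrimPrefix(s, "|")
		case strings.HasPrefix(s, "DSML"), strings.HasPrefix(s, "dsml"): // DSML prefix noise
			s = s[4:]
		case strings.HasPrefix(s, " "): // stray whitespace
			s = s[1:]
		default:
			for _, tag := range localTags {
				if strings.HasPrefix(s, tag+">") {
					return "<" + tag + ">", true // e.g. "<<|DSML|tool_calls>" -> "<tool_calls>"
				}
			}
			return "", false // similar but unknown names stay plain text
		}
	}
}
```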
@@ -53,7 +54,7 @@
In the streaming path (consistent across Go / Node):
- - the DSML `<|DSML|tool_calls>` wrapper, compatible variants (`<dsml|tool_calls>`, `<tool_calls>`, `<|tool_calls>`, `<DSML|tool_calls>`), the narrowly tolerated space-separated forms (such as `<|DSML tool_calls>`), glued forms (such as `<DSMLtool_calls>`), and the canonical `<tool_calls>` wrapper all enter structured capture
+ - the DSML `<|DSML|tool_calls>` wrapper, the DSML noise-tolerant forms based on fixed local tag names, the trailing-pipe form (such as `<|DSML|tool_calls|`), and the canonical `<tool_calls>` wrapper all enter structured capture
- if the stream starts directly at an invoke but a closing wrapper appears later, the Go streaming sieve also attempts recovery via the missing-opening-wrapper repair path
- successfully recognized tool calls do not flow back into plain text
- blocks that do not match the new format are not executed and continue to pass through verbatim as text
@@ -61,6 +62,7 @@
- supports nested fences (e.g. 4 backticks wrapping 3 backticks) and protection of fences inside CDATA
- if the model opens `<![CDATA[` without closing it, the streaming scan stage still buffers conservatively and does not mistake example XML inside the CDATA for a real tool call; at the final parse / flush recovery stage, such loose CDATA gets a narrow repair that tries to preserve real tool calls already fully wrapped outside
- when the text merely mentions a tag name (such as `<dsml|tool_calls>`, or `<|DSML|tool_calls>` in Markdown inline code) and a real tool call follows, the sieve skips the unparseable mention candidate and keeps matching the subsequent real tool blocks; a mention neither loses tool calls nor truncates the text after it
- the Go SSE reader no longer uses `bufio.Scanner`'s fixed token cap; when a single `data:` line carries a very long file-write parameter, non-stream collection, stream parsing, and auto-continue passthrough should all preserve the full line before handing it to the tool parser
Additionally, if a `<parameter>` value is itself a valid JSON literal, it is parsed as a structured value instead of always being kept as a string. For example `123`, `true`, `null`, `[1,2]`, and `{"a":1}` are all restored to the corresponding number / boolean / null / array / object.
Structured XML parameters are also restored to JSON structure: if a parameter body contains only one or more `<item>...</item>` child nodes, an array is emitted; item-only fields inside nested objects are treated as arrays the same way. For example `<parameter name="questions"><item><question>...</question></item></parameter>` yields `{"questions":[{"question":"..."}]}` rather than `{"questions":{"item":...}}`.
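A minimal sketch of the JSON-literal rule (the helper name is ours): try the value as a JSON literal first and fall back to the raw string.

```go
package sketch

import "encoding/json"

// decodeParameterValue: 123, true, null, [1,2], {"a":1} decode structurally;
// plain prose fails Unmarshal and stays a string.
func decodeParameterValue(raw string) any {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err == nil {
		return v
	}
	return raw
}
```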
@@ -94,7 +96,7 @@ node --test tests/node/stream-tool-sieve.test.js
- the DSML `<|DSML|tool_calls>` wrapper parses normally
- the legacy canonical `<tool_calls>` wrapper parses normally
- - alias variants (`<dsml|tool_calls>`, `<tool_calls>`, `<|tool_calls>`, the DSML space-separated typo `<|DSML tool_calls>`) and the glued typo (`<DSMLtool_calls>`) parse normally
+ - the fixed-local-tag-name DSML noise-tolerant forms (such as `<DSML|tool_calls>`, `<<|DSML|tool_calls>`, `<|DSML tool_calls>`, `<DSMLtool_calls>`, `<<DSML|DSML|tool_calls>`) parse normally
- mixed tags (DSML wrapper + canonical inner) parse normally after normalization
- examples inside tilde fences `~~~` are not executed
- examples inside nested fences (4 backticks wrapping 3) are not executed

6
go.mod
View File

@@ -6,14 +6,12 @@ require (
github.com/andybalholm/brotli v1.2.1
github.com/go-chi/chi/v5 v5.2.5
github.com/google/uuid v1.6.0
github.com/hupe1980/go-tiktoken v0.0.10
github.com/refraction-networking/utls v1.8.2
github.com/router-for-me/CLIProxyAPI/v6 v6.9.14
)
require (
github.com/dlclark/regexp2 v1.11.5 // indirect
github.com/hupe1980/go-tiktoken v0.0.10 // indirect
)
require github.com/dlclark/regexp2 v1.11.5 // indirect
require (
github.com/klauspost/compress v1.18.5 // indirect

2
go.sum
View File

@@ -41,6 +41,8 @@ golang.org/x/net v0.52.0 h1:He/TN1l0e4mmR3QqHMT2Xab3Aj3L9qjbhRm78/6jrW0=
golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw=
golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/text v0.35.0 h1:JOVx6vVDFokkpaq1AEptVzLTpDe9KGpj5tR4/X+ybL8=
golang.org/x/text v0.35.0/go.mod h1:khi/HExzZJ2pGnjenulevKNX1W67CUy0AsXcNubPGCA=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=

View File

@@ -310,8 +310,12 @@ func (s *Store) Update(id string, params UpdateParams) (Entry, error) {
if params.Status != "" {
item.Status = params.Status
}
- item.ReasoningContent = params.ReasoningContent
- item.Content = params.Content
+ if params.ReasoningContent != "" || item.ReasoningContent == "" {
+ item.ReasoningContent = params.ReasoningContent
+ }
+ if params.Content != "" || item.Content == "" {
+ item.Content = params.Content
+ }
item.Error = strings.TrimSpace(params.Error)
item.StatusCode = params.StatusCode
item.ElapsedMs = params.ElapsedMs

View File

@@ -493,3 +493,112 @@ func TestStoreWritesOnlyChangedDetailFiles(t *testing.T) {
t.Fatalf("expected untouched detail file to remain byte-identical")
}
}
func TestUpdatePreservesContentWhenNewContentIsEmpty(t *testing.T) {
path := filepath.Join(t.TempDir(), "chat_history.json")
store := New(path)
started, err := store.Start(StartParams{
CallerID: "caller:abc",
Model: "deepseek-v4-flash",
Stream: true,
UserInput: "hello",
})
if err != nil {
t.Fatalf("start entry failed: %v", err)
}
if _, err := store.Update(started.ID, UpdateParams{
Status: "streaming",
ReasoningContent: "let me think",
Content: "I'll help you with that.",
}); err != nil {
t.Fatalf("progress update failed: %v", err)
}
updated, err := store.Update(started.ID, UpdateParams{
Status: "success",
Content: "",
Completed: true,
})
if err != nil {
t.Fatalf("success update failed: %v", err)
}
if updated.Content != "I'll help you with that." {
t.Fatalf("expected content to be preserved, got %q", updated.Content)
}
if updated.ReasoningContent != "let me think" {
t.Fatalf("expected reasoning content to be preserved, got %q", updated.ReasoningContent)
}
full, err := store.Get(started.ID)
if err != nil {
t.Fatalf("get entry failed: %v", err)
}
if full.Content != "I'll help you with that." {
t.Fatalf("expected persisted content to be preserved, got %q", full.Content)
}
if full.ReasoningContent != "let me think" {
t.Fatalf("expected persisted reasoning content to be preserved, got %q", full.ReasoningContent)
}
}
func TestUpdateAllowsSettingContentFromEmpty(t *testing.T) {
path := filepath.Join(t.TempDir(), "chat_history.json")
store := New(path)
started, err := store.Start(StartParams{
CallerID: "caller:abc",
Model: "deepseek-v4-flash",
Stream: true,
UserInput: "hello",
})
if err != nil {
t.Fatalf("start entry failed: %v", err)
}
updated, err := store.Update(started.ID, UpdateParams{
Status: "success",
Content: "final answer",
})
if err != nil {
t.Fatalf("update failed: %v", err)
}
if updated.Content != "final answer" {
t.Fatalf("expected content to be set, got %q", updated.Content)
}
}
func TestUpdateAllowsOverwritingContentWithNewValue(t *testing.T) {
path := filepath.Join(t.TempDir(), "chat_history.json")
store := New(path)
started, err := store.Start(StartParams{
CallerID: "caller:abc",
Model: "deepseek-v4-flash",
Stream: true,
UserInput: "hello",
})
if err != nil {
t.Fatalf("start entry failed: %v", err)
}
if _, err := store.Update(started.ID, UpdateParams{
Status: "streaming",
Content: "partial",
}); err != nil {
t.Fatalf("first update failed: %v", err)
}
updated, err := store.Update(started.ID, UpdateParams{
Status: "success",
Content: "final answer",
})
if err != nil {
t.Fatalf("second update failed: %v", err)
}
if updated.Content != "final answer" {
t.Fatalf("expected content to be overwritten, got %q", updated.Content)
}
}

View File

@@ -133,33 +133,51 @@ func pumpAutoContinue(ctx context.Context, pw *io.PipeWriter, initial io.ReadClo
// sentinels are consumed (not forwarded) so that the downstream only sees
// one final [DONE] at the very end.
func streamBodyWithContinueState(ctx context.Context, pw *io.PipeWriter, body io.Reader, state *continueState) (bool, error) {
- scanner := bufio.NewScanner(body)
- scanner.Buffer(make([]byte, 0, 64*1024), 2*1024*1024)
+ reader := bufio.NewReaderSize(body, 64*1024)
hadDone := false
- for scanner.Scan() {
+ for {
select {
case <-ctx.Done():
return hadDone, ctx.Err()
default:
}
- line := append([]byte{}, scanner.Bytes()...)
- trimmed := strings.TrimSpace(string(line))
- if trimmed == "" {
- continue
- }
- if strings.HasPrefix(trimmed, "data:") {
- data := strings.TrimSpace(strings.TrimPrefix(trimmed, "data:"))
- if data == "[DONE]" {
- hadDone = true
- continue
+ line, err := reader.ReadBytes('\n')
+ if len(line) == 0 && err != nil {
+ if err == io.EOF {
+ return hadDone, nil
- state.observe(data)
return hadDone, err
}
- if _, err := io.Copy(pw, bytes.NewReader(append(line, '\n'))); err != nil {
+ trimmed := strings.TrimSpace(string(line))
+ if trimmed != "" {
+ if strings.HasPrefix(trimmed, "data:") {
+ data := strings.TrimSpace(strings.TrimPrefix(trimmed, "data:"))
+ if data == "[DONE]" {
+ hadDone = true
+ if err != nil && err != io.EOF {
+ return hadDone, err
+ }
+ if err == io.EOF {
+ return hadDone, nil
+ }
+ continue
+ }
+ state.observe(data)
+ }
+ if !strings.HasSuffix(string(line), "\n") {
+ line = append(line, '\n')
+ }
+ if _, copyErr := io.Copy(pw, bytes.NewReader(line)); copyErr != nil {
+ return hadDone, copyErr
+ }
+ }
+ if err != nil {
+ if err == io.EOF {
+ return hadDone, nil
+ }
+ return hadDone, err
+ }
}
- return hadDone, scanner.Err()
}
// observe extracts continue-relevant signals from an SSE JSON chunk.
@@ -175,34 +193,48 @@ func (s *continueState) observe(data string) {
if id := intFrom(chunk["response_message_id"]); id > 0 {
s.responseMessageID = id
}
- // Path-based status: {"p": "response/status", "v": "FINISHED"}
- if p, _ := chunk["p"].(string); p == "response/status" {
- s.setStatus(asString(chunk["v"]))
- }
+ s.observeDirectPatch(asString(chunk["p"]), chunk["v"])
if p, _ := chunk["p"].(string); p == "response" {
s.observeBatchPatches("response", chunk["v"])
} else {
s.observeBatchPatches("", chunk["v"])
}
- // Nested v.response
- v, _ := chunk["v"].(map[string]any)
- if response, _ := v["response"].(map[string]any); response != nil {
- if id := intFrom(response["message_id"]); id > 0 {
- s.responseMessageID = id
- }
- s.setStatus(asString(response["status"]))
- if autoContinue, ok := response["auto_continue"].(bool); ok && autoContinue {
+ if v, _ := chunk["v"].(map[string]any); v != nil {
+ s.observeResponseObject(v["response"])
+ }
+ if message, _ := chunk["message"].(map[string]any); message != nil {
+ s.observeResponseObject(message["response"])
+ }
}
+ func (s *continueState) observeDirectPatch(path string, value any) {
+ if s == nil {
+ return
+ }
+ switch strings.Trim(strings.TrimSpace(path), "/") {
+ case "response/status", "status", "response/quasi_status", "quasi_status":
+ s.setStatus(asString(value))
+ case "response/auto_continue", "auto_continue":
+ if v, ok := value.(bool); ok && v {
s.lastStatus = "AUTO_CONTINUE"
}
}
- // Nested message.response
- if message, _ := chunk["message"].(map[string]any); message != nil {
- if response, _ := message["response"].(map[string]any); response != nil {
- if id := intFrom(response["message_id"]); id > 0 {
- s.responseMessageID = id
- }
- s.setStatus(asString(response["status"]))
- }
- }
+ func (s *continueState) observeResponseObject(raw any) {
+ if s == nil {
+ return
+ }
+ response, _ := raw.(map[string]any)
+ if response == nil {
+ return
+ }
+ if id := intFrom(response["message_id"]); id > 0 {
+ s.responseMessageID = id
+ }
+ s.setStatus(asString(response["status"]))
+ if autoContinue, ok := response["auto_continue"].(bool); ok && autoContinue {
+ s.lastStatus = "AUTO_CONTINUE"
+ }
+ }
@@ -230,6 +262,10 @@ func (s *continueState) observeBatchPatches(parentPath string, raw any) {
switch strings.Trim(strings.TrimSpace(fullPath), "/") {
case "response/status", "status", "response/quasi_status", "quasi_status":
s.setStatus(asString(m["v"]))
case "response/auto_continue", "auto_continue":
if v, ok := m["v"].(bool); ok && v {
s.lastStatus = "AUTO_CONTINUE"
}
}
}
}

View File

@@ -150,6 +150,62 @@ func TestAutoContinueDoesNotTriggerOnPlainWIPWithoutExplicitContinuationSignal(t
}
}
func TestAutoContinuePassesThroughLongSingleSSELine(t *testing.T) {
payload := strings.Repeat("x", 2*1024*1024+4096)
initialBody := `data: {"p":"response/content","v":"` + payload + `"}` + "\n" +
`data: [DONE]` + "\n"
body := newAutoContinueBody(context.Background(), io.NopCloser(strings.NewReader(initialBody)), "session-123", 8, func(context.Context, string, int) (*http.Response, error) {
return nil, errors.New("continue should not have been called")
})
defer func() { _ = body.Close() }()
out, err := io.ReadAll(body)
if err != nil {
t.Fatalf("read body failed: %v", err)
}
if !bytes.Contains(out, []byte(payload)) {
t.Fatalf("expected long SSE payload to pass through, got len=%d want payload len=%d", len(out), len(payload))
}
if !bytes.Contains(out, []byte(`data: [DONE]`)) {
t.Fatalf("expected final DONE sentinel in body, got len=%d", len(out))
}
}
func TestAutoContinueTriggersOnDirectQuasiStatusIncomplete(t *testing.T) {
initialBody := strings.Join([]string{
`data: {"response_message_id":321,"p":"response/content","v":"<tool_calls><invoke name=\"write_file\"><parameter name=\"content\"><![CDATA[part-one"}`,
`data: {"p":"response/quasi_status","v":"INCOMPLETE"}`,
`data: [DONE]`,
}, "\n") + "\n"
var continueCalls atomic.Int32
body := newAutoContinueBody(context.Background(), io.NopCloser(strings.NewReader(initialBody)), "session-123", 8, func(context.Context, string, int) (*http.Response, error) {
continueCalls.Add(1)
return &http.Response{
StatusCode: http.StatusOK,
Header: make(http.Header),
Body: io.NopCloser(strings.NewReader(
`data: {"response_message_id":322,"p":"response/content","v":"-part-two]]></parameter></invoke></tool_calls>"}` + "\n" +
`data: {"p":"response/status","v":"FINISHED"}` + "\n" +
`data: [DONE]` + "\n",
)),
}, nil
})
defer func() { _ = body.Close() }()
out, err := io.ReadAll(body)
if err != nil {
t.Fatalf("read body failed: %v", err)
}
if continueCalls.Load() != 1 {
t.Fatalf("expected exactly one continue call, got %d", continueCalls.Load())
}
if !bytes.Contains(out, []byte("part-one")) || !bytes.Contains(out, []byte("-part-two")) {
t.Fatalf("expected continued tool content in body, got=%s", string(out))
}
}
func TestAutoContinueTriggersOnResponseBatchQuasiStatusIncomplete(t *testing.T) {
initialBody := strings.Join([]string{
`data: {"response_message_id":321,"v":{"response":{"message_id":321,"status":"WIP","auto_continue":false}}}`,

View File

@@ -159,6 +159,6 @@ func toStringSet(in []string) map[string]struct{} {
const (
KeepAliveTimeout = 5
- StreamIdleTimeout = 90
- MaxKeepaliveCount = 10
+ StreamIdleTimeout = 300
+ MaxKeepaliveCount = 40
)
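Reading these constants against the timeout commit at the top of this compare, and assuming one keepalive per `KeepAliveTimeout` seconds as that commit message implies: the old cap allowed 10 × 5 s = 50 s without content, the new one 40 × 5 s = 200 s, inside the raised 300 s idle timeout.

```go
package sketch

// Derived no-content window under the assumption above.
const (
	KeepAliveTimeout  = 5  // seconds per keepalive tick
	MaxKeepaliveCount = 40 // was 10, i.e. a 50s window
	NoContentWindowS  = KeepAliveTimeout * MaxKeepaliveCount // now 200s
)
```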

View File

@@ -2,7 +2,7 @@
"client": {
"name": "DeepSeek",
"platform": "android",
"version": "2.0.3",
"version": "2.0.4",
"android_api_level": "35",
"locale": "zh_CN"
},
@@ -24,4 +24,4 @@
"skip_exact_paths": [
"response/search_status"
]
}
}

View File

@@ -2,20 +2,24 @@ package protocol
import (
"bufio"
"io"
"net/http"
)
func ScanSSELines(resp *http.Response, onLine func([]byte) bool) error {
- scanner := bufio.NewScanner(resp.Body)
- buf := make([]byte, 0, 64*1024)
- scanner.Buffer(buf, 2*1024*1024)
- for scanner.Scan() {
- if !onLine(scanner.Bytes()) {
- break
+ reader := bufio.NewReaderSize(resp.Body, 64*1024)
+ for {
+ line, err := reader.ReadBytes('\n')
+ if len(line) > 0 {
+ if !onLine(line) {
+ return nil
}
}
+ if err != nil {
+ if err == io.EOF {
+ return nil
}
+ return err
}
}
- if err := scanner.Err(); err != nil {
- return err
- }
- return nil
}

View File

@@ -0,0 +1,26 @@
package protocol
import (
"io"
"net/http"
"strings"
"testing"
)
func TestScanSSELinesHandlesLongSingleLine(t *testing.T) {
payload := strings.Repeat("x", 2*1024*1024+4096)
body := "data: {\"p\":\"response/content\",\"v\":\"" + payload + "\"}\n"
resp := &http.Response{Body: io.NopCloser(strings.NewReader(body))}
var got string
err := ScanSSELines(resp, func(line []byte) bool {
got = string(line)
return true
})
if err != nil {
t.Fatalf("ScanSSELines returned error: %v", err)
}
if !strings.Contains(got, payload) {
t.Fatalf("long SSE line was not preserved: got len=%d want payload len=%d", len(got), len(payload))
}
}

View File

@@ -311,16 +311,16 @@ func TestChatCompletionsCurrentInputFilePersistsNeutralPrompt(t *testing.T) {
if len(ds.uploadCalls) != 1 {
t.Fatalf("expected current input upload to happen, got %d", len(ds.uploadCalls))
}
- if ds.uploadCalls[0].Filename != "history.txt" {
- t.Fatalf("expected history.txt upload, got %q", ds.uploadCalls[0].Filename)
+ if ds.uploadCalls[0].Filename != "DS2API_HISTORY.txt" {
+ t.Fatalf("expected DS2API_HISTORY.txt upload, got %q", ds.uploadCalls[0].Filename)
}
if full.HistoryText != string(ds.uploadCalls[0].Data) {
t.Fatalf("expected uploaded current input file to be persisted in history text")
}
if len(full.Messages) != 1 {
t.Fatalf("expected neutral prompt to be the only persisted message, got %#v", full.Messages)
t.Fatalf("expected continuation prompt to be the only persisted message, got %#v", full.Messages)
}
- if !strings.Contains(full.Messages[0].Content, "Answer the latest user request directly.") {
- t.Fatalf("expected neutral prompt to be persisted, got %#v", full.Messages[0])
+ if !strings.Contains(full.Messages[0].Content, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
+ t.Fatalf("expected continuation prompt to be persisted, got %#v", full.Messages[0])
}
}

View File

@@ -53,6 +53,32 @@ type chatStreamRuntime struct {
finalErrorCode string
}
type chatDeltaBatch struct {
runtime *chatStreamRuntime
field string
text strings.Builder
}
func (b *chatDeltaBatch) append(field, text string) {
if text == "" {
return
}
if b.field != "" && b.field != field {
b.flush()
}
b.field = field
b.text.WriteString(text)
}
func (b *chatDeltaBatch) flush() {
if b.field == "" || b.text.Len() == 0 {
return
}
b.runtime.sendDelta(map[string]any{b.field: b.text.String()})
b.field = ""
b.text.Reset()
}
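An illustrative call sequence for the batch type above (not a test from the repo; `s` stands for an initialized `*chatStreamRuntime`): consecutive same-field deltas coalesce into one chunk, and a field switch flushes the pending text first.

```go
batch := chatDeltaBatch{runtime: s}
batch.append("reasoning_content", "思")
batch.append("reasoning_content", "考") // same field: still buffered
batch.append("content", "答")           // field switch: emits {"reasoning_content":"思考"} first
batch.flush()                            // emits {"content":"答"}
```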
func newChatStreamRuntime(
w http.ResponseWriter,
rc *http.ResponseController,
@@ -107,6 +133,23 @@ func (s *chatStreamRuntime) sendChunk(v any) {
}
}
func (s *chatStreamRuntime) sendDelta(delta map[string]any) {
if len(delta) == 0 {
return
}
if !s.firstChunkSent {
delta["role"] = "assistant"
s.firstChunkSent = true
}
s.sendChunk(openaifmt.BuildChatStreamChunk(
s.completionID,
s.created,
s.model,
[]map[string]any{openaifmt.BuildChatStreamDeltaChoice(0, delta)},
nil,
))
}
func (s *chatStreamRuntime) sendDone() {
_, _ = s.w.Write([]byte("data: [DONE]\n\n"))
if s.canFlush {
@@ -130,6 +173,15 @@ func (s *chatStreamRuntime) sendFailedChunk(status int, message, code string) {
s.sendDone()
}
func (s *chatStreamRuntime) markContextCancelled() {
s.finalErrorStatus = 499
s.finalErrorMessage = "request context cancelled"
s.finalErrorCode = string(streamengine.StopReasonContextCancelled)
s.finalThinking = s.thinking.String()
s.finalText = cleanVisibleOutput(s.text.String(), s.stripReferenceMarkers)
s.finalFinishReason = string(streamengine.StopReasonContextCancelled)
}
func (s *chatStreamRuntime) resetStreamToolCallState() {
s.streamToolCallIDs = map[int]string{}
s.streamToolNames = map[int]string{}
@@ -147,42 +199,22 @@ func (s *chatStreamRuntime) finalize(finishReason string, deferEmptyOutput bool)
detected := detectAssistantToolCalls(s.rawText.String(), finalText, s.rawThinking.String(), finalToolDetectionThinking, s.toolNames)
if len(detected.Calls) > 0 && !s.toolCallsDoneEmitted {
finishReason = "tool_calls"
- delta := map[string]any{
+ s.sendDelta(map[string]any{
"tool_calls": formatFinalStreamToolCallsWithStableIDs(detected.Calls, s.streamToolCallIDs, s.toolsRaw),
- }
- if !s.firstChunkSent {
- delta["role"] = "assistant"
- s.firstChunkSent = true
- }
- s.sendChunk(openaifmt.BuildChatStreamChunk(
- s.completionID,
- s.created,
- s.model,
- []map[string]any{openaifmt.BuildChatStreamDeltaChoice(0, delta)},
- nil,
- ))
+ })
s.toolCallsEmitted = true
s.toolCallsDoneEmitted = true
} else if s.bufferToolContent {
+ batch := chatDeltaBatch{runtime: s}
for _, evt := range toolstream.Flush(&s.toolSieve, s.toolNames) {
if len(evt.ToolCalls) > 0 {
+ batch.flush()
finishReason = "tool_calls"
s.toolCallsEmitted = true
s.toolCallsDoneEmitted = true
- tcDelta := map[string]any{
+ s.sendDelta(map[string]any{
"tool_calls": formatFinalStreamToolCallsWithStableIDs(evt.ToolCalls, s.streamToolCallIDs, s.toolsRaw),
- }
- if !s.firstChunkSent {
- tcDelta["role"] = "assistant"
- s.firstChunkSent = true
- }
- s.sendChunk(openaifmt.BuildChatStreamChunk(
- s.completionID,
- s.created,
- s.model,
- []map[string]any{openaifmt.BuildChatStreamDeltaChoice(0, tcDelta)},
- nil,
- ))
+ })
s.resetStreamToolCallState()
}
if evt.Content == "" {
@@ -192,21 +224,9 @@ func (s *chatStreamRuntime) finalize(finishReason string, deferEmptyOutput bool)
if cleaned == "" || (s.searchEnabled && sse.IsCitation(cleaned)) {
continue
}
- delta := map[string]any{
- "content": cleaned,
- }
- if !s.firstChunkSent {
- delta["role"] = "assistant"
- s.firstChunkSent = true
- }
- s.sendChunk(openaifmt.BuildChatStreamChunk(
- s.completionID,
- s.created,
- s.model,
- []map[string]any{openaifmt.BuildChatStreamDeltaChoice(0, delta)},
- nil,
- ))
+ batch.append("content", cleaned)
}
+ batch.flush()
}
if len(detected.Calls) > 0 || s.toolCallsEmitted {
@@ -257,8 +277,8 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
return streamengine.ParsedDecision{Stop: true, StopReason: streamengine.StopReasonHandlerRequested}
}
newChoices := make([]map[string]any, 0, len(parsed.Parts))
contentSeen := false
+ batch := chatDeltaBatch{runtime: s}
for _, p := range parsed.ToolDetectionThinkingParts {
trimmed := sse.TrimContinuationOverlap(s.toolDetectionThinking.String(), p.Text)
if trimmed != "" {
@@ -266,11 +286,6 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
}
}
for _, p := range parsed.Parts {
- delta := map[string]any{}
- if !s.firstChunkSent {
- delta["role"] = "assistant"
- s.firstChunkSent = true
- }
if p.Type == "thinking" {
rawTrimmed := sse.TrimContinuationOverlap(s.rawThinking.String(), p.Text)
if rawTrimmed != "" {
@@ -287,7 +302,7 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
continue
}
s.thinking.WriteString(trimmed)
delta["reasoning_content"] = trimmed
batch.append("reasoning_content", trimmed)
}
} else {
rawTrimmed := sse.TrimContinuationOverlap(s.rawText.String(), p.Text)
@@ -308,7 +323,7 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
if trimmed == "" {
continue
}
delta["content"] = trimmed
batch.append("content", trimmed)
} else {
events := toolstream.ProcessChunk(&s.toolSieve, rawTrimmed, s.toolNames)
for _, evt := range events {
@@ -324,28 +339,22 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
if len(formatted) == 0 {
continue
}
+ batch.flush()
tcDelta := map[string]any{
"tool_calls": formatted,
}
s.toolCallsEmitted = true
- if !s.firstChunkSent {
- tcDelta["role"] = "assistant"
- s.firstChunkSent = true
- }
- newChoices = append(newChoices, openaifmt.BuildChatStreamDeltaChoice(0, tcDelta))
+ s.sendDelta(tcDelta)
continue
}
if len(evt.ToolCalls) > 0 {
+ batch.flush()
s.toolCallsEmitted = true
s.toolCallsDoneEmitted = true
tcDelta := map[string]any{
"tool_calls": formatFinalStreamToolCallsWithStableIDs(evt.ToolCalls, s.streamToolCallIDs, s.toolsRaw),
}
- if !s.firstChunkSent {
- tcDelta["role"] = "assistant"
- s.firstChunkSent = true
- }
- newChoices = append(newChoices, openaifmt.BuildChatStreamDeltaChoice(0, tcDelta))
+ s.sendDelta(tcDelta)
s.resetStreamToolCallState()
continue
}
@@ -354,25 +363,12 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
if cleaned == "" || (s.searchEnabled && sse.IsCitation(cleaned)) {
continue
}
- contentDelta := map[string]any{
- "content": cleaned,
- }
- if !s.firstChunkSent {
- contentDelta["role"] = "assistant"
- s.firstChunkSent = true
- }
- newChoices = append(newChoices, openaifmt.BuildChatStreamDeltaChoice(0, contentDelta))
+ batch.append("content", cleaned)
}
}
}
}
- if len(delta) > 0 {
- newChoices = append(newChoices, openaifmt.BuildChatStreamDeltaChoice(0, delta))
- }
}
- if len(newChoices) > 0 {
- s.sendChunk(openaifmt.BuildChatStreamChunk(s.completionID, s.created, s.model, newChoices, nil))
- }
+ batch.flush()
return streamengine.ParsedDecision{ContentSeen: contentSeen}
}

View File

@@ -49,6 +49,7 @@ func (h *Handler) handleNonStreamWithRetry(w http.ResponseWriter, ctx context.Co
detected := detectAssistantToolCalls(result.rawText, result.text, result.rawThinking, result.toolDetectionThinking, toolNames)
result.detectedCalls = len(detected.Calls)
result.body = openaifmt.BuildChatCompletionWithToolCalls(completionID, model, usagePrompt, result.thinking, result.text, detected.Calls, toolsRaw)
addRefFileTokensToUsage(result.body, refFileTokens)
result.finishReason = chatFinishReason(result.body)
if !shouldRetryChatNonStream(result, attempts) {
h.finishChatNonStreamResult(w, result, attempts, usagePrompt, refFileTokens, historySession)
@@ -246,11 +247,15 @@ func (h *Handler) consumeChatStreamAttempt(r *http.Request, resp *http.Response,
}
},
OnContextDone: func() {
streamRuntime.markContextCancelled()
if historySession != nil {
historySession.stopped(streamRuntime.thinking.String(), streamRuntime.text.String(), string(streamengine.StopReasonContextCancelled))
}
},
})
if streamRuntime.finalErrorCode == string(streamengine.StopReasonContextCancelled) {
return true, false
}
terminalWritten := streamRuntime.finalize(finalReason, allowDeferEmpty && finalReason != "content_filter")
if terminalWritten {
recordChatStreamHistory(streamRuntime, historySession)
@@ -282,6 +287,10 @@ func logChatStreamTerminal(streamRuntime *chatStreamRuntime, attempts int) {
if attempts > 0 {
source = "synthetic_retry"
}
if streamRuntime.finalErrorCode == string(streamengine.StopReasonContextCancelled) {
config.Logger.Info("[openai_empty_retry] terminal cancelled", "surface", "chat.completions", "stream", true, "retry_attempts", attempts, "error_code", streamRuntime.finalErrorCode)
return
}
if streamRuntime.finalErrorMessage != "" {
config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "chat.completions", "stream", true, "retry_attempts", attempts, "success_source", "none", "error_code", streamRuntime.finalErrorCode)
return

View File

@@ -0,0 +1,85 @@
package chat
import (
"context"
"net/http"
"net/http/httptest"
"testing"
"time"
"ds2api/internal/chathistory"
"ds2api/internal/stream"
)
func TestConsumeChatStreamAttemptMarksContextCancelledState(t *testing.T) {
historyStore := newTestChatHistoryStore(t)
entry, err := historyStore.Start(chathistory.StartParams{
CallerID: "caller:test",
Model: "deepseek-v4-flash",
Stream: true,
UserInput: "hello",
})
if err != nil {
t.Fatalf("start history failed: %v", err)
}
session := &chatHistorySession{
store: historyStore,
entryID: entry.ID,
startedAt: time.Now(),
lastPersist: time.Now(),
finalPrompt: "prompt",
}
ctx, cancel := context.WithCancel(context.Background())
cancel()
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil).WithContext(ctx)
rec := httptest.NewRecorder()
streamRuntime := newChatStreamRuntime(
rec,
http.NewResponseController(rec),
true,
"cid-cancelled",
time.Now().Unix(),
"deepseek-v4-flash",
"prompt",
false,
false,
true,
nil,
nil,
false,
false,
)
resp := makeOpenAISSEHTTPResponse(
`data: {"p":"response/content","v":"hello"}`,
`data: [DONE]`,
)
h := &Handler{}
terminalWritten, retryable := h.consumeChatStreamAttempt(req, resp, streamRuntime, "text", false, session, true)
if !terminalWritten || retryable {
t.Fatalf("expected cancelled attempt to terminate without retry, got terminalWritten=%v retryable=%v", terminalWritten, retryable)
}
if got, want := streamRuntime.finalErrorCode, string(stream.StopReasonContextCancelled); got != want {
t.Fatalf("expected cancelled final error code %q, got %q", want, got)
}
if streamRuntime.finalErrorMessage == "" {
t.Fatalf("expected cancelled final error message to be preserved")
}
snapshot, err := historyStore.Snapshot()
if err != nil {
t.Fatalf("snapshot failed: %v", err)
}
if len(snapshot.Items) != 1 {
t.Fatalf("expected one history item, got %d", len(snapshot.Items))
}
full, err := historyStore.Get(snapshot.Items[0].ID)
if err != nil {
t.Fatalf("get detail failed: %v", err)
}
if full.Status != "stopped" {
t.Fatalf("expected stopped status, got %#v", full)
}
}

View File

@@ -1,6 +1,7 @@
package chat
import (
"context"
"encoding/json"
"io"
"net/http"
@@ -239,6 +240,118 @@ func TestHandleStreamToolsPlainTextStreamsBeforeFinish(t *testing.T) {
}
}
func TestHandleStreamThinkingDisabledDoesNotLeakHiddenFragmentContinuations(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
`data: {"p":"response/fragments","o":"APPEND","v":[{"type":"THINK","content":"我们"}]}`,
`data: {"p":"response/fragments/-1/content","v":"被"}`,
`data: {"v":"要求"}`,
`data: {"p":"response/fragments","o":"APPEND","v":[{"type":"RESPONSE","content":"答"}]}`,
`data: {"p":"response/fragments/-1/content","v":"案"}`,
`data: [DONE]`,
)
rec := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
h.handleStream(rec, req, resp, "cid-hidden-fragment", "deepseek-v4-flash", "prompt", 0, false, false, nil, nil, nil)
frames, done := parseSSEDataFrames(t, rec.Body.String())
if !done {
t.Fatalf("expected [DONE], body=%s", rec.Body.String())
}
content := strings.Builder{}
for _, frame := range frames {
choices, _ := frame["choices"].([]any)
for _, item := range choices {
choice, _ := item.(map[string]any)
delta, _ := choice["delta"].(map[string]any)
if c, ok := delta["content"].(string); ok {
content.WriteString(c)
}
}
}
if got := content.String(); got != "答案" {
t.Fatalf("expected only visible response text, got %q body=%s", got, rec.Body.String())
}
}
func TestHandleStreamEmitsSingleChoiceFramesForMultipleParsedParts(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
`data: {"p":"response/fragments","o":"APPEND","v":[{"type":"THINK","content":"我们"},{"type":"THINK","content":"被"},{"type":"THINK","content":"要求"},{"type":"RESPONSE","content":"答"},{"type":"RESPONSE","content":"案"}]}`,
`data: [DONE]`,
)
rec := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
h.handleStream(rec, req, resp, "cid-multi-parts", "deepseek-v4-pro", "prompt", 0, true, false, nil, nil, nil)
frames, done := parseSSEDataFrames(t, rec.Body.String())
if !done {
t.Fatalf("expected [DONE], body=%s", rec.Body.String())
}
var reasoning, content strings.Builder
for _, frame := range frames {
choices, _ := frame["choices"].([]any)
if len(choices) != 1 {
t.Fatalf("expected exactly one choice per stream frame, got %d frame=%#v body=%s", len(choices), frame, rec.Body.String())
}
choice, _ := choices[0].(map[string]any)
delta, _ := choice["delta"].(map[string]any)
reasoning.WriteString(asString(delta["reasoning_content"]))
content.WriteString(asString(delta["content"]))
}
if got := reasoning.String(); got != "我们被要求" {
t.Fatalf("first-choice-only client would miss reasoning tokens: got %q body=%s", got, rec.Body.String())
}
if got := content.String(); got != "答案" {
t.Fatalf("first-choice-only client would miss content tokens: got %q body=%s", got, rec.Body.String())
}
}
func TestHandleStreamCoalescesSmallContentDeltas(t *testing.T) {
h := &Handler{}
lines := make([]string, 0, 101)
for i := 0; i < 100; i++ {
b, _ := json.Marshal(map[string]any{
"p": "response/content",
"v": "字",
})
lines = append(lines, "data: "+string(b))
}
lines = append(lines, "data: [DONE]")
resp := makeSSEHTTPResponse(lines...)
rec := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
h.handleStream(rec, req, resp, "cid-coalesce", "deepseek-v4-flash", "prompt", 0, false, false, nil, nil, nil)
frames, done := parseSSEDataFrames(t, rec.Body.String())
if !done {
t.Fatalf("expected [DONE], body=%s", rec.Body.String())
}
var content strings.Builder
contentDeltaFrames := 0
for _, frame := range frames {
choices, _ := frame["choices"].([]any)
if len(choices) != 1 {
t.Fatalf("expected exactly one choice per stream frame, got %d frame=%#v body=%s", len(choices), frame, rec.Body.String())
}
choice, _ := choices[0].(map[string]any)
delta, _ := choice["delta"].(map[string]any)
if c, ok := delta["content"].(string); ok {
contentDeltaFrames++
content.WriteString(c)
}
}
if got, want := content.String(), strings.Repeat("字", 100); got != want {
t.Fatalf("coalesced stream content mismatch: got %q want %q body=%s", got, want, rec.Body.String())
}
if contentDeltaFrames >= 100 {
t.Fatalf("expected coalescing to reduce 100 tiny content frames, got %d body=%s", contentDeltaFrames, rec.Body.String())
}
}
func TestHandleStreamIncompleteCapturedToolJSONFlushesAsTextOnFinalize(t *testing.T) {
h := &Handler{}
resp := makeSSEHTTPResponse(
@@ -447,3 +560,45 @@ func TestHandleStreamCoercesSchemaDeclaredStringArgumentsOnFinalize(t *testing.T
}
t.Fatalf("expected at least one streamed tool call delta, body=%s", rec.Body.String())
}
func TestHandleNonStreamWithRetryIncludesRefFileTokensInUsage(t *testing.T) {
h := &Handler{}
run := func(refFileTokens int) map[string]any {
resp := makeSSEHTTPResponse(
`data: {"p":"response/content","v":"hello world"}`,
`data: [DONE]`,
)
rec := httptest.NewRecorder()
h.handleNonStreamWithRetry(rec, context.Background(), nil, resp, nil, "", "cid-ref", "deepseek-v4-flash", "prompt", refFileTokens, false, false, nil, nil, nil)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
return decodeJSONBody(t, rec.Body.String())
}
base := run(0)
withRef := run(7)
baseUsage, _ := base["usage"].(map[string]any)
refUsage, _ := withRef["usage"].(map[string]any)
if baseUsage == nil || refUsage == nil {
t.Fatalf("expected usage objects, base=%#v ref=%#v", base["usage"], withRef["usage"])
}
getInt := func(m map[string]any, key string) int {
t.Helper()
v, ok := m[key].(float64)
if !ok {
t.Fatalf("expected numeric %s, got %#v", key, m[key])
}
return int(v)
}
if got := getInt(refUsage, "prompt_tokens") - getInt(baseUsage, "prompt_tokens"); got != 7 {
t.Fatalf("expected prompt_tokens delta 7, got %d", got)
}
if got := getInt(refUsage, "total_tokens") - getInt(baseUsage, "total_tokens"); got != 7 {
t.Fatalf("expected total_tokens delta 7, got %d", got)
}
}

View File

@@ -130,8 +130,8 @@ func TestHandleVercelStreamPrepareAppliesCurrentInputFile(t *testing.T) {
t.Fatalf("expected payload object, got %#v", body["payload"])
}
promptText, _ := payload["prompt"].(string)
- if !strings.Contains(promptText, "Answer the latest user request directly.") {
- t.Fatalf("expected neutral prompt, got %s", promptText)
+ if !strings.Contains(promptText, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
+ t.Fatalf("expected continuation prompt, got %s", promptText)
}
if strings.Contains(promptText, "first user turn") || strings.Contains(promptText, "latest user turn") {
t.Fatalf("expected original turns hidden from prompt, got %s", promptText)

View File

@@ -62,7 +62,7 @@ func (s Service) ApplyCurrentInputFile(ctx context.Context, a *auth.RequestAuth,
stdReq.RefFileIDs = prependUniqueRefFileID(stdReq.RefFileIDs, fileID)
stdReq.FinalPrompt, stdReq.ToolNames = promptcompat.BuildOpenAIPrompt(messages, stdReq.ToolsRaw, "", stdReq.ToolChoice, stdReq.Thinking)
// Token accounting must reflect the actual downstream context:
- // the uploaded history.txt file content + the neutral live prompt.
+ // the uploaded DS2API_HISTORY.txt file content + the continuation live prompt.
stdReq.PromptTokenText = fileText + "\n" + stdReq.FinalPrompt
return stdReq, nil
}
@@ -87,5 +87,5 @@ func latestUserInputForFile(messages []any) (int, string) {
}
func currentInputFilePrompt() string {
return "The current request and prior conversation context have already been provided. Answer the latest user request directly."
return "Continue from the latest state in the attached DS2API_HISTORY.txt context. Treat it as the current working state and answer the latest user request directly."
}

View File

@@ -61,26 +61,33 @@ func (streamStatusManagedAuthStub) DetermineCaller(_ *http.Request) (*auth.Reque
func (streamStatusManagedAuthStub) Release(_ *auth.RequestAuth) {}
- func TestBuildOpenAICurrentInputContextTranscriptUsesInjectedFileWrapper(t *testing.T) {
+ func TestBuildOpenAICurrentInputContextTranscriptUsesNumberedHistorySections(t *testing.T) {
_, historyMessages := splitOpenAIHistoryMessages(historySplitTestMessages(), 1)
transcript := buildOpenAICurrentInputContextTranscript(historyMessages)
if strings.Contains(transcript, "[file content end]") || strings.Contains(transcript, "[file content begin]") || strings.Contains(transcript, "[file name]:") {
t.Fatalf("expected plain transcript without file wrapper tags, got %q", transcript)
t.Fatalf("expected transcript without file wrapper tags, got %q", transcript)
}
if !strings.Contains(transcript, "<begin▁of▁sentence>") {
t.Fatalf("expected serialized conversation markers, got %q", transcript)
if !strings.Contains(transcript, "# DS2API_HISTORY.txt") {
t.Fatalf("expected history transcript header, got %q", transcript)
}
if !strings.Contains(transcript, "first user turn") || !strings.Contains(transcript, "tool result") {
t.Fatalf("expected historical turns preserved, got %q", transcript)
if !strings.Contains(transcript, "Prior conversation history and tool progress.") {
t.Fatalf("expected history transcript description, got %q", transcript)
}
if !strings.Contains(transcript, "[reasoning_content]") || !strings.Contains(transcript, "hidden reasoning") {
t.Fatalf("expected reasoning block preserved, got %q", transcript)
for _, want := range []string{
"=== 1. USER ===",
"=== 2. ASSISTANT ===",
"=== 3. TOOL ===",
"first user turn",
"tool result",
"[reasoning_content]",
"hidden reasoning",
"<|DSML|tool_calls>",
} {
if !strings.Contains(transcript, want) {
t.Fatalf("expected transcript to contain %q, got %q", want, transcript)
}
}
if !strings.Contains(transcript, "<|DSML|tool_calls>") {
t.Fatalf("expected tool calls preserved, got %q", transcript)
}
}
func TestSplitOpenAIHistoryMessagesUsesLatestUserTurn(t *testing.T) {
@@ -243,7 +250,7 @@ func TestApplyCurrentInputFileDisabledPassThrough(t *testing.T) {
}
}
- func TestApplyCurrentInputFileUploadsFirstTurnWithInjectedWrapper(t *testing.T) {
+ func TestApplyCurrentInputFileUploadsFirstTurnWithNumberedHistoryTranscript(t *testing.T) {
ds := &inlineUploadDSStub{}
h := &openAITestSurface{
Store: mockOpenAIConfig{
@@ -273,15 +280,21 @@ func TestApplyCurrentInputFileUploadsFirstTurnWithInjectedWrapper(t *testing.T)
t.Fatalf("expected 1 current input upload, got %d", len(ds.uploadCalls))
}
upload := ds.uploadCalls[0]
if upload.Filename != "history.txt" {
if upload.Filename != "DS2API_HISTORY.txt" {
t.Fatalf("unexpected upload filename: %q", upload.Filename)
}
uploadedText := string(upload.Data)
if strings.Contains(uploadedText, "[file content end]") || strings.Contains(uploadedText, "[file content begin]") || strings.Contains(uploadedText, "[file name]:") {
t.Fatalf("expected uploaded transcript without file wrapper tags, got %q", uploadedText)
}
if !strings.Contains(uploadedText, "<begin▁of▁sentence><User>first turn content that is long enough") {
t.Fatalf("expected serialized current user turn markers, got %q", uploadedText)
for _, want := range []string{
"# DS2API_HISTORY.txt",
"=== 1. USER ===",
"first turn content that is long enough",
} {
if !strings.Contains(uploadedText, want) {
t.Fatalf("expected uploaded transcript to contain %q, got %q", want, uploadedText)
}
}
if !strings.Contains(uploadedText, promptcompat.ThinkingInjectionMarker) {
t.Fatalf("expected thinking injection in current input file, got %q", uploadedText)
@@ -290,11 +303,11 @@ func TestApplyCurrentInputFileUploadsFirstTurnWithInjectedWrapper(t *testing.T)
if strings.Contains(out.FinalPrompt, "first turn content that is long enough") {
t.Fatalf("expected current input text to be replaced in live prompt, got %s", out.FinalPrompt)
}
if strings.Contains(out.FinalPrompt, "CURRENT_USER_INPUT.txt") || strings.Contains(out.FinalPrompt, "history.txt") || strings.Contains(out.FinalPrompt, "Read that file") {
if strings.Contains(out.FinalPrompt, "CURRENT_USER_INPUT.txt") || strings.Contains(out.FinalPrompt, "Read that file") {
t.Fatalf("expected live prompt not to instruct file reads, got %s", out.FinalPrompt)
}
if !strings.Contains(out.FinalPrompt, "Answer the latest user request directly.") {
t.Fatalf("expected neutral continuation instruction in live prompt, got %s", out.FinalPrompt)
if !strings.Contains(out.FinalPrompt, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected continuation-oriented prompt in live prompt, got %s", out.FinalPrompt)
}
if len(out.RefFileIDs) != 1 || out.RefFileIDs[0] != "file-inline-1" {
t.Fatalf("expected current input file id in ref_file_ids, got %#v", out.RefFileIDs)
@@ -302,6 +315,9 @@ func TestApplyCurrentInputFileUploadsFirstTurnWithInjectedWrapper(t *testing.T)
if !strings.Contains(out.PromptTokenText, "first turn content that is long enough") {
t.Fatalf("expected prompt token text to preserve original full context, got %q", out.PromptTokenText)
}
if !strings.Contains(out.PromptTokenText, "# DS2API_HISTORY.txt") || !strings.Contains(out.PromptTokenText, "=== 1. USER ===") {
t.Fatalf("expected prompt token text to include numbered history transcript, got %q", out.PromptTokenText)
}
}
func TestApplyCurrentInputFilePreservesFullContextPromptForTokenCounting(t *testing.T) {
@@ -337,10 +353,13 @@ func TestApplyCurrentInputFilePreservesFullContextPromptForTokenCounting(t *test
t.Fatalf("expected prompt token text to contain file context with full conversation, got %q", out.PromptTokenText)
}
if strings.Contains(out.PromptTokenText, "[file content end]") || strings.Contains(out.PromptTokenText, "[file name]:") {
t.Fatalf("expected prompt token text to use raw transcript without wrapper tags, got %q", out.PromptTokenText)
t.Fatalf("expected prompt token text to omit file wrapper tags, got %q", out.PromptTokenText)
}
if !strings.Contains(out.PromptTokenText, "Answer the latest user request directly.") {
t.Fatalf("expected prompt token text to also include neutral live prompt, got %q", out.PromptTokenText)
if !strings.Contains(out.PromptTokenText, "# DS2API_HISTORY.txt") || !strings.Contains(out.PromptTokenText, "=== 1. SYSTEM ===") {
t.Fatalf("expected prompt token text to include numbered history transcript, got %q", out.PromptTokenText)
}
if !strings.Contains(out.PromptTokenText, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected prompt token text to also include continuation prompt, got %q", out.PromptTokenText)
}
if strings.Contains(out.FinalPrompt, "first user turn") || strings.Contains(out.FinalPrompt, "latest user turn") {
t.Fatalf("expected live prompt to hide original turns, got %q", out.FinalPrompt)
@@ -378,20 +397,20 @@ func TestApplyCurrentInputFileUploadsFullContextFile(t *testing.T) {
t.Fatalf("expected one current input upload, got %d", len(ds.uploadCalls))
}
upload := ds.uploadCalls[0]
if upload.Filename != "history.txt" {
t.Fatalf("expected history.txt upload, got %q", upload.Filename)
if upload.Filename != "DS2API_HISTORY.txt" {
t.Fatalf("expected DS2API_HISTORY.txt upload, got %q", upload.Filename)
}
uploadedText := string(upload.Data)
for _, want := range []string{"system instructions", "first user turn", "hidden reasoning", "tool result", "latest user turn", promptcompat.ThinkingInjectionMarker} {
for _, want := range []string{"# DS2API_HISTORY.txt", "=== 1. SYSTEM ===", "=== 2. USER ===", "=== 3. ASSISTANT ===", "=== 4. TOOL ===", "=== 5. USER ===", "system instructions", "first user turn", "hidden reasoning", "tool result", "latest user turn", promptcompat.ThinkingInjectionMarker} {
if !strings.Contains(uploadedText, want) {
t.Fatalf("expected full context file to contain %q, got %q", want, uploadedText)
}
}
if strings.Contains(out.FinalPrompt, "first user turn") || strings.Contains(out.FinalPrompt, "latest user turn") || strings.Contains(out.FinalPrompt, "CURRENT_USER_INPUT.txt") || strings.Contains(out.FinalPrompt, "history.txt") || strings.Contains(out.FinalPrompt, "Read that file") {
t.Fatalf("expected live prompt to use only a neutral continuation instruction, got %s", out.FinalPrompt)
if strings.Contains(out.FinalPrompt, "first user turn") || strings.Contains(out.FinalPrompt, "latest user turn") || strings.Contains(out.FinalPrompt, "CURRENT_USER_INPUT.txt") || strings.Contains(out.FinalPrompt, "Read that file") {
t.Fatalf("expected live prompt to use only a continuation instruction, got %s", out.FinalPrompt)
}
if !strings.Contains(out.FinalPrompt, "Answer the latest user request directly.") {
t.Fatalf("expected neutral continuation instruction in live prompt, got %s", out.FinalPrompt)
if !strings.Contains(out.FinalPrompt, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected continuation-oriented prompt in live prompt, got %s", out.FinalPrompt)
}
}
@@ -423,6 +442,9 @@ func TestApplyCurrentInputFileCarriesHistoryText(t *testing.T) {
if out.HistoryText != string(ds.uploadCalls[0].Data) {
t.Fatalf("expected current input file flow to preserve uploaded text in history, got %q", out.HistoryText)
}
if !strings.Contains(out.HistoryText, "# DS2API_HISTORY.txt") || !strings.Contains(out.HistoryText, "=== 1. SYSTEM ===") {
t.Fatalf("expected history text to use numbered transcript format, got %q", out.HistoryText)
}
}
func TestChatCompletionsCurrentInputFileUploadsContextAndKeepsNeutralPrompt(t *testing.T) {
@@ -454,7 +476,7 @@ func TestChatCompletionsCurrentInputFileUploadsContextAndKeepsNeutralPrompt(t *t
t.Fatalf("expected 1 upload call, got %d", len(ds.uploadCalls))
}
upload := ds.uploadCalls[0]
if upload.Filename != "history.txt" {
if upload.Filename != "DS2API_HISTORY.txt" {
t.Fatalf("unexpected upload filename: %q", upload.Filename)
}
if upload.Purpose != "assistants" {
@@ -462,7 +484,10 @@ func TestChatCompletionsCurrentInputFileUploadsContextAndKeepsNeutralPrompt(t *t
}
historyText := string(upload.Data)
if strings.Contains(historyText, "[file content end]") || strings.Contains(historyText, "[file content begin]") || strings.Contains(historyText, "[file name]:") {
t.Fatalf("expected plain history transcript without wrapper tags, got %s", historyText)
t.Fatalf("expected history transcript without file wrapper tags, got %s", historyText)
}
if !strings.Contains(historyText, "# DS2API_HISTORY.txt") || !strings.Contains(historyText, "=== 1. SYSTEM ===") {
t.Fatalf("expected history transcript to use numbered sections, got %s", historyText)
}
if !strings.Contains(historyText, "latest user turn") {
t.Fatalf("expected full context to include latest turn, got %s", historyText)
@@ -471,8 +496,8 @@ func TestChatCompletionsCurrentInputFileUploadsContextAndKeepsNeutralPrompt(t *t
t.Fatal("expected completion payload to be captured")
}
promptText, _ := ds.completionReq["prompt"].(string)
if !strings.Contains(promptText, "Answer the latest user request directly.") {
t.Fatalf("expected neutral completion prompt, got %s", promptText)
if !strings.Contains(promptText, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected continuation-oriented prompt, got %s", promptText)
}
if strings.Contains(promptText, "first user turn") || strings.Contains(promptText, "latest user turn") {
t.Fatalf("expected prompt to hide original turns, got %s", promptText)
@@ -523,12 +548,16 @@ func TestResponsesCurrentInputFileUploadsContextAndKeepsNeutralPrompt(t *testing
if len(ds.uploadCalls) != 1 {
t.Fatalf("expected 1 upload call, got %d", len(ds.uploadCalls))
}
historyText := string(ds.uploadCalls[0].Data)
if !strings.Contains(historyText, "# DS2API_HISTORY.txt") || !strings.Contains(historyText, "=== 1. SYSTEM ===") {
t.Fatalf("expected uploaded history text to use numbered transcript format, got %s", historyText)
}
if ds.completionReq == nil {
t.Fatal("expected completion payload to be captured")
}
promptText, _ := ds.completionReq["prompt"].(string)
if !strings.Contains(promptText, "Answer the latest user request directly.") {
t.Fatalf("expected neutral completion prompt, got %s", promptText)
if !strings.Contains(promptText, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected continuation-oriented prompt, got %s", promptText)
}
if strings.Contains(promptText, "first user turn") || strings.Contains(promptText, "latest user turn") {
t.Fatalf("expected prompt to hide original turns, got %s", promptText)
@@ -669,11 +698,15 @@ func TestCurrentInputFileWorksAcrossAutoDeleteModes(t *testing.T) {
if len(ds.uploadCalls) != 1 {
t.Fatalf("expected current input upload for mode=%s, got %d", mode, len(ds.uploadCalls))
}
historyText := string(ds.uploadCalls[0].Data)
if !strings.Contains(historyText, "# DS2API_HISTORY.txt") || !strings.Contains(historyText, "=== 1. SYSTEM ===") {
t.Fatalf("expected uploaded history text to use numbered transcript format, got %s", historyText)
}
if ds.completionReq == nil {
t.Fatalf("expected completion payload for mode=%s", mode)
}
promptText, _ := ds.completionReq["prompt"].(string)
if !strings.Contains(promptText, "Answer the latest user request directly.") || strings.Contains(promptText, "first user turn") || strings.Contains(promptText, "latest user turn") {
if !strings.Contains(promptText, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") || strings.Contains(promptText, "first user turn") || strings.Contains(promptText, "latest user turn") {
t.Fatalf("unexpected prompt for mode=%s: %s", mode, promptText)
}
})

View File

@@ -222,7 +222,13 @@ func (h *Handler) consumeResponsesStreamAttempt(r *http.Request, resp *http.Resp
finalReason = "content_filter"
}
},
OnContextDone: func() {
streamRuntime.markContextCancelled()
},
})
if streamRuntime.finalErrorCode == string(streamengine.StopReasonContextCancelled) {
return true, false
}
terminalWritten := streamRuntime.finalize(finalReason, allowDeferEmpty && finalReason != "content_filter")
if terminalWritten {
return true, false
@@ -235,6 +241,10 @@ func logResponsesStreamTerminal(streamRuntime *responsesStreamRuntime, attempts
if attempts > 0 {
source = "synthetic_retry"
}
if streamRuntime.finalErrorCode == string(streamengine.StopReasonContextCancelled) {
config.Logger.Info("[openai_empty_retry] terminal cancelled", "surface", "responses", "stream", true, "retry_attempts", attempts, "error_code", streamRuntime.finalErrorCode)
return
}
if streamRuntime.failed {
config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "responses", "stream", true, "retry_attempts", attempts, "success_source", "none", "error_code", streamRuntime.finalErrorCode)
return
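A note on the pattern these hunks introduce: after the SSE body has been consumed, the request context is checked before any finalize/retry bookkeeping, so a client disconnect terminates the attempt without the empty-output retry path recording a spurious failure. A minimal sketch of that guard (the runtime type and drain callback are placeholders, not the exact ds2api signatures):

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// streamRuntime stands in for the handler's real stream state.
type streamRuntime struct{ cancelled bool }

func (s *streamRuntime) markContextCancelled() { s.cancelled = true }

// consumeAttempt mirrors the guard: drain the upstream SSE body, then treat a
// cancelled inbound context as terminal so no retry is ever attempted.
func consumeAttempt(r *http.Request, rt *streamRuntime, drain func(context.Context)) (terminal, retryable bool) {
	drain(r.Context())
	if r.Context().Err() != nil { // client went away mid-stream
		rt.markContextCancelled()
		return true, false // terminal, never retried
	}
	return false, true
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate a client disconnect
	req, _ := http.NewRequestWithContext(ctx, http.MethodPost, "/v1/responses", nil)
	rt := &streamRuntime{}
	terminal, retryable := consumeAttempt(req, rt, func(context.Context) {})
	fmt.Println(terminal, retryable, rt.cancelled) // true false true
}
```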

View File

@@ -0,0 +1,70 @@
package responses
import (
"context"
"io"
"net/http"
"net/http/httptest"
"strings"
"testing"
"ds2api/internal/promptcompat"
"ds2api/internal/stream"
)
func makeResponsesOpenAISSEHTTPResponse(lines ...string) *http.Response {
body := strings.Join(lines, "\n")
if !strings.HasSuffix(body, "\n") {
body += "\n"
}
return &http.Response{
StatusCode: http.StatusOK,
Header: make(http.Header),
Body: io.NopCloser(strings.NewReader(body)),
}
}
func TestConsumeResponsesStreamAttemptMarksContextCancelledState(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
cancel()
req := httptest.NewRequest(http.MethodPost, "/v1/responses", nil).WithContext(ctx)
rec := httptest.NewRecorder()
streamRuntime := newResponsesStreamRuntime(
rec,
http.NewResponseController(rec),
true,
"resp-cancelled",
"deepseek-v4-flash",
"prompt",
false,
false,
true,
nil,
nil,
false,
false,
promptcompat.DefaultToolChoicePolicy(),
"",
nil,
)
resp := makeResponsesOpenAISSEHTTPResponse(
`data: {"p":"response/content","v":"hello"}`,
`data: [DONE]`,
)
h := &Handler{}
terminalWritten, retryable := h.consumeResponsesStreamAttempt(req, resp, streamRuntime, "text", false, true)
if !terminalWritten || retryable {
t.Fatalf("expected cancelled attempt to terminate without retry, got terminalWritten=%v retryable=%v", terminalWritten, retryable)
}
if !streamRuntime.failed {
t.Fatalf("expected cancelled response stream to be marked failed")
}
if got, want := streamRuntime.finalErrorCode, string(stream.StopReasonContextCancelled); got != want {
t.Fatalf("expected cancelled final error code %q, got %q", want, got)
}
if streamRuntime.finalErrorMessage == "" {
t.Fatalf("expected cancelled final error message to be preserved")
}
}

View File

@@ -0,0 +1,39 @@
package responses
import (
"strings"
openaifmt "ds2api/internal/format/openai"
)
type responsesDeltaBatch struct {
runtime *responsesStreamRuntime
kind string
text strings.Builder
}
func (b *responsesDeltaBatch) append(kind, text string) {
if text == "" {
return
}
if b.kind != "" && b.kind != kind {
b.flush()
}
b.kind = kind
b.text.WriteString(text)
}
func (b *responsesDeltaBatch) flush() {
if b.kind == "" || b.text.Len() == 0 {
return
}
text := b.text.String()
switch b.kind {
case "reasoning":
b.runtime.sendEvent("response.reasoning.delta", openaifmt.BuildResponsesReasoningDeltaPayload(b.runtime.responseID, text))
case "text":
b.runtime.emitTextDelta(text)
}
b.kind = ""
b.text.Reset()
}
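The batch type above buffers consecutive deltas of one kind and emits a single event when the kind flips or the parse step ends, so reasoning and text deltas never interleave out of order. A standalone model of that flow (the runtime wiring is simplified to a callback):

```go
package main

import (
	"fmt"
	"strings"
)

// deltaBatch mimics responsesDeltaBatch: buffer per kind, flush on kind switch.
type deltaBatch struct {
	kind string
	text strings.Builder
	emit func(kind, text string)
}

func (b *deltaBatch) append(kind, text string) {
	if text == "" {
		return
	}
	if b.kind != "" && b.kind != kind {
		b.flush() // kind changed: emit the buffered run first
	}
	b.kind = kind
	b.text.WriteString(text)
}

func (b *deltaBatch) flush() {
	if b.kind == "" || b.text.Len() == 0 {
		return
	}
	b.emit(b.kind, b.text.String())
	b.kind = ""
	b.text.Reset()
}

func main() {
	b := &deltaBatch{emit: func(kind, text string) { fmt.Printf("%s: %q\n", kind, text) }}
	b.append("reasoning", "step one, ")
	b.append("reasoning", "step two")
	b.append("text", "final answer") // implicitly flushes the reasoning run
	b.flush()
	// Output:
	// reasoning: "step one, step two"
	// text: "final answer"
}
```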

View File

@@ -139,6 +139,13 @@ func (s *responsesStreamRuntime) failResponse(status int, message, code string)
s.sendDone()
}
func (s *responsesStreamRuntime) markContextCancelled() {
s.failed = true
s.finalErrorStatus = 499
s.finalErrorMessage = "request context cancelled"
s.finalErrorCode = string(streamengine.StopReasonContextCancelled)
}
func (s *responsesStreamRuntime) finalize(finishReason string, deferEmptyOutput bool) bool {
s.failed = false
s.finalErrorStatus = 0
@@ -222,6 +229,7 @@ func (s *responsesStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Pa
}
contentSeen := false
batch := responsesDeltaBatch{runtime: s}
for _, p := range parsed.ToolDetectionThinkingParts {
trimmed := sse.TrimContinuationOverlap(s.toolDetectionThinking.String(), p.Text)
if trimmed != "" {
@@ -247,7 +255,7 @@ func (s *responsesStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Pa
continue
}
s.thinking.WriteString(trimmed)
s.sendEvent("response.reasoning.delta", openaifmt.BuildResponsesReasoningDeltaPayload(s.responseID, trimmed))
batch.append("reasoning", trimmed)
continue
}
@@ -269,11 +277,13 @@ func (s *responsesStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Pa
if trimmed == "" {
continue
}
batch.append("text", trimmed)
continue
}
batch.flush()
s.processToolStreamEvents(toolstream.ProcessChunk(&s.sieve, rawTrimmed, s.toolNames), true, true)
}
batch.flush()
return streamengine.ParsedDecision{ContentSeen: contentSeen}
}

View File

@@ -109,6 +109,48 @@ func TestHandleResponsesStreamOutputTextDeltaCarriesItemIndexes(t *testing.T) {
}
}
func TestHandleResponsesStreamCoalescesSmallOutputTextDeltas(t *testing.T) {
h := &Handler{}
req := httptest.NewRequest(http.MethodPost, "/v1/responses", nil)
rec := httptest.NewRecorder()
var streamBody strings.Builder
for i := 0; i < 100; i++ {
b, _ := json.Marshal(map[string]any{
"p": "response/content",
"v": "字",
})
streamBody.WriteString("data: ")
streamBody.WriteString(string(b))
streamBody.WriteString("\n")
}
streamBody.WriteString("data: [DONE]\n")
resp := &http.Response{
StatusCode: http.StatusOK,
Body: io.NopCloser(strings.NewReader(streamBody.String())),
}
h.handleResponsesStream(rec, req, resp, "owner-a", "resp_coalesce", "deepseek-v4-flash", "prompt", 0, false, false, nil, nil, promptcompat.DefaultToolChoicePolicy(), "")
payloads := extractSSEEventPayloads(rec.Body.String(), "response.output_text.delta")
if len(payloads) == 0 {
t.Fatalf("expected response.output_text.delta payloads, body=%s", rec.Body.String())
}
var content strings.Builder
for _, payload := range payloads {
content.WriteString(asString(payload["delta"]))
}
if got, want := content.String(), strings.Repeat("字", 100); got != want {
t.Fatalf("coalesced response content mismatch: got %q want %q body=%s", got, want, rec.Body.String())
}
if len(payloads) >= 100 {
t.Fatalf("expected coalescing to reduce 100 tiny text deltas, got %d body=%s", len(payloads), rec.Body.String())
}
if !strings.Contains(rec.Body.String(), "event: response.completed") {
t.Fatalf("expected completed event, body=%s", rec.Body.String())
}
}
func TestHandleResponsesStreamEmitsDistinctToolCallIDsAcrossSeparateToolBlocks(t *testing.T) {
h := &Handler{}
req := httptest.NewRequest(http.MethodPost, "/v1/responses", nil)

View File

@@ -70,7 +70,6 @@ function finalizeThinkingParts(parts, thinkingEnabled, newType) {
}
if (!thinkingEnabled) {
finalParts = dropThinkingParts(finalParts);
}
return { parts: finalParts, newType: finalType };
}
@@ -213,6 +212,12 @@ function parseChunkForContent(chunk, thinkingEnabled, currentType, stripReferenc
}
}
if (pathValue === 'response/content') {
newType = 'text';
} else if (pathValue === 'response/thinking_content' && (!thinkingEnabled || newType !== 'text')) {
newType = 'thinking';
}
let partType = 'text';
if (pathValue === 'response/thinking_content') {
if (!thinkingEnabled) {
@@ -226,8 +231,8 @@ function parseChunkForContent(chunk, thinkingEnabled, currentType, stripReferenc
partType = 'text';
} else if (pathValue.includes('response/fragments') && pathValue.includes('/content')) {
partType = newType;
} else if (!pathValue) {
partType = newType || 'text';
}
const val = chunk.v;

View File

@@ -1,5 +1,8 @@
'use strict';
const MIN_DELTA_FLUSH_CHARS = 160;
const MAX_DELTA_FLUSH_WAIT_MS = 80;
function createChatCompletionEmitter({ res, sessionID, created, model, isClosed }) {
let firstChunkSent = false;
@@ -34,6 +37,62 @@ function createChatCompletionEmitter({ res, sessionID, created, model, isClosed
};
}
function createDeltaCoalescer({ sendDeltaFrame, minFlushChars = MIN_DELTA_FLUSH_CHARS, maxFlushWaitMS = MAX_DELTA_FLUSH_WAIT_MS }) {
let pendingField = '';
let pendingText = '';
let flushTimer = null;
const clearFlushTimer = () => {
if (flushTimer) {
clearTimeout(flushTimer);
flushTimer = null;
}
};
const flush = () => {
clearFlushTimer();
if (!pendingField || !pendingText) {
return;
}
const delta = { [pendingField]: pendingText };
pendingField = '';
pendingText = '';
sendDeltaFrame(delta);
};
const scheduleFlush = () => {
if (flushTimer || maxFlushWaitMS <= 0) {
return;
}
flushTimer = setTimeout(flush, maxFlushWaitMS);
if (typeof flushTimer.unref === 'function') {
flushTimer.unref();
}
};
const append = (field, text) => {
if (!field || !text) {
return;
}
if (pendingField && pendingField !== field) {
flush();
}
pendingField = field;
pendingText += text;
if ([...pendingText].length >= minFlushChars) {
flush();
return;
}
scheduleFlush();
};
return {
append,
flush,
};
}
module.exports = {
createChatCompletionEmitter,
createDeltaCoalescer,
};
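The coalescer flushes on whichever comes first: a buffered run of at least 160 characters (counted as code points via the spread), or an 80 ms timer; tool-call frames bypass it entirely. The Go SSE pump in this same change set uses matching constants (minFlushChars = 160, maxFlushWait = 80ms). A hedged Go sketch of the size-or-age rule in isolation:

```go
package main

import (
	"fmt"
	"time"
)

const (
	minFlushChars = 160                   // flush once this many runes are buffered
	maxFlushWait  = 80 * time.Millisecond // or once the oldest buffered rune is this stale
)

// shouldFlush applies the same size-or-age rule the coalescer encodes with a timer.
func shouldFlush(buffered []rune, oldest, now time.Time) bool {
	if len(buffered) == 0 {
		return false
	}
	return len(buffered) >= minFlushChars || now.Sub(oldest) >= maxFlushWait
}

func main() {
	start := time.Now()
	small := []rune("字")
	fmt.Println(shouldFlush(small, start, start.Add(10*time.Millisecond))) // false: small and fresh
	fmt.Println(shouldFlush(small, start, start.Add(90*time.Millisecond))) // true: timer expired
}
```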

View File

@@ -20,7 +20,7 @@ const {
boolDefaultTrue,
resetStreamToolCallState,
} = require('./toolcall_policy');
const { createChatCompletionEmitter } = require('./stream_emitter');
const { createChatCompletionEmitter, createDeltaCoalescer } = require('./stream_emitter');
const {
asString,
isAbortError,
@@ -191,6 +191,7 @@ async function handleVercelStream(req, res, rawBody, payload) {
model,
isClosed: () => clientClosed,
});
const deltaCoalescer = createDeltaCoalescer({ sendDeltaFrame });
const finish = async (reason, options = {}) => {
if (ended) {
@@ -201,6 +202,7 @@ async function handleVercelStream(req, res, rawBody, payload) {
await releaseLease();
return true;
}
deltaCoalescer.flush();
const detected = parseStandaloneToolCalls(outputText, toolNames);
if (detected.length > 0 && !toolCallsDoneEmitted) {
toolCallsEmitted = true;
@@ -210,6 +212,7 @@ async function handleVercelStream(req, res, rawBody, payload) {
const tailEvents = flushToolSieve(toolSieveState, toolNames);
for (const evt of tailEvents) {
if (evt.type === 'tool_calls' && Array.isArray(evt.calls) && evt.calls.length > 0) {
deltaCoalescer.flush();
toolCallsEmitted = true;
toolCallsDoneEmitted = true;
sendDeltaFrame({ tool_calls: formatOpenAIStreamToolCalls(evt.calls, streamToolCallIDs, payload.tools) });
@@ -217,9 +220,10 @@ async function handleVercelStream(req, res, rawBody, payload) {
continue;
}
if (evt.text) {
deltaCoalescer.append('content', evt.text);
}
}
deltaCoalescer.flush();
}
if (detected.length > 0 || toolCallsEmitted) {
reason = 'tool_calls';
@@ -327,7 +331,7 @@ async function handleVercelStream(req, res, rawBody, payload) {
continue;
}
thinkingText += trimmed;
deltaCoalescer.append('reasoning_content', trimmed);
}
} else {
const trimmed = trimContinuationOverlap(outputText, p.text);
@@ -339,7 +343,7 @@ async function handleVercelStream(req, res, rawBody, payload) {
}
outputText += trimmed;
if (!toolSieveEnabled) {
deltaCoalescer.append('content', trimmed);
continue;
}
const events = processToolSieveChunk(toolSieveState, trimmed, toolNames);
@@ -352,19 +356,21 @@ async function handleVercelStream(req, res, rawBody, payload) {
const formatted = formatIncrementalToolCallDeltas(filtered, streamToolCallIDs);
if (formatted.length > 0) {
toolCallsEmitted = true;
deltaCoalescer.flush();
sendDeltaFrame({ tool_calls: formatted });
}
continue;
}
if (evt.type === 'tool_calls') {
toolCallsEmitted = true;
toolCallsDoneEmitted = true;
deltaCoalescer.flush();
sendDeltaFrame({ tool_calls: formatOpenAIStreamToolCalls(evt.calls, streamToolCallIDs, payload.tools) });
resetStreamToolCallState(streamToolCallIDs, streamToolNames);
continue;
}
if (evt.text) {
deltaCoalescer.append('content', evt.text);
}
}
}
@@ -510,32 +516,51 @@ function observeContinueState(state, chunk) {
if (topID > 0) {
state.responseMessageID = topID;
}
observeContinueDirectPatch(state, chunk.p, chunk.v);
if (chunk.p === 'response') {
observeContinueBatchPatches(state, 'response', chunk.v);
} else {
observeContinueBatchPatches(state, '', chunk.v);
}
const response = chunk.v && typeof chunk.v === 'object' ? chunk.v.response : null;
observeContinueResponseObject(state, response);
const messageResponse = chunk.message && typeof chunk.message === 'object' && chunk.message.response;
observeContinueResponseObject(state, messageResponse);
}
function observeContinueDirectPatch(state, path, value) {
if (!state) {
return;
}
switch (asString(path).trim().replace(/^\/+|\/+$/g, '')) {
case 'response/status':
case 'status':
case 'response/quasi_status':
case 'quasi_status':
setContinueStatus(state, asString(value));
break;
case 'response/auto_continue':
case 'auto_continue':
if (value === true) {
state.lastStatus = 'AUTO_CONTINUE';
}
break;
default:
break;
}
}
function observeContinueResponseObject(state, response) {
if (!state || !response || typeof response !== 'object') {
return;
}
const id = numberValue(response.message_id);
if (id > 0) {
state.responseMessageID = id;
}
setContinueStatus(state, asString(response.status));
if (response.auto_continue === true) {
state.lastStatus = 'AUTO_CONTINUE';
}
}
@@ -563,6 +588,12 @@ function observeContinueBatchPatches(state, parentPath, raw) {
case 'quasi_status':
setContinueStatus(state, asString(patch.v));
break;
case 'response/auto_continue':
case 'auto_continue':
if (patch.v === true) {
state.lastStatus = 'AUTO_CONTINUE';
}
break;
default:
break;
}

View File

@@ -248,6 +248,9 @@ function replaceDSMLToolMarkupOutsideIgnored(text) {
if (tag) {
if (tag.dsmlLike) {
out += `<${tag.closing ? '/' : ''}${tag.name}${raw.slice(tag.nameEnd, tag.end + 1)}`;
if (raw[tag.end] !== '>') {
out += '>';
}
} else {
out += raw.slice(tag.start, tag.end + 1);
}
@@ -424,31 +427,42 @@ function scanToolMarkupTagAt(text, start) {
}
const lower = raw.toLowerCase();
let i = start + 1;
while (i < raw.length && raw[i] === '<') {
i += 1;
}
const closing = raw[i] === '/';
if (closing) {
i += 1;
}
const prefix = consumeToolMarkupNamePrefix(raw, lower, i);
i = prefix.next;
const dsmlLike = prefix.dsmlLike;
const { name, len } = matchToolMarkupName(lower, i);
if (!name) {
return null;
}
const originalNameEnd = i + len;
let nameEnd = originalNameEnd;
while (nameEnd < raw.length && isToolMarkupPipe(raw[nameEnd])) {
nameEnd += 1;
}
const hasTrailingPipe = nameEnd > originalNameEnd;
if (!hasXmlTagBoundary(raw, nameEnd)) {
return null;
}
let end = findXmlTagEnd(raw, nameEnd);
if (end < 0) {
if (!hasTrailingPipe) {
return null;
}
end = nameEnd - 1;
}
if (hasTrailingPipe) {
const nextLT = raw.indexOf('<', nameEnd);
if (nextLT >= 0 && end >= nextLT) {
end = nameEnd - 1;
}
}
@@ -520,37 +534,94 @@ function findPartialToolMarkupStart(text) {
if (lastLT < 0) {
return -1;
}
const start = includeDuplicateLeadingLessThan(raw, lastLT);
const tail = raw.slice(start);
if (tail.includes('>')) {
return -1;
}
return isPartialToolMarkupTagPrefix(tail) ? start : -1;
}
function includeDuplicateLeadingLessThan(text, idx) {
let out = idx;
while (out > 0 && text[out - 1] === '<') {
out -= 1;
}
return out;
}
function isToolMarkupPipe(ch) {
return ch === '|' || ch === '｜';
}
function isPartialToolMarkupTagPrefix(text) {
const raw = toStringSafe(text);
if (!raw || raw[0] !== '<' || raw.includes('>')) {
return false;
}
const lower = raw.toLowerCase();
let i = 1;
while (i < raw.length && raw[i] === '<') {
i += 1;
}
if (i >= raw.length) {
return true;
}
if (raw[i] === '/') {
i += 1;
}
while (i <= raw.length) {
if (i === raw.length) {
return true;
}
if (hasToolMarkupNamePrefix(lower.slice(i))) {
return true;
}
if ('dsml'.startsWith(lower.slice(i))) {
return true;
}
const next = consumeToolMarkupNamePrefixOnce(raw, lower, i);
if (!next.ok) {
return false;
}
i = next.next;
}
return false;
}
function consumeToolMarkupNamePrefix(raw, lower, idx) {
let next = idx;
let dsmlLike = false;
while (true) {
const consumed = consumeToolMarkupNamePrefixOnce(raw, lower, next);
if (!consumed.ok) {
return { next, dsmlLike };
}
next = consumed.next;
dsmlLike = true;
}
}
function consumeToolMarkupNamePrefixOnce(raw, lower, idx) {
if (idx < raw.length && isToolMarkupPipe(raw[idx])) {
return { next: idx + 1, ok: true };
}
if (idx < raw.length && [' ', '\t', '\r', '\n'].includes(raw[idx])) {
return { next: idx + 1, ok: true };
}
if (lower.startsWith('dsml', idx)) {
return { next: idx + 'dsml'.length, ok: true };
}
return { next: idx, ok: false };
}
function hasToolMarkupNamePrefix(lowerTail) {
for (const name of TOOL_MARKUP_NAMES) {
if (lowerTail.startsWith(name) || name.startsWith(lowerTail)) {
return true;
}
}
return false;
}
function matchToolMarkupName(lower, start) {

View File

@@ -1,55 +0,0 @@
'use strict';
const XML_TOOL_SEGMENT_TAGS = [
'<|dsml|tool_calls>', '<|dsml|tool_calls\n', '<|dsml|tool_calls ',
'<dsml|tool_calls>', '<dsml|tool_calls\n', '<dsml|tool_calls ',
'<|dsml|invoke ', '<|dsml|invoke\n', '<|dsml|invoke\t', '<|dsml|invoke\r',
'<|dsmltool_calls>', '<|dsmltool_calls\n', '<|dsmltool_calls ',
'<|dsmlinvoke ', '<|dsmlinvoke\n', '<|dsmlinvoke\t', '<|dsmlinvoke\r',
'<|dsml tool_calls>', '<|dsml tool_calls\n', '<|dsml tool_calls ',
'<|dsml invoke ', '<|dsml invoke\n', '<|dsml invoke\t', '<|dsml invoke\r',
'<dsml|tool_calls>', '<dsml|tool_calls\n', '<dsml|tool_calls ',
'<dsml|invoke ', '<dsml|invoke\n', '<dsml|invoke\t', '<dsml|invoke\r',
'<dsmltool_calls>', '<dsmltool_calls\n', '<dsmltool_calls ',
'<dsmlinvoke ', '<dsmlinvoke\n', '<dsmlinvoke\t', '<dsmlinvoke\r',
'<dsml tool_calls>', '<dsml tool_calls\n', '<dsml tool_calls ',
'<dsml invoke ', '<dsml invoke\n', '<dsml invoke\t', '<dsml invoke\r',
'<tool_calls>', '<tool_calls\n', '<tool_calls ',
'<invoke ', '<invoke\n', '<invoke\t', '<invoke\r',
'<|tool_calls>', '<|tool_calls\n', '<|tool_calls ',
'<|invoke ', '<|invoke\n', '<|invoke\t', '<|invoke\r',
'<tool_calls>', '<tool_calls\n', '<tool_calls ',
'<invoke ', '<invoke\n', '<invoke\t', '<invoke\r',
];
const XML_TOOL_OPENING_TAGS = [
'<|dsml|tool_calls',
'<dsml|tool_calls',
'<|dsmltool_calls',
'<|dsml tool_calls',
'<dsml|tool_calls',
'<dsmltool_calls',
'<dsml tool_calls',
'<tool_calls',
'<|tool_calls',
'<tool_calls',
];
const XML_TOOL_CLOSING_TAGS = [
'</|dsml|tool_calls>',
'</dsml|tool_calls>',
'</|dsmltool_calls>',
'</|dsml tool_calls>',
'</dsml|tool_calls>',
'</dsmltool_calls>',
'</dsml tool_calls>',
'</tool_calls>',
'</|tool_calls>',
'</tool_calls>',
];
module.exports = {
XML_TOOL_SEGMENT_TAGS,
XML_TOOL_OPENING_TAGS,
XML_TOOL_CLOSING_TAGS,
};

View File

@@ -1,35 +1,108 @@
package promptcompat
import (
"fmt"
"strings"
"ds2api/internal/prompt"
)
const CurrentInputContextFilename = "history.txt"
const CurrentInputContextFilename = "DS2API_HISTORY.txt"
const historyTranscriptTitle = "# DS2API_HISTORY.txt"
const historyTranscriptSummary = "Prior conversation history and tool progress."
func BuildOpenAIHistoryTranscript(messages []any) string {
return buildOpenAIHistoryTranscript(messages)
}
func BuildOpenAICurrentUserInputTranscript(text string) string {
if strings.TrimSpace(text) == "" {
return ""
}
return buildOpenAIHistoryTranscript([]any{
map[string]any{"role": "user", "content": text},
})
}
func BuildOpenAICurrentInputContextTranscript(messages []any) string {
return buildOpenAIHistoryTranscript(messages)
}
func buildOpenAIHistoryTranscript(messages []any) string {
if len(messages) == 0 {
return ""
}
var b strings.Builder
b.WriteString(historyTranscriptTitle)
b.WriteString("\n")
b.WriteString(historyTranscriptSummary)
b.WriteString("\n\n")
entry := 0
for _, raw := range messages {
msg, ok := raw.(map[string]any)
if !ok {
continue
}
role := normalizeOpenAIRoleForPrompt(strings.ToLower(strings.TrimSpace(asString(msg["role"]))))
content := strings.TrimSpace(buildOpenAIHistoryEntry(role, msg))
if content == "" {
continue
}
entry++
fmt.Fprintf(&b, "=== %d. %s ===\n%s\n\n", entry, strings.ToUpper(roleLabelForHistory(role)), content)
}
transcript := strings.TrimSpace(b.String())
if transcript == "" {
return ""
}
return transcript + "\n"
}
func buildOpenAIHistoryEntry(role string, msg map[string]any) string {
switch role {
case "assistant":
return strings.TrimSpace(buildAssistantContentForPrompt(msg))
case "tool", "function":
return strings.TrimSpace(buildToolHistoryContent(msg))
case "system", "user":
return strings.TrimSpace(NormalizeOpenAIContentForPrompt(msg["content"]))
default:
return strings.TrimSpace(NormalizeOpenAIContentForPrompt(msg["content"]))
}
}
func buildToolHistoryContent(msg map[string]any) string {
content := strings.TrimSpace(NormalizeOpenAIContentForPrompt(msg["content"]))
parts := make([]string, 0, 2)
if name := strings.TrimSpace(asString(msg["name"])); name != "" {
parts = append(parts, "name="+name)
}
if callID := strings.TrimSpace(asString(msg["tool_call_id"])); callID != "" {
parts = append(parts, "tool_call_id="+callID)
}
header := ""
if len(parts) > 0 {
header = "[" + strings.Join(parts, " ") + "]"
}
switch {
case header != "" && content != "":
return header + "\n" + content
case header != "":
return header
default:
return content
}
}
func roleLabelForHistory(role string) string {
role = strings.ToLower(strings.TrimSpace(role))
switch role {
case "function":
return "tool"
case "":
return "unknown"
default:
return role
}
}
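For a concrete picture of what the builder emits: a short system/user/tool exchange becomes the title line, the fixed summary, then numbered, upper-cased role sections separated by blank lines, with tool metadata folded into a bracketed header. Roughly (illustrative turn contents):

```
# DS2API_HISTORY.txt
Prior conversation history and tool progress.

=== 1. SYSTEM ===
system instructions

=== 2. USER ===
first user turn

=== 3. TOOL ===
[name=search tool_call_id=call_1]
tool result
```

The trailing newline added by `return transcript + "\n"` keeps the uploaded file POSIX-terminated.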

View File

@@ -88,6 +88,58 @@ func TestBuildOpenAIFinalPrompt_VercelPreparePathKeepsFinalAnswerInstruction(t *
}
}
func TestBuildOpenAIFinalPromptReadLikeToolIncludesCacheGuard(t *testing.T) {
messages := []any{
map[string]any{"role": "user", "content": "请读取文件"},
}
tools := []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "read_file",
"description": "Read a file",
"parameters": map[string]any{
"type": "object",
},
},
},
}
finalPrompt, _ := buildOpenAIFinalPrompt(messages, tools, "", false)
if !strings.Contains(finalPrompt, "Read-tool cache guard") {
t.Fatalf("read-like tool prompt missing cache guard: %q", finalPrompt)
}
if !strings.Contains(finalPrompt, "provides no file body") {
t.Fatalf("read-like tool prompt missing no-body handling: %q", finalPrompt)
}
if !strings.Contains(finalPrompt, "Do not repeatedly call the same read request") {
t.Fatalf("read-like tool prompt missing loop guard: %q", finalPrompt)
}
}
func TestBuildOpenAIFinalPromptNonReadToolOmitsCacheGuard(t *testing.T) {
messages := []any{
map[string]any{"role": "user", "content": "搜索一下"},
}
tools := []any{
map[string]any{
"type": "function",
"function": map[string]any{
"name": "search",
"description": "Search docs",
"parameters": map[string]any{
"type": "object",
},
},
},
}
finalPrompt, _ := buildOpenAIFinalPrompt(messages, tools, "", false)
if strings.Contains(finalPrompt, "Read-tool cache guard") {
t.Fatalf("non-read tool prompt should not include read cache guard: %q", finalPrompt)
}
}
func TestBuildOpenAIFinalPromptWithThinkingKeepsPromptUnchanged(t *testing.T) {
messages := []any{
map[string]any{"role": "user", "content": "继续回答上一个问题"},

View File

@@ -4,6 +4,7 @@ import (
"encoding/json"
"fmt"
"strings"
"unicode"
"ds2api/internal/toolcall"
)
@@ -46,6 +47,9 @@ func injectToolPrompt(messages []map[string]any, tools []any, policy ToolChoiceP
return messages, names
}
toolPrompt := "You have access to these tools:\n\n" + strings.Join(toolSchemas, "\n\n") + "\n\n" + toolcall.BuildToolCallInstructions(names)
if hasReadLikeTool(names) {
toolPrompt += "\n\nRead-tool cache guard: If a Read/read_file-style tool result says the file is unchanged, already available in history, should be referenced from previous context, or otherwise provides no file body, treat that result as missing content. Do not repeatedly call the same read request for that missing body. Request a full-content read if the tool supports it, or tell the user that the file contents need to be provided again."
}
if policy.Mode == ToolChoiceRequired {
toolPrompt += "\n7) For this response, you MUST call at least one tool from the allowed list."
}
@@ -64,3 +68,23 @@ func injectToolPrompt(messages []map[string]any, tools []any, policy ToolChoiceP
messages = append([]map[string]any{{"role": "system", "content": toolPrompt}}, messages...)
return messages, names
}
func hasReadLikeTool(names []string) bool {
for _, name := range names {
switch normalizeToolNameForGuard(name) {
case "read", "readfile":
return true
}
}
return false
}
func normalizeToolNameForGuard(name string) string {
var b strings.Builder
for _, r := range strings.ToLower(strings.TrimSpace(name)) {
if unicode.IsLetter(r) || unicode.IsDigit(r) {
b.WriteRune(r)
}
}
return b.String()
}
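Since normalizeToolNameForGuard lowercases and strips every non-alphanumeric rune, variants like Read, read_file, Read-File, and ReadFile all collapse to read or readfile and enable the cache-guard text, while unrelated tools do not. A quick standalone check of that normalization (tool names here are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize matches normalizeToolNameForGuard: lowercase, keep letters and digits.
func normalize(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(strings.TrimSpace(name)) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	for _, n := range []string{"Read", "read_file", "Read-File", "search"} {
		fmt.Printf("%-10s -> %q\n", n, normalize(n))
	}
	// Read       -> "read"       (guard applies)
	// read_file  -> "readfile"   (guard applies)
	// Read-File  -> "readfile"   (guard applies)
	// search     -> "search"     (no guard)
}
```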

View File

@@ -41,6 +41,15 @@ func TestCollectStreamTextOnly(t *testing.T) {
}
}
func TestCollectStreamHandlesLongSingleSSELine(t *testing.T) {
payload := strings.Repeat("x", 2*1024*1024+4096)
resp := makeHTTPResponse(makeLargeContentSSEBody(t, payload))
result := CollectStream(resp, false, true)
if result.Text != payload {
t.Fatalf("long SSE line payload mismatch: got len=%d want len=%d", len(result.Text), len(payload))
}
}
func TestCollectStreamThinkingAndText(t *testing.T) {
resp := makeHTTPResponse(
"data: {\"p\":\"response/thinking_content\",\"v\":\"Thinking...\"}\n" +

View File

@@ -92,6 +92,7 @@ func ParseSSEChunkForContentDetailed(chunk map[string]any, thinkingEnabled bool,
}
newType := currentFragmentType
parts := make([]ContentPart, 0, 8)
updateTypeFromExplicitPath(path, thinkingEnabled, &newType)
collectDirectFragments(path, chunk, v, &newType, &parts)
updateTypeFromNestedResponse(path, v, &newType)
partType := resolvePartType(path, thinkingEnabled, newType)
@@ -107,11 +108,24 @@ func ParseSSEChunkForContentDetailed(chunk map[string]any, thinkingEnabled bool,
detectionThinkingParts := selectThinkingParts(parts)
if !thinkingEnabled {
parts = dropThinkingParts(parts)
newType = "text"
}
return parts, detectionThinkingParts, false, newType
}
func updateTypeFromExplicitPath(path string, thinkingEnabled bool, newType *string) {
if newType == nil {
return
}
switch path {
case "response/content":
*newType = "text"
case "response/thinking_content":
if !thinkingEnabled || *newType != "text" {
*newType = "thinking"
}
}
}
func selectThinkingParts(parts []ContentPart) []ContentPart {
if len(parts) == 0 {
return nil
@@ -206,8 +220,11 @@ func resolvePartType(path string, thinkingEnabled bool, newType string) string {
return "text"
case strings.Contains(path, "response/fragments") && strings.Contains(path, "/content"):
return newType
case path == "" && thinkingEnabled:
return newType
case path == "":
if newType != "" {
return newType
}
return "text"
default:
return "text"
}

View File

@@ -88,6 +88,71 @@ func TestParseSSEChunkForContentAfterAppendUsesUpdatedType(t *testing.T) {
}
}
func TestParseSSEChunkForContentThinkingDisabledKeepsHiddenFragmentState(t *testing.T) {
chunk1 := map[string]any{
"p": "response/fragments",
"o": "APPEND",
"v": []any{
map[string]any{"type": "THINK", "content": "我们"},
},
}
parts1, finished1, nextType1 := ParseSSEChunkForContent(chunk1, false, "text")
if finished1 {
t.Fatal("expected first chunk unfinished")
}
if nextType1 != "thinking" {
t.Fatalf("expected hidden THINK fragment to keep next type thinking, got %q", nextType1)
}
if len(parts1) != 0 {
t.Fatalf("expected hidden thinking to be dropped, got %#v", parts1)
}
chunk2 := map[string]any{
"p": "response/fragments/-1/content",
"v": "被",
}
parts2, finished2, nextType2 := ParseSSEChunkForContent(chunk2, false, nextType1)
if finished2 {
t.Fatal("expected second chunk unfinished")
}
if nextType2 != "thinking" {
t.Fatalf("expected hidden continuation to keep next type thinking, got %q", nextType2)
}
if len(parts2) != 0 {
t.Fatalf("expected hidden continuation to be dropped, got %#v", parts2)
}
chunk3 := map[string]any{"v": "要求"}
parts3, finished3, nextType3 := ParseSSEChunkForContent(chunk3, false, nextType2)
if finished3 {
t.Fatal("expected third chunk unfinished")
}
if nextType3 != "thinking" {
t.Fatalf("expected pathless hidden continuation to keep next type thinking, got %q", nextType3)
}
if len(parts3) != 0 {
t.Fatalf("expected pathless hidden continuation to be dropped, got %#v", parts3)
}
chunk4 := map[string]any{
"p": "response/fragments",
"o": "APPEND",
"v": []any{
map[string]any{"type": "RESPONSE", "content": "答"},
},
}
parts4, finished4, nextType4 := ParseSSEChunkForContent(chunk4, false, nextType3)
if finished4 {
t.Fatal("expected fourth chunk unfinished")
}
if nextType4 != "text" {
t.Fatalf("expected RESPONSE fragment to switch next type text, got %q", nextType4)
}
if len(parts4) != 1 || parts4[0].Type != "text" || parts4[0].Text != "答" {
t.Fatalf("expected visible response text, got %#v", parts4)
}
}
func TestParseSSEChunkForContentAutoTransitionsThinkClose(t *testing.T) {
chunk := map[string]any{
"p": "response/thinking_content",

View File

@@ -9,8 +9,7 @@ import (
const (
parsedLineBufferSize = 128
lineReaderBufferSize = 64 * 1024
minFlushChars = 160
maxFlushWait = 80 * time.Millisecond
)
@@ -29,8 +28,8 @@ func StartParsedLinePump(ctx context.Context, body io.Reader, thinkingEnabled bo
eof bool
}
lineCh := make(chan scanItem, 1)
stopReader := make(chan struct{})
defer close(stopReader)
go func() {
sendScanItem := func(item scanItem) bool {
select {
@@ -38,20 +37,28 @@ func StartParsedLinePump(ctx context.Context, body io.Reader, thinkingEnabled bo
return true
case <-ctx.Done():
return false
case <-stopReader:
return false
}
}
defer close(lineCh)
reader := bufio.NewReaderSize(body, lineReaderBufferSize)
for {
line, err := reader.ReadBytes('\n')
if len(line) > 0 {
line = append([]byte{}, line...)
if !sendScanItem(scanItem{line: line}) {
return
}
}
if err != nil {
if err == io.EOF {
err = nil
}
_ = sendScanItem(scanItem{err: err, eof: true})
return
}
}
}()
ticker := time.NewTicker(maxFlushWait)
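Context for the swap above: bufio.Scanner enforces a hard maximum token size (previously capped at 2 MiB here) and aborts with bufio.ErrTooLong on longer lines, whereas ReadBytes grows its buffer and returns the whole line, newline included. A minimal demonstration of the failure mode being removed:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	// One SSE data line larger than the old 2 MiB scanner cap.
	line := "data: " + strings.Repeat("x", 3*1024*1024) + "\n"

	sc := bufio.NewScanner(strings.NewReader(line))
	sc.Buffer(make([]byte, 0, 64*1024), 2*1024*1024)
	fmt.Println(sc.Scan(), sc.Err()) // false bufio.Scanner: token too long

	r := bufio.NewReaderSize(strings.NewReader(line), 64*1024)
	b, err := r.ReadBytes('\n') // no fixed line cap; includes the trailing '\n'
	fmt.Println(len(b), err)    // 3145735 <nil>
}
```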

View File

@@ -158,11 +158,13 @@ func TestStartParsedLinePumpNonSSELines(t *testing.T) {
func TestStartParsedLinePumpThinkingDisabled(t *testing.T) {
body := strings.NewReader(
"data: {\"p\":\"response/thinking_content\",\"v\":\"thought\"}\n" +
"data: {\"p\":\"response/fragments\",\"o\":\"APPEND\",\"v\":[{\"type\":\"THINK\",\"content\":\"思\"}]}\n" +
"data: {\"p\":\"response/fragments/-1/content\",\"v\":\"考\"}\n" +
"data: {\"v\":\"隐藏\"}\n" +
"data: {\"p\":\"response/fragments\",\"o\":\"APPEND\",\"v\":[{\"type\":\"RESPONSE\",\"content\":\"答\"}]}\n" +
"data: {\"p\":\"response/content\",\"v\":\"response\"}\n" +
"data: [DONE]\n",
)
// With thinking disabled, hidden thinking fragments must be dropped and only text parts emitted
results, done := StartParsedLinePump(context.Background(), body, false, "text")
var parts []ContentPart
@@ -171,8 +173,15 @@ func TestStartParsedLinePumpThinkingDisabled(t *testing.T) {
}
<-done
got := strings.Builder{}
for _, p := range parts {
if p.Type != "text" {
t.Fatalf("expected only text parts with thinking disabled, got %#v", parts)
}
got.WriteString(p.Text)
}
if got.String() != "答response" {
t.Fatalf("expected hidden thinking to be dropped, got %q from %#v", got.String(), parts)
}
}

View File

@@ -2,10 +2,23 @@ package sse
import (
"context"
"encoding/json"
"strings"
"testing"
)
func makeLargeContentSSEBody(t *testing.T, payload string) string {
t.Helper()
line, err := json.Marshal(map[string]any{
"p": "response/content",
"v": payload,
})
if err != nil {
t.Fatalf("marshal SSE line failed: %v", err)
}
return "data: " + string(line) + "\n" + "data: [DONE]\n"
}
func TestStartParsedLinePumpParsesAndStops(t *testing.T) {
body := strings.NewReader("data: {\"p\":\"response/content\",\"v\":\"hi\"}\n\ndata: [DONE]\n")
results, done := StartParsedLinePump(context.Background(), body, false, "text")
@@ -28,3 +41,28 @@ func TestStartParsedLinePumpParsesAndStops(t *testing.T) {
t.Fatalf("expected last line to stop stream, got parsed=%v stop=%v", last.Parsed, last.Stop)
}
}
func TestStartParsedLinePumpHandlesLongSingleSSELine(t *testing.T) {
payload := strings.Repeat("x", 2*1024*1024+4096)
results, done := StartParsedLinePump(context.Background(), strings.NewReader(makeLargeContentSSEBody(t, payload)), false, "text")
var got strings.Builder
var sawDone bool
for r := range results {
for _, p := range r.Parts {
got.WriteString(p.Text)
}
if r.Stop {
sawDone = true
}
}
if err := <-done; err != nil {
t.Fatalf("unexpected long-line read error: %v", err)
}
if got.String() != payload {
t.Fatalf("long SSE line payload mismatch: got len=%d want len=%d", got.Len(), len(payload))
}
if !sawDone {
t.Fatal("expected DONE after long SSE line")
}
}

View File

@@ -44,6 +44,9 @@ func rewriteDSMLToolMarkupOutsideIgnored(text string) string {
}
b.WriteString(tag.Name)
b.WriteString(text[tag.NameEnd : tag.End+1])
if text[tag.End] != '>' {
b.WriteByte('>')
}
i = tag.End + 1
continue
}

View File

@@ -128,34 +128,39 @@ func scanToolMarkupTagAt(text string, start int) (ToolMarkupTag, bool) {
}
lower := strings.ToLower(text)
i := start + 1
for i < len(text) && text[i] == '<' {
i++
}
closing := false
if i < len(text) && text[i] == '/' {
closing = true
i++
}
i, dsmlLike := consumeToolMarkupNamePrefix(lower, text, i)
name, nameLen := matchToolMarkupName(lower, i)
if nameLen == 0 {
return ToolMarkupTag{}, false
}
nameEnd := i + nameLen
nameEndBeforePipes := nameEnd
for next, ok := consumeToolMarkupPipe(text, nameEnd); ok; next, ok = consumeToolMarkupPipe(text, nameEnd) {
nameEnd = next
}
hasTrailingPipe := nameEnd > nameEndBeforePipes
if !hasToolMarkupBoundary(text, nameEnd) {
return ToolMarkupTag{}, false
}
end := findXMLTagEnd(text, nameEnd)
if end < 0 {
if !hasTrailingPipe {
return ToolMarkupTag{}, false
}
end = nameEnd - 1
}
if hasTrailingPipe {
if nextLT := strings.IndexByte(text[nameEnd:], '<'); nextLT >= 0 && end >= nameEnd+nextLT {
end = nameEnd - 1
}
}
trimmed := strings.TrimSpace(text[start : end+1])
return ToolMarkupTag{
@@ -171,6 +176,74 @@ func scanToolMarkupTagAt(text string, start int) (ToolMarkupTag, bool) {
}, true
}
func IsPartialToolMarkupTagPrefix(text string) bool {
if text == "" || text[0] != '<' || strings.Contains(text, ">") {
return false
}
lower := strings.ToLower(text)
i := 1
for i < len(text) && text[i] == '<' {
i++
}
if i >= len(text) {
return true
}
if text[i] == '/' {
i++
}
for i <= len(text) {
if i == len(text) {
return true
}
if hasToolMarkupNamePrefix(lower[i:]) {
return true
}
if strings.HasPrefix("dsml", lower[i:]) {
return true
}
next, ok := consumeToolMarkupNamePrefixOnce(lower, text, i)
if !ok {
return false
}
i = next
}
return false
}
func consumeToolMarkupNamePrefix(lower, text string, idx int) (int, bool) {
dsmlLike := false
for {
next, ok := consumeToolMarkupNamePrefixOnce(lower, text, idx)
if !ok {
return idx, dsmlLike
}
idx = next
dsmlLike = true
}
}
func consumeToolMarkupNamePrefixOnce(lower, text string, idx int) (int, bool) {
if next, ok := consumeToolMarkupPipe(text, idx); ok {
return next, true
}
if idx < len(text) && (text[idx] == ' ' || text[idx] == '\t' || text[idx] == '\r' || text[idx] == '\n') {
return idx + 1, true
}
if strings.HasPrefix(lower[idx:], "dsml") {
return idx + len("dsml"), true
}
return idx, false
}
func hasToolMarkupNamePrefix(lowerTail string) bool {
for _, name := range toolMarkupNames {
if strings.HasPrefix(lowerTail, name) || strings.HasPrefix(name, lowerTail) {
return true
}
}
return false
}
func matchToolMarkupName(lower string, start int) (string, int) {
for _, name := range toolMarkupNames {
if strings.HasPrefix(lower[start:], name) {
@@ -193,19 +266,6 @@ func consumeToolMarkupPipe(text string, idx int) (int, bool) {
return idx, false
}
func hasToolMarkupBoundary(text string, idx int) bool {
if idx >= len(text) {
return true
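The shared idea in consumeToolMarkupNamePrefix (Go and Node alike): loop the single-step consumer so any run of pipes, whitespace, and repeated dsml fragments ahead of the tag name is swallowed, marking the tag DSML-like. A simplified trace on the noisy opener from the new tests (half-width handling only; the real code also accepts the fullwidth pipe):

```go
package main

import (
	"fmt"
	"strings"
)

// consumePrefix is a reduced model of the prefix consumption loop:
// repeatedly eat a pipe, whitespace, or a literal "dsml" run.
func consumePrefix(lower string, i int) (next int, dsmlLike bool) {
	for {
		switch {
		case i < len(lower) && (lower[i] == '|' || lower[i] == ' ' || lower[i] == '\t'):
			i++
		case strings.HasPrefix(lower[i:], "dsml"):
			i += len("dsml")
		default:
			return i, dsmlLike
		}
		dsmlLike = true
	}
}

func main() {
	// After the duplicate '<' is skipped, scanning starts at index 2 and the
	// consumer should stop exactly at the real tag name.
	s := strings.ToLower("<<DSML|DSML|tool_calls>")
	next, dsmlLike := consumePrefix(s, 2)
	fmt.Println(s[next:], dsmlLike) // tool_calls> true
}
```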

View File

@@ -41,6 +41,52 @@ func TestParseToolCallsSupportsDSMLShell(t *testing.T) {
}
}
func TestParseToolCallsToleratesDSMLTrailingPipeTagTerminator(t *testing.T) {
text := strings.Join([]string{
`<|DSML|tool_calls| `,
` <|DSML|invoke name="terminal">`,
` <|DSML|parameter name="command"><![CDATA[find "/home" -type d]]></|DSML|parameter>`,
` <|DSML|parameter name="timeout"><![CDATA[10]]></|DSML|parameter>`,
` </|DSML|invoke>`,
`</|DSML|tool_calls>`,
}, "\n")
calls := ParseToolCalls(text, []string{"terminal"})
if len(calls) != 1 {
t.Fatalf("expected one trailing-pipe DSML call, got %#v", calls)
}
if calls[0].Name != "terminal" {
t.Fatalf("expected terminal tool, got %#v", calls[0])
}
if calls[0].Input["command"] != `find "/home" -type d` {
t.Fatalf("expected command argument, got %#v", calls[0].Input)
}
if calls[0].Input["timeout"] != float64(10) {
t.Fatalf("expected numeric timeout, got %#v", calls[0].Input)
}
}
func TestParseToolCallsToleratesExtraLeadingLessThanBeforeDSML(t *testing.T) {
text := `<<|DSML|tool_calls><<|DSML|invoke name="Bash"><<|DSML|parameter name="command"><![CDATA[pwd]]></|DSML|parameter></|DSML|invoke></|DSML|tool_calls>`
calls := ParseToolCalls(text, []string{"Bash"})
if len(calls) != 1 {
t.Fatalf("expected one extra-leading-less-than DSML call, got %#v", calls)
}
if calls[0].Name != "Bash" || calls[0].Input["command"] != "pwd" {
t.Fatalf("unexpected extra-leading-less-than DSML parse result: %#v", calls[0])
}
}
func TestParseToolCallsToleratesRepeatedDSMLPrefixNoise(t *testing.T) {
text := `<<DSML|DSML|tool_calls><<DSML|DSML|invoke name="Bash"><<DSML|DSML|parameter name="command"><![CDATA[git status]]></DSML|DSML|parameter></DSML|DSML|invoke></DSML|DSML|tool_calls>`
calls := ParseToolCalls(text, []string{"Bash"})
if len(calls) != 1 {
t.Fatalf("expected one repeated-prefix DSML call, got %#v", calls)
}
if calls[0].Name != "Bash" || calls[0].Input["command"] != "git status" {
t.Fatalf("unexpected repeated-prefix DSML parse result: %#v", calls[0])
}
}
func TestParseToolCallsSupportsDSMLShellWithCanonicalExampleInCDATA(t *testing.T) {
content := `<tool_calls><invoke name="demo"><parameter name="value">x</parameter></invoke></tool_calls>`
text := `<|DSML|tool_calls><|DSML|invoke name="Write"><|DSML|parameter name="file_path">notes.md</|DSML|parameter><|DSML|parameter name="content"><![CDATA[` + content + `]]></|DSML|parameter></|DSML|invoke></|DSML|tool_calls>`

View File

@@ -1,10 +1,6 @@
package toolstream
import "ds2api/internal/toolcall"
func ProcessChunk(state *State, chunk string, toolNames []string) []Event {
if state == nil {
@@ -174,31 +170,27 @@ func findToolSegmentStart(state *State, s string) int {
if s == "" {
return -1
}
offset := 0
for {
tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(s, offset)
if !ok {
return -1
}
start := includeDuplicateLeadingLessThan(s, tag.Start)
if !insideCodeFenceWithState(state, s[:start]) {
return start
}
offset = tag.End + 1
}
}
func includeDuplicateLeadingLessThan(s string, idx int) int {
for idx > 0 && s[idx-1] == '<' {
idx--
}
return idx
}
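includeDuplicateLeadingLessThan just widens a match leftward over any run of '<', so duplicated angle brackets are captured with the tool block instead of leaking into visible text. For instance (toy input):

```go
package main

import "fmt"

func includeDuplicateLeadingLessThan(s string, idx int) int {
	for idx > 0 && s[idx-1] == '<' {
		idx--
	}
	return idx
}

func main() {
	s := "before <<|DSML|tool_calls>"
	// The scanner reports the tag at the inner '<' (index 8);
	// the helper widens the capture to the outer '<' at index 7.
	fmt.Println(includeDuplicateLeadingLessThan(s, 8)) // 7
}
```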
func consumeToolCapture(state *State, toolNames []string) (prefix string, calls []toolcall.ParsedToolCall, suffix string, ready bool) {
captured := state.capture.String()
if captured == "" {

View File

@@ -153,27 +153,14 @@ func findPartialXMLToolTagStart(s string) int {
if lastLT < 0 {
return -1
}
start := includeDuplicateLeadingLessThan(s, lastLT)
tail := s[start:]
// If there's a '>' in the tail, the tag is closed — not partial.
if strings.Contains(tail, ">") {
return -1
}
if toolcall.IsPartialToolMarkupTagPrefix(tail) {
return start
}
return -1
}

View File

@@ -1,35 +0,0 @@
package toolstream
import "regexp"
// --- XML tool call support for the streaming sieve ---
//nolint:unused // kept as explicit tag inventory for future XML sieve refinements.
var xmlToolCallClosingTags = []string{"</tool_calls>", "</|dsml|tool_calls>", "</|dsmltool_calls>", "</|dsml tool_calls>", "</dsml|tool_calls>", "</dsmltool_calls>", "</dsml tool_calls>", "</tool_calls>", "</|tool_calls>"}
// xmlToolCallBlockPattern matches a complete canonical XML tool call block.
//
//nolint:unused // reserved for future fast-path XML block detection.
var xmlToolCallBlockPattern = regexp.MustCompile(`(?is)((?:<tool_calls\b|<\|dsml\|tool_calls\b)[^>]*>\s*(?:.*?)\s*(?:</tool_calls>|</\|dsml\|tool_calls>))`)
// xmlToolTagsToDetect is the set of XML tag prefixes used by findToolSegmentStart.
var xmlToolTagsToDetect = []string{
"<|dsml|tool_calls>", "<|dsml|tool_calls\n", "<|dsml|tool_calls ",
"<dsml|tool_calls>", "<dsml|tool_calls\n", "<dsml|tool_calls ",
"<|dsml|invoke ", "<|dsml|invoke\n", "<|dsml|invoke\t", "<|dsml|invoke\r",
"<|dsmltool_calls>", "<|dsmltool_calls\n", "<|dsmltool_calls ",
"<|dsmlinvoke ", "<|dsmlinvoke\n", "<|dsmlinvoke\t", "<|dsmlinvoke\r",
"<|dsml tool_calls>", "<|dsml tool_calls\n", "<|dsml tool_calls ",
"<|dsml invoke ", "<|dsml invoke\n", "<|dsml invoke\t", "<|dsml invoke\r",
"<dsml|tool_calls>", "<dsml|tool_calls\n", "<dsml|tool_calls ",
"<dsml|invoke ", "<dsml|invoke\n", "<dsml|invoke\t", "<dsml|invoke\r",
"<dsmltool_calls>", "<dsmltool_calls\n", "<dsmltool_calls ",
"<dsmlinvoke ", "<dsmlinvoke\n", "<dsmlinvoke\t", "<dsmlinvoke\r",
"<dsml tool_calls>", "<dsml tool_calls\n", "<dsml tool_calls ",
"<dsml invoke ", "<dsml invoke\n", "<dsml invoke\t", "<dsml invoke\r",
"<tool_calls>", "<tool_calls\n", "<tool_calls ",
"<invoke ", "<invoke\n", "<invoke\t", "<invoke\r",
"<|tool_calls>", "<|tool_calls\n", "<|tool_calls ",
"<|invoke ", "<|invoke\n", "<|invoke\t", "<|invoke\r",
"<tool_calls>", "<tool_calls\n", "<tool_calls ", "<invoke ", "<invoke\n", "<invoke\t", "<invoke\r",
}

View File

@@ -72,6 +72,97 @@ func TestProcessToolSieveInterceptsDSMLToolCallWithoutLeak(t *testing.T) {
	}
}

func TestProcessToolSieveInterceptsDSMLTrailingPipeToolCallWithoutLeak(t *testing.T) {
	var state State
	chunks := []string{
		"<|DSML|tool_calls| \n",
		` <|DSML|invoke name="terminal">` + "\n",
		` <|DSML|parameter name="command"><![CDATA[find "/home" -type d]]></|DSML|parameter>` + "\n",
		` <|DSML|parameter name="timeout"><![CDATA[10]]></|DSML|parameter>` + "\n",
		" </|DSML|invoke>\n",
		"</|DSML|tool_calls>",
	}
	var events []Event
	for _, c := range chunks {
		events = append(events, ProcessChunk(&state, c, []string{"terminal"})...)
	}
	events = append(events, Flush(&state, []string{"terminal"})...)
	var textContent strings.Builder
	var calls []any
	for _, evt := range events {
		textContent.WriteString(evt.Content)
		for _, call := range evt.ToolCalls {
			calls = append(calls, call)
		}
	}
	if text := textContent.String(); strings.Contains(strings.ToLower(text), "dsml") || strings.Contains(text, "terminal") {
		t.Fatalf("trailing-pipe DSML tool call leaked to text: %q events=%#v", text, events)
	}
	if len(calls) != 1 {
		t.Fatalf("expected one trailing-pipe DSML tool call, got %d events=%#v", len(calls), events)
	}
}

func TestProcessToolSieveInterceptsExtraLeadingLessThanDSMLToolCallWithoutLeak(t *testing.T) {
	var state State
	chunks := []string{
		"<<|DSML|tool_calls>\n",
		` <<|DSML|invoke name="Bash">` + "\n",
		` <<|DSML|parameter name="command"><![CDATA[pwd]]></|DSML|parameter>` + "\n",
		" </|DSML|invoke>\n",
		"</|DSML|tool_calls>",
	}
	var events []Event
	for _, c := range chunks {
		events = append(events, ProcessChunk(&state, c, []string{"Bash"})...)
	}
	events = append(events, Flush(&state, []string{"Bash"})...)
	var textContent strings.Builder
	toolCalls := 0
	for _, evt := range events {
		textContent.WriteString(evt.Content)
		toolCalls += len(evt.ToolCalls)
	}
	if text := textContent.String(); strings.Contains(text, "<") || strings.Contains(text, "Bash") {
		t.Fatalf("extra-leading-less-than DSML tool call leaked to text: %q events=%#v", text, events)
	}
	if toolCalls != 1 {
		t.Fatalf("expected one extra-leading-less-than DSML tool call, got %d events=%#v", toolCalls, events)
	}
}

func TestProcessToolSieveInterceptsRepeatedDSMLPrefixNoiseWithoutLeak(t *testing.T) {
	var state State
	chunks := []string{
		"<<DSML|DSML|tool",
		"_calls>\n",
		` <<DSML|DSML|invoke name="Bash">` + "\n",
		` <<DSML|DSML|parameter name="command"><![CDATA[git status]]></DSML|DSML|parameter>` + "\n",
		" </DSML|DSML|invoke>\n",
		"</DSML|DSML|tool_calls>",
	}
	var events []Event
	for _, c := range chunks {
		events = append(events, ProcessChunk(&state, c, []string{"Bash"})...)
	}
	events = append(events, Flush(&state, []string{"Bash"})...)
	var textContent strings.Builder
	toolCalls := 0
	for _, evt := range events {
		textContent.WriteString(evt.Content)
		toolCalls += len(evt.ToolCalls)
	}
	if text := textContent.String(); strings.Contains(strings.ToLower(text), "dsml") || strings.Contains(text, "Bash") {
		t.Fatalf("repeated-prefix DSML tool call leaked to text: %q events=%#v", text, events)
	}
	if toolCalls != 1 {
		t.Fatalf("expected one repeated-prefix DSML tool call, got %d events=%#v", toolCalls, events)
	}
}

func TestProcessToolSieveHandlesLongXMLToolCall(t *testing.T) {
	var state State
	const toolName = "write_to_file"
@@ -442,6 +533,8 @@ func TestFindToolSegmentStartDetectsXMLToolCalls(t *testing.T) {
		want int
	}{
		{"tool_calls_tag", "some text <tool_calls>\n", 10},
		{"dsml_trailing_pipe_tag", "some text <|DSML|tool_calls| \n", 10},
		{"dsml_extra_leading_less_than", "some text <<|DSML|tool_calls>\n", 10},
		{"invoke_tag_missing_wrapper", "some text <invoke name=\"read_file\">\n", 10},
		{"bare_tool_call_text", "prefix <tool_call>\n", -1},
		{"xml_inside_code_fence", "```xml\n<tool_calls><invoke name=\"read_file\"></invoke></tool_calls>\n```", -1},
@@ -465,6 +558,8 @@ func TestFindPartialXMLToolTagStart(t *testing.T) {
		want int
	}{
		{"partial_tool_calls", "Hello <tool_ca", 6},
		{"partial_dsml_trailing_pipe", "Hello <|DSML|tool_calls|", 6},
		{"partial_dsml_extra_leading_less_than", "Hello <<|DSML|tool_calls", 6},
		{"partial_invoke", "Hello <inv", 6},
		{"bare_tool_call_not_held", "Hello <tool_name", -1},
		{"partial_lt_only", "Text <", 5},

View File

@@ -1,11 +1,5 @@
package util

-import (
-	"strings"
-
-	tiktoken "github.com/hupe1980/go-tiktoken"
-)
-
const (
	defaultTokenizerModel = "gpt-4o"
	claudeTokenizerModel  = "claude"
@@ -33,41 +27,6 @@ func CountOutputTokens(text, model string) int {
	return base
}

-func countWithTokenizer(text, model string) int {
-	text = strings.TrimSpace(text)
-	if text == "" {
-		return 0
-	}
-	encoding, err := tiktoken.NewEncodingForModel(tokenizerModelForCount(model))
-	if err != nil {
-		return 0
-	}
-	ids, _, err := encoding.Encode(text, nil, nil)
-	if err != nil {
-		return 0
-	}
-	return len(ids)
-}
-
-func tokenizerModelForCount(model string) string {
-	model = strings.ToLower(strings.TrimSpace(model))
-	if model == "" {
-		return defaultTokenizerModel
-	}
-	switch {
-	case strings.HasPrefix(model, "claude"):
-		return claudeTokenizerModel
-	case strings.HasPrefix(model, "gpt-4"), strings.HasPrefix(model, "gpt-5"), strings.HasPrefix(model, "o1"), strings.HasPrefix(model, "o3"), strings.HasPrefix(model, "o4"):
-		return defaultTokenizerModel
-	case strings.HasPrefix(model, "deepseek-v4"):
-		return defaultTokenizerModel
-	case strings.HasPrefix(model, "deepseek"):
-		return defaultTokenizerModel
-	default:
-		return defaultTokenizerModel
-	}
-}

func conservativePromptPadding(base int) int {
	padding := base / 50
	if padding < 4 {

View File

@@ -0,0 +1,7 @@
//go:build 386 || arm || mips || mipsle || wasm

package util

func countWithTokenizer(_, _ string) int {
	return 0
}

View File

@@ -0,0 +1,44 @@
//go:build !386 && !arm && !mips && !mipsle && !wasm

package util

import (
	"strings"

	tiktoken "github.com/hupe1980/go-tiktoken"
)

func countWithTokenizer(text, model string) int {
	text = strings.TrimSpace(text)
	if text == "" {
		return 0
	}
	encoding, err := tiktoken.NewEncodingForModel(tokenizerModelForCount(model))
	if err != nil {
		return 0
	}
	ids, _, err := encoding.Encode(text, nil, nil)
	if err != nil {
		return 0
	}
	return len(ids)
}

func tokenizerModelForCount(model string) string {
	model = strings.ToLower(strings.TrimSpace(model))
	if model == "" {
		return defaultTokenizerModel
	}
	switch {
	case strings.HasPrefix(model, "claude"):
		return claudeTokenizerModel
	case strings.HasPrefix(model, "gpt-4"), strings.HasPrefix(model, "gpt-5"), strings.HasPrefix(model, "o1"), strings.HasPrefix(model, "o3"), strings.HasPrefix(model, "o4"):
		return defaultTokenizerModel
	case strings.HasPrefix(model, "deepseek-v4"):
		return defaultTokenizerModel
	case strings.HasPrefix(model, "deepseek"):
		return defaultTokenizerModel
	default:
		return defaultTokenizerModel
	}
}
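
The two //go:build constraints above are complementary, so exactly one definition of countWithTokenizer compiles for any target: 32-bit and wasm builds get the stub that returns 0, everything else gets the tiktoken-backed count. A caller can then treat 0 as "tokenizer unavailable" and fall back to a heuristic; a hypothetical sketch (the real fallback inside CountOutputTokens is not visible in this diff):

// Hypothetical caller in package util, next to countWithTokenizer.
func estimateTokens(text, model string) int {
	if n := countWithTokenizer(text, model); n > 0 {
		return n
	}
	// Assumed heuristic: roughly four characters per token.
	return (len([]rune(text)) + 3) / 4
}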

View File

@@ -20,4 +20,3 @@ internal/js/helpers/stream-tool-sieve/sieve-xml.js
internal/js/helpers/stream-tool-sieve/jsonscan.js
internal/js/helpers/stream-tool-sieve/parse.js
internal/js/helpers/stream-tool-sieve/format.js
internal/js/helpers/stream-tool-sieve/tool-keywords.js

View File

@@ -210,6 +210,37 @@ test('vercel stream retries empty output once and keeps one terminal frame', asy
  assert.match(completionBodies[1].prompt, /Previous reply had no visible output\. Please regenerate the visible final answer or tool call now\.$/);
});

test('vercel stream coalesces many small content deltas while keeping one choice', async () => {
  const lines = Array.from({ length: 100 }, () => `data: ${JSON.stringify({ p: 'response/content', v: '字' })}\n\n`);
  lines.push('data: [DONE]\n\n');
  const { frames } = await runMockVercelStream(lines);
  const parsed = frames.filter((frame) => frame !== '[DONE]').map((frame) => JSON.parse(frame));
  const contentFrames = parsed.filter((item) => item.choices?.[0]?.delta?.content);
  const content = contentFrames.map((item) => item.choices[0].delta.content).join('');
  assert.equal(content, '字'.repeat(100));
  assert.ok(contentFrames.length < 100, `expected fewer than 100 content frames, got ${contentFrames.length}`);
  for (const item of parsed) {
    assert.equal(item.choices.length, 1);
  }
});
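
The test above pins only observable behavior: identical reassembled text, fewer frames than input deltas, and a single choice per frame. One plausible coalescer shape, sketched in Go since the Node implementation is not part of this diff; every name here is an assumption:

package stream

import "strings"

// coalescer buffers small content deltas and emits them as one frame
// once the buffer passes a size threshold (or on explicit flush).
type coalescer struct {
	buf     strings.Builder
	maxSize int
}

func (c *coalescer) add(delta string, emit func(string)) {
	c.buf.WriteString(delta)
	if c.buf.Len() >= c.maxSize {
		c.flush(emit)
	}
}

func (c *coalescer) flush(emit func(string)) {
	if c.buf.Len() > 0 {
		emit(c.buf.String())
		c.buf.Reset()
	}
}
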
test('vercel stream flushes reasoning before content and before stop', async () => {
  const { frames } = await runMockVercelStream([
    `data: ${JSON.stringify({ p: 'response/fragments', o: 'APPEND', v: [
      { type: 'THINK', content: '思考' },
      { type: 'THINK', content: '过程' },
      { type: 'RESPONSE', content: '回答' },
    ] })}\n\n`,
    'data: [DONE]\n\n',
  ], { thinking_enabled: true });
  const parsed = frames.filter((frame) => frame !== '[DONE]').map((frame) => JSON.parse(frame));
  const reasoning = parsed.map((item) => item.choices?.[0]?.delta?.reasoning_content || '').join('');
  const content = parsed.map((item) => item.choices?.[0]?.delta?.content || '').join('');
  assert.equal(reasoning, '思考过程');
  assert.equal(content, '回答');
  assert.equal(parsed.at(-1).choices[0].finish_reason, 'stop');
});
test('vercel stream exhausts DeepSeek continue before synthetic retry', async () => {
  const { frames, fetchURLs, fetchBodies } = await runMockVercelStreamSequence([
    [
@@ -227,6 +258,28 @@ test('vercel stream exhausts DeepSeek continue before synthetic retry', async ()
  assert.equal(fetchBodies.some((body) => String(body.prompt || '').includes('Previous reply had no visible output')), false);
});

test('vercel stream continues direct quasi_status incomplete before final tool call', async () => {
  const { frames, fetchURLs } = await runMockVercelStreamSequence([
    [
      'data: {"response_message_id":7,"p":"response/content","v":"<tool_calls><invoke name=\\"write_file\\"><parameter name=\\"content\\"><![CDATA[part-one"}\n\n',
      'data: {"p":"response/quasi_status","v":"INCOMPLETE"}\n\n',
      'data: [DONE]\n\n',
    ],
    [
      'data: {"response_message_id":8,"p":"response/content","v":"-part-two]]></parameter></invoke></tool_calls>"}\n\n',
      'data: {"p":"response/status","v":"FINISHED"}\n\n',
      'data: [DONE]\n\n',
    ],
  ], { tool_names: ['write_file'] });
  const parsed = frames.filter((frame) => frame !== '[DONE]').map((frame) => JSON.parse(frame));
  const toolDelta = parsed.find((item) => item.choices?.[0]?.delta?.tool_calls);
  assert.equal(fetchURLs.filter((url) => url === 'https://chat.deepseek.com/api/v0/chat/continue').length, 1);
  assert.ok(toolDelta);
  const args = JSON.parse(toolDelta.choices[0].delta.tool_calls[0].function.arguments);
  assert.equal(args.content, 'part-one-part-two');
  assert.equal(parsed.at(-1).choices[0].finish_reason, 'tool_calls');
});
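
The contract this test fixes: a direct response/quasi_status INCOMPLETE patch must trigger exactly one continue round-trip, and the tool call is only assembled once the second leg finishes. A hedged sketch of the Go-side observer handling such a patch; the type and field names are assumptions, not the repository's actual code:

package observer

import "encoding/json"

type ssePatch struct {
	Path  string          `json:"p"`
	Value json.RawMessage `json:"v"`
}

type continueObserver struct{ wantContinue bool }

// observe inspects direct path-based patches; INCOMPLETE schedules one
// /api/v0/chat/continue round-trip, FINISHED clears it.
func (o *continueObserver) observe(p ssePatch) {
	switch p.Path {
	case "response/quasi_status":
		if string(p.Value) == `"INCOMPLETE"` {
			o.wantContinue = true
		}
	case "response/status":
		if string(p.Value) == `"FINISHED"` {
			o.wantContinue = false
		}
	}
}
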
test('vercel stream usage completion_tokens does not double-count visible output', async () => {
@@ -511,14 +564,23 @@ test('parseChunkForContent drops thinking content when thinking is disabled', ()
    'text',
  );
  assert.equal(thinking.finished, false);
-  assert.equal(thinking.newType, 'text');
+  assert.equal(thinking.newType, 'thinking');
  assert.deepEqual(thinking.parts, []);
+  const hiddenContinuation = parseChunkForContent(
+    { v: 'still hidden' },
+    false,
+    thinking.newType,
+  );
+  assert.equal(hiddenContinuation.newType, 'thinking');
+  assert.deepEqual(hiddenContinuation.parts, []);
  const answer = parseChunkForContent(
    { p: 'response/content', v: 'visible answer' },
    false,
-    thinking.newType,
+    hiddenContinuation.newType,
  );
  assert.equal(answer.newType, 'text');
  assert.deepEqual(answer.parts, [{ text: 'visible answer', type: 'text' }]);
});
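
What the corrected assertions encode: when thinking output is hidden, the tracked chunk type must remain 'thinking', so that path-less continuation chunks such as { v: 'still hidden' } stay suppressed; only an explicit response/content path flips back to 'text'. The same sticky-type rule, rendered as a hedged Go sketch (path names beyond response/content are assumptions):

// nextChunkType mirrors the tested behavior: a chunk without a path
// inherits the previous type instead of defaulting to visible text.
func nextChunkType(path, prev string) string {
	switch path {
	case "response/content":
		return "text"
	case "response/thinking_content": // assumed thinking path name
		return "thinking"
	default: // path-less delta: continuation of the previous chunk type
		return prev
	}
}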

View File

@@ -57,6 +57,49 @@ test('parseToolCalls parses DSML shell as XML-compatible tool call', () => {
  assert.deepEqual(calls[0].input, { path: 'README.MD' });
});

test('parseToolCalls tolerates DSML trailing pipe tag terminator', () => {
  const payload = [
    '<|DSML|tool_calls| ',
    ' <|DSML|invoke name="terminal">',
    ' <|DSML|parameter name="command"><![CDATA[find "/home" -type d]]></|DSML|parameter>',
    ' <|DSML|parameter name="timeout"><![CDATA[10]]></|DSML|parameter>',
    ' </|DSML|invoke>',
    '</|DSML|tool_calls>',
  ].join('\n');
  const calls = parseToolCalls(payload, ['terminal']);
  assert.equal(calls.length, 1);
  assert.equal(calls[0].name, 'terminal');
  assert.deepEqual(calls[0].input, { command: 'find "/home" -type d', timeout: 10 });
});

test('parseToolCalls tolerates extra leading less-than before DSML tags', () => {
  const payload = [
    '<<|DSML|tool_calls>',
    ' <<|DSML|invoke name="Bash">',
    ' <<|DSML|parameter name="command"><![CDATA[pwd]]></|DSML|parameter>',
    ' </|DSML|invoke>',
    '</|DSML|tool_calls>',
  ].join('\n');
  const calls = parseToolCalls(payload, ['Bash']);
  assert.equal(calls.length, 1);
  assert.equal(calls[0].name, 'Bash');
  assert.deepEqual(calls[0].input, { command: 'pwd' });
});

test('parseToolCalls tolerates repeated DSML prefix noise', () => {
  const payload = [
    '<<DSML|DSML|tool_calls>',
    ' <<DSML|DSML|invoke name="Bash">',
    ' <<DSML|DSML|parameter name="command"><![CDATA[git status]]></DSML|DSML|parameter>',
    ' </DSML|DSML|invoke>',
    '</DSML|DSML|tool_calls>',
  ].join('\n');
  const calls = parseToolCalls(payload, ['Bash']);
  assert.equal(calls.length, 1);
  assert.equal(calls[0].name, 'Bash');
  assert.deepEqual(calls[0].input, { command: 'git status' });
});

test('parseToolCalls tolerates DSML space-separator typo', () => {
  const payload = '<|DSML tool_calls><|DSML invoke name="Read"><|DSML parameter name="file_path"><![CDATA[/tmp/input.txt]]></|DSML parameter></|DSML invoke></|DSML tool_calls>';
  const calls = parseToolCalls(payload, ['Read']);
@@ -285,6 +328,39 @@ test('sieve emits tool_calls for DSML space-separator typo', () => {
  assert.equal(text.includes('<|DSML invoke'), false);
});

test('sieve emits tool_calls for DSML trailing pipe tag terminator', () => {
  const events = runSieve([
    '<|DSML|tool_calls| \n',
    '<|DSML|invoke name="terminal">\n',
    '<|DSML|parameter name="command"><![CDATA[find "/home" -type d]]></|DSML|parameter>\n',
    '<|DSML|parameter name="timeout"><![CDATA[10]]></|DSML|parameter>\n',
    '</|DSML|invoke>\n',
    '</|DSML|tool_calls>',
  ], ['terminal']);
  const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
  const text = collectText(events);
  assert.equal(finalCalls.length, 1);
  assert.equal(finalCalls[0].name, 'terminal');
  assert.deepEqual(finalCalls[0].input, { command: 'find "/home" -type d', timeout: 10 });
  assert.equal(text.toLowerCase().includes('dsml'), false);
});

test('sieve emits tool_calls for extra leading less-than DSML tags without leaking prefix', () => {
  const events = runSieve([
    '<<|DSML|tool_calls>\n',
    '<<|DSML|invoke name="Bash">\n',
    '<<|DSML|parameter name="command"><![CDATA[pwd]]></|DSML|parameter>\n',
    '</|DSML|invoke>\n',
    '</|DSML|tool_calls>',
  ], ['Bash']);
  const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
  const text = collectText(events);
  assert.equal(finalCalls.length, 1);
  assert.equal(finalCalls[0].name, 'Bash');
  assert.deepEqual(finalCalls[0].input, { command: 'pwd' });
  assert.equal(text.includes('<'), false);
});

test('sieve keeps DSML space lookalike tag names as text', () => {
  const input = '<|DSML tool_calls_extra><|DSML invoke name="Read"><|DSML parameter name="file_path">/tmp/input.txt</|DSML parameter></|DSML invoke></|DSML tool_calls_extra>';
  const events = runSieve([input], ['Read']);

View File

@@ -1,4 +1,4 @@
{"code":0,"msg":"","data":{"biz_code":0,"biz_msg":"","biz_data":{"id":"file-b10a2aca-39e9-4a38-be9d-9f22e398cb62","status":"PENDING","file_name":"history.txt","from_share":false,"file_size":732,"model_kind":"NORMAL","token_usage":null,"error_code":null,"inserted_at":1777485015.255,"updated_at":1777485015.255,"is_image":false,"audit_result":null}}}
{"code":0,"msg":"","data":{"biz_code":0,"biz_msg":"","biz_data":{"id":"file-b10a2aca-39e9-4a38-be9d-9f22e398cb62","status":"PENDING","file_name":"DS2API_HISTORY.txt","from_share":false,"file_size":732,"model_kind":"NORMAL","token_usage":null,"error_code":null,"inserted_at":1777485015.255,"updated_at":1777485015.255,"is_image":false,"audit_result":null}}}
event: ready
data: {"request_message_id":1,"response_message_id":2,"model_type":"default"}

View File

@@ -1,4 +1,4 @@
{"code":0,"msg":"","data":{"biz_code":0,"biz_msg":"","biz_data":{"id":"file-9c8ae986-75f7-4611-9956-5e1b502f3ec2","status":"SUCCESS","file_name":"history.txt","from_share":false,"file_size":732,"model_kind":"NORMAL","token_usage":145,"error_code":null,"inserted_at":1777485076.42,"updated_at":1777485076.42,"signed_path":"/file?file_id=9c8ae986-75f7-4611-9956-5e1b502f3ec2&state=a1REa2AdO8JmDuxMFiUTPJfpiyY4ie2weyUpYxfvEOrk5lxUCZifpRw9toZAEzn3DAjkgbR6blgZf41KLkHBKwwrcYTIjfxTRKijDqjEfguis03yddpuVrii6keG4%2BXIlcLAsyZG3qcGhfTGVZhsr%2BRl17J%2BcnT9roslhxBcEy4rthFJVMWUI%2BSHjuo2gLEUDfvMfULQ1gSLVGtr%2Fpq%2FcNPCPSxZapIQv04ZVmJLcdbzRkz%2Bb%2BxM5RWUIPujp%2B3ke1WDa3%2B6S4pP0Pv%2BAJ0MFUjQsloUwO4AsJ8YhGBFWg8Ehe1b2yt1N%2Fi%2BIjLRPt5xiNmALcJJXIY%3D","is_image":false,"audit_result":null}}}
{"code":0,"msg":"","data":{"biz_code":0,"biz_msg":"","biz_data":{"id":"file-9c8ae986-75f7-4611-9956-5e1b502f3ec2","status":"SUCCESS","file_name":"DS2API_HISTORY.txt","from_share":false,"file_size":732,"model_kind":"NORMAL","token_usage":145,"error_code":null,"inserted_at":1777485076.42,"updated_at":1777485076.42,"signed_path":"/file?file_id=9c8ae986-75f7-4611-9956-5e1b502f3ec2&state=a1REa2AdO8JmDuxMFiUTPJfpiyY4ie2weyUpYxfvEOrk5lxUCZifpRw9toZAEzn3DAjkgbR6blgZf41KLkHBKwwrcYTIjfxTRKijDqjEfguis03yddpuVrii6keG4%2BXIlcLAsyZG3qcGhfTGVZhsr%2BRl17J%2BcnT9roslhxBcEy4rthFJVMWUI%2BSHjuo2gLEUDfvMfULQ1gSLVGtr%2Fpq%2FcNPCPSxZapIQv04ZVmJLcdbzRkz%2Bb%2BxM5RWUIPujp%2B3ke1WDa3%2B6S4pP0Pv%2BAJ0MFUjQsloUwO4AsJ8YhGBFWg8Ehe1b2yt1N%2Fi%2BIjLRPt5xiNmALcJJXIY%3D","is_image":false,"audit_result":null}}}
event: ready
data: {"request_message_id":1,"response_message_id":2,"model_type":"expert"}

View File

@@ -81,6 +81,22 @@
"source": "/admin/version",
"destination": "/api/index"
},
{
"source": "/admin/chat-history(.*)",
"destination": "/api/index"
},
{
"source": "/admin/proxies(.*)",
"destination": "/api/index"
},
{
"source": "/admin/dev/raw-samples/(.*)",
"destination": "/api/index"
},
{
"source": "/admin/dev/captures(.*)",
"destination": "/api/index"
},
{
"source": "/admin",
"destination": "/admin/index.html"

View File

@@ -16,7 +16,15 @@ const TOOL_MARKER = '<Tool>'
const END_INSTRUCTIONS_MARKER = '<end▁of▁instructions>'
const END_SENTENCE_MARKER = '<end▁of▁sentence>'
const END_TOOL_RESULTS_MARKER = '<end▁of▁toolresults>'
-const CURRENT_INPUT_FILE_PROMPT = 'The current request and prior conversation context have already been provided. Answer the latest user request directly.'
+const CURRENT_INPUT_FILE_PROMPT = 'Continue from the latest state in the attached DS2API_HISTORY.txt context. Treat it as the current working state and answer the latest user request directly.'
+const LEGACY_CURRENT_INPUT_FILE_PROMPTS = new Set([
+  'The current request and prior conversation context have already been provided. Answer the latest user request directly.',
+])
+
+function isCurrentInputFilePrompt(value) {
+  const text = String(value || '').trim()
+  return text === CURRENT_INPUT_FILE_PROMPT || LEGACY_CURRENT_INPUT_FILE_PROMPTS.has(text)
+}
function formatDateTime(value, lang) {
  if (!value) return '-'
@@ -312,7 +320,7 @@ function buildListModeMessages(item, t) {
  const placeholderOnly = liveMessages.length === 1
    && String(liveMessages[0]?.role || '').trim().toLowerCase() === 'user'
-    && String(liveMessages[0]?.content || '').trim() === CURRENT_INPUT_FILE_PROMPT
+    && isCurrentInputFilePrompt(liveMessages[0]?.content)
  if (placeholderOnly) {
    return { messages: historyMessages, historyMerged: true }

View File

@@ -394,7 +394,7 @@
"thinkingInjectionPromptHelp": "Leave empty to use the built-in default prompt shown as the input placeholder.",
"currentInputFileTitle": "Independent Split",
"currentInputFileEnabled": "Independent split (by size)",
"currentInputFileDesc": "Enabled by default. Once the character threshold is reached, upload the full context as a history.txt context file.",
"currentInputFileDesc": "Enabled by default. Once the character threshold is reached, upload the full context as a DS2API_HISTORY.txt context file.",
"currentInputFileMinChars": "Current input threshold (characters)",
"currentInputFileHelp": "Default is 0, which uses independent split for any non-empty input.",
"compatibilityTitle": "Compatibility",
@@ -485,4 +485,4 @@
"four": "Trigger a redeploy to apply the updated environment variables."
}
}
-}
\ No newline at end of file
+}

View File

@@ -394,7 +394,7 @@
"thinkingInjectionPromptHelp": "留空时使用内置默认提示词;默认内容会显示在输入框占位文本中。",
"currentInputFileTitle": "独立拆分",
"currentInputFileEnabled": "独立拆分(按量)",
"currentInputFileDesc": "默认开启。达到字符阈值后,将完整上下文上传为 history.txt 上下文文件。",
"currentInputFileDesc": "默认开启。达到字符阈值后,将完整上下文上传为 DS2API_HISTORY.txt 上下文文件。",
"currentInputFileMinChars": "当前输入阈值(字符数)",
"currentInputFileHelp": "默认 0表示只要有输入就会使用独立拆分。",
"compatibilityTitle": "兼容性设置",
@@ -485,4 +485,4 @@
"four": "触发重新部署以应用新的环境变量。"
}
}
-}
\ No newline at end of file
+}