Merge pull request #163 from CJackHwang/dev

docs: update API documentation, deployment guides, and README with new admin endpoints, compatibility notes, and build instructions
Merge pull request #156 from CJackHwang/dev
2026-05-03 16:05:26 +08:00 · 2026-03-29 19:50:40 +08:00 · 2026-03-22 21:40:03 +08:00 · 2026-03-22 16:51:17 +08:00 · 2026-03-22 13:43:26 +08:00 · 2026-03-22 11:05:54 +08:00
62 changed files with 605 additions and 643 deletions
--- a/.github/workflows/release-artifacts.yml
+++ b/.github/workflows/release-artifacts.yml
@@ -79,7 +79,7 @@ jobs:
            CGO_ENABLED=0 GOOS="${GOOS}" GOARCH="${GOARCH}" \
              go build -trimpath -ldflags="-s -w -X ds2api/internal/version.BuildVersion=${BUILD_VERSION}" -o "${STAGE}/${BIN}" ./cmd/ds2api

-            cp config.example.json .env.example internal/deepseek/assets/sha3_wasm_bg.7b9ca65ddd.wasm LICENSE README.MD README.en.md "${STAGE}/"
+            cp config.example.json .env.example sha3_wasm_bg.7b9ca65ddd.wasm LICENSE README.MD README.en.md "${STAGE}/"
            cp -R static/admin "${STAGE}/static/admin"

            if [ "${GOOS}" = "windows" ]; then
--- a/API.en.md
+++ b/API.en.md
@@ -629,25 +629,24 @@ Reads runtime settings and status, including:

 - `success`
 - `admin` (`has_password_hash`, `jwt_expire_hours`, `jwt_valid_after_unix`, `default_password_warning`)
-- `runtime` (`account_max_inflight`, `account_max_queue`, `global_max_inflight`, `token_refresh_interval_hours`)
-- `responses` / `embeddings`
+- `runtime` (`account_max_inflight`, `account_max_queue`, `global_max_inflight`)
+- `toolcall` / `responses` / `embeddings`
 - `auto_delete` (`sessions`)
 - `claude_mapping` / `model_aliases`
 - `env_backed`, `needs_vercel_sync`
-- `toolcall` policy is fixed to `feature_match + high` and is no longer returned or editable via settings

 ### `PUT /admin/settings`

 Hot-updates runtime settings. Supported fields:

 - `admin.jwt_expire_hours`
-- `runtime.account_max_inflight` / `runtime.account_max_queue` / `runtime.global_max_inflight` / `runtime.token_refresh_interval_hours`
+- `runtime.account_max_inflight` / `runtime.account_max_queue` / `runtime.global_max_inflight`
+- `toolcall.mode` / `toolcall.early_emit_confidence`
 - `responses.store_ttl_seconds`
 - `embeddings.provider`
 - `auto_delete.sessions`
 - `claude_mapping`
 - `model_aliases`
-- `toolcall` policy is fixed and is no longer writable through settings

 ### `POST /admin/settings/password`

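For illustration, here is a minimal Go sketch of a `PUT /admin/settings` body that sets the `toolcall` fields listed above together with one `runtime` limit. It assumes the dotted field names map to nested JSON objects. The `settingsPatch` types are hypothetical. The server URL and admin auth header are omitted because this hunk does not show them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// settingsPatch is a hypothetical request type for PUT /admin/settings.
// Pointer sections with omitempty are left out of the body when nil.
type settingsPatch struct {
	Runtime  *runtimePatch  `json:"runtime,omitempty"`
	Toolcall *toolcallPatch `json:"toolcall,omitempty"`
}

type runtimePatch struct {
	AccountMaxInflight int `json:"account_max_inflight"`
}

type toolcallPatch struct {
	Mode                string `json:"mode"`
	EarlyEmitConfidence string `json:"early_emit_confidence"`
}

// buildSettingsBody marshals a patch that sets the toolcall policy and one
// runtime limit, using the values from the README example config.
func buildSettingsBody() ([]byte, error) {
	return json.Marshal(settingsPatch{
		Runtime:  &runtimePatch{AccountMaxInflight: 2},
		Toolcall: &toolcallPatch{Mode: "feature_match", EarlyEmitConfidence: "high"},
	})
}

func main() {
	body, err := buildSettingsBody()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

This excerpt does not say whether fields omitted from the body keep their current values, so sending the complete section may be the safer choice.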
@@ -670,7 +669,7 @@ Imports full config with:

 The request can send config directly, or wrapped as `{"config": {...}, "mode":"merge"}`.
 Query params `?mode=merge` / `?mode=replace` are also supported.
-Import accepts `keys`, `accounts`, `claude_mapping` / `claude_model_mapping`, `model_aliases`, `admin`, `runtime`, `responses`, `embeddings`, and `auto_delete`; legacy `toolcall` fields are ignored.
+Import accepts `keys`, `accounts`, `claude_mapping` / `claude_model_mapping`, `model_aliases`, `admin`, `runtime`, `toolcall`, `responses`, `embeddings`, and `auto_delete`.

 ### `GET /admin/config/export`

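The Go sketch below shows the two ways described above for selecting merge mode on import: the `{"config": {...}, "mode":"merge"}` wrapper and the `?mode=merge` query parameter. The example config carries a `toolcall` section, which this change makes importable again. This excerpt does not show the import endpoint's path or method, so `withMode` takes a caller-supplied URL. `example.invalid` is a placeholder host.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// wrapImport builds the wrapped body {"config": {...}, "mode": "..."}.
// encoding/json sorts map keys, so "config" is emitted before "mode".
func wrapImport(cfg map[string]any, mode string) ([]byte, error) {
	return json.Marshal(map[string]any{"config": cfg, "mode": mode})
}

// withMode sets ?mode=... on a caller-supplied import URL, keeping any
// query parameters that are already present.
func withMode(rawURL, mode string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("mode", mode)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	cfg := map[string]any{"toolcall": map[string]any{"mode": "feature_match"}}
	body, err := wrapImport(cfg, "merge")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	// Placeholder host and path; substitute the real import endpoint.
	target, err := withMode("http://example.invalid/import", "merge")
	if err != nil {
		panic(err)
	}
	fmt.Println(target)
}
```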
--- a/API.md
+++ b/API.md
@@ -638,25 +638,24 @@ data: {"type":"message_stop"}

 - `success`
 - `admin` (`has_password_hash`, `jwt_expire_hours`, `jwt_valid_after_unix`, `default_password_warning`)
-- `runtime` (`account_max_inflight`, `account_max_queue`, `global_max_inflight`, `token_refresh_interval_hours`)
-- `responses` / `embeddings`
+- `runtime` (`account_max_inflight`, `account_max_queue`, `global_max_inflight`)
+- `toolcall` / `responses` / `embeddings`
 - `auto_delete` (`sessions`)
 - `claude_mapping` / `model_aliases`
 - `env_backed`, `needs_vercel_sync`
-- `toolcall` policy is fixed to `feature_match + high` and is no longer returned or modified via settings

 ### `PUT /admin/settings`

 Hot-updates runtime settings. Supported updates:

 - `admin.jwt_expire_hours`
-- `runtime.account_max_inflight` / `runtime.account_max_queue` / `runtime.global_max_inflight` / `runtime.token_refresh_interval_hours`
+- `runtime.account_max_inflight` / `runtime.account_max_queue` / `runtime.global_max_inflight`
+- `toolcall.mode` / `toolcall.early_emit_confidence`
 - `responses.store_ttl_seconds`
 - `embeddings.provider`
 - `auto_delete.sessions`
 - `claude_mapping`
 - `model_aliases`
-- `toolcall` policy is fixed and is no longer a writable field

 ### `POST /admin/settings/password`

@@ -679,7 +678,7 @@ data: {"type":"message_stop"}

 The request can pass the config object directly, or use the `{"config": {...}, "mode":"merge"}` wrapper format.
 Passing `?mode=merge` / `?mode=replace` as query parameters is also supported.
-On import, fields such as `keys`, `accounts`, `claude_mapping` / `claude_model_mapping`, `model_aliases`, `admin`, `runtime`, `responses`, `embeddings`, and `auto_delete` are accepted; `toolcall`-related fields are ignored.
+On import, fields such as `keys`, `accounts`, `claude_mapping` / `claude_model_mapping`, `model_aliases`, `admin`, `runtime`, `toolcall`, `responses`, `embeddings`, and `auto_delete` are accepted.

 ### `GET /admin/config/export`

--- a/docs/CONTRIBUTING.en.md
+++ b/docs/CONTRIBUTING.en.md
--- a/docs/CONTRIBUTING.md
+++ b/docs/CONTRIBUTING.md
--- a/docs/DEPLOY.en.md
+++ b/docs/DEPLOY.en.md
@@ -456,8 +456,8 @@ server {
 # Copy compiled binary and related files to target directory
 sudo mkdir -p /opt/ds2api
 sudo cp ds2api config.json /opt/ds2api/
-# Optional: if you want to use an external WASM file (override the embedded one, from a release package or build output)
-# sudo cp /path/to/sha3_wasm_bg.7b9ca65ddd.wasm /opt/ds2api/
+# Optional: if you want to use an external WASM file (override embedded one)
+# sudo cp sha3_wasm_bg.7b9ca65ddd.wasm /opt/ds2api/
 sudo cp -r static/admin /opt/ds2api/static/admin
 ```

--- a/docs/DEPLOY.md
+++ b/docs/DEPLOY.md
@@ -456,8 +456,8 @@ server {
 # Copy the compiled binary and related files to the target directory
 sudo mkdir -p /opt/ds2api
 sudo cp ds2api config.json /opt/ds2api/
-# Optional: if you want to use an external WASM file (overriding the built-in version; from a release package or build output)
-# sudo cp /path/to/sha3_wasm_bg.7b9ca65ddd.wasm /opt/ds2api/
+# Optional: if you want to use an external WASM file (overriding the built-in version)
+# sudo cp sha3_wasm_bg.7b9ca65ddd.wasm /opt/ds2api/
 sudo cp -r static/admin /opt/ds2api/static/admin
 ```

--- a/2
+++ b/2
@@ -34,7 +34,7 @@ CMD ["/usr/local/bin/ds2api"]

 FROM runtime-base AS runtime-from-source
 COPY --from=go-builder /out/ds2api /usr/local/bin/ds2api
-COPY --from=go-builder /app/internal/deepseek/assets/sha3_wasm_bg.7b9ca65ddd.wasm /app/sha3_wasm_bg.7b9ca65ddd.wasm
+COPY --from=go-builder /app/sha3_wasm_bg.7b9ca65ddd.wasm /app/sha3_wasm_bg.7b9ca65ddd.wasm
 COPY --from=go-builder /app/config.example.json /app/config.example.json
 COPY --from=webui-builder /app/static/admin /app/static/admin

--- a/README.MD
+++ b/README.MD
@@ -8,7 +8,7 @@
 ![Stars](https://img.shields.io/github/stars/CJackHwang/ds2api.svg)
 ![Forks](https://img.shields.io/github/forks/CJackHwang/ds2api.svg)
 [![Release](https://img.shields.io/github/v/release/CJackHwang/ds2api?display_name=tag)](https://github.com/CJackHwang/ds2api/releases)
-[![Docker](https://img.shields.io/badge/docker-ready-blue.svg)](docs/DEPLOY.md)
+[![Docker](https://img.shields.io/badge/docker-ready-blue.svg)](DEPLOY.md)
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/templates/L4CFHP)
 [![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/CJackHwang/ds2api)

@@ -213,7 +213,7 @@ base64 < config.json | tr -d '\n'

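 > **Streaming note**: On Vercel, `/v1/chat/completions` is routed by default to `api/chat-stream.js` (Node Runtime) to guarantee real-time SSE. Auth, account selection, and session/PoW preparation are still handled by the Go internal prepare endpoint; streaming responses (including `tools`) go through Go-aligned output assembly and anti-leak handling on the Node side.

-For detailed deployment instructions, see the [Deployment Guide](docs/DEPLOY.md).
+For detailed deployment instructions, see the [Deployment Guide](DEPLOY.md).

 ### Option 4: Download the Release Build Package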

@@ -270,6 +270,10 @@ cp opencode.json.example opencode.json
  "compat": {
    "wide_input_strict_output": true
  },
+  "toolcall": {
+    "mode": "feature_match",
+    "early_emit_confidence": "high"
+  },
  "responses": {
    "store_ttl_seconds": 900
  },
@@ -286,8 +290,7 @@ cp opencode.json.example opencode.json
  "runtime": {
    "account_max_inflight": 2,
    "account_max_queue": 0,
-    "global_max_inflight": 0,
-    "token_refresh_interval_hours": 6
+    "global_max_inflight": 0
  },
  "auto_delete": {
    "sessions": false
@@ -300,12 +303,12 @@ cp opencode.json.example opencode.json
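 - `token`: Even if set in the config file, it is cleared on load (the token is never read from `config.json`); the actual token is kept only in runtime memory and refreshed automatically
 - `model_aliases`: Mapping from common model names (e.g. GPT/Codex/Claude) to DeepSeek models
 - `compat.wide_input_strict_output`: Recommended to keep `true` (the current implementation defaults to lenient input, strict output)
-- `toolcall`: The policy is fixed to feature matching + high-confidence early emit and is no longer configurable
+- `toolcall`: Always uses the feature matching + high-confidence early emit policy
 - `responses.store_ttl_seconds`: In-memory cache TTL for `/v1/responses/{id}`
 - `embeddings.provider`: Embeddings provider (currently built-in: `deterministic/mock/builtin`)
 - `claude_mapping`: Maps the `fast`/`slow` suffixes in the dictionary to the corresponding DeepSeek models (`claude_model_mapping` is still read for compatibility)
 - `admin`: Admin panel settings (JWT expiry, password hash, etc.), hot-updatable via the Admin Settings API
-- `runtime`: Runtime parameters (concurrency limits, queue size, managed-account token refresh interval), hot-updatable via the Admin Settings API; `account_max_queue=0`/`global_max_inflight=0` means the value is auto-calculated from recommended values, and `token_refresh_interval_hours=6` is the default forced re-login interval
+- `runtime`: Runtime parameters (concurrency limits, queue size), hot-updatable via the Admin Settings API; `account_max_queue=0`/`global_max_inflight=0` means the value is auto-calculated from recommended values
 - `auto_delete.sessions`: Whether to automatically clean up DeepSeek sessions after a request ends (default `false`, hot-updatable in Settings)

 ### Environment Variables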
@@ -447,7 +450,6 @@ ds2api/
 ├── tests/
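 │   ├── compat/              # Compatibility test fixtures and expected outputs
 │   └── scripts/             # Unified test script entrypoints (unit/e2e)
-├── docs/                    # Auxiliary docs for deployment / contributing / testing
 ├── static/admin/            # WebUI build output (not committed to Git)
 ├── .github/
 │   ├── workflows/           # GitHub Actions (quality gates + automated release builds)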
@@ -467,9 +469,9 @@ ds2api/
 | Document | Description |
 | --- | --- |
 | [API.md](API.md) / [API.en.md](API.en.md) | API reference (with request/response examples) |
-| [DEPLOY.md](docs/DEPLOY.md) / [DEPLOY.en.md](docs/DEPLOY.en.md) | Deployment guide (local/Docker/Vercel/systemd) |
-| [CONTRIBUTING.md](docs/CONTRIBUTING.md) / [CONTRIBUTING.en.md](docs/CONTRIBUTING.en.md) | Contributing guide |
-| [TESTING.md](docs/TESTING.md) | Test suite usage guide |
+| [DEPLOY.md](DEPLOY.md) / [DEPLOY.en.md](DEPLOY.en.md) | Deployment guide (local/Docker/Vercel/systemd) |
+| [CONTRIBUTING.md](CONTRIBUTING.md) / [CONTRIBUTING.en.md](CONTRIBUTING.en.md) | Contributing guide |
+| [TESTING.md](TESTING.md) | Test suite usage guide |

 ## Testing

@@ -499,7 +501,7 @@ npm ci --prefix webui && npm run build --prefix webui

 ## Testing

-For the detailed testing guide, see [docs/TESTING.md](docs/TESTING.md).
+For the detailed testing guide, see [TESTING.md](TESTING.md).

 ### Quick Test Commands

--- a/README.en.md
+++ b/README.en.md
@@ -8,7 +8,7 @@
 ![Stars](https://img.shields.io/github/stars/CJackHwang/ds2api.svg)
 ![Forks](https://img.shields.io/github/forks/CJackHwang/ds2api.svg)
 [![Release](https://img.shields.io/github/v/release/CJackHwang/ds2api?display_name=tag)](https://github.com/CJackHwang/ds2api/releases)
-[![Docker](https://img.shields.io/badge/docker-ready-blue.svg)](docs/DEPLOY.en.md)
+[![Docker](https://img.shields.io/badge/docker-ready-blue.svg)](DEPLOY.en.md)
 [![Deploy on Zeabur](https://zeabur.com/button.svg)](https://zeabur.com/templates/L4CFHP)
 [![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/CJackHwang/ds2api)

@@ -213,7 +213,7 @@ base64 < config.json | tr -d '\n'

 > **Streaming note**: `/v1/chat/completions` on Vercel is routed to `api/chat-stream.js` (Node Runtime) for real-time SSE. Auth, account selection, and session/PoW preparation are still handled by the Go internal prepare endpoint; streaming output (including `tools`) is assembled on Node with Go-aligned anti-leak handling.

-For detailed deployment instructions, see the [Deployment Guide](docs/DEPLOY.en.md).
+For detailed deployment instructions, see the [Deployment Guide](DEPLOY.en.md).

 ### Option 4: Download Release Binaries

@@ -270,6 +270,10 @@ cp opencode.json.example opencode.json
  "compat": {
    "wide_input_strict_output": true
  },
+  "toolcall": {
+    "mode": "feature_match",
+    "early_emit_confidence": "high"
+  },
  "responses": {
    "store_ttl_seconds": 900
  },
@@ -286,8 +290,7 @@ cp opencode.json.example opencode.json
  "runtime": {
    "account_max_inflight": 2,
    "account_max_queue": 0,
-    "global_max_inflight": 0,
-    "token_refresh_interval_hours": 6
+    "global_max_inflight": 0
  },
  "auto_delete": {
    "sessions": false
@@ -300,12 +303,12 @@ cp opencode.json.example opencode.json
 - `token`: Even if set in `config.json`, it is cleared during load (DS2API does not read persisted tokens from config); runtime tokens are maintained/refreshed in memory only
 - `model_aliases`: Map common model names (GPT/Codex/Claude) to DeepSeek models
 - `compat.wide_input_strict_output`: Keep `true` (current default policy)
-- `toolcall`: Fixed to feature matching + high-confidence early emit, no longer configurable
+- `toolcall`: Fixed to feature matching + high-confidence early emit
 - `responses.store_ttl_seconds`: In-memory TTL for `/v1/responses/{id}`
 - `embeddings.provider`: Embeddings provider (`deterministic/mock/builtin` built-in)
 - `claude_mapping`: Maps `fast`/`slow` suffixes to corresponding DeepSeek models (still compatible with `claude_model_mapping`)
 - `admin`: Admin panel settings (JWT expiry, password hash, etc.), hot-reloadable via Admin Settings API
-- `runtime`: Runtime parameters (concurrency limits, queue sizes, managed token refresh interval), hot-reloadable via Admin Settings API; `account_max_queue=0`/`global_max_inflight=0` means auto-calculate from recommended values, `token_refresh_interval_hours=6` is the default forced re-login interval
+- `runtime`: Runtime parameters (concurrency limits, queue sizes), hot-reloadable via Admin Settings API; `account_max_queue=0`/`global_max_inflight=0` means auto-calculate from recommended values
 - `auto_delete.sessions`: Whether to auto-delete DeepSeek sessions after request completion (default `false`, hot-reloadable via Settings)

 ### Environment Variables
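Here is a minimal Go sketch of decoding the `toolcall`, `responses`, and `runtime` sections from the example config above. The `Config` struct is hypothetical and is not ds2api's `internal/config` type. Because `encoding/json` ignores unknown keys by default, a leftover `token_refresh_interval_hours` from an older `config.json` decodes without error under this struct. This diff does not show whether ds2api's own loader behaves the same way.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is hypothetical and covers only the sections discussed here.
type Config struct {
	Toolcall struct {
		Mode                string `json:"mode"`
		EarlyEmitConfidence string `json:"early_emit_confidence"`
	} `json:"toolcall"`
	Responses struct {
		StoreTTLSeconds int `json:"store_ttl_seconds"`
	} `json:"responses"`
	Runtime struct {
		AccountMaxInflight int `json:"account_max_inflight"`
		AccountMaxQueue    int `json:"account_max_queue"`  // 0 = auto-calculate
		GlobalMaxInflight  int `json:"global_max_inflight"` // 0 = auto-calculate
	} `json:"runtime"`
}

const sample = `{
  "toolcall": {"mode": "feature_match", "early_emit_confidence": "high"},
  "responses": {"store_ttl_seconds": 900},
  "runtime": {"account_max_inflight": 2, "account_max_queue": 0, "global_max_inflight": 0}
}`

// parseConfig decodes the JSON; keys without a matching field are ignored.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg.Toolcall)
}
```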
@@ -441,7 +444,6 @@ ds2api/
 ├── tests/
 │   ├── compat/              # Compatibility fixtures and expected outputs
 │   └── scripts/             # Unified test script entrypoints (unit/e2e)
-├── docs/                    # Deployment / contributing / testing docs
 ├── static/admin/            # WebUI build output (not committed to Git)
 ├── .github/
 │   ├── workflows/           # GitHub Actions (quality gates + release automation)
@@ -461,9 +463,9 @@ ds2api/
 | Document | Description |
 | --- | --- |
 | [API.md](API.md) / [API.en.md](API.en.md) | API reference with request/response examples |
-| [DEPLOY.md](docs/DEPLOY.md) / [DEPLOY.en.md](docs/DEPLOY.en.md) | Deployment guide (local/Docker/Vercel/systemd) |
-| [CONTRIBUTING.md](docs/CONTRIBUTING.md) / [CONTRIBUTING.en.md](docs/CONTRIBUTING.en.md) | Contributing guide |
-| [TESTING.md](docs/TESTING.md) | Testsuite guide |
+| [DEPLOY.md](DEPLOY.md) / [DEPLOY.en.md](DEPLOY.en.md) | Deployment guide (local/Docker/Vercel/systemd) |
+| [CONTRIBUTING.md](CONTRIBUTING.md) / [CONTRIBUTING.en.md](CONTRIBUTING.en.md) | Contributing guide |
+| [TESTING.md](TESTING.md) | Testsuite guide |

 ## Testing

--- a/docs/TESTING.md
+++ b/docs/TESTING.md
--- a/2
+++ b/2
@@ -1 +1 @@
-2.5.1
+2.4.1
--- a/internal/adapter/claude/handler_util_test.go
+++ b/internal/adapter/claude/handler_util_test.go
@@ -93,11 +93,8 @@ func TestNormalizeClaudeMessagesToolUseToAssistantToolCalls(t *testing.T) {
 		t.Fatalf("expected call id preserved, got %#v", call)
 	}
 	content, _ := m["content"].(string)
-	if !containsStr(content, "<tool_calls>") || !containsStr(content, "<tool_name>search_web</tool_name>") {
-		t.Fatalf("expected assistant content to include XML tool call history, got %q", content)
-	}
-	if !containsStr(content, `<parameters>{"query":"latest"}</parameters>`) {
-		t.Fatalf("expected assistant content to include serialized parameters, got %q", content)
+	if !containsStr(content, "search_web") || !containsStr(content, `"arguments":"{\"query\":\"latest\"}"`) {
+		t.Fatalf("expected assistant content to include serialized tool call for prompt roundtrip, got %q", content)
 	}
 }

@@ -254,6 +251,9 @@ func TestBuildClaudeToolPromptSingleTool(t *testing.T) {
 	if !containsStr(prompt, "<tool_calls>") {
 		t.Fatalf("expected XML tool_calls format in prompt")
 	}
+	if containsStr(prompt, "TOOL_CALL_HISTORY") || containsStr(prompt, "TOOL_RESULT_HISTORY") {
+		t.Fatalf("expected legacy tool history markers removed from prompt")
+	}
 	if !containsStr(prompt, "TOOL CALL FORMAT") {
 		t.Fatalf("expected tool call format header in prompt")
 	}
--- a/internal/adapter/claude/handler_utils.go
+++ b/internal/adapter/claude/handler_utils.go
@@ -5,7 +5,6 @@ import (
 	"fmt"
 	"strings"

-	"ds2api/internal/prompt"
 	"ds2api/internal/util"
 )

@@ -154,7 +153,7 @@ func normalizeClaudeToolUseToAssistant(block map[string]any) map[string]any {
 	}
 	return map[string]any{
 		"role":       "assistant",
-		"content":    prompt.FormatToolCallsForPrompt(toolCalls),
+		"content":    marshalCompactJSON(toolCalls),
 		"tool_calls": toolCalls,
 	}
 }
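The updated assertion in `handler_util_test.go` expects the assistant content to contain `search_web` and `"arguments":"{\"query\":\"latest\"}"`. This diff does not include the body of `marshalCompactJSON`. As a hypothetical stand-in, `compactJSON` below uses `json.Marshal` to show how OpenAI-style `tool_calls` would serialize. The `id`/`type`/`function` shape is an assumption. `encoding/json` sorts map keys, so the output order is deterministic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// compactJSON is a hypothetical stand-in for marshalCompactJSON, whose body
// is not part of this diff. It returns "" if marshaling fails.
func compactJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Assumed OpenAI-style tool call; arguments are already a JSON string.
	toolCalls := []any{map[string]any{
		"id":   "call_1",
		"type": "function",
		"function": map[string]any{
			"name":      "search_web",
			"arguments": `{"query":"latest"}`,
		},
	}}
	fmt.Println(compactJSON(toolCalls))
	// prints [{"function":{"arguments":"{\"query\":\"latest\"}","name":"search_web"},"id":"call_1","type":"function"}]
}
```

With a `call`-prefixed id, this array has the same shape that `leakedToolCallArrayPattern` in `tool_history_sanitize.go` is written to match.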
--- a/internal/adapter/openai/chat_stream_runtime.go
+++ b/internal/adapter/openai/chat_stream_runtime.go
@@ -97,7 +97,7 @@ func (s *chatStreamRuntime) sendDone() {

 func (s *chatStreamRuntime) finalize(finishReason string) {
 	finalThinking := s.thinking.String()
-	finalText := sanitizeLeakedOutput(s.text.String())
+	finalText := sanitizeLeakedToolHistory(s.text.String())
 	detected := util.ParseStandaloneToolCallsDetailed(finalText, s.toolNames)
 	if len(detected.Calls) > 0 && !s.toolCallsDoneEmitted {
 		finishReason = "tool_calls"
@@ -141,7 +141,7 @@ func (s *chatStreamRuntime) finalize(finishReason string) {
 			if evt.Content == "" {
 				continue
 			}
-			cleaned := sanitizeLeakedOutput(evt.Content)
+			cleaned := sanitizeLeakedToolHistory(evt.Content)
 			if cleaned == "" {
 				continue
 			}
@@ -250,7 +250,7 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedD
 						continue
 					}
 					if evt.Content != "" {
-						cleaned := sanitizeLeakedOutput(evt.Content)
+						cleaned := sanitizeLeakedToolHistory(evt.Content)
 						if cleaned == "" {
 							continue
 						}
--- a/internal/adapter/openai/handler_chat.go
+++ b/internal/adapter/openai/handler_chat.go
@@ -105,7 +105,7 @@ func (h *Handler) handleNonStream(w http.ResponseWriter, ctx context.Context, re
 	result := sse.CollectStream(resp, thinkingEnabled, true)

 	finalThinking := result.Thinking
-	finalText := sanitizeLeakedOutput(result.Text)
+	finalText := sanitizeLeakedToolHistory(result.Text)
 	respBody := openaifmt.BuildChatCompletion(completionID, model, finalPrompt, finalThinking, finalText, toolNames)
 	writeJSON(w, http.StatusOK, respBody)
 }
--- a/internal/adapter/openai/handler_toolcall_policy.go
+++ b/internal/adapter/openai/handler_toolcall_policy.go
@@ -1,9 +1,19 @@
 package openai

+import "strings"
+
 func (h *Handler) toolcallFeatureMatchEnabled() bool {
-	return true
+	if h == nil || h.Store == nil {
+		return true
+	}
+	mode := strings.TrimSpace(strings.ToLower(h.Store.ToolcallMode()))
+	return mode == "" || mode == "feature_match"
 }

 func (h *Handler) toolcallEarlyEmitHighConfidence() bool {
-	return true
+	if h == nil || h.Store == nil {
+		return true
+	}
+	level := strings.TrimSpace(strings.ToLower(h.Store.ToolcallEarlyEmitConfidence()))
+	return level == "" || level == "high"
 }
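The restored policy accessors read two store values and fall back to the fixed behavior. The self-contained Go sketch below applies the same normalization, using a hypothetical `toolcallSettings` interface in place of `h.Store`. Only an empty value or `feature_match` / `high` (trimmed, case-insensitive) keeps the defaults. Any other string turns the corresponding behavior off.

```go
package main

import (
	"fmt"
	"strings"
)

// toolcallSettings is a hypothetical stand-in for the two Store accessors
// used by handler_toolcall_policy.go.
type toolcallSettings interface {
	ToolcallMode() string
	ToolcallEarlyEmitConfidence() string
}

// staticSettings is a fixed-value implementation for illustration.
type staticSettings struct{ mode, level string }

func (s staticSettings) ToolcallMode() string                { return s.mode }
func (s staticSettings) ToolcallEarlyEmitConfidence() string { return s.level }

// featureMatchEnabled mirrors toolcallFeatureMatchEnabled: no store, an empty
// mode, or "feature_match" (trimmed, case-insensitive) enables matching.
func featureMatchEnabled(s toolcallSettings) bool {
	if s == nil {
		return true
	}
	mode := strings.TrimSpace(strings.ToLower(s.ToolcallMode()))
	return mode == "" || mode == "feature_match"
}

// earlyEmitHigh mirrors toolcallEarlyEmitHighConfidence for the "high" level.
func earlyEmitHigh(s toolcallSettings) bool {
	if s == nil {
		return true
	}
	level := strings.TrimSpace(strings.ToLower(s.ToolcallEarlyEmitConfidence()))
	return level == "" || level == "high"
}

func main() {
	fmt.Println(featureMatchEnabled(nil))                                   // true
	fmt.Println(featureMatchEnabled(staticSettings{mode: " Feature_Match "})) // true
	fmt.Println(featureMatchEnabled(staticSettings{mode: "off"}))           // false
	fmt.Println(earlyEmitHigh(staticSettings{level: "low"}))                // false
}
```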
--- a/internal/adapter/openai/leaked_output_sanitize.go
+++ b/internal/adapter/openai/leaked_output_sanitize.go
@@ -1,54 +0,0 @@
-package openai
-
-import (
-	"regexp"
-)
-
-var emptyJSONFencePattern = regexp.MustCompile("(?is)```json\\s*```")
-var leakedToolCallArrayPattern = regexp.MustCompile(`(?is)\[\{\s*"function"\s*:\s*\{[\s\S]*?\}\s*,\s*"id"\s*:\s*"call[^"]*"\s*,\s*"type"\s*:\s*"function"\s*}\]`)
-var leakedToolResultBlobPattern = regexp.MustCompile(`(?is)<\s*\|\s*tool\s*\|\s*>\s*\{[\s\S]*?"tool_call_id"\s*:\s*"call[^"]*"\s*}`)
-
-// leakedMetaMarkerPattern matches DeepSeek special tokens in BOTH forms:
-//   - ASCII underscore: <｜end_of_sentence｜>
-//   - U+2581 variant:   <｜end▁of▁sentence｜>  (used in some DeepSeek outputs)
-var leakedMetaMarkerPattern = regexp.MustCompile(`(?i)<[｜\|]\s*(?:assistant|tool|end[_▁]of[_▁]sentence|end[_▁]of[_▁]thinking)\s*[｜\|]>`)
-
-// leakedAgentXMLBlockPatterns catch agent-style XML blocks that leak through
-// when the sieve fails to capture them. These are applied only to complete
-// wrapper blocks so standalone "<result>" examples in normal output remain
-// untouched.
-var leakedAgentXMLBlockPatterns = []*regexp.Regexp{
-	regexp.MustCompile(`(?is)<attempt_completion\b[^>]*>(.*?)</attempt_completion>`),
-	regexp.MustCompile(`(?is)<ask_followup_question\b[^>]*>(.*?)</ask_followup_question>`),
-	regexp.MustCompile(`(?is)<new_task\b[^>]*>(.*?)</new_task>`),
-}
-
-var leakedAgentResultTagPattern = regexp.MustCompile(`(?is)</?result>`)
-
-func sanitizeLeakedOutput(text string) string {
-	if text == "" {
-		return text
-	}
-	out := emptyJSONFencePattern.ReplaceAllString(text, "")
-	out = leakedToolCallArrayPattern.ReplaceAllString(out, "")
-	out = leakedToolResultBlobPattern.ReplaceAllString(out, "")
-	out = leakedMetaMarkerPattern.ReplaceAllString(out, "")
-	out = sanitizeLeakedAgentXMLBlocks(out)
-	return out
-}
-
-func sanitizeLeakedAgentXMLBlocks(text string) string {
-	out := text
-	for _, pattern := range leakedAgentXMLBlockPatterns {
-		out = pattern.ReplaceAllStringFunc(out, func(match string) string {
-			submatches := pattern.FindStringSubmatch(match)
-			if len(submatches) < 2 {
-				return match
-			}
-			// Preserve the inner text so leaked agent instructions do not erase
-			// the actual answer, but strip the wrapper/result markup itself.
-			return leakedAgentResultTagPattern.ReplaceAllString(submatches[1], "")
-		})
-	}
-	return out
-}
--- a/internal/adapter/openai/leaked_output_sanitize_test.go
+++ b/internal/adapter/openai/leaked_output_sanitize_test.go
@@ -1,43 +0,0 @@
-package openai
-
-import "testing"
-
-func TestSanitizeLeakedOutputRemovesEmptyJSONFence(t *testing.T) {
-	raw := "before\n```json\n```\nafter"
-	got := sanitizeLeakedOutput(raw)
-	if got != "before\n\nafter" {
-		t.Fatalf("unexpected sanitized empty json fence: %q", got)
-	}
-}
-
-func TestSanitizeLeakedOutputRemovesLeakedWireToolCallAndResult(t *testing.T) {
-	raw := "开始\n[{\"function\":{\"arguments\":\"{\\\"command\\\":\\\"java -version\\\"}\",\"name\":\"exec\"},\"id\":\"callb9a321\",\"type\":\"function\"}]< | Tool | >{\"content\":\"openjdk version 21\",\"tool_call_id\":\"callb9a321\"}\n结束"
-	got := sanitizeLeakedOutput(raw)
-	if got != "开始\n\n结束" {
-		t.Fatalf("unexpected sanitize result for leaked wire format: %q", got)
-	}
-}
-
-func TestSanitizeLeakedOutputRemovesStandaloneMetaMarkers(t *testing.T) {
-	raw := "A<| end_of_sentence |><| Assistant |>B<| end_of_thinking |>C<｜end▁of▁thinking｜>D<｜end▁of▁sentence｜>E"
-	got := sanitizeLeakedOutput(raw)
-	if got != "ABCDE" {
-		t.Fatalf("unexpected sanitize result for meta markers: %q", got)
-	}
-}
-
-func TestSanitizeLeakedOutputRemovesAgentXMLLeaks(t *testing.T) {
-	raw := "Done.<attempt_completion><result>Some final answer</result></attempt_completion>"
-	got := sanitizeLeakedOutput(raw)
-	if got != "Done.Some final answer" {
-		t.Fatalf("unexpected sanitize result for agent XML leak: %q", got)
-	}
-}
-
-func TestSanitizeLeakedOutputPreservesStandaloneResultTags(t *testing.T) {
-	raw := "Example XML: <result>value</result>"
-	got := sanitizeLeakedOutput(raw)
-	if got != raw {
-		t.Fatalf("unexpected sanitize result for standalone result tag: %q", got)
-	}
-}
--- a/internal/adapter/openai/message_normalize.go
+++ b/internal/adapter/openai/message_normalize.go
@@ -1,6 +1,7 @@
 package openai

 import (
+	"encoding/json"
 	"strings"

 	"ds2api/internal/prompt"
@@ -54,18 +55,7 @@ func normalizeOpenAIMessagesForPrompt(raw []any, traceID string) []map[string]an
 }

 func buildAssistantContentForPrompt(msg map[string]any) string {
-	content := strings.TrimSpace(normalizeOpenAIContentForPrompt(msg["content"]))
-	toolHistory := prompt.FormatToolCallsForPrompt(msg["tool_calls"])
-	switch {
-	case content == "" && toolHistory == "":
-		return ""
-	case content == "":
-		return toolHistory
-	case toolHistory == "":
-		return content
-	default:
-		return content + "\n\n" + toolHistory
-	}
+	return strings.TrimSpace(normalizeOpenAIContentForPrompt(msg["content"]))
 }

 func buildToolContentForPrompt(msg map[string]any) string {
@@ -80,6 +70,18 @@ func normalizeOpenAIContentForPrompt(v any) string {
 	return prompt.NormalizeContent(v)
 }

+func normalizeToolArgumentString(raw string) string {
+	trimmed := strings.TrimSpace(raw)
+	if trimmed == "" {
+		return ""
+	}
+	if looksLikeConcatenatedJSON(trimmed) {
+		// Keep original payload to avoid silent argument rewrites.
+		return raw
+	}
+	return trimmed
+}
+
 func normalizeOpenAIRoleForPrompt(role string) string {
 	role = strings.ToLower(strings.TrimSpace(role))
 	if role == "developer" {
@@ -94,3 +96,20 @@ func asString(v any) string {
 	}
 	return ""
 }
+
+func looksLikeConcatenatedJSON(raw string) bool {
+	trimmed := strings.TrimSpace(raw)
+	if trimmed == "" {
+		return false
+	}
+	if strings.Contains(trimmed, "}{") || strings.Contains(trimmed, "][") {
+		return true
+	}
+	dec := json.NewDecoder(strings.NewReader(trimmed))
+	var first any
+	if err := dec.Decode(&first); err != nil {
+		return false
+	}
+	var second any
+	return dec.Decode(&second) == nil
+}
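`looksLikeConcatenatedJSON` combines a substring shortcut with a streaming decode. The Go sketch below copies it verbatim and runs it on a few inputs. The `}{` / `][` shortcut also fires for a single valid object whose string value contains `}{`. In `normalizeToolArgumentString`, that false positive only means the untrimmed original is returned instead of the trimmed string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// looksLikeConcatenatedJSON is copied from message_normalize.go above.
func looksLikeConcatenatedJSON(raw string) bool {
	trimmed := strings.TrimSpace(raw)
	if trimmed == "" {
		return false
	}
	// Cheap shortcut: flags "}{" or "][" anywhere, even inside JSON strings.
	if strings.Contains(trimmed, "}{") || strings.Contains(trimmed, "][") {
		return true
	}
	// Otherwise decode one value and check whether a second value follows.
	dec := json.NewDecoder(strings.NewReader(trimmed))
	var first any
	if err := dec.Decode(&first); err != nil {
		return false
	}
	var second any
	return dec.Decode(&second) == nil
}

func main() {
	for _, s := range []string{
		`{}{"query":"x"}`,    // true: adjacent objects
		`{"a":1} {"b":2}`,    // true: second value decodes
		`{"query":"latest"}`, // false: decoder hits EOF
		`{"s":"}{"}`,         // true: substring shortcut false positive
	} {
		fmt.Printf("%-20s %v\n", s, looksLikeConcatenatedJSON(s))
	}
}
```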
--- a/internal/adapter/openai/message_normalize_test.go
+++ b/internal/adapter/openai/message_normalize_test.go
@@ -34,23 +34,20 @@ func TestNormalizeOpenAIMessagesForPrompt_AssistantToolCallsAndToolResult(t *tes
 	}

 	normalized := normalizeOpenAIMessagesForPrompt(raw, "")
-	if len(normalized) != 4 {
-		t.Fatalf("expected 4 normalized messages with assistant tool history preserved, got %d", len(normalized))
+	if len(normalized) != 3 {
+		t.Fatalf("expected 3 normalized messages with tool-call-only assistant turn omitted, got %d", len(normalized))
 	}
-	assistantContent, _ := normalized[2]["content"].(string)
-	if !strings.Contains(assistantContent, "<tool_calls>") {
-		t.Fatalf("assistant tool history should be preserved in XML form, got %q", assistantContent)
+	toolContent, _ := normalized[2]["content"].(string)
+	if !strings.Contains(toolContent, `"temp":18`) {
+		t.Fatalf("tool result should be transparently forwarded, got %q", toolContent)
 	}
-	if !strings.Contains(assistantContent, "<tool_name>get_weather</tool_name>") {
-		t.Fatalf("expected tool name in preserved history, got %q", assistantContent)
-	}
-	if !strings.Contains(normalized[3]["content"].(string), `"temp":18`) {
-		t.Fatalf("tool result should be transparently forwarded, got %#v", normalized[3]["content"])
+	if strings.Contains(toolContent, "[TOOL_RESULT_HISTORY]") {
+		t.Fatalf("tool history marker should not be injected: %q", toolContent)
 	}

 	prompt := util.MessagesPrepare(normalized)
-	if !strings.Contains(prompt, "<tool_calls>") {
-		t.Fatalf("expected preserved assistant tool history in prompt: %q", prompt)
+	if strings.Contains(prompt, "[TOOL_CALL_HISTORY]") || strings.Contains(prompt, "[TOOL_RESULT_HISTORY]") {
+		t.Fatalf("expected no synthetic history markers in prompt: %q", prompt)
 	}
 }

@@ -173,15 +170,8 @@ func TestNormalizeOpenAIMessagesForPrompt_AssistantMultipleToolCallsRemainSepara
 	}

 	normalized := normalizeOpenAIMessagesForPrompt(raw, "")
-	if len(normalized) != 1 {
-		t.Fatalf("expected assistant tool_call-only message preserved, got %#v", normalized)
-	}
-	content, _ := normalized[0]["content"].(string)
-	if strings.Count(content, "<tool_call>") != 2 {
-		t.Fatalf("expected two preserved tool call blocks, got %q", content)
-	}
-	if !strings.Contains(content, "<tool_name>search_web</tool_name>") || !strings.Contains(content, "<tool_name>eval_javascript</tool_name>") {
-		t.Fatalf("expected both tool names in preserved history, got %q", content)
+	if len(normalized) != 0 {
+		t.Fatalf("expected assistant tool_call-only message omitted, got %#v", normalized)
 	}
 }

@@ -202,12 +192,8 @@ func TestNormalizeOpenAIMessagesForPrompt_PreservesConcatenatedToolArguments(t *
 	}

 	normalized := normalizeOpenAIMessagesForPrompt(raw, "")
-	if len(normalized) != 1 {
-		t.Fatalf("expected assistant tool_call-only content preserved, got %#v", normalized)
-	}
-	content, _ := normalized[0]["content"].(string)
-	if !strings.Contains(content, `{}{"query":"测试工具调用"}`) {
-		t.Fatalf("expected concatenated tool arguments preserved, got %q", content)
+	if len(normalized) != 0 {
+		t.Fatalf("expected assistant tool_call-only content omitted, got %#v", normalized)
 	}
 }

@@ -229,7 +215,7 @@ func TestNormalizeOpenAIMessagesForPrompt_AssistantToolCallsMissingNameAreDroppe

 	normalized := normalizeOpenAIMessagesForPrompt(raw, "")
 	if len(normalized) != 0 {
-		t.Fatalf("expected assistant tool_calls without text to be dropped when name is missing, got %#v", normalized)
+		t.Fatalf("expected assistant tool_calls without text omitted, got %#v", normalized)
 	}
 }

@@ -251,15 +237,8 @@ func TestNormalizeOpenAIMessagesForPrompt_AssistantNilContentDoesNotInjectNullLi
 	}

 	normalized := normalizeOpenAIMessagesForPrompt(raw, "")
-	if len(normalized) != 1 {
-		t.Fatalf("expected nil-content assistant tool_call-only message preserved, got %#v", normalized)
-	}
-	content, _ := normalized[0]["content"].(string)
-	if strings.Contains(content, "null") {
-		t.Fatalf("expected no null literal injection, got %q", content)
-	}
-	if !strings.Contains(content, "<tool_calls>") {
-		t.Fatalf("expected assistant tool history in normalized content, got %q", content)
+	if len(normalized) != 0 {
+		t.Fatalf("expected nil-content assistant tool_call-only message omitted, got %#v", normalized)
 	}
 }

--- a/internal/adapter/openai/prompt_build_test.go
+++ b/internal/adapter/openai/prompt_build_test.go
@@ -47,11 +47,8 @@ func TestBuildOpenAIFinalPrompt_HandlerPathIncludesToolRoundtripSemantics(t *tes
 	if !strings.Contains(finalPrompt, `"condition":"sunny"`) {
 		t.Fatalf("handler finalPrompt should preserve tool output content: %q", finalPrompt)
 	}
-	if !strings.Contains(finalPrompt, "<tool_calls>") {
-		t.Fatalf("handler finalPrompt should preserve assistant tool history: %q", finalPrompt)
-	}
-	if !strings.Contains(finalPrompt, "<tool_name>get_weather</tool_name>") {
-		t.Fatalf("handler finalPrompt should include tool name history: %q", finalPrompt)
+	if strings.Contains(finalPrompt, "[TOOL_CALL_HISTORY]") || strings.Contains(finalPrompt, "[TOOL_RESULT_HISTORY]") {
+		t.Fatalf("handler finalPrompt should not include synthetic history markers: %q", finalPrompt)
 	}
 }

--- a/internal/adapter/openai/responses_handler.go
+++ b/internal/adapter/openai/responses_handler.go
@@ -113,7 +113,7 @@ func (h *Handler) handleResponsesNonStream(w http.ResponseWriter, resp *http.Res
 		return
 	}
 	result := sse.CollectStream(resp, thinkingEnabled, true)
-	sanitizedText := sanitizeLeakedOutput(result.Text)
+	sanitizedText := sanitizeLeakedToolHistory(result.Text)
 	textParsed := util.ParseStandaloneToolCallsDetailed(sanitizedText, toolNames)
 	logResponsesToolPolicyRejection(traceID, toolChoice, textParsed, "text")

--- a/internal/adapter/openai/responses_input_items.go
+++ b/internal/adapter/openai/responses_input_items.go
@@ -1,11 +1,11 @@
 package openai

 import (
+	"encoding/json"
 	"fmt"
 	"strings"

 	"ds2api/internal/config"
-	"ds2api/internal/prompt"
 )

 func normalizeResponsesInputItem(m map[string]any) map[string]any {
@@ -148,7 +148,7 @@ func normalizeResponsesInputItemWithState(m map[string]any, callNameByID map[str

 		functionPayload := map[string]any{
 			"name":      name,
-			"arguments": prompt.StringifyToolCallArguments(argsRaw),
+			"arguments": stringifyToolCallArguments(argsRaw),
 		}
 		call := map[string]any{
 			"type":     "function",
@@ -211,3 +211,26 @@ func normalizeResponsesFallbackPart(m map[string]any) string {
 	}
 	return strings.TrimSpace(fmt.Sprintf("%v", m))
 }
+
+func stringifyToolCallArguments(v any) string {
+	switch x := v.(type) {
+	case nil:
+		return "{}"
+	case string:
+		s := strings.TrimSpace(x)
+		if s == "" {
+			return "{}"
+		}
+		s = normalizeToolArgumentString(s)
+		if s == "" {
+			return "{}"
+		}
+		return s
+	default:
+		b, err := json.Marshal(x)
+		if err != nil || len(b) == 0 {
+			return "{}"
+		}
+		return string(b)
+	}
+}
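In `stringifyToolCallArguments`, the string is trimmed before `normalizeToolArgumentString` is called. That call therefore returns its input unchanged on both of its branches. The Go sketch below (`stringifyArgs`, an illustrative name) omits it without changing behavior. It also shows that only an untyped `nil` maps to `{}`: a typed nil map still marshals to `null`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stringifyArgs behaves like stringifyToolCallArguments; the no-op
// normalizeToolArgumentString call on the already-trimmed string is omitted.
func stringifyArgs(v any) string {
	switch x := v.(type) {
	case nil:
		// Only an untyped nil (e.g. a JSON null decoded into any) lands here.
		return "{}"
	case string:
		s := strings.TrimSpace(x)
		if s == "" {
			return "{}"
		}
		return s
	default:
		// Maps, slices, numbers, etc. are marshaled; unsupported types such
		// as funcs make json.Marshal fail and fall back to "{}".
		b, err := json.Marshal(x)
		if err != nil || len(b) == 0 {
			return "{}"
		}
		return string(b)
	}
}

func main() {
	fmt.Println(stringifyArgs(map[string]any{"query": "latest"})) // {"query":"latest"}
	fmt.Println(stringifyArgs(map[string]any(nil)))               // null
}
```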
--- a/internal/adapter/openai/responses_stream_runtime_core.go
+++ b/internal/adapter/openai/responses_stream_runtime_core.go
@@ -97,7 +97,7 @@ func newResponsesStreamRuntime(

 func (s *responsesStreamRuntime) finalize() {
 	finalThinking := s.thinking.String()
-	finalText := sanitizeLeakedOutput(s.text.String())
+	finalText := sanitizeLeakedToolHistory(s.text.String())

 	if s.bufferToolContent {
 		s.processToolStreamEvents(flushToolSieve(&s.sieve, s.toolNames), true)
@@ -194,7 +194,7 @@ func (s *responsesStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Pa
 			continue
 		}

-		cleanedText := sanitizeLeakedOutput(p.Text)
+		cleanedText := sanitizeLeakedToolHistory(p.Text)
 		if cleanedText == "" {
 			continue
 		}
--- a/internal/adapter/openai/tool_history_sanitize.go
+++ b/internal/adapter/openai/tool_history_sanitize.go
@@ -0,0 +1,32 @@
+package openai
+
+import (
+	"regexp"
+)
+
+var leakedToolHistoryPattern = regexp.MustCompile(`(?is)\[TOOL_CALL_HISTORY\][\s\S]*?\[/TOOL_CALL_HISTORY\]|\[TOOL_RESULT_HISTORY\][\s\S]*?\[/TOOL_RESULT_HISTORY\]`)
+var emptyJSONFencePattern = regexp.MustCompile("(?is)```json\\s*```")
+var leakedToolCallArrayPattern = regexp.MustCompile(`(?is)\[\{\s*"function"\s*:\s*\{[\s\S]*?\}\s*,\s*"id"\s*:\s*"call[^"]*"\s*,\s*"type"\s*:\s*"function"\s*}\]`)
+var leakedToolResultBlobPattern = regexp.MustCompile(`(?is)<\s*\|\s*tool\s*\|\s*>\s*\{[\s\S]*?"tool_call_id"\s*:\s*"call[^"]*"\s*}`)
+
+// leakedMetaMarkerPattern matches DeepSeek special tokens in BOTH forms:
+//   - ASCII underscore: <｜end_of_sentence｜>
+//   - U+2581 variant:   <｜end▁of▁sentence｜>  (used in some DeepSeek outputs)
+var leakedMetaMarkerPattern = regexp.MustCompile(`(?i)<[｜\|]\s*(?:assistant|tool|end[_▁]of[_▁]sentence|end[_▁]of[_▁]thinking)\s*[｜\|]>`)
+
+// leakedAgentXMLPattern catches agent-style XML tags that leak through when
+// the sieve fails to capture them (e.g. incomplete blocks at stream end).
+var leakedAgentXMLPattern = regexp.MustCompile(`(?is)</?(?:attempt_completion|ask_followup_question|new_task|result)>`)
+
+func sanitizeLeakedToolHistory(text string) string {
+	if text == "" {
+		return text
+	}
+	out := leakedToolHistoryPattern.ReplaceAllString(text, "")
+	out = emptyJSONFencePattern.ReplaceAllString(out, "")
+	out = leakedToolCallArrayPattern.ReplaceAllString(out, "")
+	out = leakedToolResultBlobPattern.ReplaceAllString(out, "")
+	out = leakedMetaMarkerPattern.ReplaceAllString(out, "")
+	out = leakedAgentXMLPattern.ReplaceAllString(out, "")
+	return out
+}
--- a/internal/adapter/openai/tool_history_sanitize_test.go
+++ b/internal/adapter/openai/tool_history_sanitize_test.go
@@ -0,0 +1,130 @@
+package openai
+
+import "testing"
+
+func TestSanitizeLeakedToolHistoryRemovesMarkerBlocks(t *testing.T) {
+	raw := "前缀\n[TOOL_CALL_HISTORY]\nfunction.name: exec\nfunction.arguments: {}\n[/TOOL_CALL_HISTORY]\n后缀"
+	got := sanitizeLeakedToolHistory(raw)
+	if got != "前缀\n\n后缀" {
+		t.Fatalf("unexpected sanitized content: %q", got)
+	}
+}
+
+func TestSanitizeLeakedToolHistoryPreservesChunkWhitespace(t *testing.T) {
+	cases := []struct {
+		name string
+		raw  string
+		want string
+	}{
+		{
+			name: "trailing space kept",
+			raw:  "Hello ",
+			want: "Hello ",
+		},
+		{
+			name: "leading newline kept",
+			raw:  "\nworld",
+			want: "\nworld",
+		},
+		{
+			name: "surrounding whitespace around marker is preserved",
+			raw:  "A \n[TOOL_RESULT_HISTORY]\nfunction.name: exec\nfunction.arguments: {}\n[/TOOL_RESULT_HISTORY]\n B",
+			want: "A \n\n B",
+		},
+	}
+
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			got := sanitizeLeakedToolHistory(tc.raw)
+			if got != tc.want {
+				t.Fatalf("unexpected sanitize result, want %q got %q", tc.want, got)
+			}
+		})
+	}
+}
+
+func TestSanitizeLeakedToolHistoryRemovesEmptyJSONFence(t *testing.T) {
+	raw := "before\n```json\n```\nafter"
+	got := sanitizeLeakedToolHistory(raw)
+	if got != "before\n\nafter" {
+		t.Fatalf("unexpected sanitized empty json fence: %q", got)
+	}
+}
+
+func TestFlushToolSieveDropsToolHistoryLeak(t *testing.T) {
+	var state toolStreamSieveState
+	chunk := "[TOOL_CALL_HISTORY]\nstatus: already_called\nfunction.name: exec\nfunction.arguments: {}\n[/TOOL_CALL_HISTORY]"
+	evts := processToolSieveChunk(&state, chunk, []string{"exec"})
+	if len(evts) != 0 {
+		t.Fatalf("expected no immediate output before history block is complete, got %+v", evts)
+	}
+	flushed := flushToolSieve(&state, []string{"exec"})
+	if len(flushed) != 0 {
+		t.Fatalf("expected history block to be swallowed, got %+v", flushed)
+	}
+}
+
+func TestFlushToolSieveDropsToolResultHistoryLeak(t *testing.T) {
+	var state toolStreamSieveState
+	chunk := "[TOOL_RESULT_HISTORY]\nstatus: already_called\nfunction.name: exec\nfunction.arguments: {}\n[/TOOL_RESULT_HISTORY]"
+	evts := processToolSieveChunk(&state, chunk, []string{"exec"})
+	if len(evts) != 0 {
+		t.Fatalf("expected no immediate output before result history block is complete, got %+v", evts)
+	}
+	flushed := flushToolSieve(&state, []string{"exec"})
+	if len(flushed) != 0 {
+		t.Fatalf("expected result history block to be swallowed, got %+v", flushed)
+	}
+}
+
+func TestSanitizeLeakedToolHistoryRemovesLeakedWireToolCallAndResult(t *testing.T) {
+	raw := "开始\n[{\"function\":{\"arguments\":\"{\\\"command\\\":\\\"java -version\\\"}\",\"name\":\"exec\"},\"id\":\"callb9a321\",\"type\":\"function\"}]< | Tool | >{\"content\":\"openjdk version 21\",\"tool_call_id\":\"callb9a321\"}\n结束"
+	got := sanitizeLeakedToolHistory(raw)
+	if got != "开始\n\n结束" {
+		t.Fatalf("unexpected sanitize result for leaked wire format: %q", got)
+	}
+}
+
+func TestSanitizeLeakedToolHistoryRemovesStandaloneMetaMarkers(t *testing.T) {
+	raw := "A<| end_of_sentence |><| Assistant |>B<| end_of_thinking |>C<｜end▁of▁thinking｜>D<｜end▁of▁sentence｜>E"
+	got := sanitizeLeakedToolHistory(raw)
+	if got != "ABCDE" {
+		t.Fatalf("unexpected sanitize result for meta markers: %q", got)
+	}
+}
+
+func TestSanitizeLeakedToolHistoryRemovesAgentXMLLeaks(t *testing.T) {
+	raw := "Done.<attempt_completion><result>Some final answer</result></attempt_completion>"
+	got := sanitizeLeakedToolHistory(raw)
+	if got != "Done.Some final answer" {
+		t.Fatalf("unexpected sanitize result for agent XML leak: %q", got)
+	}
+}
+
+func TestProcessToolSieveChunkSplitsResultHistoryBoundary(t *testing.T) {
+	var state toolStreamSieveState
+	parts := []string{
+		"Hello ",
+		"[TOOL_RESULT_HISTORY]\nstatus: already_called\n",
+		"function.name: exec\nfunction.arguments: {}\n[/TOOL_RESULT_HISTORY]",
+		"world",
+	}
+	var events []toolStreamEvent
+	for _, p := range parts {
+		events = append(events, processToolSieveChunk(&state, p, []string{"exec"})...)
+	}
+	events = append(events, flushToolSieve(&state, []string{"exec"})...)
+
+	var text string
+	for _, evt := range events {
+		if evt.Content != "" {
+			text += evt.Content
+		}
+		if len(evt.ToolCalls) > 0 {
+			t.Fatalf("did not expect parsed tool calls from history leak: %+v", evt.ToolCalls)
+		}
+	}
+	if text != "Hello world" {
+		t.Fatalf("expected clean text output preserving boundary spaces, got %q", text)
+	}
+}
--- a/internal/adapter/openai/tool_sieve_core.go
+++ b/internal/adapter/openai/tool_sieve_core.go
@@ -183,7 +183,7 @@ func findToolSegmentStart(s string) int {
 		return -1
 	}
 	lower := strings.ToLower(s)
-	keywords := []string{"tool_calls", "\"function\"", "function.name:"}
+	keywords := []string{"tool_calls", "\"function\"", "function.name:", "[tool_call_history]", "[tool_result_history]"}
 	bestKeyIdx := -1
 	for _, kw := range keywords {
 		idx := strings.Index(lower, kw)
@@ -240,7 +240,7 @@ func consumeToolCapture(state *toolStreamSieveState, toolNames []string) (prefix

 	lower := strings.ToLower(captured)
 	keyIdx := -1
-	keywords := []string{"tool_calls", "\"function\"", "function.name:"}
+	keywords := []string{"tool_calls", "\"function\"", "function.name:", "[tool_call_history]", "[tool_result_history]"}
 	for _, kw := range keywords {
 		idx := strings.Index(lower, kw)
 		if idx >= 0 && (keyIdx < 0 || idx < keyIdx) {
@@ -253,6 +253,9 @@ func consumeToolCapture(state *toolStreamSieveState, toolNames []string) (prefix
 	}
 	start := strings.LastIndex(captured[:keyIdx], "{")
 	if start < 0 {
+		if blockStart, blockEnd, ok := extractToolHistoryBlock(captured, keyIdx); ok {
+			return captured[:blockStart], nil, captured[blockEnd:], true
+		}
 		start = keyIdx
 	}
 	obj, end, ok := extractJSONObjectFrom(captured, start)
--- a/internal/adapter/openai/tool_sieve_jsonscan.go
+++ b/internal/adapter/openai/tool_sieve_jsonscan.go
@@ -44,6 +44,31 @@ func extractJSONObjectFrom(text string, start int) (string, int, bool) {
 	return "", 0, false
 }

+func extractToolHistoryBlock(captured string, keyIdx int) (start int, end int, ok bool) {
+	if keyIdx < 0 || keyIdx >= len(captured) {
+		return 0, 0, false
+	}
+	rest := strings.ToLower(captured[keyIdx:])
+	switch {
+	case strings.HasPrefix(rest, "[tool_call_history]"):
+		closeTag := "[/tool_call_history]"
+		closeIdx := strings.Index(rest, closeTag)
+		if closeIdx < 0 {
+			return 0, 0, false
+		}
+		return keyIdx, keyIdx + closeIdx + len(closeTag), true
+	case strings.HasPrefix(rest, "[tool_result_history]"):
+		closeTag := "[/tool_result_history]"
+		closeIdx := strings.Index(rest, closeTag)
+		if closeIdx < 0 {
+			return 0, 0, false
+		}
+		return keyIdx, keyIdx + closeIdx + len(closeTag), true
+	default:
+		return 0, 0, false
+	}
+}
+
 func trimWrappingJSONFence(prefix, suffix string) (string, string) {
 	trimmedPrefix := strings.TrimRight(prefix, " \t\r\n")
 	fenceIdx := strings.LastIndex(trimmedPrefix, "```")
--- a/internal/adapter/openai/vercel_stream.go
+++ b/internal/adapter/openai/vercel_stream.go
@@ -93,16 +93,18 @@ func (h *Handler) handleVercelStreamPrepare(w http.ResponseWriter, r *http.Reque
 	}
 	leased = true
 	writeJSON(w, http.StatusOK, map[string]any{
-		"session_id":       sessionID,
-		"lease_id":         leaseID,
-		"model":            stdReq.ResponseModel,
-		"final_prompt":     stdReq.FinalPrompt,
-		"thinking_enabled": stdReq.Thinking,
-		"search_enabled":   stdReq.Search,
-		"tool_names":       stdReq.ToolNames,
-		"deepseek_token":   a.DeepSeekToken,
-		"pow_header":       powHeader,
-		"payload":          payload,
+		"session_id":               sessionID,
+		"lease_id":                 leaseID,
+		"model":                    stdReq.ResponseModel,
+		"final_prompt":             stdReq.FinalPrompt,
+		"thinking_enabled":         stdReq.Thinking,
+		"search_enabled":           stdReq.Search,
+		"tool_names":               stdReq.ToolNames,
+		"toolcall_feature_match":   h.toolcallFeatureMatchEnabled(),
+		"toolcall_early_emit_high": h.toolcallEarlyEmitHighConfidence(),
+		"deepseek_token":           a.DeepSeekToken,
+		"pow_header":               powHeader,
+		"payload":                  payload,
 	})
 }

--- a/internal/admin/deps.go
+++ b/internal/admin/deps.go
@@ -28,7 +28,6 @@ type ConfigStore interface {
 	RuntimeAccountMaxInflight() int
 	RuntimeAccountMaxQueue(defaultSize int) int
 	RuntimeGlobalMaxInflight(defaultSize int) int
-	RuntimeTokenRefreshIntervalHours() int
 	AutoDeleteSessions() bool
 }

--- a/internal/admin/handler_config_import.go
+++ b/internal/admin/handler_config_import.go
@@ -120,6 +120,12 @@ func (h *Handler) configImport(w http.ResponseWriter, r *http.Request) {
 					next.ModelAliases[k] = v
 				}
 			}
+			if strings.TrimSpace(incoming.Toolcall.Mode) != "" {
+				next.Toolcall.Mode = incoming.Toolcall.Mode
+			}
+			if strings.TrimSpace(incoming.Toolcall.EarlyEmitConfidence) != "" {
+				next.Toolcall.EarlyEmitConfidence = incoming.Toolcall.EarlyEmitConfidence
+			}
 			if incoming.Responses.StoreTTLSeconds > 0 {
 				next.Responses.StoreTTLSeconds = incoming.Responses.StoreTTLSeconds
 			}
@@ -144,9 +150,6 @@ func (h *Handler) configImport(w http.ResponseWriter, r *http.Request) {
 			if incoming.Runtime.GlobalMaxInflight > 0 {
 				next.Runtime.GlobalMaxInflight = incoming.Runtime.GlobalMaxInflight
 			}
-			if incoming.Runtime.TokenRefreshIntervalHours > 0 {
-				next.Runtime.TokenRefreshIntervalHours = incoming.Runtime.TokenRefreshIntervalHours
-			}
 		}

 		normalizeSettingsConfig(&next)
--- a/internal/admin/handler_settings_parse.go
+++ b/internal/admin/handler_settings_parse.go
@@ -21,15 +21,16 @@ func boolFrom(v any) bool {
 	}
 }

-func parseSettingsUpdateRequest(req map[string]any) (*config.AdminConfig, *config.RuntimeConfig, *config.ResponsesConfig, *config.EmbeddingsConfig, *config.AutoDeleteConfig, map[string]string, map[string]string, error) {
+func parseSettingsUpdateRequest(req map[string]any) (*config.AdminConfig, *config.RuntimeConfig, *config.ToolcallConfig, *config.ResponsesConfig, *config.EmbeddingsConfig, *config.AutoDeleteConfig, map[string]string, map[string]string, error) {
 	var (
-		adminCfg      *config.AdminConfig
-		runtimeCfg    *config.RuntimeConfig
-		respCfg       *config.ResponsesConfig
-		embCfg        *config.EmbeddingsConfig
-		autoDeleteCfg *config.AutoDeleteConfig
-		claudeMap     map[string]string
-		aliasMap      map[string]string
+		adminCfg      *config.AdminConfig
+		runtimeCfg    *config.RuntimeConfig
+		toolcallCfg   *config.ToolcallConfig
+		respCfg       *config.ResponsesConfig
+		embCfg        *config.EmbeddingsConfig
+		autoDeleteCfg *config.AutoDeleteConfig
+		claudeMap     map[string]string
+		aliasMap      map[string]string
 	)

 	if raw, ok := req["admin"].(map[string]any); ok {
@@ -37,7 +38,7 @@ func parseSettingsUpdateRequest(req map[string]any) (*config.AdminConfig, *confi
 		if v, exists := raw["jwt_expire_hours"]; exists {
 			n := intFrom(v)
 			if n < 1 || n > 720 {
-				return nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("admin.jwt_expire_hours must be between 1 and 720")
+				return nil, nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("admin.jwt_expire_hours must be between 1 and 720")
 			}
 			cfg.JWTExpireHours = n
 		}
@@ -49,43 +50,59 @@ func parseSettingsUpdateRequest(req map[string]any) (*config.AdminConfig, *confi
 		if v, exists := raw["account_max_inflight"]; exists {
 			n := intFrom(v)
 			if n < 1 || n > 256 {
-				return nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.account_max_inflight must be between 1 and 256")
+				return nil, nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.account_max_inflight must be between 1 and 256")
 			}
 			cfg.AccountMaxInflight = n
 		}
 		if v, exists := raw["account_max_queue"]; exists {
 			n := intFrom(v)
 			if n < 1 || n > 200000 {
-				return nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.account_max_queue must be between 1 and 200000")
+				return nil, nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.account_max_queue must be between 1 and 200000")
 			}
 			cfg.AccountMaxQueue = n
 		}
 		if v, exists := raw["global_max_inflight"]; exists {
 			n := intFrom(v)
 			if n < 1 || n > 200000 {
-				return nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.global_max_inflight must be between 1 and 200000")
+				return nil, nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.global_max_inflight must be between 1 and 200000")
 			}
 			cfg.GlobalMaxInflight = n
 		}
-		if v, exists := raw["token_refresh_interval_hours"]; exists {
-			n := intFrom(v)
-			if n < 1 || n > 720 {
-				return nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.token_refresh_interval_hours must be between 1 and 720")
-			}
-			cfg.TokenRefreshIntervalHours = n
-		}
 		if cfg.AccountMaxInflight > 0 && cfg.GlobalMaxInflight > 0 && cfg.GlobalMaxInflight < cfg.AccountMaxInflight {
-			return nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.global_max_inflight must be >= runtime.account_max_inflight")
+			return nil, nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("runtime.global_max_inflight must be >= runtime.account_max_inflight")
 		}
 		runtimeCfg = cfg
 	}

+	if raw, ok := req["toolcall"].(map[string]any); ok {
+		cfg := &config.ToolcallConfig{}
+		if v, exists := raw["mode"]; exists {
+			mode := strings.ToLower(strings.TrimSpace(fmt.Sprintf("%v", v)))
+			switch mode {
+			case "feature_match", "off":
+				cfg.Mode = mode
+			default:
+				return nil, nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("toolcall.mode must be feature_match or off")
+			}
+		}
+		if v, exists := raw["early_emit_confidence"]; exists {
+			level := strings.ToLower(strings.TrimSpace(fmt.Sprintf("%v", v)))
+			switch level {
+			case "high", "low", "off":
+				cfg.EarlyEmitConfidence = level
+			default:
+				return nil, nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("toolcall.early_emit_confidence must be high, low or off")
+			}
+		}
+		toolcallCfg = cfg
+	}
+
 	if raw, ok := req["responses"].(map[string]any); ok {
 		cfg := &config.ResponsesConfig{}
 		if v, exists := raw["store_ttl_seconds"]; exists {
 			n := intFrom(v)
 			if n < 30 || n > 86400 {
-				return nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("responses.store_ttl_seconds must be between 30 and 86400")
+				return nil, nil, nil, nil, nil, nil, nil, nil, fmt.Errorf("responses.store_ttl_seconds must be between 30 and 86400")
 			}
 			cfg.StoreTTLSeconds = n
 		}
@@ -133,5 +150,5 @@ func parseSettingsUpdateRequest(req map[string]any) (*config.AdminConfig, *confi
 		autoDeleteCfg = cfg
 	}

-	return adminCfg, runtimeCfg, respCfg, embCfg, autoDeleteCfg, claudeMap, aliasMap, nil
+	return adminCfg, runtimeCfg, toolcallCfg, respCfg, embCfg, autoDeleteCfg, claudeMap, aliasMap, nil
 }
--- a/internal/admin/handler_settings_read.go
+++ b/internal/admin/handler_settings_read.go
@@ -21,11 +21,11 @@ func (h *Handler) getSettings(w http.ResponseWriter, _ *http.Request) {
 			"default_password_warning": authn.UsingDefaultAdminKey(h.Store),
 		},
 		"runtime": map[string]any{
-			"account_max_inflight":         h.Store.RuntimeAccountMaxInflight(),
-			"account_max_queue":            h.Store.RuntimeAccountMaxQueue(recommended),
-			"global_max_inflight":          h.Store.RuntimeGlobalMaxInflight(recommended),
-			"token_refresh_interval_hours": h.Store.RuntimeTokenRefreshIntervalHours(),
+			"account_max_inflight": h.Store.RuntimeAccountMaxInflight(),
+			"account_max_queue":    h.Store.RuntimeAccountMaxQueue(recommended),
+			"global_max_inflight":  h.Store.RuntimeGlobalMaxInflight(recommended),
 		},
+		"toolcall":          snap.Toolcall,
 		"responses":         snap.Responses,
 		"embeddings":        snap.Embeddings,
 		"auto_delete":       snap.AutoDelete,
--- a/internal/admin/handler_settings_runtime.go
+++ b/internal/admin/handler_settings_runtime.go
@@ -14,9 +14,6 @@ func validateMergedRuntimeSettings(current config.RuntimeConfig, incoming *confi
 		if incoming.GlobalMaxInflight > 0 {
 			merged.GlobalMaxInflight = incoming.GlobalMaxInflight
 		}
-		if incoming.TokenRefreshIntervalHours > 0 {
-			merged.TokenRefreshIntervalHours = incoming.TokenRefreshIntervalHours
-		}
 	}
 	return validateRuntimeSettings(merged)
 }
--- a/internal/admin/handler_settings_test.go
+++ b/internal/admin/handler_settings_test.go
@@ -28,25 +28,6 @@ func TestGetSettingsDefaultPasswordWarning(t *testing.T) {
 	}
 }

-func TestGetSettingsIncludesTokenRefreshInterval(t *testing.T) {
-	h := newAdminTestHandler(t, `{
-		"keys":["k1"],
-		"runtime":{"token_refresh_interval_hours":9}
-	}`)
-	req := httptest.NewRequest(http.MethodGet, "/admin/settings", nil)
-	rec := httptest.NewRecorder()
-	h.getSettings(rec, req)
-	if rec.Code != http.StatusOK {
-		t.Fatalf("status=%d body=%s", rec.Code, rec.Body.String())
-	}
-	var body map[string]any
-	_ = json.Unmarshal(rec.Body.Bytes(), &body)
-	runtime, _ := body["runtime"].(map[string]any)
-	if got := intFrom(runtime["token_refresh_interval_hours"]); got != 9 {
-		t.Fatalf("expected token_refresh_interval_hours=9, got %d body=%v", got, body)
-	}
-}
-
 func TestUpdateSettingsValidation(t *testing.T) {
 	h := newAdminTestHandler(t, `{"keys":["k1"]}`)
 	payload := map[string]any{
@@ -63,25 +44,6 @@ func TestUpdateSettingsValidation(t *testing.T) {
 	}
 }

-func TestUpdateSettingsValidationRejectsTokenRefreshInterval(t *testing.T) {
-	h := newAdminTestHandler(t, `{"keys":["k1"]}`)
-	payload := map[string]any{
-		"runtime": map[string]any{
-			"token_refresh_interval_hours": 0,
-		},
-	}
-	b, _ := json.Marshal(payload)
-	req := httptest.NewRequest(http.MethodPut, "/admin/settings", bytes.NewReader(b))
-	rec := httptest.NewRecorder()
-	h.updateSettings(rec, req)
-	if rec.Code != http.StatusBadRequest {
-		t.Fatalf("expected 400, got %d body=%s", rec.Code, rec.Body.String())
-	}
-	if !bytes.Contains(rec.Body.Bytes(), []byte("runtime.token_refresh_interval_hours")) {
-		t.Fatalf("expected token refresh validation detail, got %s", rec.Body.String())
-	}
-}
-
 func TestUpdateSettingsValidationWithMergedRuntimeSnapshot(t *testing.T) {
 	h := newAdminTestHandler(t, `{
 		"keys":["k1"],
@@ -164,29 +126,6 @@ func TestUpdateSettingsHotReloadRuntime(t *testing.T) {
 	}
 }

-func TestUpdateSettingsHotReloadTokenRefreshInterval(t *testing.T) {
-	h := newAdminTestHandler(t, `{
-		"keys":["k1"],
-		"runtime":{"token_refresh_interval_hours":6}
-	}`)
-
-	payload := map[string]any{
-		"runtime": map[string]any{
-			"token_refresh_interval_hours": 12,
-		},
-	}
-	b, _ := json.Marshal(payload)
-	req := httptest.NewRequest(http.MethodPut, "/admin/settings", bytes.NewReader(b))
-	rec := httptest.NewRecorder()
-	h.updateSettings(rec, req)
-	if rec.Code != http.StatusOK {
-		t.Fatalf("status=%d body=%s", rec.Code, rec.Body.String())
-	}
-	if got := h.Store.RuntimeTokenRefreshIntervalHours(); got != 12 {
-		t.Fatalf("token_refresh_interval_hours=%d want=12", got)
-	}
-}
-
 func TestUpdateSettingsPasswordInvalidatesOldJWT(t *testing.T) {
 	hash := authn.HashAdminPassword("old-password")
 	h := newAdminTestHandler(t, `{"admin":{"password_hash":"`+hash+`"}}`)
@@ -268,30 +207,6 @@ func TestConfigImportMergeAndReplace(t *testing.T) {
 	}
 }

-func TestConfigImportAppliesTokenRefreshInterval(t *testing.T) {
-	h := newAdminTestHandler(t, `{"keys":["k1"]}`)
-
-	replace := map[string]any{
-		"mode": "replace",
-		"config": map[string]any{
-			"keys": []any{"k9"},
-			"runtime": map[string]any{
-				"token_refresh_interval_hours": 11,
-			},
-		},
-	}
-	replaceBytes, _ := json.Marshal(replace)
-	replaceReq := httptest.NewRequest(http.MethodPost, "/admin/config/import?mode=replace", bytes.NewReader(replaceBytes))
-	replaceRec := httptest.NewRecorder()
-	h.configImport(replaceRec, replaceReq)
-	if replaceRec.Code != http.StatusOK {
-		t.Fatalf("replace status=%d body=%s", replaceRec.Code, replaceRec.Body.String())
-	}
-	if got := h.Store.RuntimeTokenRefreshIntervalHours(); got != 11 {
-		t.Fatalf("token_refresh_interval_hours=%d want=11", got)
-	}
-}
-
 func TestConfigImportRejectsInvalidRuntimeBounds(t *testing.T) {
 	h := newAdminTestHandler(t, `{"keys":["k1"]}`)
 	payload := map[string]any{
--- a/internal/admin/handler_settings_write.go
+++ b/internal/admin/handler_settings_write.go
@@ -17,7 +17,7 @@ func (h *Handler) updateSettings(w http.ResponseWriter, r *http.Request) {
 		return
 	}

-	adminCfg, runtimeCfg, responsesCfg, embeddingsCfg, autoDeleteCfg, claudeMap, aliasMap, err := parseSettingsUpdateRequest(req)
+	adminCfg, runtimeCfg, toolcallCfg, responsesCfg, embeddingsCfg, autoDeleteCfg, claudeMap, aliasMap, err := parseSettingsUpdateRequest(req)
 	if err != nil {
 		writeJSON(w, http.StatusBadRequest, map[string]any{"detail": err.Error()})
 		return
@@ -45,8 +45,13 @@ func (h *Handler) updateSettings(w http.ResponseWriter, r *http.Request) {
 			if runtimeCfg.GlobalMaxInflight > 0 {
 				c.Runtime.GlobalMaxInflight = runtimeCfg.GlobalMaxInflight
 			}
-			if runtimeCfg.TokenRefreshIntervalHours > 0 {
-				c.Runtime.TokenRefreshIntervalHours = runtimeCfg.TokenRefreshIntervalHours
+		}
+		if toolcallCfg != nil {
+			if strings.TrimSpace(toolcallCfg.Mode) != "" {
+				c.Toolcall.Mode = strings.TrimSpace(toolcallCfg.Mode)
+			}
+			if strings.TrimSpace(toolcallCfg.EarlyEmitConfidence) != "" {
+				c.Toolcall.EarlyEmitConfidence = strings.TrimSpace(toolcallCfg.EarlyEmitConfidence)
 			}
 		}
 		if responsesCfg != nil && responsesCfg.StoreTTLSeconds > 0 {
--- a/internal/admin/settings_validation.go
+++ b/internal/admin/settings_validation.go
@@ -12,6 +12,8 @@ func normalizeSettingsConfig(c *config.Config) {
 		return
 	}
 	c.Admin.PasswordHash = strings.TrimSpace(c.Admin.PasswordHash)
+	c.Toolcall.Mode = strings.ToLower(strings.TrimSpace(c.Toolcall.Mode))
+	c.Toolcall.EarlyEmitConfidence = strings.ToLower(strings.TrimSpace(c.Toolcall.EarlyEmitConfidence))
 	c.Embeddings.Provider = strings.TrimSpace(c.Embeddings.Provider)
 }

@@ -25,6 +27,20 @@ func validateSettingsConfig(c config.Config) error {
 	if c.Responses.StoreTTLSeconds != 0 && (c.Responses.StoreTTLSeconds < 30 || c.Responses.StoreTTLSeconds > 86400) {
 		return fmt.Errorf("responses.store_ttl_seconds must be between 30 and 86400")
 	}
+	if mode := strings.TrimSpace(c.Toolcall.Mode); mode != "" {
+		switch mode {
+		case "feature_match", "off":
+		default:
+			return fmt.Errorf("toolcall.mode must be feature_match or off")
+		}
+	}
+	if level := strings.TrimSpace(c.Toolcall.EarlyEmitConfidence); level != "" {
+		switch level {
+		case "high", "low", "off":
+		default:
+			return fmt.Errorf("toolcall.early_emit_confidence must be high, low or off")
+		}
+	}
 	if c.Embeddings.Provider != "" && strings.TrimSpace(c.Embeddings.Provider) == "" {
 		return fmt.Errorf("embeddings.provider cannot be empty")
 	}
@@ -41,9 +57,6 @@ func validateRuntimeSettings(runtime config.RuntimeConfig) error {
 	if runtime.GlobalMaxInflight != 0 && (runtime.GlobalMaxInflight < 1 || runtime.GlobalMaxInflight > 200000) {
 		return fmt.Errorf("runtime.global_max_inflight must be between 1 and 200000")
 	}
-	if runtime.TokenRefreshIntervalHours != 0 && (runtime.TokenRefreshIntervalHours < 1 || runtime.TokenRefreshIntervalHours > 720) {
-		return fmt.Errorf("runtime.token_refresh_interval_hours must be between 1 and 720")
-	}
 	if runtime.AccountMaxInflight > 0 && runtime.GlobalMaxInflight > 0 && runtime.GlobalMaxInflight < runtime.AccountMaxInflight {
 		return fmt.Errorf("runtime.global_max_inflight must be >= runtime.account_max_inflight")
 	}
--- a/internal/auth/request.go
+++ b/internal/auth/request.go
@@ -40,16 +40,18 @@ type Resolver struct {
 	Pool  *account.Pool
 	Login LoginFunc

-	mu               sync.Mutex
-	tokenRefreshedAt map[string]time.Time
+	mu                   sync.Mutex
+	tokenRefreshedAt     map[string]time.Time
+	tokenRefreshInterval time.Duration
 }

 func NewResolver(store *config.Store, pool *account.Pool, login LoginFunc) *Resolver {
 	return &Resolver{
-		Store:            store,
-		Pool:             pool,
-		Login:            login,
-		tokenRefreshedAt: map[string]time.Time{},
+		Store:                store,
+		Pool:                 pool,
+		Login:                login,
+		tokenRefreshedAt:     map[string]time.Time{},
+		tokenRefreshInterval: 6 * time.Hour,
 	}
 }

@@ -230,14 +232,10 @@ func (r *Resolver) ensureManagedToken(ctx context.Context, a *RequestAuth) error
 }

 func (r *Resolver) shouldForceRefresh(accountID string) bool {
-	if r == nil || r.Store == nil {
-		return false
-	}
 	if strings.TrimSpace(accountID) == "" {
 		return false
 	}
-	intervalHours := r.Store.RuntimeTokenRefreshIntervalHours()
-	if intervalHours <= 0 {
+	if r.tokenRefreshInterval <= 0 {
 		return false
 	}
 	now := time.Now()
@@ -248,7 +246,7 @@ func (r *Resolver) shouldForceRefresh(accountID string) bool {
 		r.tokenRefreshedAt[accountID] = now
 		return false
 	}
-	return now.Sub(last) >= time.Duration(intervalHours)*time.Hour
+	return now.Sub(last) >= r.tokenRefreshInterval
 }

 func (r *Resolver) markTokenRefreshedNow(accountID string) {
--- a/internal/auth/request_test.go
+++ b/internal/auth/request_test.go
@@ -244,60 +244,3 @@ func TestDetermineManagedAccountForcesRefreshEverySixHours(t *testing.T) {
 		t.Fatalf("expected exactly one forced refresh login, got %d", got)
 	}
 }
-
-func TestDetermineManagedAccountUsesUpdatedRefreshInterval(t *testing.T) {
-	t.Setenv("DS2API_CONFIG_JSON", `{
-		"keys":["managed-key"],
-		"accounts":[{"email":"acc@example.com","password":"pwd","token":"seed-token"}],
-		"runtime":{"token_refresh_interval_hours":6}
-	}`)
-	store := config.LoadStore()
-	if err := store.UpdateAccountToken("acc@example.com", "seed-token"); err != nil {
-		t.Fatalf("update token failed: %v", err)
-	}
-	pool := account.NewPool(store)
-
-	var loginCount int32
-	resolver := NewResolver(store, pool, func(_ context.Context, _ config.Account) (string, error) {
-		n := atomic.AddInt32(&loginCount, 1)
-		return "fresh-token-" + string(rune('0'+n)), nil
-	})
-
-	req, _ := http.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
-	req.Header.Set("x-api-key", "managed-key")
-
-	a1, err := resolver.Determine(req)
-	if err != nil {
-		t.Fatalf("determine failed: %v", err)
-	}
-	if a1.DeepSeekToken != "seed-token" {
-		t.Fatalf("expected initial token without forced refresh, got %q", a1.DeepSeekToken)
-	}
-	resolver.Release(a1)
-	if got := atomic.LoadInt32(&loginCount); got != 0 {
-		t.Fatalf("expected no login before runtime update, got %d", got)
-	}
-
-	if err := store.Update(func(c *config.Config) error {
-		c.Runtime.TokenRefreshIntervalHours = 1
-		return nil
-	}); err != nil {
-		t.Fatalf("update runtime failed: %v", err)
-	}
-
-	resolver.mu.Lock()
-	resolver.tokenRefreshedAt["acc@example.com"] = time.Now().Add(-2 * time.Hour)
-	resolver.mu.Unlock()
-
-	a2, err := resolver.Determine(req)
-	if err != nil {
-		t.Fatalf("determine after runtime update failed: %v", err)
-	}
-	defer resolver.Release(a2)
-	if a2.DeepSeekToken != "fresh-token-1" {
-		t.Fatalf("expected refreshed token after runtime update, got %q", a2.DeepSeekToken)
-	}
-	if got := atomic.LoadInt32(&loginCount); got != 1 {
-		t.Fatalf("expected exactly one login after runtime update, got %d", got)
-	}
-}
--- a/internal/config/codec.go
+++ b/internal/config/codec.go
@@ -32,12 +32,15 @@ func (c Config) MarshalJSON() ([]byte, error) {
 	if strings.TrimSpace(c.Admin.PasswordHash) != "" || c.Admin.JWTExpireHours > 0 || c.Admin.JWTValidAfterUnix > 0 {
 		m["admin"] = c.Admin
 	}
-	if c.Runtime.AccountMaxInflight > 0 || c.Runtime.AccountMaxQueue > 0 || c.Runtime.GlobalMaxInflight > 0 || c.Runtime.TokenRefreshIntervalHours > 0 {
+	if c.Runtime.AccountMaxInflight > 0 || c.Runtime.AccountMaxQueue > 0 || c.Runtime.GlobalMaxInflight > 0 {
 		m["runtime"] = c.Runtime
 	}
 	if c.Compat.WideInputStrictOutput != nil {
 		m["compat"] = c.Compat
 	}
+	if strings.TrimSpace(c.Toolcall.Mode) != "" || strings.TrimSpace(c.Toolcall.EarlyEmitConfidence) != "" {
+		m["toolcall"] = c.Toolcall
+	}
 	if c.Responses.StoreTTLSeconds > 0 {
 		m["responses"] = c.Responses
 	}
@@ -95,7 +98,9 @@ func (c *Config) UnmarshalJSON(b []byte) error {
 				return fmt.Errorf("invalid field %q: %w", k, err)
 			}
 		case "toolcall":
-			// Legacy field ignored. Toolcall policy is fixed and no longer configurable.
+			if err := json.Unmarshal(v, &c.Toolcall); err != nil {
+				return fmt.Errorf("invalid field %q: %w", k, err)
+			}
 		case "responses":
 			if err := json.Unmarshal(v, &c.Responses); err != nil {
 				return fmt.Errorf("invalid field %q: %w", k, err)
@@ -138,6 +143,7 @@ func (c Config) Clone() Config {
 		Compat: CompatConfig{
 			WideInputStrictOutput: cloneBoolPtr(c.Compat.WideInputStrictOutput),
 		},
+		Toolcall:         c.Toolcall,
 		Responses:        c.Responses,
 		Embeddings:       c.Embeddings,
 		AutoDelete:       c.AutoDelete,
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -9,6 +9,7 @@ type Config struct {
 	Admin            AdminConfig       `json:"admin,omitempty"`
 	Runtime          RuntimeConfig     `json:"runtime,omitempty"`
 	Compat           CompatConfig      `json:"compat,omitempty"`
+	Toolcall         ToolcallConfig    `json:"toolcall,omitempty"`
 	Responses        ResponsesConfig   `json:"responses,omitempty"`
 	Embeddings       EmbeddingsConfig  `json:"embeddings,omitempty"`
 	AutoDelete       AutoDeleteConfig  `json:"auto_delete"`
@@ -61,10 +62,14 @@ type AdminConfig struct {
 }

 type RuntimeConfig struct {
-	AccountMaxInflight        int `json:"account_max_inflight,omitempty"`
-	AccountMaxQueue           int `json:"account_max_queue,omitempty"`
-	GlobalMaxInflight         int `json:"global_max_inflight,omitempty"`
-	TokenRefreshIntervalHours int `json:"token_refresh_interval_hours,omitempty"`
+	AccountMaxInflight int `json:"account_max_inflight,omitempty"`
+	AccountMaxQueue    int `json:"account_max_queue,omitempty"`
+	GlobalMaxInflight  int `json:"global_max_inflight,omitempty"`
+}
+
+type ToolcallConfig struct {
+	Mode                string `json:"mode,omitempty"`
+	EarlyEmitConfidence string `json:"early_emit_confidence,omitempty"`
 }

 type ResponsesConfig struct {
--- a/internal/config/config_edge_test.go
+++ b/internal/config/config_edge_test.go
@@ -104,9 +104,6 @@ func TestConfigJSONRoundtrip(t *testing.T) {
 			"fast": "deepseek-chat",
 			"slow": "deepseek-reasoner",
 		},
-		Runtime: RuntimeConfig{
-			TokenRefreshIntervalHours: 12,
-		},
 		VercelSyncHash: "hash123",
 		VercelSyncTime: 1234567890,
 		AdditionalFields: map[string]any{
@@ -133,9 +130,6 @@ func TestConfigJSONRoundtrip(t *testing.T) {
 	if decoded.ClaudeMapping["fast"] != "deepseek-chat" {
 		t.Fatalf("unexpected claude mapping: %#v", decoded.ClaudeMapping)
 	}
-	if decoded.Runtime.TokenRefreshIntervalHours != 12 {
-		t.Fatalf("unexpected runtime refresh interval: %#v", decoded.Runtime.TokenRefreshIntervalHours)
-	}
 	if decoded.VercelSyncHash != "hash123" {
 		t.Fatalf("unexpected vercel sync hash: %q", decoded.VercelSyncHash)
 	}
--- a/internal/config/config_test.go
+++ b/internal/config/config_test.go
@@ -79,31 +79,6 @@ func TestLoadStorePreservesFileBackedTokensForRuntime(t *testing.T) {
 	}
 }

-func TestRuntimeTokenRefreshIntervalHoursDefaultsToSix(t *testing.T) {
-	t.Setenv("DS2API_CONFIG_JSON", `{
-		"keys":["k1"],
-		"accounts":[{"email":"u@example.com","password":"p"}]
-	}`)
-
-	store := LoadStore()
-	if got := store.RuntimeTokenRefreshIntervalHours(); got != 6 {
-		t.Fatalf("expected default refresh interval 6, got %d", got)
-	}
-}
-
-func TestRuntimeTokenRefreshIntervalHoursUsesConfigValue(t *testing.T) {
-	t.Setenv("DS2API_CONFIG_JSON", `{
-		"keys":["k1"],
-		"accounts":[{"email":"u@example.com","password":"p"}],
-		"runtime":{"token_refresh_interval_hours":9}
-	}`)
-
-	store := LoadStore()
-	if got := store.RuntimeTokenRefreshIntervalHours(); got != 9 {
-		t.Fatalf("expected configured refresh interval 9, got %d", got)
-	}
-}
-
 func TestStoreUpdateAccountTokenKeepsIdentifierResolvable(t *testing.T) {
 	t.Setenv("DS2API_CONFIG_JSON", `{
 		"accounts":[{"email":"user@example.com","password":"p"}]
--- a/internal/config/store_accessors.go
+++ b/internal/config/store_accessors.go
@@ -43,11 +43,23 @@ func (s *Store) CompatWideInputStrictOutput() bool {
 }

 func (s *Store) ToolcallMode() string {
-	return "feature_match"
+	s.mu.RLock()
+	defer s.mu.RUnlock()
+	mode := strings.TrimSpace(strings.ToLower(s.cfg.Toolcall.Mode))
+	if mode == "" {
+		return "feature_match"
+	}
+	return mode
 }

 func (s *Store) ToolcallEarlyEmitConfidence() string {
-	return "high"
+	s.mu.RLock()
+	defer s.mu.RUnlock()
+	level := strings.TrimSpace(strings.ToLower(s.cfg.Toolcall.EarlyEmitConfidence))
+	if level == "" {
+		return "high"
+	}
+	return level
 }

 func (s *Store) ResponsesStoreTTLSeconds() int {
@@ -154,15 +166,6 @@ func (s *Store) RuntimeGlobalMaxInflight(defaultSize int) int {
 	return defaultSize
 }

-func (s *Store) RuntimeTokenRefreshIntervalHours() int {
-	s.mu.RLock()
-	defer s.mu.RUnlock()
-	if s.cfg.Runtime.TokenRefreshIntervalHours > 0 {
-		return s.cfg.Runtime.TokenRefreshIntervalHours
-	}
-	return 6
-}
-
 func (s *Store) AutoDeleteSessions() bool {
 	s.mu.RLock()
 	defer s.mu.RUnlock()
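On the `+` lines of `store_accessors.go`, the two toolcall accessors stop returning constants. Each one reads the config under a read lock, then trims and lowercases the value. A blank value falls back to `feature_match` or `high`. The sketch below shows that pattern with a trimmed-down, hypothetical `store` type. `normalizeOrDefault` is an illustrative helper, not a repository function.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// toolcallSettings holds the two raw strings read from the config file.
type toolcallSettings struct {
	Mode                string
	EarlyEmitConfidence string
}

// store is a hypothetical, trimmed-down stand-in for config.Store.
type store struct {
	mu  sync.RWMutex
	cfg toolcallSettings
}

// normalizeOrDefault trims and lowercases v, returning def when v is blank.
func normalizeOrDefault(v, def string) string {
	v = strings.TrimSpace(strings.ToLower(v))
	if v == "" {
		return def
	}
	return v
}

// ToolcallMode reads under RLock and defaults to "feature_match".
func (s *store) ToolcallMode() string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return normalizeOrDefault(s.cfg.Mode, "feature_match")
}

// ToolcallEarlyEmitConfidence defaults to "high" in the same way.
func (s *store) ToolcallEarlyEmitConfidence() string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return normalizeOrDefault(s.cfg.EarlyEmitConfidence, "high")
}

func main() {
	s := &store{cfg: toolcallSettings{Mode: " OFF "}}
	fmt.Println(s.ToolcallMode(), s.ToolcallEarlyEmitConfidence()) // off high
}
```

Neither accessor on the `+` side checks the value against the options the settings UI offers (`feature_match`/`off` and `high`/`low`/`off`). An unrecognized string is therefore passed through unchanged.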
--- a/internal/js/chat-stream/toolcall_policy.js
+++ b/internal/js/chat-stream/toolcall_policy.js
@@ -12,10 +12,12 @@ function resolveToolcallPolicy(prepBody, payloadTools) {
  if (toolNames.length === 0 && Array.isArray(payloadTools) && payloadTools.length > 0) {
    toolNames = ['__any_tool__'];
  }
+  const featureMatchEnabled = boolDefaultTrue(prepBody && prepBody.toolcall_feature_match);
+  const emitEarlyToolDeltas = featureMatchEnabled && boolDefaultTrue(prepBody && prepBody.toolcall_early_emit_high);
  return {
    toolNames,
    toolSieveEnabled: toolNames.length > 0,
-    emitEarlyToolDeltas: true,
+    emitEarlyToolDeltas,
  };
 }

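On the `+` lines, `resolveToolcallPolicy` again reads two prepare flags. Early tool deltas are emitted only when both flags resolve to true.

The body of `boolDefaultTrue` is not part of this diff. The Go sketch below assumes that it treats a missing (`null`/`undefined`) flag as `true` and otherwise uses the flag's value. It models an optional flag as `*bool`. `prepFlags` and `emitEarlyToolDeltas` are hypothetical names.

```go
package main

import "fmt"

// prepFlags models the two optional fields read from prepBody;
// nil means the field was absent.
type prepFlags struct {
	ToolcallFeatureMatch  *bool
	ToolcallEarlyEmitHigh *bool
}

// boolDefaultTrue is an assumed analogue of the JS helper of the same name.
func boolDefaultTrue(v *bool) bool {
	if v == nil {
		return true
	}
	return *v
}

// emitEarlyToolDeltas requires feature matching AND the early-emit flag.
// A nil prepBody leaves both flags absent, so the result is true.
func emitEarlyToolDeltas(p *prepFlags) bool {
	if p == nil {
		return true
	}
	return boolDefaultTrue(p.ToolcallFeatureMatch) && boolDefaultTrue(p.ToolcallEarlyEmitHigh)
}

func main() {
	off := false
	fmt.Println(emitEarlyToolDeltas(nil))                                    // true
	fmt.Println(emitEarlyToolDeltas(&prepFlags{ToolcallFeatureMatch: &off})) // false
	fmt.Println(emitEarlyToolDeltas(&prepFlags{ToolcallEarlyEmitHigh: &off})) // false
}
```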
--- a/internal/js/helpers/stream-tool-sieve/jsonscan.js
+++ b/internal/js/helpers/stream-tool-sieve/jsonscan.js
@@ -140,6 +140,30 @@ function extractJSONObjectFrom(text, start) {
  return { ok: false, end: 0 };
 }

+function extractToolHistoryBlock(captured, keyIdx) {
+  if (typeof captured !== 'string' || keyIdx < 0 || keyIdx >= captured.length) {
+    return { ok: false, start: 0, end: 0 };
+  }
+  const rest = captured.slice(keyIdx).toLowerCase();
+  if (rest.startsWith('[tool_call_history]')) {
+    const closeTag = '[/tool_call_history]';
+    const closeIdx = rest.indexOf(closeTag);
+    if (closeIdx < 0) {
+      return { ok: false, start: 0, end: 0 };
+    }
+    return { ok: true, start: keyIdx, end: keyIdx + closeIdx + closeTag.length };
+  }
+  if (rest.startsWith('[tool_result_history]')) {
+    const closeTag = '[/tool_result_history]';
+    const closeIdx = rest.indexOf(closeTag);
+    if (closeIdx < 0) {
+      return { ok: false, start: 0, end: 0 };
+    }
+    return { ok: true, start: keyIdx, end: keyIdx + closeIdx + closeTag.length };
+  }
+  return { ok: false, start: 0, end: 0 };
+}
+
 function trimWrappingJSONFence(prefix, suffix) {
  const rightTrimmedPrefix = (prefix || '').replace(/[ \t\r\n]+$/g, '');
  const fenceIdx = rightTrimmedPrefix.lastIndexOf('```');
@@ -168,5 +192,6 @@ module.exports = {
  parseJSONStringLiteral,
  skipSpaces,
  extractJSONObjectFrom,
+  extractToolHistoryBlock,
  trimWrappingJSONFence,
 };
--- a/internal/js/helpers/stream-tool-sieve/sieve.js
+++ b/internal/js/helpers/stream-tool-sieve/sieve.js
@@ -5,7 +5,7 @@ const {
  insideCodeFenceWithState,
 } = require('./state');
 const { parseStandaloneToolCallsDetailed } = require('./parse');
-const { extractJSONObjectFrom, trimWrappingJSONFence } = require('./jsonscan');
+const { extractJSONObjectFrom, extractToolHistoryBlock, trimWrappingJSONFence } = require('./jsonscan');
 const {
  TOOL_SEGMENT_KEYWORDS,
  XML_TOOL_SEGMENT_TAGS,
@@ -233,6 +233,17 @@ function consumeToolCapture(state, toolNames) {
  }
  const start = captured.slice(0, keyIdx).lastIndexOf('{');
  const actualStart = start >= 0 ? start : keyIdx;
+  if (start < 0) {
+    const history = extractToolHistoryBlock(captured, keyIdx);
+    if (history.ok) {
+      return {
+        ready: true,
+        prefix: captured.slice(0, history.start),
+        calls: [],
+        suffix: captured.slice(history.end),
+      };
+    }
+  }
  const obj = extractJSONObjectFrom(captured, actualStart);
  if (!obj.ok) {
    return { ready: false, prefix: '', calls: [], suffix: '' };
--- a/internal/js/helpers/stream-tool-sieve/tool-keywords.js
+++ b/internal/js/helpers/stream-tool-sieve/tool-keywords.js
@@ -4,6 +4,8 @@ const TOOL_SEGMENT_KEYWORDS = [
  'tool_calls',
  '"function"',
  'function.name:',
+  '[tool_call_history]',
+  '[tool_result_history]',
 ];

 const XML_TOOL_SEGMENT_TAGS = [
--- a/internal/prompt/tool_calls.go
+++ b/internal/prompt/tool_calls.go
@@ -1,124 +0,0 @@
-package prompt
-
-import (
-	"encoding/json"
-	"strings"
-)
-
-// FormatToolCallsForPrompt renders a tool_calls slice into the canonical
-// prompt-visible history block used across adapters.
-func FormatToolCallsForPrompt(raw any) string {
-	calls, ok := raw.([]any)
-	if !ok || len(calls) == 0 {
-		return ""
-	}
-
-	blocks := make([]string, 0, len(calls))
-	for _, item := range calls {
-		call, ok := item.(map[string]any)
-		if !ok {
-			continue
-		}
-		block := formatToolCallForPrompt(call)
-		if block != "" {
-			blocks = append(blocks, block)
-		}
-	}
-	if len(blocks) == 0 {
-		return ""
-	}
-	return "<tool_calls>\n" + strings.Join(blocks, "\n") + "\n</tool_calls>"
-}
-
-// StringifyToolCallArguments normalizes tool arguments into a compact string
-// while preserving raw concatenated payloads when they already look like model
-// output rather than a single JSON object.
-func StringifyToolCallArguments(v any) string {
-	switch x := v.(type) {
-	case nil:
-		return "{}"
-	case string:
-		s := strings.TrimSpace(x)
-		if s == "" {
-			return "{}"
-		}
-		s = normalizeToolArgumentString(s)
-		if s == "" {
-			return "{}"
-		}
-		return s
-	default:
-		b, err := json.Marshal(x)
-		if err != nil || len(b) == 0 {
-			return "{}"
-		}
-		return string(b)
-	}
-}
-
-func formatToolCallForPrompt(call map[string]any) string {
-	if call == nil {
-		return ""
-	}
-
-	name := strings.TrimSpace(asString(call["name"]))
-	fn, _ := call["function"].(map[string]any)
-	if name == "" && fn != nil {
-		name = strings.TrimSpace(asString(fn["name"]))
-	}
-	if name == "" {
-		return ""
-	}
-
-	argsRaw := call["arguments"]
-	if argsRaw == nil {
-		argsRaw = call["input"]
-	}
-	if argsRaw == nil && fn != nil {
-		argsRaw = fn["arguments"]
-		if argsRaw == nil {
-			argsRaw = fn["input"]
-		}
-	}
-
-	return "  <tool_call>\n" +
-		"    <tool_name>" + name + "</tool_name>\n" +
-		"    <parameters>" + StringifyToolCallArguments(argsRaw) + "</parameters>\n" +
-		"  </tool_call>"
-}
-
-func normalizeToolArgumentString(raw string) string {
-	trimmed := strings.TrimSpace(raw)
-	if trimmed == "" {
-		return ""
-	}
-	if looksLikeConcatenatedJSON(trimmed) {
-		// Keep the original payload to avoid silently rewriting model output.
-		return raw
-	}
-	return trimmed
-}
-
-func looksLikeConcatenatedJSON(raw string) bool {
-	trimmed := strings.TrimSpace(raw)
-	if trimmed == "" {
-		return false
-	}
-	if strings.Contains(trimmed, "}{") || strings.Contains(trimmed, "][") {
-		return true
-	}
-	dec := json.NewDecoder(strings.NewReader(trimmed))
-	var first any
-	if err := dec.Decode(&first); err != nil {
-		return false
-	}
-	var second any
-	return dec.Decode(&second) == nil
-}
-
-func asString(v any) string {
-	if s, ok := v.(string); ok {
-		return s
-	}
-	return ""
-}
--- a/internal/prompt/tool_calls_test.go
+++ b/internal/prompt/tool_calls_test.go
@@ -1,28 +0,0 @@
-package prompt
-
-import "testing"
-
-func TestStringifyToolCallArgumentsPreservesConcatenatedJSON(t *testing.T) {
-	got := StringifyToolCallArguments(`{}{"query":"测试工具调用"}`)
-	if got != `{}{"query":"测试工具调用"}` {
-		t.Fatalf("expected raw concatenated JSON to be preserved, got %q", got)
-	}
-}
-
-func TestFormatToolCallsForPromptXML(t *testing.T) {
-	got := FormatToolCallsForPrompt([]any{
-		map[string]any{
-			"id": "call_1",
-			"function": map[string]any{
-				"name":      "search_web",
-				"arguments": map[string]any{"query": "latest"},
-			},
-		},
-	})
-	if got == "" {
-		t.Fatal("expected non-empty formatted tool calls")
-	}
-	if got != "<tool_calls>\n  <tool_call>\n    <tool_name>search_web</tool_name>\n    <parameters>{\"query\":\"latest\"}</parameters>\n  </tool_call>\n</tool_calls>" {
-		t.Fatalf("unexpected formatted tool call XML: %q", got)
-	}
-}
--- a/internal/util/toolcalls_candidates.go
+++ b/internal/util/toolcalls_candidates.go
@@ -64,7 +64,7 @@ func extractToolCallObjects(text string) []string {
 	lower := strings.ToLower(text)
 	out := []string{}
 	offset := 0
-	keywords := []string{"tool_calls", "\"function\"", "function.name:"}
+	keywords := []string{"tool_calls", "\"function\"", "function.name:", "[tool_call_history]"}
 	for {
 		bestIdx := -1
 		matchedKeyword := ""
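`extractToolCallObjects` scans the lowercased text for the earliest of several keywords. The `+` line adds `[tool_call_history]` to that list.

The JS `TOOL_SEGMENT_KEYWORDS` in this diff also gains `[tool_result_history]`, but the Go list does not. Only part of the loop body is visible here. The sketch below therefore covers just the earliest-match step, under that assumption. `earliestKeyword` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// earliestKeyword is a hypothetical helper: it returns the smallest index at or
// after offset where any keyword occurs in lower, plus the keyword matched.
// It returns -1 and "" when none is found or offset is out of range.
func earliestKeyword(lower string, offset int, keywords []string) (int, string) {
	if offset < 0 || offset > len(lower) {
		return -1, ""
	}
	bestIdx, matched := -1, ""
	for _, kw := range keywords {
		idx := strings.Index(lower[offset:], kw)
		if idx < 0 {
			continue
		}
		idx += offset
		if bestIdx < 0 || idx < bestIdx {
			bestIdx, matched = idx, kw
		}
	}
	return bestIdx, matched
}

func main() {
	keywords := []string{"tool_calls", "\"function\"", "function.name:", "[tool_call_history]"}
	text := "[TOOL_CALL_HISTORY]\nfunction.name: execute_command\n[/TOOL_CALL_HISTORY]"
	idx, kw := earliestKeyword(strings.ToLower(text), 0, keywords)
	fmt.Println(idx, kw) // 0 [tool_call_history]
}
```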
--- a/internal/util/toolcalls_textkv_test.go
+++ b/internal/util/toolcalls_textkv_test.go
@@ -6,12 +6,14 @@ import (

 func TestParseTextKVToolCalls_Basic(t *testing.T) {
 	text := `
+[TOOL_CALL_HISTORY]
 status: already_called
 origin: assistant
 not_user_input: true
 tool_call_id: call_3fcd15235eb94f7eae3a8de5a9cfa36b
 function.name: execute_command
 function.arguments: {"command":"cd scripts && python check_syntax.py example.py","cwd":null,"timeout":30}
+[/TOOL_CALL_HISTORY]

 Some other text thinking...
 `
--- a/sha3_wasm_bg.7b9ca65ddd.wasm
+++ b/sha3_wasm_bg.7b9ca65ddd.wasm
--- a/tests/node/chat-stream.test.js
+++ b/tests/node/chat-stream.test.js
@@ -34,7 +34,7 @@ test('resolveToolcallPolicy defaults to feature-match + early emit when prepare
  assert.equal(policy.emitEarlyToolDeltas, true);
 });

-test('resolveToolcallPolicy ignores prepare flags and keeps early emit enabled', () => {
+test('resolveToolcallPolicy respects prepare flags and prepared tool names', () => {
  const policy = resolveToolcallPolicy(
    {
      tool_names: [' prepped_tool ', '', null],
@@ -45,7 +45,7 @@ test('resolveToolcallPolicy ignores prepare flags and keeps early emit enabled',
  );
  assert.deepEqual(policy.toolNames, ['prepped_tool']);
  assert.equal(policy.toolSieveEnabled, true);
-  assert.equal(policy.emitEarlyToolDeltas, true);
+  assert.equal(policy.emitEarlyToolDeltas, false);
 });

 test('normalizePreparedToolNames filters empty values', () => {
--- a/tests/node/stream-tool-sieve.test.js
+++ b/tests/node/stream-tool-sieve.test.js
@@ -98,8 +98,10 @@ test('parseToolCalls ignores tool_call payloads that exist only inside fenced co

 test('parseToolCalls parses text-kv fallback payload', () => {
  const text = [
+    '[TOOL_CALL_HISTORY]',
    'function.name: execute_command',
    'function.arguments: {"command":"cd scripts && python check_syntax.py example.py","cwd":null,"timeout":30}',
+    '[/TOOL_CALL_HISTORY]',
    'Some other text thinking...',
  ].join('\n');
  const calls = parseToolCalls(text, ['execute_command']);
@@ -252,6 +254,56 @@ test('sieve keeps plain text intact in tool mode when no tool call appears', ()
  assert.equal(leakedText, '你好，这是普通文本回复。请继续。');
 });

+test('sieve swallows leaked TOOL_CALL_HISTORY marker blocks', () => {
+  const events = runSieve(
+    [
+      '前置文本。',
+      '[TOOL_CALL_HISTORY]\nstatus: already_called\nfunction.name: exec\nfunction.arguments: {}\n[/TOOL_CALL_HISTORY]',
+      '后置文本。',
+    ],
+    ['exec'],
+  );
+  const leakedText = collectText(events);
+  const hasToolCall = events.some((evt) => evt.type === 'tool_calls');
+  assert.equal(hasToolCall, false);
+  assert.equal(leakedText.includes('前置文本。'), true);
+  assert.equal(leakedText.includes('后置文本。'), true);
+  assert.equal(leakedText.includes('[TOOL_CALL_HISTORY]'), false);
+});
+
+test('sieve swallows leaked TOOL_RESULT_HISTORY marker blocks', () => {
+  const events = runSieve(
+    [
+      '前置文本。',
+      '[TOOL_RESULT_HISTORY]\nstatus: already_called\nfunction.name: exec\nfunction.arguments: {}\n[/TOOL_RESULT_HISTORY]',
+      '后置文本。',
+    ],
+    ['exec'],
+  );
+  const leakedText = collectText(events);
+  const hasToolCall = events.some((evt) => evt.type === 'tool_calls');
+  assert.equal(hasToolCall, false);
+  assert.equal(leakedText.includes('前置文本。'), true);
+  assert.equal(leakedText.includes('后置文本。'), true);
+  assert.equal(leakedText.includes('[TOOL_RESULT_HISTORY]'), false);
+});
+
+test('sieve preserves text spacing when TOOL_RESULT_HISTORY spans chunks', () => {
+  const events = runSieve(
+    [
+      'Hello ',
+      '[TOOL_RESULT_HISTORY]\nstatus: already_called\n',
+      'function.name: exec\nfunction.arguments: {}\n[/TOOL_RESULT_HISTORY]',
+      'world',
+    ],
+    ['exec'],
+  );
+  const leakedText = collectText(events);
+  const hasToolCall = events.some((evt) => evt.type === 'tool_calls' && evt.calls?.length > 0);
+  assert.equal(hasToolCall, false);
+  assert.equal(leakedText, 'Hello world');
+});
+
 test('sieve emits unknown tool payload (no args) as executable tool call', () => {
  const events = runSieve(
    ['{"tool_calls":[{"name":"not_in_schema"}]}', '后置正文G。'],
--- a/vercel.json
+++ b/vercel.json
@@ -4,7 +4,7 @@
  "outputDirectory": "static",
  "functions": {
    "api/chat-stream.js": {
-      "includeFiles": "internal/deepseek/assets/sha3_wasm_bg.7b9ca65ddd.wasm",
+      "includeFiles": "**/sha3_wasm_bg.7b9ca65ddd.wasm",
      "maxDuration": 300
    },
    "api/index.go": {
--- a/webui/src/features/settings/BehaviorSection.jsx
+++ b/webui/src/features/settings/BehaviorSection.jsx
@@ -3,6 +3,35 @@ export default function BehaviorSection({ t, form, setForm }) {
        <div className="bg-card border border-border rounded-xl p-5 space-y-4">
            <h3 className="font-semibold">{t('settings.behaviorTitle')}</h3>
            <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
+                <label className="text-sm space-y-2">
+                    <span className="text-muted-foreground">{t('settings.toolcallMode')}</span>
+                    <select
+                        value={form.toolcall.mode}
+                        onChange={(e) => setForm((prev) => ({
+                            ...prev,
+                            toolcall: { ...prev.toolcall, mode: e.target.value },
+                        }))}
+                        className="w-full bg-background border border-border rounded-lg px-3 py-2"
+                    >
+                        <option value="feature_match">feature_match</option>
+                        <option value="off">off</option>
+                    </select>
+                </label>
+                <label className="text-sm space-y-2">
+                    <span className="text-muted-foreground">{t('settings.earlyEmitConfidence')}</span>
+                    <select
+                        value={form.toolcall.early_emit_confidence}
+                        onChange={(e) => setForm((prev) => ({
+                            ...prev,
+                            toolcall: { ...prev.toolcall, early_emit_confidence: e.target.value },
+                        }))}
+                        className="w-full bg-background border border-border rounded-lg px-3 py-2"
+                    >
+                        <option value="high">high</option>
+                        <option value="low">low</option>
+                        <option value="off">off</option>
+                    </select>
+                </label>
                <label className="text-sm space-y-2">
                    <span className="text-muted-foreground">{t('settings.responsesTTL')}</span>
                    <input
--- a/webui/src/features/settings/RuntimeSection.jsx
+++ b/webui/src/features/settings/RuntimeSection.jsx
@@ -2,7 +2,7 @@ export default function RuntimeSection({ t, form, setForm }) {
    return (
        <div className="bg-card border border-border rounded-xl p-5 space-y-4">
            <h3 className="font-semibold">{t('settings.runtimeTitle')}</h3>
-            <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-4">
+            <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
                <label className="text-sm space-y-2">
                    <span className="text-muted-foreground">{t('settings.accountMaxInflight')}</span>
                    <input
@@ -42,21 +42,6 @@ export default function RuntimeSection({ t, form, setForm }) {
                        className="w-full bg-background border border-border rounded-lg px-3 py-2"
                    />
                </label>
-                <label className="text-sm space-y-2">
-                    <span className="text-muted-foreground">{t('settings.tokenRefreshIntervalHours')}</span>
-                    <input
-                        type="number"
-                        min={1}
-                        max={720}
-                        step={1}
-                        value={form.runtime.token_refresh_interval_hours}
-                        onChange={(e) => setForm((prev) => ({
-                            ...prev,
-                            runtime: { ...prev.runtime, token_refresh_interval_hours: Number(e.target.value || 1) },
-                        }))}
-                        className="w-full bg-background border border-border rounded-lg px-3 py-2"
-                    />
-                </label>
            </div>
        </div>
    )
--- a/webui/src/features/settings/useSettingsForm.js
+++ b/webui/src/features/settings/useSettingsForm.js
@@ -12,7 +12,8 @@ const MAX_AUTO_FETCH_FAILURES = 3

 const DEFAULT_FORM = {
    admin: { jwt_expire_hours: 24 },
-    runtime: { account_max_inflight: 2, account_max_queue: 10, global_max_inflight: 10, token_refresh_interval_hours: 6 },
+    runtime: { account_max_inflight: 2, account_max_queue: 10, global_max_inflight: 10 },
+    toolcall: { mode: 'feature_match', early_emit_confidence: 'high' },
    responses: { store_ttl_seconds: 900 },
    embeddings: { provider: '' },
    auto_delete: { sessions: false },
@@ -44,7 +45,10 @@ function fromServerForm(data) {
            account_max_inflight: Number(data.runtime?.account_max_inflight || 2),
            account_max_queue: Number(data.runtime?.account_max_queue || 10),
            global_max_inflight: Number(data.runtime?.global_max_inflight || 10),
-            token_refresh_interval_hours: Number(data.runtime?.token_refresh_interval_hours || 6),
+        },
+        toolcall: {
+            mode: data.toolcall?.mode || 'feature_match',
+            early_emit_confidence: data.toolcall?.early_emit_confidence || 'high',
        },
        responses: {
            store_ttl_seconds: Number(data.responses?.store_ttl_seconds || 900),
@@ -67,7 +71,10 @@ function toServerPayload(form) {
            account_max_inflight: Number(form.runtime.account_max_inflight),
            account_max_queue: Number(form.runtime.account_max_queue),
            global_max_inflight: Number(form.runtime.global_max_inflight),
-            token_refresh_interval_hours: Number(form.runtime.token_refresh_interval_hours),
+        },
+        toolcall: {
+            mode: String(form.toolcall.mode || '').trim(),
+            early_emit_confidence: String(form.toolcall.early_emit_confidence || '').trim(),
        },
        responses: { store_ttl_seconds: Number(form.responses.store_ttl_seconds) },
        embeddings: { provider: String(form.embeddings.provider || '').trim() },
--- a/webui/src/locales/en.json
+++ b/webui/src/locales/en.json
@@ -222,12 +222,13 @@
        "passwordTooShort": "Password must be at least 4 characters.",
        "passwordUpdated": "Password updated. Please sign in again.",
        "passwordUpdateFailed": "Failed to update password.",
-        "runtimeTitle": "Runtime",
+        "runtimeTitle": "Concurrency & Queue",
        "accountMaxInflight": "Per-account max inflight",
        "accountMaxQueue": "Account max queue size",
        "globalMaxInflight": "Global max inflight",
-        "tokenRefreshIntervalHours": "Managed token refresh interval (hours)",
        "behaviorTitle": "Behavior",
+        "toolcallMode": "Toolcall mode",
+        "earlyEmitConfidence": "Early emit confidence",
        "responsesTTL": "Responses store TTL (seconds)",
        "embeddingsProvider": "Embeddings provider",
        "modelTitle": "Model mapping",
--- a/webui/src/locales/zh.json
+++ b/webui/src/locales/zh.json
@@ -222,12 +222,13 @@
        "passwordTooShort": "新密码至少 4 位",
        "passwordUpdated": "密码已更新，需重新登录",
        "passwordUpdateFailed": "密码更新失败",
-        "runtimeTitle": "运行时设置",
+        "runtimeTitle": "并发与队列",
        "accountMaxInflight": "每账号并发上限",
        "accountMaxQueue": "账号等待队列上限",
        "globalMaxInflight": "全局并发上限",
-        "tokenRefreshIntervalHours": "托管账号 Token 刷新间隔（小时）",
        "behaviorTitle": "行为设置",
+        "toolcallMode": "Toolcall 模式",
+        "earlyEmitConfidence": "早发置信度",
        "responsesTTL": "Responses 缓存 TTL（秒）",
        "embeddingsProvider": "Embeddings Provider",
        "modelTitle": "模型映射",
Author	SHA1	Message	Date
CJACK.	034c00f10e	Merge pull request #163 from CJackHwang/dev docs: update API documentation, deployment guides, and README with new admin endpoints, compatibility notes, and build instructions	2026-03-29 19:50:40 +08:00
CJACK.	390f7580e5	Merge pull request #156 from CJackHwang/dev Merge pull request #153 from CJackHwang/codex/investigate-tool-execution-bugs-in-output-7ocr8f Relax tool-name allow-listing and improve tool-call detection/parsing across adapters and sieve	2026-03-22 21:40:03 +08:00
CJACK.	586d31e556	Merge pull request #151 from CJackHwang/dev Merge pull request #149 from CJackHwang/codex/fix-tool-miscall-during-complex-json-test Ignore tool_call payloads inside fenced code blocks and chat envelopes; stream-aware code-fence tracking	2026-03-22 16:51:17 +08:00
CJACK.	c4a73e871a	Merge pull request #148 from CJackHwang/dev Merge pull request #147 from CJackHwang/codex/fix-tool-call-history-retrieval Preserve tool call/result roundtrip and raw payloads across Claude, Gemini and OpenAI adapters	2026-03-22 13:43:26 +08:00
CJACK.	25b3292497	Merge pull request #146 from CJackHwang/dev Merge pull request #145 from CJackHwang/codex/determine-which-pr-fixes-json-leak-issue Merge pull request #144 from CJackHwang/codex/refactor-codebase-to-remove-redundancy Refactor tool-sieve and response streaming, remove unused helpers and UI wrappers	2026-03-22 11:05:54 +08:00
CJACK.	11f66db87d	Merge pull request #142 from CJackHwang/dev Merge pull request #141 from CJackHwang/codex/investigate-json-leakage-in-vercel-deployment-rh84s1 Fix raw tool-call JSON leaks when feature_match mode is off	2026-03-22 08:55:29 +08:00
CJACK.	7131b06e26	Merge pull request #138 from CJackHwang/dev Merge pull request #135 from CJackHwang/codex/add-global-token-refresh-logic Sanitize leaked tool-history markers, simplify normalization, and add managed token refresh	2026-03-22 01:27:27 +08:00
@@ -1 +1 @@
 .5.1
 .4.1