Mirror of https://github.com/CJackHwang/ds2api.git, synced 2026-05-02 15:35:27 +08:00

Compare commits (8 commits):

- d6ecdad6de
- 2857a171cc
- eb8b45e667
- 1664349a29
- b105d54c00
- 039d7d3db1
- 49012a227c
- 4d36afea4c
API.en.md
@@ -138,6 +138,9 @@ Gemini-compatible clients can also send `x-goog-api-key`, `?key=`, or `?api_key=
| POST | `/admin/accounts/sessions/delete-all` | Admin | Delete all sessions for one account |
| POST | `/admin/import` | Admin | Batch import keys/accounts |
| POST | `/admin/test` | Admin | Test API through service |
| POST | `/admin/dev/raw-samples/capture` | Admin | Fire one request and persist it as a raw sample |
| GET | `/admin/dev/raw-samples/query` | Admin | Search current in-memory capture chains by prompt keyword |
| POST | `/admin/dev/raw-samples/save` | Admin | Persist a selected in-memory capture chain as a raw sample |
| POST | `/admin/vercel/sync` | Admin | Sync config to Vercel |
| GET | `/admin/vercel/status` | Admin | Vercel sync status |
| POST | `/admin/vercel/status` | Admin | Vercel sync status / draft compare |
@@ -883,6 +886,74 @@ Test API availability through the service itself.
}
```

### `POST /admin/dev/raw-samples/capture`

Internally issues one `/v1/chat/completions` request through the service, then persists the request metadata and raw upstream SSE into `tests/raw_stream_samples/<sample-id>/`.

Common request fields:

| Field | Required | Default | Notes |
| --- | --- | --- | --- |
| `message` | No | `你好` | Convenience single-turn user message |
| `messages` | No | Auto-derived from `message` | OpenAI-style message array |
| `model` | No | `deepseek-chat` | Target model |
| `stream` | No | `true` | Recommended to keep streaming enabled so raw SSE is recorded |
| `api_key` | No | First configured key | Business API key to use |
| `sample_id` | No | Auto-generated | Sample directory name |

On success, the response headers include:

- `X-Ds2-Sample-Id`
- `X-Ds2-Sample-Dir`
- `X-Ds2-Sample-Meta`
- `X-Ds2-Sample-Upstream`

If the request itself succeeds but the process did not record a new upstream capture, the endpoint returns:

```json
{"detail":"no upstream capture was recorded"}
```

### `GET /admin/dev/raw-samples/query`

Searches the current process's in-memory capture entries and groups `completion + continue` rounds by `chat_session_id`.

**Query parameters**:

| Param | Default | Notes |
| --- | --- | --- |
| `q` | empty | Fuzzy match against request/response text |
| `limit` | `20` | Max number of chains returned |

**Response fields** include:

- `items[].chain_key`
- `items[].capture_ids`
- `items[].round_count`
- `items[].initial_label`
- `items[].request_preview`
- `items[].response_preview`

### `POST /admin/dev/raw-samples/save`

Persists one selected in-memory capture chain into `tests/raw_stream_samples/<sample-id>/`.

Any one of these selectors is accepted:

```json
{"chain_key":"session:xxxx","sample_id":"tmp-from-memory"}
```

```json
{"capture_id":"cap_xxx","sample_id":"tmp-from-memory"}
```

```json
{"query":"Guangzhou weather","sample_id":"tmp-from-memory"}
```

The success payload includes `sample_id`, `sample_dir`, `meta_path`, and `upstream_path`.

### `POST /admin/vercel/sync`

| Field | Required | Notes |

API.md

@@ -138,6 +138,9 @@ Gemini-compatible clients can also use `x-goog-api-key`, `?key=`, or `?api_key=`
| POST | `/admin/accounts/sessions/delete-all` | Admin | Delete all sessions for an account |
| POST | `/admin/import` | Admin | Batch import keys/accounts |
| POST | `/admin/test` | Admin | Test current API availability |
| POST | `/admin/dev/raw-samples/capture` | Admin | Fire one request directly and save it as a raw sample |
| GET | `/admin/dev/raw-samples/query` | Admin | Query current in-memory capture chains by prompt keyword |
| POST | `/admin/dev/raw-samples/save` | Admin | Save a matched in-memory capture chain as a raw sample |
| POST | `/admin/vercel/sync` | Admin | Sync config to Vercel |
| GET | `/admin/vercel/status` | Admin | Vercel sync status |
| POST | `/admin/vercel/status` | Admin | Vercel sync status / draft compare |

@@ -886,6 +889,74 @@ data: {"type":"message_stop"}
}
```

### `POST /admin/dev/raw-samples/capture`

Issues one `/v1/chat/completions` request directly through the service itself, and saves the request metadata and raw upstream SSE into `tests/raw_stream_samples/<sample-id>/`.

Common request fields:

| Field | Required | Default | Notes |
| --- | --- | --- | --- |
| `message` | No | `你好` | Convenience single-turn user message |
| `messages` | No | Auto-generated from `message` | OpenAI-style message array |
| `model` | No | `deepseek-chat` | Target model |
| `stream` | No | `true` | Keeping streaming enabled is recommended so the raw SSE is recorded |
| `api_key` | No | First key in config | Key used to call the business API |
| `sample_id` | No | Auto-generated | Sample directory name |

On success, the response headers include:

- `X-Ds2-Sample-Id`
- `X-Ds2-Sample-Dir`
- `X-Ds2-Sample-Meta`
- `X-Ds2-Sample-Upstream`

If the request itself succeeds but the current process did not record a new upstream capture, it returns:

```json
{"detail":"no upstream capture was recorded"}
```

### `GET /admin/dev/raw-samples/query`

Queries the capture records in the current process's memory by keyword, and groups `completion + continue` chains by `chat_session_id`.

**Query parameters**:

| Param | Default | Notes |
| --- | --- | --- |
| `q` | empty | Fuzzy match against request/response body keywords |
| `limit` | `20` | Max number of chains returned |

**Response fields** include:

- `items[].chain_key`
- `items[].capture_ids`
- `items[].round_count`
- `items[].initial_label`
- `items[].request_preview`
- `items[].response_preview`

### `POST /admin/dev/raw-samples/save`

Persists one capture chain from current process memory into `tests/raw_stream_samples/<sample-id>/`.

Any one of the following selectors is supported:

```json
{"chain_key":"session:xxxx","sample_id":"tmp-from-memory"}
```

```json
{"capture_id":"cap_xxx","sample_id":"tmp-from-memory"}
```

```json
{"query":"Guangzhou weather","sample_id":"tmp-from-memory"}
```

The success response returns `sample_id`, `sample_dir`, `meta_path`, and `upstream_path`.

### `POST /admin/vercel/sync`

| Field | Required | Notes |

README.MD
@@ -348,8 +348,8 @@ cp opencode.json.example opencode.json
| `DS2API_STATIC_ADMIN_DIR` | Admin console static file directory | `static/admin` |
| `DS2API_AUTO_BUILD_WEBUI` | Auto-build the WebUI on startup | On locally, off on Vercel |
| `DS2API_DEV_PACKET_CAPTURE` | Local dev packet capture switch (records recent session request/response bodies) | Enabled by default locally (non-Vercel) |
-| `DS2API_DEV_PACKET_CAPTURE_LIMIT` | Number of local captures retained (overflow auto-evicted) | `5` |
-| `DS2API_DEV_PACKET_CAPTURE_MAX_BODY_BYTES` | Max recorded bytes per response body | `2097152` |
+| `DS2API_DEV_PACKET_CAPTURE_LIMIT` | Number of local captures retained (overflow auto-evicted) | `20` |
+| `DS2API_DEV_PACKET_CAPTURE_MAX_BODY_BYTES` | Max recorded bytes per response body | `5242880` |
| `DS2API_ACCOUNT_MAX_INFLIGHT` | Max concurrent in-flight requests per account | `2` |
| `DS2API_ACCOUNT_MAX_QUEUE` | Wait queue limit | `recommended_concurrency` |
| `DS2API_GLOBAL_MAX_INFLIGHT` | Global max in-flight requests | `recommended_concurrency` |

@@ -403,13 +403,13 @@ The Gemini routes can also use `x-goog-api-key`, or, when there is no auth header, use `

## Local Dev Packet Capture

-Used to pin down issues such as the responses reasoning stream and tool calls. When enabled, it automatically records the most recent N DeepSeek conversation upstream request and response bodies (default 5 entries, overflow auto-evicted).
+Used to pin down issues such as the responses reasoning stream and tool calls. When enabled, it automatically records the most recent N DeepSeek conversation upstream request and response bodies (default 20 entries, overflow auto-evicted; each response body capped at 5 MB by default).

Enable example:

```bash
DS2API_DEV_PACKET_CAPTURE=true \
-DS2API_DEV_PACKET_CAPTURE_LIMIT=5 \
+DS2API_DEV_PACKET_CAPTURE_LIMIT=20 \
go run ./cmd/ds2api
```

@@ -417,6 +417,8 @@ go run ./cmd/ds2api

- `GET /admin/dev/captures`: view the capture list (newest first)
- `DELETE /admin/dev/captures`: clear captures
- `GET /admin/dev/raw-samples/query?q=keyword&limit=20`: query current in-memory captures by prompt keyword, grouping `completion + continue` chains by `chat_session_id`
- `POST /admin/dev/raw-samples/save`: save a matched capture chain as a `tests/raw_stream_samples/<sample-id>/` replay sample

Returned fields include:

@@ -424,6 +426,12 @@ go run ./cmd/ds2api
- `response_body`: concatenated text of the raw upstream stream
- `response_truncated`: whether per-entry size truncation was triggered

The save endpoint can select the target via `query`, `chain_key`, or `capture_id`. For example:

```json
{"query":"Guangzhou weather","sample_id":"gz-weather-from-memory"}
```

## Project Structure

```text
README.en.md
@@ -353,8 +353,8 @@ cp opencode.json.example opencode.json
| `DS2API_VERCEL_INTERNAL_SECRET` | Vercel hybrid streaming internal auth | Falls back to `DS2API_ADMIN_KEY` |
| `DS2API_VERCEL_STREAM_LEASE_TTL_SECONDS` | Stream lease TTL seconds | `900` |
| `DS2API_DEV_PACKET_CAPTURE` | Local dev packet capture switch (record recent request/response bodies) | Enabled by default on non-Vercel local runtime |
-| `DS2API_DEV_PACKET_CAPTURE_LIMIT` | Number of captured sessions to retain (auto-evict overflow) | `5` |
-| `DS2API_DEV_PACKET_CAPTURE_MAX_BODY_BYTES` | Max recorded bytes per captured response body | `2097152` |
+| `DS2API_DEV_PACKET_CAPTURE_LIMIT` | Number of captured sessions to retain (auto-evict overflow) | `20` |
+| `DS2API_DEV_PACKET_CAPTURE_MAX_BODY_BYTES` | Max recorded bytes per captured response body | `5242880` |
| `VERCEL_TOKEN` | Vercel sync token | — |
| `VERCEL_PROJECT_ID` | Vercel project ID | — |
| `VERCEL_TEAM_ID` | Vercel team ID | — |

@@ -392,21 +392,22 @@ Queue limit = DS2API_ACCOUNT_MAX_QUEUE (default = recommended concurrency)
When `tools` is present in the request, DS2API performs anti-leak handling:

1. Toolcall feature matching is enabled only in **non-code-block context** (fenced examples are ignored)
   - In non-code-block context, tool JSON may still be recognized even when mixed with normal prose; surrounding prose can remain as text output.
-2. `responses` streaming strictly uses official item lifecycle events (`response.output_item.*`, `response.content_part.*`, `response.function_call_arguments.*`)
-3. Tool names not declared in the `tools` schema are strictly rejected and will not be emitted as valid tool calls
+2. The parser prioritizes XML/Markup, while also accepting JSON / ANTML / invoke / text-kv, and normalizes everything into the internal tool-call structure
+3. `responses` streaming strictly uses official item lifecycle events (`response.output_item.*`, `response.content_part.*`, `response.function_call_arguments.*`)
4. `responses` supports and enforces `tool_choice` (`auto`/`none`/`required`/forced function); `required` violations return `422` for non-stream and `response.failed` for stream
-5. Valid tool call events are only emitted after passing policy validation, preventing invalid tool names from entering the client execution chain
+5. The output protocol follows the client request (OpenAI / Claude / Gemini native shapes); model-side prompting can prefer XML, and the compatibility layer handles the protocol-specific translation

> Note: the current parser still prioritizes "parse successfully whenever possible"; hard allow-list rejection for undeclared tool names is not enabled yet.

## Local Dev Packet Capture

-This is for debugging issues such as Responses reasoning streaming and tool-call handoff. When enabled, DS2API stores the latest N DeepSeek conversation payload pairs (request body + upstream response body), defaulting to 5 entries with auto-eviction.
+This is for debugging issues such as Responses reasoning streaming and tool-call handoff. When enabled, DS2API stores the latest N DeepSeek conversation payload pairs (request body + upstream response body), defaulting to 20 entries with auto-eviction; each response body is capped at 5 MB by default.

Enable example:

```bash
DS2API_DEV_PACKET_CAPTURE=true \
-DS2API_DEV_PACKET_CAPTURE_LIMIT=5 \
+DS2API_DEV_PACKET_CAPTURE_LIMIT=20 \
go run ./cmd/ds2api
```

@@ -414,6 +415,8 @@ Inspect/clear (Admin JWT required):

- `GET /admin/dev/captures`: list captured items (newest first)
- `DELETE /admin/dev/captures`: clear captured items
- `GET /admin/dev/raw-samples/query?q=keyword&limit=20`: search current in-memory captures by prompt keyword and group `completion + continue` by `chat_session_id`
- `POST /admin/dev/raw-samples/save`: persist a selected capture chain as `tests/raw_stream_samples/<sample-id>/`

Response fields include:

@@ -421,6 +424,12 @@ Response fields include:
- `response_body`: concatenated raw upstream stream body text
- `response_truncated`: whether body-size truncation happened

The save endpoint can target a chain by `query`, `chain_key`, or `capture_id`. Example:

```json
{"query":"Guangzhou weather","sample_id":"gz-weather-from-memory"}
```

## Project Structure

```text
@@ -41,6 +41,7 @@ npm install
# 3. Start dev server (hot reload)
npm run dev
# Default: http://localhost:5173, auto-proxies API to backend
+# host: 0.0.0.0 is not configured, so LAN access is not enabled by default
```

WebUI tech stack:
@@ -41,6 +41,7 @@ npm install
# 3. Start the dev server (hot reload)
npm run dev
# Listens on http://localhost:5173 by default and auto-proxies API calls to the backend
+# host: 0.0.0.0 is not configured, so the LAN is not exposed by default
```

WebUI tech stack:
@@ -65,7 +65,7 @@ cp config.example.json config.json
go run ./cmd/ds2api
```

-Default address: `http://0.0.0.0:5001` (override with `PORT`).
+Default local access URL: `http://127.0.0.1:5001`; the server actually binds to `0.0.0.0:5001` (override with `PORT`).

### 1.2 WebUI Build

@@ -65,7 +65,7 @@ cp config.example.json config.json
go run ./cmd/ds2api
```

-Listens on `http://0.0.0.0:5001` by default; override via the `PORT` environment variable.
+The default local access URL is `http://127.0.0.1:5001`; the server actually binds to `0.0.0.0:5001`, overridable via the `PORT` environment variable.

### 1.2 WebUI Build

@@ -260,6 +260,21 @@ POST /admin/dev/raw-samples/capture

This endpoint writes the request metadata and the raw upstream stream into `tests/raw_stream_samples/<sample-id>/`, so they can later be used directly for replay and field analysis. Derived outputs are generated at local replay time and no longer land in the sample directory.

### Querying and saving samples from in-memory captures

If the problem was just reproduced locally, you can first query the captures in the current process's memory and then selectively persist one:

```bash
GET /admin/dev/raw-samples/query?q=Guangzhou&limit=10
POST /admin/dev/raw-samples/save
{"chain_key":"session:xxxx","sample_id":"tmp-from-memory"}
```

Notes:
- `query` groups `completion + continue` into one chain by `chat_session_id`, which helps when pinning down continued-thinking issues.
- `save` can select the target via `query`, `chain_key`, or `capture_id`.
- The resulting sample directory is still `tests/raw_stream_samples/<sample-id>/` and can be fed directly to the replay scripts.

### Specifying the output directory and timeout

```bash
@@ -96,7 +96,11 @@ func (s *claudeStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
			if !s.thinkingEnabled {
				continue
			}
-			s.thinking.WriteString(cleanedText)
+			trimmed := sse.TrimContinuationOverlap(s.thinking.String(), cleanedText)
+			if trimmed == "" {
+				continue
+			}
+			s.thinking.WriteString(trimmed)
			s.closeTextBlock()
			if !s.thinkingBlockOpen {
				s.thinkingBlockIndex = s.nextBlockIndex
@@ -116,13 +120,17 @@ func (s *claudeStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
				"index": s.thinkingBlockIndex,
				"delta": map[string]any{
					"type":     "thinking_delta",
-					"thinking": cleanedText,
+					"thinking": trimmed,
				},
			})
			continue
		}

-		s.text.WriteString(cleanedText)
+		trimmed := sse.TrimContinuationOverlap(s.text.String(), cleanedText)
+		if trimmed == "" {
+			continue
+		}
+		s.text.WriteString(trimmed)
		if s.bufferToolContent {
			if hasUnclosedCodeFence(s.text.String()) {
				continue
@@ -148,7 +156,7 @@ func (s *claudeStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
			"index": s.textBlockIndex,
			"delta": map[string]any{
				"type": "text_delta",
-				"text": cleanedText,
+				"text": trimmed,
			},
		})
	}
@@ -126,11 +126,19 @@ func (s *geminiStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
		contentSeen = true
		if p.Type == "thinking" {
			if s.thinkingEnabled {
-				s.thinking.WriteString(cleanedText)
+				trimmed := sse.TrimContinuationOverlap(s.thinking.String(), cleanedText)
+				if trimmed == "" {
+					continue
+				}
+				s.thinking.WriteString(trimmed)
			}
			continue
		}
-		s.text.WriteString(cleanedText)
+		trimmed := sse.TrimContinuationOverlap(s.text.String(), cleanedText)
+		if trimmed == "" {
+			continue
+		}
+		s.text.WriteString(trimmed)
		if s.bufferContent {
			continue
		}
@@ -140,7 +148,7 @@ func (s *geminiStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
				"index": 0,
				"content": map[string]any{
					"role": "model",
-					"parts": []map[string]any{{"text": cleanedText}},
+					"parts": []map[string]any{{"text": trimmed}},
				},
			},
		},
@@ -221,15 +221,23 @@ func (s *chatStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
		}
		if p.Type == "thinking" {
			if s.thinkingEnabled {
-				s.thinking.WriteString(cleanedText)
-				delta["reasoning_content"] = cleanedText
+				trimmed := sse.TrimContinuationOverlap(s.thinking.String(), cleanedText)
+				if trimmed == "" {
+					continue
+				}
+				s.thinking.WriteString(trimmed)
+				delta["reasoning_content"] = trimmed
			}
		} else {
-			s.text.WriteString(cleanedText)
+			trimmed := sse.TrimContinuationOverlap(s.text.String(), cleanedText)
+			if trimmed == "" {
+				continue
+			}
+			s.text.WriteString(trimmed)
			if !s.bufferToolContent {
-				delta["content"] = cleanedText
+				delta["content"] = trimmed
			} else {
-				events := processToolSieveChunk(&s.toolSieve, cleanedText, s.toolNames)
+				events := processToolSieveChunk(&s.toolSieve, trimmed, s.toolNames)
				for _, evt := range events {
					if len(evt.ToolCallDeltas) > 0 {
						if !s.emitEarlyToolDeltas {
@@ -205,17 +205,25 @@ func (s *responsesStreamRuntime) onParsed(parsed sse.LineResult) streamengine.ParsedDecision {
			if !s.thinkingEnabled {
				continue
			}
-			s.thinking.WriteString(cleanedText)
-			s.sendEvent("response.reasoning.delta", openaifmt.BuildResponsesReasoningDeltaPayload(s.responseID, cleanedText))
+			trimmed := sse.TrimContinuationOverlap(s.thinking.String(), cleanedText)
+			if trimmed == "" {
+				continue
+			}
+			s.thinking.WriteString(trimmed)
+			s.sendEvent("response.reasoning.delta", openaifmt.BuildResponsesReasoningDeltaPayload(s.responseID, trimmed))
			continue
		}

-		s.text.WriteString(cleanedText)
-		if !s.bufferToolContent {
-			s.emitTextDelta(cleanedText)
-			continue
-		}
-		s.processToolStreamEvents(processToolSieveChunk(&s.sieve, cleanedText, s.toolNames), true)
+		trimmed := sse.TrimContinuationOverlap(s.text.String(), cleanedText)
+		if trimmed == "" {
+			continue
+		}
+		s.text.WriteString(trimmed)
+		if !s.bufferToolContent {
+			s.emitTextDelta(trimmed)
+			continue
+		}
+		s.processToolStreamEvents(processToolSieveChunk(&s.sieve, trimmed, s.toolNames), true)
	}

	return streamengine.ParsedDecision{ContentSeen: contentSeen}
@@ -36,6 +36,8 @@ func RegisterRoutes(r chi.Router, h *Handler) {
	pr.Post("/import", h.batchImport)
	pr.Post("/test", h.testAPI)
	pr.Post("/dev/raw-samples/capture", h.captureRawSample)
+	pr.Get("/dev/raw-samples/query", h.queryRawSampleCaptures)
+	pr.Post("/dev/raw-samples/save", h.saveRawSampleFromCaptures)
	pr.Post("/vercel/sync", h.syncVercel)
	pr.Get("/vercel/status", h.vercelStatus)
	pr.Post("/vercel/status", h.vercelStatus)
@@ -8,6 +8,7 @@ import (
	"net/http"
	"net/http/httptest"
	"net/url"
+	"sort"
	"strings"

	"ds2api/internal/config"
@@ -15,6 +16,11 @@ import (
	"ds2api/internal/rawsample"
)

+type captureChain struct {
+	Key     string
+	Entries []devcapture.Entry
+}
+
func (h *Handler) captureRawSample(w http.ResponseWriter, r *http.Request) {
	if h.OpenAI == nil {
		writeJSON(w, http.StatusInternalServerError, map[string]any{"detail": "OpenAI handler is not configured"})
@@ -231,3 +237,312 @@ func cloneMap(in map[string]any) map[string]any {
	}
	return out
}

func (h *Handler) queryRawSampleCaptures(w http.ResponseWriter, r *http.Request) {
	query := strings.TrimSpace(r.URL.Query().Get("q"))
	limit := intFromQuery(r, "limit", 20)
	if limit <= 0 {
		limit = 20
	}
	if limit > 50 {
		limit = 50
	}

	chains := buildCaptureChains(devcapture.Global().Snapshot())
	items := make([]map[string]any, 0, len(chains))
	for _, chain := range chains {
		if query != "" && !captureChainMatchesQuery(chain, query) {
			continue
		}
		items = append(items, buildCaptureChainQueryItem(chain, query))
		if len(items) >= limit {
			break
		}
	}

	writeJSON(w, http.StatusOK, map[string]any{
		"query": query,
		"limit": limit,
		"count": len(items),
		"items": items,
	})
}

func (h *Handler) saveRawSampleFromCaptures(w http.ResponseWriter, r *http.Request) {
	var req map[string]any
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeJSON(w, http.StatusBadRequest, map[string]any{"detail": "invalid json"})
		return
	}

	snapshot := devcapture.Global().Snapshot()
	if len(snapshot) == 0 {
		writeJSON(w, http.StatusBadRequest, map[string]any{"detail": "no capture logs available"})
		return
	}

	chain, err := resolveCaptureChainSelection(snapshot, req)
	if err != nil {
		writeJSON(w, http.StatusBadRequest, map[string]any{"detail": err.Error()})
		return
	}

	sampleID := strings.TrimSpace(fieldString(req, "sample_id"))
	source := strings.TrimSpace(fieldString(req, "source"))
	if source == "" {
		source = "admin/dev/raw-samples/save"
	}
	requestPayload := captureChainRequestPayload(chain)

	saved, err := rawsample.Persist(rawsample.PersistOptions{
		RootDir:      config.RawStreamSampleRoot(),
		SampleID:     sampleID,
		Source:       source,
		Request:      requestPayload,
		Capture:      captureSummaryFromEntries(chain.Entries),
		UpstreamBody: combineCaptureBodies(chain.Entries),
	})
	if err != nil {
		writeJSON(w, http.StatusInternalServerError, map[string]any{"detail": err.Error()})
		return
	}

	writeJSON(w, http.StatusOK, map[string]any{
		"success":       true,
		"sample_id":     saved.SampleID,
		"sample_dir":    saved.Dir,
		"meta_path":     saved.MetaPath,
		"upstream_path": saved.UpstreamPath,
		"chain_key":     chain.Key,
		"capture_ids":   captureChainIDs(chain),
		"round_count":   len(chain.Entries),
	})
}

func buildCaptureChains(snapshot []devcapture.Entry) []captureChain {
	if len(snapshot) == 0 {
		return nil
	}
	ordered := make([]devcapture.Entry, len(snapshot))
	// devcapture snapshots are newest-first because the store prepends entries.
	// Reverse once so equal-second timestamps can preserve the actual capture
	// order (completion before continue) under the stable CreatedAt sort below.
	for i := range snapshot {
		ordered[len(snapshot)-1-i] = snapshot[i]
	}
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].CreatedAt < ordered[j].CreatedAt
	})

	byKey := make(map[string]*captureChain, len(ordered))
	keys := make([]string, 0, len(ordered))
	for _, entry := range ordered {
		key := captureChainKey(entry)
		if key == "" {
			key = "capture:" + entry.ID
		}
		if _, ok := byKey[key]; !ok {
			byKey[key] = &captureChain{Key: key}
			keys = append(keys, key)
		}
		byKey[key].Entries = append(byKey[key].Entries, entry)
	}

	chains := make([]captureChain, 0, len(keys))
	for _, key := range keys {
		chains = append(chains, *byKey[key])
	}
	sort.SliceStable(chains, func(i, j int) bool {
		return latestCreatedAt(chains[i]) > latestCreatedAt(chains[j])
	})
	return chains
}

func captureChainKey(entry devcapture.Entry) string {
	req := parseCaptureRequestBody(entry.RequestBody)
	if sessionID := strings.TrimSpace(fieldString(req, "chat_session_id")); sessionID != "" {
		return "session:" + sessionID
	}
	return "capture:" + entry.ID
}

func parseCaptureRequestBody(raw string) map[string]any {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return nil
	}
	var out map[string]any
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil
	}
	return out
}

func latestCreatedAt(chain captureChain) int64 {
	var latest int64
	for _, entry := range chain.Entries {
		if entry.CreatedAt > latest {
			latest = entry.CreatedAt
		}
	}
	return latest
}

func captureChainMatchesQuery(chain captureChain, query string) bool {
	query = strings.ToLower(strings.TrimSpace(query))
	if query == "" {
		return true
	}
	for _, entry := range chain.Entries {
		hay := strings.ToLower(strings.Join([]string{
			entry.Label,
			entry.URL,
			entry.AccountID,
			entry.RequestBody,
			entry.ResponseBody,
		}, "\n"))
		if strings.Contains(hay, query) {
			return true
		}
	}
	return false
}

func buildCaptureChainQueryItem(chain captureChain, query string) map[string]any {
	first := chain.Entries[0]
	last := chain.Entries[len(chain.Entries)-1]
	requestPreview := previewCaptureChainRequest(chain)
	responsePreview := previewCaptureChainResponse(chain)

	return map[string]any{
		"chain_key":          chain.Key,
		"capture_ids":        captureChainIDs(chain),
		"created_at":         latestCreatedAt(chain),
		"round_count":        len(chain.Entries),
		"account_id":         nilIfEmpty(strings.TrimSpace(first.AccountID)),
		"initial_label":      first.Label,
		"initial_url":        first.URL,
		"latest_label":       last.Label,
		"latest_url":         last.URL,
		"request_preview":    requestPreview,
		"response_preview":   responsePreview,
		"query":              query,
		"response_truncated": captureChainHasTruncatedResponse(chain),
	}
}

func captureChainIDs(chain captureChain) []string {
	out := make([]string, 0, len(chain.Entries))
	for _, entry := range chain.Entries {
		out = append(out, entry.ID)
	}
	return out
}

func previewCaptureChainRequest(chain captureChain) string {
	for _, entry := range chain.Entries {
		req := parseCaptureRequestBody(entry.RequestBody)
		if prompt := strings.TrimSpace(fieldString(req, "prompt")); prompt != "" {
			return previewText(prompt, 280)
		}
		if messages, ok := req["messages"].([]any); ok {
			var parts []string
			for _, item := range messages {
				m, _ := item.(map[string]any)
				content := strings.TrimSpace(fieldString(m, "content"))
				if content != "" {
					parts = append(parts, content)
				}
			}
			if len(parts) > 0 {
				return previewText(strings.Join(parts, "\n"), 280)
			}
		}
	}
	return previewText(strings.TrimSpace(chain.Entries[0].RequestBody), 280)
}

func previewCaptureChainResponse(chain captureChain) string {
	var b strings.Builder
	for _, entry := range chain.Entries {
		if b.Len() > 0 {
			b.WriteByte('\n')
		}
		b.WriteString(strings.TrimSpace(entry.ResponseBody))
		if b.Len() >= 280 {
			break
		}
	}
	return previewText(b.String(), 280)
}

func previewText(text string, limit int) string {
	text = strings.TrimSpace(text)
	if limit <= 0 || len(text) <= limit {
		return text
	}
	return text[:limit] + "..."
}

func captureChainHasTruncatedResponse(chain captureChain) bool {
	for _, entry := range chain.Entries {
		if entry.ResponseTruncated {
			return true
		}
	}
	return false
}

func resolveCaptureChainSelection(snapshot []devcapture.Entry, req map[string]any) (captureChain, error) {
	chains := buildCaptureChains(snapshot)
	if len(chains) == 0 {
		return captureChain{}, fmt.Errorf("no capture logs available")
	}

	if chainKey := strings.TrimSpace(fieldString(req, "chain_key")); chainKey != "" {
		for _, chain := range chains {
			if chain.Key == chainKey {
				return chain, nil
			}
		}
		return captureChain{}, fmt.Errorf("capture chain not found")
	}

	captureID := strings.TrimSpace(fieldString(req, "capture_id"))
	if captureID == "" {
		if ids, ok := toStringSlice(req["capture_ids"]); ok && len(ids) > 0 {
			captureID = strings.TrimSpace(ids[0])
		}
	}
	if captureID != "" {
		for _, chain := range chains {
			for _, entry := range chain.Entries {
				if entry.ID == captureID {
					return chain, nil
				}
			}
		}
		return captureChain{}, fmt.Errorf("capture id not found")
	}

	query := strings.TrimSpace(fieldString(req, "query"))
	if query != "" {
		for _, chain := range chains {
			if captureChainMatchesQuery(chain, query) {
				return chain, nil
			}
		}
		return captureChain{}, fmt.Errorf("no capture chain matched query")
	}

	return captureChain{}, fmt.Errorf("capture_id, chain_key, or query is required")
}

func captureChainRequestPayload(chain captureChain) any {
	for _, entry := range chain.Entries {
		if req := parseCaptureRequestBody(entry.RequestBody); req != nil {
			return req
		}
	}
	return strings.TrimSpace(chain.Entries[0].RequestBody)
}

@@ -230,3 +230,160 @@ func TestCombineCaptureBodiesPreservesOrderAndSeparators(t *testing.T) {
 		t.Fatalf("unexpected combined body: %q", string(got))
 	}
 }
+
+func TestQueryRawSampleCapturesGroupsBySessionAndMatchesQuestion(t *testing.T) {
+	devcapture.Global().Clear()
+	defer devcapture.Global().Clear()
+
+	recordCapturedResponse(
+		"deepseek_completion",
+		"https://chat.deepseek.com/api/v0/chat/completion",
+		http.StatusOK,
+		map[string]any{
+			"chat_session_id": "session-query-1",
+			"prompt":          "用户问题:广州天气怎么样?",
+		},
+		"data: {\"v\":\"先看天气\"}\n\n",
+	)
+	recordCapturedResponse(
+		"deepseek_continue",
+		"https://chat.deepseek.com/api/v0/chat/continue",
+		http.StatusOK,
+		map[string]any{
+			"chat_session_id": "session-query-1",
+			"message_id":      2,
+		},
+		"data: {\"v\":\"再补充一点\"}\n\n",
+	)
+
+	h := &Handler{}
+	rec := httptest.NewRecorder()
+	req := httptest.NewRequest(http.MethodGet, "/admin/dev/raw-samples/query?q=广州天气", nil)
+	h.queryRawSampleCaptures(rec, req)
+	if rec.Code != http.StatusOK {
+		t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
+	}
+
+	var out map[string]any
+	if err := json.Unmarshal(rec.Body.Bytes(), &out); err != nil {
+		t.Fatalf("decode failed: %v", err)
+	}
+	items, _ := out["items"].([]any)
+	if len(items) != 1 {
+		t.Fatalf("expected 1 item, got %d body=%s", len(items), rec.Body.String())
+	}
+	item, _ := items[0].(map[string]any)
+	if item["chain_key"] != "session:session-query-1" {
+		t.Fatalf("unexpected chain key: %#v", item["chain_key"])
+	}
+	if int(item["round_count"].(float64)) != 2 {
+		t.Fatalf("expected 2 rounds, got %#v", item["round_count"])
+	}
+	reqPreview, _ := item["request_preview"].(string)
+	if !strings.Contains(reqPreview, "广州天气") {
+		t.Fatalf("expected request preview to contain query, got %q", reqPreview)
+	}
+}
+
+func TestBuildCaptureChainsPreservesCaptureOrderWhenTimestampsCollide(t *testing.T) {
+	snapshot := []devcapture.Entry{
+		{
+			ID:           "cap_continue",
+			CreatedAt:    1712365200,
+			Label:        "deepseek_continue",
+			RequestBody:  `{"chat_session_id":"session-collision","message_id":2}`,
+			ResponseBody: "data: {\"v\":\"第二段\"}\n\n",
+		},
+		{
+			ID:           "cap_completion",
+			CreatedAt:    1712365200,
+			Label:        "deepseek_completion",
+			RequestBody:  `{"chat_session_id":"session-collision","prompt":"题目"}`,
+			ResponseBody: "data: {\"v\":\"第一段\"}\n\n",
+		},
+	}
+
+	chains := buildCaptureChains(snapshot)
+	if len(chains) != 1 {
+		t.Fatalf("expected 1 chain, got %d", len(chains))
+	}
+	if len(chains[0].Entries) != 2 {
+		t.Fatalf("expected 2 entries, got %d", len(chains[0].Entries))
+	}
+	if chains[0].Entries[0].Label != "deepseek_completion" {
+		t.Fatalf("expected completion first, got %#v", chains[0].Entries)
+	}
+	if chains[0].Entries[1].Label != "deepseek_continue" {
+		t.Fatalf("expected continue second, got %#v", chains[0].Entries)
+	}
+}
+
+func TestSaveRawSampleFromCapturesPersistsSelectedChain(t *testing.T) {
+	root := t.TempDir()
+	t.Setenv("DS2API_RAW_STREAM_SAMPLE_ROOT", root)
+	devcapture.Global().Clear()
+	defer devcapture.Global().Clear()
+
+	recordCapturedResponse(
+		"deepseek_completion",
+		"https://chat.deepseek.com/api/v0/chat/completion",
+		http.StatusOK,
+		map[string]any{
+			"chat_session_id": "session-save-1",
+			"prompt":          "请回答深圳天气",
+		},
+		"data: {\"v\":\"第一段\"}\n\n",
+	)
+	recordCapturedResponse(
+		"deepseek_continue",
+		"https://chat.deepseek.com/api/v0/chat/continue",
+		http.StatusOK,
+		map[string]any{
+			"chat_session_id": "session-save-1",
+			"message_id":      2,
+		},
+		"data: {\"v\":\"第二段\"}\n\n",
+	)
+
+	h := &Handler{}
+	rec := httptest.NewRecorder()
+	reqBody := `{"query":"深圳天气","sample_id":"saved-from-memory"}`
+	req := httptest.NewRequest(http.MethodPost, "/admin/dev/raw-samples/save", strings.NewReader(reqBody))
+	h.saveRawSampleFromCaptures(rec, req)
+	if rec.Code != http.StatusOK {
+		t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
+	}
+
+	var out map[string]any
+	if err := json.Unmarshal(rec.Body.Bytes(), &out); err != nil {
+		t.Fatalf("decode failed: %v", err)
+	}
+	if out["sample_id"] != "saved-from-memory" {
+		t.Fatalf("unexpected sample id: %#v", out["sample_id"])
+	}
+	if int(out["round_count"].(float64)) != 2 {
+		t.Fatalf("expected round_count=2, got %#v", out["round_count"])
+	}
+
+	sampleDir := filepath.Join(root, "saved-from-memory")
+	upstreamBytes, err := os.ReadFile(filepath.Join(sampleDir, "upstream.stream.sse"))
+	if err != nil {
+		t.Fatalf("read upstream: %v", err)
+	}
+	upstream := string(upstreamBytes)
+	if !strings.Contains(upstream, "第一段") || !strings.Contains(upstream, "第二段") {
+		t.Fatalf("expected combined upstream, got %q", upstream)
+	}
+	metaBytes, err := os.ReadFile(filepath.Join(sampleDir, "meta.json"))
+	if err != nil {
+		t.Fatalf("read meta: %v", err)
+	}
+	var meta map[string]any
+	if err := json.Unmarshal(metaBytes, &meta); err != nil {
+		t.Fatalf("decode meta: %v", err)
+	}
+	reqMeta, _ := meta["request"].(map[string]any)
+	if fieldString(reqMeta, "chat_session_id") != "session-save-1" {
+		t.Fatalf("expected request to come from selected chain, got %#v", meta["request"])
+	}
+}
@@ -14,8 +14,8 @@ import (
 )
 
 const (
-	defaultLimit        = 5
-	defaultMaxBodyBytes = 2 * 1024 * 1024
+	defaultLimit        = 20
+	defaultMaxBodyBytes = 5 * 1024 * 1024
 	maxLimit            = 50
 )
 
@@ -6,6 +6,35 @@ import (
 	"testing"
 )
 
+func TestNewFromEnvDefaults(t *testing.T) {
+	t.Setenv("DS2API_DEV_PACKET_CAPTURE_LIMIT", "")
+	t.Setenv("DS2API_DEV_PACKET_CAPTURE_MAX_BODY_BYTES", "")
+	t.Setenv("VERCEL", "")
+	t.Setenv("NOW_REGION", "")
+
+	s := NewFromEnv()
+	if s.Limit() != 20 {
+		t.Fatalf("expected default limit 20, got %d", s.Limit())
+	}
+	if s.MaxBodyBytes() != 5*1024*1024 {
+		t.Fatalf("expected default max body bytes 5MB, got %d", s.MaxBodyBytes())
+	}
+}
+
+func TestNewFromEnvHonorsOverrides(t *testing.T) {
+	t.Setenv("DS2API_DEV_PACKET_CAPTURE_LIMIT", "7")
+	t.Setenv("DS2API_DEV_PACKET_CAPTURE_MAX_BODY_BYTES", "8192")
+	t.Setenv("VERCEL", "")
+	t.Setenv("NOW_REGION", "")
+	s := NewFromEnv()
+	if s.Limit() != 7 {
+		t.Fatalf("expected override limit 7, got %d", s.Limit())
+	}
+	if s.MaxBodyBytes() != 8192 {
+		t.Fatalf("expected override max body bytes 8192, got %d", s.MaxBodyBytes())
+	}
+}
+
 func TestStorePushKeepsNewestWithinLimit(t *testing.T) {
 	s := &Store{enabled: true, limit: 2, maxBodyBytes: 1024}
 	for i := 0; i < 3; i++ {
23
internal/js/chat-stream/dedupe.js
Normal file
@@ -0,0 +1,23 @@
+'use strict';
+
+const MIN_CONTINUATION_SNAPSHOT_LEN = 32;
+
+function trimContinuationOverlap(existing, incoming) {
+  if (!incoming) {
+    return '';
+  }
+  if (!existing) {
+    return incoming;
+  }
+  if (incoming.length >= MIN_CONTINUATION_SNAPSHOT_LEN && incoming.startsWith(existing)) {
+    return incoming.slice(existing.length);
+  }
+  if (incoming.length >= MIN_CONTINUATION_SNAPSHOT_LEN && existing.startsWith(incoming)) {
+    return '';
+  }
+  return incoming;
+}
+
+module.exports = {
+  trimContinuationOverlap,
+};
@@ -34,6 +34,9 @@ const {
 const {
   handleVercelStream,
 } = require('./vercel_stream');
+const {
+  trimContinuationOverlap,
+} = require('./dedupe');
 
 async function handler(req, res) {
   setCorsHeaders(res);
@@ -119,4 +122,5 @@ module.exports.__test = {
   extractAccumulatedTokenUsage,
   isNodeStreamSupportedPath,
   extractPathname,
+  trimContinuationOverlap,
 };
@@ -27,6 +27,9 @@ const {
   relayPreparedFailure,
   createLeaseReleaser,
 } = require('./http_internal');
+const {
+  trimContinuationOverlap,
+} = require('./dedupe');
 
 const DEEPSEEK_COMPLETION_URL = 'https://chat.deepseek.com/api/v0/chat/completion';
 
@@ -245,21 +248,29 @@ async function handleVercelStream(req, res, rawBody, payload) {
       if (!p.text) {
         continue;
       }
-      if (searchEnabled && isCitation(p.text)) {
-        continue;
-      }
       if (p.type === 'thinking') {
         if (thinkingEnabled) {
-          thinkingText += p.text;
-          sendDeltaFrame({ reasoning_content: p.text });
+          const trimmed = trimContinuationOverlap(thinkingText, p.text);
+          if (!trimmed) {
+            continue;
+          }
+          thinkingText += trimmed;
+          sendDeltaFrame({ reasoning_content: trimmed });
         }
       } else {
-        outputText += p.text;
-        if (!toolSieveEnabled) {
-          sendDeltaFrame({ content: p.text });
+        const trimmed = trimContinuationOverlap(outputText, p.text);
+        if (!trimmed) {
           continue;
         }
-        const events = processToolSieveChunk(toolSieveState, p.text, toolNames);
+        if (searchEnabled && isCitation(trimmed)) {
+          continue;
+        }
+        outputText += trimmed;
+        if (!toolSieveEnabled) {
+          sendDeltaFrame({ content: trimmed });
+          continue;
+        }
+        const events = processToolSieveChunk(toolSieveState, trimmed, toolNames);
         for (const evt of events) {
           if (evt.type === 'tool_call_deltas') {
             if (!emitEarlyToolDeltas) {
@@ -54,9 +54,11 @@ func CollectStream(resp *http.Response, thinkingEnabled bool, closeBody bool) Co
 		}
 		for _, p := range result.Parts {
 			if p.Type == "thinking" {
-				thinking.WriteString(p.Text)
+				trimmed := TrimContinuationOverlap(thinking.String(), p.Text)
+				thinking.WriteString(trimmed)
 			} else {
-				text.WriteString(p.Text)
+				trimmed := TrimContinuationOverlap(text.String(), p.Text)
+				text.WriteString(trimmed)
 			}
 		}
 		return true
30
internal/sse/consumer_test.go
Normal file
@@ -0,0 +1,30 @@
+package sse
+
+import (
+	"io"
+	"net/http"
+	"strings"
+	"testing"
+)
+
+func TestCollectStreamDedupesContinueSnapshotReplay(t *testing.T) {
+	prefix := "我们被问到:这是一个很长的续答快照前缀,用来验证去重逻辑不会误伤正常 token。"
+	body := strings.Join([]string{
+		`data: {"v":{"response":{"fragments":[{"id":2,"type":"THINK","content":"` + prefix + `","references":[],"stage_id":1}]}}}`,
+		``,
+		`data: {"p":"response/status","v":"INCOMPLETE"}`,
+		``,
+		`data: {"v":{"response":{"fragments":[{"id":2,"type":"THINK","content":"` + prefix + `继续","references":[],"stage_id":1}]}}}`,
+		``,
+		`data: {"v":"分析"}`,
+		``,
+		`data: {"p":"response/status","v":"FINISHED"}`,
+		``,
+	}, "\n")
+
+	resp := &http.Response{Body: io.NopCloser(strings.NewReader(body))}
+	got := CollectStream(resp, true, true)
+	if got.Thinking != prefix+"继续分析" {
+		t.Fatalf("unexpected thinking after dedupe: %q", got.Thinking)
+	}
+}
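The test above drives `CollectStream` with a hand-built SSE body. As a minimal illustration of the `data:` line framing those fixtures use, here is a standalone sketch; the real parser also handles `event:` lines, JSON payload decoding, and `p` path updates, all of which this skips.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// dataLines extracts the payload of every "data: " line from a raw SSE body.
// Blank lines (event boundaries) and other fields are ignored.
func dataLines(raw string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "data: ") {
			out = append(out, strings.TrimPrefix(line, "data: "))
		}
	}
	return out
}

func main() {
	body := "data: {\"v\":\"分析\"}\n\ndata: {\"p\":\"response/status\",\"v\":\"FINISHED\"}\n\n"
	fmt.Println(len(dataLines(body))) // 2
}
```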
24
internal/sse/dedupe.go
Normal file
@@ -0,0 +1,24 @@
+package sse
+
+import "strings"
+
+const minContinuationSnapshotLen = 32
+
+// TrimContinuationOverlap removes the already-seen prefix when DeepSeek
+// continue rounds resend the full fragment snapshot instead of only the new
+// suffix. Non-overlapping chunks are returned unchanged.
+func TrimContinuationOverlap(existing, incoming string) string {
+	if incoming == "" {
+		return ""
+	}
+	if existing == "" {
+		return incoming
+	}
+	if len(incoming) >= minContinuationSnapshotLen && strings.HasPrefix(incoming, existing) {
+		return incoming[len(existing):]
+	}
+	if len(incoming) >= minContinuationSnapshotLen && strings.HasPrefix(existing, incoming) {
+		return ""
+	}
+	return incoming
+}
39
internal/sse/dedupe_test.go
Normal file
@@ -0,0 +1,39 @@
+package sse
+
+import "testing"
+
+func TestTrimContinuationOverlapReturnsSuffixForSnapshotReplay(t *testing.T) {
+	existing := "我们被问到:这是一个很长的续答快照前缀,用来验证去重逻辑不会误伤正常 token。"
+	incoming := existing + "继续分析"
+	got := TrimContinuationOverlap(existing, incoming)
+	if got != "继续分析" {
+		t.Fatalf("expected suffix only, got %q", got)
+	}
+}
+
+func TestTrimContinuationOverlapDropsStaleShorterSnapshot(t *testing.T) {
+	incoming := "我们被问到:这是一个很长的续答快照前缀,用来验证去重逻辑不会误伤正常 token。"
+	existing := incoming + "继续分析"
+	got := TrimContinuationOverlap(existing, incoming)
+	if got != "" {
+		t.Fatalf("expected stale snapshot to be dropped, got %q", got)
+	}
+}
+
+func TestTrimContinuationOverlapPreservesNormalIncrement(t *testing.T) {
+	existing := "我们"
+	incoming := "被"
+	got := TrimContinuationOverlap(existing, incoming)
+	if got != "被" {
+		t.Fatalf("expected normal increment unchanged, got %q", got)
+	}
+}
+
+func TestTrimContinuationOverlapKeepsShortPrefixLikeNormalToken(t *testing.T) {
+	existing := "我们被问到"
+	incoming := "我们"
+	got := TrimContinuationOverlap(existing, incoming)
+	if got != "我们" {
+		t.Fatalf("expected short token preserved, got %q", got)
+	}
+}
@@ -22,6 +22,7 @@ const {
   shouldSkipPath,
   isNodeStreamSupportedPath,
   extractPathname,
+  trimContinuationOverlap,
 } = handler.__test;
 
 test('chat-stream exposes parser test hooks', () => {
@@ -368,3 +369,10 @@ test('extractPathname strips query only', () => {
   assert.equal(extractPathname('/v1/chat/completions?stream=true'), '/v1/chat/completions');
   assert.equal(extractPathname('/v1beta/models/gemini-2.5-flash:streamGenerateContent?key=1'), '/v1beta/models/gemini-2.5-flash:streamGenerateContent');
 });
+
+test('trimContinuationOverlap preserves short normal tokens and trims long snapshots', () => {
+  assert.equal(trimContinuationOverlap('我们被问到', '我们'), '我们');
+  const existing = '我们被问到:这是一个很长的续答快照前缀,用来验证去重逻辑不会误伤正常 token。';
+  const incoming = `${existing}继续分析`;
+  assert.equal(trimContinuationOverlap(existing, incoming), '继续分析');
+});
@@ -2,14 +2,22 @@
 
 This directory keeps only **raw upstream SSE streams**, used for local replay, field analysis, and regression tests.
 
-## Default permanent samples
+## Sample categories
 
-The repository currently keeps two canonical default samples:
+Samples in this directory fall into two categories:
 
-- `guangzhou-weather-reasoner-search-20260404`: a weather search stream containing `reference:N` citation markers, used to verify citation cleanup and body output.
-- `content-filter-trigger-20260405-jwt3`: a real `CONTENT_FILTER` moderation stream, used to verify terminal-state handling and refusal formatting.
+- Canonical default samples: specified by `default_samples` in [`manifest.json`](./manifest.json); the replay tooling runs this stable set first by default
+- Extended samples: preserve real questions or specific protocol behavior for troubleshooting, field analysis, and targeted regressions; not necessarily part of the default full replay
 
-The default replay tooling reads `default_samples` from [`manifest.json`](./manifest.json) first to keep the fixed replay set stable.
+Besides the canonical samples, this directory currently also contains, for example:
+
+- `markdown-format-example-20260405`
+- `markdown-format-example-20260405-spacefix`
+- `continue-thinking-snapshot-replay-20260405`
+
+Among these, `continue-thinking-snapshot-replay-20260405` is a multi-round sample covering raw SSE replay of the `completion + continue` scenario, used to verify continuation-thinking dedupe.
+
+For the default fixed replay set, treat [`manifest.json`](./manifest.json) as the source of truth rather than judging by directory count.
 See [docs/DeepSeekSSE行为结构说明-2026-04-05.md](../../docs/DeepSeekSSE行为结构说明-2026-04-05.md) for a fuller protocol-level behavior breakdown.
 
 ## Automatic capture endpoint
@@ -29,6 +37,15 @@ POST /admin/dev/raw-samples/capture
 
 The capture endpoint's response body is still the project's actual output for that run, but it is no longer written into the sample directory. The sample tree thus keeps only raw streams; derived results are regenerated locally on demand during replay.
 
+If the problem has already been reproduced in the current process's in-memory captures, you can query first and then persist:
+
+```bash
+GET /admin/dev/raw-samples/query?q=keyword&limit=20
+POST /admin/dev/raw-samples/save
+```
+
+This path is suited to turning "a real problem that just happened" into a replayable sample quickly, without re-triggering the request.
+
 ## Directory conventions
 
 Each sample gets one subdirectory holding only the following two kinds of files:
@@ -36,6 +53,16 @@ POST /admin/dev/raw-samples/capture
 - `meta.json`: sample metadata (question, model, capture time, notes)
 - `upstream.stream.sse`: the full raw SSE text (`event:` / `data:` lines)
 
+Key fields in `meta.json` usually include:
+
+- `sample_id`
+- `captured_at_utc`
+- `source`
+- `request`
+- `capture`
+
+For multi-round samples, `capture.rounds` records each upstream request round, e.g. the initial `deepseek_completion` and subsequent `deepseek_continue` rounds.
+
 ## Replay and comparison
 
 The replay tool reads `upstream.stream.sse`, regenerates the current parse result locally, and writes derived results to `artifacts/raw-stream-sim/<run-id>/<sample-id>/`, for example:
@@ -60,7 +87,7 @@ POST /admin/dev/raw-samples/capture
 ## How to extend
 
 1. Capture one real request.
-2. Call `/admin/dev/raw-samples/capture` directly, or manually create a `<sample-id>/` directory containing `meta.json` + `upstream.stream.sse`.
+2. Call `/admin/dev/raw-samples/capture` directly, or persist from in-memory captures via `/admin/dev/raw-samples/query` + `/admin/dev/raw-samples/save`; you can also manually create a `<sample-id>/` directory containing `meta.json` + `upstream.stream.sse`.
 3. Run the replay tool or comparison script, generate local derived results, and check for regressions.
 
 > Note: samples may contain search result bodies and citation info; do not include sensitive accounts/keys.
File diff suppressed because one or more lines are too long
@@ -7,6 +7,7 @@ import { createRequire } from 'node:module';
 const require = createRequire(import.meta.url);
 const chatStream = require('../../api/chat-stream.js');
 const { parseChunkForContent } = chatStream.__test;
+const { trimContinuationOverlap } = chatStream.__test;
 
 function parseArgs(argv) {
   const out = {
@@ -179,6 +180,8 @@ function parseDeepSeekReplay(raw) {
   let currentType = 'thinking';
   let sawFinish = false;
   let outputText = '';
+  let thinkingText = '';
+  let textOutput = '';
   let parsedChunks = 0;
 
   for (const evt of events) {
@@ -201,7 +204,15 @@ function parseDeepSeekReplay(raw) {
         sawFinish = true;
       }
       for (const part of parsed.parts) {
-        outputText += part.text;
+        if (part.type === 'thinking') {
+          const trimmed = trimContinuationOverlap(thinkingText, part.text);
+          thinkingText += trimmed;
+          outputText += trimmed;
+        } else {
+          const trimmed = trimContinuationOverlap(textOutput, part.text);
+          textOutput += trimmed;
+          outputText += trimmed;
+        }
       }
     }
 
@@ -115,7 +115,7 @@
     "addAccount": "Add account",
     "testingAllAccounts": "Refreshing tokens for all accounts...",
     "sessionActive": "Session active",
-    "reauthRequired": "Re-auth required",
+    "reauthRequired": "Retest status required",
     "runtimeStatusUnknown": "Will be determined after sync",
    "testStatusFailed": "Last test failed",
     "noAccounts": "No accounts found.",
@@ -325,4 +325,4 @@
       "four": "Trigger a redeploy to apply the updated environment variables."
     }
   }
-}
+}
@@ -115,7 +115,7 @@
     "addAccount": "添加账号",
     "testingAllAccounts": "正在刷新所有账号 Token...",
     "sessionActive": "已建立会话",
-    "reauthRequired": "需重新登录",
+    "reauthRequired": "需重新测试状态",
     "runtimeStatusUnknown": "状态以同步后为准",
     "testStatusFailed": "上次测试失败",
     "noAccounts": "未找到任何账号",
@@ -325,4 +325,4 @@
       "four": "触发重新部署以应用新的环境变量。"
     }
   }
-}
+}