Compare commits


9 Commits
v4.4.0 ... dev

Author SHA1 Message Date
CJACK.
ec4f178908 Merge pull request #416 from CJackHwang/main
Add Star History section to README

Added a Star History section with a chart to the README.
2026-05-03 20:48:53 +08:00
CJACK.
f413d42b0c Add Star History section to README
Added a Star History section with a chart to the README.
2026-05-03 20:46:22 +08:00
CJACK.
5406f07938 Add Star History section to README
Added Star History section with a chart to track repository stars.
2026-05-03 20:45:31 +08:00
CJACK.
fe87ded82b Merge pull request #415 from CJackHwang/dev
[codex] unify response history session management across API backends
2026-05-03 20:41:22 +08:00
CJACK
8ace349f84 feat: refactor context file uploading to support tools and streamline live message construction 2026-05-03 20:27:57 +08:00
CJACK
112bedb05d refactor: differentiate reference marker handling between stream and non-stream modes
- Stream: strip both [citation:N] and [reference:N] markers to prevent
  leaking partial link metadata during incremental output
- Non-stream: convert citation/reference markers to Markdown links for
  Claude Messages, Gemini generateContent, and OpenAI Chat/Responses
- Remove StripReferenceMarkers option from call sites; behavior is now
  determined automatically by stream vs non-stream context
- Extend JS runtime stripReferenceMarkersText() to also match [citation:N]
- Add tests for streaming marker stripping and non-stream link conversion

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 17:53:49 +08:00
CJACK
c099a6f7bf feat: add unified response history session management across Claude, Gemini, and OpenAI API backends
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 17:24:38 +08:00
CJACK
5e55cf36d8 refactor: prioritize raw model output in chat history archiving to ensure accurate capture of tool call and thinking markup 2026-05-03 15:44:17 +08:00
CJACK
837dc74ffc feat: implement DS2API_HISTORY.txt transcript parser to merge history into chat messages 2026-05-03 15:25:06 +08:00
44 changed files with 1120 additions and 126 deletions

API.md
View File

@@ -42,7 +42,7 @@
- The Tool Calling parsing strategy stays consistent between the Go and Node runtimes: models are recommended to emit the DSML shell `<|DSML|tool_calls>`, `<|DSML|invoke name="...">`, `<|DSML|parameter name="...">`; the compatibility layer also accepts the DSML wrapper aliases `<dsml|tool_calls>`, `<|tool_calls>`, `<tool_calls>`, common missing-delimiter forms (such as `<|DSML tool_calls>`), common typos where `DSML` is fused to the tool tag name (`<DSMLtool_calls>`), and the legacy canonical XML `<tool_calls>`, `<invoke name="...">`, `<parameter name="...">`. The implementation uses a narrow-tolerance structural scan: only a `tool_calls` wrapper, or a repairable missing opening wrapper, enters the tool path; a bare `<invoke>` does not count as supported syntax, and streaming keeps its anti-leak filtering. If a parameter body is itself a valid JSON literal (such as `123`, `true`, `null`, an array, or an object), it is emitted as a structured value rather than always being treated as a string (see the first sketch after this list); if a CDATA section occasionally misses its closing marker, a narrow repair runs in the final parse / flush recovery phase to preserve outer tool calls that are already fully wrapped.
- The `Admin API` separates configuration from runtime policy: `/admin/config*` manages static configuration, `/admin/settings*` manages runtime behavior.
- When the upstream returns a thinking-only response (the model produced a reasoning chain but no visible text), non-stream completions retry once automatically: the prompt suffix `"Previous reply had no visible output. Please regenerate the visible final answer or tool call now."` is appended as a multi-turn follow-up and `parent_message_id` is set so the model regenerates within the same DeepSeek session; at most 1 retry (see the second sketch after this list).
- Reference-marker stripping (strip reference markers) is currently an always-on runtime behavior, applied uniformly across all protocol adaptation layers.
- Reference-marker handling boundary: streaming output hides upstream-internal placeholders such as `[citation:N]` / `[reference:N]` by default; non-stream output converts DeepSeek search reference markers into Markdown reference links by default.
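A minimal sketch of the JSON-literal parameter rule above. `decodeParameterValue` is a hypothetical helper, not the project's actual function; only the behavior it illustrates comes from the text:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeParameterValue mirrors the rule described above: if the raw
// parameter body is a valid JSON literal (number, bool, null, array,
// object, or quoted string), keep it as a structured value; otherwise
// fall back to treating the body as a plain string.
func decodeParameterValue(raw string) any {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err == nil {
		return v // structured value: 123, true, null, [...], {...}
	}
	return raw // not valid JSON: keep as a string
}

func main() {
	for _, raw := range []string{`123`, `true`, `null`, `{"x":1}`, `plain text`} {
		fmt.Printf("%-14q -> %#v\n", raw, decodeParameterValue(raw))
	}
}
```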
---
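And a minimal sketch of the thinking-only retry, assuming a flat `map[string]any` payload with `prompt` and `parent_message_id` fields as named in the text; the helper name and the blank-line joining are assumptions:

```go
package main

import "fmt"

// buildEmptyOutputRetryPayload sketches the single compensation retry:
// clone the original completion payload, append the fixed prompt suffix,
// and chain the retry onto the first response via parent_message_id so
// it becomes a follow-up turn in the same DeepSeek session.
func buildEmptyOutputRetryPayload(original map[string]any, responseMessageID string) map[string]any {
	const suffix = "Previous reply had no visible output. Please regenerate the visible final answer or tool call now."
	retry := make(map[string]any, len(original)+1)
	for k, v := range original {
		retry[k] = v
	}
	prompt, _ := retry["prompt"].(string)
	retry["prompt"] = prompt + "\n\n" + suffix // joining with a blank line is an assumption
	retry["parent_message_id"] = responseMessageID
	return retry
}

func main() {
	payload := map[string]any{"prompt": "original prompt", "chat_session_id": "sess-1"}
	fmt.Printf("%#v\n", buildEmptyOutputRetryPayload(payload, "msg-42"))
}
```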
@@ -168,6 +168,8 @@ Gemini-compatible clients can also use `x-goog-api-key`, `?key=`, or `?api_key=`
| GET | `/admin/chat-history/{id}` | Admin | View a single server-side chat record |
| DELETE | `/admin/chat-history/{id}` | Admin | Delete a single server-side chat record |
| PUT | `/admin/chat-history/settings` | Admin | Update how many chat records are retained |
Server-side records are essentially archives of DeepSeek upstream responses: the generation endpoints that call DeepSeek directly (OpenAI Chat, OpenAI Responses, Claude Messages, Gemini GenerateContent, and so on) write to the record list after receiving the upstream response and before per-protocol translation/trimming; the list is shown in reverse order of request creation time, and streaming requests keep refreshing the record's status and detail while generating. Requests issued from the WebUI "API test" page are also recorded.
| GET | `/admin/version` | Admin | Query the current version and the latest Release |
OpenAI `/v1/*` remains the canonical path. For clients configured with only the DS2API root address, the same set of OpenAI handlers is also exposed through root-path shortcut routes (sketched below): `/models`, `/models/{id}`, `/chat/completions`, `/responses`, `/responses/{response_id}`, `/embeddings`, `/files`, `/files/{file_id}`.
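A minimal sketch of the shortcut-route idea using chi (the router this repo imports); the handler names and mounting helper are hypothetical, not the project's actual wiring:

```go
package main

import (
	"net/http"

	"github.com/go-chi/chi/v5"
)

// mountOpenAI registers one set of OpenAI handlers under a router.
// Mounting it at both /v1 and the root gives clients that only know
// the DS2API root address the same endpoints without duplication.
func mountOpenAI(r chi.Router, chatCompletions http.HandlerFunc) {
	r.Post("/chat/completions", chatCompletions)
	r.Get("/models", func(w http.ResponseWriter, r *http.Request) { /* ... */ })
	// /responses, /embeddings, /files, ... would be registered the same way.
}

func main() {
	chat := func(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusOK) }
	root := chi.NewRouter()
	root.Route("/v1", func(r chi.Router) { mountOpenAI(r, chat) }) // canonical path
	mountOpenAI(root, chat)                                        // root-path shortcut
	_ = http.ListenAndServe(":8080", root)
}
```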

View File

@@ -23,6 +23,16 @@
[Thanks to the developers in the Linux.do and GitHub communities for their support of and contributions to this project]
## Star History
<a href="https://www.star-history.com/?repos=cjackhwang%2Fds2api&type=date&legend=top-left">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/chart?repos=cjackhwang/ds2api&type=date&theme=dark&legend=top-left" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/chart?repos=cjackhwang/ds2api&type=date&legend=top-left" />
<img alt="Star History Chart" src="https://api.star-history.com/chart?repos=cjackhwang/ds2api&type=date&legend=top-left" />
</picture>
</a>
> **Important Disclaimer**
>
> This repository is provided for learning, research, personal experimentation, and internal validation only. It does not grant any form of commercial authorization, nor any warranty of fitness or results.

View File

@@ -20,6 +20,16 @@ DS2API converts DeepSeek Web chat capability into OpenAI-compatible, Claude-comp
Documentation entry: [Docs Index](docs/README.md) / [Architecture](docs/ARCHITECTURE.en.md) / [API Reference](API.en.md)
## Star History
<a href="https://www.star-history.com/?repos=cjackhwang%2Fds2api&type=date&legend=top-left">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/chart?repos=cjackhwang/ds2api&type=date&theme=dark&legend=top-left" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/chart?repos=cjackhwang/ds2api&type=date&legend=top-left" />
<img alt="Star History Chart" src="https://api.star-history.com/chart?repos=cjackhwang/ds2api&type=date&legend=top-left" />
</picture>
</a>
> **Important Disclaimer**
>
> This repository is provided for learning, research, personal experimentation, and internal validation only. It does not grant any commercial authorization and comes with no warranty of fitness, stability, or results.

View File

@@ -1 +1 @@
4.4.0
4.4.1

View File

@@ -74,6 +74,7 @@ gofmt -w <changed-go-files>
- Admin API: `/admin/chat-history`, `/admin/chat-history/{id}`
- Backend storage: `internal/chathistory/store.go`
- Output archiving: `internal/responsehistory` records the DeepSeek upstream assistant text / thinking before per-protocol translation/trimming; even when a tool call has been converted into structured `tool_calls` in the outward response and removed from the visible body, the background history should still keep the original DSML / XML fragment, which makes format drift easier to debug.
- Frontend polling and ETag: `webui/src/features/chatHistory/ChatHistoryContainer.jsx`
For tool call issues, run these first:

View File

@@ -114,7 +114,8 @@ The core idea of DS2API today is not to take the client-supplied `messages`, `tools`
- For OpenAI Chat / Responses non-stream finalization, if the final visible body is empty, the compatibility layer first tries to parse standalone DSML / XML tool blocks in the thinking chain as real tool calls. The streaming path runs the same fallback detection at finalization, but never intercepts or rewrites streamed output mid-flight based on thinking content; real tool detection is always based on the raw upstream text, never on a version that has already gone through visible-output cleaning, so even though the final visible layer strips fully leaked DSML / XML `tool_calls` wrappers and suppresses blocks with all-empty parameters or invalid wrappers, converting real tool calls into structured `tool_calls` / `function_call` is unaffected. Recovered results are returned as this turn's structured assistant `tool_calls` / `function_call`, not stuffed into the `content` text; if the client has not enabled thinking / reasoning, the thinking chain is used only for detection and is not exposed as `reasoning_content` or visible body. Only when the body is empty and the thinking chain contains no executable tool call does handling continue as an empty-reply error.
- Before raising the empty-reply error, OpenAI Chat / Responses performs one internal compensation retry by default: after the first upstream response completes, if the final visible body is empty, no tool call was parsed, no tool call has already been streamed to the client, and the stop reason is not `content_filter`, the compatibility layer reuses the same `chat_session_id`, account, token, and tool policy, appends the fixed suffix `Previous reply had no visible output. Please regenerate the visible final answer or tool call now.` to the original completion `prompt`, and resubmits once. The retry follows the DeepSeek multi-turn protocol: the `response_message_id` is extracted from the first upstream SSE stream and set as `parent_message_id` in the retry payload, making the retry a follow-up turn of the same session rather than a detached root message; a fresh PoW is also fetched, falling back to the original PoW if that fails. The retry does not re-normalize messages, create a new session, switch accounts, or insert a retry marker into the stream for streaming clients; the second round's thinking / reasoning is appended directly after the first as normal increments, deduplicated with overlap trim. If the second attempt is still empty, the terminal error code remains the existing `upstream_empty_output`; if either attempt triggers an empty `content_filter`, no compensation retry is made and the `content_filter` error is kept. The JS Vercel runtime also sets `parent_message_id`, but reuses the original PoW because it cannot call the PoW API directly.
- At the final visible-body rendering stage, OpenAI Chat / Responses replaces the `[citation:N]` / `[reference:N]` markers from DeepSeek search results with the corresponding Markdown links. `citation` markers resolve as one-based indices; `reference` markers map as zero-based indices only when `[reference:0]` (spaces allowed after the colon) appears in the same body, and they do not affect `citation` markers in that body.
- At the final visible-body rendering stage, non-stream OpenAI Chat / Responses, Claude Messages, and Gemini generateContent replace the `[citation:N]` / `[reference:N]` markers from DeepSeek search results with the corresponding Markdown links. `citation` markers resolve as one-based indices; `reference` markers map as zero-based indices only when `[reference:0]` (spaces allowed after the colon) appears in the same body, and they do not affect `citation` markers in that body (see the sketch after this list).
- Streaming output still hides upstream-internal markers such as `[citation:N]` / `[reference:N]` by default, to avoid leaking reference placeholders whose mapping is not yet complete in chunked output.
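A minimal sketch of the mapping rules above: `citation` markers resolve one-based against the link table, `reference` markers resolve zero-based but only when `[reference:0]` appears in the same body (spaces after the colon allowed, per the text). `linkMarkers` is a hypothetical helper, not the project's converter; the expected output matches the non-stream tests elsewhere in this diff:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var (
	citationRe      = regexp.MustCompile(`\[citation:\s*(\d+)\]`)
	referenceRe     = regexp.MustCompile(`\[reference:\s*(\d+)\]`)
	referenceZeroRe = regexp.MustCompile(`\[reference:\s*0\]`)
)

// linkMarkers converts [citation:N] (one-based) and [reference:N]
// (zero-based, enabled only when [reference:0] is present) into
// Markdown links, looking URLs up in a one-based link table.
func linkMarkers(text string, links map[int]string) string {
	text = citationRe.ReplaceAllStringFunc(text, func(m string) string {
		n, _ := strconv.Atoi(citationRe.FindStringSubmatch(m)[1])
		if url, ok := links[n]; ok {
			return fmt.Sprintf("[%d](%s)", n, url)
		}
		return m // no URL known: leave the marker untouched
	})
	if referenceZeroRe.MatchString(text) {
		text = referenceRe.ReplaceAllStringFunc(text, func(m string) string {
			n, _ := strconv.Atoi(referenceRe.FindStringSubmatch(m)[1])
			if url, ok := links[n+1]; ok { // zero-based marker -> one-based table
				return fmt.Sprintf("[%d](%s)", n, url)
			}
			return m
		})
	}
	return text
}

func main() {
	links := map[int]string{1: "https://example.com/a", 2: "https://example.com/b"}
	fmt.Println(linkMarkers("A[reference:0], B[reference:1].", links))
	// Output: A[0](https://example.com/a), B[1](https://example.com/b).
}
```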
## 5. How the prompt is assembled

View File

@@ -11,7 +11,7 @@ func TestBuildTurnFromCollectedTextCitation(t *testing.T) {
turn := BuildTurnFromCollected(sse.CollectResult{
Text: "See [citation:1]",
CitationLinks: map[int]string{1: "https://example.com"},
}, BuildOptions{Model: "deepseek-v4-flash", Prompt: "prompt", SearchEnabled: true, StripReferenceMarkers: true})
}, BuildOptions{Model: "deepseek-v4-flash", Prompt: "prompt", SearchEnabled: true})
if turn.Text != "See [1](https://example.com)" {
t.Fatalf("text mismatch: %q", turn.Text)
}
@@ -23,6 +23,20 @@ func TestBuildTurnFromCollectedTextCitation(t *testing.T) {
}
}
func TestBuildTurnFromCollectedKeepsNonStreamReferenceLinks(t *testing.T) {
turn := BuildTurnFromCollected(sse.CollectResult{
Text: "结论[reference:0],补充[reference:1]。",
CitationLinks: map[int]string{
1: "https://example.com/a",
2: "https://example.com/b",
},
}, BuildOptions{Model: "deepseek-v4-flash-search", Prompt: "prompt", SearchEnabled: true})
want := "结论[0](https://example.com/a),补充[1](https://example.com/b)。"
if turn.Text != want {
t.Fatalf("text mismatch: got %q want %q", turn.Text, want)
}
}
func TestBuildTurnFromCollectedToolCall(t *testing.T) {
turn := BuildTurnFromCollected(sse.CollectResult{
Text: `<tool_calls><invoke name="Write"><parameter name="content">{"x":1}</parameter></invoke></tool_calls>`,

View File

@@ -43,6 +43,7 @@ type Entry struct {
Status string `json:"status"`
CallerID string `json:"caller_id,omitempty"`
AccountID string `json:"account_id,omitempty"`
Surface string `json:"surface,omitempty"`
Model string `json:"model,omitempty"`
Stream bool `json:"stream"`
UserInput string `json:"user_input,omitempty"`
@@ -72,6 +73,7 @@ type SummaryEntry struct {
Status string `json:"status"`
CallerID string `json:"caller_id,omitempty"`
AccountID string `json:"account_id,omitempty"`
Surface string `json:"surface,omitempty"`
Model string `json:"model,omitempty"`
Stream bool `json:"stream"`
UserInput string `json:"user_input,omitempty"`
@@ -92,6 +94,7 @@ type File struct {
type StartParams struct {
CallerID string
AccountID string
Surface string
Model string
Stream bool
UserInput string
@@ -271,6 +274,7 @@ func (s *Store) Start(params StartParams) (Entry, error) {
Status: "streaming",
CallerID: strings.TrimSpace(params.CallerID),
AccountID: strings.TrimSpace(params.AccountID),
Surface: strings.TrimSpace(params.Surface),
Model: strings.TrimSpace(params.Model),
Stream: params.Stream,
UserInput: strings.TrimSpace(params.UserInput),
@@ -546,10 +550,13 @@ func (s *Store) rebuildIndexLocked() {
summaries = append(summaries, summaryFromEntry(item))
}
sort.Slice(summaries, func(i, j int) bool {
if summaries[i].UpdatedAt == summaries[j].UpdatedAt {
return summaries[i].CreatedAt > summaries[j].CreatedAt
if summaries[i].CreatedAt == summaries[j].CreatedAt {
if summaries[i].Revision == summaries[j].Revision {
return summaries[i].UpdatedAt > summaries[j].UpdatedAt
}
return summaries[i].Revision > summaries[j].Revision
}
return summaries[i].UpdatedAt > summaries[j].UpdatedAt
return summaries[i].CreatedAt > summaries[j].CreatedAt
})
if s.state.Limit < DisabledLimit || !isAllowedLimit(s.state.Limit) {
s.state.Limit = DefaultLimit
@@ -593,6 +600,7 @@ func summaryFromEntry(item Entry) SummaryEntry {
Status: item.Status,
CallerID: item.CallerID,
AccountID: item.AccountID,
Surface: item.Surface,
Model: item.Model,
Stream: item.Stream,
UserInput: item.UserInput,

View File

@@ -8,6 +8,7 @@ import (
"strings"
"sync"
"testing"
"time"
"unicode/utf8"
)
@@ -494,6 +495,36 @@ func TestStoreWritesOnlyChangedDetailFiles(t *testing.T) {
}
}
func TestStoreOrdersByCreationTimeNotStreamingUpdates(t *testing.T) {
path := filepath.Join(t.TempDir(), "chat_history.json")
store := New(path)
first, err := store.Start(StartParams{UserInput: "first"})
if err != nil {
t.Fatalf("start first failed: %v", err)
}
time.Sleep(time.Millisecond)
second, err := store.Start(StartParams{UserInput: "second"})
if err != nil {
t.Fatalf("start second failed: %v", err)
}
time.Sleep(time.Millisecond)
if _, err := store.Update(first.ID, UpdateParams{Status: "streaming", Content: "still running"}); err != nil {
t.Fatalf("update first failed: %v", err)
}
snapshot, err := store.Snapshot()
if err != nil {
t.Fatalf("snapshot failed: %v", err)
}
if len(snapshot.Items) != 2 {
t.Fatalf("expected two items, got %#v", snapshot.Items)
}
if snapshot.Items[0].ID != second.ID || snapshot.Items[1].ID != first.ID {
t.Fatalf("expected creation-time order to stay stable, got %#v", snapshot.Items)
}
}
func TestUpdatePreservesContentWhenNewContentIsEmpty(t *testing.T) {
path := filepath.Join(t.TempDir(), "chat_history.json")
store := New(path)

View File

@@ -119,6 +119,29 @@ func TestExecuteNonStreamWithRetryUsesParentMessageForEmptyRetry(t *testing.T) {
}
}
func TestExecuteNonStreamWithRetryConvertsReferenceMarkers(t *testing.T) {
ds := &fakeDeepSeekCaller{responses: []*http.Response{sseHTTPResponse(
http.StatusOK,
`data: {"p":"response/content","v":"答案[reference:0]。","citation":{"cite_index":0,"url":"https://example.com/ref"}}`,
)}}
stdReq := promptcompat.StandardRequest{
Surface: "test",
ResponseModel: "deepseek-v4-flash-search",
PromptTokenText: "prompt",
FinalPrompt: "final prompt",
Search: true,
}
result, outErr := ExecuteNonStreamWithRetry(context.Background(), ds, &auth.RequestAuth{}, stdReq, Options{})
if outErr != nil {
t.Fatalf("unexpected output error: %#v", outErr)
}
want := "答案[0](https://example.com/ref)。"
if result.Turn.Text != want {
t.Fatalf("text mismatch: got %q want %q", result.Turn.Text, want)
}
}
func TestStartCompletionAppliesCurrentInputFileGlobally(t *testing.T) {
ds := &fakeDeepSeekCaller{responses: []*http.Response{sseHTTPResponse(http.StatusOK, `data: {"p":"response/content","v":"ok"}`)}}
stdReq := promptcompat.StandardRequest{

View File

@@ -5,15 +5,25 @@ import (
"io"
"net/http"
"net/http/httptest"
"path/filepath"
"strings"
"testing"
"ds2api/internal/auth"
"ds2api/internal/chathistory"
dsclient "ds2api/internal/deepseek/client"
)
type claudeCurrentInputAuth struct{}
type claudeHistoryConfig struct {
aliases map[string]string
}
func (m claudeHistoryConfig) ModelAliases() map[string]string { return m.aliases }
func (claudeHistoryConfig) CurrentInputFileEnabled() bool { return false }
func (claudeHistoryConfig) CurrentInputFileMinChars() int { return 0 }
func (claudeCurrentInputAuth) Determine(*http.Request) (*auth.RequestAuth, error) {
return &auth.RequestAuth{
DeepSeekToken: "direct-token",
@@ -22,6 +32,50 @@ func (claudeCurrentInputAuth) Determine(*http.Request) (*auth.RequestAuth, error
}, nil
}
func TestClaudeDirectRecordsResponseHistory(t *testing.T) {
ds := &claudeCurrentInputDS{}
historyStore := chathistory.New(filepath.Join(t.TempDir(), "history.json"))
h := &Handler{
Store: claudeHistoryConfig{aliases: map[string]string{"claude-sonnet-4-6": "deepseek-v4-flash"}},
Auth: claudeCurrentInputAuth{},
DS: ds,
ChatHistory: historyStore,
}
reqBody := `{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"hello from claude"}],"max_tokens":1024}`
req := httptest.NewRequest(http.MethodPost, "/v1/messages", strings.NewReader(reqBody))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
h.Messages(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
snapshot, err := historyStore.Snapshot()
if err != nil {
t.Fatalf("snapshot history: %v", err)
}
if len(snapshot.Items) != 1 {
t.Fatalf("expected one history item, got %d", len(snapshot.Items))
}
item, err := historyStore.Get(snapshot.Items[0].ID)
if err != nil {
t.Fatalf("get history item: %v", err)
}
if item.Surface != "claude.messages" {
t.Fatalf("unexpected surface: %q", item.Surface)
}
if item.Model != "claude-sonnet-4-6" {
t.Fatalf("unexpected model: %q", item.Model)
}
if item.UserInput != "hello from claude" {
t.Fatalf("unexpected user input: %q", item.UserInput)
}
if item.Content != "ok" {
t.Fatalf("expected raw upstream content, got %q", item.Content)
}
}
func (claudeCurrentInputAuth) Release(*auth.RequestAuth) {}
type claudeCurrentInputDS struct {
@@ -53,10 +107,12 @@ func (d *claudeCurrentInputDS) CallCompletion(_ context.Context, _ *auth.Request
func TestClaudeDirectAppliesCurrentInputFile(t *testing.T) {
ds := &claudeCurrentInputDS{}
historyStore := chathistory.New(filepath.Join(t.TempDir(), "history.json"))
h := &Handler{
Store: mockClaudeConfig{aliases: map[string]string{"claude-sonnet-4-6": "deepseek-v4-flash"}},
Auth: claudeCurrentInputAuth{},
DS: ds,
Store: mockClaudeConfig{aliases: map[string]string{"claude-sonnet-4-6": "deepseek-v4-flash"}},
Auth: claudeCurrentInputAuth{},
DS: ds,
ChatHistory: historyStore,
}
reqBody := `{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"hello from claude"}],"max_tokens":1024}`
req := httptest.NewRequest(http.MethodPost, "/v1/messages", strings.NewReader(reqBody))
@@ -82,4 +138,21 @@ func TestClaudeDirectAppliesCurrentInputFile(t *testing.T) {
if !strings.Contains(prompt, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected continuation prompt, got %q", prompt)
}
snapshot, err := historyStore.Snapshot()
if err != nil {
t.Fatalf("snapshot history: %v", err)
}
if len(snapshot.Items) != 1 {
t.Fatalf("expected one history item, got %d", len(snapshot.Items))
}
full, err := historyStore.Get(snapshot.Items[0].ID)
if err != nil {
t.Fatalf("get history item: %v", err)
}
if full.HistoryText != string(ds.uploads[0].Data) {
t.Fatalf("expected uploaded current input file to be persisted in history text")
}
if len(full.Messages) != 1 || !strings.Contains(full.Messages[0].Content, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected persisted message to match upstream continuation prompt, got %#v", full.Messages)
}
}

View File

@@ -2,6 +2,7 @@ package claude
import (
"bytes"
"context"
"encoding/json"
"errors"
"fmt"
@@ -15,8 +16,10 @@ import (
"ds2api/internal/completionruntime"
"ds2api/internal/config"
claudefmt "ds2api/internal/format/claude"
"ds2api/internal/httpapi/openai/history"
"ds2api/internal/httpapi/requestbody"
"ds2api/internal/promptcompat"
"ds2api/internal/responsehistory"
streamengine "ds2api/internal/stream"
"ds2api/internal/translatorcliproxy"
"ds2api/internal/util"
@@ -79,38 +82,70 @@ func (h *Handler) handleClaudeDirect(w http.ResponseWriter, r *http.Request) boo
return true
}
defer h.Auth.Release(a)
if norm.Standard.Stream {
h.handleClaudeDirectStream(w, r, a, norm.Standard)
stdReq, err := h.applyCurrentInputFile(r.Context(), a, norm.Standard)
if err != nil {
status, message := mapCurrentInputFileError(err)
writeClaudeError(w, status, message)
return true
}
result, outErr := completionruntime.ExecuteNonStreamWithRetry(r.Context(), h.DS, a, norm.Standard, completionruntime.Options{
StripReferenceMarkers: stripReferenceMarkersEnabled(),
RetryEnabled: true,
CurrentInputFile: h.Store,
historySession := responsehistory.Start(responsehistory.StartParams{
Store: h.ChatHistory,
Request: r,
Auth: a,
Surface: "claude.messages",
Standard: stdReq,
})
if stdReq.Stream {
h.handleClaudeDirectStream(w, r, a, stdReq, historySession)
return true
}
result, outErr := completionruntime.ExecuteNonStreamWithRetry(r.Context(), h.DS, a, stdReq, completionruntime.Options{
RetryEnabled: true,
CurrentInputFile: h.Store,
})
if outErr != nil {
if historySession != nil {
historySession.ErrorTurn(outErr.Status, outErr.Message, outErr.Code, result.Turn)
}
writeClaudeError(w, outErr.Status, outErr.Message)
return true
}
if historySession != nil {
historySession.SuccessTurn(http.StatusOK, result.Turn, responsehistory.GenericUsage(result.Turn))
}
writeJSON(w, http.StatusOK, claudefmt.BuildMessageResponseFromTurn(
fmt.Sprintf("msg_%d", time.Now().UnixNano()),
norm.Standard.ResponseModel,
stdReq.ResponseModel,
result.Turn,
exposeThinking,
))
return true
}
func (h *Handler) handleClaudeDirectStream(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, stdReq promptcompat.StandardRequest) {
func (h *Handler) applyCurrentInputFile(ctx context.Context, a *auth.RequestAuth, stdReq promptcompat.StandardRequest) (promptcompat.StandardRequest, error) {
if h == nil {
return stdReq, nil
}
return (history.Service{Store: h.Store, DS: h.DS}).ApplyCurrentInputFile(ctx, a, stdReq)
}
func mapCurrentInputFileError(err error) (int, string) {
return history.MapError(err)
}
func (h *Handler) handleClaudeDirectStream(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, stdReq promptcompat.StandardRequest, historySession *responsehistory.Session) {
start, outErr := completionruntime.StartCompletion(r.Context(), h.DS, a, stdReq, completionruntime.Options{
CurrentInputFile: h.Store,
})
if outErr != nil {
if historySession != nil {
historySession.Error(outErr.Status, outErr.Message, outErr.Code, "", "")
}
writeClaudeError(w, outErr.Status, outErr.Message)
return
}
streamReq := start.Request
h.handleClaudeStreamRealtime(w, r, start.Response, streamReq.ResponseModel, streamReq.Messages, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw)
h.handleClaudeStreamRealtime(w, r, start.Response, streamReq.ResponseModel, streamReq.Messages, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, historySession)
}
func (h *Handler) proxyViaOpenAI(w http.ResponseWriter, r *http.Request, store ConfigReader) bool {
@@ -264,10 +299,17 @@ func stripClaudeThinkingBlocks(raw []byte) []byte {
return out
}
func (h *Handler) handleClaudeStreamRealtime(w http.ResponseWriter, r *http.Request, resp *http.Response, model string, messages []any, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any) {
func (h *Handler) handleClaudeStreamRealtime(w http.ResponseWriter, r *http.Request, resp *http.Response, model string, messages []any, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, historySessions ...*responsehistory.Session) {
var historySession *responsehistory.Session
if len(historySessions) > 0 {
historySession = historySessions[0]
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
if historySession != nil {
historySession.Error(resp.StatusCode, strings.TrimSpace(string(body)), "error", "", "")
}
writeClaudeError(w, http.StatusInternalServerError, string(body))
return
}
@@ -294,6 +336,7 @@ func (h *Handler) handleClaudeStreamRealtime(w http.ResponseWriter, r *http.Requ
toolNames,
toolsRaw,
buildClaudePromptTokenText(messages, thinkingEnabled),
historySession,
)
streamRuntime.sendMessageStart()

View File

@@ -6,6 +6,7 @@ import (
"github.com/go-chi/chi/v5"
"ds2api/internal/chathistory"
"ds2api/internal/config"
dsprotocol "ds2api/internal/deepseek/protocol"
"ds2api/internal/textclean"
@@ -16,10 +17,11 @@ import (
var writeJSON = util.WriteJSON
type Handler struct {
Store ConfigReader
Auth AuthResolver
DS DeepSeekCaller
OpenAI OpenAIChatRunner
Store ConfigReader
Auth AuthResolver
DS DeepSeekCaller
OpenAI OpenAIChatRunner
ChatHistory *chathistory.Store
}
func stripReferenceMarkersEnabled() bool {

View File

@@ -6,6 +6,7 @@ import (
"strings"
"time"
"ds2api/internal/responsehistory"
"ds2api/internal/sse"
streamengine "ds2api/internal/stream"
"ds2api/internal/toolcall"
@@ -46,6 +47,7 @@ type claudeStreamRuntime struct {
textEmitted bool
ended bool
upstreamErr string
history *responsehistory.Session
}
func newClaudeStreamRuntime(
@@ -60,6 +62,7 @@ func newClaudeStreamRuntime(
toolNames []string,
toolsRaw any,
promptTokenText string,
history *responsehistory.Session,
) *claudeStreamRuntime {
return &claudeStreamRuntime{
w: w,
@@ -74,6 +77,7 @@ func newClaudeStreamRuntime(
toolNames: toolNames,
toolsRaw: toolsRaw,
promptTokenText: promptTokenText,
history: history,
messageID: fmt.Sprintf("msg_%d", time.Now().UnixNano()),
thinkingBlockIndex: -1,
textBlockIndex: -1,
@@ -232,5 +236,11 @@ func (s *claudeStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Parse
}
}
if s.history != nil {
s.history.Progress(
responsehistory.ThinkingForArchive(s.rawThinking.String(), s.toolDetectionThinking.String(), s.thinking.String()),
responsehistory.TextForArchive(s.rawText.String(), s.text.String()),
)
}
return streamengine.ParsedDecision{ContentSeen: contentSeen}
}

View File

@@ -2,6 +2,7 @@ package claude
import (
"ds2api/internal/assistantturn"
"ds2api/internal/responsehistory"
"ds2api/internal/sse"
"ds2api/internal/toolcall"
"ds2api/internal/toolstream"
@@ -175,6 +176,15 @@ func (s *claudeStreamRuntime) finalize(stopReason string) {
if outcome.HasToolCalls {
stopReason = "tool_use"
}
if s.history != nil {
s.history.Success(
200,
responsehistory.ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking),
responsehistory.TextForArchive(turn.RawText, turn.Text),
stopReason,
responsehistory.GenericUsage(turn),
)
}
s.send("message_delta", map[string]any{
"type": "message_delta",
@@ -191,10 +201,16 @@ func (s *claudeStreamRuntime) finalize(stopReason string) {
func (s *claudeStreamRuntime) onFinalize(reason streamengine.StopReason, scannerErr error) {
if string(reason) == "upstream_error" {
if s.history != nil {
s.history.Error(500, s.upstreamErr, "upstream_error", responsehistory.ThinkingForArchive(s.rawThinking.String(), s.toolDetectionThinking.String(), s.thinking.String()), responsehistory.TextForArchive(s.rawText.String(), s.text.String()))
}
s.sendError(s.upstreamErr)
return
}
if scannerErr != nil {
if s.history != nil {
s.history.Error(500, scannerErr.Error(), "error", responsehistory.ThinkingForArchive(s.rawThinking.String(), s.toolDetectionThinking.String(), s.thinking.String()), responsehistory.TextForArchive(s.rawText.String(), s.text.String()))
}
s.sendError(scannerErr.Error())
return
}

View File

@@ -2,6 +2,7 @@ package gemini
import (
"bytes"
"context"
"encoding/json"
"errors"
"io"
@@ -14,8 +15,10 @@ import (
"ds2api/internal/assistantturn"
"ds2api/internal/auth"
"ds2api/internal/completionruntime"
"ds2api/internal/httpapi/openai/history"
"ds2api/internal/httpapi/requestbody"
"ds2api/internal/promptcompat"
"ds2api/internal/responsehistory"
"ds2api/internal/sse"
"ds2api/internal/toolcall"
"ds2api/internal/translatorcliproxy"
@@ -76,33 +79,65 @@ func (h *Handler) handleGeminiDirect(w http.ResponseWriter, r *http.Request, str
return true
}
defer h.Auth.Release(a)
stdReq, err = h.applyCurrentInputFile(r.Context(), a, stdReq)
if err != nil {
status, message := mapCurrentInputFileError(err)
writeGeminiError(w, status, message)
return true
}
historySession := responsehistory.Start(responsehistory.StartParams{
Store: h.ChatHistory,
Request: r,
Auth: a,
Surface: "gemini.generate_content",
Standard: stdReq,
})
if stream {
h.handleGeminiDirectStream(w, r, a, stdReq)
h.handleGeminiDirectStream(w, r, a, stdReq, historySession)
return true
}
result, outErr := completionruntime.ExecuteNonStreamWithRetry(r.Context(), h.DS, a, stdReq, completionruntime.Options{
StripReferenceMarkers: stripReferenceMarkersEnabled(),
RetryEnabled: true,
CurrentInputFile: h.Store,
RetryEnabled: true,
CurrentInputFile: h.Store,
})
if outErr != nil {
if historySession != nil {
historySession.ErrorTurn(outErr.Status, outErr.Message, outErr.Code, result.Turn)
}
writeGeminiError(w, outErr.Status, outErr.Message)
return true
}
if historySession != nil {
historySession.SuccessTurn(http.StatusOK, result.Turn, responsehistory.GenericUsage(result.Turn))
}
writeJSON(w, http.StatusOK, buildGeminiGenerateContentResponseFromTurn(result.Turn))
return true
}
func (h *Handler) handleGeminiDirectStream(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, stdReq promptcompat.StandardRequest) {
func (h *Handler) applyCurrentInputFile(ctx context.Context, a *auth.RequestAuth, stdReq promptcompat.StandardRequest) (promptcompat.StandardRequest, error) {
if h == nil {
return stdReq, nil
}
return (history.Service{Store: h.Store, DS: h.DS}).ApplyCurrentInputFile(ctx, a, stdReq)
}
func mapCurrentInputFileError(err error) (int, string) {
return history.MapError(err)
}
func (h *Handler) handleGeminiDirectStream(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, stdReq promptcompat.StandardRequest, historySession *responsehistory.Session) {
start, outErr := completionruntime.StartCompletion(r.Context(), h.DS, a, stdReq, completionruntime.Options{
CurrentInputFile: h.Store,
})
if outErr != nil {
if historySession != nil {
historySession.Error(outErr.Status, outErr.Message, outErr.Code, "", "")
}
writeGeminiError(w, outErr.Status, outErr.Message)
return
}
streamReq := start.Request
h.handleStreamGenerateContent(w, r, start.Response, streamReq.ResponseModel, streamReq.PromptTokenText, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw)
h.handleStreamGenerateContent(w, r, start.Response, streamReq.ResponseModel, streamReq.PromptTokenText, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, historySession)
}
func (h *Handler) proxyViaOpenAI(w http.ResponseWriter, r *http.Request, stream bool) bool {
@@ -294,12 +329,11 @@ func (h *Handler) handleNonStreamGenerateContent(w http.ResponseWriter, resp *ht
}
result := sse.CollectStream(resp, thinkingEnabled, true)
stripReferenceMarkers := stripReferenceMarkersEnabled()
writeJSON(w, http.StatusOK, buildGeminiGenerateContentResponse(
model,
finalPrompt,
cleanVisibleOutput(result.Thinking, stripReferenceMarkers),
cleanVisibleOutput(result.Text, stripReferenceMarkers),
cleanVisibleOutput(result.Thinking, false),
cleanVisibleOutput(result.Text, false),
toolNames,
))
}

View File

@@ -5,6 +5,7 @@ import (
"github.com/go-chi/chi/v5"
"ds2api/internal/chathistory"
"ds2api/internal/textclean"
"ds2api/internal/util"
)
@@ -12,10 +13,11 @@ import (
var writeJSON = util.WriteJSON
type Handler struct {
Store ConfigReader
Auth AuthResolver
DS DeepSeekCaller
OpenAI OpenAIChatRunner
Store ConfigReader
Auth AuthResolver
DS DeepSeekCaller
OpenAI OpenAIChatRunner
ChatHistory *chathistory.Store
}
//nolint:unused // used by native Gemini stream/non-stream runtime helpers.

View File

@@ -9,15 +9,23 @@ import (
"ds2api/internal/assistantturn"
dsprotocol "ds2api/internal/deepseek/protocol"
"ds2api/internal/responsehistory"
"ds2api/internal/sse"
streamengine "ds2api/internal/stream"
)
//nolint:unused // retained for native Gemini stream handling path.
func (h *Handler) handleStreamGenerateContent(w http.ResponseWriter, r *http.Request, resp *http.Response, model, finalPrompt string, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any) {
func (h *Handler) handleStreamGenerateContent(w http.ResponseWriter, r *http.Request, resp *http.Response, model, finalPrompt string, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, historySessions ...*responsehistory.Session) {
var historySession *responsehistory.Session
if len(historySessions) > 0 {
historySession = historySessions[0]
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
if historySession != nil {
historySession.Error(resp.StatusCode, strings.TrimSpace(string(body)), "error", "", "")
}
writeGeminiError(w, resp.StatusCode, strings.TrimSpace(string(body)))
return
}
@@ -29,7 +37,7 @@ func (h *Handler) handleStreamGenerateContent(w http.ResponseWriter, r *http.Req
rc := http.NewResponseController(w)
_, canFlush := w.(http.Flusher)
runtime := newGeminiStreamRuntime(w, rc, canFlush, model, finalPrompt, thinkingEnabled, searchEnabled, stripReferenceMarkersEnabled(), toolNames, toolsRaw)
runtime := newGeminiStreamRuntime(w, rc, canFlush, model, finalPrompt, thinkingEnabled, searchEnabled, stripReferenceMarkersEnabled(), toolNames, toolsRaw, historySession)
initialType := "text"
if thinkingEnabled {
@@ -70,6 +78,7 @@ type geminiStreamRuntime struct {
accumulator *assistantturn.Accumulator
contentFilter bool
responseMessageID int
history *responsehistory.Session
}
//nolint:unused // retained for native Gemini stream handling path.
@@ -84,6 +93,7 @@ func newGeminiStreamRuntime(
stripReferenceMarkers bool,
toolNames []string,
toolsRaw any,
history *responsehistory.Session,
) *geminiStreamRuntime {
return &geminiStreamRuntime{
w: w,
@@ -97,6 +107,7 @@ func newGeminiStreamRuntime(
stripReferenceMarkers: stripReferenceMarkers,
toolNames: toolNames,
toolsRaw: toolsRaw,
history: history,
accumulator: assistantturn.NewAccumulator(assistantturn.AccumulatorOptions{
ThinkingEnabled: thinkingEnabled,
SearchEnabled: searchEnabled,
@@ -170,6 +181,13 @@ func (s *geminiStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Parse
"modelVersion": s.model,
})
}
if s.history != nil {
rawText, text, rawThinking, thinking, detectionThinking := s.accumulator.Snapshot()
s.history.Progress(
responsehistory.ThinkingForArchive(rawThinking, detectionThinking, thinking),
responsehistory.TextForArchive(rawText, text),
)
}
return streamengine.ParsedDecision{ContentSeen: accumulated.ContentSeen}
}
@@ -193,6 +211,15 @@ func (s *geminiStreamRuntime) finalize() {
ToolsRaw: s.toolsRaw,
})
outcome := assistantturn.FinalizeTurn(turn, assistantturn.FinalizeOptions{})
if s.history != nil {
s.history.Success(
http.StatusOK,
responsehistory.ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking),
responsehistory.TextForArchive(turn.RawText, turn.Text),
assistantturn.FinishReason(turn),
responsehistory.GenericUsage(turn),
)
}
if s.bufferContent {
parts := buildGeminiPartsFromTurn(turn)

View File

@@ -7,12 +7,14 @@ import (
"io"
"net/http"
"net/http/httptest"
"path/filepath"
"strings"
"testing"
"github.com/go-chi/chi/v5"
"ds2api/internal/auth"
"ds2api/internal/chathistory"
dsclient "ds2api/internal/deepseek/client"
)
@@ -138,10 +140,12 @@ func TestGeminiDirectAppliesCurrentInputFile(t *testing.T) {
ds := &testGeminiDS{
resp: makeGeminiUpstreamResponse(`data: {"p":"response/content","v":"ok"}`),
}
historyStore := chathistory.New(filepath.Join(t.TempDir(), "history.json"))
h := &Handler{
Store: testGeminiConfig{},
Auth: testGeminiAuth{},
DS: ds,
Store: testGeminiConfig{},
Auth: testGeminiAuth{},
DS: ds,
ChatHistory: historyStore,
}
reqBody := `{"contents":[{"role":"user","parts":[{"text":"hello from gemini"}]}]}`
req := httptest.NewRequest(http.MethodPost, "/v1beta/models/gemini-2.5-pro:generateContent", strings.NewReader(reqBody))
@@ -172,6 +176,29 @@ func TestGeminiDirectAppliesCurrentInputFile(t *testing.T) {
if !strings.Contains(prompt, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected continuation prompt, got %q", prompt)
}
snapshot, err := historyStore.Snapshot()
if err != nil {
t.Fatalf("snapshot history: %v", err)
}
if len(snapshot.Items) != 1 {
t.Fatalf("expected one history item, got %d", len(snapshot.Items))
}
full, err := historyStore.Get(snapshot.Items[0].ID)
if err != nil {
t.Fatalf("get history item: %v", err)
}
if full.Surface != "gemini.generate_content" {
t.Fatalf("unexpected surface: %q", full.Surface)
}
if full.Content != "ok" {
t.Fatalf("expected raw upstream content, got %q", full.Content)
}
if full.HistoryText != string(ds.uploadCalls[0].Data) {
t.Fatalf("expected uploaded current input file to be persisted in history text")
}
if len(full.Messages) != 1 || !strings.Contains(full.Messages[0].Content, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("expected persisted message to match upstream continuation prompt, got %#v", full.Messages)
}
}
func TestGeminiRoutesRegistered(t *testing.T) {

View File

@@ -14,9 +14,6 @@ import (
"ds2api/internal/promptcompat"
)
const adminWebUISourceHeader = "X-Ds2-Source"
const adminWebUISourceValue = "admin-webui-api-tester"
type chatHistorySession struct {
store *chathistory.Store
entryID string
@@ -40,6 +37,7 @@ func startChatHistory(store *chathistory.Store, r *http.Request, a *auth.Request
entry, err := store.Start(chathistory.StartParams{
CallerID: strings.TrimSpace(a.CallerID),
AccountID: strings.TrimSpace(a.AccountID),
Surface: "openai.chat_completions",
Model: strings.TrimSpace(stdReq.ResponseModel),
Stream: stdReq.Stream,
UserInput: extractSingleUserInput(stdReq.Messages),
@@ -50,6 +48,7 @@ func startChatHistory(store *chathistory.Store, r *http.Request, a *auth.Request
startParams := chathistory.StartParams{
CallerID: strings.TrimSpace(a.CallerID),
AccountID: strings.TrimSpace(a.AccountID),
Surface: "openai.chat_completions",
Model: strings.TrimSpace(stdReq.ResponseModel),
Stream: stdReq.Stream,
UserInput: extractSingleUserInput(stdReq.Messages),
@@ -82,7 +81,7 @@ func shouldCaptureChatHistory(r *http.Request) bool {
if isVercelStreamPrepareRequest(r) || isVercelStreamReleaseRequest(r) {
return false
}
return strings.TrimSpace(r.Header.Get(adminWebUISourceHeader)) != adminWebUISourceValue
return true
}
func extractSingleUserInput(messages []any) string {
@@ -188,6 +187,23 @@ func (s *chatHistorySession) stopped(thinking, content, finishReason string) {
})
}
func historyTextForArchive(raw, visible string) string {
if strings.TrimSpace(raw) != "" {
return raw
}
return visible
}
func historyThinkingForArchive(raw, detection, visible string) string {
if strings.TrimSpace(raw) != "" {
return raw
}
if strings.TrimSpace(detection) != "" {
return detection
}
return visible
}
func (s *chatHistorySession) retryMissingEntry() bool {
if s == nil || s.store == nil || s.disabled {
return false

View File

@@ -6,6 +6,7 @@ import (
"net/http/httptest"
"os"
"path/filepath"
"strconv"
"strings"
"sync"
"testing"
@@ -102,6 +103,86 @@ func TestChatCompletionsNonStreamPersistsHistory(t *testing.T) {
}
}
func TestChatHistoryNonStreamArchivesRawToolCallMarkup(t *testing.T) {
historyStore := newTestChatHistoryStore(t)
entry, err := historyStore.Start(chathistory.StartParams{
CallerID: "caller:test",
Model: "deepseek-v4-flash",
UserInput: "call tool",
})
if err != nil {
t.Fatalf("start history failed: %v", err)
}
session := &chatHistorySession{
store: historyStore,
entryID: entry.ID,
startedAt: time.Now(),
lastPersist: time.Now().Add(-time.Second),
finalPrompt: "call tool",
}
rawToolCall := `<tool_calls><invoke name="search"><parameter name="q">golang</parameter></invoke></tool_calls>`
h := &Handler{}
rec := httptest.NewRecorder()
resp := makeOpenAISSEHTTPResponse(`data: {"p":"response/content","v":`+strconv.Quote(rawToolCall)+`}`, `data: [DONE]`)
h.handleNonStream(rec, resp, "cid-tool-history", "deepseek-v4-flash", "prompt", 0, false, false, []string{"search"}, nil, session)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
full, err := historyStore.Get(entry.ID)
if err != nil {
t.Fatalf("get detail failed: %v", err)
}
if full.Content != rawToolCall {
t.Fatalf("expected raw tool markup archived, got %q", full.Content)
}
if full.FinishReason != "tool_calls" {
t.Fatalf("expected tool_calls finish reason, got %#v", full.FinishReason)
}
}
func TestChatHistoryStreamArchivesRawToolCallMarkup(t *testing.T) {
historyStore := newTestChatHistoryStore(t)
entry, err := historyStore.Start(chathistory.StartParams{
CallerID: "caller:test",
Model: "deepseek-v4-flash",
Stream: true,
UserInput: "call tool",
})
if err != nil {
t.Fatalf("start history failed: %v", err)
}
session := &chatHistorySession{
store: historyStore,
entryID: entry.ID,
startedAt: time.Now(),
lastPersist: time.Now().Add(-time.Second),
finalPrompt: "call tool",
}
rawToolCall := `<tool_calls><invoke name="search"><parameter name="q">golang</parameter></invoke></tool_calls>`
h := &Handler{}
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
rec := httptest.NewRecorder()
resp := makeOpenAISSEHTTPResponse(`data: {"p":"response/content","v":`+strconv.Quote(rawToolCall)+`}`, `data: [DONE]`)
h.handleStream(rec, req, resp, "cid-stream-tool-history", "deepseek-v4-flash", "prompt", 0, false, false, []string{"search"}, nil, session)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
full, err := historyStore.Get(entry.ID)
if err != nil {
t.Fatalf("get detail failed: %v", err)
}
if full.Content != rawToolCall {
t.Fatalf("expected raw streamed tool markup archived, got %q", full.Content)
}
if full.FinishReason != "tool_calls" {
t.Fatalf("expected tool_calls finish reason, got %#v", full.FinishReason)
}
}
func TestStartChatHistoryRecoversFromTransientWriteFailure(t *testing.T) {
historyStore := newTestChatHistoryStore(t)
restore := blockChatHistoryDetailDir(t, historyStore.DetailDir())
@@ -213,7 +294,7 @@ func TestHandleStreamContextCancelledMarksHistoryStopped(t *testing.T) {
}
}
func TestChatCompletionsSkipsAdminWebUISource(t *testing.T) {
func TestChatCompletionsRecordsAdminWebUISource(t *testing.T) {
historyStore := newTestChatHistoryStore(t)
h := &Handler{
Store: mockOpenAIConfig{},
@@ -226,7 +307,7 @@ func TestChatCompletionsSkipsAdminWebUISource(t *testing.T) {
req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", strings.NewReader(reqBody))
req.Header.Set("Authorization", "Bearer direct-token")
req.Header.Set("Content-Type", "application/json")
req.Header.Set(adminWebUISourceHeader, adminWebUISourceValue)
req.Header.Set("X-Ds2-Source", "admin-webui-api-tester")
rec := httptest.NewRecorder()
h.ChatCompletions(rec, req)
@@ -237,8 +318,8 @@ func TestChatCompletionsSkipsAdminWebUISource(t *testing.T) {
if err != nil {
t.Fatalf("snapshot failed: %v", err)
}
if len(snapshot.Items) != 0 {
t.Fatalf("expected admin webui source to be skipped, got %#v", snapshot.Items)
if len(snapshot.Items) != 1 {
t.Fatalf("expected admin webui source to be recorded, got %#v", snapshot.Items)
}
}

View File

@@ -195,6 +195,24 @@ func (s *chatStreamRuntime) markContextCancelled() {
s.finalFinishReason = string(streamengine.StopReasonContextCancelled)
}
func (s *chatStreamRuntime) historyText() string {
if s == nil {
return ""
}
return historyTextForArchive(s.accumulator.RawText.String(), s.finalText)
}
func (s *chatStreamRuntime) historyThinking() string {
if s == nil {
return ""
}
return historyThinkingForArchive(
s.accumulator.RawThinking.String(),
s.accumulator.ToolDetectionThinking.String(),
s.finalThinking,
)
}
func (s *chatStreamRuntime) resetStreamToolCallState() {
s.streamToolCallIDs = map[int]string{}
s.streamToolNames = map[int]string{}

View File

@@ -31,6 +31,14 @@ type chatNonStreamResult struct {
outputError *assistantturn.OutputError
}
func (r chatNonStreamResult) historyText() string {
return historyTextForArchive(r.rawText, r.text)
}
func (r chatNonStreamResult) historyThinking() string {
return historyThinkingForArchive(r.rawThinking, r.toolDetectionThinking, r.thinking)
}
func (h *Handler) handleNonStreamWithRetry(w http.ResponseWriter, ctx context.Context, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, completionID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, historySession *chatHistorySession) {
attempts := 0
currentResp := resp
@@ -70,7 +78,7 @@ func (h *Handler) handleNonStreamWithRetry(w http.ResponseWriter, ctx context.Co
nextResp, err := h.DS.CallCompletion(ctx, a, retryPayload, retryPow, 3)
if err != nil {
if historySession != nil {
historySession.error(http.StatusInternalServerError, "Failed to get completion.", "error", result.thinking, result.text)
historySession.error(http.StatusInternalServerError, "Failed to get completion.", "error", result.historyThinking(), result.historyText())
}
writeOpenAIError(w, http.StatusInternalServerError, "Failed to get completion.")
config.Logger.Warn("[openai_empty_retry] retry request failed", "surface", "chat.completions", "stream", false, "retry_attempt", attempts, "error", err)
@@ -90,12 +98,11 @@ func (h *Handler) collectChatNonStreamAttempt(w http.ResponseWriter, resp *http.
}
result := sse.CollectStream(resp, thinkingEnabled, true)
turn := assistantturn.BuildTurnFromCollected(result, assistantturn.BuildOptions{
Model: model,
Prompt: usagePrompt,
SearchEnabled: searchEnabled,
StripReferenceMarkers: stripReferenceMarkersEnabled(),
ToolNames: toolNames,
ToolsRaw: toolsRaw,
Model: model,
Prompt: usagePrompt,
SearchEnabled: searchEnabled,
ToolNames: toolNames,
ToolsRaw: toolsRaw,
})
respBody := openaifmt.BuildChatCompletionWithToolCalls(completionID, model, usagePrompt, turn.Thinking, turn.Text, turn.ToolCalls, toolsRaw)
return chatNonStreamResult{
@@ -120,14 +127,14 @@ func (h *Handler) finishChatNonStreamResult(w http.ResponseWriter, result chatNo
status, message, code = result.outputError.Status, result.outputError.Message, result.outputError.Code
}
if historySession != nil {
historySession.error(status, message, code, result.thinking, result.text)
historySession.error(status, message, code, result.historyThinking(), result.historyText())
}
writeOpenAIErrorWithCode(w, status, message, code)
config.Logger.Info("[openai_empty_retry] terminal empty output", "surface", "chat.completions", "stream", false, "retry_attempts", attempts, "success_source", "none", "content_filter", result.contentFilter)
return
}
if historySession != nil {
historySession.success(http.StatusOK, result.thinking, result.text, result.finishReason, openaifmt.BuildChatUsageForModel("", usagePrompt, result.thinking, result.text, refFileTokens))
historySession.success(http.StatusOK, result.historyThinking(), result.historyText(), result.finishReason, openaifmt.BuildChatUsageForModel("", usagePrompt, result.thinking, result.text, refFileTokens))
}
writeJSON(w, http.StatusOK, result.body)
source := "first_attempt"
@@ -246,7 +253,7 @@ func (h *Handler) consumeChatStreamAttempt(r *http.Request, resp *http.Response,
OnParsed: func(parsed sse.LineResult) streamengine.ParsedDecision {
decision := streamRuntime.onParsed(parsed)
if historySession != nil {
historySession.progress(streamRuntime.accumulator.Thinking.String(), streamRuntime.accumulator.Text.String())
historySession.progress(streamRuntime.historyThinking(), streamRuntime.historyText())
}
return decision
},
@@ -258,7 +265,7 @@ func (h *Handler) consumeChatStreamAttempt(r *http.Request, resp *http.Response,
OnContextDone: func() {
streamRuntime.markContextCancelled()
if historySession != nil {
historySession.stopped(streamRuntime.accumulator.Thinking.String(), streamRuntime.accumulator.Text.String(), string(streamengine.StopReasonContextCancelled))
historySession.stopped(streamRuntime.historyThinking(), streamRuntime.historyText(), string(streamengine.StopReasonContextCancelled))
}
},
})
@@ -278,16 +285,16 @@ func recordChatStreamHistory(streamRuntime *chatStreamRuntime, historySession *c
return
}
if streamRuntime.finalErrorMessage != "" {
historySession.error(streamRuntime.finalErrorStatus, streamRuntime.finalErrorMessage, streamRuntime.finalErrorCode, streamRuntime.accumulator.Thinking.String(), streamRuntime.accumulator.Text.String())
historySession.error(streamRuntime.finalErrorStatus, streamRuntime.finalErrorMessage, streamRuntime.finalErrorCode, streamRuntime.historyThinking(), streamRuntime.historyText())
return
}
historySession.success(http.StatusOK, streamRuntime.finalThinking, streamRuntime.finalText, streamRuntime.finalFinishReason, streamRuntime.finalUsage)
historySession.success(http.StatusOK, streamRuntime.historyThinking(), streamRuntime.historyText(), streamRuntime.finalFinishReason, streamRuntime.finalUsage)
}
func failChatStreamRetry(streamRuntime *chatStreamRuntime, historySession *chatHistorySession, status int, message, code string) {
streamRuntime.sendFailedChunk(status, message, code)
if historySession != nil {
historySession.error(status, message, code, streamRuntime.accumulator.Thinking.String(), streamRuntime.accumulator.Text.String())
historySession.error(status, message, code, streamRuntime.historyThinking(), streamRuntime.historyText())
}
}

View File

@@ -80,14 +80,13 @@ func (h *Handler) ChatCompletions(w http.ResponseWriter, r *http.Request) {
if !stdReq.Stream {
result, outErr := completionruntime.ExecuteNonStreamWithRetry(r.Context(), h.DS, a, stdReq, completionruntime.Options{
StripReferenceMarkers: stripReferenceMarkersEnabled(),
RetryEnabled: true,
CurrentInputFile: h.Store,
RetryEnabled: true,
CurrentInputFile: h.Store,
})
sessionID = result.SessionID
if outErr != nil {
if historySession != nil {
historySession.error(outErr.Status, outErr.Message, outErr.Code, result.Turn.Thinking, result.Turn.Text)
historySession.error(outErr.Status, outErr.Message, outErr.Code, historyThinkingForArchive(result.Turn.RawThinking, result.Turn.DetectionThinking, result.Turn.Thinking), historyTextForArchive(result.Turn.RawText, result.Turn.Text))
}
writeOpenAIErrorWithCode(w, outErr.Status, outErr.Message, outErr.Code)
return
@@ -96,7 +95,7 @@ func (h *Handler) ChatCompletions(w http.ResponseWriter, r *http.Request) {
respBody["usage"] = assistantturn.OpenAIChatUsage(result.Turn)
finishReason := assistantturn.FinalizeTurn(result.Turn, assistantturn.FinalizeOptions{}).FinishReason
if historySession != nil {
historySession.success(http.StatusOK, result.Turn.Thinking, result.Turn.Text, finishReason, assistantturn.OpenAIChatUsage(result.Turn))
historySession.success(http.StatusOK, historyThinkingForArchive(result.Turn.RawThinking, result.Turn.DetectionThinking, result.Turn.Thinking), historyTextForArchive(result.Turn.RawText, result.Turn.Text), finishReason, assistantturn.OpenAIChatUsage(result.Turn))
}
writeJSON(w, http.StatusOK, respBody)
return
@@ -164,20 +163,19 @@ func (h *Handler) handleNonStream(w http.ResponseWriter, resp *http.Response, co
result := sse.CollectStream(resp, thinkingEnabled, true)
turn := assistantturn.BuildTurnFromCollected(result, assistantturn.BuildOptions{
Model: model,
Prompt: finalPrompt,
RefFileTokens: refFileTokens,
SearchEnabled: searchEnabled,
StripReferenceMarkers: stripReferenceMarkersEnabled(),
ToolNames: toolNames,
ToolsRaw: toolsRaw,
ToolChoice: promptcompat.DefaultToolChoicePolicy(),
Model: model,
Prompt: finalPrompt,
RefFileTokens: refFileTokens,
SearchEnabled: searchEnabled,
ToolNames: toolNames,
ToolsRaw: toolsRaw,
ToolChoice: promptcompat.DefaultToolChoicePolicy(),
})
outcome := assistantturn.FinalizeTurn(turn, assistantturn.FinalizeOptions{})
if outcome.ShouldFail {
status, message, code := outcome.Error.Status, outcome.Error.Message, outcome.Error.Code
if historySession != nil {
historySession.error(status, message, code, turn.Thinking, turn.Text)
historySession.error(status, message, code, historyThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking), historyTextForArchive(turn.RawText, turn.Text))
}
writeOpenAIErrorWithCode(w, status, message, code)
return
@@ -185,7 +183,7 @@ func (h *Handler) handleNonStream(w http.ResponseWriter, resp *http.Response, co
respBody := openaifmt.BuildChatCompletionWithToolCalls(completionID, model, finalPrompt, turn.Thinking, turn.Text, turn.ToolCalls, toolsRaw)
respBody["usage"] = assistantturn.OpenAIChatUsage(turn)
if historySession != nil {
historySession.success(http.StatusOK, turn.Thinking, turn.Text, outcome.FinishReason, assistantturn.OpenAIChatUsage(turn))
historySession.success(http.StatusOK, historyThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking), historyTextForArchive(turn.RawText, turn.Text), outcome.FinishReason, assistantturn.OpenAIChatUsage(turn))
}
writeJSON(w, http.StatusOK, respBody)
}
@@ -253,7 +251,7 @@ func (h *Handler) handleStream(w http.ResponseWriter, r *http.Request, resp *htt
OnParsed: func(parsed sse.LineResult) streamengine.ParsedDecision {
decision := streamRuntime.onParsed(parsed)
if historySession != nil {
historySession.progress(streamRuntime.accumulator.Thinking.String(), streamRuntime.accumulator.Text.String())
historySession.progress(streamRuntime.historyThinking(), streamRuntime.historyText())
}
return decision
},
@@ -267,14 +265,15 @@ func (h *Handler) handleStream(w http.ResponseWriter, r *http.Request, resp *htt
return
}
if streamRuntime.finalErrorMessage != "" {
historySession.error(streamRuntime.finalErrorStatus, streamRuntime.finalErrorMessage, streamRuntime.finalErrorCode, streamRuntime.accumulator.Thinking.String(), streamRuntime.accumulator.Text.String())
historySession.error(streamRuntime.finalErrorStatus, streamRuntime.finalErrorMessage, streamRuntime.finalErrorCode, streamRuntime.historyThinking(), streamRuntime.historyText())
return
}
historySession.success(http.StatusOK, streamRuntime.finalThinking, streamRuntime.finalText, streamRuntime.finalFinishReason, streamRuntime.finalUsage)
historySession.success(http.StatusOK, streamRuntime.historyThinking(), streamRuntime.historyText(), streamRuntime.finalFinishReason, streamRuntime.finalUsage)
},
OnContextDone: func() {
streamRuntime.markContextCancelled()
if historySession != nil {
historySession.stopped(streamRuntime.accumulator.Thinking.String(), streamRuntime.accumulator.Text.String(), string(streamengine.StopReasonContextCancelled))
historySession.stopped(streamRuntime.historyThinking(), streamRuntime.historyText(), string(streamengine.StopReasonContextCancelled))
}
},
})

View File

@@ -10,11 +10,12 @@ import (
"ds2api/internal/config"
dsprotocol "ds2api/internal/deepseek/protocol"
"ds2api/internal/promptcompat"
"ds2api/internal/responsehistory"
streamengine "ds2api/internal/stream"
)
func (h *Handler) handleResponsesStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string) {
streamRuntime, initialType, ok := h.prepareResponsesStreamRuntime(w, resp, owner, responseID, model, finalPrompt, refFileTokens, thinkingEnabled, searchEnabled, toolNames, toolsRaw, toolChoice, traceID)
func (h *Handler) handleResponsesStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string, historySession *responsehistory.Session) {
streamRuntime, initialType, ok := h.prepareResponsesStreamRuntime(w, resp, owner, responseID, model, finalPrompt, refFileTokens, thinkingEnabled, searchEnabled, toolNames, toolsRaw, toolChoice, traceID, historySession)
if !ok {
return
}
@@ -55,10 +56,13 @@ func (h *Handler) handleResponsesStreamWithRetry(w http.ResponseWriter, r *http.
}
}
func (h *Handler) prepareResponsesStreamRuntime(w http.ResponseWriter, resp *http.Response, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string) (*responsesStreamRuntime, string, bool) {
func (h *Handler) prepareResponsesStreamRuntime(w http.ResponseWriter, resp *http.Response, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string, historySession *responsehistory.Session) (*responsesStreamRuntime, string, bool) {
if resp.StatusCode != http.StatusOK {
defer func() { _ = resp.Body.Close() }()
body, _ := io.ReadAll(resp.Body)
if historySession != nil {
historySession.Error(resp.StatusCode, strings.TrimSpace(string(body)), "error", "", "")
}
writeOpenAIError(w, resp.StatusCode, strings.TrimSpace(string(body)))
return nil, "", false
}
@@ -78,7 +82,7 @@ func (h *Handler) prepareResponsesStreamRuntime(w http.ResponseWriter, resp *htt
h.toolcallFeatureMatchEnabled() && h.toolcallEarlyEmitHighConfidence(),
toolChoice, traceID, func(obj map[string]any) {
h.getResponseStore().put(owner, responseID, obj)
},
}, historySession,
)
streamRuntime.refFileTokens = refFileTokens
streamRuntime.sendCreated()

View File

@@ -47,6 +47,7 @@ func TestConsumeResponsesStreamAttemptMarksContextCancelledState(t *testing.T) {
promptcompat.DefaultToolChoicePolicy(),
"",
nil,
nil,
)
resp := makeResponsesOpenAISSEHTTPResponse(
`data: {"p":"response/content","v":"hello"}`,

View File

@@ -18,6 +18,7 @@ import (
dsprotocol "ds2api/internal/deepseek/protocol"
openaifmt "ds2api/internal/format/openai"
"ds2api/internal/promptcompat"
"ds2api/internal/responsehistory"
"ds2api/internal/sse"
streamengine "ds2api/internal/stream"
)
@@ -95,16 +96,28 @@ func (h *Handler) Responses(w http.ResponseWriter, r *http.Request) {
}
responseID := "resp_" + strings.ReplaceAll(uuid.NewString(), "-", "")
historySession := responsehistory.Start(responsehistory.StartParams{
Store: h.ChatHistory,
Request: r,
Auth: a,
Surface: "openai.responses",
Standard: stdReq,
})
if !stdReq.Stream {
result, outErr := completionruntime.ExecuteNonStreamWithRetry(r.Context(), h.DS, a, stdReq, completionruntime.Options{
StripReferenceMarkers: stripReferenceMarkersEnabled(),
RetryEnabled: true,
CurrentInputFile: h.Store,
RetryEnabled: true,
CurrentInputFile: h.Store,
})
if outErr != nil {
if historySession != nil {
historySession.ErrorTurn(outErr.Status, outErr.Message, outErr.Code, result.Turn)
}
writeOpenAIErrorWithCode(w, outErr.Status, outErr.Message, outErr.Code)
return
}
if historySession != nil {
historySession.SuccessTurn(http.StatusOK, result.Turn, assistantturn.OpenAIResponsesUsage(result.Turn))
}
responseObj := openaifmt.BuildResponseObjectWithToolCalls(responseID, stdReq.ResponseModel, result.Turn.Prompt, result.Turn.Thinking, result.Turn.Text, result.Turn.ToolCalls, stdReq.ToolsRaw)
responseObj["usage"] = assistantturn.OpenAIResponsesUsage(result.Turn)
h.getResponseStore().put(owner, responseID, responseObj)
@@ -116,13 +129,16 @@ func (h *Handler) Responses(w http.ResponseWriter, r *http.Request) {
CurrentInputFile: h.Store,
})
if outErr != nil {
if historySession != nil {
historySession.Error(outErr.Status, outErr.Message, outErr.Code, "", "")
}
writeOpenAIErrorWithCode(w, outErr.Status, outErr.Message, outErr.Code)
return
}
streamReq := start.Request
refFileTokens := streamReq.RefFileTokens
h.handleResponsesStreamWithRetry(w, r, a, start.Response, start.Payload, start.Pow, owner, responseID, streamReq.ResponseModel, streamReq.PromptTokenText, refFileTokens, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.ToolChoice, traceID)
h.handleResponsesStreamWithRetry(w, r, a, start.Response, start.Payload, start.Pow, owner, responseID, streamReq.ResponseModel, streamReq.PromptTokenText, refFileTokens, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.ToolChoice, traceID, historySession)
}
func (h *Handler) handleResponsesNonStream(w http.ResponseWriter, resp *http.Response, owner, responseID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, traceID string) {
@@ -135,14 +151,13 @@ func (h *Handler) handleResponsesNonStream(w http.ResponseWriter, resp *http.Res
result := sse.CollectStream(resp, thinkingEnabled, true)
turn := assistantturn.BuildTurnFromCollected(result, assistantturn.BuildOptions{
Model: model,
Prompt: finalPrompt,
RefFileTokens: refFileTokens,
SearchEnabled: searchEnabled,
StripReferenceMarkers: stripReferenceMarkersEnabled(),
ToolNames: toolNames,
ToolsRaw: toolsRaw,
ToolChoice: toolChoice,
Model: model,
Prompt: finalPrompt,
RefFileTokens: refFileTokens,
SearchEnabled: searchEnabled,
ToolNames: toolNames,
ToolsRaw: toolsRaw,
ToolChoice: toolChoice,
})
logResponsesToolPolicyRejection(traceID, toolChoice, turn.ParsedToolCalls, "text")
outcome := assistantturn.FinalizeTurn(turn, assistantturn.FinalizeOptions{})
@@ -198,6 +213,7 @@ func (h *Handler) handleResponsesStream(w http.ResponseWriter, r *http.Request,
func(obj map[string]any) {
h.getResponseStore().put(owner, responseID, obj)
},
nil,
)
streamRuntime.refFileTokens = refFileTokens
streamRuntime.sendCreated()

View File

@@ -0,0 +1,100 @@
package responses
import (
"context"
"io"
"net/http"
"net/http/httptest"
"path/filepath"
"strings"
"testing"
"github.com/go-chi/chi/v5"
"ds2api/internal/auth"
"ds2api/internal/chathistory"
dsclient "ds2api/internal/deepseek/client"
)
type responsesHistoryDS struct {
payload map[string]any
}
func (d *responsesHistoryDS) CreateSession(context.Context, *auth.RequestAuth, int) (string, error) {
return "session-id", nil
}
func (d *responsesHistoryDS) GetPow(context.Context, *auth.RequestAuth, int) (string, error) {
return "pow", nil
}
func (d *responsesHistoryDS) UploadFile(context.Context, *auth.RequestAuth, dsclient.UploadFileRequest, int) (*dsclient.UploadFileResult, error) {
return &dsclient.UploadFileResult{ID: "file-id"}, nil
}
func (d *responsesHistoryDS) CallCompletion(_ context.Context, _ *auth.RequestAuth, payload map[string]any, _ string, _ int) (*http.Response, error) {
d.payload = payload
return &http.Response{
StatusCode: http.StatusOK,
Header: make(http.Header),
Body: io.NopCloser(strings.NewReader("data: {\"p\":\"response/content\",\"v\":\"ok\"}\n")),
}, nil
}
func (d *responsesHistoryDS) DeleteSessionForToken(context.Context, string, string) (*dsclient.DeleteSessionResult, error) {
return &dsclient.DeleteSessionResult{Success: true}, nil
}
func (d *responsesHistoryDS) DeleteAllSessionsForToken(context.Context, string) error {
return nil
}
func TestResponsesRecordsResponseHistory(t *testing.T) {
store, resolver := newDirectTokenResolver(t)
historyStore := chathistory.New(filepath.Join(t.TempDir(), "history.json"))
ds := &responsesHistoryDS{}
h := &Handler{
Store: store,
Auth: resolver,
DS: ds,
ChatHistory: historyStore,
}
r := chi.NewRouter()
RegisterRoutes(r, h)
req := httptest.NewRequest(http.MethodPost, "/v1/responses", strings.NewReader(`{"model":"deepseek-v4-flash","input":"hello responses"}`))
req.Header.Set("Authorization", "Bearer direct-token")
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
r.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d body=%s", rec.Code, rec.Body.String())
}
if ds.payload == nil {
t.Fatalf("expected upstream payload to be sent")
}
snapshot, err := historyStore.Snapshot()
if err != nil {
t.Fatalf("snapshot history: %v", err)
}
if len(snapshot.Items) != 1 {
t.Fatalf("expected one history item, got %d", len(snapshot.Items))
}
item, err := historyStore.Get(snapshot.Items[0].ID)
if err != nil {
t.Fatalf("get history item: %v", err)
}
if item.Surface != "openai.responses" {
t.Fatalf("unexpected surface: %q", item.Surface)
}
if !strings.Contains(item.UserInput, "Continue from the latest state in the attached DS2API_HISTORY.txt context.") {
t.Fatalf("unexpected user input: %q", item.UserInput)
}
if !strings.Contains(item.HistoryText, "hello responses") {
t.Fatalf("expected original input in persisted history text, got %q", item.HistoryText)
}
if item.Content != "ok" {
t.Fatalf("expected raw upstream content, got %q", item.Content)
}
}

View File

@@ -10,6 +10,7 @@ import (
openaifmt "ds2api/internal/format/openai"
"ds2api/internal/httpapi/openai/shared"
"ds2api/internal/promptcompat"
"ds2api/internal/responsehistory"
"ds2api/internal/sse"
streamengine "ds2api/internal/stream"
"ds2api/internal/toolstream"
@@ -61,6 +62,7 @@ type responsesStreamRuntime struct {
finalErrorCode string
persistResponse func(obj map[string]any)
history *responsehistory.Session
}
func newResponsesStreamRuntime(
@@ -80,6 +82,7 @@ func newResponsesStreamRuntime(
toolChoice promptcompat.ToolChoicePolicy,
traceID string,
persistResponse func(obj map[string]any),
history *responsehistory.Session,
) *responsesStreamRuntime {
return &responsesStreamRuntime{
w: w,
@@ -106,6 +109,7 @@ func newResponsesStreamRuntime(
toolChoice: toolChoice,
traceID: traceID,
persistResponse: persistResponse,
history: history,
accumulator: shared.StreamAccumulator{
ThinkingEnabled: thinkingEnabled,
SearchEnabled: searchEnabled,
@@ -138,6 +142,9 @@ func (s *responsesStreamRuntime) failResponse(status int, message, code string)
if s.persistResponse != nil {
s.persistResponse(failedResp)
}
if s.history != nil {
s.history.Error(status, message, code, responsehistory.ThinkingForArchive(s.accumulator.RawThinking.String(), s.accumulator.ToolDetectionThinking.String(), s.accumulator.Thinking.String()), responsehistory.TextForArchive(s.accumulator.RawText.String(), s.accumulator.Text.String()))
}
s.sendEvent("response.failed", openaifmt.BuildResponsesFailedPayload(s.responseID, s.model, status, message, code))
s.sendDone()
}
@@ -214,6 +221,15 @@ func (s *responsesStreamRuntime) finalize(finishReason string, deferEmptyOutput
if s.persistResponse != nil {
s.persistResponse(obj)
}
if s.history != nil {
s.history.Success(
http.StatusOK,
responsehistory.ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking),
responsehistory.TextForArchive(turn.RawText, turn.Text),
outcome.FinishReason,
assistantturn.OpenAIResponsesUsage(turn),
)
}
s.sendEvent("response.completed", openaifmt.BuildResponsesCompletedPayload(obj))
s.sendDone()
return true
@@ -272,5 +288,11 @@ func (s *responsesStreamRuntime) onParsed(parsed sse.LineResult) streamengine.Pa
}
batch.flush()
if s.history != nil {
s.history.Progress(
responsehistory.ThinkingForArchive(s.accumulator.RawThinking.String(), s.accumulator.ToolDetectionThinking.String(), s.accumulator.Thinking.String()),
responsehistory.TextForArchive(s.accumulator.RawText.String(), s.accumulator.Text.String()),
)
}
return streamengine.ParsedDecision{ContentSeen: accumulated.ContentSeen}
}

View File

@@ -89,11 +89,11 @@ func (a *StreamAccumulator) applyTextPart(text string) StreamPartDelta {
}
a.RawText.WriteString(rawTrimmed)
delta := StreamPartDelta{Type: "text", RawText: rawTrimmed}
cleanedText := CleanVisibleOutput(rawTrimmed, a.StripReferenceMarkers)
if a.SearchEnabled && sse.IsCitation(cleanedText) {
if a.SearchEnabled && sse.IsCitation(rawTrimmed) {
delta.CitationOnly = true
return delta
}
cleanedText := CleanVisibleOutput(rawTrimmed, a.StripReferenceMarkers)
trimmed := sse.TrimContinuationOverlapFromBuilder(&a.Text, cleanedText)
if trimmed == "" {
return delta

View File

@@ -95,3 +95,21 @@ func TestStreamAccumulatorSuppressesCitationTextWhenSearchEnabled(t *testing.T)
t.Fatalf("visible text = %q", got)
}
}
func TestStreamAccumulatorStripsInlineCitationAndReferenceMarkers(t *testing.T) {
acc := StreamAccumulator{SearchEnabled: true, StripReferenceMarkers: true}
result := acc.Apply(sse.LineResult{
Parsed: true,
Parts: []sse.ContentPart{{Type: "text", Text: "广州天气[citation:1] 多云[reference:0]"}},
})
if !result.ContentSeen {
t.Fatalf("expected marker chunk to mark upstream content")
}
if got := acc.Text.String(); got != "广州天气 多云" {
t.Fatalf("visible text = %q", got)
}
if len(result.Parts) != 1 || result.Parts[0].VisibleText != "广州天气 多云" {
t.Fatalf("unexpected parts: %#v", result.Parts)
}
}

View File

@@ -621,7 +621,7 @@ function stripReferenceMarkersText(text) {
if (!text) {
return text;
}
return text.replace(/\[reference:\s*\d+\]/gi, '');
return text.replace(/\[(?:citation|reference):\s*\d+\]/gi, '');
}
function asString(v) {

View File

@@ -0,0 +1,289 @@
package responsehistory
import (
"errors"
"net/http"
"strings"
"time"
"ds2api/internal/assistantturn"
"ds2api/internal/auth"
"ds2api/internal/chathistory"
"ds2api/internal/config"
"ds2api/internal/prompt"
"ds2api/internal/promptcompat"
)
type Session struct {
store *chathistory.Store
entryID string
startedAt time.Time
lastPersist time.Time
startParams chathistory.StartParams
disabled bool
}
type StartParams struct {
Store *chathistory.Store
Request *http.Request
Auth *auth.RequestAuth
Surface string
Standard promptcompat.StandardRequest
}
func Start(params StartParams) *Session {
if params.Store == nil || params.Request == nil || params.Auth == nil {
return nil
}
if !params.Store.Enabled() || !shouldCapture(params.Request) {
return nil
}
startParams := chathistory.StartParams{
CallerID: strings.TrimSpace(params.Auth.CallerID),
AccountID: strings.TrimSpace(params.Auth.AccountID),
Surface: strings.TrimSpace(params.Surface),
Model: strings.TrimSpace(params.Standard.ResponseModel),
Stream: params.Standard.Stream,
UserInput: ExtractSingleUserInput(params.Standard.Messages),
Messages: ExtractAllMessages(params.Standard.Messages),
HistoryText: params.Standard.HistoryText,
FinalPrompt: params.Standard.FinalPrompt,
}
entry, err := params.Store.Start(startParams)
session := &Session{
store: params.Store,
entryID: entry.ID,
startedAt: time.Now(),
lastPersist: time.Now(),
startParams: startParams,
}
if err != nil {
if entry.ID == "" {
config.Logger.Warn("[response_history] start failed", "surface", startParams.Surface, "error", err)
return nil
}
config.Logger.Warn("[response_history] start persisted in memory after write failure", "surface", startParams.Surface, "error", err)
}
return session
}
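// shouldCapture filters out requests carrying the __stream_prepare/__stream_release
// query flags, so preparatory stream phases are not recorded as standalone entries.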
func shouldCapture(r *http.Request) bool {
if r == nil || r.URL == nil {
return false
}
if strings.TrimSpace(r.URL.Query().Get("__stream_prepare")) == "1" {
return false
}
if strings.TrimSpace(r.URL.Query().Get("__stream_release")) == "1" {
return false
}
return true
}
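// ExtractSingleUserInput returns the most recent non-empty user message.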
func ExtractSingleUserInput(messages []any) string {
for i := len(messages) - 1; i >= 0; i-- {
msg, ok := messages[i].(map[string]any)
if !ok {
continue
}
role := strings.ToLower(strings.TrimSpace(asString(msg["role"])))
if role != "user" {
continue
}
if normalized := strings.TrimSpace(prompt.NormalizeContent(msg["content"])); normalized != "" {
return normalized
}
}
return ""
}
func ExtractAllMessages(messages []any) []chathistory.Message {
out := make([]chathistory.Message, 0, len(messages))
for _, raw := range messages {
msg, ok := raw.(map[string]any)
if !ok {
continue
}
role := strings.ToLower(strings.TrimSpace(asString(msg["role"])))
content := strings.TrimSpace(prompt.NormalizeContent(msg["content"]))
if role == "" || content == "" {
continue
}
out = append(out, chathistory.Message{
Role: role,
Content: content,
})
}
return out
}
func (s *Session) Progress(thinking, content string) {
if s == nil || s.store == nil || s.disabled {
return
}
now := time.Now()
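// Persist at most one streaming snapshot per 250ms.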
if now.Sub(s.lastPersist) < 250*time.Millisecond {
return
}
s.lastPersist = now
s.persistUpdate(chathistory.UpdateParams{
Status: "streaming",
ReasoningContent: thinking,
Content: content,
StatusCode: http.StatusOK,
ElapsedMs: time.Since(s.startedAt).Milliseconds(),
})
}
func (s *Session) Success(statusCode int, thinking, content, finishReason string, usage map[string]any) {
if s == nil || s.store == nil || s.disabled {
return
}
s.persistUpdate(chathistory.UpdateParams{
Status: "success",
ReasoningContent: thinking,
Content: content,
StatusCode: statusCode,
ElapsedMs: time.Since(s.startedAt).Milliseconds(),
FinishReason: finishReason,
Usage: usage,
Completed: true,
})
}
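// Error finalizes the entry as failed; call sites pass the protocol error code
// (or the literal "error") as the finish reason.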
func (s *Session) Error(statusCode int, message, finishReason, thinking, content string) {
if s == nil || s.store == nil || s.disabled {
return
}
s.persistUpdate(chathistory.UpdateParams{
Status: "error",
ReasoningContent: thinking,
Content: content,
Error: message,
StatusCode: statusCode,
ElapsedMs: time.Since(s.startedAt).Milliseconds(),
FinishReason: finishReason,
Completed: true,
})
}
func (s *Session) SuccessTurn(statusCode int, turn assistantturn.Turn, usage map[string]any) {
outcome := assistantturn.FinalizeTurn(turn, assistantturn.FinalizeOptions{})
s.Success(
statusCode,
ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking),
TextForArchive(turn.RawText, turn.Text),
outcome.FinishReason,
usage,
)
}
func (s *Session) ErrorTurn(statusCode int, message, finishReason string, turn assistantturn.Turn) {
s.Error(
statusCode,
message,
finishReason,
ThinkingForArchive(turn.RawThinking, turn.DetectionThinking, turn.Thinking),
TextForArchive(turn.RawText, turn.Text),
)
}
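// TextForArchive prefers the raw upstream text so archives keep tool-call and
// thinking markup intact, falling back to the cleaned visible text.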
func TextForArchive(raw, visible string) string {
if strings.TrimSpace(raw) != "" {
return raw
}
return visible
}
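// ThinkingForArchive prefers raw reasoning, then the tool-detection buffer,
// then the visible thinking text.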
func ThinkingForArchive(raw, detection, visible string) string {
if strings.TrimSpace(raw) != "" {
return raw
}
if strings.TrimSpace(detection) != "" {
return detection
}
return visible
}
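// GenericUsage flattens a turn's token counts into a protocol-neutral usage map.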
func GenericUsage(turn assistantturn.Turn) map[string]any {
return map[string]any{
"input_tokens": turn.Usage.InputTokens,
"output_tokens": turn.Usage.OutputTokens,
"reasoning_tokens": turn.Usage.ReasoningTokens,
"total_tokens": turn.Usage.TotalTokens,
}
}
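// retryMissingEntry recreates the history entry when an update reports it
// missing (for example, pruned by retention while a stream is still running).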
func (s *Session) retryMissingEntry() bool {
if s == nil || s.store == nil || s.disabled {
return false
}
entry, err := s.store.Start(s.startParams)
if errors.Is(err, chathistory.ErrDisabled) {
s.disabled = true
return false
}
if entry.ID == "" {
if err != nil {
config.Logger.Warn("[response_history] recreate missing entry failed", "surface", s.startParams.Surface, "error", err)
}
return false
}
s.entryID = entry.ID
if err != nil {
config.Logger.Warn("[response_history] recreate missing entry persisted in memory after write failure", "surface", s.startParams.Surface, "error", err)
}
return true
}
func (s *Session) persistUpdate(params chathistory.UpdateParams) {
if s == nil || s.store == nil || s.disabled {
return
}
if _, err := s.store.Update(s.entryID, params); err != nil {
s.handlePersistError(params, err)
}
}
func (s *Session) handlePersistError(params chathistory.UpdateParams, err error) {
if err == nil || s == nil {
return
}
if errors.Is(err, chathistory.ErrDisabled) {
s.disabled = true
return
}
if isMissingError(err) {
if s.retryMissingEntry() {
if _, retryErr := s.store.Update(s.entryID, params); retryErr != nil {
if errors.Is(retryErr, chathistory.ErrDisabled) || isMissingError(retryErr) {
s.disabled = true
return
}
config.Logger.Warn("[response_history] retry after missing entry failed", "surface", s.startParams.Surface, "error", retryErr)
}
return
}
s.disabled = true
return
}
config.Logger.Warn("[response_history] update failed", "surface", s.startParams.Surface, "error", err)
}
func isMissingError(err error) bool {
if err == nil {
return false
}
return strings.Contains(strings.ToLower(err.Error()), "not found")
}
func asString(v any) string {
switch x := v.(type) {
case string:
return x
case nil:
return ""
default:
return strings.TrimSpace(prompt.NormalizeContent(x))
}
}
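A minimal sketch of how a protocol handler drives this lifecycle (handler fields and variable names here are hypothetical; the Start/Progress/Success/Error signatures are the ones defined above, and every method is safe on a nil *Session):

session := responsehistory.Start(responsehistory.StartParams{
	Store:    h.ChatHistory, // *chathistory.Store
	Request:  r,             // *http.Request
	Auth:     a,             // *auth.RequestAuth
	Surface:  "openai.responses",
	Standard: stdReq,        // promptcompat.StandardRequest
})
// While streaming: throttled snapshots of the partial output.
session.Progress(thinkingSoFar, textSoFar)
// On completion: finish reason plus a usage map for the archive entry.
session.Success(http.StatusOK, finalThinking, finalText, "stop", usage)
// On failure the entry is finalized as an error instead:
// session.Error(status, message, "error", finalThinking, finalText)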

View File

@@ -65,8 +65,8 @@ func NewApp() (*App, error) {
responsesHandler := &responses.Handler{Store: store, Auth: resolver, DS: dsClient, ChatHistory: chatHistoryStore}
filesHandler := &files.Handler{Store: store, Auth: resolver, DS: dsClient, ChatHistory: chatHistoryStore}
embeddingsHandler := &embeddings.Handler{Store: store, Auth: resolver, DS: dsClient, ChatHistory: chatHistoryStore}
claudeHandler := &claude.Handler{Store: store, Auth: resolver, DS: dsClient, OpenAI: chatHandler}
geminiHandler := &gemini.Handler{Store: store, Auth: resolver, DS: dsClient, OpenAI: chatHandler}
claudeHandler := &claude.Handler{Store: store, Auth: resolver, DS: dsClient, OpenAI: chatHandler, ChatHistory: chatHistoryStore}
geminiHandler := &gemini.Handler{Store: store, Auth: resolver, DS: dsClient, OpenAI: chatHandler, ChatHistory: chatHistoryStore}
adminHandler := &admin.Handler{Store: store, Pool: pool, DS: dsClient, OpenAI: chatHandler, ChatHistory: chatHistoryStore}
webuiHandler := webui.NewHandler()

View File

@@ -2,19 +2,18 @@ package textclean
import "regexp"
var referenceMarkerPattern = regexp.MustCompile(`(?i)\[reference:\s*\d+\]`)
var citationReferenceMarkerPattern = regexp.MustCompile(`(?i)\[(citation|reference):\s*\d+\]`)
func StripReferenceMarkers(text string) string {
if text == "" {
return text
}
return referenceMarkerPattern.ReplaceAllString(text, "")
return citationReferenceMarkerPattern.ReplaceAllString(text, "")
}
// StripReferenceMarkersEnabled returns true while reference-marker
// stripping remains the fixed runtime default. When the behaviour is
// eventually removed this function can be deleted and callers can drop
// the conditional.
// StripReferenceMarkersEnabled returns the default for streaming surfaces,
// where partial citation/reference markers are hidden before the final
// link metadata is available.
func StripReferenceMarkersEnabled() bool {
return true
}
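A quick check of the widened pattern (a minimal sketch; the ds2api/internal/textclean import path is assumed from the module layout seen in other files):

package main

import (
	"fmt"

	"ds2api/internal/textclean"
)

func main() {
	// Both marker families are removed, case-insensitively.
	fmt.Println(textclean.StripReferenceMarkers("Cloudy[citation:1] later[Reference: 12]."))
	// Output: Cloudy later.
}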

View File

@@ -58,3 +58,47 @@ test('chat history strict parser inserts history after system messages', async (
{ role: 'user', content: 'latest' },
]);
});
test('chat history transcript parser replaces current input file placeholder', async () => {
const {
buildListModeMessages,
} = await loadUtils();
const t = (key) => key;
const item = {
messages: [{
role: 'user',
content: 'Continue from the latest state in the attached DS2API_HISTORY.txt context. Treat it as the current working state and answer the latest user request directly.',
}],
history_text: [
'# DS2API_HISTORY.txt',
'Prior conversation history and tool progress.',
'',
'=== 1. SYSTEM ===',
'policy',
'',
'=== 2. USER ===',
'hello',
'',
'=== 3. ASSISTANT ===',
'hi',
'',
'=== 4. TOOL ===',
'[name=search_web tool_call_id=call_1]',
'{"ok":true}',
'',
'=== 5. USER ===',
'latest',
'',
].join('\n'),
};
const result = buildListModeMessages(item, t);
assert.equal(result.historyMerged, true);
assert.deepEqual(result.messages, [
{ role: 'system', content: 'policy' },
{ role: 'user', content: 'hello' },
{ role: 'assistant', content: 'hi' },
{ role: 'tool', content: '[name=search_web tool_call_id=call_1]\n{"ok":true}' },
{ role: 'user', content: 'latest' },
]);
});

View File

@@ -615,17 +615,17 @@ test('parseChunkForContent preserves space-only content tokens', () => {
assert.deepEqual(parsed.parts, [{ text: ' ', type: 'text' }]);
});
test('parseChunkForContent strips reference markers from fragment content', () => {
test('parseChunkForContent strips citation and reference markers from fragment content', () => {
const chunk = {
p: 'response/fragments',
o: 'APPEND',
v: [
{ type: 'RESPONSE', content: '广州天气 [reference:12] 多云' },
{ type: 'RESPONSE', content: '广州天气 [citation:1] [reference:12] 多云' },
],
};
const parsed = parseChunkForContent(chunk, false, 'text');
assert.equal(parsed.finished, false);
assert.deepEqual(parsed.parts, [{ text: '广州天气 多云', type: 'text' }]);
assert.deepEqual(parsed.parts, [{ text: '广州天气 多云', type: 'text' }]);
});
test('parseChunkForContent detects content_filter status and ignores upstream output tokens', () => {

View File

@@ -123,7 +123,6 @@ export function useChatStreamClient({
const headers = {
'Content-Type': 'application/json',
'Authorization': `Bearer ${effectiveKey}`,
'X-Ds2-Source': 'admin-webui-api-tester',
}
if (requestAccount) {
headers['X-Ds2-Target-Account'] = requestAccount

View File

@@ -10,6 +10,9 @@ import {
VIEW_MODE_KEY,
} from './chatHistoryUtils'
const LIST_REFRESH_MS = 1500
const STREAMING_DETAIL_REFRESH_MS = 750
export default function ChatHistoryContainer({ authFetch, onMessage }) {
const { t, lang } = useI18n()
const apiFetch = authFetch || fetch
@@ -136,7 +139,7 @@ export default function ChatHistoryContainer({ authFetch, onMessage }) {
if (!autoRefreshReady || limit === DISABLED_LIMIT) return undefined
const timer = window.setInterval(() => {
loadList({ mode: 'silent', announceError: false })
}, 5000)
}, LIST_REFRESH_MS)
return () => window.clearInterval(timer)
}, [autoRefreshReady, limit])
@@ -144,7 +147,7 @@ export default function ChatHistoryContainer({ authFetch, onMessage }) {
if (!autoRefreshReady || !selectedId || selectedSummary?.status !== 'streaming') return undefined
const timer = window.setInterval(() => {
loadDetail(selectedId, { announceError: false })
}, 1000)
}, STREAMING_DETAIL_REFRESH_MS)
return () => window.clearInterval(timer)
}, [autoRefreshReady, selectedId, selectedSummary?.status])

View File

@@ -207,6 +207,10 @@ function MetaGrid({ selectedItem, t }) {
{formatElapsed(selectedItem.elapsed_ms, t)}
</div>
</div>
<div className="rounded-lg border border-border bg-card px-3 py-2">
<div className="text-[11px] text-muted-foreground">{t('chatHistory.metaSurface')}</div>
<div className="text-sm font-medium text-foreground break-all">{selectedItem.surface || t('chatHistory.metaUnknown')}</div>
</div>
<div className="rounded-lg border border-border bg-card px-3 py-2">
<div className="text-[11px] text-muted-foreground">{t('chatHistory.metaModel')}</div>
<div className="text-sm font-medium text-foreground break-all">{selectedItem.model || t('chatHistory.metaUnknown')}</div>

View File

@@ -69,7 +69,7 @@ export function ChatHistoryListPane({ items, selectedItem, deletingId, t, lang,
{item.user_input || t('chatHistory.untitled')}
</div>
<div className="text-[11px] text-muted-foreground mt-1 truncate">
{item.model || '-'}
{[item.surface, item.model].filter(Boolean).join(' · ') || '-'}
</div>
</div>
<div className="flex items-center gap-2 shrink-0">

View File

@@ -15,12 +15,24 @@ const CURRENT_INPUT_FILE_PROMPT = 'Continue from the latest state in the attache
const LEGACY_CURRENT_INPUT_FILE_PROMPTS = new Set([
'The current request and prior conversation context have already been provided. Answer the latest user request directly.',
])
const HISTORY_TRANSCRIPT_TITLE = '# DS2API_HISTORY.txt'
const HISTORY_TRANSCRIPT_ENTRY_RE = /^===\s+\d+\.\s+([A-Z][A-Z_ -]*)\s+===\s*$/gm
function isCurrentInputFilePrompt(value) {
const text = String(value || '').trim()
return text === CURRENT_INPUT_FILE_PROMPT || LEGACY_CURRENT_INPUT_FILE_PROMPTS.has(text)
}
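// Maps transcript role aliases onto chat roles (function → tool, developer → system).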
function normalizeHistoryRole(role) {
const normalized = String(role || '').trim().toLowerCase()
if (normalized === 'function') return 'tool'
if (normalized === 'developer') return 'system'
if (normalized === 'system' || normalized === 'user' || normalized === 'assistant' || normalized === 'tool') {
return normalized
}
return normalized || 'system'
}
export function formatDateTime(value, lang) {
if (!value) return '-'
try {
@@ -221,11 +233,37 @@ export function parseStrictHistoryMessages(historyText) {
return parsed
}
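// Parses the DS2API_HISTORY.txt transcript format: "=== N. ROLE ===" section
// headers, each followed by that turn's content until the next header.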
export function parseTranscriptHistoryMessages(historyText) {
const rawText = String(historyText || '')
const titleIndex = rawText.indexOf(HISTORY_TRANSCRIPT_TITLE)
const transcript = titleIndex >= 0 ? rawText.slice(titleIndex) : rawText
const matches = [...transcript.matchAll(HISTORY_TRANSCRIPT_ENTRY_RE)]
if (!matches.length) return null
const parsed = []
for (let i = 0; i < matches.length; i += 1) {
const match = matches[i]
const next = matches[i + 1]
const role = normalizeHistoryRole(match[1])
const start = (match.index || 0) + match[0].length
const end = next ? next.index : transcript.length
const content = transcript.slice(start, end).replace(/^\r?\n/, '').trim()
if (!content) continue
parsed.push({ role, content })
}
return parsed.length ? parsed : null
}
export function parseHistoryMessages(historyText) {
return parseStrictHistoryMessages(historyText) || parseTranscriptHistoryMessages(historyText)
}
export function buildListModeMessages(item, t) {
const liveMessages = Array.isArray(item?.messages) && item.messages.length > 0
? item.messages
: [{ role: 'user', content: item?.user_input || t('chatHistory.emptyUserInput') }]
const historyMessages = parseStrictHistoryMessages(item?.history_text)
const historyMessages = parseHistoryMessages(item?.history_text)
if (!historyMessages?.length) {
return { messages: liveMessages, historyMerged: false }

View File

@@ -18,8 +18,8 @@
"desc": "Test API connectivity and responses"
},
"history": {
"label": "Conversations",
"desc": "Browse server-side external chat history"
"label": "Responses",
"desc": "Browse server-side upstream response records"
},
"import": {
"label": "Batch Import",
@@ -261,7 +261,7 @@
"loading": "Loading conversation history...",
"loadFailed": "Failed to load conversation history.",
"retentionTitle": "Retention",
"retentionDesc": "The server keeps only the latest N external /v1/chat/completions conversations.",
"retentionDesc": "The server keeps only the latest N DeepSeek upstream response records across OpenAI Chat, OpenAI Responses, Claude, and Gemini direct interfaces.",
"off": "OFF",
"refresh": "Refresh",
"clearAll": "Clear all",
@@ -277,7 +277,7 @@
"viewModeList": "List mode",
"viewModeMerged": "Merged mode",
"emptyTitle": "No conversation history yet",
"emptyDesc": "When external clients call /v1/chat/completions, the server will save the results here automatically.",
"emptyDesc": "When a supported interface talks to DeepSeek upstream and receives a response, the server saves the result here automatically.",
"untitled": "Untitled conversation",
"noPreview": "No preview available.",
"selectPrompt": "Select a record on the left to view details.",
@@ -303,6 +303,7 @@
"metaTitle": "Metadata",
"metaAccount": "Account",
"metaElapsed": "Elapsed",
"metaSurface": "Surface",
"metaModel": "Model",
"metaStatusCode": "Status code",
"metaStream": "Output mode",

View File

@@ -18,8 +18,8 @@
"desc": "测试 API 连接与响应"
},
"history": {
"label": "对话记录",
"desc": "查看服务器保存的外部对话历史"
"label": "响应记录",
"desc": "查看服务器保存的上游响应归档"
},
"import": {
"label": "批量导入",
@@ -261,7 +261,7 @@
"loading": "正在加载对话记录...",
"loadFailed": "加载对话记录失败",
"retentionTitle": "保留条数",
"retentionDesc": "服务器端只保留最新 N 条外部 /v1/chat/completions 对话记录。",
"retentionDesc": "服务器端只保留最新 N 条 DeepSeek 上游响应记录,覆盖 OpenAI Chat、OpenAI Responses、Claude 和 Gemini 直连接口。",
"off": "OFF",
"refresh": "刷新",
"clearAll": "清空全部",
@@ -277,7 +277,7 @@
"viewModeList": "列表模式",
"viewModeMerged": "合并模式",
"emptyTitle": "还没有可用的对话记录",
"emptyDesc": "当外部客户端调用 /v1/chat/completions 时,服务端会自动把结果写入这里。",
"emptyDesc": "当支持的接口与 DeepSeek 上游交互并收到响应时,服务端会自动把结果写入这里。",
"untitled": "未命名对话",
"noPreview": "暂无预览内容",
"selectPrompt": "从左侧选择一条记录查看详情。",
@@ -303,6 +303,7 @@
"metaTitle": "元信息",
"metaAccount": "使用账号",
"metaElapsed": "耗时",
"metaSurface": "接口",
"metaModel": "模型",
"metaStatusCode": "状态码",
"metaStream": "输出模式",