feat: implement managed-account rotation on 429 empty-output completion retries

This commit is contained in:
CJACK
2026-05-10 00:41:45 +08:00
parent 3cc7f469f3
commit ddd42e532e
15 changed files with 362 additions and 20 deletions

View File

@@ -42,6 +42,7 @@ Docs: [Overview](README.en.md) / [Architecture](docs/ARCHITECTURE.en.md) / [Depl
- Adapter responsibilities are streamlined to: **request normalization → DeepSeek invocation → protocol-shaped rendering**, reducing legacy split-logic paths.
- Tool-calling semantics are aligned between Go and Node runtime: models should output the DSML shell `<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`; DS2API also accepts legacy canonical XML `<tool_calls>``<invoke name="...">``<parameter name="...">`. DSML is normalized back to XML at the parser entry, so internal parsing remains XML-based, with stream-time anti-leak filtering.
- `Admin API` separates static config from runtime policy: `/admin/config*` for configuration state, `/admin/settings*` for runtime behavior.
- When upstream returns a thinking-only response with no visible text, the Go main path for both streaming and non-streaming completions retries once in the same DeepSeek session: it appends the prompt suffix `"Previous reply had no visible output. Please regenerate the visible final answer or tool call now."` and sets `parent_message_id`. If that same-account retry would still end as `429 upstream_empty_output`, managed-account mode switches to the next available account, creates a fresh session, and retries the original payload once before returning 429.
---
@@ -84,7 +85,7 @@ Two header formats accepted:
- Token is in `config.keys`**Managed account mode**: DS2API auto-selects an account via rotation
- Token is not in `config.keys`**Direct token mode**: treated as a DeepSeek token directly
**Optional header**: `X-Ds2-Target-Account: <email_or_mobile>` — Pin a specific managed account; if the target account does not exist or the managed-account queue is exhausted, the request returns `429`, and current responses do not include `Retry-After`. If the account exists but login/refresh fails, the request returns the underlying `401` or upstream error.
**Optional header**: `X-Ds2-Target-Account: <email_or_mobile>` — Pin a specific managed account; if the target account does not exist or the managed-account queue is exhausted, the request returns `429`, and current responses do not include `Retry-After`. If the account exists but login/refresh fails, the request returns the underlying `401` or upstream error. Without a pinned target, managed-account completion requests try one alternate-account fresh retry before returning an empty-output 429; pinned-target requests and requests with no other available account do not switch.
Gemini-compatible clients can also send `x-goog-api-key`, `?key=`, or `?api_key=` as the caller credential source.
### Admin Endpoints (`/admin/*`)
@@ -1258,7 +1259,7 @@ Clients should handle HTTP status code plus `error` / `detail` fields.
| Code | Meaning |
| --- | --- |
| `401` | Authentication failed (invalid key/token, or expired admin JWT) |
| `429` | Too many requests (exceeded inflight + queue capacity; current responses do not include `Retry-After`) |
| `429` | Too many requests (exceeded inflight + queue capacity, or upstream thinking-only output with no visible answer; managed-account mode first tries one alternate-account fresh retry; current responses do not include `Retry-After`) |
| `503` | Model unavailable or upstream error |
---

6
API.md
View File

@@ -42,7 +42,7 @@
- 适配器层职责收敛为:**请求归一化 → DeepSeek 调用 → 协议形态渲染**,减少历史版本中“同能力多处实现”的分叉。
- Tool Calling 的解析策略在 Go 与 Node Runtime 间保持一致:推荐模型输出 DSML 外壳 `<|DSML|tool_calls>``<|DSML|invoke name="...">``<|DSML|parameter name="...">`;兼容层也接受 DSML wrapper 别名 `<dsml|tool_calls>``<|tool_calls>``<tool_calls>`、常见 DSML 分隔符漏写形态(如 `<|DSML tool_calls>`)、`DSML` 与工具标签名黏连的常见 typo`<DSMLtool_calls>`)、控制分隔符漂移(如 `<DSML␂tool_calls>` / 原始 STX `\x02`)、任意协议前缀壳(如 `<proto💥tool_calls>`),以及旧式 canonical XML `<tool_calls>``<invoke name="...">``<parameter name="...">`。实现上采用结构扫描:只要固定本地标签名是 `tool_calls` / `invoke` / `parameter`,前缀壳会在解析入口归一化;只有 `tool_calls` wrapper 或可修复的缺失 opening wrapper 会进入工具路径,裸 `<invoke>` 不计为已支持语法;流式场景继续执行防泄漏筛分。若参数体本身是合法 JSON 字面量(如 `123``true``null`、数组或对象),会按结构化值输出,不再一律当作字符串;若 CDATA 偶发漏闭合,则会在最终 parse / flush 恢复阶段做窄修复,尽量保住已完整包裹的外层工具调用。
- `Admin API` 将配置与运行时策略分开:`/admin/config*` 管静态配置,`/admin/settings*` 管运行时行为。
- 当上游返回 thinking-only 响应(模型输出了推理链但无可见文本)时,非流式补全自动重试一次:以多轮对话 follow-up 方式追加 prompt 后缀 `"Previous reply had no visible output. Please regenerate the visible final answer or tool call now."` 并设置 `parent_message_id` 在同一 DeepSeek session 内让模型重新输出;重试最大 1 次。
- 当上游返回 thinking-only 响应(模型输出了推理链但无可见文本)时,Go 主路径的流式与非流式补全都会先自动重试一次:以多轮对话 follow-up 方式追加 prompt 后缀 `"Previous reply had no visible output. Please regenerate the visible final answer or tool call now."` 并设置 `parent_message_id` 在同一 DeepSeek session 内让模型重新输出;同账号重试最大 1 次。若同账号重试后仍即将返回 `429 upstream_empty_output`,托管账号模式会在返回 429 前自动切换到下一个可用账号,新建 session用原始 payload 再 fresh retry 一次。
- 引用标记处理边界:流式输出默认隐藏 `[citation:N]` / `[reference:N]` 这类上游内部占位符;非流式输出默认把 DeepSeek 搜索引用标记转换为 Markdown 引用链接。
---
@@ -86,7 +86,7 @@ Vercel 一键部署可先只填 `DS2API_ADMIN_KEY`,部署后在 `/admin` 导
- token 在 `config.keys` 中 → **托管账号模式**,自动轮询选择账号
- token 不在 `config.keys` 中 → **直通 token 模式**,直接作为 DeepSeek token 使用
**可选请求头**`X-Ds2-Target-Account: <email_or_mobile>` — 指定使用某个托管账号;如果目标账号不存在,或管理账号队列已耗尽,相关业务请求会返回 `429`,当前不会附带 `Retry-After` 头。若账号存在但登录/刷新失败,则返回对应的 `401` 或上游错误。
**可选请求头**`X-Ds2-Target-Account: <email_or_mobile>` — 指定使用某个托管账号;如果目标账号不存在,或管理账号队列已耗尽,相关业务请求会返回 `429`,当前不会附带 `Retry-After` 头。若账号存在但登录/刷新失败,则返回对应的 `401` 或上游错误。未指定目标账号时,托管账号模式的 completion 空输出 429 会先尝试切到另一个可用账号 fresh retry 一次;指定目标账号或无其他可用账号时不会切号。
Gemini 兼容客户端还可以使用 `x-goog-api-key``?key=``?api_key=` 作为凭据来源。
### Admin 接口(`/admin/*`
@@ -1271,7 +1271,7 @@ Gemini 路由使用 Google 风格错误结构:
| 状态码 | 说明 |
| --- | --- |
| `401` | 鉴权失败key/token 无效,或 Admin JWT 过期) |
| `429` | 请求过多(超出并发上限 + 等待队列;当前不附带 `Retry-After` 头) |
| `429` | 请求过多(超出并发上限 + 等待队列,或上游账号 thinking-only 后仍无可见输出;托管账号模式会先尝试一次切号 fresh retry;当前不附带 `Retry-After` 头) |
| `503` | 模型不可用或上游服务异常 |
---

View File

@@ -350,6 +350,7 @@ go run ./cmd/ds2api
可选请求头 `X-Ds2-Target-Account`:指定使用某个托管账号(值为 email 或 mobile
如果指定账号不存在,或者当前管理账号队列已满,请求会返回 `429`;当前 `429` 不附带 `Retry-After` 头。若账号存在但登录/刷新失败,则返回对应的鉴权错误。
未指定目标账号时,如果 completion 因上游 thinking-only 空输出在同账号补偿重试后仍将返回 `429 upstream_empty_output`,托管账号模式会自动切到下一个可用账号,新建 session并用原始 payload 再 fresh retry 一次。
Gemini 路由还可以使用 `x-goog-api-key`,或在没有认证头时使用 `?key=` / `?api_key=` 作为调用方凭据。
## 并发模型
@@ -363,6 +364,7 @@ Gemini 路由还可以使用 `x-goog-api-key`,或在没有认证头时使用 `
- 当 in-flight 槽位满时,请求进入等待队列,**不会立即 429**
- 超出总承载上限后才返回 `429 Too Many Requests`,当前响应不附带 `Retry-After`
- completion 空输出类 429 会先做同账号补偿重试;托管账号模式还会在最终返回 429 前切到另一个可用账号 fresh retry 一次
- `GET /admin/queue/status` 返回实时并发状态
## Tool Call 适配

View File

@@ -336,6 +336,7 @@ For business endpoints (`/v1/*`, `/anthropic/*`, Gemini routes), DS2API supports
| **Direct token** | If the token is not in `config.keys`, DS2API treats it as a DeepSeek token directly |
Optional header `X-Ds2-Target-Account`: Pin a specific managed account (value is email or mobile).
When no target account is pinned, if a completion would end as `429 upstream_empty_output` after the same-account empty-output retry, managed-account mode switches to the next available account, creates a fresh session, and retries the original payload once.
Gemini routes also accept `x-goog-api-key`, or `?key=` / `?api_key=` when no auth header is present.
## Concurrency Model
@@ -349,6 +350,7 @@ Queue limit = DS2API_ACCOUNT_MAX_QUEUE (default = recommended concurrency)
- When inflight slots are full, requests enter a waiting queue — **no immediate 429**
- 429 is returned only when total load exceeds inflight + queue capacity
- Completion empty-output 429s first get the same-account compensation retry; managed-account mode also tries one alternate-account fresh retry before returning the final 429
- `GET /admin/queue/status` returns real-time concurrency state
## Tool Call Adaptation

View File

@@ -27,7 +27,7 @@ ds2api/
│ ├── claudeconv/ # Claude message conversion helpers
│ ├── compat/ # Compatibility and regression helpers
│ ├── assistantturn/ # Upstream output to canonical assistant turn / stream event semantics
│ ├── completionruntime/ # Shared Go DeepSeek completion startup, non-stream collection, and retry
│ ├── completionruntime/ # Shared Go DeepSeek completion startup, collection, empty-output/account-switch retry
│ ├── config/ # Config loading/validation/hot reload
│ ├── deepseek/ # DeepSeek upstream client/protocol/transport
│ │ ├── client/ # Login/session/completion/upload/delete calls
@@ -191,7 +191,7 @@ flowchart LR
- `internal/httpapi/requestbody`: shared HTTP body reading, JSON pre-validation, and UTF-8 error helpers across protocol adapters.
- `internal/promptcompat`: compatibility core for turning OpenAI/Claude/Gemini requests into DeepSeek web-chat plain-text context.
- `internal/assistantturn`: Go output-side canonical semantics, converting DeepSeek SSE collection results and stream finalization state into assistant turns and centralizing thinking, tool call, citation, usage, stop/error behavior.
- `internal/completionruntime`: shared Go completion execution helpers for DeepSeek session/PoW/call startup, non-stream collection, and empty-output retry; streaming paths use it to start upstream requests, continue to use `internal/stream` for real-time consumption, and use `assistantturn` during finalization.
- `internal/completionruntime`: shared Go completion execution helpers for DeepSeek session/PoW/call startup, non-stream collection, empty-output retry, and one managed-account fresh retry before a final 429; streaming paths use it to start upstream requests, continue to use `internal/stream` for real-time consumption, and use `assistantturn` during finalization.
- `internal/translatorcliproxy`: bridge compatibility layer for Claude/Gemini and OpenAI shape translation; it is not the main business protocol conversion center.
- `internal/deepseek/{client,protocol,transport}`: upstream requests, sessions, PoW adaptation, protocol constants, and transport details.
- `internal/js/chat-stream` + `api/chat-stream.js`: Vercel Node streaming bridge; Go prepare/release owns auth, account lease, and completion payload assembly, while Node relays real-time SSE with Go-aligned finalization and tool sieve semantics.

View File

@@ -27,7 +27,7 @@ ds2api/
│ ├── claudeconv/ # Claude 消息格式转换工具
│ ├── compat/ # 兼容性辅助与回归支持
│ ├── assistantturn/ # 上游输出到统一 assistant turn / stream event 的语义层
│ ├── completionruntime/ # Go 主路径共享 DeepSeek completion 启动、非流式收集与 retry
│ ├── completionruntime/ # Go 主路径共享 DeepSeek completion 启动、收集、空输出/切号 retry
│ ├── config/ # 配置加载、校验、热更新
│ ├── deepseek/ # DeepSeek 上游 client/protocol/transport
│ │ ├── client/ # 登录、会话、completion、上传/删除等上游调用
@@ -191,7 +191,7 @@ flowchart LR
- `internal/httpapi/requestbody`跨协议复用的请求体读取、JSON 解码前置校验与 UTF-8 错误处理辅助。
- `internal/promptcompat`OpenAI/Claude/Gemini 请求到 DeepSeek 网页纯文本上下文的兼容内核。
- `internal/assistantturn`Go 输出侧统一语义层,把 DeepSeek SSE 收集结果和流式收尾状态归一成 assistant turn集中处理 thinking、tool call、citation、usage、stop/error 语义。
- `internal/completionruntime`Go surface 共享的 completion 执行辅助,负责 DeepSeek session/PoW/call 启动、非流式 collectempty-output retry流式路径复用它启动上游请求继续用 `internal/stream` 做实时消费,并在最终收尾阶段接入 `assistantturn`
- `internal/completionruntime`Go surface 共享的 completion 执行辅助,负责 DeepSeek session/PoW/call 启动、非流式 collectempty-output retry以及托管账号在最终 429 前的一次切号 fresh retry流式路径复用它启动上游请求继续用 `internal/stream` 做实时消费,并在最终收尾阶段接入 `assistantturn`
- `internal/translatorcliproxy`Claude/Gemini 与 OpenAI 结构互转的桥接兼容层,不作为主业务协议转换中心。
- `internal/deepseek/{client,protocol,transport}`上游请求、会话、PoW 适配、协议常量与传输层。
- `internal/js/chat-stream` + `api/chat-stream.js`Vercel Node 流式桥Go prepare/release 管理鉴权、账号租约和 completion payloadNode 侧负责实时 SSE 转发并保持 Go 对齐的终结态和 tool sieve 语义。

View File

@@ -112,7 +112,7 @@ DS2API 当前的核心思路,不是把客户端传来的 `messages`、`tools`
- Vercel Node 流式路径本轮不迁移,仍使用现有 Node bridge / stream-tool-sieve 实现;后续若变更 Node 流式语义,需要按 `assistantturn` 的 Go canonical 输出语义同步对齐。
- 客户端传入的 thinking / reasoning 开关会被归一到下游 `thinking_enabled`。Gemini `generationConfig.thinkingConfig.thinkingBudget` 会翻译成同一套 thinking 开关;关闭时即使上游返回 `response/thinking_content`,兼容层也不会把它当作可见正文输出。若最终解析出的模型名带 `-nothinking` 后缀,则会无条件强制关闭 thinking优先级高于请求体中的 `thinking` / `reasoning` / `reasoning_effort`。未显式关闭时,各 surface 会按解析后的 DeepSeek 模型默认能力开启 thinking并用各自协议的原生形态暴露OpenAI Chat 为 `reasoning_content`OpenAI Responses 为 `response.reasoning.delta` / `reasoning` contentClaude 为 `thinking` block / `thinking_delta`Gemini 为 `thought: true` part。
- 对 OpenAI Chat / Responses 的非流式收尾,如果最终可见正文为空,兼容层会优先尝试把思维链中的独立 DSML / XML 工具块当作真实工具调用解析出来。流式链路也会在收尾阶段做同样的 fallback 检测,但不会因为思维链内容去中途拦截或改写流式输出;真正的工具识别始终基于原始上游文本,而不是基于“已经做过可见输出清洗”的版本,因此即使最终可见层会剥离完整 leaked DSML / XML `tool_calls` wrapper、并抑制全空参数或无效 wrapper 块,也不会影响真实工具调用转成结构化 `tool_calls` / `function_call`。补发结果会作为本轮 assistant 的结构化 `tool_calls` / `function_call` 输出返回,而不是塞进 `content` 文本;如果客户端没有开启 thinking / reasoning思维链只用于检测不会作为 `reasoning_content` 或可见正文暴露。只有正文为空且思维链里也没有可执行工具调用时,才继续按空回复错误处理。
- OpenAI Chat / Responses、Claude Messages、Gemini generateContent 的空回复错误处理之前会默认做一次内部补偿重试:第一次上游完整结束后,如果最终可见正文为空、没有解析到工具调用、也没有已经向客户端流式发出工具调用,并且终止原因不是 `content_filter`,兼容层会复用同一个 `chat_session_id`、账号、token 与工具策略,把原始 completion `prompt` 追加固定后缀 `Previous reply had no visible output. Please regenerate the visible final answer or tool call now.` 后重新提交一次。Go 主路径的非流式重试由 `completionruntime.ExecuteNonStreamWithRetry` 统一处理;流式重试由 `completionruntime.ExecuteStreamWithRetry` 统一处理,各协议 runtime 只负责消费/渲染本协议 SSE framing。重试遵循 DeepSeek 多轮对话协议:从第一次上游 SSE 流中提取 `response_message_id`,并在重试 payload 中设置 `parent_message_id` 为该值,使重试成为同一会话的后续轮次而非断裂的根消息;同时重新获取一次 PoW若 PoW 获取失败则回退到原始 PoW。该重试不会重新标准化消息、不会新建 session、不会切换账号,也不会向流式客户端插入重试标记;第二次 thinking / reasoning 会按正常增量直接接到第一次之后,并继续使用 overlap trim 去重。若第二次仍没有任何输出,终端错误为 503 `upstream_unavailable`;若有 reasoning 但没有可见正文或工具调用,仍返回 429 `upstream_empty_output`若任一尝试触发空 `content_filter`,不做补偿重试并保持 `content_filter` 错误。JS Vercel 运行时同样设置 `parent_message_id`,但因无法直接调用 PoW API 而复用原始 PoW。
- OpenAI Chat / Responses、Claude Messages、Gemini generateContent 的空回复错误处理之前会默认做一次内部补偿重试:第一次上游完整结束后,如果最终可见正文为空、没有解析到工具调用、也没有已经向客户端流式发出工具调用,并且终止原因不是 `content_filter`,兼容层会复用同一个 `chat_session_id`、账号、token 与工具策略,把原始 completion `prompt` 追加固定后缀 `Previous reply had no visible output. Please regenerate the visible final answer or tool call now.` 后重新提交一次。Go 主路径的非流式重试由 `completionruntime.ExecuteNonStreamWithRetry` 统一处理;流式重试由 `completionruntime.ExecuteStreamWithRetry` 统一处理,各协议 runtime 只负责消费/渲染本协议 SSE framing。重试遵循 DeepSeek 多轮对话协议:从第一次上游 SSE 流中提取 `response_message_id`,并在重试 payload 中设置 `parent_message_id` 为该值,使重试成为同一会话的后续轮次而非断裂的根消息;同时重新获取一次 PoW若 PoW 获取失败则回退到原始 PoW。该同账号重试不会重新标准化消息、不会新建 session也不会向流式客户端插入重试标记第二次 thinking / reasoning 会按正常增量直接接到第一次之后,并继续使用 overlap trim 去重。若同账号补偿重试后即将返回 429 `upstream_empty_output`并且当前是托管账号模式Go 主路径会在返回 429 前切换到下一个可用账号,新建 `chat_session_id`,使用原始 completion payload 再做一次 fresh retry该切号重试不携带空回复 prompt 后缀,也不设置上一账号的 `parent_message_id`。如果没有可切换账号,或切号后的 fresh retry 仍没有可见正文或工具调用,则继续按原错误返回:无任何输出为 503 `upstream_unavailable`有 reasoning 但没有可见正文或工具调用 429 `upstream_empty_output`若任一尝试触发空 `content_filter`,不做补偿重试并保持 `content_filter` 错误。JS Vercel 运行时同样设置 `parent_message_id`,但因无法直接调用 PoW API 而复用原始 PoW;切号 fresh retry 目前由 Go 主路径提供
- 非流式 OpenAI Chat / Responses、Claude Messages、Gemini generateContent 在最终可见正文渲染阶段,会把 DeepSeek 搜索返回中的 `[citation:N]` / `[reference:N]` 标记替换成对应 Markdown 链接。`citation` 标记按一基序号解析;`reference` 标记只有在同一段正文中出现 `[reference:0]`(允许冒号后有空格)时才按零基序号映射,并且不会影响同段正文里的 `citation` 标记。
- 流式输出仍默认隐藏 `[citation:N]` / `[reference:N]` 这类上游内部标记,避免分片输出中泄漏尚未完成映射的引用占位符。

View File

@@ -241,6 +241,36 @@ func TestSwitchAccountSkipsLoginFailureAndContinues(t *testing.T) {
}
}
func TestSwitchAccountRespectsPinnedTargetAccount(t *testing.T) {
t.Setenv("DS2API_CONFIG_JSON", `{
"keys":["managed-key"],
"accounts":[
{"email":"acc1@test.com","token":"t1"},
{"email":"acc2@test.com","token":"t2"}
]
}`)
store := config.LoadStore()
pool := account.NewPool(store)
r := NewResolver(store, pool, func(_ context.Context, _ config.Account) (string, error) {
return "new-token", nil
})
req, _ := http.NewRequest("POST", "/", nil)
req.Header.Set("Authorization", "Bearer managed-key")
req.Header.Set("X-Ds2-Target-Account", "acc1@test.com")
a, err := r.Determine(req)
if err != nil {
t.Fatalf("determine failed: %v", err)
}
defer r.Release(a)
if r.SwitchAccount(context.Background(), a) {
t.Fatal("expected switch to be disabled for pinned target account")
}
if a.AccountID != "acc1@test.com" {
t.Fatalf("expected pinned account to remain selected, got %q", a.AccountID)
}
}
// ─── Release edge cases ─────────────────────────────────────────────
func TestReleaseNilAuth(t *testing.T) {

View File

@@ -28,6 +28,7 @@ type RequestAuth struct {
DeepSeekToken string
CallerID string
AccountID string
TargetAccount string
Account config.Account
TriedAccounts map[string]bool
resolver *Resolver
@@ -99,6 +100,7 @@ func (r *Resolver) acquireManagedRequestAuth(ctx context.Context, callerID, targ
UseConfigToken: true,
CallerID: callerID,
AccountID: acc.Identifier(),
TargetAccount: target,
Account: acc,
TriedAccounts: tried,
resolver: r,
@@ -185,6 +187,9 @@ func (r *Resolver) SwitchAccount(ctx context.Context, a *RequestAuth) bool {
if !a.UseConfigToken {
return false
}
if strings.TrimSpace(a.TargetAccount) != "" {
return false
}
if a.TriedAccounts == nil {
a.TriedAccounts = map[string]bool{}
}
@@ -208,6 +213,13 @@ func (r *Resolver) SwitchAccount(ctx context.Context, a *RequestAuth) bool {
}
}
func (a *RequestAuth) SwitchAccount(ctx context.Context) bool {
if a == nil || a.resolver == nil {
return false
}
return a.resolver.SwitchAccount(ctx, a)
}
func (r *Resolver) Release(a *RequestAuth) {
if a == nil || !a.UseConfigToken || a.AccountID == "" {
return

View File

@@ -104,6 +104,7 @@ func ExecuteNonStreamStartedWithRetry(ctx context.Context, ds DeepSeekCaller, a
pow := start.Pow
attempts := 0
accountSwitchAttempted := false
currentResp := start.Response
usagePrompt := stdReq.PromptTokenText
accumulatedThinking := ""
@@ -112,6 +113,24 @@ func ExecuteNonStreamStartedWithRetry(ctx context.Context, ds DeepSeekCaller, a
for {
turn, outErr := collectAttempt(currentResp, stdReq, usagePrompt, opts)
if outErr != nil {
if canRetryOnAlternateAccount(ctx, a, outErr, opts.RetryEnabled, &accountSwitchAttempted) {
switched, switchErr := startStandardCompletionOnAlternateAccount(ctx, ds, a, stdReq, maxAttempts)
if switchErr != nil {
return NonStreamResult{SessionID: sessionID, Payload: payload, Attempts: attempts}, switchErr
}
if switched.Response != nil {
config.Logger.Info("[completion_runtime_account_switch_retry] retrying after 429", "surface", stdReq.Surface, "stream", false, "account", a.AccountID)
sessionID = switched.SessionID
payload = switched.Payload
pow = switched.Pow
currentResp = switched.Response
usagePrompt = stdReq.PromptTokenText
accumulatedThinking = ""
accumulatedRawThinking = ""
accumulatedToolDetectionThinking = ""
continue
}
}
return NonStreamResult{SessionID: sessionID, Payload: payload, Attempts: attempts}, outErr
}
accumulatedThinking += sse.TrimContinuationOverlap(accumulatedThinking, turn.Thinking)
@@ -134,6 +153,24 @@ func ExecuteNonStreamStartedWithRetry(ctx context.Context, ds DeepSeekCaller, a
retryMax = shared.EmptyOutputRetryMaxAttempts()
}
if !opts.RetryEnabled || !assistantturn.ShouldRetryEmptyOutput(turn, attempts, retryMax) {
if canRetryOnAlternateAccount(ctx, a, turn.Error, opts.RetryEnabled, &accountSwitchAttempted) {
switched, switchErr := startStandardCompletionOnAlternateAccount(ctx, ds, a, stdReq, maxAttempts)
if switchErr != nil {
return NonStreamResult{SessionID: sessionID, Payload: payload, Turn: turn, Attempts: attempts}, switchErr
}
if switched.Response != nil {
config.Logger.Info("[completion_runtime_account_switch_retry] retrying after 429", "surface", stdReq.Surface, "stream", false, "account", a.AccountID)
sessionID = switched.SessionID
payload = switched.Payload
pow = switched.Pow
currentResp = switched.Response
usagePrompt = stdReq.PromptTokenText
accumulatedThinking = ""
accumulatedRawThinking = ""
accumulatedToolDetectionThinking = ""
continue
}
}
return NonStreamResult{SessionID: sessionID, Payload: payload, Turn: turn, Attempts: attempts}, turn.Error
}
@@ -154,6 +191,37 @@ func ExecuteNonStreamStartedWithRetry(ctx context.Context, ds DeepSeekCaller, a
}
}
func canRetryOnAlternateAccount(ctx context.Context, a *auth.RequestAuth, outErr *assistantturn.OutputError, retryEnabled bool, attempted *bool) bool {
if outErr == nil || outErr.Status != http.StatusTooManyRequests {
return false
}
if !retryEnabled || attempted == nil || *attempted {
return false
}
if a == nil || !a.UseConfigToken {
return false
}
*attempted = true
return a.SwitchAccount(ctx)
}
func startStandardCompletionOnAlternateAccount(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, stdReq promptcompat.StandardRequest, maxAttempts int) (StartResult, *assistantturn.OutputError) {
sessionID, err := ds.CreateSession(ctx, a, maxAttempts)
if err != nil {
return StartResult{}, authOutputError(a)
}
pow, err := ds.GetPow(ctx, a, maxAttempts)
if err != nil {
return StartResult{SessionID: sessionID}, &assistantturn.OutputError{Status: http.StatusUnauthorized, Message: "Failed to get PoW (invalid token or unknown error).", Code: "error"}
}
payload := stdReq.CompletionPayload(sessionID)
resp, err := ds.CallCompletion(ctx, a, payload, pow, maxAttempts)
if err != nil {
return StartResult{SessionID: sessionID, Payload: payload, Pow: pow}, &assistantturn.OutputError{Status: http.StatusInternalServerError, Message: "Failed to get completion.", Code: "error"}
}
return StartResult{SessionID: sessionID, Payload: payload, Pow: pow, Response: resp, Request: stdReq}, nil
}
func collectAttempt(resp *http.Response, stdReq promptcompat.StandardRequest, usagePrompt string, opts Options) (assistantturn.Turn, *assistantturn.OutputError) {
defer func() {
if err := resp.Body.Close(); err != nil {

View File

@@ -7,15 +7,19 @@ import (
"strings"
"testing"
"ds2api/internal/account"
"ds2api/internal/auth"
"ds2api/internal/config"
dsclient "ds2api/internal/deepseek/client"
"ds2api/internal/promptcompat"
)
type fakeDeepSeekCaller struct {
responses []*http.Response
payloads []map[string]any
uploads []dsclient.UploadFileRequest
responses []*http.Response
payloads []map[string]any
uploads []dsclient.UploadFileRequest
completionAccounts []string
sessionByAccount bool
}
type currentInputRuntimeConfig struct{}
@@ -23,7 +27,10 @@ type currentInputRuntimeConfig struct{}
func (currentInputRuntimeConfig) CurrentInputFileEnabled() bool { return true }
func (currentInputRuntimeConfig) CurrentInputFileMinChars() int { return 0 }
func (f *fakeDeepSeekCaller) CreateSession(context.Context, *auth.RequestAuth, int) (string, error) {
func (f *fakeDeepSeekCaller) CreateSession(_ context.Context, a *auth.RequestAuth, _ int) (string, error) {
if f.sessionByAccount && a != nil && a.AccountID != "" {
return "session-" + a.AccountID, nil
}
return "session-1", nil
}
@@ -36,8 +43,11 @@ func (f *fakeDeepSeekCaller) UploadFile(_ context.Context, _ *auth.RequestAuth,
return &dsclient.UploadFileResult{ID: "file-runtime-1"}, nil
}
func (f *fakeDeepSeekCaller) CallCompletion(_ context.Context, _ *auth.RequestAuth, payload map[string]any, _ string, _ int) (*http.Response, error) {
func (f *fakeDeepSeekCaller) CallCompletion(_ context.Context, a *auth.RequestAuth, payload map[string]any, _ string, _ int) (*http.Response, error) {
f.payloads = append(f.payloads, payload)
if a != nil {
f.completionAccounts = append(f.completionAccounts, a.AccountID)
}
if len(f.responses) == 0 {
return sseHTTPResponse(http.StatusOK, `data: {"p":"response/content","v":"fallback"}`), nil
}
@@ -89,6 +99,69 @@ func TestExecuteNonStreamWithRetryBuildsCanonicalTurn(t *testing.T) {
}
}
func TestExecuteNonStreamWithRetrySwitchesManagedAccountBeforeFinal429(t *testing.T) {
t.Setenv("DS2API_CONFIG_JSON", `{
"keys":["managed-key"],
"accounts":[
{"email":"acc1@test.com","password":"pwd"},
{"email":"acc2@test.com","password":"pwd"}
]
}`)
store := config.LoadStore()
resolver := auth.NewResolver(store, account.NewPool(store), func(_ context.Context, acc config.Account) (string, error) {
return "token-" + acc.Identifier(), nil
})
req, _ := http.NewRequest(http.MethodPost, "/", nil)
req.Header.Set("Authorization", "Bearer managed-key")
a, err := resolver.Determine(req)
if err != nil {
t.Fatalf("determine failed: %v", err)
}
defer resolver.Release(a)
ds := &fakeDeepSeekCaller{
sessionByAccount: true,
responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":11,"p":"response/thinking_content","v":"first empty"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":12,"p":"response/thinking_content","v":"retry empty"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":21,"p":"response/content","v":"ok from second account"}`),
},
}
stdReq := promptcompat.StandardRequest{
Surface: "test",
ResponseModel: "deepseek-v4-flash",
PromptTokenText: "prompt",
FinalPrompt: "final prompt",
Thinking: true,
}
result, outErr := ExecuteNonStreamWithRetry(context.Background(), ds, a, stdReq, Options{RetryEnabled: true})
if outErr != nil {
t.Fatalf("unexpected output error after account switch retry: %#v", outErr)
}
if result.Turn.Text != "ok from second account" {
t.Fatalf("text mismatch after switch retry: %q", result.Turn.Text)
}
if result.SessionID != "session-acc2@test.com" {
t.Fatalf("expected switched account session, got %q", result.SessionID)
}
wantAccounts := []string{"acc1@test.com", "acc1@test.com", "acc2@test.com"}
if len(ds.completionAccounts) != len(wantAccounts) {
t.Fatalf("completion account count mismatch: got %v want %v", ds.completionAccounts, wantAccounts)
}
for i, want := range wantAccounts {
if ds.completionAccounts[i] != want {
t.Fatalf("completion account %d = %q want %q (all=%v)", i, ds.completionAccounts[i], want, ds.completionAccounts)
}
}
if got := ds.payloads[2]["chat_session_id"]; got != "session-acc2@test.com" {
t.Fatalf("switched payload session mismatch: %#v", got)
}
if prompt, _ := ds.payloads[2]["prompt"].(string); strings.Contains(prompt, "Previous reply had no visible output") {
t.Fatalf("expected fresh switched-account prompt without empty-output suffix, got %q", prompt)
}
}
func TestExecuteNonStreamWithRetryUsesParentMessageForEmptyRetry(t *testing.T) {
ds := &fakeDeepSeekCaller{responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":77,"p":"response/thinking_content","v":"plan"}`),

View File

@@ -6,6 +6,7 @@ import (
"net/http"
"strings"
"ds2api/internal/assistantturn"
"ds2api/internal/auth"
"ds2api/internal/config"
"ds2api/internal/httpapi/openai/shared"
@@ -27,6 +28,7 @@ type StreamRetryHooks struct {
OnRetry func(attempts int)
OnRetryPrompt func(prompt string)
OnRetryFailure func(status int, message, code string)
OnAccountSwitch func(sessionID string)
OnTerminal func(attempts int)
}
@@ -48,16 +50,48 @@ func ExecuteStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.Requ
}
attempts := 0
accountSwitchAttempted := false
currentResp := initialResp
currentPayload := clonePayload(payload)
for {
terminalWritten, retryable := hooks.ConsumeAttempt(currentResp, opts.RetryEnabled && attempts < retryMax)
allowAccountSwitch := opts.RetryEnabled && attempts >= retryMax && !accountSwitchAttempted && a != nil && a.UseConfigToken
terminalWritten, retryable := hooks.ConsumeAttempt(currentResp, opts.RetryEnabled && (attempts < retryMax || allowAccountSwitch))
if terminalWritten {
if hooks.OnTerminal != nil {
hooks.OnTerminal(attempts)
}
return
}
if !retryable || !opts.RetryEnabled || attempts >= retryMax {
if !retryable || !opts.RetryEnabled {
if hooks.Finalize != nil {
hooks.Finalize(attempts)
}
return
}
if attempts >= retryMax {
if canRetryOnAlternateAccount(ctx, a, &assistantturn.OutputError{Status: http.StatusTooManyRequests}, opts.RetryEnabled, &accountSwitchAttempted) {
switched, switchErr := startPayloadCompletionOnAlternateAccount(ctx, ds, a, payload, maxAttempts)
if switchErr != nil {
if hooks.OnRetryFailure != nil {
hooks.OnRetryFailure(switchErr.Status, switchErr.Message, switchErr.Code)
}
return
}
if switched.Response != nil {
config.Logger.Info("[completion_runtime_account_switch_retry] retrying after 429", "surface", surface, "stream", opts.Stream, "account", a.AccountID)
currentResp = switched.Response
currentPayload = switched.Payload
pow = switched.Pow
if hooks.OnAccountSwitch != nil {
hooks.OnAccountSwitch(switched.SessionID)
}
if hooks.OnRetryPrompt != nil {
hooks.OnRetryPrompt(opts.UsagePrompt)
}
continue
}
}
if hooks.Finalize != nil {
hooks.Finalize(attempts)
}
@@ -75,7 +109,7 @@ func ExecuteStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.Requ
config.Logger.Warn("[completion_runtime_empty_retry] retry PoW fetch failed, falling back to original PoW", "surface", surface, "stream", opts.Stream, "retry_attempt", attempts, "error", powErr)
retryPow = pow
}
nextResp, err := ds.CallCompletion(ctx, a, shared.ClonePayloadForEmptyOutputRetry(payload, parentMessageID), retryPow, maxAttempts)
nextResp, err := ds.CallCompletion(ctx, a, shared.ClonePayloadForEmptyOutputRetry(currentPayload, parentMessageID), retryPow, maxAttempts)
if err != nil {
if hooks.OnRetryFailure != nil {
hooks.OnRetryFailure(http.StatusInternalServerError, "Failed to get completion.", "error")
@@ -108,6 +142,33 @@ func ExecuteStreamWithRetry(ctx context.Context, ds DeepSeekCaller, a *auth.Requ
}
}
func startPayloadCompletionOnAlternateAccount(ctx context.Context, ds DeepSeekCaller, a *auth.RequestAuth, payload map[string]any, maxAttempts int) (StartResult, *assistantturn.OutputError) {
sessionID, err := ds.CreateSession(ctx, a, maxAttempts)
if err != nil {
return StartResult{}, authOutputError(a)
}
pow, err := ds.GetPow(ctx, a, maxAttempts)
if err != nil {
return StartResult{SessionID: sessionID}, &assistantturn.OutputError{Status: http.StatusUnauthorized, Message: "Failed to get PoW (invalid token or unknown error).", Code: "error"}
}
nextPayload := clonePayload(payload)
nextPayload["chat_session_id"] = sessionID
delete(nextPayload, "parent_message_id")
resp, err := ds.CallCompletion(ctx, a, nextPayload, pow, maxAttempts)
if err != nil {
return StartResult{SessionID: sessionID, Payload: nextPayload, Pow: pow}, &assistantturn.OutputError{Status: http.StatusInternalServerError, Message: "Failed to get completion.", Code: "error"}
}
return StartResult{SessionID: sessionID, Payload: nextPayload, Pow: pow, Response: resp}, nil
}
func clonePayload(payload map[string]any) map[string]any {
clone := make(map[string]any, len(payload))
for k, v := range payload {
clone[k] = v
}
return clone
}
func closeRetryBody(surface string, body io.Closer) {
if body == nil {
return

View File

@@ -7,7 +7,9 @@ import (
"strings"
"testing"
"ds2api/internal/account"
"ds2api/internal/auth"
"ds2api/internal/config"
"ds2api/internal/httpapi/openai/shared"
)
@@ -60,3 +62,89 @@ func TestExecuteStreamWithRetryUsesSharedRetryPayloadAndUsagePrompt(t *testing.T
t.Fatalf("expected retry suffix in usage prompt, got %q", retryPrompt)
}
}
func TestExecuteStreamWithRetrySwitchesManagedAccountBeforeFinal429(t *testing.T) {
t.Setenv("DS2API_CONFIG_JSON", `{
"keys":["managed-key"],
"accounts":[
{"email":"acc1@test.com","password":"pwd"},
{"email":"acc2@test.com","password":"pwd"}
]
}`)
store := config.LoadStore()
resolver := auth.NewResolver(store, account.NewPool(store), func(_ context.Context, acc config.Account) (string, error) {
return "token-" + acc.Identifier(), nil
})
req, _ := http.NewRequest(http.MethodPost, "/", nil)
req.Header.Set("Authorization", "Bearer managed-key")
a, err := resolver.Determine(req)
if err != nil {
t.Fatalf("determine failed: %v", err)
}
defer resolver.Release(a)
ds := &fakeDeepSeekCaller{
sessionByAccount: true,
responses: []*http.Response{
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":12,"p":"response/thinking_content","v":"retry empty"}`),
sseHTTPResponse(http.StatusOK, `data: {"response_message_id":21,"p":"response/content","v":"ok from second account"}`),
},
}
initial := sseHTTPResponse(http.StatusOK, `data: {"response_message_id":11,"p":"response/thinking_content","v":"first empty"}`)
payload := map[string]any{"prompt": "original prompt", "chat_session_id": "session-acc1@test.com"}
attemptsSeen := 0
switchedSession := ""
ExecuteStreamWithRetry(context.Background(), ds, a, initial, payload, "pow", StreamRetryOptions{
Surface: "test.stream",
Stream: true,
RetryEnabled: true,
RetryMaxAttempts: 1,
UsagePrompt: "original prompt",
}, StreamRetryHooks{
ConsumeAttempt: func(resp *http.Response, allowDeferEmpty bool) (bool, bool) {
defer func() {
if err := resp.Body.Close(); err != nil {
t.Fatalf("close failed: %v", err)
}
}()
body, _ := io.ReadAll(resp.Body)
attemptsSeen++
if strings.Contains(string(body), "ok from second account") {
return true, false
}
if !allowDeferEmpty {
t.Fatalf("expected empty attempt %d to be deferred before final 429", attemptsSeen)
}
return false, true
},
ParentMessageID: func() int {
return 11 + attemptsSeen
},
OnAccountSwitch: func(sessionID string) {
switchedSession = sessionID
},
})
if attemptsSeen != 3 {
t.Fatalf("expected three stream attempts, got %d", attemptsSeen)
}
if switchedSession != "session-acc2@test.com" {
t.Fatalf("expected switched session id, got %q", switchedSession)
}
wantAccounts := []string{"acc1@test.com", "acc2@test.com"}
if len(ds.completionAccounts) != len(wantAccounts) {
t.Fatalf("completion accounts mismatch: got %v want %v", ds.completionAccounts, wantAccounts)
}
for i, want := range wantAccounts {
if ds.completionAccounts[i] != want {
t.Fatalf("completion account %d = %q want %q (all=%v)", i, ds.completionAccounts[i], want, ds.completionAccounts)
}
}
if got := ds.payloads[1]["chat_session_id"]; got != "session-acc2@test.com" {
t.Fatalf("switched payload session mismatch: %#v", got)
}
if prompt, _ := ds.payloads[1]["prompt"].(string); strings.Contains(prompt, shared.EmptyOutputRetrySuffix) {
t.Fatalf("expected switched-account prompt without empty-output suffix, got %q", prompt)
}
}

View File

@@ -66,7 +66,7 @@ func (h *Handler) handleNonStreamWithRetry(w http.ResponseWriter, ctx context.Co
writeJSON(w, http.StatusOK, respBody)
}
func (h *Handler) handleStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, completionID, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, historySession *chatHistorySession) {
func (h *Handler) handleStreamWithRetry(w http.ResponseWriter, r *http.Request, a *auth.RequestAuth, resp *http.Response, payload map[string]any, pow, completionID string, sessionIDRef *string, model, finalPrompt string, refFileTokens int, thinkingEnabled, searchEnabled bool, toolNames []string, toolsRaw any, toolChoice promptcompat.ToolChoicePolicy, historySession *chatHistorySession) {
streamRuntime, initialType, ok := h.prepareChatStreamRuntime(w, resp, completionID, model, finalPrompt, refFileTokens, thinkingEnabled, searchEnabled, toolNames, toolsRaw, toolChoice, historySession)
if !ok {
return
@@ -96,6 +96,11 @@ func (h *Handler) handleStreamWithRetry(w http.ResponseWriter, r *http.Request,
OnRetryFailure: func(status int, message, code string) {
failChatStreamRetry(streamRuntime, historySession, status, message, code)
},
OnAccountSwitch: func(sessionID string) {
if sessionIDRef != nil {
*sessionIDRef = sessionID
}
},
OnTerminal: func(attempts int) {
logChatStreamTerminal(streamRuntime, attempts)
},

View File

@@ -114,7 +114,7 @@ func (h *Handler) ChatCompletions(w http.ResponseWriter, r *http.Request) {
}
streamReq := start.Request
refFileTokens := streamReq.RefFileTokens
h.handleStreamWithRetry(w, r, a, start.Response, start.Payload, start.Pow, sessionID, streamReq.ResponseModel, streamReq.PromptTokenText, refFileTokens, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.ToolChoice, historySession)
h.handleStreamWithRetry(w, r, a, start.Response, start.Payload, start.Pow, sessionID, &sessionID, streamReq.ResponseModel, streamReq.PromptTokenText, refFileTokens, streamReq.Thinking, streamReq.Search, streamReq.ToolNames, streamReq.ToolsRaw, streamReq.ToolChoice, historySession)
}
func (h *Handler) autoDeleteRemoteSession(ctx context.Context, a *auth.RequestAuth, sessionID string) {