From f2ad888de4157a124d0f32838c36236357738730 Mon Sep 17 00:00:00 2001
From: CJACK
Date: Sun, 5 Apr 2026 18:16:31 +0800
Subject: [PATCH] refactor: clean up config schema by removing legacy toolcall fields, standardizing auto_delete mode, and updating admin API documentation.

---
 API.en.md           | 21 +++++++++++++++++----
 API.md              | 20 +++++++++++++++-----
 README.MD           | 16 ++++++++--------
 README.en.md        | 13 ++++++++-----
 config.example.json |  9 +++------
 docs/DEPLOY.en.md   |  2 +-
 docs/DEPLOY.md      |  2 +-
 7 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/API.en.md b/API.en.md
index eb56949..efdc5b7 100644
--- a/API.en.md
+++ b/API.en.md
@@ -355,7 +355,8 @@ data: [DONE]
 ```
 
 If `tool_choice=required` is violated in stream mode, DS2API emits `response.failed` then `[DONE]` (no `response.completed`).
-Unknown tool names (outside declared `tools`) are rejected and will not be emitted as valid tool calls.
+
+> Current behavior: the parser tries to extract structured tool calls and does not enforce a hard allow-list reject; your tool executor should still validate against a whitelist before executing.
 
 ### `GET /v1/responses/{response_id}`
 
@@ -641,6 +642,7 @@ Reads runtime settings and status, including:
 - `success`
 - `admin` (`has_password_hash`, `jwt_expire_hours`, `jwt_valid_after_unix`, `default_password_warning`)
 - `runtime` (`account_max_inflight`, `account_max_queue`, `global_max_inflight`, `token_refresh_interval_hours`)
+- `compat` (`wide_input_strict_output`, `strip_reference_markers`)
 - `responses` / `embeddings`
 - `auto_delete` (`mode`: `none` / `single` / `all`; legacy `sessions=true` is still treated as `all`)
 - `claude_mapping` / `model_aliases`
@@ -653,6 +655,7 @@ Hot-updates runtime settings. Supported fields:
 
 - `admin.jwt_expire_hours`
 - `runtime.account_max_inflight` / `runtime.account_max_queue` / `runtime.global_max_inflight` / `runtime.token_refresh_interval_hours`
+- `compat.wide_input_strict_output` / `compat.strip_reference_markers`
 - `responses.store_ttl_seconds`
 - `embeddings.provider`
 - `auto_delete.mode`
@@ -683,6 +686,8 @@ The request can send config directly, or wrapped as `{"config": {...}, "mode":"m
 Query params `?mode=merge` / `?mode=replace` are also supported.
 Import accepts `keys`, `accounts`, `claude_mapping` / `claude_model_mapping`, `model_aliases`, `admin`, `runtime`, `responses`, `embeddings`, and `auto_delete`; legacy `toolcall` fields are ignored.
 
+> `compat` fields are managed via `/admin/settings` or the config file; this import endpoint does not update `compat`.
+
 ### `GET /admin/config/export`
 
 Exports full config in three forms: `config`, `json`, and `base64`.
@@ -757,17 +762,25 @@ Returned items also include `test_status`, usually `ok` or `failed`.
   "available_accounts": ["a@example.com"],
   "in_use_accounts": ["b@example.com"],
   "max_inflight_per_account": 2,
-  "recommended_concurrency": 8
+  "global_max_inflight": 8,
+  "recommended_concurrency": 8,
+  "waiting": 0,
+  "max_queue_size": 8
 }
 ```
 
 | Field | Description |
 | --- | --- |
-| `available` | Currently available accounts |
-| `in_use` | Currently in-use accounts |
+| `available` | Accounts that still have spare inflight capacity |
+| `in_use` | Number of occupied in-flight slots |
 | `total` | Total accounts |
+| `available_accounts` | List of account IDs with remaining inflight capacity |
+| `in_use_accounts` | List of account IDs currently in use |
 | `max_inflight_per_account` | Per-account inflight limit |
+| `global_max_inflight` | Global inflight limit |
 | `recommended_concurrency` | Suggested concurrency (`total × max_inflight_per_account`) |
+| `waiting` | Number of queued requests currently waiting |
+| `max_queue_size` | Waiting queue limit |
 
 ### `POST /admin/accounts/test`
 
diff --git a/API.md b/API.md
index 8ccfd26..dd6dcb0 100644
--- a/API.md
+++ b/API.md
@@ -508,8 +508,6 @@ data: {"type":"message_stop"}
 }
 ```
 
-返回项还会包含 `test_status`,当前值通常为 `ok` 或 `failed`。
-
 ---
 
 ## Gemini 兼容接口
@@ -650,6 +648,7 @@ data: {"type":"message_stop"}
 - `success`
 - `admin`(`has_password_hash`、`jwt_expire_hours`、`jwt_valid_after_unix`、`default_password_warning`)
 - `runtime`(`account_max_inflight`、`account_max_queue`、`global_max_inflight`、`token_refresh_interval_hours`)
+- `compat`(`wide_input_strict_output`、`strip_reference_markers`)
 - `responses` / `embeddings`
 - `auto_delete`(`mode`:`none` / `single` / `all`;旧配置 `sessions=true` 仍按 `all` 处理)
 - `claude_mapping` / `model_aliases`
@@ -662,6 +661,7 @@ data: {"type":"message_stop"}
 
 - `admin.jwt_expire_hours`
 - `runtime.account_max_inflight` / `runtime.account_max_queue` / `runtime.global_max_inflight` / `runtime.token_refresh_interval_hours`
+- `compat.wide_input_strict_output` / `compat.strip_reference_markers`
 - `responses.store_ttl_seconds`
 - `embeddings.provider`
 - `auto_delete.mode`
@@ -692,6 +692,8 @@ data: {"type":"message_stop"}
 也支持在查询参数里传 `?mode=merge` / `?mode=replace`。
 导入时会接受 `keys`、`accounts`、`claude_mapping` / `claude_model_mapping`、`model_aliases`、`admin`、`runtime`、`responses`、`embeddings`、`auto_delete` 等字段;`toolcall` 相关字段会被忽略。
 
+> `compat` 相关字段请通过 `/admin/settings` 或配置文件管理;该导入接口不会更新 `compat`。
+
 ### `GET /admin/config/export`
 
 导出完整配置,返回 `config`、`json`、`base64` 三种格式。
@@ -764,17 +766,25 @@ data: {"type":"message_stop"}
   "available_accounts": ["a@example.com"],
   "in_use_accounts": ["b@example.com"],
   "max_inflight_per_account": 2,
-  "recommended_concurrency": 8
+  "global_max_inflight": 8,
+  "recommended_concurrency": 8,
+  "waiting": 0,
+  "max_queue_size": 8
 }
 ```
 
 | 字段 | 说明 |
 | --- | --- |
-| `available` | 当前可用账号数 |
-| `in_use` | 当前使用中的账号数 |
+| `available` | 仍有剩余并发槽位的账号数 |
+| `in_use` | 当前已占用的 in-flight 槽位数 |
 | `total` | 总账号数 |
+| `available_accounts` | 仍有剩余并发槽位的账号 ID 列表 |
+| `in_use_accounts` | 当前处于使用中的账号 ID 列表 |
 | `max_inflight_per_account` | 每账号并发上限 |
+| `global_max_inflight` | 全局并发上限 |
 | `recommended_concurrency` | 建议并发值(`total × max_inflight_per_account`) |
+| `waiting` | 当前等待中的请求数 |
+| `max_queue_size` | 等待队列上限 |
 
 ### `POST /admin/accounts/test`
 
diff --git a/README.MD b/README.MD
index e9d9123..cf5f5c4 100644
--- a/README.MD
+++ b/README.MD
@@ -76,7 +76,7 @@ flowchart LR
 - **前端**:React 管理台(`webui/`),运行时托管静态构建产物
 - **部署**:本地运行、Docker、Vercel Serverless、Linux systemd
 
-### 3.0 底层架构调整(相较旧版本)
+### 3.X 底层架构调整(相较旧版本)
 
 - **统一路由内核**:所有协议入口统一汇聚到 `internal/server/router.go`,并在同一路由树中注册 OpenAI / Claude / Gemini / Admin / WebUI 路由,避免多入口行为漂移。
 - **统一执行链路**:Claude / Gemini 入口先经 `internal/translatorcliproxy` 做协议转换,再进入 `openai.ChatCompletions` 统一处理工具调用与流式语义,最后再转换回原协议响应。
@@ -111,7 +111,6 @@ flowchart LR
 | P0 | Anthropic SDK(messages) | ✅ |
 | P0 | Google Gemini SDK(generateContent) | ✅ |
 | P1 | LangChain / LlamaIndex / OpenWebUI(OpenAI 兼容接入) | ✅ |
-| P2 | MCP 独立桥接层 | 规划中 |
 
 ## 模型支持
 
@@ -289,7 +288,8 @@ cp opencode.json.example opencode.json
     "o3": "deepseek-reasoner"
   },
   "compat": {
-    "wide_input_strict_output": true
+    "wide_input_strict_output": true,
+    "strip_reference_markers": true
   },
   "responses": {
     "store_ttl_seconds": 900
@@ -311,7 +311,7 @@ cp opencode.json.example opencode.json
     "token_refresh_interval_hours": 6
   },
   "auto_delete": {
-    "sessions": false
+    "mode": "none"
   }
 }
 ```
@@ -321,7 +321,8 @@ cp opencode.json.example opencode.json
 - `token`:配置文件中即使填写也会在加载时被清空(不会从 `config.json` 读取 token);实际 token 仅在运行时内存中维护并自动刷新
 - `model_aliases`:常见模型名(如 GPT/Codex/Claude)到 DeepSeek 模型的映射
 - `compat.wide_input_strict_output`:建议保持 `true`(当前实现默认宽进严出)
-- `toolcall`:策略已固定为特征匹配 + 高置信早发,不再作为可配置项
+- `compat.strip_reference_markers`:建议保持 `true`,用于清理可见输出中的引用/标记
+- `toolcall`:旧字段,当前实现已固定为特征匹配 + 高置信早发;即使保留在配置里也会被忽略
 - `responses.store_ttl_seconds`:`/v1/responses/{id}` 的内存缓存 TTL
 - `embeddings.provider`:embedding 提供方(当前内置 `deterministic/mock/builtin`)
 - `claude_mapping`:字典中 `fast`/`slow` 后缀映射到对应 DeepSeek 模型(兼容读取 `claude_model_mapping`)
@@ -352,9 +353,6 @@ cp opencode.json.example opencode.json
 | `DS2API_GLOBAL_MAX_INFLIGHT` | 全局最大 in-flight 请求数 | `recommended_concurrency` |
 | `DS2API_VERCEL_INTERNAL_SECRET` | Vercel 混合流式内部鉴权密钥 | 回退用 `DS2API_ADMIN_KEY` |
 | `DS2API_VERCEL_STREAM_LEASE_TTL_SECONDS` | 流式 lease 过期秒数 | `900` |
-| `DS2API_DEV_PACKET_CAPTURE` | 本地开发抓包开关(记录最近会话请求/响应体) | 本地非 Vercel 默认开启 |
-| `DS2API_DEV_PACKET_CAPTURE_LIMIT` | 本地抓包保留条数(超出自动淘汰) | `5` |
-| `DS2API_DEV_PACKET_CAPTURE_MAX_BODY_BYTES` | 单条响应体最大记录字节数 | `2097152` |
 | `VERCEL_TOKEN` | Vercel 同步 token | — |
 | `VERCEL_PROJECT_ID` | Vercel 项目 ID | — |
 | `VERCEL_TEAM_ID` | Vercel 团队 ID | — |
@@ -450,6 +448,7 @@ ds2api/
 │   ├── deepseek/            # DeepSeek API 客户端、PoW WASM
 │   ├── js/                  # Node 运行时流式处理与兼容逻辑
 │   ├── devcapture/          # 开发抓包模块
+│   ├── rawsample/           # 原始流样本可见文本提取与回放辅助
 │   ├── format/              # 输出格式化
 │   ├── prompt/              # Prompt 构建
 │   ├── server/              # HTTP 路由与中间件(chi router)
@@ -471,6 +470,7 @@ ds2api/
 ├── tests/
 │   ├── compat/              # 兼容性测试夹具与期望输出
 │   ├── node/                # Node 侧单元测试(chat-stream / tool-sieve)
+│   ├── raw_stream_samples/  # 原始 SSE 样本与回放元数据
 │   └── scripts/             # 统一测试脚本入口(unit/e2e)
 ├── docs/                    # 部署 / 贡献 / 测试等辅助文档
 ├── static/admin/            # WebUI 构建产物(不提交到 Git)
diff --git a/README.en.md b/README.en.md
index 2435ec8..106ebe2 100644
--- a/README.en.md
+++ b/README.en.md
@@ -76,7 +76,7 @@ flowchart LR
 - **Frontend**: React admin panel (`webui/`), served as static build at runtime
 - **Deployment**: local run, Docker, Vercel serverless, Linux systemd
 
-### 3.0 Architecture Changes (vs older releases)
+### 3.X Architecture Changes (vs older releases)
 
 - **Unified routing core**: all protocol entries are now centralized through `internal/server/router.go`, with OpenAI / Claude / Gemini / Admin / WebUI routes registered in one tree to avoid multi-entry drift.
 - **Unified execution chain**: Claude/Gemini entries are translated by `internal/translatorcliproxy`, then executed through `openai.ChatCompletions` for shared tool-calling and stream semantics, then translated back to the client protocol.
@@ -111,7 +111,6 @@ flowchart LR
 | P0 | Anthropic SDK (messages) | ✅ |
 | P0 | Google Gemini SDK (generateContent) | ✅ |
 | P1 | LangChain / LlamaIndex / OpenWebUI (OpenAI-compatible integration) | ✅ |
-| P2 | MCP standalone bridge | Planned |
 
 ## Model Support
 
@@ -289,7 +288,8 @@ cp opencode.json.example opencode.json
     "o3": "deepseek-reasoner"
   },
   "compat": {
-    "wide_input_strict_output": true
+    "wide_input_strict_output": true,
+    "strip_reference_markers": true
   },
   "responses": {
     "store_ttl_seconds": 900
@@ -311,7 +311,7 @@ cp opencode.json.example opencode.json
     "token_refresh_interval_hours": 6
   },
   "auto_delete": {
-    "sessions": false
+    "mode": "none"
   }
 }
 ```
@@ -321,7 +321,8 @@ cp opencode.json.example opencode.json
 - `token`: Even if set in `config.json`, it is cleared during load (DS2API does not read persisted tokens from config); runtime tokens are maintained/refreshed in memory only
 - `model_aliases`: Map common model names (GPT/Codex/Claude) to DeepSeek models
 - `compat.wide_input_strict_output`: Keep `true` (current default policy)
-- `toolcall`: Fixed to feature matching + high-confidence early emit, no longer configurable
+- `compat.strip_reference_markers`: Keep `true`; it strips reference markers from visible output
+- `toolcall`: Legacy field; the current behavior is fixed to feature matching + high-confidence early emit, and any config value is ignored
 - `responses.store_ttl_seconds`: In-memory TTL for `/v1/responses/{id}`
 - `embeddings.provider`: Embeddings provider (`deterministic/mock/builtin` built-in)
 - `claude_mapping`: Maps `fast`/`slow` suffixes to corresponding DeepSeek models (still compatible with `claude_model_mapping`)
@@ -444,6 +445,7 @@ ds2api/
 │   ├── deepseek/            # DeepSeek API client, PoW WASM
 │   ├── js/                  # Node runtime stream/compat logic
 │   ├── devcapture/          # Dev packet capture module
+│   ├── rawsample/           # Visible-text extraction and replay helpers for raw stream samples
 │   ├── format/              # Output formatting
 │   ├── prompt/              # Prompt construction
 │   ├── server/              # HTTP routing and middleware (chi router)
@@ -465,6 +467,7 @@ ds2api/
 ├── tests/
 │   ├── compat/              # Compatibility fixtures and expected outputs
 │   ├── node/                # Node-side unit tests (chat-stream / tool-sieve)
+│   ├── raw_stream_samples/  # Raw SSE samples and replay metadata
 │   └── scripts/             # Unified test script entrypoints (unit/e2e)
 ├── docs/                    # Deployment / contributing / testing docs
 ├── static/admin/            # WebUI build output (not committed to Git)
diff --git a/config.example.json b/config.example.json
index 6cfbd73..f914050 100644
--- a/config.example.json
+++ b/config.example.json
@@ -31,10 +31,6 @@
     "wide_input_strict_output": true,
     "strip_reference_markers": true
   },
-  "toolcall": {
-    "mode": "feature_match",
-    "early_emit_confidence": "high"
-  },
   "responses": {
     "store_ttl_seconds": 900
   },
@@ -51,9 +47,10 @@
   "runtime": {
     "account_max_inflight": 2,
     "account_max_queue": 0,
-    "global_max_inflight": 0
+    "global_max_inflight": 0,
+    "token_refresh_interval_hours": 6
   },
   "auto_delete": {
-    "sessions": false
+    "mode": "none"
   }
 }
diff --git a/docs/DEPLOY.en.md b/docs/DEPLOY.en.md
index f7c2542..be2a86d 100644
--- a/docs/DEPLOY.en.md
+++ b/docs/DEPLOY.en.md
@@ -139,7 +139,7 @@ docker-compose up -d --build
 The `Dockerfile` now provides two image paths:
 
 1. **Default local/dev path (`runtime-from-source`)**: a three-stage build (WebUI build + Go build + runtime).
-2. **Release path (`runtime-from-dist`)**: CI first creates `dist/ds2api__linux_.tar.gz`, then Docker directly reuses the binary and `static/admin` assets from those release archives, without running `npm build`/`go build` again.
+2. **Release path (`runtime-from-dist`)**: the release workflow first creates tag-named release archives, then copies the Linux bundles to `dist/docker-input/linux_amd64.tar.gz` / `linux_arm64.tar.gz`; Docker consumes those prepared inputs directly, without rerunning `npm build`/`go build`.
 
 The release path keeps Docker images aligned with release archives and reduces duplicate build work.
 
diff --git a/docs/DEPLOY.md b/docs/DEPLOY.md
index ad9fff6..618ca4b 100644
--- a/docs/DEPLOY.md
+++ b/docs/DEPLOY.md
@@ -139,7 +139,7 @@ docker-compose up -d --build
 `Dockerfile` 提供两条构建路径:
 
 1. **本地/开发默认路径(`runtime-from-source`)**:三阶段构建(WebUI 构建 + Go 构建 + 运行阶段)。
-2. **Release 路径(`runtime-from-dist`)**:CI 先生成 `dist/ds2api__linux_.tar.gz`,再由 Docker 直接复用该发布包内的二进制和 `static/admin` 产物组装运行镜像,不再重复执行 `npm build`/`go build`。
+2. **Release 路径(`runtime-from-dist`)**:发布工作流先生成 tag 命名的 Release 压缩包,再把 Linux 产物复制成 `dist/docker-input/linux_amd64.tar.gz` / `linux_arm64.tar.gz`;Docker 构建阶段直接消费这些输入,不再重复执行 `npm build`/`go build`。
 
 Release 路径可确保 Docker 镜像与 release 压缩包使用同一套产物,减少重复构建带来的差异。
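
Note on migrating existing configs: the patch documents that legacy `toolcall` fields are ignored and that a legacy `auto_delete.sessions=true` is still treated as `mode: "all"`. The sketch below shows one way an operator might rewrite an old `config.json` into the new shape before deploying. It is not DS2API's loader: `normalize_config` and `VALID_MODES` are hypothetical names, and two behaviors are assumptions rather than documented facts (an explicit valid `mode` taking precedence over `sessions`, and anything unrecognized falling back to `none`).

```python
import copy

# Modes listed in the updated API docs for `auto_delete.mode`.
VALID_MODES = {"none", "single", "all"}


def normalize_config(raw: dict) -> dict:
    """Return a copy of `raw` with legacy fields mapped to the new schema.

    Hypothetical helper for illustration only; not part of DS2API.
    """
    cfg = copy.deepcopy(raw)

    # Legacy `toolcall` settings are ignored by the current implementation,
    # so dropping them keeps the file aligned with config.example.json.
    cfg.pop("toolcall", None)

    auto_delete = cfg.get("auto_delete") or {}
    mode = auto_delete.get("mode")
    if mode not in VALID_MODES:
        # Documented: legacy `sessions=true` is treated as `all`.
        # Assumption: every other case falls back to `none`.
        mode = "all" if auto_delete.get("sessions") is True else "none"
    cfg["auto_delete"] = {"mode": mode}
    return cfg


if __name__ == "__main__":
    old = {
        "toolcall": {"mode": "feature_match", "early_emit_confidence": "high"},
        "auto_delete": {"sessions": True},
    }
    print(normalize_config(old))
```

Writing the normalized file back is optional, since per the docs the legacy forms are still accepted at load time; the rewrite mainly keeps a deployed config consistent with the updated `config.example.json`.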