Compare commits

8 Commits
v4.6.0 ... main

Author SHA1 Message Date
CJACK.
8316cf8a03 Merge pull request #481 from CJackHwang/dev
[codex] fix WebUI static root path guard
2026-05-10 18:59:23 +08:00
CJACK
3569ae136a fix webui static root path guard 2026-05-10 18:55:57 +08:00
CJACK.
4f0210f163 Merge pull request #480 from CJackHwang/dev
Fix tool detection when unclosed backtick precedes tool call
2026-05-10 18:46:09 +08:00
CJACK
77a47ada4e Fix tool detection when unclosed backtick precedes tool call
Handles cases where a stray backtick opens an inline code span but is never closed.
Previously, any subsequent XML tool tag was treated as inside markdown code and ignored.
Now, tool tags are detected after an unclosed backtick, and the markdown state is reset
when the backtick is confirmed to be literal text at stream boundaries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-10 18:41:51 +08:00
CJACK.
8623920c89 Merge pull request #476 from CJackHwang/codex/fix-security-advisory-ghsa-rf34-c5jc-4ffw
[codex] fix security advisory and toolcall parsing issues
2026-05-10 18:06:24 +08:00
CJACK
e393110121 fix toolcall inline code and query redaction 2026-05-10 18:02:54 +08:00
CJACK
243860f6d3 bump version to 4.6.1 2026-05-10 17:02:40 +08:00
CJACK
03ea3728e7 fix security advisory issues 2026-05-10 17:01:22 +08:00
22 changed files with 1131 additions and 51 deletions
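
The backtick handling described in commit 77a47ada4e can be pictured as a small state machine carried across stream chunks. The sketch below is only an illustration under stated assumptions: `nextOpenTicks` is a hypothetical name, and fenced code blocks (which the real sieve tracks separately) are ignored here.

```javascript
// Hypothetical helper, not the project's API: given the backtick-run length of
// the inline code span still open from earlier chunks (0 when none is open),
// scan `chunk` and return the run length that is still open afterwards.
function nextOpenTicks(openTicks, chunk) {
  let ticks = openTicks;
  for (let i = 0; i < chunk.length;) {
    if (chunk[i] !== '`') {
      i += 1;
      continue;
    }
    let run = 0;
    while (chunk[i + run] === '`') run += 1;
    if (ticks === 0) {
      ticks = run; // a backtick run opens a span
    } else if (run === ticks) {
      ticks = 0; // only a run of the same length closes it
    }
    i += run;
  }
  return ticks;
}

// Usage: a span opened in one chunk and closed in the next.
let carried = 0;
for (const chunk of ['run `ls', ' -la` now']) {
  carried = nextOpenTicks(carried, chunk);
}
// carried is 0 here because the span closed in the second chunk.
```

If the value is still non-zero when the stream ends, the backtick never found a partner and was literal text, which is the case where the commit resets the markdown state before a final detection pass.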

API.md
View File

@@ -360,7 +360,7 @@ data: [DONE]
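- The parser currently treats all of the following as executable calls: the recommended half-width pipe DSML shell (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`); DSML wrapper aliases (`<dsml|tool_calls>`, `<|tool_calls>`); common forms that drop a DSML separator (such as `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`); common typos that fuse `DSML` with the tool tag name (`<DSMLtool_calls>` / `<DSMLinvoke>` / `<DSMLparameter>`); control-separator drift (such as `<DSML␂tool_calls>` / a raw STX `\x02`); CJK angle brackets, full-width exclamation marks, ideographic commas, PascalCase local names, curly-quoted attribute values, and trailing attribute separator drift (such as `<DSM|parameter name="command"|>...〈/DSM|parameter〉` / `<DSMLinvoke name=“Bash”>` / `<、DSML、tool_calls>` / `<DSmartToolCalls>` / `<DSMLtool_calls※>`); arbitrary protocol-prefix shells (such as `<proto💥tool_calls>`); and legacy canonical XML tool blocks (`<tool_calls>` / `<invoke name="...">` / `<parameter name="...">`). These non-structural separator shells are first normalized back to XML, and the content inside is still parsed with XML semantics. The CDATA opener also tolerates `<[CDATA[` / `<、[CDATA[`. Legacy `<tools>`, `<tool_call>`, `<tool_name>`, `<param>`, `<function_call>`, `tool_use`, antml-style, and bare JSON `tool_calls` fragments are all treated as plain text by default; a wrapper that is complete but malformed is likewise released as plain text.
- The parsing layer does not drop a tool call because a parameter value is empty; an explicit empty string or a whitespace-only parameter enters the structured `tool_calls` as an empty string. The prompt asks the model not to emit empty parameters on its own; rejecting missing parameters or empty commands is the job of the tool executor or of client-side schema validation.
- When the final visible body is empty but the chain of thought contains an executable tool call, Chat / Responses emit standard OpenAI `tool_calls` / `function_call` output during finalization; if the client has not enabled thinking / reasoning, that chain of thought is used only for detection and is not exposed as visible body text or as `reasoning_content`.
- `tool_calls` inside a Markdown fenced code block (for example ```json ... ```) is treated as example text only and is not executed.
- `tool_calls` inside a Markdown fenced code block (for example ```json ... ```) or an inline code span (for example `` `<tool_calls>...</tool_calls>` ``) is treated as example text only and is not executed.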
---
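
The inline code span rule above can be sketched in a few lines. This is a minimal illustration, not the project's parser: `codeSpanEnd` and `toolCallOffsetsOutsideCodeSpans` are hypothetical names, only the literal `<tool_calls>` opener is recognized, and fenced blocks and CDATA are left out.

```javascript
// Hypothetical helper: returns the index just past a closed inline code span
// that starts at `start`, or -1 when the backtick run there is never closed.
function codeSpanEnd(text, start) {
  if (text[start] !== '`') return -1;
  let open = 0;
  while (text[start + open] === '`') open += 1;
  let i = start + open;
  while (i < text.length) {
    if (text[i] !== '`') {
      i += 1;
      continue;
    }
    let run = 0;
    while (text[i + run] === '`') run += 1;
    if (run === open) return i + run; // closing run must match the opening length
    i += run;
  }
  return -1;
}

// Returns offsets of `<tool_calls>` openers that sit outside closed code spans.
function toolCallOffsetsOutsideCodeSpans(text) {
  const offsets = [];
  for (let i = 0; i < text.length;) {
    const end = codeSpanEnd(text, i);
    if (end >= 0) {
      i = end; // skip the whole span, including anything that looks like a tag
      continue;
    }
    if (text.startsWith('<tool_calls>', i)) offsets.push(i);
    i += 1;
  }
  return offsets;
}

toolCallOffsetsOutsideCodeSpans('see `<tool_calls>` docs');   // [] (example only)
toolCallOffsetsOutsideCodeSpans('stray ` then <tool_calls>'); // [13] (backtick never closes)
```

Requiring a matching closing run is what keeps a stray, unclosed backtick from hiding a real tool call that follows it.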

View File

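@@ -372,7 +372,7 @@ Gemini routes can also use `x-goog-api-key`, or, when no auth header is present, use `
When a request carries `tools`, DS2API applies leak prevention and structured translation:
1. Executable toolcall detection is enabled only in **non-code contexts** (code block examples do not trigger it by default)
1. Executable toolcall detection is enabled only in **non-Markdown-code contexts** (examples inside fenced code blocks and inline code spans do not trigger it by default)
2. The parsing layer currently treats the half-width pipe DSML shell as the recommended executable call form: `<|DSML|tool_calls>` → `<|DSML|invoke name="...">` → `<|DSML|parameter name="...">`; it also accepts legacy canonical XML `<tool_calls>` → `<invoke name="...">` → `<parameter name="...">`, plus a number of DSML prefix/separator drifts. DSML is only a shell alias, and the content inside is still parsed with XML semantics; legacy `<tools>` / `<tool_call>` / `<tool_name>` / `<param>`, `<function_call>`, `tool_use` / antml variants, and bare JSON `tool_calls` fragments are all treated as plain text, and a complete but malformed wrapper is also released as plain text
3. `responses` streaming strictly uses the official item lifecycle events (`response.output_item.*`, `response.content_part.*`, `response.function_call_arguments.*`)
4. `responses` supports and enforces `tool_choice` (`auto`/`none`/`required`/forced function); on a `required` violation, non-streaming returns `422` and streaming returns `response.failed`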
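
The statement in item 2 that DSML is only a shell alias can be illustrated with a deliberately narrow rewrite. This is a sketch under assumptions: `normalizeDsmlShell` is a hypothetical name, it handles only the exact `<|DSML|name` spelling, the closing form `</|DSML|name>` is assumed rather than taken from the text above, and the code-context checks from item 1 are omitted.

```javascript
// Hypothetical sketch: strip the half-width pipe DSML shell so the result is
// the canonical XML form. None of the tolerated drifts are handled here.
function normalizeDsmlShell(text) {
  return text.replace(/<(\/?)\|DSML\|(tool_calls|invoke|parameter)\b/g, '<$1$2');
}

const dsml =
  '<|DSML|tool_calls><|DSML|invoke name="read_file">' +
  '</|DSML|invoke></|DSML|tool_calls>';
const canonical = normalizeDsmlShell(dsml);
// canonical: '<tool_calls><invoke name="read_file"></invoke></tool_calls>'
```

After this step a single XML-style parser can serve both spellings, which matches the "shell alias" wording above.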

View File

@@ -1 +1 @@
4.6.0
4.6.1

View File

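@@ -168,7 +168,7 @@ After normalization and before the current input file, OpenAI Chat / Responses by default
4. Ordinary pass-through requests merge the "tool descriptions + format constraints" into the system prompt; if `current_input_file` is triggered, the tool descriptions/schema are uploaded separately as `DS2API_TOOLS.txt`, and both the live prompt and the system tool-format hint explicitly tell the model to treat `DS2API_TOOLS.txt` as the authoritative source for callable tools and parameter schemas.
Positive tool-call examples now demonstrate the half-width pipe DSML style first: `<|DSML|tool_calls>`, `<|DSML|invoke name="...">`, `<|DSML|parameter name="...">`.
The compatibility layer still accepts the legacy bare `<tool_calls>` wrapper and tolerates a number of DSML tag variants, including the hyphenated forms `<dsml-tool-calls>` / `<dsml-invoke>` / `<dsml-parameter>`, the underscore forms `<dsml_tool_calls>` / `<dsml_invoke>` / `<dsml_parameter>`, and other prefix-separator forms such as `<vendor|tool_calls>` / `<vendor_tool_calls>` / `<vendor - tool_calls>`. The tag-shell scan also normalizes full-width ASCII drift, for example `<|tool_calls>` with a full-width `` terminator, and tolerates CJK angle brackets, full-width exclamation marks or ideographic commas as separators, curly-quoted attribute values, PascalCase local names, and trailing attribute separator drift, for example `<DSM|parameter name="command"|>...〈/DSM|parameter〉`, `<DSMLinvoke name=“Bash”>`, `<、DSML、tool_calls>`, `<DSmartToolCalls>`, `<DSMLtool_calls※>`. More generally, the Go / Node tag scan keys on the fixed local tag names `tool_calls` / `invoke` / `parameter`; non-structural protocol separators before or after the tag name are stripped at the parser entry point, so control-character or non-ASCII separator drift such as `<DSML␂tool_calls>` or `<proto💥tool_calls>` is also normalized back to the existing XML tags and then goes through the same parser. Structural characters such as `<` / `>` / `/` / `=` / quotes, whitespace, and ASCII letters and digits are never treated as such separators. Before entering the existing DSML rewrite / XML parse, Go / Node also apply one narrow canonicalization pass to each "candidate span already recognized as a tool tag shell": it folds only the confusable characters in wrapper / `invoke` / `parameter` / `name` / `CDATA` / `DSML` and their shell separators, removes zero-width / BOM / control-character noise, and unifies quote, whitespace, and dash / underscore variants back into parseable tool syntax. This stage does not broadly rewrite ordinary body text, parameter content, example text inside CDATA, or other non-tool XML. The CDATA opener uses the same kind of scanning tolerance: `<![CDATA[` / `<[CDATA[` / `<、[CDATA[` are all handled as raw parameter containers. The prompt nevertheless asks the model to prefer the official DSML tags and stresses that it must not emit only the closing wrapper while omitting the opening tag. Note that this is "compatible with the DSML shell, with XML parsing semantics inside", not a native end-to-end DSML implementation. The parser first intercepts suspected tool wrappers outside code blocks, and releases them as plain text only if full parsing fails or the tool semantics are invalid.
The compatibility layer still accepts the legacy bare `<tool_calls>` wrapper and tolerates a number of DSML tag variants, including the hyphenated forms `<dsml-tool-calls>` / `<dsml-invoke>` / `<dsml-parameter>`, the underscore forms `<dsml_tool_calls>` / `<dsml_invoke>` / `<dsml_parameter>`, and other prefix-separator forms such as `<vendor|tool_calls>` / `<vendor_tool_calls>` / `<vendor - tool_calls>`. The tag-shell scan also normalizes full-width ASCII drift, for example `<|tool_calls>` with a full-width `` terminator, and tolerates CJK angle brackets, full-width exclamation marks or ideographic commas as separators, curly-quoted attribute values, PascalCase local names, and trailing attribute separator drift, for example `<DSM|parameter name="command"|>...〈/DSM|parameter〉`, `<DSMLinvoke name=“Bash”>`, `<、DSML、tool_calls>`, `<DSmartToolCalls>`, `<DSMLtool_calls※>`. More generally, the Go / Node tag scan keys on the fixed local tag names `tool_calls` / `invoke` / `parameter`; non-structural protocol separators before or after the tag name are stripped at the parser entry point, so control-character or non-ASCII separator drift such as `<DSML␂tool_calls>` or `<proto💥tool_calls>` is also normalized back to the existing XML tags and then goes through the same parser. Structural characters such as `<` / `>` / `/` / `=` / quotes, whitespace, and ASCII letters and digits are never treated as such separators. Before entering the existing DSML rewrite / XML parse, Go / Node also apply one narrow canonicalization pass to each "candidate span already recognized as a tool tag shell": it folds only the confusable characters in wrapper / `invoke` / `parameter` / `name` / `CDATA` / `DSML` and their shell separators, removes zero-width / BOM / control-character noise, and unifies quote, whitespace, and dash / underscore variants back into parseable tool syntax. This stage does not broadly rewrite ordinary body text, parameter content, Markdown inline code spans, example text inside CDATA, or other non-tool XML. The CDATA opener uses the same kind of scanning tolerance: `<![CDATA[` / `<[CDATA[` / `<、[CDATA[` are all handled as raw parameter containers. The prompt nevertheless asks the model to prefer the official DSML tags and stresses that it must not emit only the closing wrapper while omitting the opening tag. Note that this is "compatible with the DSML shell, with XML parsing semantics inside", not a native end-to-end DSML implementation. The parser first intercepts suspected tool wrappers outside Markdown code contexts, and releases them as plain text only if full parsing fails or the tool semantics are invalid.
Array parameters are expressed with `<item>...</item>` child nodes; when a parameter body contains only item children, the Go / Node parsers restore it to an array, so that parameters such as `questions` / `options` whose schema requires an array are not misparsed as a `{ "item": ... }` object. Beyond that, the parser also recovers some looser list forms, such as a JSON array literal or a comma-separated sequence of JSON items, as long as they are unambiguous enough; `<item>` remains the preferred form. If the model mistakenly wraps a complete structured XML fragment in CDATA, the compatibility layer tries to restore the CDATA XML fragment in non-raw fields to an object / array, while protecting raw-text fields such as `content` / `command`. However, if the CDATA holds only a single flat XML/HTML tag, such as the inline markup `<b>urgent</b>`, the compatibility layer keeps the original string and does not force it into an object / array; only CDATA fragments that clearly express structure, such as multiple sibling nodes, nested children, or an `item` list, trigger structured recovery. For long-text parameters such as `command` / `content`, Markdown-fenced DSML / XML examples inside CDATA are protected as raw text; a `]]></parameter>` or `</tool_calls>` inside the example does not truncate the outer tool call, and the parser keeps waiting for the real parameter / wrapper closing tag outside the fence.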
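
The "only item children" rule for array parameters can be sketched as follows. This is an illustration rather than the project's parser: `parseItemList` is a hypothetical name, items are kept as raw strings, and nested `<item>` elements, attributes, and CDATA are not handled.

```javascript
// Hypothetical sketch: return an array when the parameter body consists solely
// of <item>...</item> children (whitespace between them is allowed), otherwise
// return null so the caller can fall back to object or string handling.
function parseItemList(body) {
  const trimmed = body.trim();
  const items = [];
  const re = /<item>([\s\S]*?)<\/item>/g;
  let last = 0;
  let match;
  while ((match = re.exec(trimmed)) !== null) {
    // Anything other than whitespace between items disqualifies the body.
    if (trimmed.slice(last, match.index).trim() !== '') return null;
    items.push(match[1]);
    last = re.lastIndex;
  }
  if (items.length === 0 || trimmed.slice(last).trim() !== '') return null;
  return items;
}

parseItemList('<item>a</item>\n<item>b</item>'); // ['a', 'b']
parseItemList('<item>a</item> tail');            // null (not only items)
```

Returning `null` instead of a partial array keeps mixed bodies out of the array path, which is the distinction the paragraph above draws against the `{ "item": ... }` misparse.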
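On the Go side, reading DeepSeek SSE no longer depends on `bufio.Scanner`'s fixed 2MiB single-line limit; when a file-writing tool returns a very long `content` in a single `data:` line, non-streaming collection, streaming parsing, and auto-continue pass-through all keep the complete line before it enters the same tool parsing and serialization flow.
In the assistant's final response stage, if a tool parameter is explicitly declared as `string` in the schema, the compatibility layer recursively converts any number / bool / object / array on that path to a string before re-serializing the parsed `tool_calls` / `function_call` into the parameters visible to OpenAI / Responses / Claude; objects / arrays are compacted into a compact JSON string. This protection applies only to paths that the schema explicitly declares as string and does not rewrite parameters that are genuinely `number` / `boolean` / `object` / `array`. This accommodates cases where DeepSeek outputs a structured fragment while the upstream client's tool schema strictly requires string parameters (for example `content`, `prompt`, `path`, `taskId`).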
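
The schema-guided string coercion described above might look roughly like this. It is a sketch under assumptions: `coerceDeclaredStrings` is a hypothetical name, the schema is assumed to be plain JSON Schema with `type`, `properties`, and `items`, and `null` is left untouched because the text does not say how it is treated.

```javascript
// Hypothetical sketch: walk the parsed arguments alongside the declared schema
// and stringify values only where the schema says `type: 'string'`.
function coerceDeclaredStrings(value, schema) {
  if (!schema || typeof schema !== 'object') return value;
  if (schema.type === 'string') {
    if (typeof value === 'string' || value === null || value === undefined) {
      return value; // assumption: null / missing values are not rewritten
    }
    if (typeof value === 'object') return JSON.stringify(value); // compact JSON
    return String(value); // number / boolean
  }
  if (schema.type === 'object' && schema.properties &&
      value && typeof value === 'object' && !Array.isArray(value)) {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = coerceDeclaredStrings(child, schema.properties[key]);
    }
    return out;
  }
  if (schema.type === 'array' && Array.isArray(value)) {
    return value.map((child) => coerceDeclaredStrings(child, schema.items));
  }
  return value;
}

const schema = {
  type: 'object',
  properties: {
    content: { type: 'string' },
    count: { type: 'number' },
    tags: { type: 'array', items: { type: 'string' } },
  },
};
const coerced = coerceDeclaredStrings(
  { content: { a: 1 }, count: 3, tags: [1, true] },
  schema,
);
// coerced.content is the string '{"a":1}', coerced.count stays the number 3,
// and coerced.tags becomes ['1', 'true'].
```

Values whose path has no schema entry fall through unchanged, mirroring the statement that only explicitly declared string paths are affected.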

View File

@@ -62,12 +62,12 @@
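- A tool call that has been successfully recognized does not flow back into plain text
- Blocks that do not match the new format are not executed and continue to pass through as text unchanged
- If a confusable / drifted tool shell still forms a valid tool call after candidate-span canonicalization + repair, the suffix prose after the wrapper continues to be emitted as plain text; if it still fails wrapper-confidence or XML semantics after canonicalization, the whole block is released as plain text rather than being half swallowed and half leaked.
- XML examples inside fenced code blocks (backticks `` ``` `` and tildes `~~~`) are always treated as plain text
- XML examples inside fenced code blocks (backticks `` ``` `` and tildes `~~~`) and Markdown inline code spans (for example `` `<tool_calls>...</tool_calls>` ``) are always treated as plain text
- Nested fences (such as 3 backticks nested inside 4 backticks) and fence protection inside CDATA are supported
- For long-text parameters such as `command` / `content`, if the CDATA contains Markdown-fenced DSML / XML examples, the content continues to be kept as raw parameter text even when the example contains fragments that look like outer closing tags, such as `]]></parameter>` / `</tool_calls>`, until the real outer closing tag outside the fence is reached
- The CDATA opener is also recognized by scanning: besides the standard `<![CDATA[`, separator drift such as `<[CDATA[` and `<、[CDATA[` is accepted and uniformly restored to raw field content.
- If the model opens `<![CDATA[` but never closes it, the streaming scan stage still conservatively keeps buffering and does not mistake example XML inside the CDATA for a real tool call; in the final parse / flush recovery stage, this kind of loose CDATA gets a narrow repair that tries to preserve the real tool call already fully wrapped by the outer layer
- When the text mentions a tag name (such as `<dsml|tool_calls>`, or `<|DSML|tool_calls>` inside Markdown inline code) and a real tool call follows immediately, the sieve skips the unparseable mention candidate and keeps matching the real tool block that follows; the mention does not cause the tool call to be lost, and the body text after the mention is not truncated
- When the text mentions a tag name (such as `<dsml|tool_calls>`, or `<|DSML|tool_calls>` inside Markdown inline code) and a real tool call follows immediately, the sieve skips the unparseable mention candidate and keeps matching the real tool block that follows; a complete `<tool_calls>...</tool_calls>` example inside an inline code span is not executed either, the mention does not cause the tool call to be lost, and the body text after the mention is not truncated
- Go-side SSE reading no longer uses `bufio.Scanner`'s fixed token limit; when a single `data:` line contains very long file-writing parameters, non-streaming collection, streaming parsing, and auto-continue pass-through should all keep the complete line before handing it to the tool parser
In addition, if the value of a `<parameter>` is itself a valid JSON literal, it is parsed as a structured value rather than always being kept as a string. For example, `123`, `true`, `null`, `[1,2]`, and `{"a":1}` are restored to the corresponding number / boolean / null / array / object.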
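
The JSON-literal rule for `<parameter>` values can be sketched with `JSON.parse`. The function name `parseParameterValue` is hypothetical, and leaving quoted JSON strings untouched is an assumption, since the examples above cover only number / boolean / null / array / object.

```javascript
// Hypothetical sketch: interpret a parameter body as a structured value when it
// is a JSON number, boolean, null, array, or object; otherwise keep the raw text.
function parseParameterValue(raw) {
  const trimmed = raw.trim();
  if (trimmed === '') return raw;
  try {
    const parsed = JSON.parse(trimmed);
    // Assumption: a quoted JSON string is left as the original raw text.
    return typeof parsed === 'string' ? raw : parsed;
  } catch (err) {
    return raw; // not valid JSON, so it stays a plain string
  }
}

parseParameterValue('123');     // 123
parseParameterValue('[1,2]');   // [1, 2]
parseParameterValue('{"a":1}'); // { a: 1 }
parseParameterValue('hello');   // 'hello'
```

A client that needs the original text for a string-typed parameter would still rely on the schema-based string protection described elsewhere in these docs.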
@@ -111,6 +111,7 @@ go test -v -run 'TestParseToolCalls|TestProcessToolSieve' ./internal/toolcall ./
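- Mixed tags (DSML wrapper + canonical inner) parse normally after normalization
- Examples inside tilde fences `~~~` are not executed
- Examples inside nested fences (3 backticks nested inside 4 backticks) are not executed
- Complete tool-call examples inside Markdown inline code spans are not executed
- Text that mentions a tag name immediately followed by a real tool call (including the same wrapper variant)
- Empty parameters are preserved structurally; a malformed executable-looking XML wrapper is released as text
- Non-compatible content passes through as plain text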

View File

@@ -2,8 +2,6 @@
const CDATA_PATTERN = /^(?:<|〈)(?:!|)\[CDATA\[([\s\S]*?)]](?:>||〉)$/i;
const XML_ATTR_PATTERN = /\b([a-z0-9_:-]+)\s*=\s*("([^"]*)"|'([^']*)')/gi;
const XML_TOOL_CALLS_CLOSE_PATTERN = /[<][\/]tool_calls\s*[>]/gi;
const XML_INVOKE_START_PATTERN = /[<]invoke\b[^>]*\bname\s*[=]\s*(?:"([^"]*)"|'([^']*)'|“([^”]*)”|([^]*)|([^]*)|([^]*))/i;
const TOOL_MARKUP_NAMES = [
{ raw: 'tool_calls', canonical: 'tool_calls' },
{ raw: 'tool-calls', canonical: 'tool_calls', dsmlOnly: true },
@@ -71,6 +69,66 @@ function stripFencedCodeBlocks(text) {
return out.join('');
}
function stripMarkdownCodeSpans(text) {
const raw = toStringSafe(text);
if (!raw) {
return '';
}
let out = '';
for (let i = 0; i < raw.length;) {
const skipped = skipXmlIgnoredSection(raw, i);
if (skipped.blocked) {
out += raw.slice(i);
break;
}
if (skipped.advanced) {
out += raw.slice(i, skipped.next);
i = skipped.next;
continue;
}
const spanEnd = markdownCodeSpanEnd(raw, i);
if (spanEnd.ok) {
i = spanEnd.end;
continue;
}
out += raw[i];
i += 1;
}
return out;
}
function markdownCodeSpanEnd(text, start) {
const raw = toStringSafe(text);
if (start < 0 || start >= raw.length || raw[start] !== '`') {
return { ok: false, end: start };
}
const count = countLeadingChars(raw, start, '`');
if (!count) {
return { ok: false, end: start };
}
let search = start + count;
while (search < raw.length) {
if (raw[search] !== '`') {
search += 1;
continue;
}
const run = countLeadingChars(raw, search, '`');
if (run === count) {
return { ok: true, end: search + run };
}
search += run;
}
return { ok: false, end: start };
}
function countLeadingChars(text, start, ch) {
let count = 0;
while (start + count < text.length && text[start + count] === ch) {
count += 1;
}
return count;
}
function parseFenceOpenLine(trimmed) {
if (trimmed.length < 3) return null;
const ch = trimmed[0];
@@ -136,12 +194,12 @@ function parseMarkupToolCalls(text) {
if (!raw) {
return [];
}
let wrappers = findXmlElementBlocks(raw, 'tool_calls');
let wrappers = findToolCallElementBlocksOutsideIgnored(raw);
if (wrappers.length === 0 && hasRepairableXMLToolCallsWrapper(raw)) {
const repaired = repairMissingXMLToolCallsOpeningWrapper(raw);
if (repaired !== raw) {
raw = repaired;
wrappers = findXmlElementBlocks(raw, 'tool_calls');
wrappers = findToolCallElementBlocksOutsideIgnored(raw);
}
}
const out = [];
@@ -157,6 +215,36 @@ function parseMarkupToolCalls(text) {
return out;
}
function findToolCallElementBlocksOutsideIgnored(text) {
const raw = toStringSafe(text);
const out = [];
for (let searchFrom = 0; searchFrom < raw.length;) {
const tag = findToolMarkupTagOutsideIgnored(raw, searchFrom);
if (!tag) {
break;
}
if (tag.closing || tag.name !== 'tool_calls') {
searchFrom = tag.end + 1;
continue;
}
const closeTag = findMatchingToolMarkupClose(raw, tag);
if (!closeTag) {
searchFrom = tag.end + 1;
continue;
}
const endDelim = xmlTagEndDelimiterLenEndingAt(raw, tag.end);
const attrsEnd = endDelim > 0 ? tag.end + 1 - endDelim : tag.end + 1;
out.push({
attrs: raw.slice(tag.nameEnd, attrsEnd),
body: raw.slice(tag.end + 1, closeTag.start),
start: tag.start,
end: closeTag.end + 1,
});
searchFrom = closeTag.end + 1;
}
return out;
}
function normalizeDSMLToolCallMarkup(text) {
const raw = toStringSafe(text);
if (!raw) {
@@ -196,6 +284,11 @@ function containsToolCallWrapperSyntaxOutsideIgnored(text) {
i = skipped.next;
continue;
}
const spanEnd = markdownCodeSpanEnd(raw, i);
if (spanEnd.ok) {
i = spanEnd.end;
continue;
}
const tag = scanToolMarkupTagAt(raw, i);
if (tag) {
if (tag.name !== 'tool_calls') {
@@ -232,6 +325,11 @@ function containsToolMarkupSyntaxOutsideIgnored(text) {
i = skipped.next;
continue;
}
const spanEnd = markdownCodeSpanEnd(raw, i);
if (spanEnd.ok) {
i = spanEnd.end;
continue;
}
const tag = scanToolMarkupTagAt(raw, i);
if (tag) {
if (tag.dsmlLike) {
@@ -267,6 +365,12 @@ function replaceDSMLToolMarkupOutsideIgnored(text) {
i = skipped.next;
continue;
}
const spanEnd = markdownCodeSpanEnd(raw, i);
if (spanEnd.ok) {
out += raw.slice(i, spanEnd.end);
i = spanEnd.end;
continue;
}
const tag = scanToolMarkupTagAt(raw, i);
if (tag) {
out += `<${tag.closing ? '/' : ''}${tag.name}${raw.slice(tag.nameEnd, tag.end)}>`;
@@ -553,6 +657,11 @@ function findToolMarkupTagOutsideIgnored(text, from) {
i = skipped.next;
continue;
}
const spanEnd = markdownCodeSpanEnd(raw, i);
if (spanEnd.ok) {
i = spanEnd.end;
continue;
}
const tag = scanToolMarkupTagAt(raw, i);
if (tag) {
return tag;
@@ -987,6 +1096,12 @@ function canonicalizeToolCallCandidateSpans(text) {
i = skipped.next;
continue;
}
const spanEnd = markdownCodeSpanEnd(raw, i);
if (spanEnd.ok) {
out += raw.slice(i, spanEnd.end);
i = spanEnd.end;
continue;
}
const tag = scanToolMarkupTagAt(raw, i);
if (!tag) {
out += raw[i];
@@ -2249,30 +2364,62 @@ function sanitizeLooseCDATA(text) {
function hasRepairableXMLToolCallsWrapper(text) {
const raw = toStringSafe(text).trim();
if (!raw || raw.toLowerCase().includes('<tool_calls')) {
if (!raw || firstToolMarkupTagByName(raw, 'tool_calls', false)) {
return false;
}
const closeMatches = [...raw.matchAll(XML_TOOL_CALLS_CLOSE_PATTERN)];
if (closeMatches.length === 0) {
const invoke = firstToolMarkupTagByName(raw, 'invoke', false);
if (!invoke) {
return false;
}
const invoke = raw.match(XML_INVOKE_START_PATTERN);
if (!invoke || invoke.index === undefined) {
const close = lastToolMarkupTagByName(raw, 'tool_calls', true);
if (!close) {
return false;
}
const close = closeMatches[closeMatches.length - 1];
return invoke.index < close.index;
return invoke.start < close.start;
}
function repairMissingXMLToolCallsOpeningWrapper(text) {
const raw = toStringSafe(text);
if (!hasRepairableXMLToolCallsWrapper(raw)) {
if (firstToolMarkupTagByName(raw, 'tool_calls', false)) {
return raw;
}
const closeMatches = [...raw.matchAll(XML_TOOL_CALLS_CLOSE_PATTERN)];
const invoke = raw.match(XML_INVOKE_START_PATTERN);
const close = closeMatches[closeMatches.length - 1];
return `${raw.slice(0, invoke.index)}<tool_calls>${raw.slice(invoke.index, close.index)}</tool_calls>${raw.slice(close.index + close[0].length)}`;
const invoke = firstToolMarkupTagByName(raw, 'invoke', false);
const close = lastToolMarkupTagByName(raw, 'tool_calls', true);
if (!invoke || !close || invoke.start >= close.start) {
return raw;
}
return `${raw.slice(0, invoke.start)}<tool_calls>${raw.slice(invoke.start, close.start)}</tool_calls>${raw.slice(close.end + 1)}`;
}
function firstToolMarkupTagByName(text, name, closing) {
const raw = toStringSafe(text);
for (let searchFrom = 0; searchFrom < raw.length;) {
const tag = findToolMarkupTagOutsideIgnored(raw, searchFrom);
if (!tag) {
break;
}
if (tag.name === name && tag.closing === closing) {
return tag;
}
searchFrom = tag.end + 1;
}
return null;
}
function lastToolMarkupTagByName(text, name, closing) {
const raw = toStringSafe(text);
let last = null;
for (let searchFrom = 0; searchFrom < raw.length;) {
const tag = findToolMarkupTagOutsideIgnored(raw, searchFrom);
if (!tag) {
break;
}
if (tag.name === name && tag.closing === closing) {
last = tag;
}
searchFrom = tag.end + 1;
}
return last;
}
function rawNameForTag(tag) {
@@ -2494,6 +2641,7 @@ function isOnlyRawValue(obj) {
module.exports = {
stripFencedCodeBlocks,
stripMarkdownCodeSpans,
parseMarkupToolCalls,
normalizeDSMLToolCallMarkup,
containsToolMarkupSyntaxOutsideIgnored,

View File

@@ -70,10 +70,17 @@ function processToolSieveChunk(state, chunk, toolNames) {
break;
}
const start = findToolSegmentStart(state, pending);
if (start === HOLD_TOOL_SEGMENT_START) {
break;
}
if (start >= 0) {
const prefix = pending.slice(0, start);
if (prefix) {
const resetMarkdownSpan = shouldResetUnclosedMarkdownPrefix(state, prefix, pending.slice(start));
noteText(state, prefix);
if (resetMarkdownSpan) {
state.markdownCodeSpanTicks = 0;
}
events.push({ type: 'text', text: prefix });
}
state.pending = '';
@@ -98,6 +105,10 @@ function flushToolSieve(state, toolNames) {
return [];
}
const events = processToolSieveChunk(state, '', toolNames);
if (state.pending && Number.isInteger(state.markdownCodeSpanTicks) && state.markdownCodeSpanTicks > 0) {
state.markdownCodeSpanTicks = 0;
events.push(...processToolSieveChunk(state, '', toolNames));
}
if (Array.isArray(state.pendingToolCalls) && state.pendingToolCalls.length > 0) {
events.push({ type: 'tool_calls', calls: state.pendingToolCalls });
state.pendingToolRaw = '';
@@ -164,6 +175,15 @@ function splitSafeContentForToolDetection(state, s) {
if (insideCodeFenceWithState(state, text.slice(0, xmlIdx))) {
return [text, ''];
}
const markdown = markdownCodeSpanStateAt(state, text.slice(0, xmlIdx));
if (markdown.ticks > 0) {
if (markdownCodeSpanCloses(text.slice(xmlIdx), markdown.ticks)) {
return [text, ''];
}
if (markdown.fromPrior) {
return ['', text];
}
}
if (xmlIdx > 0) {
return [text.slice(0, xmlIdx), text.slice(xmlIdx)];
}
@@ -172,6 +192,8 @@ function splitSafeContentForToolDetection(state, s) {
return [text, ''];
}
const HOLD_TOOL_SEGMENT_START = -2;
function findToolSegmentStart(state, s) {
if (!s) {
return -1;
@@ -182,13 +204,98 @@ function findToolSegmentStart(state, s) {
if (!tag) {
return -1;
}
if (!insideCodeFenceWithState(state, s.slice(0, tag.start))) {
if (insideCodeFenceWithState(state, s.slice(0, tag.start))) {
offset = tag.end + 1;
continue;
}
const markdown = markdownCodeSpanStateAt(state, s.slice(0, tag.start));
if (markdown.ticks === 0) {
return tag.start;
}
offset = tag.end + 1;
if (markdownCodeSpanCloses(s.slice(tag.start), markdown.ticks)) {
offset = tag.end + 1;
continue;
}
if (markdown.fromPrior) {
return HOLD_TOOL_SEGMENT_START;
}
return tag.start;
}
}
function markdownCodeSpanStateAt(state, text) {
const raw = typeof text === 'string' ? text : '';
let ticks = state && Number.isInteger(state.markdownCodeSpanTicks) ? state.markdownCodeSpanTicks : 0;
let fromPrior = ticks > 0;
for (let i = 0; i < raw.length;) {
if (raw[i] !== '`') {
i += 1;
continue;
}
const run = countBacktickRun(raw, i);
if (ticks === 0) {
if (run >= 3 && atMarkdownFenceLineStart(raw, i)) {
i += run;
continue;
}
if (state && insideCodeFenceWithState(state, raw.slice(0, i))) {
i += run;
continue;
}
ticks = run;
fromPrior = false;
} else if (run === ticks) {
ticks = 0;
fromPrior = false;
}
i += run;
}
return { ticks, fromPrior };
}
function markdownCodeSpanCloses(text, ticks) {
const raw = typeof text === 'string' ? text : '';
if (!Number.isInteger(ticks) || ticks <= 0) {
return false;
}
for (let i = 0; i < raw.length;) {
if (raw[i] !== '`') {
i += 1;
continue;
}
const run = countBacktickRun(raw, i);
if (run === ticks) {
return true;
}
i += run;
}
return false;
}
function shouldResetUnclosedMarkdownPrefix(state, prefix, suffix) {
const markdown = markdownCodeSpanStateAt(state, prefix);
return markdown.ticks > 0 && !markdown.fromPrior && !markdownCodeSpanCloses(suffix, markdown.ticks);
}
function countBacktickRun(text, start) {
let count = 0;
while (start + count < text.length && text[start + count] === '`') {
count += 1;
}
return count;
}
function atMarkdownFenceLineStart(text, idx) {
for (let i = idx - 1; i >= 0; i -= 1) {
const ch = text[i];
if (ch === ' ' || ch === '\t') {
continue;
}
return ch === '\n' || ch === '\r';
}
return true;
}
function consumeToolCapture(state, toolNames) {
const captured = state.capture || '';
if (!captured) {

View File

@@ -9,6 +9,7 @@ function createToolSieveState() {
codeFencePendingTicks: 0,
codeFencePendingTildes: 0,
codeFenceLineStart: true,
markdownCodeSpanTicks: 0,
pendingToolRaw: '',
pendingToolCalls: [],
disableDeltas: false,
@@ -35,6 +36,7 @@ function noteText(state, text) {
if (!state || !hasMeaningfulText(text)) {
return;
}
updateMarkdownCodeSpanState(state, text);
updateCodeFenceState(state, text);
}
@@ -64,6 +66,68 @@ function insideCodeFenceWithState(state, text) {
return simulated.stack.length > 0;
}
function insideMarkdownCodeSpanWithState(state, text) {
if (!state) {
return simulateMarkdownCodeSpanTicks(null, 0, text) > 0;
}
const ticks = Number.isInteger(state.markdownCodeSpanTicks) ? state.markdownCodeSpanTicks : 0;
return simulateMarkdownCodeSpanTicks(state, ticks, text) > 0;
}
function updateMarkdownCodeSpanState(state, text) {
if (!state || !hasMeaningfulText(text)) {
return;
}
const ticks = Number.isInteger(state.markdownCodeSpanTicks) ? state.markdownCodeSpanTicks : 0;
state.markdownCodeSpanTicks = simulateMarkdownCodeSpanTicks(state, ticks, text);
}
function simulateMarkdownCodeSpanTicks(state, initialTicks, text) {
const raw = typeof text === 'string' ? text : '';
let ticks = Number.isInteger(initialTicks) ? initialTicks : 0;
for (let i = 0; i < raw.length;) {
if (raw[i] !== '`') {
i += 1;
continue;
}
const run = countBacktickRun(raw, i);
if (ticks === 0) {
if (run >= 3 && atMarkdownFenceLineStart(raw, i)) {
i += run;
continue;
}
if (state && insideCodeFenceWithState(state, raw.slice(0, i))) {
i += run;
continue;
}
ticks = run;
} else if (run === ticks) {
ticks = 0;
}
i += run;
}
return ticks;
}
function countBacktickRun(text, start) {
let count = 0;
while (start + count < text.length && text[start + count] === '`') {
count += 1;
}
return count;
}
function atMarkdownFenceLineStart(text, idx) {
for (let i = idx - 1; i >= 0; i -= 1) {
const ch = text[i];
if (ch === ' ' || ch === '\t') {
continue;
}
return ch === '\n' || ch === '\r';
}
return true;
}
function updateCodeFenceState(state, text) {
if (!state) {
return;
@@ -188,7 +252,9 @@ module.exports = {
looksLikeToolExampleContext,
insideCodeFence,
insideCodeFenceWithState,
insideMarkdownCodeSpanWithState,
updateCodeFenceState,
updateMarkdownCodeSpanState,
hasMeaningfulText,
toStringSafe,
};

View File

@@ -6,6 +6,7 @@ import (
"fmt"
"log"
"net/http"
"net/url"
"os"
"runtime"
"strings"
@@ -160,6 +161,16 @@ func (f *filteredLogFormatter) NewLogEntry(r *http.Request) middleware.LogEntry
return noopLogEntry{}
}
}
if r != nil && r.URL != nil {
if redacted, changed := redactSensitiveQueryParams(r.URL); changed {
cloned := *r
clonedURL := *r.URL
clonedURL.RawQuery = redacted
cloned.URL = &clonedURL
cloned.RequestURI = clonedURL.RequestURI()
return f.base.NewLogEntry(&cloned)
}
}
return f.base.NewLogEntry(r)
}
@@ -169,6 +180,86 @@ func (noopLogEntry) Write(_ int, _ int, _ http.Header, _ time.Duration, _ interf
func (noopLogEntry) Panic(_ interface{}, _ []byte) {}
func redactSensitiveQueryParams(u *url.URL) (string, bool) {
if u == nil || u.RawQuery == "" {
return "", false
}
values, err := url.ParseQuery(u.RawQuery)
if err != nil {
return redactSensitiveRawQueryParams(u.RawQuery)
}
changed := false
for name, vals := range values {
if !isSensitiveQueryParam(name) {
continue
}
for i := range vals {
vals[i] = "REDACTED"
}
values[name] = vals
changed = true
}
if !changed {
return "", false
}
return values.Encode(), true
}
func redactSensitiveRawQueryParams(rawQuery string) (string, bool) {
if rawQuery == "" {
return "", false
}
var b strings.Builder
b.Grow(len(rawQuery))
changed := false
start := 0
for i := 0; i <= len(rawQuery); i++ {
if i < len(rawQuery) && rawQuery[i] != '&' && rawQuery[i] != ';' {
continue
}
segment := rawQuery[start:i]
b.WriteString(redactSensitiveRawQuerySegment(segment, &changed))
if i < len(rawQuery) {
b.WriteByte(rawQuery[i])
}
start = i + 1
}
if !changed {
return "", false
}
return b.String(), true
}
func redactSensitiveRawQuerySegment(segment string, changed *bool) string {
if segment == "" {
return segment
}
name := segment
valueStart := -1
if eq := strings.IndexByte(segment, '='); eq >= 0 {
name = segment[:eq]
valueStart = eq + 1
}
decodedName, err := url.QueryUnescape(name)
if err != nil {
decodedName = name
}
if !isSensitiveQueryParam(decodedName) {
return segment
}
if changed != nil {
*changed = true
}
if valueStart < 0 {
return name + "=REDACTED"
}
return segment[:valueStart] + "REDACTED"
}
func isSensitiveQueryParam(name string) bool {
return strings.EqualFold(name, "key") || strings.EqualFold(name, "api_key")
}
var defaultCORSAllowHeaders = []string{
"Content-Type",
"Authorization",

View File

@@ -0,0 +1,104 @@
package server
import (
"bytes"
"log"
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"github.com/go-chi/chi/v5/middleware"
)
func TestFilteredLogFormatterRedactsSensitiveQueryParams(t *testing.T) {
var buf bytes.Buffer
formatter := &filteredLogFormatter{
base: &middleware.DefaultLogFormatter{
Logger: log.New(&buf, "", 0),
NoColor: true,
},
}
req := httptest.NewRequest(
http.MethodPost,
"/v1beta/models/gemini-2.5-pro:generateContent?key=caller-secret&api_key=second-secret&alt=sse",
nil,
)
entry := formatter.NewLogEntry(req)
entry.Write(http.StatusOK, 0, http.Header{}, time.Millisecond, nil)
got := buf.String()
for _, secret := range []string{"caller-secret", "second-secret"} {
if strings.Contains(got, secret) {
t.Fatalf("log line contains sensitive query value %q: %s", secret, got)
}
}
if !strings.Contains(got, "key=REDACTED") || !strings.Contains(got, "api_key=REDACTED") {
t.Fatalf("log line did not include redacted sensitive params: %s", got)
}
if !strings.Contains(got, "alt=sse") {
t.Fatalf("log line did not preserve non-sensitive query param: %s", got)
}
if req.URL.RawQuery != "key=caller-secret&api_key=second-secret&alt=sse" {
t.Fatalf("request was mutated, RawQuery = %q", req.URL.RawQuery)
}
}
func TestFilteredLogFormatterRedactsSensitiveQueryParamsWhenMalformed(t *testing.T) {
tests := []struct {
name string
target string
secrets []string
redacted []string
preserved []string
}{
{
name: "semicolon separator",
target: "/v1beta/models/gemini-2.5-pro:generateContent?key=caller-secret;alt=sse",
secrets: []string{"caller-secret"},
redacted: []string{"key=REDACTED"},
preserved: []string{"alt=sse"},
},
{
name: "bad escape in sensitive value",
target: "/v1beta/models/gemini-2.5-pro:generateContent?api_key=second-secret%ZZ",
secrets: []string{"second-secret"},
redacted: []string{"api_key=REDACTED"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var buf bytes.Buffer
formatter := &filteredLogFormatter{
base: &middleware.DefaultLogFormatter{
Logger: log.New(&buf, "", 0),
NoColor: true,
},
}
req := httptest.NewRequest(http.MethodPost, tt.target, nil)
entry := formatter.NewLogEntry(req)
entry.Write(http.StatusOK, 0, http.Header{}, time.Millisecond, nil)
got := buf.String()
for _, secret := range tt.secrets {
if strings.Contains(got, secret) {
t.Fatalf("log line contains sensitive query value %q: %s", secret, got)
}
}
for _, want := range tt.redacted {
if !strings.Contains(got, want) {
t.Fatalf("log line missing redacted query %q: %s", want, got)
}
}
for _, want := range tt.preserved {
if !strings.Contains(got, want) {
t.Fatalf("log line missing preserved query %q: %s", want, got)
}
}
})
}
}
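
For readers following the Node-side code in this diff, the well-formed case exercised by the first test above can be sketched in JavaScript with the standard `URLSearchParams` class. This is an illustration only: `redactQuery` is a hypothetical name, nothing here is part of the Go server, and the fallback for malformed queries (semicolon separators, bad escapes) is not reproduced.

```javascript
// Hypothetical sketch: replace the values of `key` / `api_key` (matched
// case-insensitively) with a fixed marker and keep every other parameter.
const SENSITIVE_QUERY_PARAMS = new Set(['key', 'api_key']);

function redactQuery(rawQuery) {
  const out = new URLSearchParams();
  for (const [name, value] of new URLSearchParams(rawQuery)) {
    const sensitive = SENSITIVE_QUERY_PARAMS.has(name.toLowerCase());
    out.append(name, sensitive ? 'REDACTED' : value);
  }
  return out.toString();
}

redactQuery('key=caller-secret&api_key=second-secret&alt=sse');
// 'key=REDACTED&api_key=REDACTED&alt=sse'
```

Building a fresh parameter list, rather than editing the parsed one in place, parallels the Go change, which logs a cloned request and leaves the original `RawQuery` untouched.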

View File

@@ -64,3 +64,44 @@ func TestStripFencedCodeBlocks_InlineBackticksNotFence(t *testing.T) {
t.Fatalf("expected Before/After, got %q", got)
}
}
func TestParseToolCalls_IgnoresMarkdownDocumentationExamples(t *testing.T) {
text := "解析器支持多种工具调用格式。\n\n" +
"入口函数 `ParseToolCalls(text, availableToolNames)` 会返回调用列表。\n\n" +
"核心流程会解析 XML 格式的 `<tool_calls>` / `<invoke>` 标记。\n\n" +
"### 标准 XML 结构\n" +
"```xml\n" +
"<tool_calls>\n" +
" <invoke name=\"read_file\">\n" +
" <parameter name=\"path\">config.json</parameter>\n" +
" </invoke>\n" +
"</tool_calls>\n" +
"```\n\n" +
"DSML 风格形如 `<invoke name=\"tool\">...</invoke>`,也可能提到 `<tool_calls>` 包裹。\n"
got := ParseToolCallsDetailed(text, []string{"read_file"})
if len(got.Calls) != 0 {
t.Fatalf("markdown documentation examples should not parse as tool calls, got %#v", got.Calls)
}
}
func TestParseToolCalls_IgnoresInlineMarkdownToolCallExample(t *testing.T) {
text := "示例:`<tool_calls><invoke name=\"read_file\"><parameter name=\"path\">README.md</parameter></invoke></tool_calls>`"
got := ParseToolCallsDetailed(text, []string{"read_file"})
if len(got.Calls) != 0 {
t.Fatalf("inline markdown tool example should not parse as tool calls, got %#v", got.Calls)
}
}
func TestParseToolCalls_PreservesBackticksInsideToolParameters(t *testing.T) {
text := "<tool_calls><invoke name=\"Bash\"><parameter name=\"command\">echo `date`</parameter></invoke></tool_calls>"
got := ParseToolCallsDetailed(text, []string{"Bash"})
if len(got.Calls) != 1 {
t.Fatalf("expected one tool call, got %#v", got.Calls)
}
if got.Calls[0].Input["command"] != "echo `date`" {
t.Fatalf("expected command backticks preserved, got %#v", got.Calls[0].Input["command"])
}
}

View File

@@ -28,6 +28,11 @@ func canonicalizeToolCallCandidateSpans(text string) string {
i = next
continue
}
if end, ok := markdownCodeSpanEnd(text, i); ok {
b.WriteString(text[i:end])
i = end
continue
}
tag, ok := scanToolMarkupTagAt(text, i)
if !ok {
b.WriteByte(text[i])
@@ -619,19 +624,18 @@ func hasRepairableXMLToolCallsWrapper(text string) bool {
if strings.TrimSpace(text) == "" {
return false
}
if strings.Contains(strings.ToLower(text), "<tool_calls") {
if _, ok := firstToolMarkupTagByName(text, "tool_calls", false); ok {
return false
}
closeMatches := xmlToolCallsClosePattern.FindAllStringIndex(text, -1)
if len(closeMatches) == 0 {
invokeTag, ok := firstToolMarkupTagByName(text, "invoke", false)
if !ok {
return false
}
invokeLoc := xmlInvokeStartPattern.FindStringIndex(text)
if invokeLoc == nil {
closeTag, ok := lastToolMarkupTagByName(text, "tool_calls", true)
if !ok {
return false
}
closeLoc := closeMatches[len(closeMatches)-1]
return invokeLoc[0] < closeLoc[0]
return invokeTag.Start < closeTag.Start
}
func toolCDATAOpenLenAt(text string, idx int) int {

View File

@@ -33,6 +33,11 @@ func rewriteDSMLToolMarkupOutsideIgnored(text string) string {
i = next
continue
}
if end, ok := markdownCodeSpanEnd(text, i); ok {
b.WriteString(text[i:end])
i = end
continue
}
tag, ok := scanToolMarkupTagAt(text, i)
if !ok {
b.WriteByte(text[i])

View File

@@ -153,6 +153,29 @@ func stripFencedCodeBlocks(text string) string {
return b.String()
}
func markdownCodeSpanEnd(text string, start int) (int, bool) {
if start < 0 || start >= len(text) || text[start] != '`' {
return start, false
}
count := countLeadingFenceChars(text[start:], '`')
if count == 0 {
return start, false
}
search := start + count
for search < len(text) {
if text[search] != '`' {
search++
continue
}
run := countLeadingFenceChars(text[search:], '`')
if run == count {
return search + run, true
}
search += run
}
return start, false
}
func cdataStartsBeforeFence(line string) bool {
cdataIdx := indexToolCDATAOpen(line, 0)
if cdataIdx < 0 {

@@ -10,16 +10,14 @@ import (
)
var xmlAttrPattern = regexp.MustCompile(`(?is)\b([a-z0-9_:-]+)\s*=\s*("([^"]*)"|'([^']*)')`)
-var xmlToolCallsClosePattern = regexp.MustCompile(`(?is)</tool_calls>`)
-var xmlInvokeStartPattern = regexp.MustCompile(`(?is)<invoke\b[^>]*\bname\s*=\s*("([^"]*)"|'([^']*)')`)
var cdataBRSeparatorPattern = regexp.MustCompile(`(?i)<br\s*/?>`)
func parseXMLToolCalls(text string) []ParsedToolCall {
-wrappers := findXMLElementBlocks(text, "tool_calls")
+wrappers := findToolCallElementBlocksOutsideIgnored(text)
if len(wrappers) == 0 {
repaired := repairMissingXMLToolCallsOpeningWrapper(text)
if repaired != text {
-wrappers = findXMLElementBlocks(repaired, "tool_calls")
+wrappers = findToolCallElementBlocksOutsideIgnored(repaired)
}
}
if len(wrappers) == 0 {
@@ -41,26 +39,89 @@ func parseXMLToolCalls(text string) []ParsedToolCall {
return out
}
func findToolCallElementBlocksOutsideIgnored(text string) []xmlElementBlock {
if text == "" {
return nil
}
var out []xmlElementBlock
for searchFrom := 0; searchFrom < len(text); {
tag, ok := FindToolMarkupTagOutsideIgnored(text, searchFrom)
if !ok {
break
}
if tag.Closing || tag.Name != "tool_calls" {
searchFrom = tag.End + 1
continue
}
closeTag, ok := FindMatchingToolMarkupClose(text, tag)
if !ok {
searchFrom = tag.End + 1
continue
}
attrsEnd := tag.End + 1
if delimLen := xmlTagEndDelimiterLenEndingAt(text, tag.End); delimLen > 0 {
attrsEnd = tag.End + 1 - delimLen
}
out = append(out, xmlElementBlock{
Attrs: text[tag.NameEnd:attrsEnd],
Body: text[tag.End+1 : closeTag.Start],
Start: tag.Start,
End: closeTag.End + 1,
})
searchFrom = closeTag.End + 1
}
return out
}
func repairMissingXMLToolCallsOpeningWrapper(text string) string {
-lower := strings.ToLower(text)
-if strings.Contains(lower, "<tool_calls") {
+if _, ok := firstToolMarkupTagByName(text, "tool_calls", false); ok {
return text
}
-closeMatches := xmlToolCallsClosePattern.FindAllStringIndex(text, -1)
-if len(closeMatches) == 0 {
+invokeTag, ok := firstToolMarkupTagByName(text, "invoke", false)
+if !ok {
return text
}
-invokeLoc := xmlInvokeStartPattern.FindStringIndex(text)
-if invokeLoc == nil {
-return text
-}
-closeLoc := closeMatches[len(closeMatches)-1]
-if invokeLoc[0] >= closeLoc[0] {
+closeTag, ok := lastToolMarkupTagByName(text, "tool_calls", true)
+if !ok || invokeTag.Start >= closeTag.Start {
return text
}
-return text[:invokeLoc[0]] + "<tool_calls>" + text[invokeLoc[0]:closeLoc[0]] + "</tool_calls>" + text[closeLoc[1]:]
+return text[:invokeTag.Start] + "<tool_calls>" + text[invokeTag.Start:closeTag.Start] + "</tool_calls>" + text[closeTag.End+1:]
}
func firstToolMarkupTagByName(text, name string, closing bool) (ToolMarkupTag, bool) {
for searchFrom := 0; searchFrom < len(text); {
tag, ok := FindToolMarkupTagOutsideIgnored(text, searchFrom)
if !ok {
break
}
if tag.Name == name && tag.Closing == closing {
return tag, true
}
searchFrom = tag.End + 1
}
return ToolMarkupTag{}, false
}
func lastToolMarkupTagByName(text, name string, closing bool) (ToolMarkupTag, bool) {
var last ToolMarkupTag
found := false
for searchFrom := 0; searchFrom < len(text); {
tag, ok := FindToolMarkupTagOutsideIgnored(text, searchFrom)
if !ok {
break
}
if tag.Name == name && tag.Closing == closing {
last = tag
found = true
}
searchFrom = tag.End + 1
}
if !found {
return ToolMarkupTag{}, false
}
return last, true
}
func parseSingleXMLToolCall(block xmlElementBlock) (ParsedToolCall, bool) {

@@ -42,6 +42,10 @@ func ContainsToolMarkupSyntaxOutsideIgnored(text string) (hasDSML, hasCanonical
i = next
continue
}
if end, ok := markdownCodeSpanEnd(text, i); ok {
i = end
continue
}
if tag, ok := scanToolMarkupTagAt(text, i); ok {
if tag.DSMLLike {
hasDSML = true
@@ -69,6 +73,10 @@ func ContainsToolCallWrapperSyntaxOutsideIgnored(text string) (hasDSML, hasCanon
i = next
continue
}
if end, ok := markdownCodeSpanEnd(text, i); ok {
i = end
continue
}
if tag, ok := scanToolMarkupTagAt(text, i); ok {
if tag.Name != "tool_calls" {
i = tag.End + 1
@@ -100,6 +108,10 @@ func FindToolMarkupTagOutsideIgnored(text string, start int) (ToolMarkupTag, boo
i = next
continue
}
if end, ok := markdownCodeSpanEnd(text, i); ok {
i = end
continue
}
if tag, ok := scanToolMarkupTagAt(text, i); ok {
return tag, true
}

@@ -57,3 +57,123 @@ func TestProcessToolSieveNestedFourBacktickFenceDoesNotTrigger(t *testing.T) {
t.Fatalf("expected 4-backtick fenced example to stay text, got %d tool calls", toolCalls)
}
}
func TestProcessToolSieveMarkdownDocumentationExamplesDoNotTrigger(t *testing.T) {
var state State
chunks := []string{
"解析器支持多种工具调用格式。\n\n",
"入口函数 `ParseToolCalls(text, availableToolNames)` 会返回调用列表。\n\n",
"核心流程会解析 XML 格式的 `<tool_calls>` / `<invoke>` 标记。\n\n",
"### 标准 XML 结构\n",
"```xml\n",
"<tool_calls>\n",
" <invoke name=\"read_file\">\n",
" <parameter name=\"path\">config.json</parameter>\n",
" </invoke>\n",
"</tool_calls>\n",
"```\n\n",
"DSML 风格形如 `<invoke name=\"tool\">...</invoke>`,也可能提到 `<tool_calls>` 包裹。\n",
}
var events []Event
for _, c := range chunks {
events = append(events, ProcessChunk(&state, c, []string{"read_file"})...)
}
events = append(events, Flush(&state, []string{"read_file"})...)
var textContent strings.Builder
toolCalls := 0
for _, evt := range events {
textContent.WriteString(evt.Content)
toolCalls += len(evt.ToolCalls)
}
if toolCalls != 0 {
t.Fatalf("expected markdown documentation examples to stay text, got %d tool calls", toolCalls)
}
if !strings.Contains(textContent.String(), "标准 XML 结构") || !strings.Contains(textContent.String(), "DSML 风格") {
t.Fatalf("expected documentation text preserved, got %q", textContent.String())
}
}
func TestProcessToolSieveInlineMarkdownToolCallSplitAcrossChunksDoesNotTrigger(t *testing.T) {
var state State
chunks := []string{
"示例:`",
"<tool_calls><invoke name=\"read_file\"><parameter name=\"path\">README.md</parameter></invoke></tool_calls>",
"` 完毕。",
}
var events []Event
for _, c := range chunks {
events = append(events, ProcessChunk(&state, c, []string{"read_file"})...)
}
events = append(events, Flush(&state, []string{"read_file"})...)
var textContent strings.Builder
toolCalls := 0
for _, evt := range events {
textContent.WriteString(evt.Content)
toolCalls += len(evt.ToolCalls)
}
if toolCalls != 0 {
t.Fatalf("expected split inline markdown tool example to stay text, got %d tool calls", toolCalls)
}
if !strings.Contains(textContent.String(), "<tool_calls>") || !strings.Contains(textContent.String(), "完毕") {
t.Fatalf("expected inline example text preserved, got %q", textContent.String())
}
}
func TestProcessToolSieveUnclosedInlineMarkdownBeforeToolDoesTrigger(t *testing.T) {
var state State
input := "note with stray ` before real call " +
"<tool_calls><invoke name=\"read_file\"><parameter name=\"path\">real.md</parameter></invoke></tool_calls>"
var events []Event
events = append(events, ProcessChunk(&state, input, []string{"read_file"})...)
events = append(events, Flush(&state, []string{"read_file"})...)
var textContent strings.Builder
var calls []string
for _, evt := range events {
textContent.WriteString(evt.Content)
for _, call := range evt.ToolCalls {
if path, _ := call.Input["path"].(string); path != "" {
calls = append(calls, path)
}
}
}
if len(calls) != 1 || calls[0] != "real.md" {
t.Fatalf("expected real tool call after stray backtick, got %#v from events %#v", calls, events)
}
if !strings.Contains(textContent.String(), "stray ` before real call") {
t.Fatalf("expected stray-backtick prefix preserved, got %q", textContent.String())
}
}
func TestProcessToolSieveUnclosedInlineMarkdownBeforeSplitToolDoesTriggerOnFlush(t *testing.T) {
var state State
chunks := []string{
"note with stray ` before real call ",
"<tool_calls><invoke name=\"read_file\"><parameter name=\"path\">real.md</parameter></invoke></tool_calls>",
}
var events []Event
for _, c := range chunks {
events = append(events, ProcessChunk(&state, c, []string{"read_file"})...)
}
events = append(events, Flush(&state, []string{"read_file"})...)
var calls []string
for _, evt := range events {
for _, call := range evt.ToolCalls {
if path, _ := call.Input["path"].(string); path != "" {
calls = append(calls, path)
}
}
}
if len(calls) != 1 || calls[0] != "real.md" {
t.Fatalf("expected split real tool call after stray backtick, got %#v from events %#v", calls, events)
}
}

@@ -57,10 +57,17 @@ func ProcessChunk(state *State, chunk string, toolNames []string) []Event {
break
}
start := findToolSegmentStart(state, pending)
if start == holdToolSegmentStart {
break
}
if start >= 0 {
prefix := pending[:start]
if prefix != "" {
resetMarkdownSpan := shouldResetUnclosedMarkdownPrefix(state, prefix, pending[start:])
state.noteText(prefix)
if resetMarkdownSpan {
state.markdownCodeSpanTicks = 0
}
events = append(events, Event{Content: prefix})
}
state.pending.Reset()
@@ -88,6 +95,13 @@ func Flush(state *State, toolNames []string) []Event {
return nil
}
events := ProcessChunk(state, "", toolNames)
if state.pending.Len() > 0 && state.markdownCodeSpanTicks > 0 {
// At end of stream, an unmatched backtick is literal Markdown text.
// Re-scan pending content so a real tool call after that stray
// backtick is not permanently hidden by inline-code state.
state.markdownCodeSpanTicks = 0
events = append(events, ProcessChunk(state, "", toolNames)...)
}
if len(state.pendingToolCalls) > 0 {
events = append(events, Event{ToolCalls: state.pendingToolCalls})
state.pendingToolRaw = ""
@@ -158,6 +172,15 @@ func splitSafeContentForToolDetection(state *State, s string) (safe, hold string
if insideCodeFenceWithState(state, s[:xmlIdx]) {
return s, ""
}
markdown := markdownCodeSpanStateAt(state, s[:xmlIdx])
if markdown.ticks > 0 {
if markdownCodeSpanCloses(s[xmlIdx:], markdown.ticks) {
return s, ""
}
if markdown.fromPrior {
return "", s
}
}
if xmlIdx > 0 {
return s[:xmlIdx], s[xmlIdx:]
}
@@ -166,6 +189,8 @@ func splitSafeContentForToolDetection(state *State, s string) (safe, hold string
return s, ""
}
const holdToolSegmentStart = -2
func findToolSegmentStart(state *State, s string) int {
if s == "" {
return -1
@@ -177,13 +202,86 @@ func findToolSegmentStart(state *State, s string) int {
return -1
}
start := includeDuplicateLeadingLessThan(s, tag.Start)
-if !insideCodeFenceWithState(state, s[:start]) {
+if insideCodeFenceWithState(state, s[:start]) {
+offset = tag.End + 1
+continue
+}
+markdown := markdownCodeSpanStateAt(state, s[:start])
+if markdown.ticks == 0 {
return start
}
-offset = tag.End + 1
+if markdownCodeSpanCloses(s[start:], markdown.ticks) {
+offset = tag.End + 1
+continue
+}
+if markdown.fromPrior {
+return holdToolSegmentStart
+}
+return start
}
}
type markdownCodeSpanScan struct {
ticks int
fromPrior bool
}
func markdownCodeSpanStateAt(state *State, text string) markdownCodeSpanScan {
ticks := 0
fromPrior := false
if state != nil && state.markdownCodeSpanTicks > 0 {
ticks = state.markdownCodeSpanTicks
fromPrior = true
}
for i := 0; i < len(text); {
if text[i] != '`' {
i++
continue
}
run := countBacktickRun(text, i)
if ticks == 0 {
if run >= 3 && atMarkdownFenceLineStart(text, i) {
i += run
continue
}
if state != nil && insideCodeFenceWithState(state, text[:i]) {
i += run
continue
}
ticks = run
fromPrior = false
} else if run == ticks {
ticks = 0
fromPrior = false
}
i += run
}
return markdownCodeSpanScan{ticks: ticks, fromPrior: fromPrior}
}
func markdownCodeSpanCloses(text string, ticks int) bool {
if ticks <= 0 {
return false
}
for i := 0; i < len(text); {
if text[i] != '`' {
i++
continue
}
run := countBacktickRun(text, i)
if run == ticks {
return true
}
i += run
}
return false
}
func shouldResetUnclosedMarkdownPrefix(state *State, prefix, suffix string) bool {
markdown := markdownCodeSpanStateAt(state, prefix)
return markdown.ticks > 0 && !markdown.fromPrior && !markdownCodeSpanCloses(suffix, markdown.ticks)
}
func includeDuplicateLeadingLessThan(s string, idx int) int {
for idx > 0 && s[idx-1] == '<' {
idx--

@@ -13,6 +13,7 @@ type State struct {
codeFencePendingTicks int
codeFencePendingTildes int
codeFenceNotLineStart bool // inverted: zero-value false means "at line start"
markdownCodeSpanTicks int
pendingToolRaw string
pendingToolCalls []toolcall.ParsedToolCall
disableDeltas bool
@@ -50,6 +51,7 @@ func (s *State) noteText(content string) {
if !hasMeaningfulText(content) {
return
}
updateMarkdownCodeSpanState(s, content)
updateCodeFenceState(s, content)
}
@@ -78,6 +80,61 @@ func insideCodeFence(text string) bool {
return len(simulateCodeFenceState(nil, 0, 0, true, text).stack) > 0
}
func updateMarkdownCodeSpanState(state *State, text string) {
if state == nil || !hasMeaningfulText(text) {
return
}
state.markdownCodeSpanTicks = simulateMarkdownCodeSpanTicks(state, state.markdownCodeSpanTicks, text)
}
func simulateMarkdownCodeSpanTicks(state *State, initialTicks int, text string) int {
ticks := initialTicks
for i := 0; i < len(text); {
if text[i] != '`' {
i++
continue
}
run := countBacktickRun(text, i)
if ticks == 0 {
if run >= 3 && atMarkdownFenceLineStart(text, i) {
i += run
continue
}
if state != nil && insideCodeFenceWithState(state, text[:i]) {
i += run
continue
}
ticks = run
} else if run == ticks {
ticks = 0
}
i += run
}
return ticks
}
func countBacktickRun(text string, start int) int {
count := 0
for start+count < len(text) && text[start+count] == '`' {
count++
}
return count
}
func atMarkdownFenceLineStart(text string, idx int) bool {
for i := idx - 1; i >= 0; i-- {
switch text[i] {
case ' ', '\t':
continue
case '\n', '\r':
return true
default:
return false
}
}
return true
}
func updateCodeFenceState(state *State, text string) {
if state == nil || !hasMeaningfulText(text) {
return

@@ -95,11 +95,12 @@ func setStaticContentType(w http.ResponseWriter, fullPath string) {
}
func (h *Handler) serveFromDisk(w http.ResponseWriter, r *http.Request, staticDir string) {
root := filepath.Clean(staticDir)
path := strings.TrimPrefix(r.URL.Path, "/admin")
path = strings.TrimPrefix(path, "/")
if path != "" && strings.Contains(path, ".") {
-full := filepath.Join(staticDir, filepath.Clean(path))
-if !strings.HasPrefix(full, staticDir) {
+full := filepath.Join(root, filepath.Clean(path))
+if !isPathInsideRoot(full, root) {
http.NotFound(w, r)
return
}
@@ -116,7 +117,7 @@ func (h *Handler) serveFromDisk(w http.ResponseWriter, r *http.Request, staticDi
http.NotFound(w, r)
return
}
-index := filepath.Join(staticDir, "index.html")
+index := filepath.Join(root, "index.html")
if _, err := os.Stat(index); err != nil {
http.Error(w, "index.html not found", http.StatusNotFound)
return
@@ -126,6 +127,20 @@ func (h *Handler) serveFromDisk(w http.ResponseWriter, r *http.Request, staticDi
http.ServeFile(w, r, index)
}
func isPathInsideRoot(path, root string) bool {
cleanPath := filepath.Clean(path)
cleanRoot := filepath.Clean(root)
if cleanPath == cleanRoot {
return true
}
volume := filepath.VolumeName(cleanRoot)
rootWithoutVolume := cleanRoot[len(volume):]
if rootWithoutVolume == string(os.PathSeparator) {
return strings.HasPrefix(cleanPath, cleanRoot)
}
return strings.HasPrefix(cleanPath, cleanRoot+string(os.PathSeparator))
}
func resolveStaticAdminDir(preferred string) string {
if strings.TrimSpace(os.Getenv("DS2API_STATIC_ADMIN_DIR")) != "" {
return filepath.Clean(preferred)

@@ -78,6 +78,52 @@ func TestServeFromDiskPinsContentType(t *testing.T) {
}
}
func TestServeFromDiskRejectsSiblingDirectoryWithSharedPrefix(t *testing.T) {
parent := t.TempDir()
staticDir := filepath.Join(parent, "admin")
siblingDir := filepath.Join(parent, "admin-leak")
if err := os.MkdirAll(staticDir, 0o755); err != nil {
t.Fatalf("mkdir static dir: %v", err)
}
if err := os.MkdirAll(siblingDir, 0o755); err != nil {
t.Fatalf("mkdir sibling dir: %v", err)
}
if err := os.WriteFile(filepath.Join(siblingDir, "secret.txt"), []byte("secret"), 0o644); err != nil {
t.Fatalf("write sibling secret: %v", err)
}
h := &Handler{StaticDir: staticDir}
req := httptest.NewRequest(http.MethodGet, "/admin/../admin-leak/secret.txt", nil)
rec := httptest.NewRecorder()
h.serveFromDisk(rec, req, staticDir)
if rec.Code != http.StatusNotFound {
t.Fatalf("status = %d, want 404", rec.Code)
}
if body := rec.Body.String(); strings.Contains(body, "secret") {
t.Fatal("served content from sibling directory")
}
}
func TestIsPathInsideRootAllowsFilesystemRootChildren(t *testing.T) {
root := filepath.VolumeName(os.TempDir()) + string(os.PathSeparator)
child := filepath.Join(root, "assets", "index.css")
if !isPathInsideRoot(child, root) {
t.Fatalf("expected filesystem-root child %q inside %q", child, root)
}
}
func TestIsPathInsideRootRejectsSharedPrefixSibling(t *testing.T) {
parent := t.TempDir()
root := filepath.Join(parent, "admin")
sibling := filepath.Join(parent, "admin-leak", "secret.txt")
if isPathInsideRoot(sibling, root) {
t.Fatalf("expected shared-prefix sibling %q outside %q", sibling, root)
}
}
// TestSetStaticContentTypeUnknownExtensionFallsThrough verifies that unknown
// extensions leave the Content-Type header unset, so http.ServeFile can apply
// its own detection (sniffing or mime.TypeByExtension) for cases the pinned

@@ -568,6 +568,19 @@ test('parseToolCalls skips prose mention of same wrapper variant', () => {
assert.equal(calls[0].input.command, 'git status');
});
test('parseToolCalls ignores inline markdown tool example', () => {
const payload = '示例:`<tool_calls><invoke name="read_file"><parameter name="path">README.md</parameter></invoke></tool_calls>`';
const calls = parseToolCalls(payload, ['read_file']);
assert.equal(calls.length, 0);
});
test('parseToolCalls preserves backticks inside tool parameters', () => {
const payload = '<tool_calls><invoke name="Bash"><parameter name="command">echo `date`</parameter></invoke></tool_calls>';
const calls = parseToolCalls(payload, ['Bash']);
assert.equal(calls.length, 1);
assert.equal(calls[0].input.command, 'echo `date`');
});
test('sieve emits tool_calls after prose mentions same wrapper variant', () => {
const events = runSieve([
'Summary: support canonical <tool_calls> and DSML <|DSML|tool_calls> wrappers.\n\n',
@@ -584,6 +597,74 @@ test('sieve emits tool_calls after prose mentions same wrapper variant', () => {
assert.equal(collectText(events).includes('Summary:'), true);
});
test('sieve ignores markdown documentation examples', () => {
const events = runSieve([
'解析器支持多种工具调用格式。\n\n',
'入口函数 `ParseToolCalls(text, availableToolNames)` 会返回调用列表。\n\n',
'核心流程会解析 XML 格式的 `<tool_calls>` / `<invoke>` 标记。\n\n',
'### 标准 XML 结构\n',
'```xml\n',
'<tool_calls>\n',
' <invoke name="read_file">\n',
' <parameter name="path">config.json</parameter>\n',
' </invoke>\n',
'</tool_calls>\n',
'```\n\n',
'DSML 风格形如 `<invoke name="tool">...</invoke>`,也可能提到 `<tool_calls>` 包裹。\n',
], ['read_file']);
const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
const text = collectText(events);
assert.equal(finalCalls.length, 0);
assert.equal(text.includes('标准 XML 结构'), true);
assert.equal(text.includes('DSML 风格'), true);
});
test('sieve ignores inline markdown tool example split across chunks', () => {
const events = runSieve([
'示例:`',
'<tool_calls><invoke name="read_file"><parameter name="path">README.md</parameter></invoke></tool_calls>',
'` 完毕。',
], ['read_file']);
const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
const text = collectText(events);
assert.equal(finalCalls.length, 0);
assert.equal(text.includes('<tool_calls>'), true);
assert.equal(text.includes('完毕'), true);
});
test('sieve emits real tool after unclosed inline markdown in same chunk', () => {
const events = runSieve([
'note with stray ` before real call <tool_calls><invoke name="read_file"><parameter name="path">real.md</parameter></invoke></tool_calls>',
], ['read_file']);
const text = collectText(events);
const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
assert.equal(finalCalls.length, 1);
assert.equal(finalCalls[0].input.path, 'real.md');
assert.equal(text.includes('stray ` before real call'), true);
});
test('sieve emits real tool after unclosed inline markdown across chunks', () => {
const events = runSieve([
'note with stray ` before real call ',
'<tool_calls><invoke name="read_file"><parameter name="path">real.md</parameter></invoke></tool_calls>',
], ['read_file']);
const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
assert.equal(finalCalls.length, 1);
assert.equal(finalCalls[0].input.path, 'real.md');
});
test('sieve emits real tool after split inline markdown tool example closes', () => {
const events = runSieve([
'示例:`',
'<tool_calls><invoke name="read_file"><parameter name="path">README.md</parameter></invoke></tool_calls>',
'` ',
'<tool_calls><invoke name="read_file"><parameter name="path">real.md</parameter></invoke></tool_calls>',
], ['read_file']);
const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
assert.equal(finalCalls.length, 1);
assert.equal(finalCalls[0].input.path, 'real.md');
});
test('sieve emits tool_calls for DSML space-separator typo', () => {
const events = runSieve([
'准备读取文件。\n',