diff --git a/API.md b/API.md
index 65d9cb6..a045b6c 100644
--- a/API.md
+++ b/API.md
@@ -37,7 +37,7 @@
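- The OpenAI, Claude, and Gemini protocols are all mounted on a single `chi` router tree, assembled by `internal/server/router.go`.
- The adapter layer's responsibilities are narrowed to **request normalization → DeepSeek call → protocol-shaped rendering**, which reduces the forks in earlier versions where the same capability was implemented in several places.
-- The Tool Calling parsing strategy is consistent between Go and the Node runtime: the model is advised to emit the DSML shell `<|DSML|tool_calls>` → `<|DSML|invoke name="...">` → `<|DSML|parameter name="...">`; the compatibility layer also accepts the DSML wrapper aliases ``, `<|tool_calls>`, `<|tool_calls>`, common forms with a dropped DSML separator (such as `<|DSML tool_calls>`), and legacy canonical XML `` → `` → ``. Internally, XML parsing semantics still apply, and streaming runs leak-prevention sieving.
+- The Tool Calling parsing strategy is consistent between Go and the Node runtime: the model is advised to emit the DSML shell `<|DSML|tool_calls>` → `<|DSML|invoke name="...">` → `<|DSML|parameter name="...">`; the compatibility layer also accepts the DSML wrapper aliases ``, `<|tool_calls>`, `<|tool_calls>`, common forms with a dropped DSML separator (such as `<|DSML tool_calls>`), the common typo where `DSML` is glued to the tool tag name (such as ``), and legacy canonical XML `` → `` → ``. The implementation uses a narrow, fault-tolerant structural scan: only a `tool_calls` wrapper, or a repairable missing opening wrapper, enters the tool path, and a bare `` does not count as supported syntax; streaming still runs leak-prevention sieving. If a parameter body is itself a valid JSON literal (such as `123`, `true`, `null`, an array, or an object), it is emitted as a structured value rather than always as a string; if a CDATA section is occasionally left unclosed, a narrow repair runs in the final parse / flush recovery stage to preserve, where possible, the fully wrapped outer tool call.
- The `Admin API` separates configuration from runtime policy: `/admin/config*` manages static configuration, and `/admin/settings*` manages runtime behavior.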
---
@@ -344,7 +344,7 @@ data: [DONE]
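Additional notes:
- **Outside code blocks**, a tool payload mixed with ordinary text is still recognized by its features and yields an executable tool call (the surrounding text is still passed through).
-- The parser currently treats the DSML shell (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`), the DSML wrapper aliases (``, `<|tool_calls>`, `<|tool_calls>`), common forms with a dropped DSML separator (such as `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`), and legacy canonical XML tool blocks (`` / `` / ``) as executable calls; DSML is first normalized back to XML, and XML parsing semantics still apply internally. Legacy ``, ``, ``, ``, ``, `tool_use`, antml-style, and pure-JSON `tool_calls` fragments are all treated as ordinary text by default.
+- The parser currently treats the DSML shell (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`), the DSML wrapper aliases (``, `<|tool_calls>`, `<|tool_calls>`), common forms with a dropped DSML separator (such as `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`), the common typo where `DSML` is glued to the tool tag name (such as `` / `` / ``), and legacy canonical XML tool blocks (`` / `` / ``) as executable calls; DSML is first normalized back to XML, and XML parsing semantics still apply internally. Legacy ``, ``, ``, ``, ``, `tool_use`, antml-style, and pure-JSON `tool_calls` fragments are all treated as ordinary text by default.
- When the final visible body is empty but the chain of thought contains an executable tool call, Chat / Responses emits the standard OpenAI `tool_calls` / `function_call` output during finalization; if the client has not enabled thinking / reasoning, the chain of thought is used only for detection and is not exposed as visible body text or `reasoning_content`.
- `tool_calls` inside a Markdown fenced code block (for example ```json ... ```) is treated as example text only and is not executed.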
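
Reviewer note on the parameter-value change described in the first API.md hunk above: the following is a minimal JavaScript sketch, assuming the coercion amounts to attempting a JSON parse and falling back to the raw string. `coerceParameterValue` is a hypothetical name and not a function from this repository. Leaving quoted JSON strings untouched is also an assumption, since the diff does not say how they are handled.

```javascript
'use strict';

// Hypothetical helper (not from the repository): return a structured value
// when the parameter body is a JSON literal (number, boolean, null, array,
// object); otherwise return the original string unchanged.
function coerceParameterValue(raw) {
  const text = String(raw);
  const trimmed = text.trim();
  if (trimmed === '') {
    return text;
  }
  let parsed;
  try {
    parsed = JSON.parse(trimmed);
  } catch (err) {
    return text; // not valid JSON: keep the string as-is
  }
  // Assumption: a quoted JSON string is left as the raw text.
  return typeof parsed === 'string' ? text : parsed;
}

console.log(coerceParameterValue('123'));
console.log(coerceParameterValue('[1,2]'));
console.log(coerceParameterValue('hello'));
```

With this shape, a body such as `{"a":1}` reaches the tool as an object, while free-form text that happens not to be JSON is passed along unchanged.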
diff --git a/docs/toolcall-semantics.md b/docs/toolcall-semantics.md
index 3466ce0..5529a4b 100644
--- a/docs/toolcall-semantics.md
+++ b/docs/toolcall-semantics.md
@@ -39,8 +39,9 @@
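Compatibility repairs:
- If the model omits the opening wrapper but still emits one or more invokes and ends with a closing wrapper, the Go parsing path restores the missing opening wrapper before parsing.
-- If the model writes the `|` separator in a DSML tag as a space (for example `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`, or the `` form without a leading pipe), Go / Node normalize it within the fixed set of tool tag names; similar names that are not tool tags (such as `tool_calls_extra`) are still treated as ordinary text.
+- If the model writes the `|` separator in a DSML tag as a space (for example `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`, or the `` form without a leading pipe), or glues `DSML` directly to the tool tag name (for example `` / `` / ``), Go / Node normalize it within the fixed set of tool tag names; similar names that are not tool tags (such as `tool_calls_extra`) are still treated as ordinary text.
- This is a narrow repair for common model mistakes and does not change the recommended output format; the prompt still requires the model to emit the complete DSML shell directly.
+- A bare `` / `` is not treated as "supported tool syntax"; only a `tool_calls` wrapper, or a repairable missing opening wrapper, enters the tool-call path.
## 2) Incompatible content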
@@ -52,20 +53,23 @@
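In the streaming path (Go and Node behave the same):
-- The DSML `<|DSML|tool_calls>` wrapper, the compatible variants (``, `<|tool_calls>`, `<|tool_calls>`), the narrowly tolerated space-separated form (such as `<|DSML tool_calls>`), and the canonical `` wrapper all enter structured capture
+- The DSML `<|DSML|tool_calls>` wrapper, the compatible variants (``, `<|tool_calls>`, `<|tool_calls>`), the narrowly tolerated space-separated form (such as `<|DSML tool_calls>`), the glued form (such as ``), and the canonical `` wrapper all enter structured capture
- If the stream starts directly with an invoke but a closing wrapper follows later, the Go streaming sieve also attempts recovery through the missing-opening-wrapper repair path
- Tool calls that were recognized successfully do not flow back into ordinary text
- Blocks that do not match the new format are not executed and are passed through as-is as text
- XML examples inside fenced code blocks (backtick `` ``` `` and tilde `~~~`) are always treated as ordinary text
- Nested fences (such as 4 backticks wrapping 3 backticks) and fence protection inside CDATA are supported
+- If the model mentions `` in prose, or a `<|DSML|tool_calls>` inside Markdown inline code, and a real tool call follows immediately after, the sieve skips the unparseable mention candidates and keeps matching the real tool block that follows; the mention neither causes the tool call to be lost nor truncates the body text after the mention
+In addition, if the value of `` is itself a valid JSON literal, it is parsed as a structured value rather than always kept as a string. For example, `123`, `true`, `null`, `[1,2]`, and `{"a":1}` are restored to the corresponding number / boolean / null / array / object.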
+
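## 4) Output structure
`ParseToolCallsDetailed` / `parseToolCallsDetailed` returns:
- `calls`: the list of parsed tool calls (`name` + `input`)
-- `sawToolCallSyntax`: `true` when a DSML / canonical wrapper is detected, or when the "missing opening wrapper but repairable" form is matched
+- `sawToolCallSyntax`: `true` when a DSML / canonical wrapper is detected, or when the "missing opening wrapper but repairable" form is matched; a bare `invoke` does not count toward this flag
- `rejectedByPolicy`: currently always `false`
- `rejectedToolNames`: currently always an empty array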
@@ -88,7 +92,7 @@ node --test tests/node/stream-tool-sieve.test.js
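- The DSML `<|DSML|tool_calls>` wrapper parses correctly
- The legacy canonical `` wrapper parses correctly
-- The alias variants (``, `<|tool_calls>`, `<|tool_calls>`) and the DSML space-separated typo (such as `<|DSML tool_calls>`) parse correctly
+- The alias variants (``, `<|tool_calls>`, `<|tool_calls>`), the DSML space-separated typo (such as `<|DSML tool_calls>`), and the glued typo (such as ``) parse correctly
- Mixed tags (DSML wrapper + canonical inner) parse correctly after normalization
- Examples inside tilde fences `~~~` are not executed
- Examples inside nested fences (4 backticks wrapping 3 backticks) are not executed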
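
Reviewer note on the normalization rule documented in docs/toolcall-semantics.md above: a rough JavaScript sketch of how a DSML-like tag head could be rewritten to its canonical form while names outside the fixed set stay untouched. This is a simplified regex approximation and not the repository's `scanToolMarkupTagAt`. `normalizeToolTag` and `TOOL_TAG_HEAD` are hypothetical names, and treating any run of pipes or whitespace after `DSML` as a separator is an assumption.

```javascript
'use strict';

// Simplified head matcher: optional "/", optional leading pipe, optional
// "DSML" followed by any run of pipes or whitespace, then one of the fixed
// tool tag names, which must be followed by whitespace, "/", ">", or the end
// of the input.
const TOOL_TAG_HEAD = /^<(\/?)(\|?)(dsml[|\s]*)?(tool_calls|invoke|parameter)(?=[\s\/>]|$)/i;

// Hypothetical helper: rewrite a DSML-like tag to canonical XML.
// Returns null when the input is not one of the fixed tool tags.
function normalizeToolTag(tag) {
  const m = TOOL_TAG_HEAD.exec(tag);
  if (!m) {
    return null;
  }
  const [head, slash, pipe, dsml, name] = m;
  if (!pipe && !dsml) {
    return tag; // already canonical, keep verbatim
  }
  // Keep everything after the tag name (attributes, closing ">") as written.
  return `<${slash}${name.toLowerCase()}${tag.slice(head.length)}`;
}

console.log(normalizeToolTag('<|DSML tool_calls>'));
console.log(normalizeToolTag('<|DSML|invoke name="read">'));
console.log(normalizeToolTag('<|DSML|tool_calls_extra>'));
```

The lookahead after the tag name is what keeps `tool_calls_extra` out of the tool path: the name matches as a prefix, but the following `_` is not a tag boundary, so the helper returns `null` and the text would pass through unchanged.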
diff --git a/internal/js/helpers/stream-tool-sieve/parse.js b/internal/js/helpers/stream-tool-sieve/parse.js
index d81661f..82f8f94 100644
--- a/internal/js/helpers/stream-tool-sieve/parse.js
+++ b/internal/js/helpers/stream-tool-sieve/parse.js
@@ -6,10 +6,10 @@ const {
const {
parseMarkupToolCalls,
stripFencedCodeBlocks,
+ containsToolCallWrapperSyntaxOutsideIgnored,
+ sanitizeLooseCDATA,
} = require('./parse_payload');
-const TOOL_MARKUP_PREFIXES = [' lower.includes(prefix));
+ const styles = containsToolCallWrapperSyntaxOutsideIgnored(text);
+ return styles.dsml || styles.canonical;
}
function shouldSkipToolCallParsingForCodeFenceExample(text) {
diff --git a/internal/js/helpers/stream-tool-sieve/parse_payload.js b/internal/js/helpers/stream-tool-sieve/parse_payload.js
index 40d3e08..185ed4d 100644
--- a/internal/js/helpers/stream-tool-sieve/parse_payload.js
+++ b/internal/js/helpers/stream-tool-sieve/parse_payload.js
@@ -3,6 +3,7 @@
const TOOL_CALL_MARKUP_KV_PATTERN = /<(?:[a-z0-9_:-]+:)?([a-z0-9_.-]+)\b[^>]*>([\s\S]*?)<\/(?:[a-z0-9_:-]+:)?\1>/gi;
const CDATA_PATTERN = /^$/i;
const XML_ATTR_PATTERN = /\b([a-z0-9_:-]+)\s*=\s*("([^"]*)"|'([^']*)')/gi;
+const TOOL_MARKUP_NAMES = ['tool_calls', 'invoke', 'parameter'];
const {
toStringSafe,
@@ -138,13 +139,10 @@ function normalizeDSMLToolCallMarkup(text) {
if (!raw) {
return { text: '', ok: true };
}
- const styles = toolMarkupStylesOutsideIgnored(raw);
+ const styles = containsToolMarkupSyntaxOutsideIgnored(raw);
if (!styles.dsml) {
return { text: raw, ok: true };
}
- // Always normalize DSML aliases to canonical form, even when canonical
- // tags coexist. Models frequently mix DSML wrapper tags with canonical
- // inner tags (e.g., <|tool_calls>).
return {
text: replaceDSMLToolMarkupOutsideIgnored(raw),
ok: true,
@@ -152,65 +150,21 @@ function normalizeDSMLToolCallMarkup(text) {
}
function containsDSMLToolMarkup(text) {
- return toolMarkupStylesOutsideIgnored(text).dsml;
+ return containsToolMarkupSyntaxOutsideIgnored(text).dsml;
}
function containsCanonicalToolMarkup(text) {
- return toolMarkupStylesOutsideIgnored(text).canonical;
+ return containsToolMarkupSyntaxOutsideIgnored(text).canonical;
}
-const DSML_TOOL_MARKUP_ALIASES = [
- { from: '<|dsml|tool_calls', to: '', to: '' },
- { from: '<|dsml|invoke', to: '', to: '' },
- { from: '<|dsml|parameter', to: '', to: '' },
- { from: '<|dsml tool_calls', to: '', to: '' },
- { from: '<|dsml invoke', to: '', to: '' },
- { from: '<|dsml parameter', to: '', to: '' },
- { from: '', to: '' },
- { from: '', to: '' },
- { from: '', to: '' },
- { from: '', to: '' },
- { from: '', to: '' },
- { from: '', to: '' },
- { from: '<|tool_calls', to: '', to: '' },
- { from: '<|invoke', to: '', to: '' },
- { from: '<|parameter', to: '', to: '' },
- { from: '<|tool_calls', to: '', to: '' },
- { from: '<|invoke', to: '', to: '' },
- { from: '<|parameter', to: '', to: '' },
-];
-
-const CANONICAL_TOOL_MARKUP_PREFIXES = [
- '',
- '',
- '',
-];
-
-function toolMarkupStylesOutsideIgnored(text) {
- const lower = toStringSafe(text).toLowerCase();
+function containsToolCallWrapperSyntaxOutsideIgnored(text) {
+ const raw = toStringSafe(text);
const styles = { dsml: false, canonical: false };
- for (let i = 0; i < lower.length;) {
+ if (!raw) {
+ return styles;
+ }
+ const lower = raw.toLowerCase();
+ for (let i = 0; i < raw.length;) {
const skipped = skipXmlIgnoredSection(lower, i);
if (skipped.blocked) {
return styles;
@@ -219,15 +173,55 @@ function toolMarkupStylesOutsideIgnored(text) {
i = skipped.next;
continue;
}
- if (CANONICAL_TOOL_MARKUP_PREFIXES.some(prefix => lower.startsWith(prefix, i))) {
- styles.canonical = true;
+ const tag = scanToolMarkupTagAt(raw, i);
+ if (tag) {
+ if (tag.name !== 'tool_calls') {
+ i = tag.end + 1;
+ continue;
+ }
+ if (tag.dsmlLike) {
+ styles.dsml = true;
+ } else {
+ styles.canonical = true;
+ }
+ if (styles.dsml && styles.canonical) {
+ return styles;
+ }
+ i = tag.end + 1;
+ continue;
}
- if (DSML_TOOL_MARKUP_ALIASES.some(alias => lower.startsWith(alias.from, i))) {
- styles.dsml = true;
- }
- if (styles.dsml && styles.canonical) {
+ i += 1;
+ }
+ return styles;
+}
+function containsToolMarkupSyntaxOutsideIgnored(text) {
+ const raw = toStringSafe(text);
+ const styles = { dsml: false, canonical: false };
+ if (!raw) {
+ return styles;
+ }
+ for (let i = 0; i < raw.length;) {
+ const skipped = skipXmlIgnoredSection(raw.toLowerCase(), i);
+ if (skipped.blocked) {
return styles;
}
+ if (skipped.advanced) {
+ i = skipped.next;
+ continue;
+ }
+ const tag = scanToolMarkupTagAt(raw, i);
+ if (tag) {
+ if (tag.dsmlLike) {
+ styles.dsml = true;
+ } else {
+ styles.canonical = true;
+ }
+ if (styles.dsml && styles.canonical) {
+ return styles;
+ }
+ i = tag.end + 1;
+ continue;
+ }
i += 1;
}
return styles;
@@ -235,6 +229,9 @@ function toolMarkupStylesOutsideIgnored(text) {
function replaceDSMLToolMarkupOutsideIgnored(text) {
const raw = toStringSafe(text);
+ if (!raw) {
+ return '';
+ }
const lower = raw.toLowerCase();
let out = '';
for (let i = 0; i < raw.length;) {
@@ -248,10 +245,14 @@ function replaceDSMLToolMarkupOutsideIgnored(text) {
i = skipped.next;
continue;
}
- const alias = DSML_TOOL_MARKUP_ALIASES.find(item => lower.startsWith(item.from, i));
- if (alias) {
- out += alias.to;
- i += alias.from.length;
+ const tag = scanToolMarkupTagAt(raw, i);
+ if (tag) {
+ if (tag.dsmlLike) {
+ out += `<${tag.closing ? '/' : ''}${tag.name}${raw.slice(tag.nameEnd, tag.end + 1)}`;
+ } else {
+ out += raw.slice(tag.start, tag.end + 1);
+ }
+ i = tag.end + 1;
continue;
}
out += raw[i];
@@ -417,6 +418,150 @@ function skipXmlIgnoredSection(lower, i) {
return { advanced: false, blocked: false, next: i };
}
+function scanToolMarkupTagAt(text, start) {
+ const raw = toStringSafe(text);
+ if (!raw || start < 0 || start >= raw.length || raw[start] !== '<') {
+ return null;
+ }
+ const lower = raw.toLowerCase();
+ let i = start + 1;
+ const closing = raw[i] === '/';
+ if (closing) {
+ i += 1;
+ }
+ let dsmlLike = false;
+ if (i < raw.length && isToolMarkupPipe(raw[i])) {
+ dsmlLike = true;
+ i += 1;
+ }
+ if (lower.startsWith('dsml', i)) {
+ dsmlLike = true;
+ i += 'dsml'.length;
+ while (i < raw.length && isToolMarkupSeparator(raw[i])) {
+ i += 1;
+ }
+ }
+ const { name, len } = matchToolMarkupName(lower, i);
+ if (!name) {
+ return null;
+ }
+ const nameEnd = i + len;
+ if (!hasXmlTagBoundary(raw, nameEnd)) {
+ return null;
+ }
+ const end = findXmlTagEnd(raw, nameEnd);
+ if (end < 0) {
+ return null;
+ }
+ return {
+ start,
+ end,
+ nameStart: i,
+ nameEnd,
+ name,
+ closing,
+ selfClosing: raw.slice(start, end + 1).trim().endsWith('/>'),
+ dsmlLike,
+ canonical: !dsmlLike,
+ };
+}
+
+function findToolMarkupTagOutsideIgnored(text, from) {
+ const raw = toStringSafe(text);
+ const lower = raw.toLowerCase();
+ for (let i = Math.max(0, from || 0); i < raw.length;) {
+ const skipped = skipXmlIgnoredSection(lower, i);
+ if (skipped.blocked) {
+ return null;
+ }
+ if (skipped.advanced) {
+ i = skipped.next;
+ continue;
+ }
+ const tag = scanToolMarkupTagAt(raw, i);
+ if (tag) {
+ return tag;
+ }
+ i += 1;
+ }
+ return null;
+}
+
+function findMatchingToolMarkupClose(text, openTag) {
+ const raw = toStringSafe(text);
+ if (!raw || !openTag || !openTag.name || openTag.closing) {
+ return null;
+ }
+ let depth = 1;
+ for (let pos = openTag.end + 1; pos < raw.length;) {
+ const tag = findToolMarkupTagOutsideIgnored(raw, pos);
+ if (!tag) {
+ return null;
+ }
+ if (tag.name !== openTag.name) {
+ pos = tag.end + 1;
+ continue;
+ }
+ if (tag.closing) {
+ depth -= 1;
+ if (depth === 0) {
+ return tag;
+ }
+ } else if (!tag.selfClosing) {
+ depth += 1;
+ }
+ pos = tag.end + 1;
+ }
+ return null;
+}
+
+function findPartialToolMarkupStart(text) {
+ const raw = toStringSafe(text);
+ const lastLT = raw.lastIndexOf('<');
+ if (lastLT < 0) {
+ return -1;
+ }
+ const tail = raw.slice(lastLT);
+ if (tail.includes('>')) {
+ return -1;
+ }
+ const lowerTail = tail.toLowerCase();
+ const candidates = [
+ '= 0) {
+ const end = endRel + closeMarker.length;
+ out += raw.slice(start, end);
+ pos = end;
+ continue;
+ }
+
+ changed = true;
+ out += raw.slice(contentStart);
+ pos = raw.length;
+ }
+
+ return changed ? out : raw;
+}
+
function parseTagAttributes(raw) {
const source = toStringSafe(raw);
const out = {};
@@ -631,4 +830,10 @@ module.exports = {
stripFencedCodeBlocks,
parseMarkupToolCalls,
normalizeDSMLToolCallMarkup,
+ containsToolMarkupSyntaxOutsideIgnored,
+ containsToolCallWrapperSyntaxOutsideIgnored,
+ findToolMarkupTagOutsideIgnored,
+ findMatchingToolMarkupClose,
+ findPartialToolMarkupStart,
+ sanitizeLooseCDATA,
};
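
Reviewer note on `sanitizeLooseCDATA`, exported above: a minimal JavaScript sketch of the repair loop, following the shape of the Go counterpart later in this diff (copy closed CDATA sections verbatim, and drop the opening marker of a section that never closes so its content survives as plain text). The marker strings are assumed to be the standard XML CDATA delimiters, and `sanitizeLooseCDATASketch` is a hypothetical name used to avoid implying this is the shipped function.

```javascript
'use strict';

// Assumed markers: the standard XML CDATA delimiters.
const CDATA_OPEN = '<![cdata[';
const CDATA_CLOSE = ']]>';

// Sketch only. Index math assumes lowercasing does not change string length.
function sanitizeLooseCDATASketch(text) {
  const raw = typeof text === 'string' ? text : '';
  const lower = raw.toLowerCase();
  let out = '';
  let changed = false;
  let pos = 0;
  while (pos < raw.length) {
    const start = lower.indexOf(CDATA_OPEN, pos);
    if (start < 0) {
      out += raw.slice(pos); // no further CDATA sections
      break;
    }
    const contentStart = start + CDATA_OPEN.length;
    out += raw.slice(pos, start);
    const closeAt = lower.indexOf(CDATA_CLOSE, contentStart);
    if (closeAt >= 0) {
      // Properly closed section: copy it verbatim and keep scanning.
      const end = closeAt + CDATA_CLOSE.length;
      out += raw.slice(start, end);
      pos = end;
      continue;
    }
    // Unclosed section: drop the opening marker, keep the content.
    changed = true;
    out += raw.slice(contentStart);
    pos = raw.length;
  }
  return changed ? out : raw;
}

console.log(sanitizeLooseCDATASketch('<parameter name="a"><![CDATA[hello</parameter>'));
```

Returning the original string when nothing changed lets a caller detect a repair with a plain inequality check, which is how the flush path in sieve.js decides whether to retry the capture.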
diff --git a/internal/js/helpers/stream-tool-sieve/sieve-xml.js b/internal/js/helpers/stream-tool-sieve/sieve-xml.js
index ef69ac1..463e4db 100644
--- a/internal/js/helpers/stream-tool-sieve/sieve-xml.js
+++ b/internal/js/helpers/stream-tool-sieve/sieve-xml.js
@@ -1,71 +1,53 @@
'use strict';
const { parseToolCalls } = require('./parse');
-
-// XML wrapper tag pair used by the streaming sieve.
-const XML_TOOL_TAG_PAIRS = [
- { open: '<|dsml|tool_calls', close: '|dsml|tool_calls>' },
- { open: '<|dsml tool_calls', close: '|dsml tool_calls>' },
- { open: '' },
- { open: '' },
- { open: '<|tool_calls', close: '|tool_calls>' },
- { open: '<|tool_calls', close: '|tool_calls>' },
- { open: '' },
-];
-
-const XML_TOOL_OPENING_TAGS = [
- ...XML_TOOL_TAG_PAIRS.map(p => p.open),
- '<|dsml|invoke', '<|dsml invoke', ' 0) {
- const trimmedFence = trimWrappingJSONFence(prefixPart, suffixPart);
- if (!best || openIdx < best.start) {
- best = {
- start: openIdx,
- prefix: trimmedFence.prefix,
- calls: parsed,
- suffix: trimmedFence.suffix,
- };
- }
- break;
- }
- if (!rejected || openIdx < rejected.start) {
- rejected = {
- start: openIdx,
- prefix: prefixPart + xmlBlock,
- suffix: suffixPart,
+ // Scan every recognized wrapper occurrence. Prose can mention a wrapper tag
+ // before the actual tool block, including the same variant as the real block.
+ for (let searchFrom = 0; searchFrom < captured.length;) {
+ const openTag = findFirstToolTag(captured, searchFrom, 'tool_calls', false);
+ if (!openTag) {
+ break;
+ }
+ const closeTag = findMatchingToolMarkupClose(captured, openTag);
+ if (!closeTag) {
+ anyOpenFound = true;
+ searchFrom = openTag.end + 1;
+ continue;
+ }
+ const xmlBlock = captured.slice(openTag.start, closeTag.end + 1);
+ const prefixPart = captured.slice(0, openTag.start);
+ const suffixPart = captured.slice(closeTag.end + 1);
+ const parsed = parseToolCalls(xmlBlock, toolNames);
+ if (Array.isArray(parsed) && parsed.length > 0) {
+ const trimmedFence = trimWrappingJSONFence(prefixPart, suffixPart);
+ if (!best || openTag.start < best.start) {
+ best = {
+ start: openTag.start,
+ prefix: trimmedFence.prefix,
+ calls: parsed,
+ suffix: trimmedFence.suffix,
};
}
- searchFrom = openIdx + pair.open.length;
+ break;
}
+ if (!rejected || openTag.start < rejected.start) {
+ rejected = {
+ start: openTag.start,
+ prefix: prefixPart + xmlBlock,
+ suffix: suffixPart,
+ };
+ }
+ searchFrom = openTag.end + 1;
}
if (best) {
return { ready: true, prefix: best.prefix, calls: best.calls, suffix: best.suffix };
@@ -78,17 +60,15 @@ function consumeXMLToolCapture(captured, toolNames, trimWrappingJSONFence) {
// If this block failed to become a tool call, pass it through as text.
return { ready: true, prefix: rejected.prefix, calls: [], suffix: rejected.suffix };
}
- if (!containsAnyToolCallWrapper(lower)) {
- const found = firstInvokeIndex(lower);
- if (found.index >= 0) {
- const closeTag = found.dsml ? '|dsml|tool_calls>' : '';
- const openWrapper = found.dsml ? '<|DSML|tool_calls>' : '';
- const closeIdx = findXMLCloseOutsideCDATA(captured, closeTag, found.index);
- if (closeIdx > found.index) {
- const closeEnd = closeIdx + closeTag.length;
- const xmlBlock = openWrapper + captured.slice(found.index, closeIdx) + closeTag;
- let prefixPart = captured.slice(0, found.index);
- let suffixPart = captured.slice(closeEnd);
+ const invokeTag = findFirstToolTag(captured, 0, 'invoke', false);
+ if (invokeTag) {
+ const wrapperOpen = findFirstToolTag(captured, 0, 'tool_calls', false);
+ if (!wrapperOpen || wrapperOpen.start > invokeTag.start) {
+ const closeTag = findFirstToolTag(captured, invokeTag.start + 1, 'tool_calls', true);
+ if (closeTag && closeTag.start > invokeTag.start) {
+ const xmlBlock = '' + captured.slice(invokeTag.start, closeTag.end + 1);
+ const prefixPart = captured.slice(0, invokeTag.start);
+ const suffixPart = captured.slice(closeTag.end + 1);
const parsed = parseToolCalls(xmlBlock, toolNames);
if (Array.isArray(parsed) && parsed.length > 0) {
const trimmedFence = trimWrappingJSONFence(prefixPart, suffixPart);
@@ -99,194 +79,43 @@ function consumeXMLToolCapture(captured, toolNames, trimWrappingJSONFence) {
suffix: trimmedFence.suffix,
};
}
- return { ready: true, prefix: prefixPart + captured.slice(found.index, closeEnd), calls: [], suffix: suffixPart };
+ return { ready: true, prefix: prefixPart + captured.slice(invokeTag.start, closeTag.end + 1), calls: [], suffix: suffixPart };
}
}
}
return { ready: false, prefix: '', calls: [], suffix: '' };
}
-function findMatchingXMLToolWrapperClose(s, openTag, closeTag, openIdx) {
- const text = typeof s === 'string' ? s : '';
- const openTarget = String(openTag || '').toLowerCase();
- const closeTarget = String(closeTag || '').toLowerCase();
- if (!text || !openTarget || !closeTarget || openIdx < 0) {
- return -1;
- }
- const lower = text.toLowerCase();
- let depth = 1;
- for (let i = openIdx + openTarget.length; i < text.length;) {
- if (lower.startsWith('', i + ''.length;
- continue;
- }
- if (lower.startsWith('', i + ''.length;
- continue;
- }
- if (lower.startsWith(closeTarget, i)) {
- depth -= 1;
- if (depth === 0) {
- return i;
- }
- i += closeTarget.length;
- continue;
- }
- if (lower.startsWith(openTarget, i) && hasXMLToolTagBoundary(text, i + openTarget.length)) {
- depth += 1;
- i += openTarget.length;
- continue;
- }
- i += 1;
- }
- return -1;
-}
-
-function findXMLOpenOutsideCDATA(s, openTag, start) {
- const text = typeof s === 'string' ? s : '';
- const target = String(openTag || '').toLowerCase();
- if (!text || !target) {
- return -1;
- }
- const lower = text.toLowerCase();
- for (let i = Math.max(0, start || 0); i < text.length;) {
- if (lower.startsWith('', i + ''.length;
- continue;
- }
- if (lower.startsWith('', i + ''.length;
- continue;
- }
- if (lower.startsWith(target, i) && hasXMLToolTagBoundary(text, i + target.length)) {
- return i;
- }
- i += 1;
- }
- return -1;
-}
-
-function hasXMLToolTagBoundary(text, idx) {
- if (idx >= text.length) {
- return true;
- }
- return [' ', '\t', '\n', '\r', '>', '/'].includes(text[idx]);
-}
-
function hasOpenXMLToolTag(captured) {
- for (const pair of XML_TOOL_TAG_PAIRS) {
- const openIdx = findXMLOpenOutsideCDATA(captured, pair.open, 0);
- if (openIdx >= 0) {
- if (findMatchingXMLToolWrapperClose(captured, pair.open, pair.close, openIdx) < 0) {
- return true;
- }
+ for (let pos = 0; pos < captured.length;) {
+ const tag = findFirstToolTag(captured, pos, 'tool_calls', false);
+ if (!tag) {
+ return false;
}
+ if (!findMatchingToolMarkupClose(captured, tag)) {
+ return true;
+ }
+ pos = tag.end + 1;
}
return false;
}
-function containsAnyToolCallWrapper(lower) {
- return lower.includes('= 0 && (dsmlIdx < 0 || idx < dsmlIdx)) {
- dsmlIdx = idx;
+function findFirstToolTag(text, from, name, closing) {
+ for (let pos = Math.max(0, from || 0); pos < text.length;) {
+ const tag = findToolMarkupTagOutsideIgnored(text, pos);
+ if (!tag) {
+ return null;
}
- }
- if (xmlIdx < 0) {
- return { index: dsmlIdx, dsml: dsmlIdx >= 0 };
- }
- if (dsmlIdx < 0) {
- return { index: xmlIdx, dsml: false };
- }
- if (dsmlIdx < xmlIdx) {
- return { index: dsmlIdx, dsml: true };
- }
- return { index: xmlIdx, dsml: false };
-}
-
-function findPartialXMLToolTagStart(s) {
- const lastLT = s.lastIndexOf('<');
- if (lastLT < 0) {
- return -1;
- }
- const tail = s.slice(lastLT);
- if (tail.includes('>')) {
- return -1;
- }
- const lowerTail = tail.toLowerCase();
- for (const tag of XML_TOOL_OPENING_TAGS) {
- const tagWithLT = tag.startsWith('<') ? tag : '<' + tag;
- if (tagWithLT.startsWith(lowerTail)) {
- return lastLT;
+ if (tag.name === name && tag.closing === closing) {
+ return tag;
}
+ pos = tag.end + 1;
}
- return -1;
-}
-
-function findXMLCloseOutsideCDATA(s, closeTag, start) {
- const text = typeof s === 'string' ? s : '';
- const target = String(closeTag || '').toLowerCase();
- if (!text || !target) {
- return -1;
- }
- const lower = text.toLowerCase();
- for (let i = Math.max(0, start || 0); i < text.length;) {
- if (lower.startsWith('', i + ''.length;
- continue;
- }
- if (lower.startsWith('', i + ''.length;
- continue;
- }
- if (lower.startsWith(target, i)) {
- return i;
- }
- i += 1;
- }
- return -1;
+ return null;
}
module.exports = {
consumeXMLToolCapture,
hasOpenXMLToolTag,
- findPartialXMLToolTagStart,
+ findPartialXMLToolTagStart: findPartialToolMarkupStart,
};
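
Reviewer note on the depth-counting match that replaces the per-variant tag pairs in sieve-xml.js above: a simplified JavaScript sketch of balancing nested wrappers of the same name. Unlike `findMatchingToolMarkupClose`, this version handles canonical tags only and does not skip CDATA or other ignored sections. `findMatchingCloseIndex` is a hypothetical name, and `name` is assumed to be a fixed tag name without regex metacharacters.

```javascript
'use strict';

// Hypothetical simplified matcher: given the index just past an opening
// <name ...> tag, return the index of the closing tag that balances it,
// or -1 when the block is still open.
function findMatchingCloseIndex(text, name, openEnd) {
  // Group 1 captures "/" on closing tags, group 2 captures "/" on
  // self-closing tags. The negative lookahead rejects longer names such as
  // tool_calls_extra.
  const tagPattern = new RegExp(`<(/?)${name}(?![\\w.-])[^>]*?(/?)>`, 'gi');
  tagPattern.lastIndex = openEnd;
  let depth = 1;
  let match;
  while ((match = tagPattern.exec(text)) !== null) {
    const isClosing = match[1] === '/';
    const isSelfClosing = match[2] === '/';
    if (isClosing) {
      depth -= 1;
      if (depth === 0) {
        return match.index;
      }
    } else if (!isSelfClosing) {
      depth += 1; // nested opener of the same name
    }
  }
  return -1;
}

const sample = '<tool_calls><invoke name="a"><tool_calls></tool_calls></invoke></tool_calls> tail';
console.log(findMatchingCloseIndex(sample, 'tool_calls', '<tool_calls>'.length));
```

A result of `-1` corresponds to the "still open" case that `hasOpenXMLToolTag` reports, where the sieve would keep buffering instead of emitting text.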
diff --git a/internal/js/helpers/stream-tool-sieve/sieve.js b/internal/js/helpers/stream-tool-sieve/sieve.js
index 8a31888..a90a662 100644
--- a/internal/js/helpers/stream-tool-sieve/sieve.js
+++ b/internal/js/helpers/stream-tool-sieve/sieve.js
@@ -6,8 +6,9 @@ const {
} = require('./state');
const { trimWrappingJSONFence } = require('./jsonscan');
const {
- XML_TOOL_SEGMENT_TAGS,
-} = require('./tool-keywords');
+ findToolMarkupTagOutsideIgnored,
+ sanitizeLooseCDATA,
+} = require('./parse_payload');
const {
consumeXMLToolCapture: consumeXMLToolCaptureImpl,
hasOpenXMLToolTag,
@@ -117,8 +118,27 @@ function flushToolSieve(state, toolNames) {
}
} else if (state.capture) {
const content = state.capture;
- noteText(state, content);
- events.push({ type: 'text', text: content });
+ const recovered = sanitizeLooseCDATA(content);
+ if (recovered !== content) {
+ const recoveredResult = consumeXMLToolCaptureImpl(recovered, toolNames, trimWrappingJSONFence);
+ if (recoveredResult.ready && Array.isArray(recoveredResult.calls) && recoveredResult.calls.length > 0) {
+ if (recoveredResult.prefix) {
+ noteText(state, recoveredResult.prefix);
+ events.push({ type: 'text', text: recoveredResult.prefix });
+ }
+ events.push({ type: 'tool_calls', calls: recoveredResult.calls });
+ if (recoveredResult.suffix) {
+ noteText(state, recoveredResult.suffix);
+ events.push({ type: 'text', text: recoveredResult.suffix });
+ }
+ } else {
+ noteText(state, content);
+ events.push({ type: 'text', text: content });
+ }
+ } else {
+ noteText(state, content);
+ events.push({ type: 'text', text: content });
+ }
}
state.capture = '';
state.capturing = false;
@@ -155,26 +175,16 @@ function findToolSegmentStart(state, s) {
if (!s) {
return -1;
}
- const lower = s.toLowerCase();
let offset = 0;
while (true) {
- // Only check XML tool tags.
- let bestIdx = -1;
- let matchedTag = '';
- for (const tag of XML_TOOL_SEGMENT_TAGS) {
- const idx = lower.indexOf(tag, offset);
- if (idx >= 0 && (bestIdx < 0 || idx < bestIdx)) {
- bestIdx = idx;
- matchedTag = tag;
- }
- }
- if (bestIdx < 0) {
+ const tag = findToolMarkupTagOutsideIgnored(s, offset);
+ if (!tag) {
return -1;
}
- if (!insideCodeFenceWithState(state, s.slice(0, bestIdx))) {
- return bestIdx;
+ if (!insideCodeFenceWithState(state, s.slice(0, tag.start))) {
+ return tag.start;
}
- offset = bestIdx + matchedTag.length;
+ offset = tag.end + 1;
}
}
diff --git a/internal/js/helpers/stream-tool-sieve/tool-keywords.js b/internal/js/helpers/stream-tool-sieve/tool-keywords.js
index 0aaaccb..382e5a2 100644
--- a/internal/js/helpers/stream-tool-sieve/tool-keywords.js
+++ b/internal/js/helpers/stream-tool-sieve/tool-keywords.js
@@ -3,10 +3,14 @@
const XML_TOOL_SEGMENT_TAGS = [
'<|dsml|tool_calls>', '<|dsml|tool_calls\n', '<|dsml|tool_calls ',
'<|dsml|invoke ', '<|dsml|invoke\n', '<|dsml|invoke\t', '<|dsml|invoke\r',
+ '<|dsmltool_calls>', '<|dsmltool_calls\n', '<|dsmltool_calls ',
+ '<|dsmlinvoke ', '<|dsmlinvoke\n', '<|dsmlinvoke\t', '<|dsmlinvoke\r',
'<|dsml tool_calls>', '<|dsml tool_calls\n', '<|dsml tool_calls ',
'<|dsml invoke ', '<|dsml invoke\n', '<|dsml invoke\t', '<|dsml invoke\r',
'', '', '', '', '<|tool_calls\n', '<|tool_calls ',
@@ -19,8 +23,10 @@ const XML_TOOL_SEGMENT_TAGS = [
const XML_TOOL_OPENING_TAGS = [
'<|dsml|tool_calls',
+ '<|dsmltool_calls',
'<|dsml tool_calls',
'',
+ '|dsmltool_calls>',
'|dsml tool_calls>',
'',
+ '',
'',
'|tool_calls>',
'|tool_calls>',
diff --git a/internal/toolcall/regression_test.go b/internal/toolcall/regression_test.go
index 7615fa3..fc88db0 100644
--- a/internal/toolcall/regression_test.go
+++ b/internal/toolcall/regression_test.go
@@ -12,9 +12,9 @@ func TestRegression_RobustXMLAndCDATA(t *testing.T) {
expected []ParsedToolCall
}{
{
- name: "Standard JSON parameters (Regression)",
+ name: "Standard JSON scalar parameters (Regression)",
text: `1`,
- expected: []ParsedToolCall{{Name: "foo", Input: map[string]any{"a": "1"}}},
+ expected: []ParsedToolCall{{Name: "foo", Input: map[string]any{"a": float64(1)}}},
},
{
name: "XML tags parameters (Regression)",
diff --git a/internal/toolcall/toolcalls_dsml.go b/internal/toolcall/toolcalls_dsml.go
index 4801a78..c93e04c 100644
--- a/internal/toolcall/toolcalls_dsml.go
+++ b/internal/toolcall/toolcalls_dsml.go
@@ -6,96 +6,17 @@ func normalizeDSMLToolCallMarkup(text string) (string, bool) {
if text == "" {
return "", true
}
- hasAliasLikeMarkup, _ := toolMarkupStylesOutsideIgnored(text)
+ hasAliasLikeMarkup, _ := ContainsToolMarkupSyntaxOutsideIgnored(text)
if !hasAliasLikeMarkup {
return text, true
}
- // Always normalize DSML aliases to canonical form, even when canonical
- // tags coexist. Models frequently mix DSML wrapper tags with canonical
- // inner tags (e.g., <|tool_calls>).
- return replaceDSMLToolMarkupOutsideIgnored(text), true
+ return rewriteDSMLToolMarkupOutsideIgnored(text), true
}
-var dsmlToolMarkupAliases = []struct {
- from string
- to string
-}{
- {"<|dsml|tool_calls", "", ""},
- {"<|dsml|invoke", "", ""},
- {"<|dsml|parameter", "", ""},
- {"<|dsml tool_calls", "", ""},
- {"<|dsml invoke", "", ""},
- {"<|dsml parameter", "", ""},
- {"", ""},
- {"", ""},
- {"", ""},
- {"", ""},
- {"", ""},
- {"", ""},
- {"<|tool_calls", "", ""},
- {"<|invoke", "", ""},
- {"<|parameter", "", ""},
- {"<|tool_calls", "", ""},
- {"<|invoke", "", ""},
- {"<|parameter", "", ""},
-}
-
-var canonicalToolMarkupPrefixes = []string{
- "",
- "",
- "",
-}
-
-func toolMarkupStylesOutsideIgnored(text string) (hasDSML, hasCanonical bool) {
- lower := strings.ToLower(text)
- for i := 0; i < len(text); {
- next, advanced, blocked := skipXMLIgnoredSection(lower, i)
- if blocked {
- return hasDSML, hasCanonical
- }
- if advanced {
- i = next
- continue
- }
- if hasPrefixAt(lower, i, canonicalToolMarkupPrefixes) {
- hasCanonical = true
- }
- for _, alias := range dsmlToolMarkupAliases {
- if strings.HasPrefix(lower[i:], alias.from) {
- hasDSML = true
- break
- }
- }
- if hasDSML && hasCanonical {
- return true, true
- }
- i++
+func rewriteDSMLToolMarkupOutsideIgnored(text string) string {
+ if text == "" {
+ return ""
}
- return hasDSML, hasCanonical
-}
-
-func replaceDSMLToolMarkupOutsideIgnored(text string) string {
lower := strings.ToLower(text)
var b strings.Builder
b.Grow(len(text))
@@ -110,29 +31,24 @@ func replaceDSMLToolMarkupOutsideIgnored(text string) string {
i = next
continue
}
- replaced := false
- for _, alias := range dsmlToolMarkupAliases {
- if strings.HasPrefix(lower[i:], alias.from) {
- b.WriteString(alias.to)
- i += len(alias.from)
- replaced = true
- break
- }
- }
- if replaced {
+ tag, ok := scanToolMarkupTagAt(text, i)
+ if !ok {
+ b.WriteByte(text[i])
+ i++
continue
}
- b.WriteByte(text[i])
- i++
+ if tag.DSMLLike {
+ b.WriteByte('<')
+ if tag.Closing {
+ b.WriteByte('/')
+ }
+ b.WriteString(tag.Name)
+ b.WriteString(text[tag.NameEnd : tag.End+1])
+ i = tag.End + 1
+ continue
+ }
+ b.WriteString(text[tag.Start : tag.End+1])
+ i = tag.End + 1
}
return b.String()
}
-
-func hasPrefixAt(text string, idx int, prefixes []string) bool {
- for _, prefix := range prefixes {
- if strings.HasPrefix(text[idx:], prefix) {
- return true
- }
- }
- return false
-}
diff --git a/internal/toolcall/toolcalls_markup.go b/internal/toolcall/toolcalls_markup.go
index b01ba21..f9f2b4f 100644
--- a/internal/toolcall/toolcalls_markup.go
+++ b/internal/toolcall/toolcalls_markup.go
@@ -111,5 +111,72 @@ func extractStandaloneCDATA(inner string) (string, bool) {
if cdataMatches := cdataPattern.FindStringSubmatch(trimmed); len(cdataMatches) >= 2 {
return cdataMatches[1], true
}
+ if strings.HasPrefix(strings.ToLower(trimmed), ""
+
+ var b strings.Builder
+ b.Grow(len(text))
+ changed := false
+ pos := 0
+ for pos < len(text) {
+ startRel := strings.Index(lower[pos:], openMarker)
+ if startRel < 0 {
+ b.WriteString(text[pos:])
+ break
+ }
+ start := pos + startRel
+ contentStart := start + len(openMarker)
+ b.WriteString(text[pos:start])
+
+ if endRel := strings.Index(lower[contentStart:], closeMarker); endRel >= 0 {
+ end := contentStart + endRel + len(closeMarker)
+ b.WriteString(text[start:end])
+ pos = end
+ continue
+ }
+
+ changed = true
+ b.WriteString(text[contentStart:])
+ pos = len(text)
+ }
+
+ if !changed {
+ return text
+ }
+ return b.String()
+}
diff --git a/internal/toolcall/toolcalls_parse.go b/internal/toolcall/toolcalls_parse.go
index ff0b87a..f5f9d39 100644
--- a/internal/toolcall/toolcalls_parse.go
+++ b/internal/toolcall/toolcalls_parse.go
@@ -65,6 +65,12 @@ func parseToolCallsDetailedXMLOnly(text string) ToolCallParseResult {
return result
}
parsed := parseXMLToolCalls(normalized)
+ if len(parsed) == 0 && strings.Contains(strings.ToLower(normalized), " 0 {
- if len(parsed) == 1 {
- if rawValue, ok := parsed["_raw"].(string); ok {
- return rawValue
+ decoded := html.UnescapeString(extractRawTagValue(trimmed))
+ if strings.Contains(decoded, "<") && strings.Contains(decoded, ">") {
+ if parsed := parseStructuredToolCallInput(decoded); len(parsed) > 0 {
+ if len(parsed) == 1 {
+ if rawValue, ok := parsed["_raw"].(string); ok {
+ return rawValue
+ }
}
+ return parsed
}
+ }
+ if parsed, ok := parseJSONLiteralValue(decoded); ok {
return parsed
}
- return html.UnescapeString(extractRawTagValue(trimmed))
+ return decoded
}
diff --git a/internal/toolcall/toolcalls_scan.go b/internal/toolcall/toolcalls_scan.go
new file mode 100644
index 0000000..099f73b
--- /dev/null
+++ b/internal/toolcall/toolcalls_scan.go
@@ -0,0 +1,219 @@
+package toolcall
+
+import "strings"
+
+var toolMarkupNames = []string{"tool_calls", "invoke", "parameter"}
+
+type ToolMarkupTag struct {
+ Start int
+ End int
+ NameStart int
+ NameEnd int
+ Name string
+ Closing bool
+ SelfClosing bool
+ DSMLLike bool
+ Canonical bool
+}
+
+func ContainsToolMarkupSyntaxOutsideIgnored(text string) (hasDSML, hasCanonical bool) {
+ lower := strings.ToLower(text)
+ for i := 0; i < len(text); {
+ next, advanced, blocked := skipXMLIgnoredSection(lower, i)
+ if blocked {
+ return hasDSML, hasCanonical
+ }
+ if advanced {
+ i = next
+ continue
+ }
+ if tag, ok := scanToolMarkupTagAt(text, i); ok {
+ if tag.DSMLLike {
+ hasDSML = true
+ } else {
+ hasCanonical = true
+ }
+ if hasDSML && hasCanonical {
+ return true, true
+ }
+ i = tag.End + 1
+ continue
+ }
+ i++
+ }
+ return hasDSML, hasCanonical
+}
+
+func ContainsToolCallWrapperSyntaxOutsideIgnored(text string) (hasDSML, hasCanonical bool) {
+ lower := strings.ToLower(text)
+ for i := 0; i < len(text); {
+ next, advanced, blocked := skipXMLIgnoredSection(lower, i)
+ if blocked {
+ return hasDSML, hasCanonical
+ }
+ if advanced {
+ i = next
+ continue
+ }
+ if tag, ok := scanToolMarkupTagAt(text, i); ok {
+ if tag.Name != "tool_calls" {
+ i = tag.End + 1
+ continue
+ }
+ if tag.DSMLLike {
+ hasDSML = true
+ } else {
+ hasCanonical = true
+ }
+ if hasDSML && hasCanonical {
+ return true, true
+ }
+ i = tag.End + 1
+ continue
+ }
+ i++
+ }
+ return hasDSML, hasCanonical
+}
+
+func FindToolMarkupTagOutsideIgnored(text string, start int) (ToolMarkupTag, bool) {
+ lower := strings.ToLower(text)
+ for i := maxInt(start, 0); i < len(text); {
+ next, advanced, blocked := skipXMLIgnoredSection(lower, i)
+ if blocked {
+ return ToolMarkupTag{}, false
+ }
+ if advanced {
+ i = next
+ continue
+ }
+ if tag, ok := scanToolMarkupTagAt(text, i); ok {
+ return tag, true
+ }
+ i++
+ }
+ return ToolMarkupTag{}, false
+}
+
+func FindMatchingToolMarkupClose(text string, open ToolMarkupTag) (ToolMarkupTag, bool) {
+ if text == "" || open.Name == "" || open.Closing {
+ return ToolMarkupTag{}, false
+ }
+ depth := 1
+ for pos := open.End + 1; pos < len(text); {
+ tag, ok := FindToolMarkupTagOutsideIgnored(text, pos)
+ if !ok {
+ return ToolMarkupTag{}, false
+ }
+ if tag.Name != open.Name {
+ pos = tag.End + 1
+ continue
+ }
+ if tag.Closing {
+ depth--
+ if depth == 0 {
+ return tag, true
+ }
+ } else if !tag.SelfClosing {
+ depth++
+ }
+ pos = tag.End + 1
+ }
+ return ToolMarkupTag{}, false
+}
+
+func scanToolMarkupTagAt(text string, start int) (ToolMarkupTag, bool) {
+ if start < 0 || start >= len(text) || text[start] != '<' {
+ return ToolMarkupTag{}, false
+ }
+ lower := strings.ToLower(text)
+ i := start + 1
+ closing := false
+ if i < len(text) && text[i] == '/' {
+ closing = true
+ i++
+ }
+ dsmlLike := false
+ if next, ok := consumeToolMarkupPipe(text, i); ok {
+ dsmlLike = true
+ i = next
+ }
+ if strings.HasPrefix(lower[i:], "dsml") {
+ dsmlLike = true
+ i += len("dsml")
+ for next, ok := consumeToolMarkupSeparator(text, i); ok; next, ok = consumeToolMarkupSeparator(text, i) {
+ i = next
+ }
+ }
+ name, nameLen := matchToolMarkupName(lower, i)
+ if nameLen == 0 {
+ return ToolMarkupTag{}, false
+ }
+ nameEnd := i + nameLen
+ if !hasToolMarkupBoundary(text, nameEnd) {
+ return ToolMarkupTag{}, false
+ }
+ end := findXMLTagEnd(text, nameEnd)
+ if end < 0 {
+ return ToolMarkupTag{}, false
+ }
+ trimmed := strings.TrimSpace(text[start : end+1])
+ return ToolMarkupTag{
+ Start: start,
+ End: end,
+ NameStart: i,
+ NameEnd: nameEnd,
+ Name: name,
+ Closing: closing,
+ SelfClosing: strings.HasSuffix(trimmed, "/>"),
+ DSMLLike: dsmlLike,
+ Canonical: !dsmlLike,
+ }, true
+}
+
+func matchToolMarkupName(lower string, start int) (string, int) {
+ for _, name := range toolMarkupNames {
+ if strings.HasPrefix(lower[start:], name) {
+ return name, len(name)
+ }
+ }
+ return "", 0
+}
+
+func consumeToolMarkupPipe(text string, idx int) (int, bool) {
+ if idx >= len(text) {
+ return idx, false
+ }
+ if text[idx] == '|' {
+ return idx + 1, true
+ }
+ if strings.HasPrefix(text[idx:], "｜") {
+ return idx + len("｜"), true
+ }
+ return idx, false
+}
+
+func consumeToolMarkupSeparator(text string, idx int) (int, bool) {
+ if idx >= len(text) {
+ return idx, false
+ }
+ if text[idx] == ' ' || text[idx] == '\t' || text[idx] == '\r' || text[idx] == '\n' {
+ return idx + 1, true
+ }
+ if next, ok := consumeToolMarkupPipe(text, idx); ok {
+ return next, true
+ }
+ return idx, false
+}
+
+func hasToolMarkupBoundary(text string, idx int) bool {
+ if idx >= len(text) {
+ return true
+ }
+ switch text[idx] {
+ case ' ', '\t', '\n', '\r', '>', '/':
+ return true
+ default:
+ return false
+ }
+}
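The scanner in this file accepts a tag only when one of the three known names is followed by a boundary character, which is what separates a collapsed typo such as `<DSMLinvoke ...>` from a lookalike such as `<DSMLtool_calls_extra>`. The sketch below isolates that name-plus-boundary rule. It is a simplified illustration, not the real `scanToolMarkupTagAt`: `classifyToolTag` is a hypothetical name, it handles only the ASCII pipe, and it does not look for the closing `>` or skip CDATA.

```go
package main

import (
	"fmt"
	"strings"
)

var toolTagNames = []string{"tool_calls", "invoke", "parameter"}

// classifyToolTag is a hypothetical, reduced version of the tag scanner.
// It inspects only the tag name at the start of s.
func classifyToolTag(s string) (name string, dsmlLike bool, ok bool) {
	if !strings.HasPrefix(s, "<") {
		return "", false, false
	}
	lower := strings.ToLower(s)
	i := 1
	if i < len(lower) && lower[i] == '/' {
		i++ // closing tag
	}
	if i < len(lower) && lower[i] == '|' {
		dsmlLike = true
		i++
	}
	if strings.HasPrefix(lower[i:], "dsml") {
		dsmlLike = true
		i += len("dsml")
		// Optional separators between "dsml" and the tag name.
		for i < len(lower) && (lower[i] == ' ' || lower[i] == '|') {
			i++
		}
	}
	for _, candidate := range toolTagNames {
		if !strings.HasPrefix(lower[i:], candidate) {
			continue
		}
		end := i + len(candidate)
		// The name must be followed by a boundary, not more name characters.
		if end == len(lower) || strings.ContainsRune(" \t\r\n>/", rune(lower[end])) {
			return candidate, dsmlLike, true
		}
		return "", false, false
	}
	return "", false, false
}

func main() {
	for _, tag := range []string{
		"<|DSML|tool_calls>",
		`<DSMLinvoke name="x">`,
		"<DSMLtool_calls_extra>",
		`<invoke name="x">`,
	} {
		name, dsml, ok := classifyToolTag(tag)
		fmt.Printf("%-26s name=%q dsml=%v ok=%v\n", tag, name, dsml, ok)
	}
}
```

Checking the boundary after the name, rather than matching whole tag strings from a fixed list, is what lets one scanner cover the separator, missing-separator, and collapsed variants without enumerating each spelling.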
diff --git a/internal/toolcall/toolcalls_test.go b/internal/toolcall/toolcalls_test.go
index 091d9ec..b48f88c 100644
--- a/internal/toolcall/toolcalls_test.go
+++ b/internal/toolcall/toolcalls_test.go
@@ -53,6 +53,18 @@ func TestParseToolCallsSupportsDSMLShellWithCanonicalExampleInCDATA(t *testing.T
}
}
+func TestParseToolCallsRecoversUnclosedCDATAInsideWrapper(t *testing.T) {
+ text := ``
+ res := ParseToolCallsDetailed(text, []string{"Write"})
+ if len(res.Calls) != 1 {
+ t.Fatalf("expected unclosed CDATA to still parse via outer wrapper, got %#v", res.Calls)
+ }
+ got, _ := res.Calls[0].Input["content"].(string)
+ if got != "hello world" {
+ t.Fatalf("expected recovered CDATA payload, got %q", got)
+ }
+}
+
func TestParseToolCallsNormalizesMixedDSMLAndCanonicalToolTags(t *testing.T) {
// Models commonly mix DSML wrapper tags with canonical inner tags.
// These should be normalized and parsed, not rejected.
@@ -130,6 +142,23 @@ func TestParseToolCallsSupportsInvokeParameters(t *testing.T) {
}
}
+func TestParseToolCallsSupportsJSONScalarParameters(t *testing.T) {
+ text := `123true`
+ calls := ParseToolCalls(text, []string{"configure"})
+ if len(calls) != 1 {
+ t.Fatalf("expected 1 call, got %#v", calls)
+ }
+ if got, ok := calls[0].Input["count"].(float64); !ok || got != 123 {
+ t.Fatalf("expected numeric count, got %#v", calls[0].Input["count"])
+ }
+ if got, ok := calls[0].Input["max_tokens"].(float64); !ok || got != 256 {
+ t.Fatalf("expected numeric max_tokens, got %#v", calls[0].Input["max_tokens"])
+ }
+ if got, ok := calls[0].Input["enabled"].(bool); !ok || !got {
+ t.Fatalf("expected boolean enabled, got %#v", calls[0].Input["enabled"])
+ }
+}
+
func TestParseToolCallsPreservesRawMalformedParams(t *testing.T) {
text := `cd /root && git status`
calls := ParseToolCalls(text, []string{"execute_command"})
@@ -478,6 +507,49 @@ func TestParseToolCallsDoesNotAcceptDSMLSpaceLookalikeTagName(t *testing.T) {
}
}
+func TestParseToolCallsToleratesDSMLCollapsedTagNames(t *testing.T) {
+ todos := `[x] 检查 toolcalls_format.go 格式化逻辑
+[x] 检查 toolcalls_parse.go 解析逻辑
+[x] 检查 toolcalls_xml.go 和 toolcalls_dsml.go
+[x] 检查 toolcalls_markup.go 和 toolcalls_json_repair.go
+[x] 检查 prompt/tool_calls.go 注入逻辑
+[x] 检查 toolstream 流式解析
+[x] 查看测试文件确认预期行为
+[x] 给出调查结论`
+ text := strings.Join([]string{
+ "[]",
+ "",
+ "",
+ "",
+ "",
+ "",
+ }, "\n")
+ calls := ParseToolCalls(text, []string{"update_todo_list"})
+ if len(calls) != 1 {
+ t.Fatalf("expected one call from collapsed DSML tags, got %#v", calls)
+ }
+ if calls[0].Name != "update_todo_list" {
+ t.Fatalf("expected update_todo_list call, got %#v", calls[0])
+ }
+ if got, _ := calls[0].Input["todos"].(string); got != todos {
+ t.Fatalf("expected todos to round-trip, got %q", got)
+ }
+}
+
+func TestParseToolCallsDoesNotAcceptDSMLCollapsedLookalikeTagName(t *testing.T) {
+ text := strings.Join([]string{
+ "",
+ "",
+ "x",
+ "",
+ "",
+ }, "\n")
+ calls := ParseToolCalls(text, []string{"update_todo_list"})
+ if len(calls) != 0 {
+ t.Fatalf("expected no calls from collapsed lookalike tag, got %#v", calls)
+ }
+}
+
func TestParseToolCallsSkipsProseMentionOfSameWrapperVariant(t *testing.T) {
text := strings.Join([]string{
"Summary: support canonical and DSML <|DSML|tool_calls> wrappers.",
diff --git a/internal/toolstream/complex_edge_test.go b/internal/toolstream/complex_edge_test.go
index c1c6488..759a80f 100644
--- a/internal/toolstream/complex_edge_test.go
+++ b/internal/toolstream/complex_edge_test.go
@@ -615,3 +615,68 @@ func TestSieve_DSMLSpaceLookalikeTagNameStaysText(t *testing.T) {
t.Fatalf("相似标签名应作为正文透传, got %q", text.String())
}
}
+
+func TestSieve_DSMLCollapsedTagNamesWithPrefixText(t *testing.T) {
+ var state State
+ todos := `[x] 检查 toolcalls_format.go 格式化逻辑
+[x] 检查 toolcalls_parse.go 解析逻辑
+[x] 检查 toolcalls_xml.go 和 toolcalls_dsml.go
+[x] 检查 toolcalls_markup.go 和 toolcalls_json_repair.go
+[x] 检查 prompt/tool_calls.go 注入逻辑
+[x] 检查 toolstream 流式解析
+[x] 查看测试文件确认预期行为
+[x] 给出调查结论`
+ chunks := []string{
+ "[]\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "",
+ }
+ var events []Event
+ for _, c := range chunks {
+ events = append(events, ProcessChunk(&state, c, []string{"update_todo_list"})...)
+ }
+ events = append(events, Flush(&state, []string{"update_todo_list"})...)
+
+ var text strings.Builder
+ var gotTodos string
+ callCount := 0
+ for _, e := range events {
+ text.WriteString(e.Content)
+ for _, call := range e.ToolCalls {
+ callCount++
+ gotTodos, _ = call.Input["todos"].(string)
+ }
+ }
+ if callCount != 1 {
+ t.Fatalf("expected 1 tool call, got %d, text=%q", callCount, text.String())
+ }
+ if gotTodos != todos {
+ t.Fatalf("expected todos to be preserved intact, got %q", gotTodos)
+ }
+ if text.String() != "[]\n" {
+ t.Fatalf("expected prefix text to be preserved without leaking the tool block, got %q", text.String())
+ }
+}
+
+func TestSieve_DSMLCollapsedLookalikeTagNameStaysText(t *testing.T) {
+ var state State
+ input := "x"
+ events := ProcessChunk(&state, input, []string{"update_todo_list"})
+ events = append(events, Flush(&state, []string{"update_todo_list"})...)
+
+ var text strings.Builder
+ callCount := 0
+ for _, e := range events {
+ text.WriteString(e.Content)
+ callCount += len(e.ToolCalls)
+ }
+ if callCount != 0 {
+ t.Fatalf("collapsed lookalike tag name should not trigger a tool call, got %d", callCount)
+ }
+ if text.String() != input {
+ t.Fatalf("collapsed lookalike tag name should pass through as text, got %q", text.String())
+ }
+}
diff --git a/internal/toolstream/tool_sieve_core.go b/internal/toolstream/tool_sieve_core.go
index 3f77b8e..a228c13 100644
--- a/internal/toolstream/tool_sieve_core.go
+++ b/internal/toolstream/tool_sieve_core.go
@@ -114,10 +114,30 @@ func Flush(state *State, toolNames []string) []Event {
} else {
content := state.capture.String()
if content != "" {
- // If capture never resolved into a real tool call, release the
- // buffered text instead of swallowing it.
- state.noteText(content)
- events = append(events, Event{Content: content})
+ recovered := toolcall.SanitizeLooseCDATA(content)
+ if recovered != content {
+ if prefix, calls, suffix, recoveredReady := consumeXMLToolCapture(recovered, toolNames); recoveredReady && len(calls) > 0 {
+ if prefix != "" {
+ state.noteText(prefix)
+ events = append(events, Event{Content: prefix})
+ }
+ events = append(events, Event{ToolCalls: calls})
+ if suffix != "" {
+ state.noteText(suffix)
+ events = append(events, Event{Content: suffix})
+ }
+ } else {
+ // If capture never resolved into a real tool call, release
+ // the buffered text instead of swallowing it.
+ state.noteText(content)
+ events = append(events, Event{Content: content})
+ }
+ } else {
+ // If capture never resolved into a real tool call, release the
+ // buffered text instead of swallowing it.
+ state.noteText(content)
+ events = append(events, Event{Content: content})
+ }
}
}
state.capture.Reset()
diff --git a/internal/toolstream/tool_sieve_xml.go b/internal/toolstream/tool_sieve_xml.go
index 06fc469..9a6789e 100644
--- a/internal/toolstream/tool_sieve_xml.go
+++ b/internal/toolstream/tool_sieve_xml.go
@@ -7,7 +7,6 @@ import (
// consumeXMLToolCapture tries to extract complete XML tool call blocks from captured text.
func consumeXMLToolCapture(captured string, toolNames []string) (prefix string, calls []toolcall.ParsedToolCall, suffix string, ready bool) {
- lower := strings.ToLower(captured)
anyOpenFound := false
type candidate struct {
start int
@@ -23,41 +22,40 @@ func consumeXMLToolCapture(captured string, toolNames []string) (prefix string,
var best *candidate
var rejected *rejectedBlock
- // Scan every wrapper occurrence. Prose can mention a wrapper tag before the
- // actual tool block, including the same variant as the real block.
- for _, pair := range xmlToolCallTagPairs {
- searchFrom := 0
- for searchFrom < len(lower) {
- openIdx := findXMLOpenOutsideCDATA(captured, pair.open, searchFrom)
- if openIdx < 0 {
- break
- }
- // Find the matching closing tag outside CDATA. Long write-file tool
- // calls often contain XML examples in CDATA, including .
- closeIdx := findMatchingXMLToolWrapperClose(captured, pair.open, pair.close, openIdx)
- if closeIdx < 0 {
- anyOpenFound = true
- searchFrom = openIdx + len(pair.open)
- continue
- }
- closeEnd := closeIdx + len(pair.close)
-
- xmlBlock := captured[openIdx:closeEnd]
- prefixPart := captured[:openIdx]
- suffixPart := captured[closeEnd:]
- parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
- if len(parsed) > 0 {
- prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
- if best == nil || openIdx < best.start {
- best = &candidate{start: openIdx, prefix: prefixPart, calls: parsed, suffix: suffixPart}
- }
- break
- }
- if rejected == nil || openIdx < rejected.start {
- rejected = &rejectedBlock{start: openIdx, prefix: prefixPart + xmlBlock, suffix: suffixPart}
- }
- searchFrom = openIdx + len(pair.open)
+ // Scan every recognized tool tag occurrence. Prose can mention a wrapper
+ // tag before the actual tool block, including the same variant as the real
+ // block. We only accept complete tool_calls wrappers that parse cleanly.
+ for searchFrom := 0; searchFrom < len(captured); {
+ tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(captured, searchFrom)
+ if !ok {
+ break
}
+ if tag.Closing || tag.Name != "tool_calls" {
+ searchFrom = tag.End + 1
+ continue
+ }
+ closeTag, ok := toolcall.FindMatchingToolMarkupClose(captured, tag)
+ if !ok {
+ anyOpenFound = true
+ searchFrom = tag.End + 1
+ continue
+ }
+
+ xmlBlock := captured[tag.Start : closeTag.End+1]
+ prefixPart := captured[:tag.Start]
+ suffixPart := captured[closeTag.End+1:]
+ parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
+ if len(parsed) > 0 {
+ prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
+ if best == nil || tag.Start < best.start {
+ best = &candidate{start: tag.Start, prefix: prefixPart, calls: parsed, suffix: suffixPart}
+ }
+ break
+ }
+ if rejected == nil || tag.Start < rejected.start {
+ rejected = &rejectedBlock{start: tag.Start, prefix: prefixPart + xmlBlock, suffix: suffixPart}
+ }
+ searchFrom = tag.End + 1
}
if best != nil {
return best.prefix, best.calls, best.suffix, true
@@ -71,26 +69,19 @@ func consumeXMLToolCapture(captured string, toolNames []string) (prefix string,
// If this block failed to become a tool call, pass it through as text.
return rejected.prefix, nil, rejected.suffix, true
}
- if !containsAnyToolCallWrapper(lower) {
- invokeIdx, dsml := firstInvokeIndex(lower)
- closeTag := ""
- openWrapper := ""
- if dsml {
- closeTag = "|dsml|tool_calls>"
- openWrapper = "<|DSML|tool_calls>"
- }
- closeIdx := findXMLCloseOutsideCDATA(captured, closeTag, invokeIdx)
- if invokeIdx >= 0 && closeIdx > invokeIdx {
- closeEnd := closeIdx + len(closeTag)
- xmlBlock := openWrapper + captured[invokeIdx:closeIdx] + closeTag
- prefixPart := captured[:invokeIdx]
- suffixPart := captured[closeEnd:]
- parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
- if len(parsed) > 0 {
- prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
- return prefixPart, parsed, suffixPart, true
+ if invokeTag, ok := findFirstToolMarkupTagByName(captured, 0, "invoke"); ok {
+ if wrapperOpen, ok := findFirstToolMarkupTagByName(captured, 0, "tool_calls"); !ok || wrapperOpen.Start > invokeTag.Start {
+ if closeTag, ok := findFirstToolMarkupTagByNameFrom(captured, invokeTag.Start+1, "tool_calls", true); ok && closeTag.Start > invokeTag.Start {
+ xmlBlock := "" + captured[invokeTag.Start:closeTag.End+1]
+ prefixPart := captured[:invokeTag.Start]
+ suffixPart := captured[closeTag.End+1:]
+ parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
+ if len(parsed) > 0 {
+ prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
+ return prefixPart, parsed, suffixPart, true
+ }
+ return prefixPart + captured[invokeTag.Start:closeTag.End+1], nil, suffixPart, true
}
- return prefixPart + captured[invokeIdx:closeEnd], nil, suffixPart, true
}
}
return "", nil, "", false
@@ -99,46 +90,35 @@ func consumeXMLToolCapture(captured string, toolNames []string) (prefix string,
// hasOpenXMLToolTag returns true if captured text contains an XML tool opening tag
// whose SPECIFIC closing tag has not appeared yet.
func hasOpenXMLToolTag(captured string) bool {
- for _, pair := range xmlToolCallTagPairs {
- openIdx := findXMLOpenOutsideCDATA(captured, pair.open, 0)
- if openIdx >= 0 {
- if findMatchingXMLToolWrapperClose(captured, pair.open, pair.close, openIdx) < 0 {
- return true
- }
+ for searchFrom := 0; searchFrom < len(captured); {
+ tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(captured, searchFrom)
+ if !ok {
+ return false
}
+ if tag.Closing || tag.Name != "tool_calls" {
+ searchFrom = tag.End + 1
+ continue
+ }
+ if _, ok := toolcall.FindMatchingToolMarkupClose(captured, tag); !ok {
+ return true
+ }
+ searchFrom = tag.End + 1
}
return false
}
func shouldKeepBareInvokeCapture(captured string) bool {
- lower := strings.ToLower(captured)
- invokeIdx, dsml := firstInvokeIndex(lower)
- if invokeIdx < 0 || containsAnyToolCallWrapper(lower) {
+ invokeTag, ok := findFirstToolMarkupTagByName(captured, 0, "invoke")
+ if !ok {
return false
}
- invokeOpenLen := len(" invokeIdx {
+ if closeTag, ok := findFirstToolMarkupTagByNameFrom(captured, invokeTag.Start+1, "tool_calls", true); ok && closeTag.Start > invokeTag.Start {
return true
}
-
- startEnd := findXMLTagEnd(captured, invokeIdx+invokeOpenLen)
+ startEnd := invokeTag.End
if startEnd < 0 {
return true
}
@@ -148,84 +128,16 @@ func shouldKeepBareInvokeCapture(captured string) bool {
return true
}
- invokeCloseIdx := findAnyXMLCloseOutsideCDATA(captured, possibleInvokeCloseTags(dsml), startEnd+1)
- if invokeCloseIdx >= 0 {
- afterClose := captured[invokeCloseIdx:]
- for _, closeTag := range possibleInvokeCloseTags(dsml) {
- if strings.HasPrefix(strings.ToLower(afterClose), closeTag) {
- afterClose = afterClose[len(closeTag):]
- break
- }
- }
- return strings.TrimSpace(afterClose) == ""
+ if invokeCloseTag, ok := findFirstToolMarkupTagByNameFrom(captured, startEnd+1, "invoke", true); ok {
+ return strings.TrimSpace(captured[invokeCloseTag.End+1:]) == ""
}
trimmedLower := strings.ToLower(trimmedBody)
- return strings.HasPrefix(trimmedLower, parameterOpen) ||
+ return strings.HasPrefix(trimmedLower, ""}
- }
- return []string{"|dsml|tool_calls>", "|dsml tool_calls>", "", "", "|tool_calls>", "|tool_calls>"}
-}
-
-func possibleInvokeCloseTags(dsml bool) []string {
- if !dsml {
- return []string{""}
- }
- return []string{"|dsml|invoke>", "|dsml invoke>", "", "", "|invoke>", "|invoke>"}
-}
-
-func findAnyXMLCloseOutsideCDATA(s string, closeTags []string, start int) int {
- best := -1
- for _, closeTag := range closeTags {
- idx := findXMLCloseOutsideCDATA(s, closeTag, start)
- if idx >= 0 && (best < 0 || idx < best) {
- best = idx
- }
- }
- return best
-}
-
-func firstInvokeIndex(lower string) (int, bool) {
- xmlIdx := strings.Index(lower, "= 0 && (dsmlIdx < 0 || idx < dsmlIdx) {
- dsmlIdx = idx
- }
- }
- switch {
- case xmlIdx < 0:
- return dsmlIdx, dsmlIdx >= 0
- case dsmlIdx < 0:
- return xmlIdx, false
- case dsmlIdx < xmlIdx:
- return dsmlIdx, true
- default:
- return xmlIdx, false
- }
-}
-
-// findPartialXMLToolTagStart checks if the string ends with a partial canonical
-// XML wrapper tag (e.g., "")
- if end < 0 {
- return -1
- }
- i += len("")
- case strings.HasPrefix(lower[i:], "")
- if end < 0 {
- return -1
- }
- i += len("")
- case strings.HasPrefix(lower[i:], closeTarget):
- depth--
- if depth == 0 {
- return i
- }
- i += len(closeTarget)
- case strings.HasPrefix(lower[i:], openTarget) && hasXMLToolTagBoundary(s, i+len(openTarget)):
- depth++
- i += len(openTarget)
- default:
- i++
- }
- }
- return -1
+func findFirstToolMarkupTagByName(s string, start int, name string) (toolcall.ToolMarkupTag, bool) {
+ return findFirstToolMarkupTagByNameFrom(s, start, name, false)
}
-func findXMLOpenOutsideCDATA(s, openTag string, start int) int {
- if s == "" || openTag == "" {
- return -1
- }
- if start < 0 {
- start = 0
- }
- lower := strings.ToLower(s)
- target := strings.ToLower(openTag)
- for i := start; i < len(s); {
- switch {
- case strings.HasPrefix(lower[i:], "")
- if end < 0 {
- return -1
- }
- i += len("")
- case strings.HasPrefix(lower[i:], "")
- if end < 0 {
- return -1
- }
- i += len("")
- case strings.HasPrefix(lower[i:], target) && hasXMLToolTagBoundary(s, i+len(target)):
- return i
- default:
- i++
+func findFirstToolMarkupTagByNameFrom(s string, start int, name string, closing bool) (toolcall.ToolMarkupTag, bool) {
+ for pos := maxInt(start, 0); pos < len(s); {
+ tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(s, pos)
+ if !ok {
+ return toolcall.ToolMarkupTag{}, false
}
+ if tag.Name == name && tag.Closing == closing {
+ return tag, true
+ }
+ pos = tag.End + 1
}
- return -1
+ return toolcall.ToolMarkupTag{}, false
}
-func findXMLCloseOutsideCDATA(s, closeTag string, start int) int {
- if s == "" || closeTag == "" {
- return -1
+func maxInt(a, b int) int {
+ if a > b {
+ return a
}
- if start < 0 {
- start = 0
- }
- lower := strings.ToLower(s)
- target := strings.ToLower(closeTag)
- for i := start; i < len(s); {
- switch {
- case strings.HasPrefix(lower[i:], "")
- if end < 0 {
- return -1
- }
- i += len("")
- case strings.HasPrefix(lower[i:], "")
- if end < 0 {
- return -1
- }
- i += len("")
- case strings.HasPrefix(lower[i:], target):
- return i
- default:
- i++
- }
- }
- return -1
-}
-
-func hasXMLToolTagBoundary(text string, idx int) bool {
- if idx >= len(text) {
- return true
- }
- switch text[idx] {
- case ' ', '\t', '\n', '\r', '>', '/':
- return true
- default:
- return false
- }
-}
-
-func findXMLTagEnd(s string, start int) int {
- quote := byte(0)
- for i := start; i < len(s); i++ {
- ch := s[i]
- if quote != 0 {
- if ch == quote {
- quote = 0
- }
- continue
- }
- if ch == '"' || ch == '\'' {
- quote = ch
- continue
- }
- if ch == '>' {
- return i
- }
- }
- return -1
+ return b
}
diff --git a/internal/toolstream/tool_sieve_xml_tags.go b/internal/toolstream/tool_sieve_xml_tags.go
index 6a9a19c..d4179bd 100644
--- a/internal/toolstream/tool_sieve_xml_tags.go
+++ b/internal/toolstream/tool_sieve_xml_tags.go
@@ -5,28 +5,7 @@ import "regexp"
// --- XML tool call support for the streaming sieve ---
//nolint:unused // kept as explicit tag inventory for future XML sieve refinements.
-var xmlToolCallClosingTags = []string{"", "|dsml|tool_calls>", "|dsml tool_calls>", "", "", "|tool_calls>", "|tool_calls>"}
-var xmlToolCallOpeningTags = []string{
- ""},
- {"<|dsml tool_calls", "|dsml tool_calls>"},
- {""},
- {""},
- {"<|tool_calls", "|tool_calls>"},
- {"<|tool_calls", "|tool_calls>"},
- {""},
-}
+var xmlToolCallClosingTags = []string{"", "|dsml|tool_calls>", "|dsmltool_calls>", "|dsml tool_calls>", "", "", "", "|tool_calls>", "|tool_calls>"}
// xmlToolCallBlockPattern matches a complete canonical XML tool call block.
//
@@ -37,10 +16,14 @@ var xmlToolCallBlockPattern = regexp.MustCompile(`(?is)((?:", "<|dsml|tool_calls\n", "<|dsml|tool_calls ",
"<|dsml|invoke ", "<|dsml|invoke\n", "<|dsml|invoke\t", "<|dsml|invoke\r",
+ "<|dsmltool_calls>", "<|dsmltool_calls\n", "<|dsmltool_calls ",
+ "<|dsmlinvoke ", "<|dsmlinvoke\n", "<|dsmlinvoke\t", "<|dsmlinvoke\r",
"<|dsml tool_calls>", "<|dsml tool_calls\n", "<|dsml tool_calls ",
"<|dsml invoke ", "<|dsml invoke\n", "<|dsml invoke\t", "<|dsml invoke\r",
"", "", "", "", "<|tool_calls\n", "<|tool_calls ",
diff --git a/internal/toolstream/tool_sieve_xml_test.go b/internal/toolstream/tool_sieve_xml_test.go
index 0b9a7bb..efcf56d 100644
--- a/internal/toolstream/tool_sieve_xml_test.go
+++ b/internal/toolstream/tool_sieve_xml_test.go
@@ -174,6 +174,41 @@ func TestProcessToolSieveKeepsCDATAEmbeddedToolClosingBuffered(t *testing.T) {
}
}
+func TestProcessToolSieveFallsBackWhenCDATANeverCloses(t *testing.T) {
+ var state State
+ chunks := []string{
+ "\n \n \n \n",
+ }
+ var events []Event
+ for _, c := range chunks {
+ events = append(events, ProcessChunk(&state, c, []string{"Write"})...)
+ }
+ events = append(events, Flush(&state, []string{"Write"})...)
+
+ var textContent strings.Builder
+ toolCalls := 0
+ for _, evt := range events {
+ if evt.Content != "" {
+ textContent.WriteString(evt.Content)
+ }
+ toolCalls += len(evt.ToolCalls)
+ if len(evt.ToolCalls) > 0 {
+ if got, _ := evt.ToolCalls[0].Input["content"].(string); got != "hello world" {
+ t.Fatalf("expected recovered CDATA payload, got %q", got)
+ }
+ }
+ }
+
+ if toolCalls != 1 {
+ t.Fatalf("expected unclosed CDATA payload to still parse, got %d tool calls events=%#v", toolCalls, events)
+ }
+ if textContent.Len() != 0 {
+ t.Fatalf("expected no leaked text, got %q", textContent.String())
+ }
+}
+
func TestProcessToolSieveXMLWithLeadingText(t *testing.T) {
var state State
// Model outputs some prose then an XML tool call.
diff --git a/tests/node/stream-tool-sieve.test.js b/tests/node/stream-tool-sieve.test.js
index dabaae2..1938984 100644
--- a/tests/node/stream-tool-sieve.test.js
+++ b/tests/node/stream-tool-sieve.test.js
@@ -71,6 +71,30 @@ test('parseToolCalls ignores DSML space lookalike tag names', () => {
assert.equal(calls.length, 0);
});
+test('parseToolCalls tolerates collapsed DSML tag names', () => {
+ const todos = [
+ '[x] 检查 toolcalls_format.go 格式化逻辑',
+ '[x] 检查 toolcalls_parse.go 解析逻辑',
+ '[x] 检查 toolcalls_xml.go 和 toolcalls_dsml.go',
+ '[x] 检查 toolcalls_markup.go 和 toolcalls_json_repair.go',
+ '[x] 检查 prompt/tool_calls.go 注入逻辑',
+ '[x] 检查 toolstream 流式解析',
+ '[x] 查看测试文件确认预期行为',
+ '[x] 给出调查结论',
+ ].join('\n');
+ const payload = ``;
+ const calls = parseToolCalls(payload, ['update_todo_list']);
+ assert.equal(calls.length, 1);
+ assert.equal(calls[0].name, 'update_todo_list');
+ assert.equal(calls[0].input.todos, todos);
+});
+
+test('parseToolCalls ignores collapsed DSML lookalike tag names', () => {
+ const payload = 'x';
+ const calls = parseToolCalls(payload, ['update_todo_list']);
+ assert.equal(calls.length, 0);
+});
+
test('parseToolCalls keeps canonical XML examples inside DSML CDATA', () => {
const content = 'x';
const payload = `<|DSML|tool_calls><|DSML|invoke name="write_file"><|DSML|parameter name="path">notes.md|DSML|parameter><|DSML|parameter name="content">|DSML|parameter>|DSML|invoke>|DSML|tool_calls>`;
@@ -80,6 +104,24 @@ test('parseToolCalls keeps canonical XML examples inside DSML CDATA', () => {
assert.deepEqual(calls[0].input, { path: 'notes.md', content });
});
+test('parseToolCalls recovers when CDATA never closes inside a valid wrapper', () => {
+ const payload = '';
+ const calls = parseToolCalls(payload, ['Write']);
+ assert.equal(calls.length, 1);
+ assert.equal(calls[0].name, 'Write');
+ assert.equal(calls[0].input.content, 'hello world');
+});
+
+test('parseToolCalls supports JSON scalar parameters', () => {
+ const payload = '123true';
+ const calls = parseToolCalls(payload, ['configure']);
+ assert.equal(calls.length, 1);
+ assert.equal(calls[0].name, 'configure');
+ assert.equal(calls[0].input.count, 123);
+ assert.equal(calls[0].input.max_tokens, 256);
+ assert.equal(calls[0].input.enabled, true);
+});
+
test('parseToolCalls normalizes mixed DSML and XML tool tags', () => {
// Models commonly mix DSML wrapper tags with canonical inner tags.
const payload = '<|DSML|tool_calls><|DSML|parameter name="path">README.MD|DSML|parameter>|DSML|tool_calls>';
@@ -147,6 +189,41 @@ test('sieve keeps DSML space lookalike tag names as text', () => {
assert.equal(collectText(events), input);
});
+test('sieve emits tool_calls for collapsed DSML tag names and preserves prefix text', () => {
+ const todos = [
+ '[x] 检查 toolcalls_format.go 格式化逻辑',
+ '[x] 检查 toolcalls_parse.go 解析逻辑',
+ '[x] 检查 toolcalls_xml.go 和 toolcalls_dsml.go',
+ '[x] 检查 toolcalls_markup.go 和 toolcalls_json_repair.go',
+ '[x] 检查 prompt/tool_calls.go 注入逻辑',
+ '[x] 检查 toolstream 流式解析',
+ '[x] 查看测试文件确认预期行为',
+ '[x] 给出调查结论',
+ ].join('\n');
+ const events = runSieve([
+ '[]\n',
+ '\n',
+ '\n',
+ `\n`,
+ '\n',
+ '',
+ ], ['update_todo_list']);
+ const text = collectText(events);
+ const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
+ assert.equal(finalCalls.length, 1);
+ assert.equal(finalCalls[0].name, 'update_todo_list');
+ assert.equal(finalCalls[0].input.todos, todos);
+ assert.equal(text, '[]\n');
+});
+
+test('sieve keeps collapsed DSML lookalike tag names as text', () => {
+ const input = 'x';
+ const events = runSieve([input], ['update_todo_list']);
+ const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
+ assert.equal(finalCalls.length, 0);
+ assert.equal(collectText(events), input);
+});
+
test('sieve preserves review body with alias mentions before real DSML tool calls', () => {
const events = runSieve([
"Done reviewing the diff. Here's my analysis before we commit:\n\n",
@@ -277,6 +354,23 @@ test('sieve keeps long XML tool calls buffered until the closing tag arrives', (
assert.equal(finalCalls[0].input.content, longContent);
});
+test('sieve recovers when CDATA never closes inside a valid wrapper', () => {
+ const events = runSieve(
+ [
+ '\n \n \n \n',
+ ],
+ ['Write'],
+ );
+ const leakedText = collectText(events);
+ const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
+ assert.equal(finalCalls.length, 1);
+ assert.equal(finalCalls[0].name, 'Write');
+ assert.equal(finalCalls[0].input.content, 'hello world');
+ assert.equal(leakedText, '');
+});
+
test('sieve keeps CDATA tool examples buffered until the outer closing tag arrives', () => {
const content = [
'# DS2API 4.0 更新内容',