mirror of
https://github.com/CJackHwang/ds2api.git
synced 2026-05-05 00:45:29 +08:00
refactor: unify Go/Node XML tool markup scanning and expand DSML alias support
- Add shared ToolMarkupTag scanner (toolcalls_scan.go) replacing hardcoded alias tables
- Support DSML collapsed tag names (<DSMLtool_calls>, <DSMLinvoke>, <DSMLparameter>)
- Parse JSON literal values from parameter bodies (123→number, true→bool, null)
- Recover unclosed CDATA in final parse/flush via SanitizeLooseCDATA
- Align Go and Node implementations (scanToolMarkupTagAt, findMatchingToolMarkupClose)
- Reject bare <invoke> as unsupported syntax, only tool_calls wrapper triggers tool path
- Update API.md and toolcall-semantics.md documentation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4
API.md
@@ -37,7 +37,7 @@
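- The OpenAI / Claude / Gemini protocols are all mounted on a single `chi` router tree, assembled by `internal/server/router.go`.
- The adapter layer's responsibilities are narrowed to **request normalization → DeepSeek call → protocol-shaped rendering**, which reduces the forks in earlier versions where the same capability was implemented in several places.
- Tool Calling parsing is kept consistent between the Go and Node runtimes: the recommended model output is the DSML shell `<|DSML|tool_calls>` → `<|DSML|invoke name="...">` → `<|DSML|parameter name="...">`; the compatibility layer also accepts the DSML wrapper aliases `<dsml|tool_calls>`, `<|tool_calls>`, `<|tool_calls>`, common forms where the DSML separator is dropped (such as `<|DSML tool_calls>`), and legacy canonical XML `<tool_calls>` → `<invoke name="...">` → `<parameter name="...">`. Internally, XML parsing semantics still apply, and streaming runs an anti-leak sieve.
- Tool Calling parsing is kept consistent between the Go and Node runtimes: the recommended model output is the DSML shell `<|DSML|tool_calls>` → `<|DSML|invoke name="...">` → `<|DSML|parameter name="...">`; the compatibility layer also accepts the DSML wrapper aliases `<dsml|tool_calls>`, `<|tool_calls>`, `<|tool_calls>`, common forms where the DSML separator is dropped (such as `<|DSML tool_calls>`), common typos where `DSML` is fused to the tool tag name (such as `<DSMLtool_calls>`), and legacy canonical XML `<tool_calls>` → `<invoke name="...">` → `<parameter name="...">`. The implementation uses a narrow, fault-tolerant structural scan: only a `tool_calls` wrapper, or a repairable missing opening wrapper, enters the tool path, and a bare `<invoke>` does not count as supported syntax; streaming continues to run the anti-leak sieve. If a parameter body is itself a valid JSON literal (such as `123`, `true`, `null`, an array, or an object), it is emitted as a structured value rather than always as a string. If a CDATA section is occasionally left unclosed, a narrow repair runs during the final parse / flush recovery stage to preserve, as far as possible, an outer tool call that is already fully wrapped.
- The `Admin API` separates configuration from runtime policy: `/admin/config*` manages static configuration, and `/admin/settings*` manages runtime behavior.

---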
@@ -344,7 +344,7 @@ data: [DONE]
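Additional notes:

- In **non-code-block contexts**, a tool payload is recognized by its features and yields an executable tool call even when it is mixed with ordinary text (the surrounding text is still passed through).
- The parser currently treats the DSML shell (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`), DSML wrapper aliases (`<dsml|tool_calls>`, `<|tool_calls>`, `<|tool_calls>`), common dropped-separator DSML forms (such as `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`), and legacy canonical XML tool blocks (`<tool_calls>` / `<invoke name="...">` / `<parameter name="...">`) as executable calls; DSML is first normalized back to XML, and XML parsing semantics still apply internally. Legacy `<tools>`, `<tool_call>`, `<tool_name>`, `<param>`, `<function_call>`, `tool_use`, antml-style markup, and plain JSON `tool_calls` fragments are all treated as ordinary text by default.
- The parser currently treats the DSML shell (`<|DSML|tool_calls>` / `<|DSML|invoke name="...">` / `<|DSML|parameter name="...">`), DSML wrapper aliases (`<dsml|tool_calls>`, `<|tool_calls>`, `<|tool_calls>`), common dropped-separator DSML forms (such as `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`), common typos where `DSML` is fused to the tool tag name (such as `<DSMLtool_calls>` / `<DSMLinvoke>` / `<DSMLparameter>`), and legacy canonical XML tool blocks (`<tool_calls>` / `<invoke name="...">` / `<parameter name="...">`) as executable calls; DSML is first normalized back to XML, and XML parsing semantics still apply internally. Legacy `<tools>`, `<tool_call>`, `<tool_name>`, `<param>`, `<function_call>`, `tool_use`, antml-style markup, and plain JSON `tool_calls` fragments are all treated as ordinary text by default.
- When the final visible body is empty but the chain of thought contains an executable tool call, Chat / Responses emits standard OpenAI `tool_calls` / `function_call` output during the closing stage; if the client has not enabled thinking / reasoning, the chain of thought is used only for detection and is not exposed as visible body text or `reasoning_content`.
- `tool_calls` inside a Markdown fenced code block (for example ```json ... ```) are treated as example text only and are not executed.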
@@ -39,8 +39,9 @@
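Compatibility repairs:

- If the model omits the opening wrapper but still emits one or more invokes followed by a closing wrapper, the Go parsing pipeline restores the missing opening wrapper before parsing.
- If the model writes the separator `|` in a DSML tag as a space (for example `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`, or the `<DSML tool_calls>` form without a leading pipe), Go / Node normalize it within the fixed set of tool tag names; similar names that are not tool tags (such as `tool_calls_extra`) are still treated as ordinary text.
- If the model writes the separator `|` in a DSML tag as a space (for example `<|DSML tool_calls>` / `<|DSML invoke>` / `<|DSML parameter>`, or the `<DSML tool_calls>` form without a leading pipe), or fuses `DSML` directly to the tool tag name (for example `<DSMLtool_calls>` / `<DSMLinvoke>` / `<DSMLparameter>`), Go / Node normalize it within the fixed set of tool tag names; similar names that are not tool tags (such as `tool_calls_extra`) are still treated as ordinary text.
- This is a narrow repair for common model mistakes and does not change the recommended output format; the prompt still requires the model to emit the full DSML shell directly.
- A bare `<invoke ...>` / `<parameter ...>` is not treated as "supported tool syntax"; only a `tool_calls` wrapper, or a repairable missing opening wrapper, enters the tool-call path.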
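The tag normalization described in these repairs can be approximated in a few lines. This is a minimal sketch, not the project's scanner: `normalizeToolTags`, `TOOL_NAMES`, and `DSML_TAG` are hypothetical names, and a regular expression stands in for the structural scan. Treating the full-width vertical bar (U+FF5C) as a pipe is an assumption. The sketch also does not skip fenced code blocks or CDATA, which the real parser is described as doing.

```javascript
// Hypothetical sketch: a regex approximation of the narrow DSML repair.
const TOOL_NAMES = ['tool_calls', 'invoke', 'parameter'];

// "<" or "</", an optional pipe, an optional "dsml" followed by any run of
// pipes or whitespace (possibly empty), then one fixed tool tag name that
// must be followed by a tag boundary (whitespace, ">" or "/").
const DSML_TAG = new RegExp(
  '<(/?)[|\\uFF5C]?(?:dsml[|\\uFF5C\\s]*)?(' + TOOL_NAMES.join('|') + ')(?=[\\s>/])',
  'gi'
);

function normalizeToolTags(text) {
  // Rewrite only the tag head; attributes and the closing ">" are untouched.
  return String(text).replace(
    DSML_TAG,
    (_match, slash, name) => `<${slash}${name.toLowerCase()}`
  );
}

console.log(normalizeToolTags('<|DSML|tool_calls>'));     // <tool_calls>
console.log(normalizeToolTags('<DSMLinvoke name="a">'));  // <invoke name="a">
console.log(normalizeToolTags('<|DSML parameter>'));      // <parameter>
console.log(normalizeToolTags('<tool_calls_extra>'));     // <tool_calls_extra>
```

The lookahead on the tag boundary is what keeps look-alike names such as `tool_calls_extra` out of the tool path, matching the "fixed set of tool tag names" rule above.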
## 2) Non-compatible content
@@ -52,20 +53,23 @@
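In the streaming path (Go / Node behave identically):

- The DSML `<|DSML|tool_calls>` wrapper, compatible variants (`<dsml|tool_calls>`, `<|tool_calls>`, `<|tool_calls>`), narrowly tolerated space-separated forms (such as `<|DSML tool_calls>`), and the canonical `<tool_calls>` wrapper all enter structured capture
- The DSML `<|DSML|tool_calls>` wrapper, compatible variants (`<dsml|tool_calls>`, `<|tool_calls>`, `<|tool_calls>`), narrowly tolerated space-separated forms (such as `<|DSML tool_calls>`), fused forms (such as `<DSMLtool_calls>`), and the canonical `<tool_calls>` wrapper all enter structured capture
- If the stream starts directly with an invoke but a closing wrapper follows later, the Go streaming sieve also attempts recovery through the missing-opening-wrapper repair path
- Tool calls that have been recognized successfully do not flow back into ordinary text
- Blocks that do not match the new format are not executed and continue to pass through as literal text
- XML examples inside fenced code blocks (backticks `` ``` `` and tildes `~~~`) are always treated as ordinary text
- Nested fences (such as a 4-backtick fence wrapping a 3-backtick fence) and fence protection inside CDATA are supported
- If the model opens `<![CDATA[` without closing it, the streaming scan stays conservative and keeps buffering, so example XML inside the CDATA is not mistaken for a real tool call; during the final parse / flush recovery stage, such loose CDATA gets a narrow repair to preserve, as far as possible, a real tool call whose outer wrapper is complete
- When the text mentions a tag name (such as `<dsml|tool_calls>`, or `<|DSML|tool_calls>` inside Markdown inline code) and a real tool call follows immediately, the sieve skips the unparseable mention candidate and keeps matching the real tool block that follows; the mention neither causes the tool call to be lost nor truncates the body text after it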
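The loose-CDATA repair mentioned in the list above can be illustrated with a small sketch. It assumes the repair simply drops an unclosed `<![CDATA[` marker and keeps the payload, while well-formed CDATA sections pass through unchanged. `dropUnclosedCDATAOpen` is a hypothetical name used only for this illustration, and the behavior is a reading of the description rather than the project's exact code.

```javascript
// Sketch only: mirrors the described "narrow repair", under the stated assumptions.
function dropUnclosedCDATAOpen(text) {
  const raw = String(text);
  const lower = raw.toLowerCase();
  const open = '<![cdata[';
  const close = ']]>';
  let out = '';
  let pos = 0;
  while (pos < raw.length) {
    const start = lower.indexOf(open, pos);
    if (start < 0) {
      // No further CDATA sections: copy the remainder as-is.
      out += raw.slice(pos);
      break;
    }
    out += raw.slice(pos, start);
    const end = raw.indexOf(close, start + open.length);
    if (end < 0) {
      // Unclosed section: drop only the opening marker, keep the payload.
      out += raw.slice(start + open.length);
      break;
    }
    // Well-formed section: copy it through untouched.
    out += raw.slice(start, end + close.length);
    pos = end + close.length;
  }
  return out;
}

console.log(dropUnclosedCDATAOpen('<parameter name="p"><![CDATA[hello</parameter>'));
// <parameter name="p">hello</parameter>
```

Running a repair like this only after a normal parse has produced no calls keeps it from touching input that already parses.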
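In addition, if the value of a `<parameter>` is itself a valid JSON literal, it is parsed as a structured value instead of always being kept as a string. For example, `123`, `true`, `null`, `[1,2]`, and `{"a":1}` are restored to the corresponding number / boolean / null / array / object.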
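A minimal sketch of this literal handling, assuming `JSON.parse` decides what counts as a valid literal and that anything it rejects stays a string. `parseParameterValue` is a hypothetical helper name, not a function from the project.

```javascript
// Hypothetical helper: structured value if the body is valid JSON, else the text.
function parseParameterValue(raw) {
  const s = String(raw).trim();
  if (s === '') {
    return s;
  }
  try {
    return JSON.parse(s);
  } catch (_err) {
    // Not a JSON literal (for example a bare word): keep it as a string.
    return s;
  }
}

console.log(parseParameterValue('123'));      // 123
console.log(parseParameterValue('true'));     // true
console.log(parseParameterValue('null'));     // null
console.log(parseParameterValue('{"a":1}'));  // { a: 1 }
console.log(parseParameterValue('hello'));    // hello
```

One consequence of this approach is that a body such as `"abc"` (with the quotes) comes back as the string `abc`, because it is a valid JSON string literal.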
## 4) Output structure

`ParseToolCallsDetailed` / `parseToolCallsDetailed` returns:

- `calls`: the list of parsed tool calls (`name` + `input`)
- `sawToolCallSyntax`: `true` when a DSML / canonical wrapper is detected, or when the "missing opening wrapper but repairable" form is matched
- `sawToolCallSyntax`: `true` when a DSML / canonical wrapper is detected, or when the "missing opening wrapper but repairable" form is matched; a bare `invoke` does not set this flag
- `rejectedByPolicy`: currently always `false`
- `rejectedToolNames`: currently always an empty array
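To make the result shape concrete, here is a hedged sketch of how a caller might branch on it. The field names come from the list above. `classifyParseResult` and its three labels are hypothetical and exist only for this illustration.

```javascript
// Hypothetical consumer of the result object described above.
function classifyParseResult(result) {
  if (Array.isArray(result.calls) && result.calls.length > 0) {
    return 'tool_calls';            // at least one executable call was parsed
  }
  if (result.sawToolCallSyntax) {
    return 'malformed_tool_syntax'; // a wrapper was seen but nothing parsed
  }
  return 'plain_text';              // no wrapper syntax at all
}

// Example objects with the documented fields (values are made up).
const parsedOk = {
  calls: [{ name: 'get_weather', input: { city: 'Paris', days: 3 } }],
  sawToolCallSyntax: true,
  rejectedByPolicy: false,
  rejectedToolNames: [],
};
const plain = {
  calls: [],
  sawToolCallSyntax: false,
  rejectedByPolicy: false,
  rejectedToolNames: [],
};

console.log(classifyParseResult(parsedOk)); // tool_calls
console.log(classifyParseResult(plain));    // plain_text
```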
@@ -88,7 +92,7 @@ node --test tests/node/stream-tool-sieve.test.js
- The DSML `<|DSML|tool_calls>` wrapper parses correctly
- The legacy canonical `<tool_calls>` wrapper parses correctly
- Alias variants (`<dsml|tool_calls>`, `<|tool_calls>`, `<|tool_calls>`) and the DSML space-separated typo (such as `<|DSML tool_calls>`) parse correctly
- Alias variants (`<dsml|tool_calls>`, `<|tool_calls>`, `<|tool_calls>`), the DSML space-separated typo (such as `<|DSML tool_calls>`), and the fused typo (such as `<DSMLtool_calls>`) parse correctly
- Mixed tags (DSML wrapper + canonical inner tags) parse correctly after normalization
- Examples inside tilde fences `~~~` are not executed
- Examples inside nested fences (a 4-backtick fence wrapping a 3-backtick fence) are not executed
@@ -6,10 +6,10 @@ const {
|
||||
const {
|
||||
parseMarkupToolCalls,
|
||||
stripFencedCodeBlocks,
|
||||
containsToolCallWrapperSyntaxOutsideIgnored,
|
||||
sanitizeLooseCDATA,
|
||||
} = require('./parse_payload');
|
||||
|
||||
const TOOL_MARKUP_PREFIXES = ['<tool_calls', '<|dsml|tool_calls', '<|dsml tool_calls', '<dsml|tool_calls', '<dsml tool_calls', '<|tool_calls', '<|tool_calls'];
|
||||
|
||||
function extractToolNames(tools) {
|
||||
if (!Array.isArray(tools) || tools.length === 0) {
|
||||
return [];
|
||||
@@ -46,7 +46,13 @@ function parseToolCallsDetailed(text, toolNames) {
|
||||
return result;
|
||||
}
|
||||
// XML markup parsing only.
|
||||
const parsed = parseMarkupToolCalls(normalized);
|
||||
let parsed = parseMarkupToolCalls(normalized);
|
||||
if (parsed.length === 0 && normalized.toLowerCase().includes('<![cdata[')) {
|
||||
const recovered = sanitizeLooseCDATA(normalized);
|
||||
if (recovered !== normalized) {
|
||||
parsed = parseMarkupToolCalls(recovered);
|
||||
}
|
||||
}
|
||||
if (parsed.length === 0) {
|
||||
return result;
|
||||
}
|
||||
@@ -73,7 +79,13 @@ function parseStandaloneToolCallsDetailed(text, toolNames) {
|
||||
return result;
|
||||
}
|
||||
// XML markup parsing only.
|
||||
const parsed = parseMarkupToolCalls(trimmed);
|
||||
let parsed = parseMarkupToolCalls(trimmed);
|
||||
if (parsed.length === 0 && trimmed.toLowerCase().includes('<![cdata[')) {
|
||||
const recovered = sanitizeLooseCDATA(trimmed);
|
||||
if (recovered !== trimmed) {
|
||||
parsed = parseMarkupToolCalls(recovered);
|
||||
}
|
||||
}
|
||||
if (parsed.length === 0) {
|
||||
return result;
|
||||
}
|
||||
@@ -110,8 +122,8 @@ function filterToolCallsDetailed(parsed, toolNames) {
|
||||
}
|
||||
|
||||
function looksLikeToolCallSyntax(text) {
|
||||
const lower = toStringSafe(text).toLowerCase();
|
||||
return TOOL_MARKUP_PREFIXES.some((prefix) => lower.includes(prefix));
|
||||
const styles = containsToolCallWrapperSyntaxOutsideIgnored(text);
|
||||
return styles.dsml || styles.canonical;
|
||||
}
|
||||
|
||||
function shouldSkipToolCallParsingForCodeFenceExample(text) {
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
const TOOL_CALL_MARKUP_KV_PATTERN = /<(?:[a-z0-9_:-]+:)?([a-z0-9_.-]+)\b[^>]*>([\s\S]*?)<\/(?:[a-z0-9_:-]+:)?\1>/gi;
|
||||
const CDATA_PATTERN = /^<!\[CDATA\[([\s\S]*?)]]>$/i;
|
||||
const XML_ATTR_PATTERN = /\b([a-z0-9_:-]+)\s*=\s*("([^"]*)"|'([^']*)')/gi;
|
||||
const TOOL_MARKUP_NAMES = ['tool_calls', 'invoke', 'parameter'];
|
||||
|
||||
const {
|
||||
toStringSafe,
|
||||
@@ -138,13 +139,10 @@ function normalizeDSMLToolCallMarkup(text) {
|
||||
if (!raw) {
|
||||
return { text: '', ok: true };
|
||||
}
|
||||
const styles = toolMarkupStylesOutsideIgnored(raw);
|
||||
const styles = containsToolMarkupSyntaxOutsideIgnored(raw);
|
||||
if (!styles.dsml) {
|
||||
return { text: raw, ok: true };
|
||||
}
|
||||
// Always normalize DSML aliases to canonical form, even when canonical
|
||||
// tags coexist. Models frequently mix DSML wrapper tags with canonical
|
||||
// inner tags (e.g., <|tool_calls><invoke name="...">).
|
||||
return {
|
||||
text: replaceDSMLToolMarkupOutsideIgnored(raw),
|
||||
ok: true,
|
||||
@@ -152,65 +150,21 @@ function normalizeDSMLToolCallMarkup(text) {
|
||||
}
|
||||
|
||||
function containsDSMLToolMarkup(text) {
|
||||
return toolMarkupStylesOutsideIgnored(text).dsml;
|
||||
return containsToolMarkupSyntaxOutsideIgnored(text).dsml;
|
||||
}
|
||||
|
||||
function containsCanonicalToolMarkup(text) {
|
||||
return toolMarkupStylesOutsideIgnored(text).canonical;
|
||||
return containsToolMarkupSyntaxOutsideIgnored(text).canonical;
|
||||
}
|
||||
|
||||
const DSML_TOOL_MARKUP_ALIASES = [
|
||||
{ from: '<|dsml|tool_calls', to: '<tool_calls' },
|
||||
{ from: '</|dsml|tool_calls>', to: '</tool_calls>' },
|
||||
{ from: '<|dsml|invoke', to: '<invoke' },
|
||||
{ from: '</|dsml|invoke>', to: '</invoke>' },
|
||||
{ from: '<|dsml|parameter', to: '<parameter' },
|
||||
{ from: '</|dsml|parameter>', to: '</parameter>' },
|
||||
{ from: '<|dsml tool_calls', to: '<tool_calls' },
|
||||
{ from: '</|dsml tool_calls>', to: '</tool_calls>' },
|
||||
{ from: '<|dsml invoke', to: '<invoke' },
|
||||
{ from: '</|dsml invoke>', to: '</invoke>' },
|
||||
{ from: '<|dsml parameter', to: '<parameter' },
|
||||
{ from: '</|dsml parameter>', to: '</parameter>' },
|
||||
{ from: '<dsml tool_calls', to: '<tool_calls' },
|
||||
{ from: '</dsml tool_calls>', to: '</tool_calls>' },
|
||||
{ from: '<dsml invoke', to: '<invoke' },
|
||||
{ from: '</dsml invoke>', to: '</invoke>' },
|
||||
{ from: '<dsml parameter', to: '<parameter' },
|
||||
{ from: '</dsml parameter>', to: '</parameter>' },
|
||||
{ from: '<dsml|tool_calls', to: '<tool_calls' },
|
||||
{ from: '</dsml|tool_calls>', to: '</tool_calls>' },
|
||||
{ from: '<dsml|invoke', to: '<invoke' },
|
||||
{ from: '</dsml|invoke>', to: '</invoke>' },
|
||||
{ from: '<dsml|parameter', to: '<parameter' },
|
||||
{ from: '</dsml|parameter>', to: '</parameter>' },
|
||||
{ from: '<|tool_calls', to: '<tool_calls' },
|
||||
{ from: '</|tool_calls>', to: '</tool_calls>' },
|
||||
{ from: '<|invoke', to: '<invoke' },
|
||||
{ from: '</|invoke>', to: '</invoke>' },
|
||||
{ from: '<|parameter', to: '<parameter' },
|
||||
{ from: '</|parameter>', to: '</parameter>' },
|
||||
{ from: '<|tool_calls', to: '<tool_calls' },
|
||||
{ from: '</|tool_calls>', to: '</tool_calls>' },
|
||||
{ from: '<|invoke', to: '<invoke' },
|
||||
{ from: '</|invoke>', to: '</invoke>' },
|
||||
{ from: '<|parameter', to: '<parameter' },
|
||||
{ from: '</|parameter>', to: '</parameter>' },
|
||||
];
|
||||
|
||||
const CANONICAL_TOOL_MARKUP_PREFIXES = [
|
||||
'<tool_calls',
|
||||
'</tool_calls>',
|
||||
'<invoke',
|
||||
'</invoke>',
|
||||
'<parameter',
|
||||
'</parameter>',
|
||||
];
|
||||
|
||||
function toolMarkupStylesOutsideIgnored(text) {
|
||||
const lower = toStringSafe(text).toLowerCase();
|
||||
function containsToolCallWrapperSyntaxOutsideIgnored(text) {
|
||||
const raw = toStringSafe(text);
|
||||
const styles = { dsml: false, canonical: false };
|
||||
for (let i = 0; i < lower.length;) {
|
||||
if (!raw) {
|
||||
return styles;
|
||||
}
|
||||
const lower = raw.toLowerCase();
|
||||
for (let i = 0; i < raw.length;) {
|
||||
const skipped = skipXmlIgnoredSection(lower, i);
|
||||
if (skipped.blocked) {
|
||||
return styles;
|
||||
@@ -219,15 +173,55 @@ function toolMarkupStylesOutsideIgnored(text) {
|
||||
i = skipped.next;
|
||||
continue;
|
||||
}
|
||||
if (CANONICAL_TOOL_MARKUP_PREFIXES.some(prefix => lower.startsWith(prefix, i))) {
|
||||
styles.canonical = true;
|
||||
const tag = scanToolMarkupTagAt(raw, i);
|
||||
if (tag) {
|
||||
if (tag.name !== 'tool_calls') {
|
||||
i = tag.end + 1;
|
||||
continue;
|
||||
}
|
||||
if (tag.dsmlLike) {
|
||||
styles.dsml = true;
|
||||
} else {
|
||||
styles.canonical = true;
|
||||
}
|
||||
if (styles.dsml && styles.canonical) {
|
||||
return styles;
|
||||
}
|
||||
i = tag.end + 1;
|
||||
continue;
|
||||
}
|
||||
if (DSML_TOOL_MARKUP_ALIASES.some(alias => lower.startsWith(alias.from, i))) {
|
||||
styles.dsml = true;
|
||||
}
|
||||
if (styles.dsml && styles.canonical) {
|
||||
i += 1;
|
||||
}
|
||||
return styles;
|
||||
}
|
||||
function containsToolMarkupSyntaxOutsideIgnored(text) {
|
||||
const raw = toStringSafe(text);
|
||||
const styles = { dsml: false, canonical: false };
|
||||
if (!raw) {
|
||||
return styles;
|
||||
}
|
||||
for (let i = 0; i < raw.length;) {
|
||||
const skipped = skipXmlIgnoredSection(raw.toLowerCase(), i);
|
||||
if (skipped.blocked) {
|
||||
return styles;
|
||||
}
|
||||
if (skipped.advanced) {
|
||||
i = skipped.next;
|
||||
continue;
|
||||
}
|
||||
const tag = scanToolMarkupTagAt(raw, i);
|
||||
if (tag) {
|
||||
if (tag.dsmlLike) {
|
||||
styles.dsml = true;
|
||||
} else {
|
||||
styles.canonical = true;
|
||||
}
|
||||
if (styles.dsml && styles.canonical) {
|
||||
return styles;
|
||||
}
|
||||
i = tag.end + 1;
|
||||
continue;
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
return styles;
|
||||
@@ -235,6 +229,9 @@ function toolMarkupStylesOutsideIgnored(text) {
|
||||
|
||||
function replaceDSMLToolMarkupOutsideIgnored(text) {
|
||||
const raw = toStringSafe(text);
|
||||
if (!raw) {
|
||||
return '';
|
||||
}
|
||||
const lower = raw.toLowerCase();
|
||||
let out = '';
|
||||
for (let i = 0; i < raw.length;) {
|
||||
@@ -248,10 +245,14 @@ function replaceDSMLToolMarkupOutsideIgnored(text) {
|
||||
i = skipped.next;
|
||||
continue;
|
||||
}
|
||||
const alias = DSML_TOOL_MARKUP_ALIASES.find(item => lower.startsWith(item.from, i));
|
||||
if (alias) {
|
||||
out += alias.to;
|
||||
i += alias.from.length;
|
||||
const tag = scanToolMarkupTagAt(raw, i);
|
||||
if (tag) {
|
||||
if (tag.dsmlLike) {
|
||||
out += `<${tag.closing ? '/' : ''}${tag.name}${raw.slice(tag.nameEnd, tag.end + 1)}`;
|
||||
} else {
|
||||
out += raw.slice(tag.start, tag.end + 1);
|
||||
}
|
||||
i = tag.end + 1;
|
||||
continue;
|
||||
}
|
||||
out += raw[i];
|
||||
@@ -417,6 +418,150 @@ function skipXmlIgnoredSection(lower, i) {
|
||||
return { advanced: false, blocked: false, next: i };
|
||||
}
|
||||
|
||||
function scanToolMarkupTagAt(text, start) {
|
||||
const raw = toStringSafe(text);
|
||||
if (!raw || start < 0 || start >= raw.length || raw[start] !== '<') {
|
||||
return null;
|
||||
}
|
||||
const lower = raw.toLowerCase();
|
||||
let i = start + 1;
|
||||
const closing = raw[i] === '/';
|
||||
if (closing) {
|
||||
i += 1;
|
||||
}
|
||||
let dsmlLike = false;
|
||||
if (i < raw.length && isToolMarkupPipe(raw[i])) {
|
||||
dsmlLike = true;
|
||||
i += 1;
|
||||
}
|
||||
if (lower.startsWith('dsml', i)) {
|
||||
dsmlLike = true;
|
||||
i += 'dsml'.length;
|
||||
while (i < raw.length && isToolMarkupSeparator(raw[i])) {
|
||||
i += 1;
|
||||
}
|
||||
}
|
||||
const { name, len } = matchToolMarkupName(lower, i);
|
||||
if (!name) {
|
||||
return null;
|
||||
}
|
||||
const nameEnd = i + len;
|
||||
if (!hasXmlTagBoundary(raw, nameEnd)) {
|
||||
return null;
|
||||
}
|
||||
const end = findXmlTagEnd(raw, nameEnd);
|
||||
if (end < 0) {
|
||||
return null;
|
||||
}
|
||||
return {
|
||||
start,
|
||||
end,
|
||||
nameStart: i,
|
||||
nameEnd,
|
||||
name,
|
||||
closing,
|
||||
selfClosing: raw.slice(start, end + 1).trim().endsWith('/>'),
|
||||
dsmlLike,
|
||||
canonical: !dsmlLike,
|
||||
};
|
||||
}
|
||||
|
||||
function findToolMarkupTagOutsideIgnored(text, from) {
|
||||
const raw = toStringSafe(text);
|
||||
const lower = raw.toLowerCase();
|
||||
for (let i = Math.max(0, from || 0); i < raw.length;) {
|
||||
const skipped = skipXmlIgnoredSection(lower, i);
|
||||
if (skipped.blocked) {
|
||||
return null;
|
||||
}
|
||||
if (skipped.advanced) {
|
||||
i = skipped.next;
|
||||
continue;
|
||||
}
|
||||
const tag = scanToolMarkupTagAt(raw, i);
|
||||
if (tag) {
|
||||
return tag;
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function findMatchingToolMarkupClose(text, openTag) {
|
||||
const raw = toStringSafe(text);
|
||||
if (!raw || !openTag || !openTag.name || openTag.closing) {
|
||||
return null;
|
||||
}
|
||||
let depth = 1;
|
||||
for (let pos = openTag.end + 1; pos < raw.length;) {
|
||||
const tag = findToolMarkupTagOutsideIgnored(raw, pos);
|
||||
if (!tag) {
|
||||
return null;
|
||||
}
|
||||
if (tag.name !== openTag.name) {
|
||||
pos = tag.end + 1;
|
||||
continue;
|
||||
}
|
||||
if (tag.closing) {
|
||||
depth -= 1;
|
||||
if (depth === 0) {
|
||||
return tag;
|
||||
}
|
||||
} else if (!tag.selfClosing) {
|
||||
depth += 1;
|
||||
}
|
||||
pos = tag.end + 1;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function findPartialToolMarkupStart(text) {
|
||||
const raw = toStringSafe(text);
|
||||
const lastLT = raw.lastIndexOf('<');
|
||||
if (lastLT < 0) {
|
||||
return -1;
|
||||
}
|
||||
const tail = raw.slice(lastLT);
|
||||
if (tail.includes('>')) {
|
||||
return -1;
|
||||
}
|
||||
const lowerTail = tail.toLowerCase();
|
||||
const candidates = [
|
||||
'<tool_calls', '<invoke', '<parameter',
|
||||
'<|tool_calls', '<|invoke', '<|parameter',
|
||||
'<|tool_calls', '<|invoke', '<|parameter',
|
||||
'<|dsml|tool_calls', '<|dsml|invoke', '<|dsml|parameter',
|
||||
'<dsmltool_calls', '<dsmlinvoke', '<dsmlparameter',
|
||||
'<dsml tool_calls', '<dsml invoke', '<dsml parameter',
|
||||
'<dsml|tool_calls', '<dsml|invoke', '<dsml|parameter',
|
||||
'<|dsmltool_calls', '<|dsmlinvoke', '<|dsmlparameter',
|
||||
'<|dsml tool_calls', '<|dsml invoke', '<|dsml parameter',
|
||||
];
|
||||
for (const candidate of candidates) {
|
||||
if (candidate.startsWith(lowerTail)) {
|
||||
return lastLT;
|
||||
}
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
||||
function isToolMarkupPipe(ch) {
|
||||
return ch === '|' || ch === '|';
|
||||
}
|
||||
|
||||
function isToolMarkupSeparator(ch) {
|
||||
return ch === ' ' || ch === '\t' || ch === '\r' || ch === '\n' || isToolMarkupPipe(ch);
|
||||
}
|
||||
|
||||
function matchToolMarkupName(lower, start) {
|
||||
for (const name of TOOL_MARKUP_NAMES) {
|
||||
if (lower.startsWith(name, start)) {
|
||||
return { name, len: name.length };
|
||||
}
|
||||
}
|
||||
return { name: '', len: 0 };
|
||||
}
|
||||
|
||||
function findXmlTagEnd(text, from) {
|
||||
let quote = '';
|
||||
for (let i = Math.max(0, from || 0); i < text.length; i += 1) {
|
||||
@@ -494,7 +639,8 @@ function parseMarkupKVObject(text) {
|
||||
function parseMarkupValue(raw) {
|
||||
const cdata = extractStandaloneCDATA(raw);
|
||||
if (cdata.ok) {
|
||||
return cdata.value;
|
||||
const literal = parseJSONLiteralValue(cdata.value);
|
||||
return literal.ok ? literal.value : cdata.value;
|
||||
}
|
||||
const s = toStringSafe(extractRawTagValue(raw)).trim();
|
||||
if (!s) {
|
||||
@@ -511,12 +657,9 @@ function parseMarkupValue(raw) {
|
||||
}
|
||||
}
|
||||
|
||||
if (s.startsWith('{') || s.startsWith('[')) {
|
||||
try {
|
||||
return JSON.parse(s);
|
||||
} catch (_err) {
|
||||
return s;
|
||||
}
|
||||
const literal = parseJSONLiteralValue(s);
|
||||
if (literal.ok) {
|
||||
return literal.value;
|
||||
}
|
||||
return s;
|
||||
}
|
||||
@@ -554,9 +697,65 @@ function extractStandaloneCDATA(inner) {
|
||||
if (cdataMatch && cdataMatch[1] !== undefined) {
|
||||
return { ok: true, value: cdataMatch[1] };
|
||||
}
|
||||
if (s.toLowerCase().startsWith('<![cdata[')) {
|
||||
return { ok: true, value: s.slice('<![CDATA['.length) };
|
||||
}
|
||||
return { ok: false, value: '' };
|
||||
}
|
||||
|
||||
function parseJSONLiteralValue(raw) {
|
||||
const s = toStringSafe(raw).trim();
|
||||
if (!s) {
|
||||
return { ok: false, value: null };
|
||||
}
|
||||
if (!['{', '[', '"', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 't', 'f', 'n'].includes(s[0])) {
|
||||
return { ok: false, value: null };
|
||||
}
|
||||
try {
|
||||
return { ok: true, value: JSON.parse(s) };
|
||||
} catch (_err) {
|
||||
return { ok: false, value: null };
|
||||
}
|
||||
}
|
||||
|
||||
function sanitizeLooseCDATA(text) {
|
||||
const raw = toStringSafe(text);
|
||||
if (!raw) {
|
||||
return '';
|
||||
}
|
||||
const lower = raw.toLowerCase();
|
||||
const openMarker = '<![cdata[';
|
||||
const closeMarker = ']]>';
|
||||
|
||||
let out = '';
|
||||
let pos = 0;
|
||||
let changed = false;
|
||||
while (pos < raw.length) {
|
||||
const startRel = lower.indexOf(openMarker, pos);
|
||||
if (startRel < 0) {
|
||||
out += raw.slice(pos);
|
||||
break;
|
||||
}
|
||||
const start = startRel;
|
||||
const contentStart = start + openMarker.length;
|
||||
out += raw.slice(pos, start);
|
||||
|
||||
const endRel = lower.indexOf(closeMarker, contentStart);
|
||||
if (endRel >= 0) {
|
||||
const end = endRel + closeMarker.length;
|
||||
out += raw.slice(start, end);
|
||||
pos = end;
|
||||
continue;
|
||||
}
|
||||
|
||||
changed = true;
|
||||
out += raw.slice(contentStart);
|
||||
pos = raw.length;
|
||||
}
|
||||
|
||||
return changed ? out : raw;
|
||||
}
|
||||
|
||||
function parseTagAttributes(raw) {
|
||||
const source = toStringSafe(raw);
|
||||
const out = {};
|
||||
@@ -631,4 +830,10 @@ module.exports = {
|
||||
stripFencedCodeBlocks,
|
||||
parseMarkupToolCalls,
|
||||
normalizeDSMLToolCallMarkup,
|
||||
containsToolMarkupSyntaxOutsideIgnored,
|
||||
containsToolCallWrapperSyntaxOutsideIgnored,
|
||||
findToolMarkupTagOutsideIgnored,
|
||||
findMatchingToolMarkupClose,
|
||||
findPartialToolMarkupStart,
|
||||
sanitizeLooseCDATA,
|
||||
};
|
||||
|
||||
@@ -1,71 +1,53 @@
|
||||
'use strict';
|
||||
const { parseToolCalls } = require('./parse');
|
||||
|
||||
// XML wrapper tag pair used by the streaming sieve.
|
||||
const XML_TOOL_TAG_PAIRS = [
|
||||
{ open: '<|dsml|tool_calls', close: '</|dsml|tool_calls>' },
|
||||
{ open: '<|dsml tool_calls', close: '</|dsml tool_calls>' },
|
||||
{ open: '<dsml|tool_calls', close: '</dsml|tool_calls>' },
|
||||
{ open: '<dsml tool_calls', close: '</dsml tool_calls>' },
|
||||
{ open: '<|tool_calls', close: '</|tool_calls>' },
|
||||
{ open: '<|tool_calls', close: '</|tool_calls>' },
|
||||
{ open: '<tool_calls', close: '</tool_calls>' },
|
||||
];
|
||||
|
||||
const XML_TOOL_OPENING_TAGS = [
|
||||
...XML_TOOL_TAG_PAIRS.map(p => p.open),
|
||||
'<|dsml|invoke', '<|dsml invoke', '<dsml|invoke', '<dsml invoke', '<|invoke', '<|invoke', '<invoke',
|
||||
];
|
||||
const {
|
||||
findToolMarkupTagOutsideIgnored,
|
||||
findMatchingToolMarkupClose,
|
||||
findPartialToolMarkupStart,
|
||||
} = require('./parse_payload');
|
||||
|
||||
function consumeXMLToolCapture(captured, toolNames, trimWrappingJSONFence) {
|
||||
const lower = captured.toLowerCase();
|
||||
let anyOpenFound = false;
|
||||
let best = null;
|
||||
let rejected = null;
|
||||
|
||||
// Scan every wrapper occurrence. Prose can mention a wrapper tag before the
|
||||
// actual tool block, including the same variant as the real block.
|
||||
for (const pair of XML_TOOL_TAG_PAIRS) {
|
||||
let searchFrom = 0;
|
||||
while (searchFrom < lower.length) {
|
||||
const openIdx = findXMLOpenOutsideCDATA(captured, pair.open, searchFrom);
|
||||
if (openIdx < 0) {
|
||||
break;
|
||||
}
|
||||
// Ignore closing tags that appear inside CDATA payloads, such as
|
||||
// write-file content containing tool-call documentation examples.
|
||||
const closeIdx = findMatchingXMLToolWrapperClose(captured, pair.open, pair.close, openIdx);
|
||||
if (closeIdx < 0) {
|
||||
anyOpenFound = true;
|
||||
searchFrom = openIdx + pair.open.length;
|
||||
continue;
|
||||
}
|
||||
const closeEnd = closeIdx + pair.close.length;
|
||||
const xmlBlock = captured.slice(openIdx, closeEnd);
|
||||
let prefixPart = captured.slice(0, openIdx);
|
||||
let suffixPart = captured.slice(closeEnd);
|
||||
const parsed = parseToolCalls(xmlBlock, toolNames);
|
||||
if (Array.isArray(parsed) && parsed.length > 0) {
|
||||
const trimmedFence = trimWrappingJSONFence(prefixPart, suffixPart);
|
||||
if (!best || openIdx < best.start) {
|
||||
best = {
|
||||
start: openIdx,
|
||||
prefix: trimmedFence.prefix,
|
||||
calls: parsed,
|
||||
suffix: trimmedFence.suffix,
|
||||
};
|
||||
}
|
||||
break;
|
||||
}
|
||||
if (!rejected || openIdx < rejected.start) {
|
||||
rejected = {
|
||||
start: openIdx,
|
||||
prefix: prefixPart + xmlBlock,
|
||||
suffix: suffixPart,
|
||||
// Scan every recognized wrapper occurrence. Prose can mention a wrapper tag
|
||||
// before the actual tool block, including the same variant as the real block.
|
||||
for (let searchFrom = 0; searchFrom < captured.length;) {
|
||||
const openTag = findFirstToolTag(captured, searchFrom, 'tool_calls', false);
|
||||
if (!openTag) {
|
||||
break;
|
||||
}
|
||||
const closeTag = findMatchingToolMarkupClose(captured, openTag);
|
||||
if (!closeTag) {
|
||||
anyOpenFound = true;
|
||||
searchFrom = openTag.end + 1;
|
||||
continue;
|
||||
}
|
||||
const xmlBlock = captured.slice(openTag.start, closeTag.end + 1);
|
||||
const prefixPart = captured.slice(0, openTag.start);
|
||||
const suffixPart = captured.slice(closeTag.end + 1);
|
||||
const parsed = parseToolCalls(xmlBlock, toolNames);
|
||||
if (Array.isArray(parsed) && parsed.length > 0) {
|
||||
const trimmedFence = trimWrappingJSONFence(prefixPart, suffixPart);
|
||||
if (!best || openTag.start < best.start) {
|
||||
best = {
|
||||
start: openTag.start,
|
||||
prefix: trimmedFence.prefix,
|
||||
calls: parsed,
|
||||
suffix: trimmedFence.suffix,
|
||||
};
|
||||
}
|
||||
searchFrom = openIdx + pair.open.length;
|
||||
break;
|
||||
}
|
||||
if (!rejected || openTag.start < rejected.start) {
|
||||
rejected = {
|
||||
start: openTag.start,
|
||||
prefix: prefixPart + xmlBlock,
|
||||
suffix: suffixPart,
|
||||
};
|
||||
}
|
||||
searchFrom = openTag.end + 1;
|
||||
}
|
||||
if (best) {
|
||||
return { ready: true, prefix: best.prefix, calls: best.calls, suffix: best.suffix };
|
||||
@@ -78,17 +60,15 @@ function consumeXMLToolCapture(captured, toolNames, trimWrappingJSONFence) {
|
||||
// If this block failed to become a tool call, pass it through as text.
|
||||
return { ready: true, prefix: rejected.prefix, calls: [], suffix: rejected.suffix };
|
||||
}
|
||||
if (!containsAnyToolCallWrapper(lower)) {
|
||||
const found = firstInvokeIndex(lower);
|
||||
if (found.index >= 0) {
|
||||
const closeTag = found.dsml ? '</|dsml|tool_calls>' : '</tool_calls>';
|
||||
const openWrapper = found.dsml ? '<|DSML|tool_calls>' : '<tool_calls>';
|
||||
const closeIdx = findXMLCloseOutsideCDATA(captured, closeTag, found.index);
|
||||
if (closeIdx > found.index) {
|
||||
const closeEnd = closeIdx + closeTag.length;
|
||||
const xmlBlock = openWrapper + captured.slice(found.index, closeIdx) + closeTag;
|
||||
let prefixPart = captured.slice(0, found.index);
|
||||
let suffixPart = captured.slice(closeEnd);
|
||||
const invokeTag = findFirstToolTag(captured, 0, 'invoke', false);
|
||||
if (invokeTag) {
|
||||
const wrapperOpen = findFirstToolTag(captured, 0, 'tool_calls', false);
|
||||
if (!wrapperOpen || wrapperOpen.start > invokeTag.start) {
|
||||
const closeTag = findFirstToolTag(captured, invokeTag.start + 1, 'tool_calls', true);
|
||||
if (closeTag && closeTag.start > invokeTag.start) {
|
||||
const xmlBlock = '<tool_calls>' + captured.slice(invokeTag.start, closeTag.end + 1);
|
||||
const prefixPart = captured.slice(0, invokeTag.start);
|
||||
const suffixPart = captured.slice(closeTag.end + 1);
|
||||
const parsed = parseToolCalls(xmlBlock, toolNames);
|
||||
if (Array.isArray(parsed) && parsed.length > 0) {
|
||||
const trimmedFence = trimWrappingJSONFence(prefixPart, suffixPart);
|
||||
@@ -99,194 +79,43 @@ function consumeXMLToolCapture(captured, toolNames, trimWrappingJSONFence) {
|
||||
suffix: trimmedFence.suffix,
|
||||
};
|
||||
}
|
||||
return { ready: true, prefix: prefixPart + captured.slice(found.index, closeEnd), calls: [], suffix: suffixPart };
|
||||
return { ready: true, prefix: prefixPart + captured.slice(invokeTag.start, closeTag.end + 1), calls: [], suffix: suffixPart };
|
||||
}
|
||||
}
|
||||
}
|
||||
return { ready: false, prefix: '', calls: [], suffix: '' };
|
||||
}
|
||||
|
||||
function findMatchingXMLToolWrapperClose(s, openTag, closeTag, openIdx) {
|
||||
const text = typeof s === 'string' ? s : '';
|
||||
const openTarget = String(openTag || '').toLowerCase();
|
||||
const closeTarget = String(closeTag || '').toLowerCase();
|
||||
if (!text || !openTarget || !closeTarget || openIdx < 0) {
|
||||
return -1;
|
||||
}
|
||||
const lower = text.toLowerCase();
|
||||
let depth = 1;
|
||||
for (let i = openIdx + openTarget.length; i < text.length;) {
|
||||
if (lower.startsWith('<![cdata[', i)) {
|
||||
const end = lower.indexOf(']]>', i + '<![cdata['.length);
|
||||
if (end < 0) {
|
||||
return -1;
|
||||
}
|
||||
i = end + ']]>'.length;
|
||||
continue;
|
||||
}
|
||||
if (lower.startsWith('<!--', i)) {
|
||||
const end = lower.indexOf('-->', i + '<!--'.length);
|
||||
if (end < 0) {
|
||||
return -1;
|
||||
}
|
||||
i = end + '-->'.length;
|
||||
continue;
|
||||
}
|
||||
if (lower.startsWith(closeTarget, i)) {
|
||||
depth -= 1;
|
||||
if (depth === 0) {
|
||||
return i;
|
||||
}
|
||||
i += closeTarget.length;
|
||||
continue;
|
||||
}
|
||||
if (lower.startsWith(openTarget, i) && hasXMLToolTagBoundary(text, i + openTarget.length)) {
|
||||
depth += 1;
|
||||
i += openTarget.length;
|
||||
continue;
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
||||
function findXMLOpenOutsideCDATA(s, openTag, start) {
|
||||
const text = typeof s === 'string' ? s : '';
|
||||
const target = String(openTag || '').toLowerCase();
|
||||
if (!text || !target) {
|
||||
return -1;
|
||||
}
|
||||
const lower = text.toLowerCase();
|
||||
for (let i = Math.max(0, start || 0); i < text.length;) {
|
||||
if (lower.startsWith('<![cdata[', i)) {
|
||||
const end = lower.indexOf(']]>', i + '<![cdata['.length);
|
||||
if (end < 0) {
|
||||
return -1;
|
||||
}
|
||||
i = end + ']]>'.length;
|
||||
continue;
|
||||
}
|
||||
if (lower.startsWith('<!--', i)) {
|
||||
const end = lower.indexOf('-->', i + '<!--'.length);
|
||||
if (end < 0) {
|
||||
return -1;
|
||||
}
|
||||
i = end + '-->'.length;
|
||||
continue;
|
||||
}
|
||||
if (lower.startsWith(target, i) && hasXMLToolTagBoundary(text, i + target.length)) {
|
||||
return i;
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
||||
function hasXMLToolTagBoundary(text, idx) {
|
||||
if (idx >= text.length) {
|
||||
return true;
|
||||
}
|
||||
return [' ', '\t', '\n', '\r', '>', '/'].includes(text[idx]);
|
||||
}
|
||||
|
||||
function hasOpenXMLToolTag(captured) {
|
||||
for (const pair of XML_TOOL_TAG_PAIRS) {
|
||||
const openIdx = findXMLOpenOutsideCDATA(captured, pair.open, 0);
|
||||
if (openIdx >= 0) {
|
||||
if (findMatchingXMLToolWrapperClose(captured, pair.open, pair.close, openIdx) < 0) {
|
||||
return true;
|
||||
}
|
||||
for (let pos = 0; pos < captured.length;) {
|
||||
const tag = findFirstToolTag(captured, pos, 'tool_calls', false);
|
||||
if (!tag) {
|
||||
return false;
|
||||
}
|
||||
if (!findMatchingToolMarkupClose(captured, tag)) {
|
||||
return true;
|
||||
}
|
||||
pos = tag.end + 1;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
function containsAnyToolCallWrapper(lower) {
|
||||
return lower.includes('<tool_calls') ||
|
||||
lower.includes('<|dsml|tool_calls') ||
|
||||
lower.includes('<|dsml tool_calls') ||
|
||||
lower.includes('<dsml|tool_calls') ||
|
||||
lower.includes('<dsml tool_calls') ||
|
||||
lower.includes('<|tool_calls') ||
|
||||
lower.includes('<|tool_calls');
|
||||
}
|
||||
|
||||
function firstInvokeIndex(lower) {
|
||||
const xmlIdx = lower.indexOf('<invoke');
|
||||
// Check all DSML-like invoke prefixes.
|
||||
const dsmlPrefixes = ['<|dsml|invoke', '<|dsml invoke', '<dsml|invoke', '<dsml invoke', '<|invoke', '<|invoke'];
|
||||
let dsmlIdx = -1;
|
||||
for (const prefix of dsmlPrefixes) {
|
||||
const idx = lower.indexOf(prefix);
|
||||
if (idx >= 0 && (dsmlIdx < 0 || idx < dsmlIdx)) {
|
||||
dsmlIdx = idx;
|
||||
function findFirstToolTag(text, from, name, closing) {
|
||||
for (let pos = Math.max(0, from || 0); pos < text.length;) {
|
||||
const tag = findToolMarkupTagOutsideIgnored(text, pos);
|
||||
if (!tag) {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
if (xmlIdx < 0) {
|
||||
return { index: dsmlIdx, dsml: dsmlIdx >= 0 };
|
||||
}
|
||||
if (dsmlIdx < 0) {
|
||||
return { index: xmlIdx, dsml: false };
|
||||
}
|
||||
if (dsmlIdx < xmlIdx) {
|
||||
return { index: dsmlIdx, dsml: true };
|
||||
}
|
||||
return { index: xmlIdx, dsml: false };
|
||||
}
|
||||
|
||||
function findPartialXMLToolTagStart(s) {
|
||||
const lastLT = s.lastIndexOf('<');
|
||||
if (lastLT < 0) {
|
||||
return -1;
|
||||
}
|
||||
const tail = s.slice(lastLT);
|
||||
if (tail.includes('>')) {
|
||||
return -1;
|
||||
}
|
||||
const lowerTail = tail.toLowerCase();
|
||||
for (const tag of XML_TOOL_OPENING_TAGS) {
|
||||
const tagWithLT = tag.startsWith('<') ? tag : '<' + tag;
|
||||
if (tagWithLT.startsWith(lowerTail)) {
|
||||
return lastLT;
|
||||
if (tag.name === name && tag.closing === closing) {
|
||||
return tag;
|
||||
}
|
||||
pos = tag.end + 1;
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
||||
function findXMLCloseOutsideCDATA(s, closeTag, start) {
|
||||
const text = typeof s === 'string' ? s : '';
|
||||
const target = String(closeTag || '').toLowerCase();
|
||||
if (!text || !target) {
|
||||
return -1;
|
||||
}
|
||||
const lower = text.toLowerCase();
|
||||
for (let i = Math.max(0, start || 0); i < text.length;) {
|
||||
if (lower.startsWith('<![cdata[', i)) {
|
||||
const end = lower.indexOf(']]>', i + '<![cdata['.length);
|
||||
if (end < 0) {
|
||||
return -1;
|
||||
}
|
||||
i = end + ']]>'.length;
|
||||
continue;
|
||||
}
|
||||
if (lower.startsWith('<!--', i)) {
|
||||
const end = lower.indexOf('-->', i + '<!--'.length);
|
||||
if (end < 0) {
|
||||
return -1;
|
||||
}
|
||||
i = end + '-->'.length;
|
||||
continue;
|
||||
}
|
||||
if (lower.startsWith(target, i)) {
|
||||
return i;
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
return -1;
|
||||
return null;
|
||||
}
|
||||
|
||||
module.exports = {
|
||||
consumeXMLToolCapture,
|
||||
hasOpenXMLToolTag,
|
||||
findPartialXMLToolTagStart,
|
||||
findPartialXMLToolTagStart: findPartialToolMarkupStart,
|
||||
};
|
||||
|
||||
@@ -6,8 +6,9 @@ const {
|
||||
} = require('./state');
|
||||
const { trimWrappingJSONFence } = require('./jsonscan');
|
||||
const {
|
||||
XML_TOOL_SEGMENT_TAGS,
|
||||
} = require('./tool-keywords');
|
||||
findToolMarkupTagOutsideIgnored,
|
||||
sanitizeLooseCDATA,
|
||||
} = require('./parse_payload');
|
||||
const {
|
||||
consumeXMLToolCapture: consumeXMLToolCaptureImpl,
|
||||
hasOpenXMLToolTag,
|
||||
@@ -117,8 +118,27 @@ function flushToolSieve(state, toolNames) {
|
||||
}
|
||||
} else if (state.capture) {
|
||||
const content = state.capture;
|
||||
noteText(state, content);
|
||||
events.push({ type: 'text', text: content });
|
||||
const recovered = sanitizeLooseCDATA(content);
|
||||
if (recovered !== content) {
|
||||
const recoveredResult = consumeXMLToolCaptureImpl(recovered, toolNames, trimWrappingJSONFence);
|
||||
if (recoveredResult.ready && Array.isArray(recoveredResult.calls) && recoveredResult.calls.length > 0) {
|
||||
if (recoveredResult.prefix) {
|
||||
noteText(state, recoveredResult.prefix);
|
||||
events.push({ type: 'text', text: recoveredResult.prefix });
|
||||
}
|
||||
events.push({ type: 'tool_calls', calls: recoveredResult.calls });
|
||||
if (recoveredResult.suffix) {
|
||||
noteText(state, recoveredResult.suffix);
|
||||
events.push({ type: 'text', text: recoveredResult.suffix });
|
||||
}
|
||||
} else {
|
||||
noteText(state, content);
|
||||
events.push({ type: 'text', text: content });
|
||||
}
|
||||
} else {
|
||||
noteText(state, content);
|
||||
events.push({ type: 'text', text: content });
|
||||
}
|
||||
}
|
||||
state.capture = '';
|
||||
state.capturing = false;
|
||||
@@ -155,26 +175,16 @@ function findToolSegmentStart(state, s) {
|
||||
if (!s) {
|
||||
return -1;
|
||||
}
|
||||
const lower = s.toLowerCase();
|
||||
let offset = 0;
|
||||
while (true) {
|
||||
// Only check XML tool tags.
|
||||
let bestIdx = -1;
|
||||
let matchedTag = '';
|
||||
for (const tag of XML_TOOL_SEGMENT_TAGS) {
|
||||
const idx = lower.indexOf(tag, offset);
|
||||
if (idx >= 0 && (bestIdx < 0 || idx < bestIdx)) {
|
||||
bestIdx = idx;
|
||||
matchedTag = tag;
|
||||
}
|
||||
}
|
||||
if (bestIdx < 0) {
|
||||
const tag = findToolMarkupTagOutsideIgnored(s, offset);
|
||||
if (!tag) {
|
||||
return -1;
|
||||
}
|
||||
if (!insideCodeFenceWithState(state, s.slice(0, bestIdx))) {
|
||||
return bestIdx;
|
||||
if (!insideCodeFenceWithState(state, s.slice(0, tag.start))) {
|
||||
return tag.start;
|
||||
}
|
||||
offset = bestIdx + matchedTag.length;
|
||||
offset = tag.end + 1;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -3,10 +3,14 @@
|
||||
const XML_TOOL_SEGMENT_TAGS = [
|
||||
'<|dsml|tool_calls>', '<|dsml|tool_calls\n', '<|dsml|tool_calls ',
|
||||
'<|dsml|invoke ', '<|dsml|invoke\n', '<|dsml|invoke\t', '<|dsml|invoke\r',
|
||||
'<|dsmltool_calls>', '<|dsmltool_calls\n', '<|dsmltool_calls ',
|
||||
'<|dsmlinvoke ', '<|dsmlinvoke\n', '<|dsmlinvoke\t', '<|dsmlinvoke\r',
|
||||
'<|dsml tool_calls>', '<|dsml tool_calls\n', '<|dsml tool_calls ',
|
||||
'<|dsml invoke ', '<|dsml invoke\n', '<|dsml invoke\t', '<|dsml invoke\r',
|
||||
'<dsml|tool_calls>', '<dsml|tool_calls\n', '<dsml|tool_calls ',
|
||||
'<dsml|invoke ', '<dsml|invoke\n', '<dsml|invoke\t', '<dsml|invoke\r',
|
||||
'<dsmltool_calls>', '<dsmltool_calls\n', '<dsmltool_calls ',
|
||||
'<dsmlinvoke ', '<dsmlinvoke\n', '<dsmlinvoke\t', '<dsmlinvoke\r',
|
||||
'<dsml tool_calls>', '<dsml tool_calls\n', '<dsml tool_calls ',
|
||||
'<dsml invoke ', '<dsml invoke\n', '<dsml invoke\t', '<dsml invoke\r',
|
||||
'<|tool_calls>', '<|tool_calls\n', '<|tool_calls ',
|
||||
@@ -19,8 +23,10 @@ const XML_TOOL_SEGMENT_TAGS = [
|
||||
|
||||
const XML_TOOL_OPENING_TAGS = [
|
||||
'<|dsml|tool_calls',
|
||||
'<|dsmltool_calls',
|
||||
'<|dsml tool_calls',
|
||||
'<dsml|tool_calls',
|
||||
'<dsmltool_calls',
|
||||
'<dsml tool_calls',
|
||||
'<|tool_calls',
|
||||
'<|tool_calls',
|
||||
@@ -29,8 +35,10 @@ const XML_TOOL_OPENING_TAGS = [
|
||||
|
||||
const XML_TOOL_CLOSING_TAGS = [
|
||||
'</|dsml|tool_calls>',
|
||||
'</|dsmltool_calls>',
|
||||
'</|dsml tool_calls>',
|
||||
'</dsml|tool_calls>',
|
||||
'</dsmltool_calls>',
|
||||
'</dsml tool_calls>',
|
||||
'</|tool_calls>',
|
||||
'</|tool_calls>',
|
||||
|
||||
@@ -12,9 +12,9 @@ func TestRegression_RobustXMLAndCDATA(t *testing.T) {
|
||||
expected []ParsedToolCall
|
||||
}{
|
||||
{
|
||||
name: "Standard JSON parameters (Regression)",
|
||||
name: "Standard JSON scalar parameters (Regression)",
|
||||
text: `<tool_calls><invoke name="foo"><parameter name="a">1</parameter></invoke></tool_calls>`,
|
||||
expected: []ParsedToolCall{{Name: "foo", Input: map[string]any{"a": "1"}}},
|
||||
expected: []ParsedToolCall{{Name: "foo", Input: map[string]any{"a": float64(1)}}},
|
||||
},
|
||||
{
|
||||
name: "XML tags parameters (Regression)",
|
||||
|
||||
@@ -6,96 +6,17 @@ func normalizeDSMLToolCallMarkup(text string) (string, bool) {
|
||||
if text == "" {
|
||||
return "", true
|
||||
}
|
||||
hasAliasLikeMarkup, _ := toolMarkupStylesOutsideIgnored(text)
|
||||
hasAliasLikeMarkup, _ := ContainsToolMarkupSyntaxOutsideIgnored(text)
|
||||
if !hasAliasLikeMarkup {
|
||||
return text, true
|
||||
}
|
||||
// Always normalize DSML aliases to canonical form, even when canonical
|
||||
// tags coexist. Models frequently mix DSML wrapper tags with canonical
|
||||
// inner tags (e.g., <|tool_calls><invoke name="...">).
|
||||
return replaceDSMLToolMarkupOutsideIgnored(text), true
|
||||
return rewriteDSMLToolMarkupOutsideIgnored(text), true
|
||||
}
|
||||
|
||||
var dsmlToolMarkupAliases = []struct {
|
||||
from string
|
||||
to string
|
||||
}{
|
||||
{"<|dsml|tool_calls", "<tool_calls"},
|
||||
{"</|dsml|tool_calls>", "</tool_calls>"},
|
||||
{"<|dsml|invoke", "<invoke"},
|
||||
{"</|dsml|invoke>", "</invoke>"},
|
||||
{"<|dsml|parameter", "<parameter"},
|
||||
{"</|dsml|parameter>", "</parameter>"},
|
||||
{"<|dsml tool_calls", "<tool_calls"},
|
||||
{"</|dsml tool_calls>", "</tool_calls>"},
|
||||
{"<|dsml invoke", "<invoke"},
|
||||
{"</|dsml invoke>", "</invoke>"},
|
||||
{"<|dsml parameter", "<parameter"},
|
||||
{"</|dsml parameter>", "</parameter>"},
|
||||
{"<dsml tool_calls", "<tool_calls"},
|
||||
{"</dsml tool_calls>", "</tool_calls>"},
|
||||
{"<dsml invoke", "<invoke"},
|
||||
{"</dsml invoke>", "</invoke>"},
|
||||
{"<dsml parameter", "<parameter"},
|
||||
{"</dsml parameter>", "</parameter>"},
|
||||
{"<dsml|tool_calls", "<tool_calls"},
|
||||
{"</dsml|tool_calls>", "</tool_calls>"},
|
||||
{"<dsml|invoke", "<invoke"},
|
||||
{"</dsml|invoke>", "</invoke>"},
|
||||
{"<dsml|parameter", "<parameter"},
|
||||
{"</dsml|parameter>", "</parameter>"},
|
||||
{"<|tool_calls", "<tool_calls"},
|
||||
{"</|tool_calls>", "</tool_calls>"},
|
||||
{"<|invoke", "<invoke"},
|
||||
{"</|invoke>", "</invoke>"},
|
||||
{"<|parameter", "<parameter"},
|
||||
{"</|parameter>", "</parameter>"},
|
||||
{"<|tool_calls", "<tool_calls"},
|
||||
{"</|tool_calls>", "</tool_calls>"},
|
||||
{"<|invoke", "<invoke"},
|
||||
{"</|invoke>", "</invoke>"},
|
||||
{"<|parameter", "<parameter"},
|
||||
{"</|parameter>", "</parameter>"},
|
||||
}
|
||||
|
||||
var canonicalToolMarkupPrefixes = []string{
|
||||
"<tool_calls",
|
||||
"</tool_calls>",
|
||||
"<invoke",
|
||||
"</invoke>",
|
||||
"<parameter",
|
||||
"</parameter>",
|
||||
}
|
||||
|
||||
func toolMarkupStylesOutsideIgnored(text string) (hasDSML, hasCanonical bool) {
|
||||
lower := strings.ToLower(text)
|
||||
for i := 0; i < len(text); {
|
||||
next, advanced, blocked := skipXMLIgnoredSection(lower, i)
|
||||
if blocked {
|
||||
return hasDSML, hasCanonical
|
||||
}
|
||||
if advanced {
|
||||
i = next
|
||||
continue
|
||||
}
|
||||
if hasPrefixAt(lower, i, canonicalToolMarkupPrefixes) {
|
||||
hasCanonical = true
|
||||
}
|
||||
for _, alias := range dsmlToolMarkupAliases {
|
||||
if strings.HasPrefix(lower[i:], alias.from) {
|
||||
hasDSML = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if hasDSML && hasCanonical {
|
||||
return true, true
|
||||
}
|
||||
i++
|
||||
func rewriteDSMLToolMarkupOutsideIgnored(text string) string {
|
||||
if text == "" {
|
||||
return ""
|
||||
}
|
||||
return hasDSML, hasCanonical
|
||||
}
|
||||
|
||||
func replaceDSMLToolMarkupOutsideIgnored(text string) string {
|
||||
lower := strings.ToLower(text)
|
||||
var b strings.Builder
|
||||
b.Grow(len(text))
|
||||
@@ -110,29 +31,24 @@ func replaceDSMLToolMarkupOutsideIgnored(text string) string {
|
||||
i = next
|
||||
continue
|
||||
}
|
||||
replaced := false
|
||||
for _, alias := range dsmlToolMarkupAliases {
|
||||
if strings.HasPrefix(lower[i:], alias.from) {
|
||||
b.WriteString(alias.to)
|
||||
i += len(alias.from)
|
||||
replaced = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if replaced {
|
||||
tag, ok := scanToolMarkupTagAt(text, i)
|
||||
if !ok {
|
||||
b.WriteByte(text[i])
|
||||
i++
|
||||
continue
|
||||
}
|
||||
b.WriteByte(text[i])
|
||||
i++
|
||||
if tag.DSMLLike {
|
||||
b.WriteByte('<')
|
||||
if tag.Closing {
|
||||
b.WriteByte('/')
|
||||
}
|
||||
b.WriteString(tag.Name)
|
||||
b.WriteString(text[tag.NameEnd : tag.End+1])
|
||||
i = tag.End + 1
|
||||
continue
|
||||
}
|
||||
b.WriteString(text[tag.Start : tag.End+1])
|
||||
i = tag.End + 1
|
||||
}
|
||||
return b.String()
|
||||
}
|
||||
|
||||
func hasPrefixAt(text string, idx int, prefixes []string) bool {
|
||||
for _, prefix := range prefixes {
|
||||
if strings.HasPrefix(text[idx:], prefix) {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
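The alias rewrite in the hunk above can be illustrated with a much smaller stand-in. The sketch below is an assumption-laden approximation, not the repository's implementation: it uses a regular expression instead of the shared tag scanner, it does not skip CDATA or comment sections, and the names `dsmlTag` and `rewriteDSMLTags` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dsmlTag matches an opening or closing tool tag that carries a DSML-like
// prefix: an optional pipe (ASCII or fullwidth), optionally "dsml" followed by
// any mix of pipes and whitespace, then one of the three known tag names and a
// boundary character. Hypothetical pattern written for this sketch only.
var dsmlTag = regexp.MustCompile(`(?i)<(/?)(?:[||]?dsml[|| \t\r\n]*|[||])(tool_calls|invoke|parameter)([\s>/])`)

// rewriteDSMLTags rewrites DSML-like tag heads to canonical XML tag heads and
// leaves everything else (attributes, bodies, lookalike names) untouched.
func rewriteDSMLTags(text string) string {
	return dsmlTag.ReplaceAllStringFunc(text, func(m string) string {
		parts := dsmlTag.FindStringSubmatch(m)
		return "<" + parts[1] + strings.ToLower(parts[2]) + parts[3]
	})
}

func main() {
	in := `<|DSML|tool_calls><DSMLinvoke name="x"></DSMLinvoke></|DSML|tool_calls>`
	fmt.Println(rewriteDSMLTags(in))
}
```

The boundary group is what keeps lookalike names such as `<DSMLtool_calls_extra>` from being rewritten, mirroring the boundary check the scanner applies after the tag name.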
@@ -111,5 +111,72 @@ func extractStandaloneCDATA(inner string) (string, bool) {
|
||||
if cdataMatches := cdataPattern.FindStringSubmatch(trimmed); len(cdataMatches) >= 2 {
|
||||
return cdataMatches[1], true
|
||||
}
|
||||
if strings.HasPrefix(strings.ToLower(trimmed), "<![cdata[") {
|
||||
return trimmed[len("<![CDATA["):], true
|
||||
}
|
||||
return "", false
|
||||
}
|
||||
|
||||
func parseJSONLiteralValue(raw string) (any, bool) {
	trimmed := strings.TrimSpace(raw)
	if trimmed == "" {
		return nil, false
	}

	switch trimmed[0] {
	case '{', '[', '"', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 't', 'f', 'n':
	default:
		return nil, false
	}

	var parsed any
	if err := json.Unmarshal([]byte(trimmed), &parsed); err != nil {
		return nil, false
	}
	return parsed, true
}
|
||||
|
||||
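To make the effect of JSON-literal parsing on parameter bodies concrete, here is a minimal standalone sketch. It follows the same idea as `parseJSONLiteralValue` (a cheap first-byte guard, then `json.Unmarshal`), but the function name `parseLiteral` and the sample inputs are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseLiteral returns the decoded value when raw is a JSON literal, and
// ok=false when the body should be kept as a plain string.
func parseLiteral(raw string) (any, bool) {
	trimmed := strings.TrimSpace(raw)
	if trimmed == "" {
		return nil, false
	}
	// First-byte guard: only bytes that can start a JSON value get decoded.
	if !strings.ContainsRune(`{["-0123456789tfn`, rune(trimmed[0])) {
		return nil, false
	}
	var v any
	if err := json.Unmarshal([]byte(trimmed), &v); err != nil {
		return nil, false
	}
	return v, true
}

func main() {
	for _, body := range []string{"123", "true", "null", `["a","b"]`, "hello", "123abc"} {
		v, ok := parseLiteral(body)
		fmt.Printf("%q -> %#v (literal=%v)\n", body, v, ok)
	}
}
```

Because decoding goes into `any`, numbers come back as `float64`, which is why the regression test in this change expects `float64(1)` rather than the string `"1"`.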
// SanitizeLooseCDATA repairs malformed trailing CDATA openings just enough for
// final parsing and flush-time recovery. Properly closed CDATA blocks are left
// untouched; an unclosed opener is stripped so the remaining text can still be
// parsed as part of the surrounding tool markup.
func SanitizeLooseCDATA(text string) string {
	if text == "" {
		return ""
	}

	lower := strings.ToLower(text)
	const openMarker = "<![cdata["
	const closeMarker = "]]>"

	var b strings.Builder
	b.Grow(len(text))
	changed := false
	pos := 0
	for pos < len(text) {
		startRel := strings.Index(lower[pos:], openMarker)
		if startRel < 0 {
			b.WriteString(text[pos:])
			break
		}
		start := pos + startRel
		contentStart := start + len(openMarker)
		b.WriteString(text[pos:start])

		if endRel := strings.Index(lower[contentStart:], closeMarker); endRel >= 0 {
			end := contentStart + endRel + len(closeMarker)
			b.WriteString(text[start:end])
			pos = end
			continue
		}

		changed = true
		b.WriteString(text[contentStart:])
		pos = len(text)
	}

	if !changed {
		return text
	}
	return b.String()
}
|
||||
|
||||
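The narrow CDATA repair described above can be shown on a single input. This is a simplified sketch under stated assumptions: `stripUnclosedCDATA` is a hypothetical, shorter variant of the same idea (closed blocks are skipped, only an opener with no matching `]]>` is dropped), and it assumes lowercasing does not shift byte offsets in the input.

```go
package main

import (
	"fmt"
	"strings"
)

// stripUnclosedCDATA removes the marker of the first CDATA opener that has no
// closing "]]>" after it and keeps its payload. Closed blocks stay untouched.
func stripUnclosedCDATA(text string) string {
	const open, closeMark = "<![cdata[", "]]>"
	lower := strings.ToLower(text) // assumes byte offsets stay aligned
	pos := 0
	for {
		rel := strings.Index(lower[pos:], open)
		if rel < 0 {
			return text // every opener seen so far was closed
		}
		start := pos + rel
		content := start + len(open)
		end := strings.Index(lower[content:], closeMark)
		if end < 0 {
			// Unclosed opener: drop only the marker, keep the payload.
			return text[:start] + text[content:]
		}
		pos = content + end + len(closeMark)
	}
}

func main() {
	in := `<parameter name="content"><![CDATA[hello world</parameter>`
	fmt.Println(stripUnclosedCDATA(in))
}
```

After the repair, the outer `<parameter>` element is well formed again, so a second parse attempt can recover the call instead of discarding it.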
@@ -65,6 +65,12 @@ func parseToolCallsDetailedXMLOnly(text string) ToolCallParseResult {
|
||||
return result
|
||||
}
|
||||
parsed := parseXMLToolCalls(normalized)
|
||||
if len(parsed) == 0 && strings.Contains(strings.ToLower(normalized), "<![cdata[") {
|
||||
recovered := SanitizeLooseCDATA(normalized)
|
||||
if recovered != normalized {
|
||||
parsed = parseXMLToolCalls(recovered)
|
||||
}
|
||||
}
|
||||
if len(parsed) == 0 {
|
||||
return result
|
||||
}
|
||||
@@ -92,14 +98,8 @@ func filterToolCallsDetailed(parsed []ParsedToolCall) ([]ParsedToolCall, []strin
|
||||
}
|
||||
|
||||
func looksLikeToolCallSyntax(text string) bool {
|
||||
lower := strings.ToLower(text)
|
||||
return strings.Contains(lower, "<|dsml|tool_calls") ||
|
||||
strings.Contains(lower, "<|dsml tool_calls") ||
|
||||
strings.Contains(lower, "<dsml|tool_calls") ||
|
||||
strings.Contains(lower, "<dsml tool_calls") ||
|
||||
strings.Contains(lower, "<|tool_calls") ||
|
||||
strings.Contains(lower, "<|tool_calls") ||
|
||||
strings.Contains(lower, "<tool_calls")
|
||||
hasDSML, hasCanonical := ContainsToolCallWrapperSyntaxOutsideIgnored(text)
|
||||
return hasDSML || hasCanonical
|
||||
}
|
||||
|
||||
func stripFencedCodeBlocks(text string) string {
|
||||
|
||||
@@ -295,15 +295,24 @@ func parseInvokeParameterValue(raw string) any {
|
||||
return ""
|
||||
}
|
||||
if value, ok := extractStandaloneCDATA(trimmed); ok {
|
||||
if parsed, ok := parseJSONLiteralValue(value); ok {
|
||||
return parsed
|
||||
}
|
||||
return value
|
||||
}
|
||||
if parsed := parseStructuredToolCallInput(trimmed); len(parsed) > 0 {
|
||||
if len(parsed) == 1 {
|
||||
if rawValue, ok := parsed["_raw"].(string); ok {
|
||||
return rawValue
|
||||
decoded := html.UnescapeString(extractRawTagValue(trimmed))
|
||||
if strings.Contains(decoded, "<") && strings.Contains(decoded, ">") {
|
||||
if parsed := parseStructuredToolCallInput(decoded); len(parsed) > 0 {
|
||||
if len(parsed) == 1 {
|
||||
if rawValue, ok := parsed["_raw"].(string); ok {
|
||||
return rawValue
|
||||
}
|
||||
}
|
||||
return parsed
|
||||
}
|
||||
}
|
||||
if parsed, ok := parseJSONLiteralValue(decoded); ok {
|
||||
return parsed
|
||||
}
|
||||
return html.UnescapeString(extractRawTagValue(trimmed))
|
||||
return decoded
|
||||
}
|
||||
|
||||
219
internal/toolcall/toolcalls_scan.go
Normal file
@@ -0,0 +1,219 @@
|
||||
package toolcall
|
||||
|
||||
import "strings"
|
||||
|
||||
var toolMarkupNames = []string{"tool_calls", "invoke", "parameter"}
|
||||
|
||||
type ToolMarkupTag struct {
|
||||
Start int
|
||||
End int
|
||||
NameStart int
|
||||
NameEnd int
|
||||
Name string
|
||||
Closing bool
|
||||
SelfClosing bool
|
||||
DSMLLike bool
|
||||
Canonical bool
|
||||
}
|
||||
|
||||
func ContainsToolMarkupSyntaxOutsideIgnored(text string) (hasDSML, hasCanonical bool) {
|
||||
lower := strings.ToLower(text)
|
||||
for i := 0; i < len(text); {
|
||||
next, advanced, blocked := skipXMLIgnoredSection(lower, i)
|
||||
if blocked {
|
||||
return hasDSML, hasCanonical
|
||||
}
|
||||
if advanced {
|
||||
i = next
|
||||
continue
|
||||
}
|
||||
if tag, ok := scanToolMarkupTagAt(text, i); ok {
|
||||
if tag.DSMLLike {
|
||||
hasDSML = true
|
||||
} else {
|
||||
hasCanonical = true
|
||||
}
|
||||
if hasDSML && hasCanonical {
|
||||
return true, true
|
||||
}
|
||||
i = tag.End + 1
|
||||
continue
|
||||
}
|
||||
i++
|
||||
}
|
||||
return hasDSML, hasCanonical
|
||||
}
|
||||
|
||||
func ContainsToolCallWrapperSyntaxOutsideIgnored(text string) (hasDSML, hasCanonical bool) {
|
||||
lower := strings.ToLower(text)
|
||||
for i := 0; i < len(text); {
|
||||
next, advanced, blocked := skipXMLIgnoredSection(lower, i)
|
||||
if blocked {
|
||||
return hasDSML, hasCanonical
|
||||
}
|
||||
if advanced {
|
||||
i = next
|
||||
continue
|
||||
}
|
||||
if tag, ok := scanToolMarkupTagAt(text, i); ok {
|
||||
if tag.Name != "tool_calls" {
|
||||
i = tag.End + 1
|
||||
continue
|
||||
}
|
||||
if tag.DSMLLike {
|
||||
hasDSML = true
|
||||
} else {
|
||||
hasCanonical = true
|
||||
}
|
||||
if hasDSML && hasCanonical {
|
||||
return true, true
|
||||
}
|
||||
i = tag.End + 1
|
||||
continue
|
||||
}
|
||||
i++
|
||||
}
|
||||
return hasDSML, hasCanonical
|
||||
}
|
||||
|
||||
func FindToolMarkupTagOutsideIgnored(text string, start int) (ToolMarkupTag, bool) {
|
||||
lower := strings.ToLower(text)
|
||||
for i := maxInt(start, 0); i < len(text); {
|
||||
next, advanced, blocked := skipXMLIgnoredSection(lower, i)
|
||||
if blocked {
|
||||
return ToolMarkupTag{}, false
|
||||
}
|
||||
if advanced {
|
||||
i = next
|
||||
continue
|
||||
}
|
||||
if tag, ok := scanToolMarkupTagAt(text, i); ok {
|
||||
return tag, true
|
||||
}
|
||||
i++
|
||||
}
|
||||
return ToolMarkupTag{}, false
|
||||
}
|
||||
|
||||
func FindMatchingToolMarkupClose(text string, open ToolMarkupTag) (ToolMarkupTag, bool) {
	if text == "" || open.Name == "" || open.Closing {
		return ToolMarkupTag{}, false
	}
	depth := 1
	for pos := open.End + 1; pos < len(text); {
		tag, ok := FindToolMarkupTagOutsideIgnored(text, pos)
		if !ok {
			return ToolMarkupTag{}, false
		}
		if tag.Name != open.Name {
			pos = tag.End + 1
			continue
		}
		if tag.Closing {
			depth--
			if depth == 0 {
				return tag, true
			}
		} else if !tag.SelfClosing {
			depth++
		}
		pos = tag.End + 1
	}
	return ToolMarkupTag{}, false
}
|
||||
|
||||
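The depth counting used to pair an opening tag with its close can be isolated from the text scanning. The sketch below assumes the tags have already been scanned into a slice; the `tag` type and `matchClose` function are hypothetical names for illustration and operate on slice indices rather than byte offsets.

```go
package main

import "fmt"

// tag is a hypothetical, already-scanned markup tag.
type tag struct {
	Name        string
	Closing     bool
	SelfClosing bool
}

// matchClose returns the index of the closing tag that balances tags[open],
// or -1 when the opener is never closed. Tags with other names are ignored.
func matchClose(tags []tag, open int) int {
	if open < 0 || open >= len(tags) || tags[open].Closing {
		return -1
	}
	depth := 1
	for i := open + 1; i < len(tags); i++ {
		t := tags[i]
		if t.Name != tags[open].Name {
			continue
		}
		switch {
		case t.Closing:
			depth--
			if depth == 0 {
				return i
			}
		case !t.SelfClosing:
			depth++ // a nested opener of the same name needs its own close
		}
	}
	return -1
}

func main() {
	tags := []tag{
		{Name: "tool_calls"},
		{Name: "invoke"},
		{Name: "tool_calls"},
		{Name: "tool_calls", Closing: true},
		{Name: "invoke", Closing: true},
		{Name: "tool_calls", Closing: true},
	}
	fmt.Println(matchClose(tags, 0), matchClose(tags, 2))
}
```

Matching on the tag name alone, rather than on the exact spelling of the opener, is what lets a DSML-style opener pair with a canonical closer and vice versa.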
func scanToolMarkupTagAt(text string, start int) (ToolMarkupTag, bool) {
	if start < 0 || start >= len(text) || text[start] != '<' {
		return ToolMarkupTag{}, false
	}
	lower := strings.ToLower(text)
	i := start + 1
	closing := false
	if i < len(text) && text[i] == '/' {
		closing = true
		i++
	}
	dsmlLike := false
	if next, ok := consumeToolMarkupPipe(text, i); ok {
		dsmlLike = true
		i = next
	}
	if strings.HasPrefix(lower[i:], "dsml") {
		dsmlLike = true
		i += len("dsml")
		for next, ok := consumeToolMarkupSeparator(text, i); ok; next, ok = consumeToolMarkupSeparator(text, i) {
			i = next
		}
	}
	name, nameLen := matchToolMarkupName(lower, i)
	if nameLen == 0 {
		return ToolMarkupTag{}, false
	}
	nameEnd := i + nameLen
	if !hasToolMarkupBoundary(text, nameEnd) {
		return ToolMarkupTag{}, false
	}
	end := findXMLTagEnd(text, nameEnd)
	if end < 0 {
		return ToolMarkupTag{}, false
	}
	trimmed := strings.TrimSpace(text[start : end+1])
	return ToolMarkupTag{
		Start:       start,
		End:         end,
		NameStart:   i,
		NameEnd:     nameEnd,
		Name:        name,
		Closing:     closing,
		SelfClosing: strings.HasSuffix(trimmed, "/>"),
		DSMLLike:    dsmlLike,
		Canonical:   !dsmlLike,
	}, true
}
|
||||
|
||||
func matchToolMarkupName(lower string, start int) (string, int) {
	for _, name := range toolMarkupNames {
		if strings.HasPrefix(lower[start:], name) {
			return name, len(name)
		}
	}
	return "", 0
}

func consumeToolMarkupPipe(text string, idx int) (int, bool) {
	if idx >= len(text) {
		return idx, false
	}
	if text[idx] == '|' {
		return idx + 1, true
	}
	// Fullwidth vertical bar (U+FF5C), as used in DSML-style delimiters.
	if strings.HasPrefix(text[idx:], "|") {
		return idx + len("|"), true
	}
	return idx, false
}

func consumeToolMarkupSeparator(text string, idx int) (int, bool) {
	if idx >= len(text) {
		return idx, false
	}
	if text[idx] == ' ' || text[idx] == '\t' || text[idx] == '\r' || text[idx] == '\n' {
		return idx + 1, true
	}
	if next, ok := consumeToolMarkupPipe(text, idx); ok {
		return next, true
	}
	return idx, false
}

func hasToolMarkupBoundary(text string, idx int) bool {
	if idx >= len(text) {
		return true
	}
	switch text[idx] {
	case ' ', '\t', '\n', '\r', '>', '/':
		return true
	default:
		return false
	}
}
|
||||
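The scanner in this new file replaces the hardcoded alias tables with a small structural parse of the tag head. The sketch below is a reduced, self-contained version of that idea for a single tag at the start of a string. It is an approximation under stated assumptions: `scanTag`, `skipPipe`, and the `scanned` type are hypothetical names, and unlike the real scanner it does not locate the closing `>` or record offsets.

```go
package main

import (
	"fmt"
	"strings"
)

// scanned holds the parts of a tag head this sketch cares about.
type scanned struct {
	Name     string
	Closing  bool
	DSMLLike bool
}

var names = []string{"tool_calls", "invoke", "parameter"}

// skipPipe consumes one ASCII '|' or fullwidth '|' at idx, if present.
func skipPipe(s string, idx int) (int, bool) {
	for _, p := range []string{"|", "|"} {
		if strings.HasPrefix(s[idx:], p) {
			return idx + len(p), true
		}
	}
	return idx, false
}

// scanTag recognises "<", optional "/", optional pipe, optional "dsml" plus
// separators, then a known tag name followed by a boundary character.
func scanTag(s string) (scanned, bool) {
	if !strings.HasPrefix(s, "<") {
		return scanned{}, false
	}
	lower := strings.ToLower(s)
	i, out := 1, scanned{}
	if strings.HasPrefix(s[i:], "/") {
		out.Closing, i = true, i+1
	}
	if next, ok := skipPipe(s, i); ok {
		out.DSMLLike, i = true, next
	}
	if strings.HasPrefix(lower[i:], "dsml") {
		out.DSMLLike, i = true, i+len("dsml")
		for i < len(s) {
			if next, ok := skipPipe(s, i); ok {
				i = next
			} else if strings.ContainsRune(" \t\r\n", rune(s[i])) {
				i++
			} else {
				break
			}
		}
	}
	for _, name := range names {
		if !strings.HasPrefix(lower[i:], name) {
			continue
		}
		end := i + len(name)
		// The name must end at a boundary, so "tool_calls_extra" is rejected.
		if end >= len(s) || strings.ContainsRune(" \t\r\n>/", rune(s[end])) {
			out.Name = name
			return out, true
		}
	}
	return scanned{}, false
}

func main() {
	for _, in := range []string{"<DSMLtool_calls>", `<|DSML|invoke name="x">`, "</tool_calls>", "<DSMLtool_calls_extra>"} {
		tag, ok := scanTag(in)
		fmt.Printf("%q -> %+v ok=%v\n", in, tag, ok)
	}
}
```

Treating the `dsml` prefix and its separators as optional, consumable pieces is what lets one code path cover the pipe-delimited, space-delimited, and collapsed spellings without listing each one.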
@@ -53,6 +53,18 @@ func TestParseToolCallsSupportsDSMLShellWithCanonicalExampleInCDATA(t *testing.T
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseToolCallsTreatsUnclosedCDATAAsText(t *testing.T) {
|
||||
text := `<tool_calls><invoke name="Write"><parameter name="content"><![CDATA[hello world</parameter></invoke></tool_calls>`
|
||||
res := ParseToolCallsDetailed(text, []string{"Write"})
|
||||
if len(res.Calls) != 1 {
|
||||
t.Fatalf("expected unclosed CDATA to still parse via outer wrapper, got %#v", res.Calls)
|
||||
}
|
||||
got, _ := res.Calls[0].Input["content"].(string)
|
||||
if got != "hello world" {
|
||||
t.Fatalf("expected recovered CDATA payload, got %q", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseToolCallsNormalizesMixedDSMLAndCanonicalToolTags(t *testing.T) {
|
||||
// Models commonly mix DSML wrapper tags with canonical inner tags.
|
||||
// These should be normalized and parsed, not rejected.
|
||||
@@ -130,6 +142,23 @@ func TestParseToolCallsSupportsInvokeParameters(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseToolCallsSupportsJSONScalarParameters(t *testing.T) {
|
||||
text := `<tool_calls><invoke name="configure"><parameter name="count">123</parameter><parameter name="max_tokens"><![CDATA[256]]></parameter><parameter name="enabled">true</parameter></invoke></tool_calls>`
|
||||
calls := ParseToolCalls(text, []string{"configure"})
|
||||
if len(calls) != 1 {
|
||||
t.Fatalf("expected 1 call, got %#v", calls)
|
||||
}
|
||||
if got, ok := calls[0].Input["count"].(float64); !ok || got != 123 {
|
||||
t.Fatalf("expected numeric count, got %#v", calls[0].Input["count"])
|
||||
}
|
||||
if got, ok := calls[0].Input["max_tokens"].(float64); !ok || got != 256 {
|
||||
t.Fatalf("expected numeric max_tokens, got %#v", calls[0].Input["max_tokens"])
|
||||
}
|
||||
if got, ok := calls[0].Input["enabled"].(bool); !ok || !got {
|
||||
t.Fatalf("expected boolean enabled, got %#v", calls[0].Input["enabled"])
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseToolCallsPreservesRawMalformedParams(t *testing.T) {
|
||||
text := `<tool_calls><invoke name="execute_command"><parameter name="command">cd /root && git status</parameter></invoke></tool_calls>`
|
||||
calls := ParseToolCalls(text, []string{"execute_command"})
|
||||
@@ -478,6 +507,49 @@ func TestParseToolCallsDoesNotAcceptDSMLSpaceLookalikeTagName(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseToolCallsToleratesDSMLCollapsedTagNames(t *testing.T) {
|
||||
todos := `[x] 检查 toolcalls_format.go 格式化逻辑
|
||||
[x] 检查 toolcalls_parse.go 解析逻辑
|
||||
[x] 检查 toolcalls_xml.go 和 toolcalls_dsml.go
|
||||
[x] 检查 toolcalls_markup.go 和 toolcalls_json_repair.go
|
||||
[x] 检查 prompt/tool_calls.go 注入逻辑
|
||||
[x] 检查 toolstream 流式解析
|
||||
[x] 查看测试文件确认预期行为
|
||||
[x] 给出调查结论`
|
||||
text := strings.Join([]string{
|
||||
"[]",
|
||||
"<DSMLtool_calls>",
|
||||
"<DSMLinvoke name=\"update_todo_list\">",
|
||||
"<DSMLparameter name=\"todos\"><![CDATA[" + todos + "]]></DSMLparameter>",
|
||||
"</DSMLinvoke>",
|
||||
"</DSMLtool_calls>",
|
||||
}, "\n")
|
||||
calls := ParseToolCalls(text, []string{"update_todo_list"})
|
||||
if len(calls) != 1 {
|
||||
t.Fatalf("expected one call from collapsed DSML tags, got %#v", calls)
|
||||
}
|
||||
if calls[0].Name != "update_todo_list" {
|
||||
t.Fatalf("expected update_todo_list call, got %#v", calls[0])
|
||||
}
|
||||
if got, _ := calls[0].Input["todos"].(string); got != todos {
|
||||
t.Fatalf("expected todos to round-trip, got %q", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseToolCallsDoesNotAcceptDSMLCollapsedLookalikeTagName(t *testing.T) {
|
||||
text := strings.Join([]string{
|
||||
"<DSMLtool_calls_extra>",
|
||||
"<DSMLinvoke name=\"update_todo_list\">",
|
||||
"<DSMLparameter name=\"todos\">x</DSMLparameter>",
|
||||
"</DSMLinvoke>",
|
||||
"</DSMLtool_calls_extra>",
|
||||
}, "\n")
|
||||
calls := ParseToolCalls(text, []string{"update_todo_list"})
|
||||
if len(calls) != 0 {
|
||||
t.Fatalf("expected no calls from collapsed lookalike tag, got %#v", calls)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseToolCallsSkipsProseMentionOfSameWrapperVariant(t *testing.T) {
|
||||
text := strings.Join([]string{
|
||||
"Summary: support canonical <tool_calls> and DSML <|DSML|tool_calls> wrappers.",
|
||||
|
||||
@@ -615,3 +615,68 @@ func TestSieve_DSMLSpaceLookalikeTagNameStaysText(t *testing.T) {
|
||||
t.Fatalf("lookalike tag name should pass through as plain text, got %q", text.String())
|
||||
}
|
||||
}
|
||||
|
||||
func TestSieve_DSMLCollapsedTagNamesWithPrefixText(t *testing.T) {
|
||||
var state State
|
||||
todos := `[x] 检查 toolcalls_format.go 格式化逻辑
|
||||
[x] 检查 toolcalls_parse.go 解析逻辑
|
||||
[x] 检查 toolcalls_xml.go 和 toolcalls_dsml.go
|
||||
[x] 检查 toolcalls_markup.go 和 toolcalls_json_repair.go
|
||||
[x] 检查 prompt/tool_calls.go 注入逻辑
|
||||
[x] 检查 toolstream 流式解析
|
||||
[x] 查看测试文件确认预期行为
|
||||
[x] 给出调查结论`
|
||||
chunks := []string{
|
||||
"[]\n",
|
||||
"<DSMLtool_calls>\n",
|
||||
"<DSMLinvoke name=\"update_todo_list\">\n",
|
||||
"<DSMLparameter name=\"todos\"><![CDATA[" + todos + "]]></DSMLparameter>\n",
|
||||
"</DSMLinvoke>\n",
|
||||
"</DSMLtool_calls>",
|
||||
}
|
||||
var events []Event
|
||||
for _, c := range chunks {
|
||||
events = append(events, ProcessChunk(&state, c, []string{"update_todo_list"})...)
|
||||
}
|
||||
events = append(events, Flush(&state, []string{"update_todo_list"})...)
|
||||
|
||||
var text strings.Builder
|
||||
var gotTodos string
|
||||
callCount := 0
|
||||
for _, e := range events {
|
||||
text.WriteString(e.Content)
|
||||
for _, call := range e.ToolCalls {
|
||||
callCount++
|
||||
gotTodos, _ = call.Input["todos"].(string)
|
||||
}
|
||||
}
|
||||
if callCount != 1 {
|
||||
t.Fatalf("expected 1 tool call to be parsed, got %d, text=%q", callCount, text.String())
|
||||
}
|
||||
if gotTodos != todos {
|
||||
t.Fatalf("todos should be preserved in full, got %q", gotTodos)
|
||||
}
|
||||
if text.String() != "[]\n" {
|
||||
t.Fatalf("prefix text should be preserved in full without leaking the tool block, got %q", text.String())
|
||||
}
|
||||
}
|
||||
|
||||
func TestSieve_DSMLCollapsedLookalikeTagNameStaysText(t *testing.T) {
|
||||
var state State
|
||||
input := "<DSMLtool_calls_extra><DSMLinvoke name=\"update_todo_list\"><DSMLparameter name=\"todos\">x</DSMLparameter></DSMLinvoke></DSMLtool_calls_extra>"
|
||||
events := ProcessChunk(&state, input, []string{"update_todo_list"})
|
||||
events = append(events, Flush(&state, []string{"update_todo_list"})...)
|
||||
|
||||
var text strings.Builder
|
||||
callCount := 0
|
||||
for _, e := range events {
|
||||
text.WriteString(e.Content)
|
||||
callCount += len(e.ToolCalls)
|
||||
}
|
||||
if callCount != 0 {
|
||||
t.Fatalf("lookalike collapsed tag name should not trigger a tool call, got %d", callCount)
|
||||
}
|
||||
if text.String() != input {
|
||||
t.Fatalf("lookalike collapsed tag name should pass through as plain text, got %q", text.String())
|
||||
}
|
||||
}
|
||||
|
||||
@@ -114,10 +114,30 @@ func Flush(state *State, toolNames []string) []Event {
|
||||
} else {
|
||||
content := state.capture.String()
|
||||
if content != "" {
|
||||
// If capture never resolved into a real tool call, release the
|
||||
// buffered text instead of swallowing it.
|
||||
state.noteText(content)
|
||||
events = append(events, Event{Content: content})
|
||||
recovered := toolcall.SanitizeLooseCDATA(content)
|
||||
if recovered != content {
|
||||
if prefix, calls, suffix, recoveredReady := consumeXMLToolCapture(recovered, toolNames); recoveredReady && len(calls) > 0 {
|
||||
if prefix != "" {
|
||||
state.noteText(prefix)
|
||||
events = append(events, Event{Content: prefix})
|
||||
}
|
||||
events = append(events, Event{ToolCalls: calls})
|
||||
if suffix != "" {
|
||||
state.noteText(suffix)
|
||||
events = append(events, Event{Content: suffix})
|
||||
}
|
||||
} else {
|
||||
// If capture never resolved into a real tool call, release
|
||||
// the buffered text instead of swallowing it.
|
||||
state.noteText(content)
|
||||
events = append(events, Event{Content: content})
|
||||
}
|
||||
} else {
|
||||
// If capture never resolved into a real tool call, release the
|
||||
// buffered text instead of swallowing it.
|
||||
state.noteText(content)
|
||||
events = append(events, Event{Content: content})
|
||||
}
|
||||
}
|
||||
}
|
||||
state.capture.Reset()
|
||||
|
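The `Flush` hunk above repeats the "release buffered text" fallback in two branches. As a reading aid, the control flow can be condensed into a single function with one fallback. This is a sketch only: `flushCaptured` and `event` are hypothetical, and the `sanitize` and `parse` parameters stand in for `toolcall.SanitizeLooseCDATA` and `consumeXMLToolCapture`, whose real signatures differ.

```go
package main

import "fmt"

// event mirrors the two kinds of output in the hunk: plain text or a batch
// of tool calls. The type is illustrative only.
type event struct {
	Content   string
	ToolCalls []string
}

// flushCaptured tries a repaired copy of the captured text first and falls
// back to releasing the original text when the repair yields no tool call.
func flushCaptured(
	content string,
	sanitize func(string) string,
	parse func(string) (prefix string, calls []string, suffix string, ok bool),
) []event {
	if content == "" {
		return nil
	}
	if recovered := sanitize(content); recovered != content {
		if prefix, calls, suffix, ok := parse(recovered); ok && len(calls) > 0 {
			var out []event
			if prefix != "" {
				out = append(out, event{Content: prefix})
			}
			out = append(out, event{ToolCalls: calls})
			if suffix != "" {
				out = append(out, event{Content: suffix})
			}
			return out
		}
	}
	// Nothing recoverable: release the buffered text unchanged.
	return []event{{Content: content}}
}

func main() {
	sanitize := func(s string) string { return s + "]]>" }
	parse := func(s string) (string, []string, string, bool) {
		return "intro ", []string{"Write"}, "", true
	}
	fmt.Printf("%+v\n", flushCaptured("captured", sanitize, parse))
}
```

The original text is emitted only on the fallback path, so a successful recovery never leaks the raw tool markup alongside the parsed call.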
||||
@@ -7,7 +7,6 @@ import (
|
||||
|
||||
// consumeXMLToolCapture tries to extract complete XML tool call blocks from captured text.
|
||||
func consumeXMLToolCapture(captured string, toolNames []string) (prefix string, calls []toolcall.ParsedToolCall, suffix string, ready bool) {
|
||||
lower := strings.ToLower(captured)
|
||||
anyOpenFound := false
|
||||
type candidate struct {
|
||||
start int
|
||||
@@ -23,41 +22,40 @@ func consumeXMLToolCapture(captured string, toolNames []string) (prefix string,
|
||||
var best *candidate
|
||||
var rejected *rejectedBlock
|
||||
|
||||
// Scan every wrapper occurrence. Prose can mention a wrapper tag before the
|
||||
// actual tool block, including the same variant as the real block.
|
||||
for _, pair := range xmlToolCallTagPairs {
|
||||
searchFrom := 0
|
||||
for searchFrom < len(lower) {
|
||||
openIdx := findXMLOpenOutsideCDATA(captured, pair.open, searchFrom)
|
||||
if openIdx < 0 {
|
||||
break
|
||||
}
|
||||
// Find the matching closing tag outside CDATA. Long write-file tool
|
||||
// calls often contain XML examples in CDATA, including </tool_calls>.
|
||||
closeIdx := findMatchingXMLToolWrapperClose(captured, pair.open, pair.close, openIdx)
|
||||
if closeIdx < 0 {
|
||||
anyOpenFound = true
|
||||
searchFrom = openIdx + len(pair.open)
|
||||
continue
|
||||
}
|
||||
closeEnd := closeIdx + len(pair.close)
|
||||
|
||||
xmlBlock := captured[openIdx:closeEnd]
|
||||
prefixPart := captured[:openIdx]
|
||||
suffixPart := captured[closeEnd:]
|
||||
parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
|
||||
if len(parsed) > 0 {
|
||||
prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
|
||||
if best == nil || openIdx < best.start {
|
||||
best = &candidate{start: openIdx, prefix: prefixPart, calls: parsed, suffix: suffixPart}
|
||||
}
|
||||
break
|
||||
}
|
||||
if rejected == nil || openIdx < rejected.start {
|
||||
rejected = &rejectedBlock{start: openIdx, prefix: prefixPart + xmlBlock, suffix: suffixPart}
|
||||
}
|
||||
searchFrom = openIdx + len(pair.open)
|
||||
// Scan every recognized tool tag occurrence. Prose can mention a wrapper
|
||||
// tag before the actual tool block, including the same variant as the real
|
||||
// block. We only accept complete tool_calls wrappers that parse cleanly.
|
||||
for searchFrom := 0; searchFrom < len(captured); {
|
||||
tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(captured, searchFrom)
|
||||
if !ok {
|
||||
break
|
||||
}
|
||||
if tag.Closing || tag.Name != "tool_calls" {
|
||||
searchFrom = tag.End + 1
|
||||
continue
|
||||
}
|
||||
closeTag, ok := toolcall.FindMatchingToolMarkupClose(captured, tag)
|
||||
if !ok {
|
||||
anyOpenFound = true
|
||||
searchFrom = tag.End + 1
|
||||
continue
|
||||
}
|
||||
|
||||
xmlBlock := captured[tag.Start : closeTag.End+1]
|
||||
prefixPart := captured[:tag.Start]
|
||||
suffixPart := captured[closeTag.End+1:]
|
||||
parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
|
||||
if len(parsed) > 0 {
|
||||
prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
|
||||
if best == nil || tag.Start < best.start {
|
||||
best = &candidate{start: tag.Start, prefix: prefixPart, calls: parsed, suffix: suffixPart}
|
||||
}
|
||||
break
|
||||
}
|
||||
if rejected == nil || tag.Start < rejected.start {
|
||||
rejected = &rejectedBlock{start: tag.Start, prefix: prefixPart + xmlBlock, suffix: suffixPart}
|
||||
}
|
||||
searchFrom = tag.End + 1
|
||||
}
|
||||
if best != nil {
|
||||
return best.prefix, best.calls, best.suffix, true
|
||||
@@ -71,26 +69,19 @@ func consumeXMLToolCapture(captured string, toolNames []string) (prefix string,
|
||||
// If this block failed to become a tool call, pass it through as text.
|
||||
return rejected.prefix, nil, rejected.suffix, true
|
||||
}
|
||||
if !containsAnyToolCallWrapper(lower) {
|
||||
invokeIdx, dsml := firstInvokeIndex(lower)
|
||||
closeTag := "</tool_calls>"
|
||||
openWrapper := "<tool_calls>"
|
||||
if dsml {
|
||||
closeTag = "</|dsml|tool_calls>"
|
||||
openWrapper = "<|DSML|tool_calls>"
|
||||
}
|
||||
closeIdx := findXMLCloseOutsideCDATA(captured, closeTag, invokeIdx)
|
||||
if invokeIdx >= 0 && closeIdx > invokeIdx {
|
||||
closeEnd := closeIdx + len(closeTag)
|
||||
xmlBlock := openWrapper + captured[invokeIdx:closeIdx] + closeTag
|
||||
prefixPart := captured[:invokeIdx]
|
||||
suffixPart := captured[closeEnd:]
|
||||
parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
|
||||
if len(parsed) > 0 {
|
||||
prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
|
||||
return prefixPart, parsed, suffixPart, true
|
||||
if invokeTag, ok := findFirstToolMarkupTagByName(captured, 0, "invoke"); ok {
|
||||
if wrapperOpen, ok := findFirstToolMarkupTagByName(captured, 0, "tool_calls"); !ok || wrapperOpen.Start > invokeTag.Start {
|
||||
if closeTag, ok := findFirstToolMarkupTagByNameFrom(captured, invokeTag.Start+1, "tool_calls", true); ok && closeTag.Start > invokeTag.Start {
|
||||
xmlBlock := "<tool_calls>" + captured[invokeTag.Start:closeTag.End+1]
|
||||
prefixPart := captured[:invokeTag.Start]
|
||||
suffixPart := captured[closeTag.End+1:]
|
||||
parsed := toolcall.ParseToolCalls(xmlBlock, toolNames)
|
||||
if len(parsed) > 0 {
|
||||
prefixPart, suffixPart = trimWrappingJSONFence(prefixPart, suffixPart)
|
||||
return prefixPart, parsed, suffixPart, true
|
||||
}
|
||||
return prefixPart + captured[invokeTag.Start:closeTag.End+1], nil, suffixPart, true
|
||||
}
|
||||
return prefixPart + captured[invokeIdx:closeEnd], nil, suffixPart, true
|
||||
}
|
||||
}
|
||||
return "", nil, "", false
|
||||
@@ -99,46 +90,35 @@ func consumeXMLToolCapture(captured string, toolNames []string) (prefix string,
|
||||
// hasOpenXMLToolTag returns true if captured text contains an XML tool opening tag
|
||||
// whose SPECIFIC closing tag has not appeared yet.
|
||||
func hasOpenXMLToolTag(captured string) bool {
|
||||
for _, pair := range xmlToolCallTagPairs {
|
||||
openIdx := findXMLOpenOutsideCDATA(captured, pair.open, 0)
|
||||
if openIdx >= 0 {
|
||||
if findMatchingXMLToolWrapperClose(captured, pair.open, pair.close, openIdx) < 0 {
|
||||
return true
|
||||
}
|
||||
for searchFrom := 0; searchFrom < len(captured); {
|
||||
tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(captured, searchFrom)
|
||||
if !ok {
|
||||
return false
|
||||
}
|
||||
if tag.Closing || tag.Name != "tool_calls" {
|
||||
searchFrom = tag.End + 1
|
||||
continue
|
||||
}
|
||||
if _, ok := toolcall.FindMatchingToolMarkupClose(captured, tag); !ok {
|
||||
return true
|
||||
}
|
||||
searchFrom = tag.End + 1
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func shouldKeepBareInvokeCapture(captured string) bool {
|
||||
lower := strings.ToLower(captured)
|
||||
invokeIdx, dsml := firstInvokeIndex(lower)
|
||||
if invokeIdx < 0 || containsAnyToolCallWrapper(lower) {
|
||||
invokeTag, ok := findFirstToolMarkupTagByName(captured, 0, "invoke")
|
||||
if !ok {
|
||||
return false
|
||||
}
|
||||
invokeOpenLen := len("<invoke")
|
||||
parameterOpen := "<parameter"
|
||||
if dsml {
|
||||
invokeOpenLen = len("<|dsml|invoke")
|
||||
parameterOpen = "<|dsml|parameter"
|
||||
if wrapperOpen, ok := findFirstToolMarkupTagByName(captured, 0, "tool_calls"); ok && wrapperOpen.Start <= invokeTag.Start {
|
||||
return false
|
||||
}
|
||||
if dsml && strings.HasPrefix(lower[invokeIdx:], "<|dsml invoke") {
|
||||
invokeOpenLen = len("<|dsml invoke")
|
||||
parameterOpen = "<|dsml parameter"
|
||||
}
|
||||
if dsml && strings.HasPrefix(lower[invokeIdx:], "<dsml|invoke") {
|
||||
invokeOpenLen = len("<dsml|invoke")
|
||||
parameterOpen = "<dsml|parameter"
|
||||
}
|
||||
if dsml && strings.HasPrefix(lower[invokeIdx:], "<dsml invoke") {
|
||||
invokeOpenLen = len("<dsml invoke")
|
||||
parameterOpen = "<dsml parameter"
|
||||
}
|
||||
if findAnyXMLCloseOutsideCDATA(captured, possibleWrapperCloseTags(dsml), invokeIdx) > invokeIdx {
|
||||
if closeTag, ok := findFirstToolMarkupTagByNameFrom(captured, invokeTag.Start+1, "tool_calls", true); ok && closeTag.Start > invokeTag.Start {
|
||||
return true
|
||||
}
|
||||
|
||||
startEnd := findXMLTagEnd(captured, invokeIdx+invokeOpenLen)
|
||||
startEnd := invokeTag.End
|
||||
if startEnd < 0 {
|
||||
return true
|
||||
}
|
||||
@@ -148,84 +128,16 @@ func shouldKeepBareInvokeCapture(captured string) bool {
|
||||
return true
|
||||
}
|
||||
|
||||
invokeCloseIdx := findAnyXMLCloseOutsideCDATA(captured, possibleInvokeCloseTags(dsml), startEnd+1)
|
||||
if invokeCloseIdx >= 0 {
|
||||
afterClose := captured[invokeCloseIdx:]
|
||||
for _, closeTag := range possibleInvokeCloseTags(dsml) {
|
||||
if strings.HasPrefix(strings.ToLower(afterClose), closeTag) {
|
||||
afterClose = afterClose[len(closeTag):]
|
||||
break
|
||||
}
|
||||
}
|
||||
return strings.TrimSpace(afterClose) == ""
|
||||
if invokeCloseTag, ok := findFirstToolMarkupTagByNameFrom(captured, startEnd+1, "invoke", true); ok {
|
||||
return strings.TrimSpace(captured[invokeCloseTag.End+1:]) == ""
|
||||
}
|
||||
|
||||
trimmedLower := strings.ToLower(trimmedBody)
|
||||
return strings.HasPrefix(trimmedLower, parameterOpen) ||
|
||||
return strings.HasPrefix(trimmedLower, "<parameter") ||
|
||||
strings.HasPrefix(trimmedLower, "{") ||
|
||||
strings.HasPrefix(trimmedLower, "[")
|
||||
}
|
||||
|
||||
func containsAnyToolCallWrapper(lower string) bool {
|
||||
return strings.Contains(lower, "<tool_calls") ||
|
||||
strings.Contains(lower, "<|dsml|tool_calls") ||
|
||||
strings.Contains(lower, "<|dsml tool_calls") ||
|
||||
strings.Contains(lower, "<dsml|tool_calls") ||
|
||||
strings.Contains(lower, "<dsml tool_calls") ||
|
||||
strings.Contains(lower, "<|tool_calls") ||
|
||||
strings.Contains(lower, "<|tool_calls")
|
||||
}
|
||||
|
||||
func possibleWrapperCloseTags(dsml bool) []string {
|
||||
if !dsml {
|
||||
return []string{"</tool_calls>"}
|
||||
}
|
||||
return []string{"</|dsml|tool_calls>", "</|dsml tool_calls>", "</dsml|tool_calls>", "</dsml tool_calls>", "</|tool_calls>", "</|tool_calls>"}
|
||||
}
|
||||
|
||||
func possibleInvokeCloseTags(dsml bool) []string {
|
||||
if !dsml {
|
||||
return []string{"</invoke>"}
|
||||
}
|
||||
return []string{"</|dsml|invoke>", "</|dsml invoke>", "</dsml|invoke>", "</dsml invoke>", "</|invoke>", "</|invoke>"}
|
||||
}
|
||||
|
||||
func findAnyXMLCloseOutsideCDATA(s string, closeTags []string, start int) int {
|
||||
best := -1
|
||||
for _, closeTag := range closeTags {
|
||||
idx := findXMLCloseOutsideCDATA(s, closeTag, start)
|
||||
if idx >= 0 && (best < 0 || idx < best) {
|
||||
best = idx
|
||||
}
|
||||
}
|
||||
return best
|
||||
}
|
||||
|
||||
func firstInvokeIndex(lower string) (int, bool) {
|
||||
xmlIdx := strings.Index(lower, "<invoke")
|
||||
// Check all DSML-like invoke prefixes.
|
||||
dsmlPrefixes := []string{"<|dsml|invoke", "<|dsml invoke", "<dsml|invoke", "<dsml invoke", "<|invoke", "<|invoke"}
|
||||
dsmlIdx := -1
|
||||
for _, prefix := range dsmlPrefixes {
|
||||
idx := strings.Index(lower, prefix)
|
||||
if idx >= 0 && (dsmlIdx < 0 || idx < dsmlIdx) {
|
||||
dsmlIdx = idx
|
||||
}
|
||||
}
|
||||
switch {
|
||||
case xmlIdx < 0:
|
||||
return dsmlIdx, dsmlIdx >= 0
|
||||
case dsmlIdx < 0:
|
||||
return xmlIdx, false
|
||||
case dsmlIdx < xmlIdx:
|
||||
return dsmlIdx, true
|
||||
default:
|
||||
return xmlIdx, false
|
||||
}
|
||||
}
|
||||
|
||||
// findPartialXMLToolTagStart checks if the string ends with a partial canonical
|
||||
// XML wrapper tag (e.g., "<too") and returns the position of the '<'.
|
||||
func findPartialXMLToolTagStart(s string) int {
|
||||
lastLT := strings.LastIndex(s, "<")
|
||||
if lastLT < 0 {
|
||||
@@ -237,13 +149,18 @@ func findPartialXMLToolTagStart(s string) int {
|
||||
return -1
|
||||
}
|
||||
lowerTail := strings.ToLower(tail)
|
||||
// Check if the tail is a prefix of any known XML tool tag.
|
||||
for _, tag := range xmlToolCallOpeningTags {
|
||||
tagWithLT := tag
|
||||
if !strings.HasPrefix(tagWithLT, "<") {
|
||||
tagWithLT = "<" + tagWithLT
|
||||
}
|
||||
if strings.HasPrefix(tagWithLT, lowerTail) {
|
||||
for _, tag := range []string{
|
||||
"<tool_calls", "<invoke", "<parameter",
|
||||
"<|tool_calls", "<|invoke", "<|parameter",
|
||||
"<|tool_calls", "<|invoke", "<|parameter",
|
||||
"<|dsml|tool_calls", "<|dsml|invoke", "<|dsml|parameter",
|
||||
"<dsmltool_calls", "<dsmlinvoke", "<dsmlparameter",
|
||||
"<dsml tool_calls", "<dsml invoke", "<dsml parameter",
|
||||
"<dsml|tool_calls", "<dsml|invoke", "<dsml|parameter",
|
||||
"<|dsmltool_calls", "<|dsmlinvoke", "<|dsmlparameter",
|
||||
"<|dsml tool_calls", "<|dsml invoke", "<|dsml parameter",
|
||||
} {
|
||||
if strings.HasPrefix(tag, lowerTail) {
|
||||
return lastLT
|
||||
}
|
||||
}
|
||||
|
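The prefix check in `findPartialXMLToolTagStart` can be tried in isolation. The sketch below assumes the elided part of the function rejects tails that already contain `>`; that assumption, the name `partialTagStart`, and the shortened tag list are illustrative rather than taken from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// openings is a shortened, illustrative subset of the opening-tag prefixes.
var openings = []string{"<tool_calls", "<invoke", "<parameter", "<dsmltool_calls"}

// partialTagStart returns the index of the last '<' when the text from that
// point to the end could still grow into a known opening tag, or -1.
func partialTagStart(s string) int {
	lastLT := strings.LastIndex(s, "<")
	if lastLT < 0 {
		return -1
	}
	tail := strings.ToLower(s[lastLT:])
	if strings.Contains(tail, ">") {
		// Assumption: a completed tag is handled elsewhere, not here.
		return -1
	}
	for _, tag := range openings {
		// The tail must be a prefix of the tag, not the other way round.
		if strings.HasPrefix(tag, tail) {
			return lastLT
		}
	}
	return -1
}

func main() {
	for _, in := range []string{"hello <too", "x <DSMLto", "a < b", "hello <b"} {
		fmt.Printf("%q -> %d\n", in, partialTagStart(in))
	}
}
```

A streaming caller would hold back everything from the returned index until the next chunk arrives, and emit the text before it immediately.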
||||
@@ -1,138 +1,28 @@
|
||||
package toolstream
|
||||
|
||||
import "strings"
|
||||
import "ds2api/internal/toolcall"
|
||||
|
||||
func findMatchingXMLToolWrapperClose(s, openTag, closeTag string, openIdx int) int {
|
||||
if s == "" || openTag == "" || closeTag == "" || openIdx < 0 {
|
||||
return -1
|
||||
}
|
||||
lower := strings.ToLower(s)
|
||||
openTarget := strings.ToLower(openTag)
|
||||
closeTarget := strings.ToLower(closeTag)
|
||||
depth := 1
|
||||
for i := openIdx + len(openTarget); i < len(s); {
|
||||
switch {
|
||||
case strings.HasPrefix(lower[i:], "<![cdata["):
|
||||
end := strings.Index(lower[i+len("<![cdata["):], "]]>")
|
||||
if end < 0 {
|
||||
return -1
|
||||
}
|
||||
i += len("<![cdata[") + end + len("]]>")
|
||||
case strings.HasPrefix(lower[i:], "<!--"):
|
||||
end := strings.Index(lower[i+len("<!--"):], "-->")
|
||||
if end < 0 {
|
||||
return -1
|
||||
}
|
||||
i += len("<!--") + end + len("-->")
|
||||
case strings.HasPrefix(lower[i:], closeTarget):
|
||||
depth--
|
||||
if depth == 0 {
|
||||
return i
|
||||
}
|
||||
i += len(closeTarget)
|
||||
case strings.HasPrefix(lower[i:], openTarget) && hasXMLToolTagBoundary(s, i+len(openTarget)):
|
||||
depth++
|
||||
i += len(openTarget)
|
||||
default:
|
||||
i++
|
||||
}
|
||||
}
|
||||
return -1
|
||||
func findFirstToolMarkupTagByName(s string, start int, name string) (toolcall.ToolMarkupTag, bool) {
|
||||
return findFirstToolMarkupTagByNameFrom(s, start, name, false)
|
||||
}
|
||||
|
||||
func findXMLOpenOutsideCDATA(s, openTag string, start int) int {
|
||||
if s == "" || openTag == "" {
|
||||
return -1
|
||||
}
|
||||
if start < 0 {
|
||||
start = 0
|
||||
}
|
||||
lower := strings.ToLower(s)
|
||||
target := strings.ToLower(openTag)
|
||||
for i := start; i < len(s); {
|
||||
switch {
|
||||
case strings.HasPrefix(lower[i:], "<![cdata["):
|
||||
end := strings.Index(lower[i+len("<![cdata["):], "]]>")
|
||||
if end < 0 {
|
||||
return -1
|
||||
}
|
||||
i += len("<![cdata[") + end + len("]]>")
|
||||
case strings.HasPrefix(lower[i:], "<!--"):
|
||||
end := strings.Index(lower[i+len("<!--"):], "-->")
|
||||
if end < 0 {
|
||||
return -1
|
||||
}
|
||||
i += len("<!--") + end + len("-->")
|
||||
case strings.HasPrefix(lower[i:], target) && hasXMLToolTagBoundary(s, i+len(target)):
|
||||
return i
|
||||
default:
|
||||
i++
|
||||
func findFirstToolMarkupTagByNameFrom(s string, start int, name string, closing bool) (toolcall.ToolMarkupTag, bool) {
|
||||
for pos := maxInt(start, 0); pos < len(s); {
|
||||
tag, ok := toolcall.FindToolMarkupTagOutsideIgnored(s, pos)
|
||||
if !ok {
|
||||
return toolcall.ToolMarkupTag{}, false
|
||||
}
|
||||
if tag.Name == name && tag.Closing == closing {
|
||||
return tag, true
|
||||
}
|
||||
pos = tag.End + 1
|
||||
}
|
||||
return -1
|
||||
return toolcall.ToolMarkupTag{}, false
|
||||
}
|
||||
|
||||
func findXMLCloseOutsideCDATA(s, closeTag string, start int) int {
|
||||
if s == "" || closeTag == "" {
|
||||
return -1
|
||||
func maxInt(a, b int) int {
|
||||
if a > b {
|
||||
return a
|
||||
}
|
||||
if start < 0 {
|
||||
start = 0
|
||||
}
|
||||
lower := strings.ToLower(s)
|
||||
target := strings.ToLower(closeTag)
|
||||
for i := start; i < len(s); {
|
||||
switch {
|
||||
case strings.HasPrefix(lower[i:], "<![cdata["):
|
||||
end := strings.Index(lower[i+len("<![cdata["):], "]]>")
|
||||
if end < 0 {
|
||||
return -1
|
||||
}
|
||||
i += len("<![cdata[") + end + len("]]>")
|
||||
case strings.HasPrefix(lower[i:], "<!--"):
|
||||
end := strings.Index(lower[i+len("<!--"):], "-->")
|
||||
if end < 0 {
|
||||
return -1
|
||||
}
|
||||
i += len("<!--") + end + len("-->")
|
||||
case strings.HasPrefix(lower[i:], target):
|
||||
return i
|
||||
default:
|
||||
i++
|
||||
}
|
||||
}
|
||||
return -1
|
||||
}
|
||||
|
||||
func hasXMLToolTagBoundary(text string, idx int) bool {
|
||||
if idx >= len(text) {
|
||||
return true
|
||||
}
|
||||
switch text[idx] {
|
||||
case ' ', '\t', '\n', '\r', '>', '/':
|
||||
return true
|
||||
default:
|
||||
return false
|
||||
}
|
||||
}
|
||||
|
||||
func findXMLTagEnd(s string, start int) int {
|
||||
quote := byte(0)
|
||||
for i := start; i < len(s); i++ {
|
||||
ch := s[i]
|
||||
if quote != 0 {
|
||||
if ch == quote {
|
||||
quote = 0
|
||||
}
|
||||
continue
|
||||
}
|
||||
if ch == '"' || ch == '\'' {
|
||||
quote = ch
|
||||
continue
|
||||
}
|
||||
if ch == '>' {
|
||||
return i
|
||||
}
|
||||
}
|
||||
return -1
|
||||
return b
|
||||
}
|
||||
|
||||
@@ -5,28 +5,7 @@ import "regexp"
|
||||
// --- XML tool call support for the streaming sieve ---
|
||||
|
||||
//nolint:unused // kept as explicit tag inventory for future XML sieve refinements.
|
||||
var xmlToolCallClosingTags = []string{"</tool_calls>", "</|dsml|tool_calls>", "</|dsml tool_calls>", "</dsml|tool_calls>", "</dsml tool_calls>", "</|tool_calls>", "</|tool_calls>"}
|
||||
var xmlToolCallOpeningTags = []string{
|
||||
"<tool_calls", "<invoke",
|
||||
"<|dsml|tool_calls", "<|dsml|invoke",
|
||||
"<|dsml tool_calls", "<|dsml invoke",
|
||||
"<dsml|tool_calls", "<dsml|invoke",
|
||||
"<dsml tool_calls", "<dsml invoke",
|
||||
"<|tool_calls", "<|invoke",
|
||||
"<|tool_calls", "<|invoke",
|
||||
}
|
||||
|
||||
// xmlToolCallTagPairs maps each opening tag to its expected closing tag.
|
||||
// Order matters: longer/wrapper tags must be checked first.
|
||||
var xmlToolCallTagPairs = []struct{ open, close string }{
|
||||
{"<|dsml|tool_calls", "</|dsml|tool_calls>"},
|
||||
{"<|dsml tool_calls", "</|dsml tool_calls>"},
|
||||
{"<dsml|tool_calls", "</dsml|tool_calls>"},
|
||||
{"<dsml tool_calls", "</dsml tool_calls>"},
|
||||
{"<|tool_calls", "</|tool_calls>"},
|
||||
{"<|tool_calls", "</|tool_calls>"},
|
||||
{"<tool_calls", "</tool_calls>"},
|
||||
}
|
||||
var xmlToolCallClosingTags = []string{"</tool_calls>", "</|dsml|tool_calls>", "</|dsmltool_calls>", "</|dsml tool_calls>", "</dsml|tool_calls>", "</dsmltool_calls>", "</dsml tool_calls>", "</|tool_calls>", "</|tool_calls>"}
|
||||
|
||||
// xmlToolCallBlockPattern matches a complete canonical XML tool call block.
|
||||
//
|
||||
@@ -37,10 +16,14 @@ var xmlToolCallBlockPattern = regexp.MustCompile(`(?is)((?:<tool_calls\b|<\|dsml
|
||||
var xmlToolTagsToDetect = []string{
|
||||
"<|dsml|tool_calls>", "<|dsml|tool_calls\n", "<|dsml|tool_calls ",
|
||||
"<|dsml|invoke ", "<|dsml|invoke\n", "<|dsml|invoke\t", "<|dsml|invoke\r",
|
||||
"<|dsmltool_calls>", "<|dsmltool_calls\n", "<|dsmltool_calls ",
|
||||
"<|dsmlinvoke ", "<|dsmlinvoke\n", "<|dsmlinvoke\t", "<|dsmlinvoke\r",
|
||||
"<|dsml tool_calls>", "<|dsml tool_calls\n", "<|dsml tool_calls ",
|
||||
"<|dsml invoke ", "<|dsml invoke\n", "<|dsml invoke\t", "<|dsml invoke\r",
|
||||
"<dsml|tool_calls>", "<dsml|tool_calls\n", "<dsml|tool_calls ",
|
||||
"<dsml|invoke ", "<dsml|invoke\n", "<dsml|invoke\t", "<dsml|invoke\r",
|
||||
"<dsmltool_calls>", "<dsmltool_calls\n", "<dsmltool_calls ",
|
||||
"<dsmlinvoke ", "<dsmlinvoke\n", "<dsmlinvoke\t", "<dsmlinvoke\r",
|
||||
"<dsml tool_calls>", "<dsml tool_calls\n", "<dsml tool_calls ",
|
||||
"<dsml invoke ", "<dsml invoke\n", "<dsml invoke\t", "<dsml invoke\r",
|
||||
"<|tool_calls>", "<|tool_calls\n", "<|tool_calls ",
|
||||
|
||||
@@ -174,6 +174,41 @@ func TestProcessToolSieveKeepsCDATAEmbeddedToolClosingBuffered(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestProcessToolSieveFallsBackWhenCDATANeverCloses(t *testing.T) {
|
||||
var state State
|
||||
chunks := []string{
|
||||
"<tool_calls>\n <invoke name=\"Write\">\n <parameter name=\"content\"><![CDATA[",
|
||||
"hello world",
|
||||
"</parameter>\n </invoke>\n</tool_calls>",
|
||||
}
|
||||
var events []Event
|
||||
for _, c := range chunks {
|
||||
events = append(events, ProcessChunk(&state, c, []string{"Write"})...)
|
||||
}
|
||||
events = append(events, Flush(&state, []string{"Write"})...)
|
||||
|
||||
var textContent strings.Builder
|
||||
toolCalls := 0
|
||||
for _, evt := range events {
|
||||
if evt.Content != "" {
|
||||
textContent.WriteString(evt.Content)
|
||||
}
|
||||
toolCalls += len(evt.ToolCalls)
|
||||
if len(evt.ToolCalls) > 0 {
|
||||
if got, _ := evt.ToolCalls[0].Input["content"].(string); got != "hello world" {
|
||||
t.Fatalf("expected recovered CDATA payload, got %q", got)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if toolCalls != 1 {
|
||||
t.Fatalf("expected unclosed CDATA payload to still parse, got %d tool calls events=%#v", toolCalls, events)
|
||||
}
|
||||
if textContent.Len() != 0 {
|
||||
t.Fatalf("expected no leaked text, got %q", textContent.String())
|
||||
}
|
||||
}
|
||||
|
||||
func TestProcessToolSieveXMLWithLeadingText(t *testing.T) {
|
||||
var state State
|
||||
// Model outputs some prose then an XML tool call.
|
||||
|
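`TestProcessToolSieveFallsBackWhenCDATANeverCloses` in the hunk above relies on a repair pass for a `<![CDATA[` opener that never gets its `]]>`. The diff does not show how `toolcall.SanitizeLooseCDATA` does this, so the following is only a guess at the general idea: `closeLooseCDATA` is a hypothetical function that handles the canonical `</parameter>` close tag and nothing else.

```go
package main

import (
	"fmt"
	"strings"
)

// closeLooseCDATA inserts a missing "]]>" just before the next
// "</parameter>" when a "<![CDATA[" opener has no terminator after it.
// Limitation of this sketch: an unclosed section followed later by a
// well-formed one is treated as a single section and left untouched.
func closeLooseCDATA(s string) string {
	const open, end, paramClose = "<![CDATA[", "]]>", "</parameter>"
	var b strings.Builder
	for {
		i := strings.Index(s, open)
		if i < 0 {
			b.WriteString(s)
			return b.String()
		}
		body := s[i+len(open):]
		if j := strings.Index(body, end); j >= 0 {
			// Terminated section: copy it through unchanged.
			cut := i + len(open) + j + len(end)
			b.WriteString(s[:cut])
			s = s[cut:]
			continue
		}
		k := strings.Index(body, paramClose)
		if k < 0 {
			// No close tag to anchor the repair on; leave the rest as-is.
			b.WriteString(s)
			return b.String()
		}
		cut := i + len(open) + k
		b.WriteString(s[:cut])
		b.WriteString(end)
		s = s[cut:]
	}
}

func main() {
	in := `<parameter name="content"><![CDATA[hello world</parameter>`
	fmt.Println(closeLooseCDATA(in))
}
```

A function of this shape returns its input unchanged when there is nothing to repair, which is what lets a caller detect a repair by comparing the result with the original string.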
||||
@@ -71,6 +71,30 @@ test('parseToolCalls ignores DSML space lookalike tag names', () => {
|
||||
assert.equal(calls.length, 0);
|
||||
});
|
||||
|
||||
test('parseToolCalls tolerates collapsed DSML tag names', () => {
|
||||
const todos = [
|
||||
'[x] 检查 toolcalls_format.go 格式化逻辑',
|
||||
'[x] 检查 toolcalls_parse.go 解析逻辑',
|
||||
'[x] 检查 toolcalls_xml.go 和 toolcalls_dsml.go',
|
||||
'[x] 检查 toolcalls_markup.go 和 toolcalls_json_repair.go',
|
||||
'[x] 检查 prompt/tool_calls.go 注入逻辑',
|
||||
'[x] 检查 toolstream 流式解析',
|
||||
'[x] 查看测试文件确认预期行为',
|
||||
'[x] 给出调查结论',
|
||||
].join('\n');
|
||||
const payload = `<DSMLtool_calls><DSMLinvoke name="update_todo_list"><DSMLparameter name="todos"><![CDATA[${todos}]]></DSMLparameter></DSMLinvoke></DSMLtool_calls>`;
|
||||
const calls = parseToolCalls(payload, ['update_todo_list']);
|
||||
assert.equal(calls.length, 1);
|
||||
assert.equal(calls[0].name, 'update_todo_list');
|
||||
assert.equal(calls[0].input.todos, todos);
|
||||
});
|
||||
|
||||
test('parseToolCalls ignores collapsed DSML lookalike tag names', () => {
|
||||
const payload = '<DSMLtool_calls_extra><DSMLinvoke name="update_todo_list"><DSMLparameter name="todos">x</DSMLparameter></DSMLinvoke></DSMLtool_calls_extra>';
|
||||
const calls = parseToolCalls(payload, ['update_todo_list']);
|
||||
assert.equal(calls.length, 0);
|
||||
});
|
||||
|
||||
test('parseToolCalls keeps canonical XML examples inside DSML CDATA', () => {
|
||||
const content = '<tool_calls><invoke name="demo"><parameter name="value">x</parameter></invoke></tool_calls>';
|
||||
const payload = `<|DSML|tool_calls><|DSML|invoke name="write_file"><|DSML|parameter name="path">notes.md</|DSML|parameter><|DSML|parameter name="content"><![CDATA[${content}]]></|DSML|parameter></|DSML|invoke></|DSML|tool_calls>`;
|
||||
@@ -80,6 +104,24 @@ test('parseToolCalls keeps canonical XML examples inside DSML CDATA', () => {
|
||||
assert.deepEqual(calls[0].input, { path: 'notes.md', content });
|
||||
});
|
||||
|
||||
test('parseToolCalls recovers when CDATA never closes inside a valid wrapper', () => {
|
||||
const payload = '<tool_calls><invoke name="Write"><parameter name="content"><![CDATA[hello world</parameter></invoke></tool_calls>';
|
||||
const calls = parseToolCalls(payload, ['Write']);
|
||||
assert.equal(calls.length, 1);
|
||||
assert.equal(calls[0].name, 'Write');
|
||||
assert.equal(calls[0].input.content, 'hello world');
|
||||
});
|
||||
|
||||
test('parseToolCalls supports JSON scalar parameters', () => {
|
||||
const payload = '<tool_calls><invoke name="configure"><parameter name="count">123</parameter><parameter name="max_tokens"><![CDATA[256]]></parameter><parameter name="enabled">true</parameter></invoke></tool_calls>';
|
||||
const calls = parseToolCalls(payload, ['configure']);
|
||||
assert.equal(calls.length, 1);
|
||||
assert.equal(calls[0].name, 'configure');
|
||||
assert.equal(calls[0].input.count, 123);
|
||||
assert.equal(calls[0].input.max_tokens, 256);
|
||||
assert.equal(calls[0].input.enabled, true);
|
||||
});
|
||||
|
||||
test('parseToolCalls normalizes mixed DSML and XML tool tags', () => {
|
||||
// Models commonly mix DSML wrapper tags with canonical inner tags.
|
||||
const payload = '<|DSML|tool_calls><invoke name="read_file"><|DSML|parameter name="path">README.MD</|DSML|parameter></invoke></|DSML|tool_calls>';
|
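The JSON scalar parameter test earlier in this hunk expects `123`, `256` and `true` to arrive as typed values rather than strings. On the Go side, one way to get that behaviour is to attempt a JSON decode and keep the raw text on failure. The sketch below assumes that approach; `paramValue` is a hypothetical name, and leaving quoted JSON strings untouched is a choice made for this example, not something the diff confirms.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// paramValue returns a structured value when the parameter body is a JSON
// number, boolean, null, array or object, and the original text otherwise.
func paramValue(raw string) any {
	trimmed := strings.TrimSpace(raw)
	if trimmed == "" {
		return raw
	}
	var v any
	if err := json.Unmarshal([]byte(trimmed), &v); err != nil {
		return raw // not valid JSON: keep the text exactly as written
	}
	if _, isString := v.(string); isString {
		return raw // assumption: quoted strings are not unwrapped
	}
	return v
}

func main() {
	for _, in := range []string{"123", "true", "null", "[1,2]", "hello world"} {
		fmt.Printf("%q -> %#v\n", in, paramValue(in))
	}
}
```

Note that `encoding/json` decodes every JSON number into `float64` when the target is an untyped `any`, so a caller that needs integers has to convert explicitly or decode with `json.Decoder.UseNumber`.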
||||
@@ -147,6 +189,41 @@ test('sieve keeps DSML space lookalike tag names as text', () => {
|
||||
assert.equal(collectText(events), input);
|
||||
});
|
||||
|
||||
test('sieve emits tool_calls for collapsed DSML tag names and preserves prefix text', () => {
|
||||
const todos = [
|
||||
'[x] 检查 toolcalls_format.go 格式化逻辑',
|
||||
'[x] 检查 toolcalls_parse.go 解析逻辑',
|
||||
'[x] 检查 toolcalls_xml.go 和 toolcalls_dsml.go',
|
||||
'[x] 检查 toolcalls_markup.go 和 toolcalls_json_repair.go',
|
||||
'[x] 检查 prompt/tool_calls.go 注入逻辑',
|
||||
'[x] 检查 toolstream 流式解析',
|
||||
'[x] 查看测试文件确认预期行为',
|
||||
'[x] 给出调查结论',
|
||||
].join('\n');
|
||||
const events = runSieve([
|
||||
'[]\n',
|
||||
'<DSMLtool_calls>\n',
|
||||
'<DSMLinvoke name="update_todo_list">\n',
|
||||
`<DSMLparameter name="todos"><![CDATA[${todos}]]></DSMLparameter>\n`,
|
||||
'</DSMLinvoke>\n',
|
||||
'</DSMLtool_calls>',
|
||||
], ['update_todo_list']);
|
||||
const text = collectText(events);
|
||||
const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
|
||||
assert.equal(finalCalls.length, 1);
|
||||
assert.equal(finalCalls[0].name, 'update_todo_list');
|
||||
assert.equal(finalCalls[0].input.todos, todos);
|
||||
assert.equal(text, '[]\n');
|
||||
});
|
||||
|
||||
test('sieve keeps collapsed DSML lookalike tag names as text', () => {
|
||||
const input = '<DSMLtool_calls_extra><DSMLinvoke name="update_todo_list"><DSMLparameter name="todos">x</DSMLparameter></DSMLinvoke></DSMLtool_calls_extra>';
|
||||
const events = runSieve([input], ['update_todo_list']);
|
||||
const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
|
||||
assert.equal(finalCalls.length, 0);
|
||||
assert.equal(collectText(events), input);
|
||||
});
|
||||
|
||||
test('sieve preserves review body with alias mentions before real DSML tool calls', () => {
|
||||
const events = runSieve([
|
||||
"Done reviewing the diff. Here's my analysis before we commit:\n\n",
|
||||
@@ -277,6 +354,23 @@ test('sieve keeps long XML tool calls buffered until the closing tag arrives', (
|
||||
assert.equal(finalCalls[0].input.content, longContent);
|
||||
});
|
||||
|
||||
test('sieve recovers when CDATA never closes inside a valid wrapper', () => {
|
||||
const events = runSieve(
|
||||
[
|
||||
'<tool_calls>\n <invoke name="Write">\n <parameter name="content"><![CDATA[',
|
||||
'hello world',
|
||||
'</parameter>\n </invoke>\n</tool_calls>',
|
||||
],
|
||||
['Write'],
|
||||
);
|
||||
const leakedText = collectText(events);
|
||||
const finalCalls = events.filter((evt) => evt.type === 'tool_calls').flatMap((evt) => evt.calls || []);
|
||||
assert.equal(finalCalls.length, 1);
|
||||
assert.equal(finalCalls[0].name, 'Write');
|
||||
assert.equal(finalCalls[0].input.content, 'hello world');
|
||||
assert.equal(leakedText, '');
|
||||
});
|
||||
|
||||
test('sieve keeps CDATA tool examples buffered until the outer closing tag arrives', () => {
|
||||
const content = [
|
||||
'# DS2API 4.0 更新内容',
|
||||
|
||||