PR #460 introduced fullwidth pipe characters (｜, U+FF5C) in DSML tool call formatting
to improve parsing robustness, but models exposed to these fullwidth pipes in
system prompts exhibit significantly higher rates of tool output hallucinations.
Reverting to halfwidth pipes (|) drastically reduces tokenizer/perplexity-driven
hallucinations while retaining the existing confusable-hardening in the parser.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "delete current conversation" feature did not work on Vercel
deployments because the stream flow uses a separate lease mechanism:
the session_id created during the prepare phase was not preserved, so
nothing could be deleted when the stream ended.
Changes:
- Add SessionID field to streamLease struct to preserve session_id
- Pass session_id to holdStreamLease during prepare
- Modify releaseStreamLease to return auth and session_id
- Call autoDeleteRemoteSession in handleVercelStreamRelease when
  releasing a lease with auto-delete mode enabled
Closes #vercel-auto-delete
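
The lease changes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: only the SessionID field and the function names holdStreamLease and releaseStreamLease come from this message. The leaseStore type, the string-typed auth value, the signatures, and the injected deleteSession callback (standing in for autoDeleteRemoteSession) are assumptions made to keep the example runnable.

```go
package main

import (
	"fmt"
	"sync"
)

// streamLease is a hypothetical stand-in for the real struct; only the
// SessionID field is taken from the commit message.
type streamLease struct {
	Auth      string // placeholder for whatever auth context the lease carries
	SessionID string // session created during prepare, kept for later deletion
}

// leaseStore is an assumed in-memory container for active leases.
type leaseStore struct {
	mu     sync.Mutex
	leases map[string]streamLease
}

func newLeaseStore() *leaseStore {
	return &leaseStore{leases: make(map[string]streamLease)}
}

// holdStreamLease records the lease, including the session ID, during prepare.
func (s *leaseStore) holdStreamLease(leaseID, auth, sessionID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.leases[leaseID] = streamLease{Auth: auth, SessionID: sessionID}
}

// releaseStreamLease removes the lease and returns what the caller needs to
// delete the remote session. The bool is false when the lease is unknown.
func (s *leaseStore) releaseStreamLease(leaseID string) (string, string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	l, found := s.leases[leaseID]
	if !found {
		return "", "", false
	}
	delete(s.leases, leaseID)
	return l.Auth, l.SessionID, true
}

// handleRelease mirrors the described flow: release the lease, then delete
// the session only when auto-delete is enabled and a session ID was kept.
func handleRelease(s *leaseStore, leaseID string, autoDelete bool, deleteSession func(auth, sessionID string)) bool {
	auth, sessionID, found := s.releaseStreamLease(leaseID)
	if !found {
		return false
	}
	if autoDelete && sessionID != "" {
		deleteSession(auth, sessionID)
	}
	return true
}

func main() {
	store := newLeaseStore()
	store.holdStreamLease("lease-1", "token-a", "sess-42")
	handleRelease(store, "lease-1", true, func(auth, sessionID string) {
		fmt.Println("deleting session", sessionID)
	})
}
```

Returning the session ID from the release call, rather than looking it up separately, keeps the removal of the lease and the hand-off of its data under one lock.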
Detect camelCase→PascalCase boundaries between arbitrary prefixes and fixed
local names (tool_calls/invoke/parameter), so that fused forms like
<DSmartToolCalls> are recognized without explicit separator characters.
Also add the underscore-free alias "toolcalls" as a valid DSML local name.
Includes lookalike rejection tests to ensure near-matches like
<DSmartToolCallsExtra> are not falsely accepted.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
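
A rough sketch of the boundary rule described in this commit, for illustration only. The helper name splitFusedName, its signature, and the exact rule used here (prefix must end in a lowercase ASCII letter, followed by the PascalCase spelling of a local name at the very end of the tag name) are assumptions; the real matcher may accept more forms.

```go
package main

import (
	"fmt"
	"strings"
)

// fusedLocalNames pairs the PascalCase spelling that can appear fused onto a
// prefix with the canonical DSML local name it stands for.
var fusedLocalNames = []struct{ pascal, local string }{
	{"ToolCalls", "tool_calls"},
	{"Invoke", "invoke"},
	{"Parameter", "parameter"},
}

func isASCIILower(b byte) bool { return b >= 'a' && b <= 'z' }

// splitFusedName reports the prefix and canonical local name when name has
// the shape <prefix><PascalLocal> and the prefix ends in a lowercase letter,
// i.e. there is a lowercase-to-uppercase boundary right before the local name.
func splitFusedName(name string) (string, string, bool) {
	for _, c := range fusedLocalNames {
		if !strings.HasSuffix(name, c.pascal) {
			continue // local name must be the very end, so "...Extra" is rejected
		}
		prefix := name[:len(name)-len(c.pascal)]
		if prefix == "" || !isASCIILower(prefix[len(prefix)-1]) {
			continue // no camelCase boundary before the local name
		}
		return prefix, c.local, true
	}
	return "", "", false
}

func main() {
	for _, name := range []string{"DSmartToolCalls", "DSmartInvoke", "DSmartToolCallsExtra"} {
		prefix, local, ok := splitFusedName(name)
		fmt.Printf("%-22s ok=%-5v prefix=%q local=%q\n", name, ok, prefix, local)
	}
}
```

Anchoring the local name at the end of the tag name is what rejects the lookalike: DSmartToolCallsExtra contains ToolCalls but does not end with it.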
Replace the hardcoded isToolMarkupPipe (matching |, ｜, ␂, \x02, !) and
isToolCDATAOpenSeparator (exclusion-based) with a single isToolMarkupSeparator
that treats any Unicode punctuation other than the structural characters as a
valid DSML separator. This removes the need for a per-character allowlist: novel
separators like ※ are supported automatically, without code changes. Also
removes the unused cdataPattern regexp and updates the docs to use the
"non-structural separator" terminology.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
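
One way a category-based separator check like the one described above might look. This is a sketch under stated assumptions, not the shipped function: the set of structural characters is a guess, and the sketch accepts Unicode symbols as well as punctuation because both | (U+007C) and ｜ (U+FF5C) are in the math-symbol category (Sm) rather than a punctuation category.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// structuralChars is an assumed set of characters that carry meaning in the
// tag syntax itself (delimiters, quotes, brackets, and the underscore used in
// names such as tool_calls) and so must never count as separators.
const structuralChars = "<>/=[]\"'_"

// isToolMarkupSeparator reports whether r may act as a DSML separator:
// any punctuation or symbol rune that is not structural.
func isToolMarkupSeparator(r rune) bool {
	if strings.ContainsRune(structuralChars, r) {
		return false
	}
	return unicode.IsPunct(r) || unicode.IsSymbol(r)
}

func main() {
	// \uFF5C is the fullwidth pipe, \u203B is the reference mark (※).
	for _, r := range []rune{'|', '\uFF5C', '\u203B', '!', '<', '/', '_', 'a'} {
		fmt.Printf("%U separator=%v\n", r, isToolMarkupSeparator(r))
	}
}
```

Because the check is driven by Unicode categories, a new separator needs no code change as long as it is punctuation or a symbol and is not in the structural set.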
The closing tag format was <｜/DSML｜tag> but must be </｜DSML｜tag>.
The scanner's closing-tag detection checks text[1] == '/', so the
slash must come immediately after '<', before the first full-width
pipe (U+FF5C). Tags like <｜/DSML｜tool_calls> would not set
closing=true and would not match any tool markup name.
Files fixed:
- internal/toolcall/tool_prompt.go: all closing tags
- internal/promptcompat/prompt_build_test.go: 1 test expectation
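
The byte check described above can be reproduced in isolation. The helper name isClosingTag is hypothetical and only mirrors the text[1] == '/' test mentioned in this message; the real scanner does more than this.

```go
package main

import "fmt"

// isClosingTag mirrors the described check: a tag is treated as closing only
// when the byte immediately after '<' is '/'.
func isClosingTag(text string) bool {
	return len(text) > 1 && text[0] == '<' && text[1] == '/'
}

func main() {
	// \uFF5C is the full-width pipe. In the old format its first UTF-8 byte
	// sits at text[1], so the slash is never seen by the check.
	fmt.Println(isClosingTag("</\uFF5CDSML\uFF5Ctool_calls>")) // true
	fmt.Println(isClosingTag("<\uFF5C/DSML\uFF5Ctool_calls>")) // false
}
```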
Replace all strings.ToLower usage with ASCII case-insensitive matching
(hasASCIIPrefixFoldAt, indexASCIIFold, hasDSMLPrefix) to prevent slice
bounds errors when Unicode characters change byte length after case
mapping (e.g., under Go's per-rune mapping, Turkish İ U+0130 → i shrinks
from 2 bytes to 1, and Ⱥ U+023A → ⱥ U+2C65 grows from 2 bytes to 3).
Root cause: the code created a strings.ToLower(text) copy, found byte
positions in that copy, then used those positions to slice the
original text. Offsets that were valid in the lowercased copy could
fall out of bounds in the original, or land mid-rune, whenever case
mapping changed byte lengths.
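
A minimal sketch of the ASCII-only folding approach, to show why it avoids the offset mismatch. The helper names are taken from this message, but the signatures and bodies below are assumptions. Because only the bytes A-Z are folded and the search runs over the original string, every returned offset is valid in the original text.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiLower lowercases A-Z only; every other byte is returned unchanged.
func asciiLower(b byte) byte {
	if b >= 'A' && b <= 'Z' {
		return b + ('a' - 'A')
	}
	return b
}

// hasASCIIPrefixFoldAt reports whether text[i:] starts with prefix, ignoring
// ASCII case. prefix is expected to be ASCII.
func hasASCIIPrefixFoldAt(text string, i int, prefix string) bool {
	if i < 0 || len(text)-i < len(prefix) {
		return false
	}
	for j := 0; j < len(prefix); j++ {
		if asciiLower(text[i+j]) != asciiLower(prefix[j]) {
			return false
		}
	}
	return true
}

// indexASCIIFold returns the byte offset in text of the first ASCII
// case-insensitive match of needle at or after from, or -1.
func indexASCIIFold(text string, from int, needle string) int {
	if from < 0 {
		from = 0
	}
	for i := from; i+len(needle) <= len(text); i++ {
		if hasASCIIPrefixFoldAt(text, i, needle) {
			return i
		}
	}
	return -1
}

func main() {
	// \u023A lowercases to \u2C65, which is one byte longer in UTF-8.
	text := "\u023A<INVOKE name=\"x\">"
	fmt.Println(len(strings.ToLower(text)) == len(text)) // false

	// The offset comes from the original string, so slicing it is safe.
	i := indexASCIIFold(text, 0, "<invoke")
	fmt.Println(text[i : i+len("<invoke")]) // <INVOKE
}
```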
Files changed:
- toolcalls_scan.go: remove 5 lower usages, add hasDSMLPrefix
- toolcalls_parse_markup.go: remove 3 lower usages, add indexASCIIFold
- toolcalls_markup.go: SanitizeLooseCDATA lower removal
- toolcalls_parse.go: updateCDATAStateForStrip lower removal
- tool_prompt.go: align DSML pipe characters with tool call spec
- tool_prompt_test.go: fix pre-existing test character mismatch
The lower parameter of skipXMLIgnoredSection was a footgun: callers had to
keep it in sync with the loop bound over text. skipXMLIgnoredSection now
accepts only text and constructs strings.ToLower(tail) internally for its
prefix checks. This eliminates the entire class of len(text) vs len(lower)
boundary bugs, along with the min() workaround.
Also changes:
- findToolCDATAEnd: drop lower param, use text directly for closeMarker
  search (]]> is ASCII, ToLower is a no-op for it)
- cdataEndLooksStructural: drop lower param, use raw text byte comparison
- All external callers: loop bound reverts to plain len(text)
The inner tag-matching functions (findXMLStartTagOutsideCDATA,
findMatchingXMLEndTagOutsideCDATA) retain their own local lower for
HasPrefix comparisons against the target tag name, keeping concerns
properly separated.
Fixes #435.
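
The text-only shape of skipXMLIgnoredSection described above might look roughly like this. The signature, the return values, and the handling of unterminated sections are assumptions for illustration; only the idea of lowercasing the tail internally, for prefix checks alone, comes from this message.

```go
package main

import (
	"fmt"
	"strings"
)

// ignoredSections lists the assumed openers (already lowercase) and closers.
var ignoredSections = []struct{ open, close string }{
	{"<!--", "-->"},
	{"<![cdata[", "]]>"},
}

// skipXMLIgnoredSection reports where scanning should resume when text[i:]
// opens a comment or CDATA section. It returns (i, false) otherwise.
func skipXMLIgnoredSection(text string, i int) (int, bool) {
	if i < 0 || i >= len(text) {
		return i, false
	}
	tail := text[i:]
	lower := strings.ToLower(tail) // prefix checks only; offsets come from tail
	for _, s := range ignoredSections {
		if !strings.HasPrefix(lower, s.open) {
			continue
		}
		rest := tail[len(s.open):]
		if end := strings.Index(rest, s.close); end >= 0 {
			return i + len(s.open) + end + len(s.close), true
		}
		return len(text), true // unterminated: treat the remainder as ignored
	}
	return i, false
}

func main() {
	text := "a<![CDATA[<invoke>]]>b"
	// The caller's loop bound is plain len(text); no lowercase copy is shared.
	for i := 0; i < len(text); {
		if next, ok := skipXMLIgnoredSection(text, i); ok {
			i = next
			continue
		}
		fmt.Printf("%c", text[i])
		i++
	}
	fmt.Println() // prints "ab"
}
```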