PR #460 introduced fullwidth pipe characters (|) in DSML tool call formatting
to improve parsing robustness, but models exposed to these fullwidth pipes in
system prompts exhibit significantly higher rates of tool output hallucinations.
Reverting to halfwidth pipes (|) drastically reduces tokenizer/perplexity-driven
hallucinations while retaining the existing confusable-hardening in the parser.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "delete current conversation" feature was not working on Vercel
deployment because the stream flow uses a separate lease mechanism.
The session_id created during prepare phase was not preserved for
deletion when the stream ends.
Changes:
- Add SessionID field to streamLease struct to preserve session_id
- Pass session_id to holdStreamLease during prepare
- Modify releaseStreamLease to return auth and session_id
- Call autoDeleteRemoteSession in handleVercelStreamRelease when
releasing a lease with auto-delete mode enabled
Closes #vercel-auto-delete
Replace all strings.ToLower usage with ASCII case-insensitive matching
(hasASCIIPrefixFoldAt, indexASCIIFold, hasDSMLPrefix) to prevent slice
bounds errors when Unicode characters change byte length after case
folding (e.g., Turkish İ U+0130 → i + combining dot: 2 bytes → 3 bytes).
Root cause: code created a strings.ToLower(text) copy, found byte
positions in that copy, then used those positions to slice the
original text — byte offsets that were valid in the lowercased copy
became out-of-bounds in the original when case folding changed byte
lengths.
Files changed:
- toolcalls_scan.go: remove 5 lower usages, add hasDSMLPrefix
- toolcalls_parse_markup.go: remove 3 lower usages, add indexASCIIFold
- toolcalls_markup.go: SanitizeLooseCDATA lower removal
- toolcalls_parse.go: updateCDATAStateForStrip lower removal
- tool_prompt.go: align DSML pipe characters with tool call spec
- tool_prompt_test.go: fix pre-existing test character mismatch
- Stream: strip both and [reference:N] markers to prevent
leaking partial link metadata during incremental output
- Non-stream: convert citation/reference markers to Markdown links for
Claude Messages, Gemini generateContent, and OpenAI Chat/Responses
- Remove StripReferenceMarkers option from call sites; behavior is now
determined automatically by stream vs non-stream context
- Extend JS runtime stripReferenceMarkersText() to also match [citation:N]
- Add tests for streaming marker stripping and non-stream link conversion
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- ValidateTurn no longer errors on thinking-only responses, deferring to
ShouldRetryEmptyOutput which now also covers thinking-only outputs.
- Empty output retry uses multi-turn follow-up with a regeneration prompt
suffix and parent_message_id in the same DeepSeek session.
- Centralize StripReferenceMarkersEnabled into textclean package to
eliminate duplicated hardcoded booleans across 4 protocol handlers.
- Log a deprecation warning when the legacy "compat" config key is used.
- Document thinking-only retry and reference marker stripping in API.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Increase StreamIdleTimeout from 90s to 300s and MaxKeepaliveCount from 10 to 40
to prevent premature stream termination with DeepSeek V4 Pro (~50K token contexts)
- Add r.Context().Err() check after ConsumeSSE in empty_retry_runtime (chat + responses)
to prevent historySession.error() from overwriting historySession.stopped()
when the request context is cancelled
References:
- MaxKeepaliveCount=10 creates a 50s no-content timeout that kills the stream
before DeepSeek V4 Pro can produce its first token with large contexts
- Hermes Agent reports 'No response from provider for 180s' because the
underlying SSE connection was already terminated by ds2api at 50s
- Context cancellation path: OnContextDone -> stopped(), then finalize()
with empty output -> retry -> error() overwrites stopped()
Track byte sizes of inline-uploaded files during PreprocessInlineFileInputs and convert them to conservative token estimates (bytes/3). RefFileTokens is threaded through StandardRequest into all OpenAI chat/responses usage builders so returned prompt_tokens/input_tokens reflect the full upstream context cost including attached files.