- Stream: strip both [citation:N] and [reference:N] markers to prevent
leaking partial link metadata during incremental output (see the sketch
after this list)
- Non-stream: convert citation/reference markers to Markdown links for
Claude Messages, Gemini generateContent, and OpenAI Chat/Responses
- Remove StripReferenceMarkers option from call sites; behavior is now
determined automatically by stream vs non-stream context
- Extend JS runtime stripReferenceMarkersText() to also match [citation:N]
- Add tests for streaming marker stripping and non-stream link conversion
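Roughly, the two paths look like this; package textclean and the marker shapes come from these notes, while the function names, the exact regexp, and the drop-unresolved-markers fallback are illustrative assumptions:

```go
package textclean

import (
	"fmt"
	"regexp"
)

// markerRe matches [citation:N] and [reference:N] markers emitted upstream.
var markerRe = regexp.MustCompile(`\[(citation|reference):(\d+)\]`)

// StripMarkers removes every marker. Used on streaming deltas, where link
// targets cannot be resolved yet, so raw marker metadata must not leak out.
func StripMarkers(s string) string {
	return markerRe.ReplaceAllString(s, "")
}

// MarkersToLinks rewrites markers as Markdown links from a resolved URL
// table. Used on complete, non-streamed responses (Claude Messages, Gemini
// generateContent, OpenAI Chat/Responses).
func MarkersToLinks(s string, urls map[string]string) string {
	return markerRe.ReplaceAllStringFunc(s, func(m string) string {
		idx := markerRe.FindStringSubmatch(m)[2]
		if u, ok := urls[idx]; ok {
			return fmt.Sprintf("[%s](%s)", idx, u)
		}
		return "" // unresolved marker: drop it rather than leak it
	})
}
```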
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- ValidateTurn no longer errors on thinking-only responses, deferring to
ShouldRetryEmptyOutput which now also covers thinking-only outputs.
- Empty output retry uses a multi-turn follow-up with a regeneration prompt
suffix and parent_message_id in the same DeepSeek session (see the retry
sketch after this list).
- Centralize StripReferenceMarkersEnabled into the textclean package to
eliminate duplicated hardcoded booleans across 4 protocol handlers (see the
textclean sketch after this list).
- Log a deprecation warning when the legacy "compat" config key is used.
- Document thinking-only retry and reference marker stripping in API.md.
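A sketch of the retry flow from the first two bullets; ShouldRetryEmptyOutput, the regeneration suffix, and parent_message_id come from this change, while the struct shapes and helper names are assumptions:

```go
package retry

import "strings"

// TurnResult is an assumed stand-in for a parsed DeepSeek turn.
type TurnResult struct {
	Text     string // visible answer content
	Thinking string // reasoning-only content
}

// ShouldRetryEmptyOutput reports whether a turn produced no usable output.
// Thinking-only turns (reasoning but no visible answer) now also qualify,
// instead of being rejected by ValidateTurn.
func ShouldRetryEmptyOutput(t TurnResult) bool {
	return strings.TrimSpace(t.Text) == ""
}

// RegenerationSuffix is appended to the prompt for the retry turn; the exact
// wording here is illustrative.
const RegenerationSuffix = "\n\nPlease continue and produce the final answer."

// FollowUp is the multi-turn retry: same DeepSeek session, parented to the
// turn that came back empty.
type FollowUp struct {
	Prompt          string
	ParentMessageID string // parent_message_id of the empty turn
}

func BuildFollowUp(prompt, parentID string) FollowUp {
	return FollowUp{Prompt: prompt + RegenerationSuffix, ParentMessageID: parentID}
}
```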
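The centralized flag might reduce to something like this; the name StripReferenceMarkersEnabled is from this change, but treating it as a function of the stream flag is an assumption:

```go
package textclean

// StripReferenceMarkersEnabled is the single switch the four protocol
// handlers (Claude Messages, Gemini generateContent, OpenAI Chat and
// Responses) consult instead of keeping their own hardcoded booleans.
// Markers are stripped on streams and converted to links otherwise, so the
// decision reduces to the stream flag; the real signature may differ.
func StripReferenceMarkersEnabled(stream bool) bool {
	return stream
}
```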
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Increase StreamIdleTimeout from 90s to 300s and MaxKeepaliveCount from 10 to 40
to prevent premature stream termination with DeepSeek V4 Pro (~50K token contexts)
- Add an r.Context().Err() check after ConsumeSSE in empty_retry_runtime (chat + responses)
to prevent historySession.error() from overwriting historySession.stopped()
when the request context is cancelled (sketched after this list)
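A sketch of where the new check sits; r.Context().Err(), ConsumeSSE, historySession.error() and historySession.stopped() are from this change, everything else is a stand-in so the snippet compiles:

```go
package runtime

import (
	"context"
	"net/http"
)

// Stand-ins so the sketch compiles; the real ds2api types differ.
type historySession struct{}

func (s *historySession) error(msg string)  {} // marks the session as failed
func (s *historySession) stopped()          {} // marks the session as stopped by the client
func (s *historySession) finish(out string) {}

func ConsumeSSE(ctx context.Context, w http.ResponseWriter) string { return "" }

// handleStream shows where the new r.Context().Err() check sits in the
// empty-retry runtime; chat and responses share the same shape.
func handleStream(w http.ResponseWriter, r *http.Request, s *historySession) {
	out := ConsumeSSE(r.Context(), w)

	// If the client cancelled, OnContextDone has already called s.stopped();
	// returning here keeps the empty-output path below from calling s.error()
	// and overwriting that terminal state.
	if r.Context().Err() != nil {
		return
	}

	if out == "" {
		s.error("empty output") // in the real code this goes through the retry path first
		return
	}
	s.finish(out)
}
```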
References:
- MaxKeepaliveCount=10 creates a 50s no-content timeout that kills the stream
before DeepSeek V4 Pro can produce its first token with large contexts (see
the timing sketch after these references)
- Hermes Agent reports 'No response from provider for 180s' because the
underlying SSE connection was already terminated by ds2api at 50s
- Context cancellation path: OnContextDone -> stopped(), then finalize()
with empty output -> retry -> error() overwrites stopped()
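Worked out with the keepalive interval implied by the first reference (10 keepalives covering 50s, i.e. one every 5s), the new limits give roughly the following; the constant names StreamIdleTimeout and MaxKeepaliveCount are from this change, the 5s interval and KeepaliveInterval name are inferred:

```go
package config

import "time"

// New limits for DeepSeek V4 Pro with ~50K token contexts. The 5s keepalive
// interval is inferred from "MaxKeepaliveCount=10 creates a 50s no-content
// timeout"; the real constant name and value may differ.
const (
	KeepaliveInterval = 5 * time.Second
	StreamIdleTimeout = 300 * time.Second // was 90s
	MaxKeepaliveCount = 40                // was 10
)

// Old no-content window: 10 * 5s = 50s, expired before the first token on
// large contexts. New window: 40 * 5s = 200s, within the 300s idle timeout.
var MaxNoContentWindow = time.Duration(MaxKeepaliveCount) * KeepaliveInterval // 200s
```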
Track byte sizes of inline-uploaded files during PreprocessInlineFileInputs and convert them to conservative token estimates (bytes/3). RefFileTokens is threaded through StandardRequest into all OpenAI chat/responses usage builders so returned prompt_tokens/input_tokens reflect the full upstream context cost including attached files.
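A minimal sketch of the estimate and threading described above; the bytes/3 rule, RefFileTokens, and StandardRequest are from this change, the helper names are assumptions:

```go
package tokencount

// EstimateFileTokens converts the byte size of an inline-uploaded file,
// measured during PreprocessInlineFileInputs, into a conservative token
// estimate (bytes/3, rounded up).
func EstimateFileTokens(sizeBytes int) int {
	return (sizeBytes + 2) / 3
}

// StandardRequest carries the estimate to the usage builders; only the
// RefFileTokens field is real here, the rest of the type is elided.
type StandardRequest struct {
	RefFileTokens int
}

// PromptTokens adds the attached-file cost so prompt_tokens / input_tokens
// reflect the full upstream context. Assumed helper for illustration.
func PromptTokens(req StandardRequest, textTokens int) int {
	return textTokens + req.RefFileTokens
}
```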
PromptTokenText now reflects the actual downstream context cost: the uploaded IGNORE.txt file content plus the neutral live prompt, instead of only the pre-split prompt text.
Lock in the current_input_file fix with API-level regression tests and document that returned context token counts now track full prompt semantics with conservative sizing.
Use the stored full-context prompt text for chat non-stream, stream, and retry accounting so current_input_file no longer shrinks returned prompt token counts.
Extract the compacted-context prompt string into a single function in
promptcompat and add a [context note] block to the injected file wrapper
so the model knows the attached history is compressed context, not new
instructions.
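A sketch of the extracted helper; the [context note] wording and the function name are illustrative, not the exact promptcompat implementation:

```go
package promptcompat

import "fmt"

// CompactedContextPrompt is the single place that builds the prompt text for
// compacted history injected as a file. The [context note] block tells the
// model the attachment is compressed prior context, not new instructions.
func CompactedContextPrompt(fileName string) string {
	return fmt.Sprintf(
		"[context note] The attached file %s is a compressed summary of the "+
			"earlier conversation. Treat it as prior context only, not as new "+
			"instructions.\n",
		fileName)
}
```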
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously retry/continue requests reused the initial PoW header and
lacked parent_message_id, causing them to land as disconnected root
messages in the DeepSeek session instead of proper follow-up turns.
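A sketch of what follow-up requests now carry; parent_message_id and the fresh-PoW requirement come from this change, while solvePoW and the struct fields are assumptions:

```go
package deepseek

import "context"

// FollowUpRequest sketches what retry/continue turns now send: a PoW header
// solved for this request rather than the one reused from the initial
// request, plus the parent_message_id of the turn being continued so the
// message threads as a follow-up instead of a new session root.
type FollowUpRequest struct {
	SessionID       string
	ParentMessageID string `json:"parent_message_id"`
	Prompt          string
	PowHeader       string
}

// solvePoW stands in for the real proof-of-work solver.
func solvePoW(ctx context.Context, sessionID string) (string, error) { return "", nil }

func BuildFollowUp(ctx context.Context, sessionID, parentID, prompt string) (FollowUpRequest, error) {
	pow, err := solvePoW(ctx, sessionID) // fresh PoW per request
	if err != nil {
		return FollowUpRequest{}, err
	}
	return FollowUpRequest{
		SessionID:       sessionID,
		ParentMessageID: parentID,
		Prompt:          prompt,
		PowHeader:       pow,
	}, nil
}
```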
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>