- Increase StreamIdleTimeout from 90s to 300s and MaxKeepaliveCount from 10 to 40
to prevent premature stream termination with DeepSeek V4 Pro (~50K token contexts)
- Add r.Context().Err() check after ConsumeSSE in empty_retry_runtime (chat + responses)
to prevent historySession.error() from overwriting historySession.stopped()
when the request context is cancelled
References:
- MaxKeepaliveCount=10 creates a 50s no-content timeout that kills the stream
before DeepSeek V4 Pro can produce its first token with large contexts
- Hermes Agent reports 'No response from provider for 180s' because the
underlying SSE connection was already terminated by ds2api at 50s
- Context cancellation path: OnContextDone -> stopped(), then finalize()
with empty output -> retry -> error() overwrites stopped()
Track byte sizes of inline-uploaded files during PreprocessInlineFileInputs and convert them to conservative token estimates (bytes/3). RefFileTokens is threaded through StandardRequest into all OpenAI chat/responses usage builders so returned prompt_tokens/input_tokens reflect the full upstream context cost including attached files.
PromptTokenText now reflects the actual downstream context cost: the uploaded IGNORE.txt file content plus the neutral live prompt, instead of only the pre-split prompt text.
Lock in the current_input_file regression with API-level tests and document that returned context token counts now track full prompt semantics with conservative sizing.
Unify Claude count_tokens, legacy stream accounting, and legacy render usage with preserved prompt text so Claude stops falling back to lossy message formatting.
Use the stored full-context prompt text for chat non-stream, stream, and retry accounting so current_input_file no longer shrinks returned prompt token counts.
Extract the compacted-context prompt string into a single function in
promptcompat and add a [context note] block to the injected file wrapper
so the model knows the attached history is compressed context, not new
instructions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously retry/continue requests reused the initial PoW header and
lacked parent_message_id, causing them to land as disconnected root
messages in the DeepSeek session instead of proper follow-up turns.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Auto-retry Chat/Responses streams once when upstream output is empty but not content-filtered, reusing session/token/PoW and appending a regeneration suffix to the prompt
- Wire DeepSeek continue API into Vercel streams for multi-round thinking output exhaustion
- Defer empty-output errors in stream finalizers to enable synthetic retry; only surface failure when the retry budget is exhausted
- Track content_filter stops to avoid retry on filtered outputs
- Add comprehensive tests for stream/non-stream retry, Responses retry, and content_filter no-retry
- Update prompt-compatibility.md documentation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Guard current_input_file.enabled / thinking_injection.{enabled,prompt} with hasNestedSettingsKey so partial updates don't overwrite omitted fields
- Expand DSML alias support to tolerate space-separated tags (e.g. <|dsml invoke>) alongside pipe-separated forms
- Sync Go sieve, Node sieve, toolcall parser, and tests for all new DSML variants
- Update API.md and toolcall-semantics.md with expanded alias coverage
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Chat history: early 304 via Revision()/DetailRevision() to avoid full snapshot reads
- WebUI: lazy-load tab containers with Suspense fallback
- Toolstream: split tool_sieve_xml.go into tags.go and scan.go
- CI: trigger on main branch, guard cross-build to dev/main pushes only
- Docs: add DEVELOPER.md developer quick reference
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>