- Increase StreamIdleTimeout from 90s to 300s and MaxKeepaliveCount from 10 to 40
to prevent premature stream termination with DeepSeek V4 Pro (~50K token contexts)
- Add r.Context().Err() check after ConsumeSSE in empty_retry_runtime (chat + responses)
to prevent historySession.error() from overwriting historySession.stopped()
when the request context is cancelled
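A minimal sketch of the guard described above, under stated assumptions: historySession here is a hypothetical stand-in that records only a terminal state, and finishStream, errored, done and simulate are illustrative names, not the project's actual API. It shows only the ordering: check the context error first, so a cancelled request keeps its stopped state and never enters the empty-output path.

```go
package main

import (
	"context"
	"fmt"
)

// historySession is a hypothetical stand-in that records only the terminal state.
type historySession struct{ state string }

func (s *historySession) stopped() { s.state = "stopped" }
func (s *historySession) errored() { s.state = "error" }
func (s *historySession) done()    { s.state = "done" }

// finishStream runs after the SSE stream has been consumed.
// output is whatever content the upstream produced.
func finishStream(ctx context.Context, s *historySession, output string) {
	// Guard: a cancelled request keeps its "stopped" state and never
	// reaches the empty-output handling below.
	if ctx.Err() != nil {
		s.stopped()
		return
	}
	if output == "" {
		// The real code would attempt a retry here; this sketch goes
		// straight to the error state.
		s.errored()
		return
	}
	s.done()
}

// simulate builds a request context, optionally cancels it, and
// returns the resulting terminal state.
func simulate(cancelled bool, output string) string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelled {
		cancel()
	}
	s := &historySession{}
	finishStream(ctx, s, output)
	return s.state
}

func main() {
	fmt.Println(simulate(true, ""))    // stopped
	fmt.Println(simulate(false, ""))   // error
	fmt.Println(simulate(false, "hi")) // done
}
```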
References:
- MaxKeepaliveCount=10 creates a 50s no-content timeout that kills the stream
before DeepSeek V4 Pro can produce its first token with large contexts
- Hermes Agent reports 'No response from provider for 180s' because the
underlying SSE connection was already terminated by ds2api at 50s
- Context cancellation path: OnContextDone calls stopped(); finalize() then
sees empty output and triggers a retry, whose error() overwrites stopped()
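The timeout arithmetic above can be sketched as follows. The 5s keepalive interval is an assumption inferred from "MaxKeepaliveCount=10 creates a 50s no-content timeout"; it is not stated in this message, and keepaliveInterval and noContentWindow are illustrative names.

```go
package main

import (
	"fmt"
	"time"
)

// keepaliveInterval is an assumption: 10 keepalives giving a 50s window
// implies one keepalive every 5s.
const keepaliveInterval = 5 * time.Second

// noContentWindow returns how long a stream may carry only keepalives
// before it is terminated, given the maximum keepalive count.
func noContentWindow(maxKeepaliveCount int) time.Duration {
	return time.Duration(maxKeepaliveCount) * keepaliveInterval
}

func main() {
	fmt.Println(noContentWindow(10)) // 50s
	fmt.Println(noContentWindow(40)) // 3m20s
}
```

Under that assumption, the new count of 40 gives a 200s no-content window, which is longer than the 180s Hermes Agent waits before reporting no response.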
Replace bufio.Scanner with bufio.NewReaderSize + ReadBytes('\n') across all
SSE read paths to preserve long single-line data (e.g. write_file content).
Add quasi_status and auto_continue handling as direct path-based patches in
both the Go continue observer and the Node vercel_stream_impl, mirroring the
existing batch-patch logic. Add 2MiB+ line throughput tests at every SSE layer.
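For context, a self-contained sketch of why the reader swap matters: bufio.Scanner with its default buffer stops with bufio.ErrTooLong on any line longer than bufio.MaxScanTokenSize (64 KiB), while ReadBytes('\n') grows its result to fit the line. The helper names (readLines, scannerFails, maxLineLen, longLine) are illustrative and not taken from the codebase.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readLines reads newline-terminated lines of any length from r and
// passes each one to fn with the trailing "\r\n" or "\n" removed.
func readLines(r io.Reader, fn func(line []byte)) error {
	br := bufio.NewReaderSize(r, 64*1024)
	for {
		line, err := br.ReadBytes('\n')
		if len(line) > 0 {
			fn(bytes.TrimRight(line, "\r\n"))
		}
		if err != nil {
			if errors.Is(err, io.EOF) {
				return nil // end of stream is not an error here
			}
			return err
		}
	}
}

// scannerFails reports whether a default bufio.Scanner gives up on
// input with bufio.ErrTooLong.
func scannerFails(input string) bool {
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
	}
	return errors.Is(sc.Err(), bufio.ErrTooLong)
}

// maxLineLen returns the length of the longest line readLines sees.
func maxLineLen(input string) (int, error) {
	longest := 0
	err := readLines(strings.NewReader(input), func(line []byte) {
		if len(line) > longest {
			longest = len(line)
		}
	})
	return longest, err
}

// longLine builds one SSE-style data line with an n-byte payload,
// followed by the blank line that ends the event.
func longLine(n int) string {
	return "data: " + strings.Repeat("x", n) + "\n\n"
}

func main() {
	input := longLine(2 << 20) // 2 MiB payload on a single line
	fmt.Println(scannerFails(input))
	n, err := maxLineLen(input)
	fmt.Println(n, err)
}
```

Raising the Scanner limit with Scanner.Buffer would also work, but it still imposes a fixed cap; ReadBytes avoids choosing one.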
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace hardcoded DSML typo variant lists in Go/Node tool call parsers with
generalized prefix consumption that tolerates repeated leading <, repeated DSML
prefix noise, and trailing pipe terminators. Split tiktoken-dependent token
counting into a build-tagged file for non-cgo platform compatibility. Add /data
directory to Dockerfile for bind-mount permissions.
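A rough sketch of what generalized prefix consumption can look like, as opposed to matching a fixed list of typo variants. Everything here is an assumption for illustration: the tag shape, the treatment of both ASCII and fullwidth pipes, and the names consumeTagPrefix and tagName are hypothetical and do not describe the actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

// pipes lists the characters treated as pipe separators (assumption:
// both the ASCII and the fullwidth form can appear).
const pipes = "|｜"

// consumeTagPrefix strips any run of leading '<', then any number of
// "DSML" markers, each optionally wrapped in pipes. It returns the
// remainder and whether at least one DSML marker was seen.
func consumeTagPrefix(s string) (rest string, ok bool) {
	s = strings.TrimLeft(s, "<")
	for {
		t := strings.TrimLeft(s, pipes)
		if !strings.HasPrefix(t, "DSML") {
			break
		}
		s = strings.TrimLeft(t[len("DSML"):], pipes)
		ok = true
	}
	return s, ok
}

// tagName extracts the tag name from an opening tag, dropping prefix
// noise and any pipe terminators just before the closing '>'.
func tagName(tag string) (string, bool) {
	rest, ok := consumeTagPrefix(tag)
	if !ok {
		return "", false
	}
	end := strings.IndexByte(rest, '>')
	if end < 0 {
		return "", false
	}
	return strings.TrimRight(rest[:end], pipes), true
}

func main() {
	for _, tag := range []string{
		"<|DSML|invoke>",
		"<<|DSML||DSML|invoke|>",
		"<invoke>",
	} {
		name, ok := tagName(tag)
		fmt.Printf("%q %v\n", name, ok)
	}
}
```

One limitation of this sketch: a tag name that itself begins with DSML would be consumed as prefix noise, so a real parser would need a stricter boundary rule.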
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>