Commit Graph

73 Commits

Author SHA1 Message Date
CJACK
112bedb05d refactor: differentiate reference marker handling between stream and non-stream modes
- Stream: strip both and [reference:N] markers to prevent
  leaking partial link metadata during incremental output
- Non-stream: convert citation/reference markers to Markdown links for
  Claude Messages, Gemini generateContent, and OpenAI Chat/Responses
- Remove StripReferenceMarkers option from call sites; behavior is now
  determined automatically by stream vs non-stream context
- Extend JS runtime stripReferenceMarkersText() to also match [citation:N]
- Add tests for streaming marker stripping and non-stream link conversion

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 17:53:49 +08:00
CJACK
c099a6f7bf feat: add unified response history session management across Claude, Gemini, and OpenAI API backends
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 17:24:38 +08:00
CJACK
5e55cf36d8 refactor: prioritize raw model output in chat history archiving to ensure accurate capture of tool call and thinking markup 2026-05-03 15:44:17 +08:00
CJACK
a299c7d1c4 refactor: remove thinking content from empty output validation logic to enforce stricter completion requirements 2026-05-03 06:59:20 +08:00
CJACK
a7522b4188 fix: retry thinking-only empty outputs, centralize reference marker stripping
- ValidateTurn no longer errors on thinking-only responses, deferring to
  ShouldRetryEmptyOutput which now also covers thinking-only outputs.
- Empty output retry uses multi-turn follow-up with a regeneration prompt
  suffix and parent_message_id in the same DeepSeek session.
- Centralize StripReferenceMarkersEnabled into textclean package to
  eliminate duplicated hardcoded booleans across 4 protocol handlers.
- Log a deprecation warning when the legacy "compat" config key is used.
- Document thinking-only retry and reference marker stripping in API.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 05:02:26 +08:00
CJACK
1286b02247 refactor: remove legacy compatibility configuration and UI components 2026-05-03 04:14:19 +08:00
CJACK
5f110e6910 refactor: remove legacy history split configuration and integrate current input file handling into the completion runtime pipeline. 2026-05-03 01:50:50 +08:00
CJACK
7c0bc9ec0f feat: implement support for thinking blocks in Gemini API and enable thinking by default for supported models 2026-05-03 01:00:06 +08:00
CJACK
a901250de7 refactor: replace bufio.Scanner with bufio.Reader for SSE stream parsing and track emitted text to prevent redundant output blocks 2026-05-02 23:50:35 +08:00
CJACK
dc5bffdf89 refactor: centralize assistant turn semantics and stream accumulation into new assistantturn and completionruntime packages 2026-05-02 23:28:43 +08:00
CJACK
eccd8c957b fix: prevent continuation replay overlap by trimming redundant text from thinking and response streams 2026-05-02 21:34:36 +08:00
CJACK
0156f6b45b Merge origin/dev into PR 406 2026-05-02 21:17:02 +08:00
CJACK
e7d6807c7c feat: emit empty completion chunk along with keep-alive heartbeat in chat stream 2026-05-02 20:54:10 +08:00
d407ccb773 perf(streaming): optimize TTFT and reduce buffering latency
Core changes:
- stream.go: New accumulation buffer architecture with scanner goroutine
  + select loop, MinChars=16, MaxWait=10ms, first-flush-immediate
- dedupe.go: Add TrimContinuationOverlapFromBuilder to avoid string copies
- claude/stream_runtime_core.go: Integrate toolstream for incremental text
- claude/stream_runtime_finalize.go: toolstream flush support
- stream_emitter.js: Reduce DeltaCoalescer thresholds (160->16 chars, 80->20ms)
- empty_retry: Add thinking-aware empty output detection
- Fix reasoning_content leak and finish_reason=null in edge cases
- Fix tail content truncation when max_tokens exceeded

Tests: sync test expectations with upstream for thinking content
2026-05-02 20:28:30 +08:00
CJACK
c8f7b6b371 refactor streaming accumulation and chat history UI 2026-05-02 20:15:38 +08:00
NgoQuocViet2001
36d0239dc6 feat(openai): retrieve uploaded file metadata 2026-05-02 14:33:42 +07:00
CJACK
e2756f800d feat: introduce JSON UTF-8 validation middleware and prepend output integrity guard system prompt to messages 2026-05-02 02:22:34 +08:00
CJACK
55abf64717 feat: add model type support for file uploads with automatic resolution and header propagation 2026-05-02 00:55:17 +08:00
CJACK
0bca6e2cee feat: implement context cancellation handling for chat and response stream runtimes to ensure clean termination without retries 2026-05-01 23:20:46 +08:00
CJACK.
934b40e572 Merge pull request #392 from wyv202011y/fix/timeout-and-context-cancel
fix: increase stream timeout constants for large-context models; guar…
2026-05-01 23:17:31 +08:00
CJACK
dd5a0c5213 refactor: update and standardize current input file continuation prompt instructions 2026-05-01 22:27:59 +08:00
CJACK
43402e7a26 refactor: rename history file constant from HISTORY.txt to DS2API_HISTORY.txt across codebase and tests 2026-05-01 22:05:45 +08:00
CJACK
df1cfac9bc refactor: replace history transcript format with numbered sections and rename upload file to HISTORY.txt 2026-05-01 21:15:17 +08:00
706e68de23 fix: increase stream timeout constants for large-context models; guard against context-cancelled double-recording
- Increase StreamIdleTimeout from 90s to 300s and MaxKeepaliveCount from 10 to 40
  to prevent premature stream termination with DeepSeek V4 Pro (~50K token contexts)
- Add r.Context().Err() check after ConsumeSSE in empty_retry_runtime (chat + responses)
  to prevent historySession.error() from overwriting historySession.stopped()
  when the request context is cancelled

References:
- MaxKeepaliveCount=10 creates a 50s no-content timeout that kills the stream
  before DeepSeek V4 Pro can produce its first token with large contexts
- Hermes Agent reports 'No response from provider for 180s' because the
  underlying SSE connection was already terminated by ds2api at 50s
- Context cancellation path: OnContextDone -> stopped(), then finalize()
  with empty output -> retry -> error() overwrites stopped()
2026-05-01 21:11:36 +08:00
CJACK
2671298439 fix: coalesce small stream deltas to prevent character swallowing; add read-tool cache guard
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 13:53:27 +08:00
CJACK
92e321fe2c 修复吞字问题 2026-05-01 01:31:48 +08:00
CJACK.
95b7665643 Merge branch 'dev' into codex/run-all-tests-and-fix-failures 2026-04-30 02:39:18 +08:00
CJACK.
966f21211d Fix nil-session guard in chat history test 2026-04-30 02:31:06 +08:00
NgoQuocViet2001
7dc3af40b2 feat(openai): add root route aliases 2026-04-30 01:24:53 +07:00
CJACK.
2f6b5ffda0 Fix current-input token text test expectation 2026-04-30 02:22:17 +08:00
CJACK.
7c3ff6ee7e Merge pull request #374 from shern-point/feat/full-context-file-token-accounting
Feat/full context file token accounting
2026-04-30 02:12:55 +08:00
CJACK.
63e62fd1b0 Merge pull request #372 from shern-point/feat/accurate-context-token-length
Feat/accurate context token length
2026-04-30 02:11:32 +08:00
shern-point
6a778e0d35 feat: include inline-uploaded file tokens in context token accounting
Track byte sizes of inline-uploaded files during PreprocessInlineFileInputs and convert them to conservative token estimates (bytes/3). RefFileTokens is threaded through StandardRequest into all OpenAI chat/responses usage builders so returned prompt_tokens/input_tokens reflect the full upstream context cost including attached files.
2026-04-30 01:42:51 +08:00
NgoQuocViet2001
9035c350a7 fix(openai): return 400 for inline file limit 2026-04-30 00:35:59 +07:00
shern-point
ba80052a26 fix: count uploaded file content in context token accounting
PromptTokenText now reflects the actual downstream context cost: the uploaded IGNORE.txt file content plus the neutral live prompt, instead of only the pre-split prompt text.
2026-04-30 01:12:35 +08:00
shern-point
78fdd63470 feat: add full-context token regression coverage and docs
Lock in the current_input_file regression with API-level tests and document that returned context token counts now track full prompt semantics with conservative sizing.
2026-04-30 00:46:06 +08:00
shern-point
4b4f097006 feat: use model-aware prompt counting in Gemini paths
Preserve Gemini prompt token text during normalization and remove the hardcoded DeepSeek model from native Gemini usage helpers.
2026-04-30 00:46:05 +08:00
shern-point
d3018c281b feat: use tokenizer-based counting in Claude token paths
Unify Claude count_tokens, legacy stream accounting, and legacy render usage with preserved prompt text so Claude stops falling back to lossy message formatting.
2026-04-30 00:46:04 +08:00
shern-point
415a2359ad feat: route OpenAI responses usage through preserved prompt text
Use the stored full-context prompt text for responses accounting so neutral placeholder prompts do not underreport returned input token counts.
2026-04-30 00:45:31 +08:00
shern-point
f702d45a24 feat: route OpenAI chat usage through preserved prompt text
Use the stored full-context prompt text for chat non-stream, stream, and retry accounting so current_input_file no longer shrinks returned prompt token counts.
2026-04-30 00:45:30 +08:00
shern-point
b96f736bd2 feat: preserve full prompt text across current_input_file rewrites
Keep token accounting tied to the original prompt even after the live prompt is replaced with a neutral placeholder and hidden context file.
2026-04-30 00:45:01 +08:00
CJACK.
33f6fef015 Fix tool-call fallback on sanitized empty text and remove history wrapper tags 2026-04-29 23:04:45 +08:00
MiY
241334c658 Fix stream compatibility and vision model exposure 2026-04-29 20:23:13 +08:00
CJACK.
22160de2c4 Merge pull request #359 from NgoQuocViet2001/ai/ds2api-small-fix
fix(openai): keep citation indexes one-based with zero-based references
2026-04-29 18:27:15 +08:00
NgoQuocViet2001
0cbc2c875d fix(openai): keep citation indexes one-based 2026-04-29 15:43:09 +07:00
CJACK.
2c8409dcbb fix docker defaults to writable /data config path and align docs 2026-04-29 13:46:22 +08:00
CJACK.
929d9a8ef7 Merge pull request #352 from shern-point/fix/tool-string-schema-protection
Fix/tool type schema protection
2026-04-29 07:51:21 +08:00
shern-point
6e21714e23 test: cover Claude schema-aware tool normalization 2026-04-29 01:59:42 +08:00
shern-point
48c4f0df9f fix: preserve runtime tool schemas in Claude tool output 2026-04-29 01:59:24 +08:00
ouqiting
28d2b0410f feat: parse split context files in list view 2026-04-29 01:15:29 +08:00