feat(timeline): compression parity upgrades (P2+P3+P6) #8
Merged
Close three compression gaps vs OpenClaude while staying single-tier,
LLM-agnostic, KISS. Always uses len(str)/4 heuristic — zero provider
coupling. System+tools coverage via optional caller-supplied callbacks.
Phase 1 — Heuristic threshold (P3)
- New env knob DANA_COMPACT_TRIGGER_TOKENS (default 150000, clamp [8k, 2M])
- CompressedTimeline accepts optional system_tokens_fn / tools_tokens_fn
callbacks; folded into needs_compression() estimate
- star_agent passes len(system_prompt)//4 + len(json.dumps(tools))//4
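A minimal sketch of the Phase 1 trigger, assuming a plain list of message dicts. The helper names `resolve_trigger_tokens` and `estimate_tokens`, the reading of "8k" as 8000, the fallback on unparsable values, and the `>=` comparison are all assumptions for illustration, not the code in this PR.

```python
import os

DEFAULT_TRIGGER = 150_000
MIN_TRIGGER, MAX_TRIGGER = 8_000, 2_000_000  # assumption: "8k" read as 8000


def resolve_trigger_tokens(env=None):
    """Read DANA_COMPACT_TRIGGER_TOKENS and clamp it to [MIN, MAX]."""
    env = os.environ if env is None else env
    raw = env.get("DANA_COMPACT_TRIGGER_TOKENS")
    try:
        value = int(raw) if raw is not None else DEFAULT_TRIGGER
    except ValueError:
        value = DEFAULT_TRIGGER  # assumption: bad input falls back to default
    return max(MIN_TRIGGER, min(MAX_TRIGGER, value))


def estimate_tokens(text):
    """Provider-agnostic heuristic: roughly four characters per token."""
    return len(text) // 4


def needs_compression(messages, trigger, system_tokens_fn=None, tools_tokens_fn=None):
    """Fold the optional system/tools callbacks into the timeline estimate."""
    total = sum(estimate_tokens(str(m.get("content", ""))) for m in messages)
    if system_tokens_fn is not None:
        total += system_tokens_fn()
    if tools_tokens_fn is not None:
        total += tools_tokens_fn()
    return total >= trigger
```

A caller such as star_agent would pass `lambda: len(system_prompt) // 4` and `lambda: len(json.dumps(tools)) // 4` as the two callbacks.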
Phase 2 — Cheap client-side shrink (P6)
- cheap_shrink_tool_results() stubs old tool_result bodies to
"[cleared for context budget]" preserving tool_call_id
- Predictive gate blocks mutation when savings insufficient (avoids
vacuous summary over stubs)
- Idempotent via content-equality (no metadata flag)
- Opt-in via enable_cheap_shrink_tool_results; off by default
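The shrink pass can be sketched as follows, assuming messages are dicts with `role`, `content`, and `tool_call_id` keys. The `keep_recent` and `min_savings_chars` parameters and the returned tuple are hypothetical; only the stub text and the content-equality idempotence come from the description above.

```python
STUB = "[cleared for context budget]"


def cheap_shrink_tool_results(messages, keep_recent=10, min_savings_chars=1):
    """Stub old tool-result bodies; return (messages, chars_saved).

    Assumption: tool results are messages with role == "tool" and str content.
    """
    cutoff = max(0, len(messages) - keep_recent)
    # Content equality makes the pass idempotent: stubbed bodies are skipped.
    candidates = [
        i
        for i, m in enumerate(messages[:cutoff])
        if m.get("role") == "tool"
        and m.get("content") != STUB
        and len(m.get("content", "")) > len(STUB)
    ]
    savings = sum(len(messages[i]["content"]) - len(STUB) for i in candidates)
    # Predictive gate: leave the timeline untouched if the win is too small.
    if savings < min_savings_chars:
        return messages, 0
    shrunk = [dict(m) for m in messages]
    for i in candidates:
        shrunk[i]["content"] = STUB  # tool_call_id and other keys are preserved
    return shrunk, savings
```

Running the function twice on its own output saves nothing the second time, which is the behavior a metadata flag would otherwise have to provide.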
Phase 3 — Reactive compact + circuit breaker (P2)
- PromptTooLongError typed exception; provider mapping for Anthropic
(invalid_request_error + "prompt is too long"), OpenAI-compat
(context_length_exceeded), Gemini (post-hoc WARNING on MAX_TOKENS)
- llm_caller._invoke_llm_sync/async wraps with PTL catch →
reactive_compact(attempt) → retry with 1s/3s backoff
- reactive_compact drops 5→10→20 oldest + _remove_forward_orphans +
full re-summary (no shrink-bypass)
- Per-session circuit breaker with cooldown recovery
(DANA_CIRCUIT_COOLDOWN_SECONDS, default 300s) + half-open probe
- Kill switch via DANA_DISABLE_REACTIVE_COMPACT=1
- star_agent._maybe_compress_timeline re-raises PTL explicitly
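A rough sketch of the catch, compact, retry loop. The drop schedule follows the 5→10→20 sequence above; the mapping of the 1s/3s backoff onto attempts, the function name, and the list-based timeline are assumptions. Forward-orphan removal, re-summarization, and the circuit breaker are deliberately left out.

```python
import time


class PromptTooLongError(Exception):
    """Typed error raised when a provider rejects an oversized prompt."""


DROP_SCHEDULE = {1: 5, 2: 10, 3: 20}  # attempt -> oldest entries to drop
BACKOFFS = (1.0, 3.0)  # assumption: last value is reused for later attempts


def call_with_reactive_compact(invoke, timeline, sleep=time.sleep):
    """Call invoke(messages); on PTL, drop oldest entries and retry."""
    attempt = 0
    while True:
        try:
            return invoke(list(timeline))
        except PromptTooLongError:
            attempt += 1
            if attempt not in DROP_SCHEDULE:
                raise  # schedule exhausted; surface the error to the caller
            del timeline[: DROP_SCHEDULE[attempt]]
            sleep(BACKOFFS[min(attempt, len(BACKOFFS)) - 1])
```

Passing `sleep` as a parameter keeps the loop testable without real delays.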
Phase 4 — Telemetry & polish
- CompressionLogFields TypedDict allowlist + new_compaction_id()
- AST-based test asserts log extra={} keys stay within allowlist
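One way such an AST check could look, using only the standard `ast` module. The allowlist contents and the function name `disallowed_extra_keys` are hypothetical; the real allowlist lives in `CompressionLogFields`.

```python
import ast

# Hypothetical allowlist; the real one is the CompressionLogFields TypedDict.
ALLOWED_KEYS = {"compaction_id", "attempt", "tokens_before", "tokens_after"}
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def disallowed_extra_keys(source, allowed=ALLOWED_KEYS):
    """Return literal extra={...} keys in logger calls that are not allowed."""
    bad = set()
    for node in ast.walk(ast.parse(source)):
        is_log_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in LOG_METHODS
        )
        if not is_log_call:
            continue
        for kw in node.keywords:
            if kw.arg != "extra" or not isinstance(kw.value, ast.Dict):
                continue
            for key in kw.value.keys:
                # key is None for **spread entries; only literal strings count
                if isinstance(key, ast.Constant) and isinstance(key.value, str):
                    if key.value not in allowed:
                        bad.add(key.value)
    return bad
```

A test would read each source file under the timeline package and assert that the returned set is empty. Dicts built outside the call site are not inspected by this sketch.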
Known gaps (documented in review report, follow-up PRs):
- PTL retry closes over captured messages list; post-compact retry
re-sends stale oversized payload
- Recent huge tool_result cannot be reclaimed (shrink keep_recent
blocks it; reactive_compact drops oldest only)
- Multi-compression does not preserve prior summary text
- Stubbed content persisted across reload produces vacuous summaries
Addresses the code review in
plans/reports/code-review-260420-1112-compression-parity-triggering.md.
Scope: two CRITICAL merge-blockers, one HIGH bug, and the user-requested
snapshot-based persistence. HIGH-2/HIGH-3 and MEDIUM/LOW items are
intentionally deferred.
CRITICAL-1 — stale messages on PTL retry (llm_caller):
_invoke_llm_sync/async closed over `messages`, so after reactive_compact
trimmed the timeline, retries re-sent the same oversized payload and the
circuit opened on genuinely-recoverable sessions. Added a
`messages_fn: Callable[[], list[LLMMessage]]` parameter threaded through
call_llm → _call_with_failover → _invoke_llm_sync/async. After each
reactive_compact, the factory is re-invoked so the retry observes the
compacted payload. star_agent passes a factory that re-calls
runtime.build_prompt. Backwards-compatible (parameter defaults to None).
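The difference between a captured list and a factory can be illustrated with a small sketch. `invoke_with_retry`, `send`, and `on_ptl` are hypothetical names; only the optional `messages_fn` parameter defaulting to None mirrors the change described above.

```python
class PromptTooLongError(Exception):
    """Typed error raised when a provider rejects an oversized prompt."""


def invoke_with_retry(send, messages, messages_fn=None, on_ptl=lambda: None, max_retries=3):
    """Send messages; after each PTL, compact and rebuild the payload.

    Without messages_fn the retry re-sends the same captured list, which is
    the stale-payload bug. With it, each retry sees the compacted timeline.
    """
    for attempt in range(max_retries + 1):
        try:
            return send(messages)
        except PromptTooLongError:
            if attempt == max_retries:
                raise
            on_ptl()  # stands in for reactive_compact(attempt)
            if messages_fn is not None:
                messages = messages_fn()  # re-invoke the factory
```

With `messages_fn=None` the function behaves like the pre-fix code path, which is what keeps the parameter backwards-compatible.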
CRITICAL-2 — unreclaimable giant tool_result:
Single huge tool_result in the keep-recent window couldn't be stubbed
(cheap_shrink skips recent) nor dropped (reactive_compact drops oldest),
wedging sessions after one big call. Fixed at ingest time:
maybe_dump_oversized_content writes bodies >50KB (env-tunable via
DANA_TOOL_RESULT_DUMP_THRESHOLD_CHARS) to
{session}/tool_results/{tool_call_id}.txt and replaces the timeline
content with a compact marker that preserves tool_call_id. A new
ToolResultDumpResource exposes a read_tool_result tool with
offset/limit slicing; auto-wired into STARAgent (opt-out via
DANA_DISABLE_TOOL_RESULT_DUMP_RESOURCE=1).
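A sketch of the ingest-time dump and the paged read, assuming the threshold is measured in characters and that `tool_call_id` is safe to use as a file name. The marker wording, the function signatures, and the default `limit` are hypothetical.

```python
from pathlib import Path

DEFAULT_THRESHOLD_CHARS = 50_000  # assumption: ">50KB" measured in characters


def maybe_dump_oversized_content(session_dir, tool_call_id, content,
                                 threshold=DEFAULT_THRESHOLD_CHARS):
    """Write oversized bodies to disk and return a compact marker instead."""
    if len(content) <= threshold:
        return content
    out_dir = Path(session_dir) / "tool_results"
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{tool_call_id}.txt").write_text(content, encoding="utf-8")
    # Hypothetical marker format; it only needs to keep the tool_call_id.
    return (f"[tool_result id={tool_call_id} dumped to tool_results/"
            f"{tool_call_id}.txt ({len(content)} chars); "
            f"read it with read_tool_result]")


def read_tool_result(session_dir, tool_call_id, offset=0, limit=4000):
    """Return a slice of a dumped body so the agent can page through it."""
    path = Path(session_dir) / "tool_results" / f"{tool_call_id}.txt"
    text = path.read_text(encoding="utf-8")
    return text[offset:offset + limit]
```

Because the body never enters the timeline, neither the keep-recent window nor the oldest-first drop needs to reclaim it later.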
HIGH-1 — vacuous summaries on reload with stubs:
_format_entries_for_compression now emits
[Tool result id=X: previously cleared — content unavailable] instead
of feeding the literal [cleared for context budget] stub back into
the LLM, so re-summarization after reload is not dominated by
"the agent cleared tool results".
New — snapshot-based persistence (user requested):
timeline.json is frozen at the first compression. Each compression
rolls timeline-after-compress-{ISO-ts}.json; subsequent saves within
a generation update the same snapshot in place. Full audit retention
— older snapshots are never deleted. Repository read and the
serializer loader both prefer the newest snapshot and fall back to
timeline.json, then legacy path. Reload rehydrates the active
snapshot so a fresh process does not roll a new file on every save.
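The read preference (newest snapshot, then timeline.json, then legacy path) might look like the sketch below. It assumes snapshot timestamps are fixed-width so that file names sort chronologically; `resolve_read_path` and the `legacy_path` parameter are hypothetical names.

```python
from pathlib import Path


def resolve_read_path(session_dir, legacy_path=None):
    """Pick the file to load: newest snapshot, else timeline.json, else legacy."""
    session_dir = Path(session_dir)
    # Assumption: ISO-style timestamps of equal width sort lexicographically.
    snapshots = sorted(session_dir.glob("timeline-after-compress-*.json"))
    if snapshots:
        return snapshots[-1]
    canonical = session_dir / "timeline.json"
    if canonical.exists():
        return canonical
    if legacy_path is not None and Path(legacy_path).exists():
        return Path(legacy_path)
    return None
```

Older snapshots are never touched by this lookup, which is consistent with the full audit retention described above.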
Tests: 4 new CRITICAL-1 tests (assert retry token count < first attempt),
14 new tool-result dump tests, 7 new snapshot persistence tests. Two
existing failover tests updated for the new messages_fn parameter. Full
suite: 1148 unit + 72 integration passing.
`StarAgent` unconditionally aliased `max_context_tokens` as the compression trigger, so any agent that set a context budget (e.g. `EnergyWasteAnalyst` at 200k) transparently overrode `DANA_COMPACT_TRIGGER_TOKENS`; ops had no reachable knob through the agent path.
Split the two concerns:
- `CompressedTimeline.__init__` gains a dedicated `max_context_tokens` kwarg. Trigger resolution: explicit → env → 150k default. Budget resolution: explicit → falls back to resolved trigger for callers that haven't split yet. `cutoff_when_token_reach` stays pinned to the trigger.
- `StarAgent.__init__` gains `compress_trigger_tokens: int | None = None` and threads the two knobs separately. `compress_trigger_tokens=None` (default) defers to the env var, which is the intended ops contract.
Regression guards cover: budget set alone leaves trigger on env, env wins when only the budget is explicit, explicit trigger still beats env, and legacy single-knob callers still alias (backward compat).
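The two resolution chains can be sketched as a single helper. `resolve_trigger_and_budget` is a hypothetical name, and clamping plus the legacy single-knob aliasing are omitted for brevity.

```python
import os

DEFAULT_TRIGGER = 150_000


def resolve_trigger_and_budget(compress_trigger_tokens=None,
                               max_context_tokens=None, env=None):
    """Return (trigger, budget) resolved independently of each other."""
    env = os.environ if env is None else env
    # Trigger: explicit -> env -> default.
    if compress_trigger_tokens is not None:
        trigger = compress_trigger_tokens
    elif env.get("DANA_COMPACT_TRIGGER_TOKENS"):
        trigger = int(env["DANA_COMPACT_TRIGGER_TOKENS"])
    else:
        trigger = DEFAULT_TRIGGER
    # Budget: explicit -> resolved trigger (for callers that have not split).
    budget = max_context_tokens if max_context_tokens is not None else trigger
    return trigger, budget
```

The key property is that setting only the budget no longer moves the trigger, so the env var stays reachable for ops.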
….json
After compression rolls a snapshot, resuming a session via `CompressedTimeline.load_from_entries(entries)` (without native_messages) skipped the snapshot-state rehydration that `read_since` does. The save path then fell through to `session_folder / "timeline.json"` and overwrote it with post-compression state on every turn, leaving the snapshot file frozen and the canonical `timeline.json` polluted.
The Honeywell Django caller (agent_service.py) takes exactly this path: reads entries via snapshot-aware `read_session_entries`, then calls `load_from_entries(entries)` without the `native_messages` arg, so the load-side rehydration in `_try_load_native_messages_from_repository` never runs.
Fix in `_resolve_snapshot_write_path`: when no active snapshot is tracked but the session folder already contains one or more `timeline-after-compress-*.json`, adopt the newest as the write target and stamp `_active_snapshot_compression_at` from its filename. Symmetric with `LocalTimelineRepository.read_session_entries` which already prefers newest snapshot for reads.
Regression test (`test_resume_via_load_from_entries_adopts_newest_snapshot_on_save`) mirrors the Honeywell caller: compress, fresh timeline, load_from_entries, add entry, save, assert write landed in the snapshot and timeline.json stayed frozen.
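The write-path adoption could be sketched as a pure function that returns both the target file and the timestamp recovered from its name. The signature and return shape are assumptions; the real method is stateful and lives on the timeline object.

```python
from pathlib import Path

PREFIX = "timeline-after-compress-"


def resolve_snapshot_write_path(session_dir, active_snapshot=None):
    """Return (write_path, compression_at) for the next save.

    If no snapshot is tracked but some exist on disk, adopt the newest so
    that timeline.json stays frozen after the first compression.
    """
    if active_snapshot is not None:
        active = Path(active_snapshot)
        return active, active.stem[len(PREFIX):]
    session_dir = Path(session_dir)
    # Assumption: fixed-width timestamps, so name order is time order.
    snapshots = sorted(session_dir.glob(f"{PREFIX}*.json"))
    if snapshots:
        newest = snapshots[-1]
        return newest, newest.stem[len(PREFIX):]
    return session_dir / "timeline.json", None
```

Only a session that has never compressed falls through to `timeline.json`, which matches the frozen-canonical-file guarantee.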
…t_sessions (GH-1)
Timeline serializer now goes through the repository interface only: no direct file I/O, no Path/glob, no _events_path access. Compaction mints sibling logical sessions {base}__compact__{ISO-ts} instead of rolling snapshot files, making the compression feature usable with any TimelineRepositoryProtocol implementation (including in-memory and remote repos, not just filesystem-backed).
- Add list_sessions(prefix) to TimelineRepositoryProtocol + local impl.
- Add TimelineRepositoryDefaultsMixin so external repos without list_sessions get a no-op default (empty list → single-session fallback).
- Rewrite TimelineSerializerMixin (448→347 LOC) to use only save/read_session_entries/list_sessions.
- Drop native_messages persistence; recomputed from entries on load.
- Rename CompressedTimeline state: _active_snapshot_path → _active_compact_session_id, _active_snapshot_compression_at → _active_compact_compression_at.
- Use microsecond precision in compact session timestamps to prevent same-second collisions that would break audit retention.
- Add in-memory test fixture + parity test proving behavior matches across local-fs and in-memory backends.
- Keep LocalTimelineRepository._resolve_timeline_file_for_read for backward-compat reads of legacy timeline-after-compress-*.json.
Refs: GH-1
Plan: plans/260420-2141-GH-1-timeline-repository-compatibility/
Tests: 132/132 timeline+compression tests pass.
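A toy in-memory repository shows how sibling compact sessions can replace snapshot files. The class, its method signatures, the timestamp format, and the two helper functions are assumptions made for illustration and are not the actual protocol.

```python
from datetime import datetime, timezone


class InMemoryTimelineRepository:
    """Hypothetical backend exposing only save/read/list operations."""

    def __init__(self):
        self._sessions = {}

    def save(self, session_id, entries):
        self._sessions[session_id] = list(entries)

    def read_session_entries(self, session_id):
        return list(self._sessions.get(session_id, []))

    def list_sessions(self, prefix=""):
        return sorted(s for s in self._sessions if s.startswith(prefix))


def new_compact_session_id(base, now=None):
    """Mint {base}__compact__{ts}; %f gives microseconds to avoid collisions."""
    now = now or datetime.now(timezone.utc)
    return f"{base}__compact__{now.strftime('%Y-%m-%dT%H:%M:%S.%f')}"


def resolve_active_session(repo, base):
    """Prefer the newest compact sibling; fall back to the base session."""
    compacts = repo.list_sessions(f"{base}__compact__")
    return compacts[-1] if compacts else base
```

A repository whose `list_sessions` always returns an empty list degrades to the single-session fallback, which is the behavior the defaults mixin is described as providing.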
…path
When SearchResource.grep receives a file path with the default output_mode=files_with_matches, the engine returns only the bare path, which LLM callers frequently misread as an empty result. Auto-promote to content mode with an explanatory header note so the output actually conveys what matched.
Also refactors the AUTO-mode engine chain into a for/break/else loop so the capture-then-prepend flow stays clean.
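The promotion and the for/break/else chain could be sketched like this. The engine call signature, the note text, the use of `RuntimeError` to signal an unavailable engine, and the `is_file` hook are all assumptions; only the mode names come from the description above.

```python
import os

# Hypothetical header text.
PROMOTION_NOTE = "[note: single-file target; showing matching lines instead of the bare path]\n"


def grep_auto(engines, pattern, path, output_mode="files_with_matches",
              is_file=os.path.isfile):
    """Try each engine in order; promote single-file searches to content mode."""
    note = ""
    if output_mode == "files_with_matches" and is_file(path):
        output_mode, note = "content", PROMOTION_NOTE
    for engine in engines:
        try:
            result = engine(pattern, path, output_mode)
        except RuntimeError:
            continue  # assumption: engines signal unavailability this way
        break  # first engine that succeeds wins
    else:
        # The else branch runs only if no engine reached the break.
        raise RuntimeError("no search engine could handle the request")
    return note + result
```

Capturing `result` inside the loop and prepending the note once after it avoids repeating the prepend logic in every engine branch.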
Summary
Close three compression gaps vs OpenClaude while staying single-tier, LLM-agnostic, KISS. Always uses `len(str)/4` heuristic; zero provider coupling. System+tools coverage via optional caller-supplied callbacks.
Implements plan `260419-2241-compression-parity-dana-runtime` (Phases 1–4).
What's in
Phase 1: Heuristic threshold (P3)
- `DANA_COMPACT_TRIGGER_TOKENS` env knob (default 150000, clamp `[8k, 2M]`)
- `CompressedTimeline(system_tokens_fn, tools_tokens_fn)` callbacks folded into `needs_compression()` estimate
- `star_agent` wires `len(system_prompt)//4` + `len(json.dumps(tools))//4`
Phase 2: Cheap client-side shrink (P6)
- `cheap_shrink_tool_results()` stubs old tool_result bodies to `[cleared for context budget]` preserving `tool_call_id`
- Opt-in via `enable_cheap_shrink_tool_results` (default off)
Phase 3: Reactive compact + circuit breaker (P2)
- `PromptTooLongError` typed exception; provider mapping for Anthropic (`invalid_request_error` + `"prompt is too long"`), OpenAI-compat (`context_length_exceeded`), Gemini (post-hoc WARNING on `MAX_TOKENS`)
- `llm_caller._invoke_llm_sync/async` wraps with PTL catch → `reactive_compact(attempt)` → retry with 1s/3s backoff
- `reactive_compact` drops `{1:5, 2:10, 3:20}` oldest + forward-orphan pruning + full re-summary (no shrink-bypass)
- Per-session circuit breaker with cooldown recovery (`DANA_CIRCUIT_COOLDOWN_SECONDS`, default 300s) + half-open probe + `reset_circuit()` ops hatch
- Kill switch via `DANA_DISABLE_REACTIVE_COMPACT=1`
- `star_agent._maybe_compress_timeline` re-raises PTL explicitly (no swallowing)
Phase 4: Telemetry
- `CompressionLogFields` TypedDict authoritative allowlist + `new_compaction_id()`
- AST-based test asserts `logger.*(..., extra={...})` keys stay within allowlist (no prompt-content leakage)
Files
- Source: 2 new (`compact_trigger.py`, `telemetry.py`), 8 modified (providers, `types.py`, `compression_engine.py`, `compressed_timeline.py`, `llm_caller.py`, `star_agent.py`).
- Tests: 6 new files, 44 new unit tests; all pass.
- Fixtures: 3 provider PTL JSONs under `tests/fixtures/provider_ptl/`.
- Docs: `system-architecture.md` updated, `project-changelog.md` created.
Known gaps (follow-up PRs; report filed)
Review report: `plans/reports/code-review-260420-1112-compression-parity-triggering.md`
- PTL retry closes over captured `messages` list; post-compact retry re-sends the stale oversized payload → circuit opens on genuinely-recoverable turns. Tests pass because fake LLM ignores payload content.
- Recent huge tool_result cannot be reclaimed (shrink `keep_recent=10` blocks it; reactive_compact drops oldest only).
These should be addressed before relying on reactive recovery in production. The exception types, circuit breaker, telemetry, and pre-turn heuristic trigger are usable as-is.
Test plan
- `uv run pytest tests/unit/test_compact_trigger.py tests/unit/test_compressed_timeline_callbacks.py tests/unit/test_cheap_shrink.py tests/unit/test_reactive_compact.py tests/unit/test_llm_caller_ptl_retry.py tests/unit/test_log_field_allowlist.py` → 44/44 pass
- (… `test_star_agent_streaming.py` verified on develop, unrelated)
- `uv run python -m compileall dana/core/timeline dana/core/agent dana/common/llm dana/core/llm` → OK