feat(timeline): compression parity upgrades (P2+P3+P6) by ngoclam9415 · Pull Request #8 · aitomatic/dana-runtime

ngoclam9415 · 2026-04-20T05:00:09Z

Summary

Close three compression gaps vs OpenClaude while staying single-tier, LLM-agnostic, KISS. Always uses len(str)/4 heuristic — zero provider coupling. System+tools coverage via optional caller-supplied callbacks.

Implements plan 260419-2241-compression-parity-dana-runtime (Phases 1–4).

What's in

Phase 1 — Heuristic threshold (P3)

DANA_COMPACT_TRIGGER_TOKENS env knob (default 150000, clamp [8k, 2M])
CompressedTimeline(system_tokens_fn, tools_tokens_fn) callbacks folded into needs_compression() estimate
star_agent wires len(system_prompt)//4 + len(json.dumps(tools))//4
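
The Phase 1 trigger can be sketched roughly as follows. This is a minimal illustration, not the code in compact_trigger.py: the function names, the reading of "8k" as 8000, the fallback for unparsable values, and the `>=` comparison are all assumptions.

```python
import json
import os

DEFAULT_TRIGGER = 150_000
MIN_TRIGGER, MAX_TRIGGER = 8_000, 2_000_000  # assumption: "8k" and "2M" are decimal


def resolve_trigger_tokens(env=None):
    """Read DANA_COMPACT_TRIGGER_TOKENS and clamp it to [8k, 2M]."""
    env = os.environ if env is None else env
    raw = env.get("DANA_COMPACT_TRIGGER_TOKENS")
    try:
        value = int(raw) if raw is not None else DEFAULT_TRIGGER
    except ValueError:
        value = DEFAULT_TRIGGER  # assumption: unparsable values fall back to the default
    return max(MIN_TRIGGER, min(MAX_TRIGGER, value))


def estimate_tokens(text):
    """Provider-agnostic heuristic: roughly four characters per token."""
    return len(text) // 4


def needs_compression(messages, system_prompt, tools, trigger):
    """Fold system prompt and tool schemas into the estimate, as the callbacks do."""
    total = sum(estimate_tokens(m) for m in messages)
    total += estimate_tokens(system_prompt)
    total += estimate_tokens(json.dumps(tools))
    return total >= trigger  # assumption: the trigger fires at or above the threshold
```

Because the estimate never consults a provider tokenizer, the same threshold applies across providers, at the cost of some inaccuracy for non-English or code-heavy prompts.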

Phase 2 — Cheap client-side shrink (P6)

cheap_shrink_tool_results() stubs old tool_result bodies to [cleared for context budget] preserving tool_call_id
Predictive gate blocks mutation when shrink-alone can't close gap → avoids vacuous summaries over stubs
Idempotent via content equality; opt-in via enable_cheap_shrink_tool_results (default off)
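
A minimal sketch of the Phase 2 shrink with its predictive gate is shown below. The entry shape (dicts with `role`, `content`, `tool_call_id`), the `tokens_over_budget` parameter, and the boolean return value are hypothetical; only the stub text, the keep-recent window, and the idempotency-by-content-equality idea come from the description above.

```python
STUB = "[cleared for context budget]"


def estimate_tokens(text):
    return len(text) // 4


def cheap_shrink_tool_results(entries, tokens_over_budget, keep_recent=10):
    """Stub old tool_result bodies in place; return True only if the gate passed.

    entries: list of dicts with "role", "content", "tool_call_id" (hypothetical shape).
    """
    cutoff = max(0, len(entries) - keep_recent)
    candidates = [
        e for e in entries[:cutoff]
        if e.get("role") == "tool_result"
        and e["content"] != STUB            # idempotent: already-stubbed bodies are skipped
        and len(e["content"]) > len(STUB)   # stubbing a shorter body would not save anything
    ]
    savings = sum(estimate_tokens(e["content"]) - estimate_tokens(STUB) for e in candidates)

    # Predictive gate: if stubbing alone cannot close the gap, leave everything
    # untouched so a later full summary still sees the real tool output.
    if savings < tokens_over_budget:
        return False

    for e in candidates:
        e["content"] = STUB  # tool_call_id is left untouched
    return True
```

Checking the savings before mutating anything is what keeps a subsequent full compression from summarizing a timeline that consists mostly of stubs.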

Phase 3 — Reactive compact + circuit breaker (P2)

PromptTooLongError typed exception; provider mapping for Anthropic (invalid_request_error + "prompt is too long"), OpenAI-compat (context_length_exceeded), Gemini (post-hoc WARNING on MAX_TOKENS)
llm_caller._invoke_llm_sync/async wraps with PTL catch → reactive_compact(attempt) → retry with 1s/3s backoff
reactive_compact drops {1:5, 2:10, 3:20} oldest + forward-orphan pruning + full re-summary (no shrink-bypass)
Per-session circuit breaker with cooldown recovery (DANA_CIRCUIT_COOLDOWN_SECONDS, default 300s) + half-open probe + reset_circuit() ops hatch
Kill switch DANA_DISABLE_REACTIVE_COMPACT=1
star_agent._maybe_compress_timeline re-raises PTL explicitly (no swallowing)
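
The per-session circuit breaker with cooldown recovery and a half-open probe might look something like the sketch below. The class name, method names other than `reset_circuit`, and the injectable clock are assumptions made for illustration; the PR does not show the actual implementation.

```python
import time


class CompactCircuitBreaker:
    """One instance per session (e.g. kept in a dict keyed by session id)."""

    def __init__(self, cooldown_seconds=300.0, clock=time.monotonic):
        self._cooldown = cooldown_seconds  # mirrors DANA_CIRCUIT_COOLDOWN_SECONDS
        self._clock = clock                # injectable so tests need not sleep
        self._opened_at = None             # None means the circuit is closed
        self._probing = False

    def allow(self):
        """Return True if a reactive compact may be attempted now."""
        if self._opened_at is None:
            return True
        cooled_down = self._clock() - self._opened_at >= self._cooldown
        if cooled_down and not self._probing:
            self._probing = True  # half-open: let exactly one probe through
            return True
        return False

    def record_success(self):
        self._opened_at = None
        self._probing = False

    def record_failure(self):
        # Opens the circuit, or re-opens it with a fresh cooldown after a failed probe.
        self._opened_at = self._clock()
        self._probing = False

    def reset_circuit(self):
        """Ops hatch: force the circuit closed regardless of cooldown."""
        self.record_success()
```

A failed probe restarts the cooldown, so a session that stays unrecoverable costs one extra compaction attempt per cooldown window rather than one per turn.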

Phase 4 — Telemetry

CompressionLogFields TypedDict authoritative allowlist + new_compaction_id()
AST-based test asserts logger.*(..., extra={...}) keys stay within allowlist (no prompt-content leakage)
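
An AST-based allowlist check of the kind described above can be sketched with the standard `ast` module. The allowlist contents here are invented placeholders, and the helper name is hypothetical; the real test derives its keys from the CompressionLogFields TypedDict.

```python
import ast

# Hypothetical allowlist; the real one comes from CompressionLogFields.
ALLOWED_KEYS = {"compaction_id", "attempt", "tokens_before", "tokens_after"}


def find_disallowed_extra_keys(source, allowed=ALLOWED_KEYS):
    """Return (lineno, key) pairs for logger.*(..., extra={...}) keys outside the allowlist."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Only look at calls of the form logger.<method>(...)
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if not (isinstance(node.func.value, ast.Name) and node.func.value.id == "logger"):
            continue
        for kw in node.keywords:
            if kw.arg != "extra" or not isinstance(kw.value, ast.Dict):
                continue
            for key in kw.value.keys:
                if isinstance(key, ast.Constant) and isinstance(key.value, str):
                    if key.value not in allowed:
                        violations.append((node.lineno, key.value))
                else:
                    # Computed keys and **spreads cannot be verified statically.
                    violations.append((node.lineno, "<non-literal key>"))
    return violations
```

Rejecting non-literal keys outright is a deliberately strict choice in this sketch: it keeps the check sound without needing to evaluate any code.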

Files

Source: 2 new (compact_trigger.py, telemetry.py), 8 modified (providers, types.py, compression_engine.py, compressed_timeline.py, llm_caller.py, star_agent.py).
Tests: 6 new files, 44 new unit tests — all pass.
Fixtures: 3 provider PTL JSONs under tests/fixtures/provider_ptl/.
Docs: system-architecture.md updated, project-changelog.md created.

Known gaps (follow-up PRs — report filed)

Review report: plans/reports/code-review-260420-1112-compression-parity-triggering.md

CRITICAL-1 PTL retry closes over the captured messages list; post-compact retry re-sends the stale oversized payload → circuit opens on genuinely-recoverable turns. Tests pass because fake LLM ignores payload content.
CRITICAL-2 Recent huge tool_result is unreclaimable (shrink keep_recent=10 blocks it; reactive_compact drops oldest only).
HIGH Multi-compression does not preserve prior summary text; stubbed content persisted across reload produces vacuous summaries on re-compression.

These should be addressed before relying on reactive recovery in production. The exception types, circuit breaker, telemetry, and pre-turn heuristic trigger are usable as-is.

Test plan

Unit: uv run pytest tests/unit/test_compact_trigger.py tests/unit/test_compressed_timeline_callbacks.py tests/unit/test_cheap_shrink.py tests/unit/test_reactive_compact.py tests/unit/test_llm_caller_ptl_retry.py tests/unit/test_log_field_allowlist.py → 44/44 pass
Full unit suite regression: 1064 passed (1 pre-existing failure in test_star_agent_streaming.py verified on develop, unrelated)
Compile: uv run python -m compileall dana/core/timeline dana/core/agent dana/common/llm dana/core/llm → OK
Integration test for CRITICAL-1 (stale messages) — follow-up PR
Integration test for CRITICAL-2 (huge recent tool result) — follow-up PR

Close three compression gaps vs OpenClaude while staying single-tier, LLM-agnostic, KISS. Always uses len(str)/4 heuristic, with zero provider coupling. System+tools coverage via optional caller-supplied callbacks.

Phase 1: Heuristic threshold (P3)
- New env knob DANA_COMPACT_TRIGGER_TOKENS (default 150000, clamp [8k, 2M])
- CompressedTimeline accepts optional system_tokens_fn / tools_tokens_fn callbacks; folded into needs_compression() estimate
- star_agent passes len(system_prompt)//4 + len(json.dumps(tools))//4

Phase 2: Cheap client-side shrink (P6)
- cheap_shrink_tool_results() stubs old tool_result bodies to "[cleared for context budget]" preserving tool_call_id
- Predictive gate blocks mutation when savings insufficient (avoids vacuous summary over stubs)
- Idempotent via content-equality (no metadata flag)
- Opt-in via enable_cheap_shrink_tool_results; off by default

Phase 3: Reactive compact + circuit breaker (P2)
- PromptTooLongError typed exception; provider mapping for Anthropic (invalid_request_error + "prompt is too long"), OpenAI-compat (context_length_exceeded), Gemini (post-hoc WARNING on MAX_TOKENS)
- llm_caller._invoke_llm_sync/async wraps with PTL catch → reactive_compact(attempt) → retry with 1s/3s backoff
- reactive_compact drops 5→10→20 oldest + _remove_forward_orphans + full re-summary (no shrink-bypass)
- Per-session circuit breaker with cooldown recovery (DANA_CIRCUIT_COOLDOWN_SECONDS, default 300s) + half-open probe
- Kill switch via DANA_DISABLE_REACTIVE_COMPACT=1
- star_agent._maybe_compress_timeline re-raises PTL explicitly

Phase 4: Telemetry & polish
- CompressionLogFields TypedDict allowlist + new_compaction_id()
- AST-based test asserts log extra={} keys stay within allowlist

Known gaps (documented in review report, follow-up PRs):
- PTL retry closes over captured messages list; post-compact retry re-sends stale oversized payload
- Recent huge tool_result cannot be reclaimed (shrink keep_recent blocks it; reactive_compact drops oldest only)
- Multi-compression does not preserve prior summary text
- Stubbed content persisted across reload produces vacuous summaries

Addresses the code review in plans/reports/code-review-260420-1112-compression-parity-triggering.md. Scope: two CRITICAL merge-blockers, one HIGH bug, and the user-requested snapshot-based persistence. HIGH-2/HIGH-3 and MEDIUM/LOW items are intentionally deferred.

CRITICAL-1: stale messages on PTL retry (llm_caller)
_invoke_llm_sync/async closed over `messages`, so after reactive_compact trimmed the timeline, retries re-sent the same oversized payload and the circuit opened on genuinely-recoverable sessions. Added a `messages_fn: Callable[[], list[LLMMessage]]` parameter threaded through call_llm → _call_with_failover → _invoke_llm_sync/async. After each reactive_compact, the factory is re-invoked so the retry observes the compacted payload. star_agent passes a factory that re-calls runtime.build_prompt. Backwards-compatible (parameter defaults to None).

CRITICAL-2: unreclaimable giant tool_result
A single huge tool_result in the keep-recent window couldn't be stubbed (cheap_shrink skips recent) nor dropped (reactive_compact drops oldest), wedging sessions after one big call. Fixed at ingest time: maybe_dump_oversized_content writes bodies >50KB (env-tunable via DANA_TOOL_RESULT_DUMP_THRESHOLD_CHARS) to {session}/tool_results/{tool_call_id}.txt and replaces the timeline content with a compact marker that preserves tool_call_id. A new ToolResultDumpResource exposes a read_tool_result tool with offset/limit slicing; auto-wired into STARAgent (opt-out via DANA_DISABLE_TOOL_RESULT_DUMP_RESOURCE=1).

HIGH-1: vacuous summaries on reload with stubs
_format_entries_for_compression now emits [Tool result id=X: previously cleared — content unavailable] instead of feeding the literal [cleared for context budget] stub back into the LLM, so re-summarization after reload is not dominated by "the agent cleared tool results".

New: snapshot-based persistence (user requested)
timeline.json is frozen at the first compression. Each compression rolls timeline-after-compress-{ISO-ts}.json; subsequent saves within a generation update the same snapshot in place. Full audit retention: older snapshots are never deleted. Repository read and the serializer loader both prefer the newest snapshot and fall back to timeline.json, then legacy path. Reload rehydrates the active snapshot so a fresh process does not roll a new file on every save.

Tests: 4 new CRITICAL-1 tests (assert retry token count < first attempt), 14 new tool-result dump tests, 7 new snapshot persistence tests. Two existing failover tests updated for the new messages_fn parameter. Full suite: 1148 unit + 72 integration passing.
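
The CRITICAL-1 fix can be illustrated with a small retry loop that rebuilds the payload from a factory after every compaction. This is a sketch under stated assumptions: the function name `invoke_with_ptl_retry`, its parameters other than `messages_fn`, the locally defined exception, and the reuse of the 3s delay for a third attempt are hypothetical, and the real code threads the factory through call_llm and _call_with_failover instead.

```python
import time


class PromptTooLongError(Exception):
    """Stand-in for the typed PTL exception named in the PR."""


DROP_SCHEDULE = {1: 5, 2: 10, 3: 20}  # oldest entries dropped per attempt
BACKOFF_SECONDS = {1: 1.0, 2: 3.0}    # assumption: later attempts reuse 3s


def invoke_with_ptl_retry(invoke, messages, reactive_compact,
                          messages_fn=None, sleep=time.sleep):
    """Call invoke(messages); on PTL, compact and retry with a fresh payload."""
    attempt = 0
    while True:
        try:
            return invoke(messages)
        except PromptTooLongError:
            attempt += 1
            if attempt > len(DROP_SCHEDULE):
                raise  # retries exhausted; the caller would open the circuit here
            reactive_compact(attempt)
            if messages_fn is not None:
                # The fix: rebuild the payload so the retry sees the compacted
                # timeline instead of the list captured before compaction.
                messages = messages_fn()
            sleep(BACKOFF_SECONDS.get(attempt, 3.0))
```

With `messages_fn=None` the loop degrades to the old behavior and keeps re-sending the original list, which is the failure mode the review report flagged.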

`StarAgent` unconditionally aliased `max_context_tokens` as the compression trigger, so any agent that set a context budget (e.g. `EnergyWasteAnalyst` at 200k) transparently overrode `DANA_COMPACT_TRIGGER_TOKENS`. Ops had no reachable knob through the agent path.

Split the two concerns:
- `CompressedTimeline.__init__` gains a dedicated `max_context_tokens` kwarg. Trigger resolution: explicit → env → 150k default. Budget resolution: explicit → falls back to resolved trigger for callers that haven't split yet. `cutoff_when_token_reach` stays pinned to the trigger.
- `StarAgent.__init__` gains `compress_trigger_tokens: int | None = None` and threads the two knobs separately. `compress_trigger_tokens=None` (default) defers to the env var, which is the intended ops contract.

Regression guards cover: budget set alone leaves trigger on env, env wins when only the budget is explicit, explicit trigger still beats env, and legacy single-knob callers still alias (backward compat).
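
The resolution order described in this commit can be sketched as a single function. The function name and the injectable `env` mapping are hypothetical, clamping is omitted, and the legacy single-knob aliasing path is not modeled here.

```python
import os

DEFAULT_TRIGGER_TOKENS = 150_000


def resolve_trigger_and_budget(compress_trigger_tokens=None,
                               max_context_tokens=None, env=None):
    """Return (trigger, budget) using explicit -> env -> default for the trigger."""
    env = os.environ if env is None else env

    if compress_trigger_tokens is not None:
        trigger = compress_trigger_tokens              # explicit always wins
    elif env.get("DANA_COMPACT_TRIGGER_TOKENS"):
        trigger = int(env["DANA_COMPACT_TRIGGER_TOKENS"])  # ops knob
    else:
        trigger = DEFAULT_TRIGGER_TOKENS

    # The budget no longer overrides the trigger; it only falls back to it.
    budget = max_context_tokens if max_context_tokens is not None else trigger
    return trigger, budget
```

For example, an agent that sets only a 200k budget while the env var is 100000 would get `(100000, 200000)` under this sketch, so the ops knob stays effective.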

….json

After compression rolls a snapshot, resuming a session via `CompressedTimeline.load_from_entries(entries)` (without native_messages) skipped the snapshot-state rehydration that `read_since` does. The save path then fell through to `session_folder / "timeline.json"` and overwrote it with post-compression state on every turn, leaving the snapshot file frozen and the canonical `timeline.json` polluted.

The Honeywell Django caller (agent_service.py) takes exactly this path: it reads entries via snapshot-aware `read_session_entries`, then calls `load_from_entries(entries)` without the `native_messages` arg, so the load-side rehydration in `_try_load_native_messages_from_repository` never runs.

Fix in `_resolve_snapshot_write_path`: when no active snapshot is tracked but the session folder already contains one or more `timeline-after-compress-*.json`, adopt the newest as the write target and stamp `_active_snapshot_compression_at` from its filename. Symmetric with `LocalTimelineRepository.read_session_entries`, which already prefers the newest snapshot for reads.

Regression test (`test_resume_via_load_from_entries_adopts_newest_snapshot_on_save`) mirrors the Honeywell caller: compress, fresh timeline, load_from_entries, add entry, save, assert write landed in the snapshot and timeline.json stayed frozen.
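
The adopt-newest-snapshot rule can be sketched with `pathlib`. This is not the body of `_resolve_snapshot_write_path`: the function name and signature are hypothetical, and the sketch assumes the timestamps embedded in the filenames sort lexicographically in chronological order.

```python
from pathlib import Path
from typing import Optional

SNAPSHOT_PREFIX = "timeline-after-compress-"


def resolve_write_path(session_folder: Path,
                       active_snapshot: Optional[Path] = None) -> Path:
    """Pick the file a save should write to."""
    if active_snapshot is not None:
        return active_snapshot  # a snapshot is already tracked for this generation

    # No tracked snapshot (e.g. resumed via load_from_entries): adopt the newest
    # one on disk rather than falling through to the frozen timeline.json.
    snapshots = sorted(session_folder.glob(SNAPSHOT_PREFIX + "*.json"),
                       key=lambda p: p.name)
    if snapshots:
        return snapshots[-1]

    return session_folder / "timeline.json"  # never compressed yet
```

Sorting by name rather than by modification time keeps the choice stable even when older snapshots are touched by backups or copies.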

…t_sessions (GH-1)

Timeline serializer now goes through the repository interface only: no direct file I/O, no Path/glob, no _events_path access. Compaction mints sibling logical sessions {base}__compact__{ISO-ts} instead of rolling snapshot files, making the compression feature usable with any TimelineRepositoryProtocol implementation (including in-memory and remote repos, not just filesystem-backed).

- Add list_sessions(prefix) to TimelineRepositoryProtocol + local impl.
- Add TimelineRepositoryDefaultsMixin so external repos without list_sessions get a no-op default (empty list → single-session fallback).
- Rewrite TimelineSerializerMixin (448→347 LOC) to use only save/read_session_entries/list_sessions.
- Drop native_messages persistence; recomputed from entries on load.
- Rename CompressedTimeline state: _active_snapshot_path → _active_compact_session_id, _active_snapshot_compression_at → _active_compact_compression_at.
- Use microsecond precision in compact session timestamps to prevent same-second collisions that would break audit retention.
- Add in-memory test fixture + parity test proving behavior matches across local-fs and in-memory backends.
- Keep LocalTimelineRepository._resolve_timeline_file_for_read for backward-compat reads of legacy timeline-after-compress-*.json.

Refs: GH-1
Plan: plans/260420-2141-GH-1-timeline-repository-compatibility/
Tests: 132/132 timeline+compression tests pass.
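
The sibling-session scheme can be sketched against a minimal protocol and an in-memory backend. The class and helper names below are hypothetical stand-ins (the real protocol is TimelineRepositoryProtocol, whose exact signatures are not shown in this PR), and the timestamp format is one possible ISO rendering with microseconds.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Protocol


class TimelineRepository(Protocol):
    """Hypothetical minimal shape of the repository interface."""
    def save(self, session_id: str, entries: List[dict]) -> None: ...
    def read_session_entries(self, session_id: str) -> List[dict]: ...
    def list_sessions(self, prefix: str) -> List[str]: ...


class InMemoryRepository:
    def __init__(self) -> None:
        self._sessions: Dict[str, List[dict]] = {}

    def save(self, session_id: str, entries: List[dict]) -> None:
        self._sessions[session_id] = list(entries)

    def read_session_entries(self, session_id: str) -> List[dict]:
        return list(self._sessions.get(session_id, []))

    def list_sessions(self, prefix: str) -> List[str]:
        return sorted(s for s in self._sessions if s.startswith(prefix))


def new_compact_session_id(base: str, now: Optional[datetime] = None) -> str:
    """Mint {base}__compact__{ISO-ts}; microseconds avoid same-second collisions."""
    now = now or datetime.now(timezone.utc)
    return f"{base}__compact__{now.isoformat(timespec='microseconds')}"


def resolve_active_session(repo: TimelineRepository, base: str) -> str:
    """Prefer the newest compact sibling; fall back to the base session."""
    siblings = repo.list_sessions(f"{base}__compact__")
    return siblings[-1] if siblings else base
```

A repository whose `list_sessions` returns an empty list naturally lands on the single-session fallback, which matches the no-op default described for external repos.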

…path

When SearchResource.grep receives a file path with the default output_mode=files_with_matches, the engine returns only the bare path, which LLM callers frequently misread as an empty result. Auto-promote to content mode with an explanatory header note so the output actually conveys what matched.

Also refactors the AUTO-mode engine chain into a for/break/else loop so the capture-then-prepend flow stays clean.

ngoclam9415 added 6 commits April 20, 2026 11:59

ngoclam9415 merged commit 4b2d713 into develop Apr 22, 2026
1 check passed

TheVinhLuong102 deleted the feat/compression-parity-phases-1-4 branch May 9, 2026 03:04
