
feat(timeline): compression parity upgrades (P2+P3+P6) #8

Merged: ngoclam9415 merged 6 commits into develop from feat/compression-parity-phases-1-4 on Apr 22, 2026
@ngoclam9415

Summary

Closes three compression gaps vs OpenClaude while staying single-tier, LLM-agnostic, and KISS. Token estimation always uses the len(str)//4 heuristic, so there is zero provider coupling. System and tool tokens are covered via optional caller-supplied callbacks.

Implements plan 260419-2241-compression-parity-dana-runtime (Phases 1–4).

What's in

Phase 1 — Heuristic threshold (P3)

  • DANA_COMPACT_TRIGGER_TOKENS env knob (default 150000, clamp [8k, 2M])
  • CompressedTimeline(system_tokens_fn, tools_tokens_fn) callbacks folded into needs_compression() estimate
  • star_agent wires len(system_prompt)//4 + len(json.dumps(tools))//4
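
A minimal sketch of the Phase 1 trigger, not the code from this PR. It assumes "8k" and "2M" mean 8,000 and 2,000,000, that an unparsable env value falls back to the default, and that compression fires when the estimate reaches the trigger. The function names and the `make_callbacks` helper are hypothetical.

```python
import json
import os

DEFAULT_TRIGGER_TOKENS = 150_000
MIN_TRIGGER_TOKENS = 8_000        # assumption: "8k" read as 8,000
MAX_TRIGGER_TOKENS = 2_000_000    # assumption: "2M" read as 2,000,000


def resolve_trigger_tokens(env=None) -> int:
    """Read DANA_COMPACT_TRIGGER_TOKENS and clamp it into the allowed range."""
    env = os.environ if env is None else env
    raw = env.get("DANA_COMPACT_TRIGGER_TOKENS")
    try:
        value = int(raw) if raw is not None else DEFAULT_TRIGGER_TOKENS
    except ValueError:
        value = DEFAULT_TRIGGER_TOKENS  # assumption: bad values use the default
    return max(MIN_TRIGGER_TOKENS, min(MAX_TRIGGER_TOKENS, value))


def estimate_tokens(text: str) -> int:
    """Provider-agnostic heuristic: roughly four characters per token."""
    return len(text) // 4


def needs_compression(message_texts, system_tokens_fn=None,
                      tools_tokens_fn=None, trigger=None) -> bool:
    """Fold the optional system and tools estimates into the message estimate."""
    trigger = resolve_trigger_tokens() if trigger is None else trigger
    total = sum(estimate_tokens(text) for text in message_texts)
    if system_tokens_fn is not None:
        total += system_tokens_fn()
    if tools_tokens_fn is not None:
        total += tools_tokens_fn()
    return total >= trigger


def make_callbacks(system_prompt: str, tools: list):
    """Hypothetical helper mirroring the star_agent wiring described above."""
    return (lambda: len(system_prompt) // 4,
            lambda: len(json.dumps(tools)) // 4)
```

Passing callbacks rather than precomputed numbers lets the estimate follow a system prompt or tool list that changes between turns.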

Phase 2 — Cheap client-side shrink (P6)

  • cheap_shrink_tool_results() stubs old tool_result bodies to [cleared for context budget] preserving tool_call_id
  • Predictive gate blocks mutation when shrink-alone can't close gap → avoids vacuous summaries over stubs
  • Idempotent via content equality; opt-in via enable_cheap_shrink_tool_results (default off)
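
The shrink and its predictive gate could look roughly like the sketch below. The entry shape (dicts with `role`, `content`, `tool_call_id`), the `"tool"` role name, and the `tokens_to_free` parameter are assumptions for illustration; only the stub text and the keep-recent window of 10 come from this PR.

```python
STUB = "[cleared for context budget]"


def cheap_shrink_tool_results(entries, tokens_to_free, keep_recent=10):
    """Stub old tool results in place; return True only if the gate passed.

    Entries inside the keep-recent window are never touched.
    """
    cutoff = max(0, len(entries) - keep_recent)
    candidates = [
        entry for entry in entries[:cutoff]
        if entry.get("role") == "tool" and entry["content"] != STUB
    ]
    # Predictive gate: estimate the savings before mutating anything.
    stub_tokens = len(STUB) // 4
    savings = sum(max(0, len(entry["content"]) // 4 - stub_tokens)
                  for entry in candidates)
    if savings < tokens_to_free:
        return False  # shrink alone cannot close the gap, leave entries intact
    for entry in candidates:
        entry["content"] = STUB  # tool_call_id and other keys are preserved
    return True
```

Because already-stubbed entries are filtered out by content equality, a second call finds no candidates and changes nothing, which is what makes the operation idempotent without a metadata flag.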

Phase 3 — Reactive compact + circuit breaker (P2)

  • PromptTooLongError typed exception; provider mapping for Anthropic (invalid_request_error + "prompt is too long"), OpenAI-compat (context_length_exceeded), Gemini (post-hoc WARNING on MAX_TOKENS)
  • llm_caller._invoke_llm_sync/async wraps with PTL catch → reactive_compact(attempt) → retry with 1s/3s backoff
  • reactive_compact drops {1:5, 2:10, 3:20} oldest + forward-orphan pruning + full re-summary (no shrink-bypass)
  • Per-session circuit breaker with cooldown recovery (DANA_CIRCUIT_COOLDOWN_SECONDS, default 300s) + half-open probe + reset_circuit() ops hatch
  • Kill switch DANA_DISABLE_REACTIVE_COMPACT=1
  • star_agent._maybe_compress_timeline re-raises PTL explicitly (no swallowing)
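
As an illustration of the retry flow only, the sketch below applies the {1:5, 2:10, 3:20} drop schedule around a failing call. How the "1s/3s" backoff maps onto three attempts is not spelled out here, so reusing 3s for the third retry is an assumption, as are the function and parameter names. Orphan pruning and re-summarization are left to the `reactive_compact` callback.

```python
import time


class PromptTooLongError(Exception):
    """Stand-in for the typed exception introduced in this PR."""


DROP_SCHEDULE = {1: 5, 2: 10, 3: 20}  # attempt -> number of oldest entries to drop
BACKOFF_SECONDS = (1.0, 3.0)          # assumption: the last value is reused


def invoke_with_reactive_compact(call, reactive_compact, sleep=time.sleep):
    """Call `call()`; on PromptTooLongError compact, back off, and retry."""
    attempt = 0
    while True:
        try:
            return call()
        except PromptTooLongError:
            attempt += 1
            if attempt not in DROP_SCHEDULE:
                raise  # schedule exhausted, surface the error to the caller
            reactive_compact(DROP_SCHEDULE[attempt])
            sleep(BACKOFF_SECONDS[min(attempt, len(BACKOFF_SECONDS)) - 1])
```

Injecting `sleep` keeps the loop testable without real delays.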

Phase 4 — Telemetry

  • CompressionLogFields TypedDict authoritative allowlist + new_compaction_id()
  • AST-based test asserts logger.*(..., extra={...}) keys stay within allowlist (no prompt-content leakage)
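
One way such an AST check can be written with the standard `ast` module is sketched below. The allowlist keys are hypothetical placeholders, not the actual `CompressionLogFields` members, and the check matches any `.info(...)`-style method call rather than proving the receiver is a logger.

```python
import ast

# Hypothetical allowlist; the real keys live in CompressionLogFields.
ALLOWED_KEYS = {"compaction_id", "attempt", "tokens_before", "tokens_after"}
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def disallowed_extra_keys(source: str, allowed=ALLOWED_KEYS):
    """Return (lineno, key) pairs for extra={...} keys outside the allowlist."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in LOG_METHODS:
            continue
        for keyword in node.keywords:
            if keyword.arg != "extra" or not isinstance(keyword.value, ast.Dict):
                continue
            for key in keyword.value.keys:
                if isinstance(key, ast.Constant) and isinstance(key.value, str):
                    if key.value not in allowed:
                        violations.append((node.lineno, key.value))
                else:
                    # Computed keys and ** spreads cannot be verified statically.
                    violations.append((node.lineno, "<non-literal key>"))
    return violations
```

A test can read each source file and assert that the returned list is empty. A dict built elsewhere and passed as `extra=some_var` is not inspected by this sketch.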

Files

Source: 2 new (compact_trigger.py, telemetry.py), 8 modified (providers, types.py, compression_engine.py, compressed_timeline.py, llm_caller.py, star_agent.py).
Tests: 6 new files, 44 new unit tests — all pass.
Fixtures: 3 provider PTL JSONs under tests/fixtures/provider_ptl/.
Docs: system-architecture.md updated, project-changelog.md created.

Known gaps (follow-up PRs — report filed)

Review report: plans/reports/code-review-260420-1112-compression-parity-triggering.md

  • CRITICAL-1 PTL retry closes over the captured messages list; post-compact retry re-sends the stale oversized payload → circuit opens on genuinely-recoverable turns. Tests pass because fake LLM ignores payload content.
  • CRITICAL-2 Recent huge tool_result is unreclaimable (shrink keep_recent=10 blocks it; reactive_compact drops oldest only).
  • HIGH Multi-compression does not preserve prior summary text; stubbed content persisted across reload produces vacuous summaries on re-compression.

These should be addressed before relying on reactive recovery in production. The exception types, circuit breaker, telemetry, and pre-turn heuristic trigger are usable as-is.

Test plan

  • Unit: uv run pytest tests/unit/test_compact_trigger.py tests/unit/test_compressed_timeline_callbacks.py tests/unit/test_cheap_shrink.py tests/unit/test_reactive_compact.py tests/unit/test_llm_caller_ptl_retry.py tests/unit/test_log_field_allowlist.py → 44/44 pass
  • Full unit suite regression: 1064 passed (1 pre-existing failure in test_star_agent_streaming.py verified on develop, unrelated)
  • Compile: uv run python -m compileall dana/core/timeline dana/core/agent dana/common/llm dana/core/llm → OK
  • Integration test for CRITICAL-1 (stale messages) — follow-up PR
  • Integration test for CRITICAL-2 (huge recent tool result) — follow-up PR

Close three compression gaps vs OpenClaude while staying single-tier,
LLM-agnostic, KISS. Always uses len(str)/4 heuristic — zero provider
coupling. System+tools coverage via optional caller-supplied callbacks.

Phase 1 — Heuristic threshold (P3)
- New env knob DANA_COMPACT_TRIGGER_TOKENS (default 150000, clamp [8k, 2M])
- CompressedTimeline accepts optional system_tokens_fn / tools_tokens_fn
  callbacks; folded into needs_compression() estimate
- star_agent passes len(system_prompt)//4 + len(json.dumps(tools))//4

Phase 2 — Cheap client-side shrink (P6)
- cheap_shrink_tool_results() stubs old tool_result bodies to
  "[cleared for context budget]" preserving tool_call_id
- Predictive gate blocks mutation when savings insufficient (avoids
  vacuous summary over stubs)
- Idempotent via content-equality (no metadata flag)
- Opt-in via enable_cheap_shrink_tool_results; off by default

Phase 3 — Reactive compact + circuit breaker (P2)
- PromptTooLongError typed exception; provider mapping for Anthropic
  (invalid_request_error + "prompt is too long"), OpenAI-compat
  (context_length_exceeded), Gemini (post-hoc WARNING on MAX_TOKENS)
- llm_caller._invoke_llm_sync/async wraps with PTL catch →
  reactive_compact(attempt) → retry with 1s/3s backoff
- reactive_compact drops 5→10→20 oldest + _remove_forward_orphans +
  full re-summary (no shrink-bypass)
- Per-session circuit breaker with cooldown recovery
  (DANA_CIRCUIT_COOLDOWN_SECONDS, default 300s) + half-open probe
- Kill switch via DANA_DISABLE_REACTIVE_COMPACT=1
- star_agent._maybe_compress_timeline re-raises PTL explicitly
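
A per-session breaker with cooldown recovery might be shaped like the following sketch. The class and method names other than `reset_circuit` are hypothetical, and the half-open state is simplified: once the cooldown elapses, calls are allowed through until one of them trips the circuit again or records success.

```python
import os
import time


class CircuitBreaker:
    """Per-session breaker: open on trip, half-open after the cooldown."""

    def __init__(self, cooldown_seconds=None, clock=time.monotonic):
        if cooldown_seconds is None:
            cooldown_seconds = float(
                os.environ.get("DANA_CIRCUIT_COOLDOWN_SECONDS", "300"))
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._opened_at = {}  # session_id -> time the circuit opened

    def allow(self, session_id: str) -> bool:
        opened_at = self._opened_at.get(session_id)
        if opened_at is None:
            return True  # closed: normal operation
        # Half-open probe once the cooldown has elapsed.
        return self._clock() - opened_at >= self._cooldown

    def trip(self, session_id: str) -> None:
        """Open (or re-open) the circuit, restarting the cooldown."""
        self._opened_at[session_id] = self._clock()

    def record_success(self, session_id: str) -> None:
        """A successful probe closes the circuit."""
        self._opened_at.pop(session_id, None)

    def reset_circuit(self, session_id: str) -> None:
        """Ops hatch: force the circuit closed regardless of the cooldown."""
        self._opened_at.pop(session_id, None)
```

Keying the state by session keeps one wedged conversation from disabling reactive compaction for every other session in the process.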

Phase 4 — Telemetry & polish
- CompressionLogFields TypedDict allowlist + new_compaction_id()
- AST-based test asserts log extra={} keys stay within allowlist

Known gaps (documented in review report, follow-up PRs):
- PTL retry closes over captured messages list; post-compact retry
  re-sends stale oversized payload
- Recent huge tool_result cannot be reclaimed (shrink keep_recent
  blocks it; reactive_compact drops oldest only)
- Multi-compression does not preserve prior summary text
- Stubbed content persisted across reload produces vacuous summaries

Addresses the code review in
plans/reports/code-review-260420-1112-compression-parity-triggering.md.
Scope: two CRITICAL merge-blockers, one HIGH bug, and the user-requested
snapshot-based persistence. HIGH-2/HIGH-3 and MEDIUM/LOW items are
intentionally deferred.

CRITICAL-1 — stale messages on PTL retry (llm_caller):
  _invoke_llm_sync/async closed over `messages`, so after reactive_compact
  trimmed the timeline, retries re-sent the same oversized payload and the
  circuit opened on genuinely-recoverable sessions. Added a
  `messages_fn: Callable[[], list[LLMMessage]]` parameter threaded through
  call_llm → _call_with_failover → _invoke_llm_sync/async. After each
  reactive_compact, the factory is re-invoked so the retry observes the
  compacted payload. star_agent passes a factory that re-calls
  runtime.build_prompt. Backwards-compatible (parameter defaults to None).
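
The difference between a captured list and a factory can be sketched as follows. This is an illustration under simplified assumptions: `invoke_llm`, `send`, and `max_retries` are hypothetical stand-ins, and messages are plain list items rather than `LLMMessage` objects.

```python
class PromptTooLongError(Exception):
    """Stand-in for the typed exception introduced in this PR."""


def invoke_llm(send, messages, messages_fn=None, reactive_compact=None,
               max_retries=3):
    """Send `messages`; after each compaction rebuild them via `messages_fn`.

    With messages_fn=None the original list is re-sent unchanged, which
    reproduces the stale-payload behaviour described above.
    """
    attempt = 0
    while True:
        try:
            return send(messages)
        except PromptTooLongError:
            attempt += 1
            if reactive_compact is None or attempt > max_retries:
                raise
            reactive_compact(attempt)
            if messages_fn is not None:
                # Re-invoke the factory so the retry sees the compacted timeline.
                messages = messages_fn()
```

Defaulting `messages_fn` to `None` keeps existing callers working, at the cost of leaving them on the old behaviour until they pass a factory.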

CRITICAL-2 — unreclaimable giant tool_result:
  Single huge tool_result in the keep-recent window couldn't be stubbed
  (cheap_shrink skips recent) nor dropped (reactive_compact drops oldest),
  wedging sessions after one big call. Fixed at ingest time:
  maybe_dump_oversized_content writes bodies >50KB (env-tunable via
  DANA_TOOL_RESULT_DUMP_THRESHOLD_CHARS) to
  {session}/tool_results/{tool_call_id}.txt and replaces the timeline
  content with a compact marker that preserves tool_call_id. A new
  ToolResultDumpResource exposes a read_tool_result tool with
  offset/limit slicing; auto-wired into STARAgent (opt-out via
  DANA_DISABLE_TOOL_RESULT_DUMP_RESOURCE=1).
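
A rough sketch of ingest-time dumping with sliced reads follows. The marker wording, the explicit `session_dir` and `threshold` parameters, the character-based offset and limit, and the default limit are assumptions; the env variable name and the on-disk layout come from the description above. The sketch does not sanitize `tool_call_id`, which a real implementation would need before using it in a path.

```python
import os
from pathlib import Path

DEFAULT_DUMP_THRESHOLD_CHARS = 50_000  # assumption: ">50KB" counted in characters


def dump_threshold() -> int:
    return int(os.environ.get("DANA_TOOL_RESULT_DUMP_THRESHOLD_CHARS",
                              DEFAULT_DUMP_THRESHOLD_CHARS))


def maybe_dump_oversized_content(session_dir: Path, tool_call_id: str,
                                 content: str, threshold=None) -> str:
    """Return content unchanged, or dump it to disk and return a short marker."""
    threshold = dump_threshold() if threshold is None else threshold
    if len(content) <= threshold:
        return content
    out_dir = Path(session_dir) / "tool_results"
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{tool_call_id}.txt").write_text(content, encoding="utf-8")
    # Hypothetical marker format; it keeps the tool_call_id visible to the model.
    return (f"[tool result id={tool_call_id} saved to disk: {len(content)} chars; "
            f"use read_tool_result with offset/limit to page through it]")


def read_tool_result(session_dir: Path, tool_call_id: str,
                     offset: int = 0, limit: int = 4000) -> str:
    """Return one slice of a previously dumped tool result."""
    path = Path(session_dir) / "tool_results" / f"{tool_call_id}.txt"
    text = path.read_text(encoding="utf-8")
    return text[offset:offset + limit]
```

Handling the oversized body at ingest means neither the shrink pass nor reactive compaction ever has to reclaim it later.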

HIGH-1 — vacuous summaries on reload with stubs:
  _format_entries_for_compression now emits
  [Tool result id=X: previously cleared — content unavailable] instead
  of feeding the literal [cleared for context budget] stub back into
  the LLM, so re-summarization after reload is not dominated by
  "the agent cleared tool results".

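The HIGH-1 substitution can be illustrated with a small formatter. The entry shape, the `"tool"` role name, and the formatting of non-stub lines are assumptions; the two bracketed strings are the ones quoted in the commit message above.

```python
STUB = "[cleared for context budget]"
# \u2014 is the dash character used in the marker text quoted above.
CLEARED_TEMPLATE = "[Tool result id={id}: previously cleared \u2014 content unavailable]"


def format_entry_for_compression(entry: dict) -> str:
    """Render one timeline entry for the summarization prompt."""
    if entry.get("role") == "tool":
        tool_call_id = entry.get("tool_call_id", "unknown")
        if entry["content"] == STUB:
            # Do not feed the literal stub back to the summarizer.
            return CLEARED_TEMPLATE.format(id=tool_call_id)
        return f"[Tool result id={tool_call_id}]: {entry['content']}"
    return f"{entry['role']}: {entry['content']}"


def format_entries_for_compression(entries) -> str:
    return "\n".join(format_entry_for_compression(entry) for entry in entries)
```
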
New — snapshot-based persistence (user requested):
  timeline.json is frozen at the first compression. Each compression
  rolls timeline-after-compress-{ISO-ts}.json; subsequent saves within
  a generation update the same snapshot in place. Full audit retention
  — older snapshots are never deleted. Repository read and the
  serializer loader both prefer the newest snapshot and fall back to
  timeline.json, then legacy path. Reload rehydrates the active
  snapshot so a fresh process does not roll a new file on every save.

Tests: 4 new CRITICAL-1 tests (assert retry token count < first attempt),
14 new tool-result dump tests, 7 new snapshot persistence tests. Two
existing failover tests updated for the new messages_fn parameter. Full
suite: 1148 unit + 72 integration passing.

`StarAgent` unconditionally aliased `max_context_tokens` as the compression
trigger, so any agent that set a context budget (e.g. `EnergyWasteAnalyst`
at 200k) transparently overrode `DANA_COMPACT_TRIGGER_TOKENS` — ops had no
reachable knob through the agent path.

Split the two concerns:

- `CompressedTimeline.__init__` gains a dedicated `max_context_tokens` kwarg.
  Trigger resolution: explicit → env → 150k default. Budget resolution:
  explicit → falls back to resolved trigger for callers that haven't split
  yet. `cutoff_when_token_reach` stays pinned to the trigger.
- `StarAgent.__init__` gains `compress_trigger_tokens: int | None = None`
  and threads the two knobs separately. `compress_trigger_tokens=None`
  (default) defers to the env var, which is the intended ops contract.

Regression guards cover: budget set alone leaves trigger on env, env wins
when only the budget is explicit, explicit trigger still beats env, and
legacy single-knob callers still alias (backward compat).
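
The split resolution order can be sketched as below. It is a simplification: the function name and the injectable `env` argument are hypothetical, the clamp from Phase 1 is omitted, and the legacy single-knob aliasing path is not modelled.

```python
import os

DEFAULT_TRIGGER_TOKENS = 150_000


def resolve_knobs(compress_trigger_tokens=None, max_context_tokens=None, env=None):
    """Return (trigger, budget) with the two knobs resolved independently.

    Trigger: explicit value, then DANA_COMPACT_TRIGGER_TOKENS, then 150k.
    Budget: explicit value, else the resolved trigger.
    """
    env = os.environ if env is None else env
    if compress_trigger_tokens is not None:
        trigger = compress_trigger_tokens
    elif env.get("DANA_COMPACT_TRIGGER_TOKENS") is not None:
        trigger = int(env["DANA_COMPACT_TRIGGER_TOKENS"])
    else:
        trigger = DEFAULT_TRIGGER_TOKENS
    budget = max_context_tokens if max_context_tokens is not None else trigger
    return trigger, budget
```

For example, an agent with a 200k budget and no explicit trigger now resolves to whatever the env var says, which is the ops contract described above.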
….json

After compression rolls a snapshot, resuming a session via
`CompressedTimeline.load_from_entries(entries)` (without native_messages)
skipped the snapshot-state rehydration that `read_since` does. The save
path then fell through to `session_folder / "timeline.json"` and
overwrote it with post-compression state on every turn — leaving the
snapshot file frozen and the canonical `timeline.json` polluted.

The Honeywell Django caller (agent_service.py) takes exactly this path:
reads entries via snapshot-aware `read_session_entries`, then calls
`load_from_entries(entries)` without the `native_messages` arg, so the
load-side rehydration in `_try_load_native_messages_from_repository`
never runs.

Fix in `_resolve_snapshot_write_path`: when no active snapshot is
tracked but the session folder already contains one or more
`timeline-after-compress-*.json`, adopt the newest as the write target
and stamp `_active_snapshot_compression_at` from its filename. Symmetric
with `LocalTimelineRepository.read_session_entries` which already
prefers newest snapshot for reads.

Regression test (`test_resume_via_load_from_entries_adopts_newest_snapshot_on_save`)
mirrors the Honeywell caller: compress, fresh timeline, load_from_entries,
add entry, save, assert write landed in the snapshot and timeline.json
stayed frozen.
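
The adoption rule in the fix can be sketched as follows. It assumes snapshot timestamps share one fixed-width format so that sorting by filename orders them chronologically; the standalone function, its tuple return value, and the timestamp format used in the usage line are hypothetical.

```python
from pathlib import Path

SNAPSHOT_PREFIX = "timeline-after-compress-"


def resolve_snapshot_write_path(session_folder: Path, active_snapshot=None):
    """Return (write_path, compression_at) for the next save.

    Order: tracked active snapshot, newest snapshot on disk, timeline.json.
    """
    if active_snapshot is not None:
        return active_snapshot, active_snapshot.stem[len(SNAPSHOT_PREFIX):]
    snapshots = sorted(session_folder.glob(f"{SNAPSHOT_PREFIX}*.json"),
                       key=lambda path: path.name)
    if snapshots:
        newest = snapshots[-1]
        # Adopt the newest snapshot and recover its timestamp from the filename.
        return newest, newest.stem[len(SNAPSHOT_PREFIX):]
    return session_folder / "timeline.json", None
```

With snapshots stamped `2026-04-20T10-00-00` and `2026-04-21T09-30-00` in the folder, a fresh process writes to the latter and leaves `timeline.json` untouched.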
…t_sessions (GH-1)

Timeline serializer now goes through the repository interface only — no
direct file I/O, no Path/glob, no _events_path access. Compaction mints
sibling logical sessions {base}__compact__{ISO-ts} instead of rolling
snapshot files, making the compression feature usable with any
TimelineRepositoryProtocol implementation (including in-memory and
remote repos, not just filesystem-backed).

- Add list_sessions(prefix) to TimelineRepositoryProtocol + local impl.
- Add TimelineRepositoryDefaultsMixin so external repos without
  list_sessions get a no-op default (empty list → single-session fallback).
- Rewrite TimelineSerializerMixin (448→347 LOC) to use only save/
  read_session_entries/list_sessions.
- Drop native_messages persistence — recomputed from entries on load.
- Rename CompressedTimeline state: _active_snapshot_path →
  _active_compact_session_id, _active_snapshot_compression_at →
  _active_compact_compression_at.
- Use microsecond precision in compact session timestamps to prevent
  same-second collisions that would break audit retention.
- Add in-memory test fixture + parity test proving behavior matches
  across local-fs and in-memory backends.
- Keep LocalTimelineRepository._resolve_timeline_file_for_read for
  backward-compat reads of legacy timeline-after-compress-*.json.

Refs: GH-1
Plan: plans/260420-2141-GH-1-timeline-repository-compatibility/
Tests: 132/132 timeline+compression tests pass.
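
A compact sketch of the repository-only design follows. The method signatures, the in-memory class, the timestamp format, and `resolve_active_session` are assumptions for illustration; only the method names, the mixin name, and the `{base}__compact__{ISO-ts}` scheme come from the commit message.

```python
from datetime import datetime, timezone
from typing import Protocol


class TimelineRepositoryProtocol(Protocol):
    def save(self, session_id: str, entries: list) -> None: ...
    def read_session_entries(self, session_id: str) -> list: ...
    def list_sessions(self, prefix: str) -> list: ...


class TimelineRepositoryDefaultsMixin:
    """No-op default for repositories that cannot enumerate sessions."""

    def list_sessions(self, prefix: str) -> list:
        return []  # empty list leads to the single-session fallback


class InMemoryTimelineRepository:
    def __init__(self):
        self._store = {}

    def save(self, session_id: str, entries: list) -> None:
        self._store[session_id] = list(entries)

    def read_session_entries(self, session_id: str) -> list:
        return list(self._store.get(session_id, []))

    def list_sessions(self, prefix: str) -> list:
        return [sid for sid in self._store if sid.startswith(prefix)]


def mint_compact_session_id(base: str, now=None) -> str:
    """Microsecond precision keeps same-second compactions distinct."""
    now = now or datetime.now(timezone.utc)
    return f"{base}__compact__{now.strftime('%Y-%m-%dT%H-%M-%S.%f')}"


def resolve_active_session(repo, base: str) -> str:
    """Prefer the newest compact sibling, else fall back to the base session."""
    compacts = sorted(repo.list_sessions(f"{base}__compact__"))
    return compacts[-1] if compacts else base
```

Since nothing here touches the filesystem, the same logic runs unchanged against local, in-memory, or remote backends.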
…path

When SearchResource.grep receives a file path with the default
output_mode=files_with_matches, the engine returns only the bare path
— which LLM callers frequently misread as an empty result. Auto-promote
to content mode with an explanatory header note so the output actually
conveys what matched. Also refactors the AUTO-mode engine chain into a
for/break/else loop so the capture-then-prepend flow stays clean.
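
The for/break/else shape mentioned above can be illustrated like this. It is not `SearchResource`'s actual code: engines are modelled as plain callables, `RuntimeError` stands in for "engine unavailable", and the note text is invented.

```python
def grep(pattern, path_is_file, engines, output_mode="files_with_matches"):
    """Try each engine in order; prepend a note if the mode was auto-promoted."""
    note = ""
    if path_is_file and output_mode == "files_with_matches":
        # For a single file, a bare path in the output reads like an empty result.
        output_mode = "content"
        note = "[note: single file given, showing matching lines]\n"
    for engine in engines:
        try:
            result = engine(pattern, output_mode)
        except RuntimeError:
            continue  # assumption: engines signal unavailability this way
        break  # first engine that succeeds wins
    else:
        # Runs only if the loop finished without hitting break.
        raise RuntimeError("no search engine available")
    return note + result
```

The `else` branch handles the all-engines-failed case in one place, so the capture-then-prepend step after the loop can assume `result` is set.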
@ngoclam9415 ngoclam9415 merged commit 4b2d713 into develop Apr 22, 2026
1 check passed
@TheVinhLuong102 TheVinhLuong102 deleted the feat/compression-parity-phases-1-4 branch May 9, 2026 03:04