feat: Add "semble savings" command #76

Open
Pringled wants to merge 19 commits into main from add-saved-tokens-logging

@Pringled Pringled commented May 7, 2026

This PR adds semble savings, a CLI command that tracks and displays token savings across all searches. Stats are recorded automatically to ~/.semble/savings.jsonl on every search. There's also a verbose output that shows the number of calls per call type, but it's not that interesting (yet) since we only have search and find_related at the moment.

Example output 💅 ✨:

  Semble Token Savings
  ════════════════════════════════════════════════════════════════
  Period        Calls   Savings
  ────────────────────────────────────────────────────────────────
  Today         42      [███████████████░]  ~58.4k tokens (95%)
  Last 7 days   287     [██████████████░░]  ~312.4k tokens (90%)
  All time      1.4k    [██████████████░░]  ~1.2M tokens (89%)

@Pringled Pringled requested a review from stephantul May 7, 2026 15:47

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines      Coverage Δ
src/semble/cli.py             100.00% <100.00%> (ø)
src/semble/index/index.py     100.00% <100.00%> (ø)
src/semble/stats.py           100.00% <100.00%> (ø)
src/semble/types.py           100.00% <100.00%> (ø)

@stephantul stephantul left a comment

I think most of the code here is overly defensive, with lots of guards and checks. But we control the data sources on both ends, so there is likely no need to ever check most of these things. Specifically, I think the .get() calls with a default of 0 actually hurt rather than help, because they obscure other bugs and can lead to silent over- or underestimations of the stats.

Comment thread src/semble/index/index.py
)

index = SembleIndex(model, bm25, vicinity, chunks)
index._file_sizes = SembleIndex._compute_file_sizes(path, chunks)

this can be self._compute_file_sizes. I would not call this here and just make it part of the __init__, or, if it is fast enough, just make it a property. There's also afaik no need to make this a staticmethod, doing

index.recompute_file_sizes()

is completely fine. I realize you do need the root, but the root could be part of the SembleIndex.
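
The suggestion above could look roughly like the following. This is a minimal sketch under assumptions, not the PR's code: IndexSketch is a hypothetical stand-in for SembleIndex, the file_paths argument replaces the real chunk list, paths are assumed to be relative to the root, and sizes are assumed to be character counts.

```python
from pathlib import Path


class IndexSketch:
    """Hypothetical, stripped-down stand-in for SembleIndex."""

    def __init__(self, root: Path, file_paths: list[str]) -> None:
        self.root = root              # assumption: the index keeps its root
        self.file_paths = file_paths  # assumption: paths relative to root
        self.file_sizes: dict[str, int] = {}
        self.recompute_file_sizes()   # done once here, not by the caller

    def recompute_file_sizes(self) -> None:
        """Refresh the per-file character counts from disk."""
        self.file_sizes = {
            path: len((self.root / path).read_text(encoding="utf-8"))
            for path in set(self.file_paths)
        }
```

With the root stored on the instance, a plain method is enough and callers can simply write index.recompute_file_sizes() after the files change.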

Comment thread src/semble/index/index.py
results = search_semantic(target.content, self.model, self._semantic_index, self.chunks, top_k + 1, selector)
return [r for r in results if r.chunk != target][:top_k]
results = [r for r in results if r.chunk != target][:top_k]
if self._file_sizes:

I don't think this ever needs to be checked. It's probably just better to populate file_sizes here if it doesn't exist already.
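
One way to read this comment is lazy population: fill the sizes on first use so callers never need an `if self._file_sizes:` guard. A small sketch, assuming a hypothetical LazySizes holder and a hypothetical compute callable that stands in for whatever actually measures the files:

```python
from typing import Callable, Optional


class LazySizes:
    """Hypothetical holder that fills file sizes on first use."""

    def __init__(self, compute: Callable[[], dict[str, int]]) -> None:
        self._compute = compute
        self._file_sizes: Optional[dict[str, int]] = None

    def file_sizes(self) -> dict[str, int]:
        # Populate once; later calls reuse the stored mapping.
        if self._file_sizes is None:
            self._file_sizes = self._compute()
        return self._file_sizes
```

The check against None (rather than truthiness) keeps an index with no files from recomputing on every call.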

Comment thread src/semble/stats.py
"""Save stats about a search or find_related call to the stats file."""
try:
snippet_chars = sum(len(result.chunk.content) for result in results)
file_chars = sum(file_sizes.get(path, 0) for path in {result.chunk.file_path for result in results})

It is probably better to disregard chunks for files for which we don't have size information. In fact, I think not having a specific file here points to some other failure.
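
A sketch of the "disregard" option, assuming results are simplified to hypothetical (file_path, chunk_content) pairs rather than the real result objects. Chunks from files without size information are left out of both totals, so they cannot skew the ratio in either direction:

```python
def count_chars(
    results: list[tuple[str, str]], file_sizes: dict[str, int]
) -> tuple[int, int]:
    """Return (snippet_chars, file_chars) over results with known file sizes."""
    # Keep only results whose file has a recorded size.
    known = [(path, content) for path, content in results if path in file_sizes]
    snippet_chars = sum(len(content) for _, content in known)
    # Each distinct file is counted once, however many chunks it contributed.
    file_chars = sum(file_sizes[path] for path in {path for path, _ in known})
    return snippet_chars, file_chars
```

If a missing file really does indicate another failure, indexing file_sizes[path] without the filter would surface it as a KeyError instead.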

Comment thread src/semble/stats.py
file_chars = sum(file_sizes.get(path, 0) for path in {result.chunk.file_path for result in results})

record = {
"ts": datetime.now(timezone.utc).isoformat(),

don't write iso format, just dump the timestamp.
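
A sketch of what dumping the raw timestamp could look like. The field names come from the diff context in this thread; make_record itself is a hypothetical helper, and the real record may carry more fields.

```python
import json
import time


def make_record(call: str, snippet_chars: int, file_chars: int) -> str:
    """Serialize one stats record as a JSON line with an epoch timestamp."""
    record = {
        "ts": time.time(),  # seconds since the Unix epoch, as a float
        "call": call,
        "snippet_chars": snippet_chars,
        "file_chars": file_chars,
    }
    return json.dumps(record)
```

An epoch float needs no parsing on the read side beyond json.loads, so there is no string format that could fail to parse.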

Comment thread src/semble/stats.py
in_today = record_date == today
in_last_7 = record_date > seven_days_ago
except ValueError:
in_today = in_last_7 = False # unparseable timestamp: count in All time only

How could this happen? AFAIK we write this ourselves. In any case, you should not parse a timestamp to a date and then reparse it again. Just use a timestamp.
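
With epoch timestamps in the file, the bucketing reduces to two numeric comparisons. A sketch under assumptions: bucket_flags is a hypothetical helper, "today" means the local calendar day of `now`, and "last 7 days" is taken to mean today plus the six preceding calendar days, which may differ from the PR's definition.

```python
from datetime import datetime, timedelta


def bucket_flags(ts: float, now: datetime) -> tuple[bool, bool]:
    """Return (in_today, in_last_7) for an epoch timestamp `ts`."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    today_cutoff = start_of_today.timestamp()
    # Assumption: the 7-day window is today plus the six days before it.
    week_cutoff = (start_of_today - timedelta(days=6)).timestamp()
    return ts >= today_cutoff, ts >= week_cutoff
```

The cutoffs can be computed once per run and reused for every record, and no ValueError branch is needed because nothing is parsed.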

Comment thread src/semble/stats.py
]
for label, bucket in summary.buckets.items():
saved_chars = max(0, bucket.file_chars - bucket.snippet_chars)
saved_tokens = saved_chars // 4 # standard ~4 chars/token approximation

hmmmm...

Comment thread src/semble/stats.py
try:
record = json.loads(line)
except json.JSONDecodeError:
continue

maybe warn here, we don't expect this to happen.
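
A sketch of warning instead of silently continuing, using the standard logging module. iter_records is a hypothetical helper, and skipping blank lines quietly is an assumption rather than something the PR states.

```python
import json
import logging
from typing import Iterable, Iterator

logger = logging.getLogger(__name__)


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed records, warning about lines that are not valid JSON."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are harmless and skipped quietly
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Unexpected for a file we write ourselves, so make it visible.
            logger.warning("Skipping malformed stats line %d: %s", line_number, exc)
```

Including the line number in the message makes a corrupted savings file easier to inspect by hand.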

Comment thread src/semble/stats.py
continue
snippet_chars = record.get("snippet_chars", 0)
file_chars = record.get("file_chars", 0)
call_type = record.get("call", "search")

when would we expect any of these to be missing? We write these ourselves.
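
If the keys are always written by the same code, plain indexing is one way to make a missing key loud rather than silently counted as 0. A sketch, with read_fields as a hypothetical helper and the key names taken from the diff context above:

```python
def read_fields(record: dict) -> tuple[int, int, str]:
    """Return (snippet_chars, file_chars, call) from one stats record."""
    # Plain indexing: a record missing any of these keys raises KeyError
    # instead of falling back to a default that hides the problem.
    return record["snippet_chars"], record["file_chars"], record["call"]
```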

Comment thread src/semble/index/index.py
raise ValueError(f"Unknown search mode: {mode!r}")
else:
raise ValueError(f"Unknown search mode: {mode!r}")
if self._file_sizes:

idem
