feat(mcp): local semantic + hybrid search over saved transcripts#1326

Draft
r3dbars wants to merge 2 commits into main from claude/zealous-cori-f04800

@r3dbars r3dbars commented Jun 25, 2026

Owner

What

Adds local, on-device semantic search to the read-only MCP server so paraphrase queries hit — e.g. search("pricing pushback") now finds "they balked at the cost". Today search is SQLite FTS5 lexical-only (TranscriptIndex.swift). This complements FTS, it does not replace it.

This is NEXT_WORK.md item #6 (L). The one thing cloud RAG still beat us on.

Model choice — lightest viable, zero bundle cost

The search/index lives in the standalone Tools/TranscriptedMCP SwiftPM package, which intentionally has no dependency on the app's MLX/FluidAudio model infra and builds via plain swift build. So I picked the lightest viable on-device option:

Apple NaturalLanguage NLEmbedding.sentenceEmbedding(for: .english) (512-dim).

  • Bundled model: none. Download: none. It's built into macOS → negligible incremental app size, modest RAM, fully on-device/offline.
  • Builds in the standalone package with just import NaturalLanguage — no prebuilt frameworks, no CoreML packaging, no tokenizer to ship, no CI surface.
  • Behind an EmbeddingProvider protocol, so a bundled CoreML model (e.g. quantized MiniLM) can replace it later for higher quality without touching the store or search path.
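
To make the provider seam concrete, here is a minimal sketch of what such a protocol and its NLEmbedding-backed conformer could look like. The member names (`modelID`, `dimension`, `embed`), the type name `NLSentenceProvider`, and the model-id string are assumptions for illustration, not the signatures in this PR. Only `NLEmbedding.sentenceEmbedding(for:)`, `vector(for:)`, and `dimension` are real NaturalLanguage API.

```swift
import Foundation
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Hypothetical shape of the provider seam; the PR's real protocol may differ.
protocol EmbeddingProvider {
    /// Identifies the model so stored vectors can be invalidated when it changes.
    var modelID: String { get }
    /// Length of every vector returned by `embed`.
    var dimension: Int { get }
    /// Returns nil when the text cannot be embedded.
    func embed(_ text: String) -> [Float]?
}

#if canImport(NaturalLanguage)
/// Wraps Apple's built-in sentence embedding; nothing is bundled or downloaded.
struct NLSentenceProvider: EmbeddingProvider {
    private let model: NLEmbedding
    let modelID = "nl-sentence-en" // assumed identifier, for illustration only

    var dimension: Int { model.dimension }

    /// Fails when the OS language assets are missing (e.g. a headless image).
    init?() {
        guard let m = NLEmbedding.sentenceEmbedding(for: .english) else { return nil }
        model = m
    }

    func embed(_ text: String) -> [Float]? {
        // NLEmbedding returns [Double]?; narrow to Float32 for storage.
        model.vector(for: text)?.map { Float($0) }
    }
}
#endif
```

A bundled CoreML model would be one more conformer of the same protocol, which is what keeps the store and search path untouched when the backend changes.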

App-size / RAM proof on a real signed build is Justin's to confirm, but by construction this adds no bundled assets — the only on-disk growth is the vector index (Float32 per utterance/entry) in the existing SQLite cache file, not the app bundle.
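
For a sense of the on-disk growth: at 512 Float32 values per vector, each row costs 512 × 4 = 2,048 bytes, so 10,000 utterances would come to roughly 20 MB before SQLite overhead. A minimal sketch of the blob round trip follows, assuming vectors are stored as raw Float32 bytes in native byte order. The helper names `encodeBlob` and `decodeBlob` are hypothetical, and the PR's VectorMath may encode differently.

```swift
import Foundation

/// Packs a vector into raw Float32 bytes (native byte order) for a SQLite BLOB column.
func encodeBlob(_ vector: [Float]) -> Data {
    vector.withUnsafeBufferPointer { Data(buffer: $0) }
}

/// Unpacks a blob produced by `encodeBlob`. Copies the bytes rather than
/// rebinding memory, so it does not rely on the Data being Float-aligned.
func decodeBlob(_ data: Data) -> [Float] {
    let count = data.count / MemoryLayout<Float>.size
    var out = [Float](repeating: 0, count: count)
    _ = out.withUnsafeMutableBufferPointer { data.copyBytes(to: $0) }
    return out
}
```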

How it works

  • EmbeddingProvider.swift — protocol, NLEmbeddingProvider, SearchMode, VectorMath (normalize / cosine / blob).
  • EmbeddingStore.swift — vector store on its own SQLite connection to the same mcp_index.sqlite. Writes only additive tables (embedding_meta, utterance_vectors, dictation_entry_vectors), reads the lexical tables. Embeds new/changed rows lazily after each reconcile, drops orphaned vectors, re-embeds everything on a model-id/dimension change, and runs a streaming cosine scan honoring the same speaker/date filters as FTS.
  • SemanticSearchFusion.swift — Reciprocal Rank Fusion (k=60) merging FTS and semantic lists. Hybrid is a strict superset of FTS recall: exact hits are preserved and only reordered; paraphrase-only matches are appended.
  • TranscriptIndex.swift — additive vector tables + lexical / semantic / hybrid routing on searchUtterances / searchDictationEntries / searchContext. The lexical path is byte-for-byte unchanged (mode defaults to .lexical at the index layer).
  • search / search_context MCP tools — new optional mode arg, default hybrid.
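
The fusion step can be illustrated with a textbook Reciprocal Rank Fusion: each document scores the sum of 1 / (k + rank) over the lists it appears in, with 1-based ranks and k = 60. This is a sketch under assumptions, not the code in SemanticSearchFusion.swift. The function name, the use of `Int64` row IDs, and the tie-break rule are all hypothetical.

```swift
/// Merges ranked lists of row IDs (best first) with Reciprocal Rank Fusion.
/// Assumes no ID appears twice within a single list.
func reciprocalRankFusion(_ lists: [[Int64]], k: Double = 60) -> [Int64] {
    var score: [Int64: Double] = [:]
    var firstSeen: [Int64: Int] = [:] // stable tie-break: order of first appearance
    var counter = 0
    for list in lists {
        for (index, id) in list.enumerated() {
            score[id, default: 0] += 1.0 / (k + Double(index + 1))
            if firstSeen[id] == nil {
                firstSeen[id] = counter
                counter += 1
            }
        }
    }
    return score.keys.sorted {
        let (a, b) = (score[$0]!, score[$1]!)
        return a != b ? a > b : firstSeen[$0]! < firstSeen[$1]!
    }
}

// FTS returns rows 1, 2, 3; the semantic scan returns rows 3, 4.
let fused = reciprocalRankFusion([[1, 2, 3], [3, 4]])
// Row 3 is in both lists, so it rises to the top; every FTS hit is still present.
```

Because every ID from every input list ends up in the output, the fused result is always a superset of the FTS list. Plain RRF can interleave semantic-only rows with lexical hits that have equal or lower scores, so an "exact hits first, paraphrase-only appended" ordering would need an extra rule on top of this sketch.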

Graceful degradation: if the embedding backend is unavailable (e.g. missing OS language assets in a headless image), the store is never created and every mode runs lexical-only — no errors.
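
One way to picture the fallback is a single routing decision: whatever mode the caller asked for, the index runs lexical when no vector store exists. The sketch below assumes string raw values for `SearchMode` and a hypothetical `effectiveMode` helper. How the real tool handler treats an unrecognized `mode` string is not stated in this PR, so the fallback to hybrid here is an assumption.

```swift
/// Assumed raw values; the PR only names the three modes.
enum SearchMode: String {
    case lexical, semantic, hybrid
}

/// Parses the optional `mode` tool argument, defaulting to hybrid.
/// Treating unknown strings as hybrid is an assumption of this sketch.
func parseMode(_ argument: String?) -> SearchMode {
    guard let argument = argument else { return .hybrid }
    return SearchMode(rawValue: argument) ?? .hybrid
}

/// Returns the mode that actually runs: without an embedding store,
/// every request silently becomes lexical instead of raising an error.
func effectiveMode(requested: SearchMode, semanticAvailable: Bool) -> SearchMode {
    semanticAvailable ? requested : .lexical
}
```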

Local proof (this Mac)

Real NLEmbedding works here: 512-dim, and the headline example ranks correctly — cosine("pricing pushback", "they balked at the cost") = 0.387 vs unrelated = 0.307. Note NLEmbedding's similarity floor is high (~0.30 even for unrelated short sentences), so pure semantic mode is best-effort; hybrid (default) is rank-based and stays robust because exact FTS hits anchor precision. Similarity threshold set to 0.30 to trim the obvious tail. A bundled MiniLM via the provider seam is the quality upgrade path.
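
The thresholded cosine scan can be sketched as follows, with toy 2-dimensional vectors standing in for real 512-dim embeddings. The function names, the tuple shapes, and the inclusive `>=` comparison are assumptions. The actual store streams rows from SQLite rather than taking an in-memory array.

```swift
/// Scales a vector to unit length; a zero vector is returned unchanged.
func normalize(_ v: [Float]) -> [Float] {
    let norm = v.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    return norm > 0 ? v.map { $0 / norm } : v
}

/// Dot product; equals cosine similarity when both inputs are unit length.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

/// Scores every row against the query and keeps those at or above the threshold,
/// best first.
func semanticHits(
    query: [Float],
    rows: [(id: Int64, vector: [Float])],
    threshold: Float = 0.30
) -> [(id: Int64, score: Float)] {
    let q = normalize(query)
    return rows
        .map { (id: $0.id, score: dot(q, normalize($0.vector))) }
        .filter { $0.score >= threshold }
        .sorted { $0.score > $1.score }
}

let hits = semanticHits(
    query: [1, 0],
    rows: [(id: 1, vector: [2, 0]), (id: 2, vector: [0, 1]), (id: 3, vector: [1, 1])]
)
// Row 2 is orthogonal to the query (cosine 0) and is trimmed by the threshold.
```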

Tests

SemanticSearchTests.swift — a deterministic stub provider (concept-axis embeddings) drives: paraphrase-that-lexical-misses, hybrid superset, semantic date-filter, unrelated-concept rejection, dictation semantic, provider-unavailable fallback, no-provider disabled, and model-change re-embed. Plus VectorMath round-trip/normalize/dot and RRF fusion unit tests. Full MCP suite green: 90 tests, 0 failures. (Tests use the stub so they don't depend on OS NLEmbedding assets in CI.)
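
A concept-axis stub might look like the sketch below: each concept owns one dimension, and any text containing one of that concept's keywords gets a 1 on that axis. That makes paraphrases embed identically without any OS model. The type name, keyword lists, and two-axis layout are invented for illustration and are not the fixtures in SemanticSearchTests.swift.

```swift
import Foundation

/// Deterministic test double: one axis per concept, keyword match sets the axis to 1.
struct ConceptStubProvider {
    // Hypothetical concept vocabulary, chosen to mirror the PR's headline example.
    let concepts: [[String]] = [
        ["pricing", "pushback", "balked", "cost"], // axis 0: price objections
        ["deadline", "schedule", "slipped"],       // axis 1: timing
    ]

    var dimension: Int { concepts.count }

    /// Text that matches no concept embeds to the zero vector.
    func embed(_ text: String) -> [Float] {
        let lower = text.lowercased()
        var vector = [Float](repeating: 0, count: concepts.count)
        for (axis, words) in concepts.enumerated()
        where words.contains(where: { lower.contains($0) }) {
            vector[axis] = 1
        }
        return vector
    }
}

let stub = ConceptStubProvider()
let queryVector = stub.embed("pricing pushback")
let paraphraseVector = stub.embed("they balked at the cost")
// The two strings share no words, yet they land on the same axis.
```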

Conflict / coordination

Touches the same area as the in-flight Moat #1 work (#1323, summary fields into search). Checked the diff: #1323 does not touch TranscriptIndex.swift (my core change). The only overlap is ToolHandlers.swift, and the edits are in different regions (they refactor recap/recent_context; I add mode to search/search_context). Additive, resolvable on rebase — merge-room can sequence these.

Verification run

Per .agents/test-matrix.yml (touched Tools/TranscriptedMCP/**):

  • swift build
  • swift test --package-path Tools/TranscriptedMCP ✅ (90 tests)

🤖 Generated with Claude Code

r3dbars and others added 2 commits June 25, 2026 06:09
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add on-device semantic search to the read-only MCP server so paraphrase
queries hit (e.g. "pricing pushback" finds "they balked at the cost"),
complementing — not replacing — the existing SQLite FTS5 lexical path.

Model choice: Apple's NaturalLanguage NLEmbedding.sentenceEmbedding. It is
built into macOS, so there is no bundled model and no download — negligible
incremental app size and modest RAM, and it builds in this standalone SwiftPM
package with no dependency on the app's MLX/FluidAudio infra. The backend sits
behind an EmbeddingProvider protocol so a bundled CoreML model can replace it
later without touching the store or search path.

- EmbeddingProvider.swift: protocol, NLEmbeddingProvider, SearchMode, VectorMath
- EmbeddingStore.swift: vector store on its own SQLite connection; embeds rows
  lazily after each reconcile, re-embeds on model-id/dimension change, runs a
  streaming cosine scan with the same speaker/date filters as FTS
- SemanticSearchFusion.swift: reciprocal-rank fusion for hybrid search
- TranscriptIndex: additive vector tables + lexical/semantic/hybrid routing;
  the lexical path is unchanged
- search / search_context expose `mode` (default hybrid)

All modes degrade gracefully: if the embedding backend is unavailable the store
is never created and search runs lexical-only. Changes are additive (new tables
+ new search path), keeping the lexical write path untouched.

Tests: deterministic stub provider drives the paraphrase, hybrid-superset,
date-filter, dictation, fallback, and model-change cases; plus vector-math and
RRF fusion unit tests. Full MCP suite green (90 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>