feat(bundle): llama.cpp CPU Qwen3-Embedding-0.6B embedding bundle by kh0pper · Pull Request #111 · kh0pper/crow

kh0pper · 2026-06-29T01:16:19Z

What

A CPU-only embedding bundle so hosts without a compatible GPU — or that can't reach a shared embedder like grackle-embed — can run Crow's semantic search locally.

Serves Qwen3-Embedding-0.6B (Q8_0 GGUF, 1024-dim) via llama.cpp with an OpenAI-compatible /v1/embeddings endpoint on 127.0.0.1:8007, registering the llamacpp-cpu-embed provider. Same model / vector space as the GPU vllm-cuda-embed and llamacpp-vulkan-qwen3-embed bundles, so embeddings are interchangeable.
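For reviewers who want to exercise the endpoint once the container is up, a minimal client sketch follows. It assumes the standard OpenAI embeddings request and response shape (`input`/`model` in, `data[].embedding` out). The model string and the helper names `build_request`, `parse_embeddings`, and `embed` are placeholders chosen for illustration and are not part of the bundle.

```python
import json
import urllib.request

# Host and port as stated in the PR description.
EMBED_URL = "http://127.0.0.1:8007/v1/embeddings"
# Output size of Qwen3-Embedding-0.6B as stated in the PR description.
EXPECTED_DIM = 1024


def build_request(texts, model="qwen3-embedding-0.6b"):
    """Build an OpenAI-style embeddings POST. The model name is a placeholder."""
    body = json.dumps({"input": texts, "model": model}).encode("utf-8")
    return urllib.request.Request(
        EMBED_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload):
    """Return vectors ordered by index, checking each has EXPECTED_DIM entries."""
    rows = sorted(payload["data"], key=lambda row: row["index"])
    vectors = [row["embedding"] for row in rows]
    for vec in vectors:
        if len(vec) != EXPECTED_DIM:
            raise ValueError(f"expected {EXPECTED_DIM} dims, got {len(vec)}")
    return vectors


def embed(texts, timeout=120):
    """Send the request to the running bundle (requires the container to be up)."""
    with urllib.request.urlopen(build_request(texts), timeout=timeout) as resp:
        return parse_embeddings(json.load(resp))
```

The dimension check is there because interchangeability with the GPU bundles depends on every provider returning 1024-dim vectors; a generous timeout is used since the first request may include the model download.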

Why

The existing embedding bundles are GPU + Linux-only (vllm-cuda-embed → NVIDIA, llamacpp-vulkan-qwen3-embed → ROCm/gfx1151). There was no option for a Mac/Windows Docker Desktop host or any CPU-only box.

Highlights

Runs anywhere Docker runs — no GPU, gpu_arch: ["cpu"], port bound to 127.0.0.1.
First request auto-downloads the ~640MB GGUF via -hf and caches it in a Docker volume (no manual model fetch).
Pairs with the configurable-provider change (feat(embeddings): make the embedding provider configurable #110): after install, set dashboard_settings.embed_provider = 'llamacpp-cpu-embed' (or CROW_EMBED_PROVIDER).
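The provider selection mentioned in the last highlight can be pictured roughly as below. This is only a sketch: the precedence (dashboard setting first, then the `CROW_EMBED_PROVIDER` environment variable) is an assumption, since the actual resolution order is defined in #110 and not shown here, and `resolve_embed_provider` is a hypothetical helper name.

```python
import os


def resolve_embed_provider(dashboard_settings, env=None, default=None):
    """Pick an embedding provider name.

    Assumed order (not confirmed by this PR): dashboard setting, then the
    CROW_EMBED_PROVIDER environment variable, then the supplied default.
    """
    env = os.environ if env is None else env
    candidates = (
        dashboard_settings.get("embed_provider"),
        env.get("CROW_EMBED_PROVIDER"),
    )
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return default
```

For example, `resolve_embed_provider({"embed_provider": "llamacpp-cpu-embed"}, env={})` returns `"llamacpp-cpu-embed"` under these assumptions.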

Changes

bundles/llamacpp-cpu-qwen3-embed/ — manifest.json, docker-compose.yml, README.md.
registry/add-ons.json — regenerated via npm run build-registry (single entry added, no churn).

Validation

npm run build-registry → 89 bundles, 0 invalid/draft/untracked.
npm run test:bundle-contract → 25/25 pass.

Live container test (image tag / -hf flags) happens on a Docker host as part of activation; flagging in case the published ghcr.io/ggml-org/llama.cpp:server tag or flag names need a tweak after first pull.

🤖 Generated with Claude Code

Adds a CPU-only embedding bundle so hosts without a compatible GPU (or that can't reach grackle-embed) can run semantic search locally.

Serves Qwen3-Embedding-0.6B (Q8_0 GGUF, 1024-dim) via llama.cpp with an OpenAI-compatible /v1/embeddings endpoint on 127.0.0.1:8007, registering the llamacpp-cpu-embed provider. Same model / vector space as the GPU embed bundles. Runs on macOS/Windows Docker Desktop; first request auto-downloads the GGUF via -hf and caches it.

Regenerated registry/add-ons.json via build-registry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The manifest declared contextLen 32768, but docker-compose.yml serves --ctx-size 8192, so inputs over 8K tokens would be silently rejected despite the advertised capacity.

The CPU bundle intentionally caps the context at 8192 to limit RAM use; embedding inputs are capped at 8000 chars upstream, so 8192 is ample. Lower the declared contextLen (manifest + regenerated registry entry) to match the served 8192.

The vector space is unchanged (1024-dim, same model), so embeddings stay interchangeable with the GPU bundles; only the maximum input length differs.
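A mismatch like the one fixed in this commit could be caught mechanically. The sketch below compares a declared context length against the `--ctx-size` value in a server command line. The function names and the example command string are illustrative assumptions; the repository's bundle-contract tests may check this differently or not at all.

```python
import shlex


def served_ctx_size(command):
    """Return the integer following --ctx-size in a command, or None if absent."""
    args = shlex.split(command) if isinstance(command, str) else list(command)
    for i, arg in enumerate(args):
        if arg == "--ctx-size" and i + 1 < len(args):
            return int(args[i + 1])
        if arg.startswith("--ctx-size="):
            return int(arg.split("=", 1)[1])
    return None


def check_context(declared, command):
    """Return an error string if the declared length exceeds what is served."""
    served = served_ctx_size(command)
    if served is None:
        return "no --ctx-size flag found"
    if declared > served:
        return f"manifest declares {declared} but server accepts only {served}"
    return None
```

With the values from this commit, `check_context(32768, "--ctx-size 8192")` reports the mismatch, while `check_context(8192, "--ctx-size 8192")` returns `None`.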

The CPU embedding bundle (#111) binds host port 8007, but the row was never added to docs/developers/port-allocation.md, so the Port Allocation Check CI (scripts/check-port-allocation.js) failed on the PR and on the main push. Add the 8007 row; check now passes (43 ports, all documented).

Co-authored-by: kh0pper <kevin.hopper@maestro.press>
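The idea behind such a port check can be sketched as follows. This is not the logic of scripts/check-port-allocation.js, which is not shown in this PR; it assumes Compose short-syntax port mappings and a markdown table whose first column is the port number, and all function names are hypothetical. Port ranges are not handled.

```python
import re


def host_ports(port_mappings):
    """Collect host ports from mappings like '127.0.0.1:8007:8007' or '8007:8007'."""
    ports = set()
    for mapping in port_mappings:
        parts = mapping.split("/")[0].split(":")  # drop any '/tcp' suffix
        if len(parts) >= 2:  # a bare container port publishes no fixed host port
            ports.add(int(parts[-2]))
    return ports


def documented_ports(markdown):
    """Collect ports from table rows whose first cell is a number (assumed layout)."""
    ports = set()
    for line in markdown.splitlines():
        match = re.match(r"\s*\|\s*(\d{2,5})\s*\|", line)
        if match:
            ports.add(int(match.group(1)))
    return ports


def undocumented(port_mappings, markdown):
    """Return host ports that are bound by a bundle but missing from the docs."""
    return sorted(host_ports(port_mappings) - documented_ports(markdown))
```

Under these assumptions, a bundle binding `127.0.0.1:8007:8007` against a table with no 8007 row yields `[8007]`, which is the situation this commit fixes.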

kh0pper mentioned this pull request Jun 29, 2026

Add llamacpp-cpu-qwen3-embed (CPU embedding) extension kh0pper/crow-addons#1

Merged

kh0pper merged commit 5b7615d into main Jun 29, 2026
1 check failed

kh0pper deleted the feat/llamacpp-cpu-embed-bundle branch June 29, 2026 01:39

kh0pper mentioned this pull request Jun 29, 2026

docs(ports): document port 8007 for llamacpp-cpu-qwen3-embed (fixes CI) #113

Merged

kh0pper merged 2 commits into main from feat/llamacpp-cpu-embed-bundle
