Add llamacpp-cpu-qwen3-embed (CPU embedding) extension #1

Merged
kh0pper merged 2 commits into main from add/llamacpp-cpu-qwen3-embed
Jun 29, 2026

kh0pper (Owner) commented Jun 29, 2026

CPU-only Qwen3-Embedding-0.6B via llama.cpp, OpenAI-compatible /v1/embeddings on port 8007. Runs on macOS/Windows Docker Desktop — no GPU required.
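For readers who want to try the endpoint, here is a minimal client sketch using only the Python standard library. It assumes the container is reachable at `localhost` and that the server accepts the standard OpenAI embeddings request body; the `MODEL` label and the helper names `build_request`, `parse_embeddings`, and `embed` are hypothetical and not taken from the bundle.

```python
import json
import urllib.request

# Port 8007 comes from the PR description; the host is an assumption.
BASE_URL = "http://localhost:8007"
# Hypothetical model label; the name the bundle actually registers may differ.
MODEL = "qwen3-embedding-0.6b"


def build_request(texts, base_url=BASE_URL, model=MODEL):
    """Build a POST request in the OpenAI /v1/embeddings shape."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(body):
    """Return the vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, base_url=BASE_URL):
    """Send the texts to the server and return one vector per input."""
    with urllib.request.urlopen(build_request(texts, base_url), timeout=60) as resp:
        return parse_embeddings(json.load(resp))
```

With the container running, `embed(["hello world"])` should return a list containing one vector; the request building and response parsing are kept as separate functions so they can be checked without a live server.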

Mirrors the bundle added upstream in kh0pper/crow#111. Adds llamacpp-cpu-qwen3-embed/ (crow-addon.json + docker-compose.yml) and the registry.json entry.

🤖 Generated with Claude Code

DAYANE GRISEL PALACIOS TORRES and others added 2 commits June 28, 2026 19:21
CPU-only Qwen3-Embedding-0.6B via llama.cpp, OpenAI-compatible /v1/embeddings on
port 8007. Runs on macOS/Windows Docker Desktop — no GPU. Mirrors the bundle added
to kh0pper/crow upstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 8192

The manifest declared contextLen 32768 while docker-compose serves
--ctx-size 8192, so inputs over 8K tokens would be silently rejected despite
the advertised capacity. Lower the declared contextLen (crow-addon.json +
registry.json entry) to match what the CPU server actually serves. Vector
space is unchanged (1024-dim, same model) — embeddings stay interchangeable
with the GPU bundles; only max input length differs.
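
The mismatch this commit fixes could be caught mechanically. The following is a rough sketch of such a check, assuming the declared `contextLen` has already been read from the manifest and the server arguments are available as a string or list; the function names and the example command are hypothetical, and the real layout of crow-addon.json and docker-compose.yml is not shown in this PR.

```python
import shlex


def served_ctx_size(command):
    """Extract the --ctx-size value from a server command (string or arg list)."""
    args = shlex.split(command) if isinstance(command, str) else list(command)
    for i, arg in enumerate(args):
        if arg == "--ctx-size" and i + 1 < len(args):
            return int(args[i + 1])
        if arg.startswith("--ctx-size="):
            return int(arg.split("=", 1)[1])
    return None


def check_context(declared, command):
    """Return a list of problems; an empty list means the declaration is safe."""
    served = served_ctx_size(command)
    if served is None:
        return ["no --ctx-size flag found in the server command"]
    if declared > served:
        return [
            f"manifest declares contextLen {declared} "
            f"but the server is started with --ctx-size {served}"
        ]
    return []


# Hypothetical command line; only --ctx-size 8192 is taken from the commit message.
example_command = "--port 8007 --ctx-size 8192"
print(check_context(32768, example_command))  # the state before this fix
print(check_context(8192, example_command))   # the state after this fix
```

The check only flags a declaration larger than what is served, since declaring less than the server allows does not lead to rejected inputs.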

Mirrors the same fix in crow PR #111.
@kh0pper kh0pper merged commit 3ed7299 into main Jun 29, 2026
@kh0pper kh0pper deleted the add/llamacpp-cpu-qwen3-embed branch June 29, 2026 01:39