Z.AI media tools for Pi — generate images and OCR documents via GLM models, exposed as agent-invokable tools.
Pi's agent loop speaks one protocol: messages → `chat/completions` → streamed text. Non-chat models (image generators, OCR engines) use different endpoints with different request/response shapes, so they cannot be defined as models in `models.json`. `zai-media` wraps them as tools instead: the chat model reasons and decides when to call them, and the tools call the right endpoint and hand the result back (including the image itself, inline).
| Tool | Z.AI endpoint | Model | Output |
|---|---|---|---|
| `zai_generate_image` | `/images/generations` | `glm-image` | `.png`/`.jpg` saved to `./zai-output/` and returned inline (the agent sees it) |
| `zai_ocr` | `/layout_parsing` | `glm-ocr` | `.md` saved to `./zai-output/` and content returned |
Because `zai_generate_image` returns an `ImageContent` block, a vision-capable chat model (e.g. `glm-5v-turbo`) can inspect its own generated output in the next turn.
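Returning the picture as an `ImageContent` block is what puts the pixels into the model's context. Below is a minimal sketch of assembling such a result, assuming Pi's image block carries base64 `data` plus a `mimeType`. The interfaces are local stand-ins rather than imports from Pi, and `buildImageResult` / `mimeTypeFor` are hypothetical names.

```typescript
// Local stand-ins for Pi's content blocks (assumed shapes, not imported from Pi).
interface ImageContent { type: "image"; data: string; mimeType: string }
interface TextContent { type: "text"; text: string }

interface ImageToolResult {
  content: (ImageContent | TextContent)[];
  details: { path: string };
}

// Map the saved file's extension to a MIME type (only the formats the tool writes).
function mimeTypeFor(path: string): string {
  const lower = path.toLowerCase();
  if (lower.endsWith(".png")) return "image/png";
  if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
  throw new Error(`unsupported image extension: ${path}`);
}

// Build a tool result carrying the image inline (base64) plus a short text note,
// with the on-disk path kept in `details` for the agent or UI to reference.
function buildImageResult(bytes: Uint8Array, path: string): ImageToolResult {
  return {
    content: [
      { type: "image", data: Buffer.from(bytes).toString("base64"), mimeType: mimeTypeFor(path) },
      { type: "text", text: `Saved image to ${path}` },
    ],
    details: { path },
  };
}
```

Putting the image first and the text note second mirrors the `[ImageContent, TextContent]` order described for the tool's return value.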
A Z.AI API key, provided in one of two ways:

- Recommended: run `/login` in Pi and configure the `zai` provider. `zai-media` reuses that key automatically (no extra setup).
- Or export it:

  ```bash
  export ZAI_API_KEY=your_key_here  # get one at https://z.ai/manage-apikey/apikey-list
  ```

`zai-media` calls the general Z.AI endpoint (`https://api.z.ai/api/paas/v4`), which is billed per token/usage, separately from the flat-rate Coding Plan. Set `ZAI_BASE_URL` to override the endpoint.
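The key lookup order described above (the `/login` key first, then `ZAI_API_KEY`) can be sketched as a small helper. How Pi hands an extension its stored provider key is not covered here, so this sketch simply takes that key as a parameter; `resolveApiKey` is a hypothetical name.

```typescript
// Hypothetical helper: prefer the key stored by Pi's /login for the zai provider,
// fall back to the ZAI_API_KEY environment variable, otherwise fail with a clear message.
function resolveApiKey(
  loginKey: string | undefined,
  env: Record<string, string | undefined>,
): string {
  // Empty or whitespace-only values count as "not set".
  const key = loginKey?.trim() || env.ZAI_API_KEY?.trim();
  if (!key) {
    throw new Error("No Z.AI API key: run /login in Pi (zai provider) or export ZAI_API_KEY");
  }
  return key;
}
```

Called as `resolveApiKey(storedKey, process.env)`, where `storedKey` stands in for whatever Pi exposes; failing early with an actionable message beats a later HTTP 401.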
```bash
# From npm (once published)
pi install npm:@getpipher/zai-media

# From git (always available)
pi install git:github.com/getpipher/zai-media

# Try without installing
pi -e git:github.com/getpipher/zai-media
```

Once installed, the tools are available to any chat model. Just ask:

```
> Generate a clean SVG-style flat illustration of a rocket ship, 1024x1024
  → calls zai_generate_image → image saved + shown inline

> Read the table out of screenshots/pricing.png
  → calls zai_ocr → markdown saved + pasted back
```

Outputs land in `./zai-output/` (relative to your working directory).
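Because outputs are resolved against the working directory rather than the extension's install location, a path helper only needs the current `cwd`. A hedged sketch: the timestamp-based file naming and the `outputPathFor` / `ensureOutputDir` names are hypothetical, not necessarily what `zai-media` does.

```typescript
import { mkdir } from "node:fs/promises";
import { dirname, join, resolve } from "node:path";

// Build an absolute path under <cwd>/zai-output/ for a new artifact.
// Timestamp naming is an illustrative choice, not the package's documented scheme.
function outputPathFor(cwd: string, ext: ".png" | ".jpg" | ".md", now: Date = new Date()): string {
  const dir = resolve(cwd, "zai-output");
  // ISO timestamps contain ':' and '.', which are awkward in file names; swap them for '-'.
  const stamp = now.toISOString().replace(/[:.]/g, "-"); // e.g. 2025-01-02T03-04-05-006Z
  return join(dir, `${stamp}${ext}`);
}

// Create the output directory (and any missing parents) before writing into it.
async function ensureOutputDir(filePath: string): Promise<void> {
  await mkdir(dirname(filePath), { recursive: true });
}
```

Typical use: `const p = outputPathFor(process.cwd(), ".png"); await ensureOutputDir(p);` followed by writing the bytes to `p`.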
```
chat model (e.g. glm-5.2)
  │ reasons about the task
  │ decides image/OCR is needed
  ▼
zai_generate_image / zai_ocr   ← in-process fetch() to the Z.AI endpoint
  │ POST /images/generations { prompt } → url
  │ fetch bytes → write ./zai-output/x.png
  ▼
returns { content: [ImageContent, TextContent], details: { path, … } }
  │
  ▼
chat model sees the result (and can re-inspect images with a vision model)
```
This is the idiomatic Pi pattern for "produce a non-text artifact": tools, not fake models.
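The middle of that diagram (POST the prompt, get a URL back, download the bytes) can be sketched with an injected `fetch` so it is testable offline. This is a hedged sketch, not `zai-media`'s source: the request body (`model`, `prompt`), the Bearer auth header, and the OpenAI-style `{ data: [{ url }] }` response are assumptions to check against Z.AI's API reference, and `generateImageBytes` / `FetchLike` are hypothetical names.

```typescript
// Minimal fetch signature; the global fetch (Node 18+) satisfies it, and so can a test fake.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Assumed response shape (OpenAI-style); verify against Z.AI's documentation.
interface ImagesResponse { data?: { url?: string }[] }

async function generateImageBytes(
  fetchFn: FetchLike,
  baseUrl: string,
  apiKey: string,
  prompt: string,
): Promise<Uint8Array> {
  // Step 1: POST the prompt to /images/generations and read back an image URL.
  const res = await fetchFn(`${baseUrl}/images/generations`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model: "glm-image", prompt }),
  });
  if (!res.ok) throw new Error(`image generation failed: HTTP ${res.status}`);
  const url = ((await res.json()) as ImagesResponse).data?.[0]?.url;
  if (!url) throw new Error("response contained no image URL");

  // Step 2: download the bytes; the caller writes them under ./zai-output/.
  const img = await fetchFn(url);
  if (!img.ok) throw new Error(`image download failed: HTTP ${img.status}`);
  return new Uint8Array(await img.arrayBuffer());
}
```

With Node 18+ you can pass the global `fetch` directly, e.g. `generateImageBytes(fetch, baseUrl, key, "a rocket ship")`. Injecting the transport keeps the two-step flow testable without network access.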
| Env var | Default | Purpose |
|---|---|---|
| `ZAI_API_KEY` | none | API key (fallback if no `/login` zai key) |
| `ZAI_BASE_URL` | `https://api.z.ai/api/paas/v4` | Override the endpoint (e.g. China: `https://open.bigmodel.cn/api/paas/v4`) |
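Joining endpoint paths onto the base URL needs a little care once `ZAI_BASE_URL` is overridden. A minimal sketch; `endpointUrl` is a hypothetical helper, and tolerating a trailing slash on the override is an illustrative choice.

```typescript
const DEFAULT_BASE_URL = "https://api.z.ai/api/paas/v4";

// Pick the base URL (a non-empty ZAI_BASE_URL wins) and append an endpoint path,
// normalising slashes so "v4/" + "/layout_parsing" does not produce "//".
function endpointUrl(path: string, env: Record<string, string | undefined>): string {
  const base = (env.ZAI_BASE_URL?.trim() || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return `${base}/${path.replace(/^\/+/, "")}`;
}
```

Plain string joining is used instead of `new URL(path, base)` because relative resolution would drop the last path segment: `new URL("images/generations", "https://api.z.ai/api/paas/v4")` resolves to `https://api.z.ai/api/paas/images/generations`, losing `/v4`.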
- `zai_generate_image`: GLM-Image
- `zai_ocr`: GLM-OCR
- `zai_transcribe`: GLM-ASR-2512 (audio → text)
- `zai_generate_video`: CogVideoX / Vidu (async polling)
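The last item mentions async polling. Assuming the video endpoints follow a submit-then-poll pattern (the actual endpoints and status fields are not specified here), the polling side could look like this generic sketch; `JobStatus` and `pollUntilDone` are hypothetical names.

```typescript
// Hypothetical job status shape; real field names would come from the video API.
type JobStatus<T> =
  | { state: "pending" }
  | { state: "done"; result: T }
  | { state: "failed"; error: string };

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Call checkStatus until the job finishes, fails, or the attempt budget runs out.
async function pollUntilDone<T>(
  checkStatus: () => Promise<JobStatus<T>>,
  { intervalMs = 2000, maxAttempts = 60 } = {},
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.state === "done") return status.result;
    if (status.state === "failed") throw new Error(`job failed: ${status.error}`);
    // Only wait if another attempt will follow.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`job still pending after ${maxAttempts} attempts`);
}
```

A fixed interval with an attempt cap keeps the tool call bounded; exponential backoff would be a reasonable alternative for long-running video jobs.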
MIT © getpipher