zai-media

Z.AI media tools for Pi — generate images and OCR documents via GLM models, exposed as agent-invokable tools.

Pi's agent loop speaks one protocol: messages → chat/completions → streamed text. Non-chat models — image generators, OCR engines — use different endpoints with different request/response shapes, so they cannot be defined as models in models.json. zai-media wraps them as tools instead: the chat model reasons and decides when to call them; the tools call the right endpoint and hand the result back (including the image itself, inline).

What it gives you

| Tool | Z.AI endpoint | Model | Output |
|---|---|---|---|
| `zai_generate_image` | `/images/generations` | `glm-image` | `.png`/`.jpg` saved to `./zai-output/` and returned inline (the agent sees it) |
| `zai_ocr` | `/layout_parsing` | `glm-ocr` | `.md` saved to `./zai-output/` and content returned |

Because zai_generate_image returns an ImageContent block, a vision-capable chat model (e.g. glm-5v-turbo) can inspect its own generated output in the next turn.
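Here is a minimal sketch of what "returned inline" can look like: downloaded bytes are packaged as a base64 image block next to a short text note. The `ImageContent`/`TextContent` shapes below are assumptions modeled on Pi's content blocks (base64 `data` plus `mimeType`), and `toImageResult` is a hypothetical helper name. Check Pi's own type definitions before reusing it.

```typescript
// ASSUMED shapes, modeled on Pi's content blocks; verify against Pi's types.
interface ImageContent {
  type: "image";
  data: string; // base64-encoded image bytes
  mimeType: string; // e.g. "image/png"
}

interface TextContent {
  type: "text";
  text: string;
}

// Hypothetical helper: wrap image bytes so the tool result carries the
// picture itself (visible to a vision model), not just a file path.
function toImageResult(
  bytes: Uint8Array,
  mimeType: string,
  savedPath: string,
): { content: (ImageContent | TextContent)[]; details: { path: string } } {
  const image: ImageContent = {
    type: "image",
    data: Buffer.from(bytes).toString("base64"),
    mimeType,
  };
  const note: TextContent = { type: "text", text: `Saved image to ${savedPath}` };
  return { content: [image, note], details: { path: savedPath } };
}
```

Putting the image block first and the text note second mirrors the `{ content: [ImageContent, TextContent], details: { path, … } }` shape described under "How it works".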

Prerequisites

A Z.AI API key. Either:

Recommended — run /login in Pi and configure the zai provider. zai-media reuses that key automatically (no extra setup).

Or export it:

```bash
export ZAI_API_KEY=your_key_here   # get one at https://z.ai/manage-apikey/apikey-list
```

zai-media calls the general Z.AI endpoint (https://api.z.ai/api/paas/v4), which is billed by usage, separately from the flat-rate Coding Plan. Set ZAI_BASE_URL to use a different endpoint.

Install

```bash
# From npm (once published)
pi install npm:@getpipher/zai-media

# From git (always available)
pi install git:github.com/getpipher/zai-media

# Try without installing
pi -e git:github.com/getpipher/zai-media
```

Usage

Once installed, the tools are available to any chat model. Just ask:

```text
> Generate a clean SVG-style flat illustration of a rocket ship, 1024x1024
  → calls zai_generate_image → image saved + shown inline

> Read the table out of screenshots/pricing.png
  → calls zai_ocr → markdown saved + pasted back
```

Outputs land in ./zai-output/ (relative to your working directory).
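The path handling can be sketched as follows, assuming a timestamped file name and an extension chosen from the download's `Content-Type`. Both helpers (`extensionFor`, `outputPath`) and the naming scheme are hypothetical, not taken from the extension's source.

```typescript
import { mkdirSync } from "node:fs";
import { join, resolve } from "node:path";

// Hypothetical helper: map a Content-Type to the .png/.jpg extensions this
// README mentions; anything unrecognized falls back to .png (an assumption).
function extensionFor(contentType: string | null): string {
  const type = (contentType ?? "").split(";")[0].trim().toLowerCase();
  if (type === "image/jpeg" || type === "image/jpg") return ".jpg";
  return ".png";
}

// Resolve ./zai-output/ against the working directory (not the extension's
// install location), create it if needed, and return a timestamped path.
function outputPath(prefix: string, ext: string, cwd: string = process.cwd()): string {
  const dir = resolve(cwd, "zai-output");
  mkdirSync(dir, { recursive: true });
  return join(dir, `${prefix}-${Date.now()}${ext}`);
}
```

Resolving against `process.cwd()` is what makes outputs follow the directory Pi was launched from.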

How it works

```text
chat model (e.g. glm-5.2)
   │  reasons about the task
   │  decides image/OCR is needed
   ▼
zai_generate_image / zai_ocr   ← in-process fetch() to the Z.AI endpoint
   │  POST /images/generations { prompt }  →  url
   │  fetch bytes → write ./zai-output/x.png
   ▼
returns { content: [ImageContent, TextContent], details: { path, … } }
   │
   ▼
chat model sees the result (and can re-inspect images with a vision model)
```
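The image branch of the diagram above can be sketched with plain `fetch()` (Node 18+). This is not the extension's code. The request fields (`model`, `prompt`), the Bearer header, and the OpenAI-style `{ data: [{ url }] }` response are assumptions to check against Z.AI's API reference. `generateImage` and its injectable `fetchImpl` parameter are hypothetical names; the parameter lets the flow be tested without network access.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// Sketch of the zai_generate_image flow. ASSUMPTIONS: body fields, auth
// header and `{ data: [{ url }] }` response shape follow OpenAI conventions.
async function generateImage(
  prompt: string,
  opts: { apiKey: string; baseUrl: string; outDir: string },
  fetchImpl: FetchLike = fetch,
): Promise<{ path: string; bytes: Uint8Array; mimeType: string }> {
  // 1. POST /images/generations { prompt } -> url
  const res = await fetchImpl(`${opts.baseUrl}/images/generations`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "glm-image", prompt }),
  });
  if (!res.ok) throw new Error(`image generation failed: HTTP ${res.status}`);
  const json = (await res.json()) as { data?: { url?: string }[] };
  const url = json.data?.[0]?.url;
  if (!url) throw new Error("response did not contain an image URL");

  // 2. Download the image bytes from the returned URL.
  const img = await fetchImpl(url);
  if (!img.ok) throw new Error(`image download failed: HTTP ${img.status}`);
  const bytes = new Uint8Array(await img.arrayBuffer());
  const mimeType = img.headers.get("content-type")?.split(";")[0] ?? "image/png";

  // 3. Write the file into outDir and report where it went.
  await mkdir(opts.outDir, { recursive: true });
  const ext = mimeType === "image/jpeg" ? ".jpg" : ".png";
  const path = join(opts.outDir, `image-${Date.now()}${ext}`);
  await writeFile(path, bytes);
  return { path, bytes, mimeType };
}
```

The tool would then wrap `bytes` and `mimeType` in an image content block and put `path` in `details`.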

This is the idiomatic Pi pattern for "produce a non-text artifact": tools, not fake models.
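The same tool pattern applies to `zai_ocr`. The sketch below is heavily assumption-laden: it presumes `/layout_parsing` accepts `{ model, file }` with the file sent as a base64 data URL and returns Markdown in an `md_results` field. Verify all three against Z.AI's GLM-OCR documentation. `ocrToMarkdown`, `mimeFor` and the `fetchImpl` parameter are hypothetical.

```typescript
import { readFile, mkdir, writeFile } from "node:fs/promises";
import { basename, extname, join } from "node:path";

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// Hypothetical: guess a MIME type from the file extension for the data URL.
function mimeFor(file: string): string {
  const ext = extname(file).toLowerCase();
  if (ext === ".jpg" || ext === ".jpeg") return "image/jpeg";
  if (ext === ".pdf") return "application/pdf";
  return "image/png";
}

// Sketch of zai_ocr. ASSUMPTIONS: request body { model, file: <data URL> }
// and Markdown returned in `md_results`.
async function ocrToMarkdown(
  file: string,
  opts: { apiKey: string; baseUrl: string; outDir: string },
  fetchImpl: FetchLike = fetch,
): Promise<{ path: string; markdown: string }> {
  const bytes = await readFile(file);
  const dataUrl = `data:${mimeFor(file)};base64,${bytes.toString("base64")}`;
  const res = await fetchImpl(`${opts.baseUrl}/layout_parsing`, {
    method: "POST",
    headers: { Authorization: `Bearer ${opts.apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model: "glm-ocr", file: dataUrl }),
  });
  if (!res.ok) throw new Error(`OCR failed: HTTP ${res.status}`);
  const json = (await res.json()) as { md_results?: string };
  const markdown = json.md_results ?? "";

  // Save <input name>.md into outDir, then hand the text back to the model.
  await mkdir(opts.outDir, { recursive: true });
  const path = join(opts.outDir, `${basename(file, extname(file))}.md`);
  await writeFile(path, markdown, "utf8");
  return { path, markdown };
}
```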

Configuration

| Env var | Default | Purpose |
|---|---|---|
| `ZAI_API_KEY` | — | API key (fallback if no `/login zai` key) |
| `ZAI_BASE_URL` | `https://api.z.ai/api/paas/v4` | Override the endpoint (e.g. China: `https://open.bigmodel.cn/api/paas/v4`) |
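The precedence in the table can be expressed as a small resolver. This is a sketch: `resolveConfig` is a hypothetical name, and `loginKey` stands in for the key Pi stored via `/login`. How the extension actually reads that key is not shown here.

```typescript
const DEFAULT_BASE_URL = "https://api.z.ai/api/paas/v4";

interface ZaiConfig {
  apiKey: string;
  baseUrl: string;
}

// Hypothetical resolver mirroring the configuration table.
function resolveConfig(
  env: Record<string, string | undefined>,
  loginKey?: string,
): ZaiConfig {
  // The /login key wins; ZAI_API_KEY is only the fallback.
  const apiKey = loginKey || env.ZAI_API_KEY;
  if (!apiKey) {
    throw new Error("No Z.AI API key: run /login in Pi or set ZAI_API_KEY");
  }
  // Trim trailing slashes so `${baseUrl}/images/generations` stays well-formed.
  const baseUrl = (env.ZAI_BASE_URL || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```

A call such as `resolveConfig(process.env, loginKey)` would then feed the endpoint calls.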

Roadmap

- `zai_generate_image`: GLM-Image
- `zai_ocr`: GLM-OCR
- `zai_transcribe`: GLM-ASR-2512 (audio → text)
- `zai_generate_video`: CogVideoX / Vidu (async polling)
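Video generation is listed as asynchronous, so the tool would need to poll for a result instead of waiting on one request. Below is a generic polling helper of that kind. Everything here is hypothetical: `pollUntilDone`, the `PollStatus` shape and the `check` callback are placeholders, not Z.AI's async-result API.

```typescript
// Hypothetical status shape returned by a caller-supplied check function.
type PollStatus<T> = { done: false } | { done: true; value: T };

// Call `check` every `intervalMs` until it reports done or `timeoutMs` elapses.
async function pollUntilDone<T>(
  check: () => Promise<PollStatus<T>>,
  { intervalMs = 2000, timeoutMs = 5 * 60 * 1000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await check();
    if (status.done) return status.value;
    // Give up if the next wait would overshoot the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`task did not finish within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```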
