getpipher/zai-media

zai-media

Z.AI media tools for Pi — generate images and OCR documents via GLM models, exposed as agent-invokable tools.

Pi's agent loop speaks one protocol: messages → chat/completions → streamed text. Non-chat models such as image generators and OCR engines use different endpoints with different request/response shapes, so they cannot be defined as models in models.json. zai-media wraps them as tools instead: the chat model reasons and decides when to call them; the tools call the right endpoint and hand the result back (including the image itself, inline).

What it gives you

| Tool | Z.AI endpoint | Model | Output |
| --- | --- | --- | --- |
| zai_generate_image | /images/generations | glm-image | .png/.jpg saved to ./zai-output/ + returned inline (the agent sees it) |
| zai_ocr | /layout_parsing | glm-ocr | .md saved to ./zai-output/ + content returned |

Because zai_generate_image returns an ImageContent block, a vision-capable chat model (e.g. glm-5v-turbo) can inspect its own generated output in the next turn.
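
As a rough illustration of what "returns an ImageContent block" could look like, here is a minimal sketch. The `ImageContent` and `TextContent` shapes below are assumptions made for this example (Pi's real type definitions may differ), and `buildImageResult` is a hypothetical helper, not part of zai-media's documented API.

```typescript
// Hypothetical content-block shapes; Pi's actual types may differ.
interface ImageContent { type: "image"; data: string; mimeType: string }
interface TextContent { type: "text"; text: string }

interface ImageToolResult {
  content: [ImageContent, TextContent];
  details: { path: string };
}

// Base64-encode raw bytes without relying on Node's Buffer.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((b) => { binary += String.fromCharCode(b); });
  return btoa(binary);
}

// Wrap the image bytes and the saved path into a tool result
// that carries both the image itself and a short text note.
function buildImageResult(
  bytes: Uint8Array,
  path: string,
  mimeType: string = "image/png",
): ImageToolResult {
  return {
    content: [
      { type: "image", data: toBase64(bytes), mimeType },
      { type: "text", text: `Saved image to ${path}` },
    ],
    details: { path },
  };
}
```

Returning the image and the text note together means a vision-capable model can look at the picture, while a text-only model still learns where the file was saved.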

Prerequisites

A Z.AI API key. Either:

  1. Recommended — run /login in Pi and configure the zai provider. zai-media reuses that key automatically (no extra setup).
  2. Or export it:
    export ZAI_API_KEY=your_key_here   # get one at https://z.ai/manage-apikey/apikey-list

zai-media calls the general Z.AI endpoint (https://api.z.ai/api/paas/v4), which is billed by usage (per token) and is separate from the flat-rate Coding Plan. Set ZAI_BASE_URL to override the endpoint.

Install

# From npm (once published)
pi install npm:@getpipher/zai-media

# From git (always available)
pi install git:github.com/getpipher/zai-media

# Try without installing
pi -e git:github.com/getpipher/zai-media

Usage

Once installed, the tools are available to any chat model. Just ask:

> Generate a clean SVG-style flat illustration of a rocket ship, 1024x1024
  → calls zai_generate_image → image saved + shown inline

> Read the table out of screenshots/pricing.png
  → calls zai_ocr → markdown saved + pasted back

Outputs land in ./zai-output/ (relative to your working directory).

How it works

chat model (e.g. glm-5.2)
   │  reasons about the task
   │  decides image/OCR is needed
   ▼
zai_generate_image / zai_ocr   ← in-process fetch() to the Z.AI endpoint
   │  POST /images/generations { prompt }  →  url
   │  fetch bytes → write ./zai-output/x.png
   ▼
returns { content: [ImageContent, TextContent], details: { path, … } }
   │
   ▼
chat model sees the result (and can re-inspect images with a vision model)

This is the idiomatic Pi pattern for "produce a non-text artifact": tools, not fake models.
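
The image path of the flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not zai-media's actual source: the request body `{ model, prompt }`, the response shape `{ data: [{ url }] }`, the Bearer auth header, the `image-<timestamp>` file name, and the names `generateImage`, `GenerateOptions`, and `FetchLike` are all hypothetical.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Minimal structural type so a stub can stand in for the global fetch.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

interface GenerateOptions {
  apiKey: string;
  baseUrl: string;
  outDir: string;
  fetchImpl: FetchLike;
}

async function generateImage(
  prompt: string,
  opts: GenerateOptions,
): Promise<{ path: string; bytes: Uint8Array }> {
  // Step 1: POST the prompt and read back an image URL (shapes are assumed).
  const res = await opts.fetchImpl(`${opts.baseUrl}/images/generations`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "glm-image", prompt }),
  });
  if (!res.ok) throw new Error(`Image request failed with HTTP ${res.status}`);
  const payload = (await res.json()) as { data?: { url?: string }[] };
  const url = payload.data?.[0]?.url;
  if (!url) throw new Error("Response did not contain an image URL");

  // Step 2: download the bytes behind that URL.
  const img = await opts.fetchImpl(url);
  if (!img.ok) throw new Error(`Image download failed with HTTP ${img.status}`);
  const bytes = new Uint8Array(await img.arrayBuffer());

  // Step 3: write the file into the output directory, creating it if needed.
  const ext = /\.jpe?g$/i.test(new URL(url).pathname) ? "jpg" : "png";
  await mkdir(opts.outDir, { recursive: true });
  const path = join(opts.outDir, `image-${Date.now()}.${ext}`);
  await writeFile(path, bytes);
  return { path, bytes };
}
```

In real use one would pass `fetchImpl: fetch` and `outDir: "./zai-output"`; injecting the fetch function keeps the sketch testable without network access.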

Configuration

| Env var | Default | Purpose |
| --- | --- | --- |
| ZAI_API_KEY | (none) | API key (fallback if no /login zai key) |
| ZAI_BASE_URL | https://api.z.ai/api/paas/v4 | Override the endpoint (e.g. China: https://open.bigmodel.cn/api/paas/v4) |
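
One way the key and endpoint lookup described here might be implemented is sketched below. The function name `resolveZaiConfig` and the `loginKey` parameter (standing in for whatever key Pi's /login stored for the zai provider) are hypothetical; only the two environment variable names and the default URL come from this README.

```typescript
const DEFAULT_BASE_URL = "https://api.z.ai/api/paas/v4";

interface ZaiConfig {
  apiKey: string;
  baseUrl: string;
}

// `loginKey` is a hypothetical stand-in for the key saved by /login.
// It takes precedence; ZAI_API_KEY is only the fallback.
function resolveZaiConfig(
  env: Record<string, string | undefined>,
  loginKey?: string,
): ZaiConfig {
  const apiKey = loginKey || env.ZAI_API_KEY;
  if (!apiKey) {
    throw new Error("No Z.AI API key: run /login in Pi or set ZAI_API_KEY");
  }
  // Strip trailing slashes so paths like /images/generations append cleanly.
  const baseUrl = (env.ZAI_BASE_URL || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```

A caller would typically pass `process.env` as the first argument; taking the environment as a parameter keeps the lookup easy to test.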

Roadmap

  • zai_generate_image — GLM-Image
  • zai_ocr — GLM-OCR
  • zai_transcribe — GLM-ASR-2512 (audio → text)
  • zai_generate_video — CogVideoX / Vidu (async polling)

License

MIT © getpipher
