Skip to content

Daily-AC/omnireach

Repository files navigation

omnireach

English · 中文

Give your AI agent the senses of a logged-in human — across the entire internet.

omnireach lets your agent search and read 15+ platforms — including the login-walled verticals (Twitter · Reddit · 小红书 · 微信 · 抖音 · B站 · TikTok) that every agent web search is blind to — through your own browser session. One uniform interface: omnireach search returns normalized metadata + URL, omnireach fetch returns clean markdown. Installed as a Claude Code skill, so your agent just knows how to use it next session.

demo

License Python


Install — just tell your agent

"Install omnireach"

Your agent runs one command. You copy nothing.

Manual fallback — paste into your terminal if you prefer:

curl -fsSL https://raw.githubusercontent.com/Daily-AC/omnireach/main/install.sh | sh

This installs the CLI and registers the Claude Code skill (auto-discovered next session). Zero-config sources — HackerNews, RSS, 微信 (Sogou path), B站 — work immediately. Other sources open up after a one-time setup step each.


What you get

1. Reach the unreachable. Twitter, Reddit, 小红书, 微信公众号, 抖音, B站, TikTok — login-walled vertical platforms that no agent web search reaches. omnireach reads them through your own logged-in browser session (via OpenCLI Chrome bridge), so the results your agent sees are the same results a human would see when signed in.

2. One uniform contract. omnireach search → normalized metadata + URL, same shape across every source. omnireach fetch → clean markdown for any URL, with host-aware routing (mp.weixin.qq.com goes through your logged-in Chrome session to bypass CAPTCHA walls; all other hosts go through Crawl4AI → Jina Reader fallback). Your agent learns one interface, not 15 APIs.

3. Works even when WebSearch doesn't. On proxy / relay-station / Bedrock / Vertex-Claude-3.x setups where the built-in WebSearch server tool isn't available, omnireach gives search back — it runs entirely on the client side, bypassing both gate layers.


Example

# Search a login-walled vertical platform
omnireach search --on xiaohongshu --json "Claude Code 使用技巧"

# Fetch a WeChat article — login-walled, via your session
omnireach fetch --json "https://mp.weixin.qq.com/s/<token>"

# Full pipeline: search → fetch all results
omnireach search --on wechat --json "claude 4.7" \
  | jq -r '.results[].url' \
  | xargs -I{} omnireach fetch --json {}

Commands

Command What it does
omnireach search "<query>" Search (SERP: metadata + URL)
omnireach search --on twitter,reddit "..." Target specific sources
omnireach search --mode quick "..." Quick mode — HN only
omnireach search --mode deep "..." Deep mode — all ready sources
omnireach search --json "..." Explicit JSON output
omnireach fetch <url> (v0.10, v0.10.1) URL → full markdownmp.weixin.qq.com uses OpenCLI logged-in Chrome (v0.10.1+), other hosts use crwl → jina fallback
omnireach fetch <url> --backend jina Force Jina Reader SaaS (zero local deps)
omnireach fetch <url> --backend opencli Force OpenCLI wechat logged-in path (v0.10.1+)
omnireach init Write default ~/.omnireach/preferences.toml
omnireach sources List all sources + tier status
omnireach setup <source> Guided setup for a 🟡 / 🔴 source
omnireach doctor Health check (sources / fetch backends / wechat backends)

Sources

Source Tier Dependency Notes
hackernews ✅ ready none Direct Algolia, zero-config
youtube ✅ ready yt-dlp (pip install) omnireach setup youtube
github ✅ ready gh CLI + gh auth login omnireach setup github
rss ✅ ready built-in feedparser query must be a URL
reddit 🟡 one_step rdt-cli + rdt login omnireach setup reddit
twitter 🔴 heavy OpenCLI + Chrome extension v0.3 path
xiaohongshu 🔴 heavy OpenCLI + Chrome extension v0.3 path
tiktok 🔴 heavy OpenCLI + Chrome extension TikTok international (v0.7)
douyin 🔴 heavy OpenCLI fork + Chrome extension 抖音 (v0.7.2, Daily-AC/OpenCLI fork)
💎 tavily booster env TAVILY_API_KEY paid (v0.4)
💎 brave booster env BRAVE_API_KEY paid (v0.4)
💎 perplexity booster env PERPLEXITY_API_KEY paid (v0.4)
💎 exa booster env EXA_API_KEY paid web search (v0.5)
wechat ✅ ready none (optional EXA_API_KEY for enhancement) WeChat Official Accounts — search via Sogou free path; EXA_API_KEY optionally enables semantic enhancement; v0.10.1+ omnireach fetch <wechat-url> auto-routes through OpenCLI logged-in Chrome
bilibili ✅ ready none (optional EXA_API_KEY for enhancement) B站 — v0.9+ defaults to B站 official search API; EXA_API_KEY optionally enables semantic enhancement

抖音 / douyin (v0.7.2): Uses omnireach setup douyin, installs Daily-AC/OpenCLI fork (upstream PR jackwener/OpenCLI#1759 pending review; will switch back once merged). Requires signing in to www.douyin.com in Chrome. engagement.likes has real data (DOM extraction); plays/comments/shares are not exposed in search cards, normalized to null.

WeChat fetch (v0.10.1): omnireach fetch <mp.weixin.qq.com URL> uses the same Daily-AC/OpenCLI fork (weixin download --stdout, upstream PR jackwener/OpenCLI#1770 pending). Requires having opened any mp.weixin.qq.com article in Chrome (no explicit login step; browser cookies suffice). See How to get full text below.


Agent calling convention

When calling omnireach from an agent, always request JSON explicitly to prevent rich-table output from wrapping fields you need:

# Option 1: explicit flag per command
omnireach search --json "..."
omnireach fetch  --json "<url>"

# Option 2: env var (recommended for agent harnesses)
export OMNIREACH_FORCE_JSON=1

The not isatty() auto-JSON added in v0.9.2 covers most cases, but some agent terminals (e.g. Antigravity) allocate a real PTY to subprocesses making isatty()=True. Explicit --json or the env var always works.

Full skill contract: .claude-plugin/skills/omnireach/SKILL.md


Who specifically needs this? (The WebSearch two-layer gate)

Claude Code's WebSearch is a server-side server tool (web_search_20250305). Whether it actually works depends on two independent gates:

Gate 1 — client-side (WebSearchTool.isEnabled() checks getAPIProvider()):

  • Default firstParty (no CLAUDE_CODE_USE_* env var set, including the case where only ANTHROPIC_BASE_URL is changed) → tool registered
  • Explicit CLAUDE_CODE_USE_BEDROCK=1 → tool off
  • Explicit CLAUDE_CODE_USE_VERTEX=1 + Claude 4+ → registered; older Claude → off
  • Explicit CLAUDE_CODE_USE_FOUNDRY=1 → registered

Gate 2 — upstream server tool implementation: After the client sends the tool schema, the upstream API must have specifically implemented the web_search_20250305 server tool (receive tool call → run search → return results to client):

  • Real Anthropic API (api.anthropic.com): ✓ native
  • Vertex / Foundry: ✓ (each backend implements it)
  • Third-party providers that specifically support Claude Code (e.g. DeepSeek's Anthropic-compat endpoint): ✓ they implemented the server tool handling, routing to their own search backend
  • OpenAI-compatible relay stations (cliproxy / anyrouter etc. that simply translate Claude API → OpenAI Chat Completions): ✗ don't recognize server tool semantics
  • Self-hosted gateways / most proxies: ✗ generally not implemented

Gate 2 looks at "did the upstream specifically implement Claude Code server tool support" — not at "is this real Anthropic." Providers who specifically support Claude Code implement it; pure API translators don't. (Data point: DeepSeek is not real Anthropic but WebSearch works because they explicitly implemented it.)

Root causes differ by scenario:

  • DeepSeek and other Claude-Code-specific third parties: both gates pass, WebSearch ✓ — omnireach value here is supplementing vertical sources (Twitter/小红书/微信), not patching search
  • OpenAI-compatible relay station users: client registers the tool, upstream doesn't handle the server tool call → failure
  • Explicit Bedrock users / Vertex Claude 3.x users: client-side isEnabled is off before the request even leaves

Even with WebSearch working, it can't reach Twitter threads, Reddit comment sections, 小红书 posts, WeChat articles, 抖音, B站 technical videos — these login-walled vertical platforms are blind spots for every hosted web search. omnireach's three value layers:

  1. Patch for users whose client-side gate is off
  2. Patch for users whose upstream doesn't implement the server tool
  3. Supplement for users whose WebSearch works fine but can't reach vertical platforms
Naming & architecture (search / fetch / parse)

omnireach = omni (all) + reach (reach). Full "reach the whole web" semantics require three capability layers, all living as sibling binaries in this repo (analogous to cargo / rustc / rustfmt in the Rust monorepo — no sister repo):

Layer Implementation Responsibility Status
search omnireach search subcommand Locate across the web — returns metadata + URL, does not fetch content ✅ v0.7+ in use
fetch omnireach fetch subcommand Given a URL, fetch full-text markdown — host-aware: mp.weixin.qq.com via OpenCLI logged-in Chrome, others via Crawl4AIJina Reader ✅ v0.10+ (wechat path v0.10.1+)
parse (not yet implemented, future addition to this repo) Video/audio content parsing (subtitles/STT/frame-by-frame) 🔜 not started

v0.10+ has omnireach covering search + fetch (subcommand form). Video parsing still uses external tools (yt-dlp / whisper); parse binary will be added to this repo when there are real user requests (YAGNI — no sister repo).

This split mirrors Anthropic's own WebSearch + WebFetch split: each layer does one thing well, search isn't slowed by parsing tasks, and agent callers can compose freely.

Note: renaming omnireach to omnisearch was considered, but v0.10 landing omnireach fetch resolved the question — omnireach search (reach = find) + omnireach fetch (reach = retrieve) are both natural sub-actions of "reach," so the umbrella name fits. No rename (decided 2026-05-27).

Upgrade

omnireach is in alpha with frequent releases. Check and upgrade:

omnireach check-update                                                            # compare against GitHub Releases
uv tool install --force git+https://github.com/Daily-AC/omnireach.git             # pull latest

⚠️ uv tool upgrade omnireach will not pull new commits (uv locks git-URL-installed tools to the commit at install time). --force reinstall fetches the latest.

Platform support
Platform Status Notes
macOS ✅ Primary dev platform All sources tested (HN/RSS/youtube/github/reddit/twitter/xhs/tiktok/douyin + 4 boosters + wechat/bilibili); omnireach fetch all three backends (crwl/jina/opencli) verified
Linux 🟡 best-effort Should work; setup flow doesn't auto-detect apt/pacman
WSL2 🟡 best-effort Same as Linux
Windows (native PowerShell) 🟡 experimental (v0.6.3+) macOS assumptions removed: secrets.env no longer calls POSIX chmod; preferences edit falls back to notepad; setup github prompts winget install GitHub.cli; OpenCLI sources (twitter/xhs) theoretically cross-platform but untested. File an issue if you hit problems.

Run omnireach doctor to print a platform / Python version line at the top — useful to include when filing issues.

💎 Paid boosters (v0.4)

omnireach is free by default. Configuring a paid API key improves result quality:

omnireach setup tavily       # guided key entry + writes to ~/.omnireach/secrets.env
omnireach setup brave
omnireach setup perplexity
omnireach setup exa          # added v0.5 (replaces old `web` source)

Keys are detected automatically when set. Results carry cost="paid" metadata; TTY output shows a 💎 prefix for auditability.

To disable: edit ~/.omnireach/preferences.toml and set [boosters] auto_enable = false.

⚙️ User preferences (v0.4)

~/.omnireach/preferences.toml can configure default sources, language, output format, and source_trust overrides.

omnireach preferences show     # view current config
omnireach preferences edit     # open in $EDITOR
omnireach preferences reset    # reset (backs up original to .bak)
omnireach preferences path     # print file location
How to get full text (v0.8)

omnireach is the search layer — the content field is uniformly capped at ≤ 500 characters + . This validator runs in the contract layer (SearchResult.content pydantic field_validator) for all sources. This is intentional — full text belongs to the omnireach fetch layer (see Naming & architecture above).

For sources whose upstream returns full text (wechat / exa / tavily) or long threads (twitter), the complete raw payload is preserved in result.raw, so agents that need full text can retrieve it:

# Python (call CLI + parse JSON envelope)
import json
import subprocess

out = subprocess.run(
    ["omnireach", "search", "--json", "--on", "wechat", "claude 4.7"],
    check=True, capture_output=True, text=True,
)
env = json.loads(out.stdout)
snippet = env["results"][0]["content"]        # 500 chars + "…"
full    = env["results"][0]["raw"]["text"]    # Exa / wechat / twitter full text
# tavily uses raw["content"]
# CLI + jq
omnireach search --json --on tavily "claude 4.7" | \
  jq '.results[] | {title, snippet: .content, full: .raw.content}'

Field mapping (verified by real E2E in v0.8.1 + v0.9):

Source result.adapter result.content result.raw[...] for full text / raw payload
wechat (default Sogou) sogou snippet (Sogou SERP summary) raw["item_html"] (full Sogou card HTML) — for actual full text you need mp.weixin.qq.com
wechat (EXA_API_KEY enabled) exa-api snippet raw["text"] (Exa full text)
bilibili (default B站 API) bilibili-api video description (≤500) raw entire video item dict, includes desc/cover/aid/bvid
bilibili (EXA_API_KEY enabled) exa-api snippet raw["text"]
exa exa-api snippet raw["text"]
tavily tavily-api snippet raw["content"]
twitter opencli snippet (long threads trigger truncation) raw["text"]
xiaohongshu opencli empty — OpenCLI search results don't include body n/a (no full text at search layer)

The specific key names in raw[...] match upstream API schema directly — if upstream changes schema, this table needs updating. When uncertain, inspect result.raw.keys() first.

Other sources (HN / GitHub / RSS / YouTube / etc.) generally have content under 500 chars, so the validator is usually a no-op — but it's not guaranteed. For full-text fallback, check result.raw for a matching key.

For actual full text → omnireach fetch <url> (v0.10+)

omnireach fetch <url> is the official search → full-text pipeline, with host-aware backend routing:

# Any webpage → crwl (local Crawl4AI) preferred, jina (r.jina.ai SaaS) fallback
omnireach fetch https://example.com/article --json

# WeChat Official Account article → automatically uses OpenCLI logged-in Chrome (v0.10.1+), bypasses CAPTCHA
omnireach fetch https://mp.weixin.qq.com/s/<token> --json

# search → fetch pipeline
omnireach search --on wechat "claude 4.7" --json \
  | jq -r '.results[].url' \
  | xargs -I{} omnireach fetch --json {}

Backend matrix:

URL host --backend auto routes to Notes
mp.weixin.qq.com opencli (logged-in cookie strategy) Install Daily-AC/OpenCLI fork with weixin download --stdout flag (v0.10.1 commit fe28823+); direct crwl / jina are blocked by WeChat's "environment anomaly" CAPTCHA
Other hosts crwl → jina Crawl4AI preferred; falls back to Jina Reader SaaS if not installed or fails

Explicit --backend overrides auto:

Flag Behavior
--backend crwl Force Crawl4AI local (66K ⭐ Apache-2.0, built-in Cloudflare/Akamai/DataDome bypass)
--backend jina Force Jina Reader SaaS — zero local deps, large free tier
--backend opencli Force OpenCLI wechat logged-in path (only meaningful for mp.weixin.qq.com)

v0.10.1 adds CAPTCHA heuristic fallback for all backends: when keywords like 环境异常 / 完成验证后即可继续访问 / Cloudflare / Just a moment are detected, the envelope errors[] gains a captcha_suspected: ... entry; the markdown field is preserved (graceful degrade — agent reads errors to decide whether to trust the content). omnireach doctor's wechat_backends section surfaces whether OpenCLI + --stdout are ready.

Design

See docs/superpowers/specs/2026-05-25-omnireach-design.md.

License

MIT — see LICENSE.

About

Give your AI agent the senses of a logged-in human across the whole internet — search & read 15+ platforms (Twitter/Reddit/小红书/微信/抖音/B站/TikTok/YouTube/GitHub) no web search reaches. Install by telling your agent.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors