AI Task Hub

Formerly skill-hub-gateway.

Public package boundary:

Only orchestrates portal.skill.execute, portal.skill.poll, portal.skill.presentation, portal.account.balance, and portal.account.ledger.
Does not exchange api_key or userToken inside this package.
Does not handle recharge or payment flows inside this package.
Prefers attachment URLs, and when host runtime explicitly exposes attachment bytes for the current request, forwards only that explicit attachment material through the public bridge before execution.
Third-party agent entry uses POST /agent/public-bridge/invoke.

User-Facing Response Policy

When users upload images, audio, documents, or video and ask for a capability, prefer executing immediately only when the host runtime has already supplied an explicit attachment object or explicit attachment bytes for that request.
Do not explain image_url, attachment.url, storage URLs, bridge layers, host uploads, input normalization, or controlled media domain details to end users unless they explicitly ask for technical debugging.
Do not ask end users to provide manual URLs, JSON field names, or upload-chain instructions; those are internal host-to-skill mechanics.
If the runtime supports attachment handling, limit processing to the explicit attachment object supplied for the current request and keep the upload/URL handoff scoped to execute/poll/presentation for that same request.
Only when execution actually fails and the user must intervene should you mention missing processable files, incomplete authorization, or retry guidance, using user-oriented language without exposing internal layering.

Chinese documentation: SKILL.zh-CN.md

When to Use This Skill

Use this skill when the user asks to:

detect faces, human presence, body keypoints, image tags, or facial emotion from images
generate person/product segmentation, mask, cutout, or matting outputs that this public package explicitly exposes
transcribe uploaded audio into text (speech to text, audio transcription)
generate speech from text input (text to speech, voice generation)
convert uploaded files into markdown (document to markdown)
start async jobs and check status later (poll, check job status)
fetch rendered visual outputs such as overlay, mask, and cutout
run embedding or reranking tasks for retrieval workflows
check current account points balance or recent points ledger rows

Common Requests

Example requests that should trigger this skill:

"Detect faces in this image and return bounding boxes."
"Tag this image and summarize the main objects."
"Remove the background from this product photo."
"Create a clean cutout from this portrait image."
"Transcribe this meeting audio into text."
"Generate speech from this paragraph."
"Convert this PDF file into markdown."
"Start this job now and let me poll the run status later."
"Fetch overlay and mask files for run_456."
"Generate embeddings for this text list and rerank the candidates."
"Check my current points balance."
"Show my recent points ledger from 2026-03-01 to 2026-03-15."

Search-Friendly Capability Aliases

vision aliases: face detection, human detection, person detection, image tagging
background aliases: remove background, background removal, cutout, matting, product-cutout
asr aliases: speech to text, audio transcription, transcribe audio
tts aliases: text to speech, voice generation, speech synthesis
markdown_convert aliases: document to markdown, file to markdown, markdown conversion
poll aliases: check job status, poll long-running task, async run status
presentation aliases: rendered output, overlay, mask, cutout files
account.balance aliases: points balance, credits balance, remaining points
account.ledger aliases: points ledger, credits history, points statement
embeddings/reranker aliases: vectorization, semantic vectors, relevance reranking

Public discovery boundary for visual capabilities:

This published skill only advertises visual capabilities whose backing services are currently enabled for public delivery.
Disabled or internally retained legacy routes are intentionally omitted from discovery references and capability manifests even if related backend code still exists.

Runtime Contract

Default API base URL: https://gateway-api.binaryworks.app Published package policy: outbound base URL is locked to the default API base URL to reduce token exfiltration risk.

Action to endpoint mapping:

portal.skill.execute -> POST /agent/skill/execute
portal.skill.poll -> GET /agent/skill/runs/:run_id
portal.skill.presentation -> GET /agent/skill/runs/:run_id/presentation
portal.account.balance -> GET /agent/skill/account/balance
portal.account.ledger -> GET /agent/skill/account/ledger

Install Mechanism & Runtime Requirements

This skill is instruction-first and does not define a remote installer flow.
Runtime execution is limited to bundled local scripts under scripts/*.mjs.
Required runtime binary is node (as declared in metadata.openclaw.requires.bins).
No remote download-to-exec install chain is used (curl|wget ... | sh|bash|python|node is not part of this package).

Auth Contract

Third-party agent entry mode (recommended):

Use POST /agent/public-bridge/invoke as the first entrypoint for OpenClaw / Codex / Claude style runtimes.
Do not require end users to provide any credential.
With TRIAL_ENABLED and available trial points, first-time calls may proceed without browser authorization.
On first use without an existing binding, gateway can proceed without browser authorization when TRIAL_ENABLED and trial points are available; AUTHORIZATION_REQUIRED is returned only for conditional upgrade paths (for example trial exhausted or trial-disabled rollback).
The returned authorization_url may include gateway_api_base_url; preserve it when completing browser authorization so /agent-auth/complete is posted back to the same API environment that created the auth session.
Host/runtime should show authorization_url to the user, persist entry_user_key, then retry the same action with that same entry_user_key.
If gateway later returns AUTHORIZATION_REQUIRED with details.likely_cause=ENTRY_USER_KEY_NOT_REUSED, details.recovery_action=REUSE_ENTRY_USER_KEY, and details.reauthorization_required=false, host should restore the previously persisted entry_user_key and retry without sending the user through browser authorization again.

Identifier format constraints used by gateway auth:

agent_uid must match ^agent_[a-z0-9][a-z0-9_-]{5,63}$.
conversation_id must match ^[A-Za-z0-9._:-]{8,128}$.
In deployed bridge mode, host may pass its own stable runtime agent identifier and the gateway bridge will canonicalize it server-side.

Host-side token bridge (outside published package):

To keep this package compliant and low-privilege, this published runtime does not issue or accept caller-managed task tokens.
Preferred deployed bridge endpoint for third-party agent entry: POST /agent/public-bridge/invoke.
Trusted host runtime that can safely hold bridge assertion secret may continue to use POST /agent/skill/bridge/invoke.
These bridge endpoints are served by gateway runtime, not bundled into this published package, and do not require caller-managed credentials.
Bridge request body should include action, agent_uid, conversation_id, and optional payload.
conversation_id should be a host-generated opaque session/install identifier, not a public chat ID, raw thread ID, or PII.
Public bridge should resolve a stable external user binding when available; if the binding is missing and trial conditions are satisfied, first-time onboarding can continue without browser authorization, while conditional upgrade paths return a host-owned authorization URL plus entry_user_key.
Cross-conversation account continuity requires reusing the same entry_user_key; public bridge intentionally does not accept owner overrides.
Gateway bridge will canonicalize agent_uid, repair binding when missing, issue short-lived internal task token, and run the action server-side.
portal.skill.execute through public bridge is write-capable and should send options.confirm_write=true after user confirmation; otherwise gateway may return ACTION_CONFIRMATION_REQUIRED.
base_url, gateway_api_key, api_key, user_token, agent_task_token, owner_uid_hint, and install_channel overrides are rejected by the deployed bridge endpoint.
Recommended host behavior: persist entry_user_key, normalize agent_uid, and re-run the same bridge action after authorization completes.

Host integration modes:

interactive (recommended): host calls POST /agent/public-bridge/invoke, surfaces the returned host-owned authorization URL to the user when needed, persists returned entry_user_key, and retries after authorization completes.
trusted host bridge (secondary): a trusted backend you control may call POST /agent/skill/bridge/invoke with its own bridge assertion secret.
Published skill package itself does not open browser, persist credentials, or perform OAuth/token exchange flows.
The authorization URL above is owned by deployed gateway/admin-web pages, not by this skill package runtime.

Agent Invocation Quickstart

Preferred invocation mode for third-party agent entry (recommended):

Deployed bridge API:

{
  "entry_host": "openclaw",
  "action": "portal.account.balance",
  "agent_uid": "support_assistant",
  "conversation_id": "host_session_20260316_opaque_001",
  "payload": {}
}

Send that body to POST /agent/public-bridge/invoke.
This is the recommended production entrypoint for third-party agent-friendly integration.
With TRIAL_ENABLED and available trial points, first-time onboarding can complete without browser authorization.
On first use, gateway may return AUTHORIZATION_REQUIRED with authorization_url and entry_user_key only when conditional authorization upgrade is required (for example trial exhausted).
Persist entry_user_key and retry with the same value after user authorization completes.
Preserve any gateway_api_base_url embedded in the authorization flow so the completion request lands on the same gateway API environment.
agent_uid should be your host-defined stable runtime agent identifier.
conversation_id should be your host-generated opaque session/install identifier; it is not tied to Telegram or any single tool and does not determine account ownership.
Use the same entry_user_key across conversations when those conversations should share one account.

Trusted host runtime secondary mode:

If you control the upstream backend and it can safely hold bridge assertion secret, use POST /agent/skill/bridge/invoke.
This path is for trusted host runtime only, not OpenClaw / Codex / Claude style third-party entry.

Action payload templates (same for public bridge and trusted host bridge mode):

portal.skill.execute

{
  "capability": "human_detect",
  "input": { "image_url": "https://files.example.com/demo.png" },
  "request_id": "optional_request_id"
}

portal.skill.poll

{ "run_id": "run_123" }

portal.skill.presentation

{ "run_id": "run_123", "channel": "web", "include_files": true }

portal.account.balance

{}

portal.account.ledger

{ "date_from": "2026-03-01", "date_to": "2026-03-15" }

Agent-side decision flow:

Always prefer POST /agent/public-bridge/invoke for third-party agent entry so first-time authorization can return authorization_url plus entry_user_key.
New task: call portal.skill.execute, then poll with portal.skill.poll until data.terminal=true, then fetch portal.skill.presentation.
Account query: call portal.account.balance or portal.account.ledger directly.
Keep conversation_id as session context only; do not use it as the account key.
For cross-conversation continuity in third-party entry mode, persist and reuse the same entry_user_key; do not pass owner_uid_hint to the public bridge endpoint.
If AUTHORIZATION_REQUIRED is returned, show authorization_url, persist entry_user_key, then retry the same action after user authorization completes.
If AUTHORIZATION_REQUIRED includes details.likely_cause=ENTRY_USER_KEY_NOT_REUSED, do not open a new auth flow yet; first restore the previously persisted entry_user_key and retry the same bridge call.
Treat details.reauthorization_required=false as a recovery hint that browser re-login is unnecessary for this failure mode.
If AUTH_UNAUTHORIZED + agent_uid claim format is invalid: use canonical agent_uid (agent_...) instead of a short host alias (assistant, planner).
If SYSTEM_NOT_FOUND + agent binding not found: restart the same bridge flow once and let gateway repair binding.

Output parsing contract:

Always parse standard gateway envelope: request_id, data, error.
Treat non-empty error as failure even when HTTP tooling hides status code.

Visualization Playbooks (Agent Guidance)

For successful visual actions (portal.skill.execute, portal.skill.poll, portal.skill.presentation), the script enriches responses with data.agent_guidance.visualization.playbook.
Playbook mapping covers the visual capabilities currently exposed by this published skill (detection/classification/keypoints/segmentation/matting families).
Global rendering guardrail for all visual capabilities:
Must use skill-native rendered assets first (overlay/mask/cutout/view_url) when available.
Manual local drawing fallback is disabled by default (allow_manual_draw=false) to avoid inconsistent agent-side rendering.
If rendered assets are missing, fallback is summary-only from structured output (raw/visual.spec), not local drawing.
Example special rule:
body-contour-63pt -> when both rendered assets and geometry are absent, playbook marks status=degraded and recommends fallback capability body-keypoints-2d.

Payload Contract

portal.skill.execute: payload requires capability and input.
payload.request_id is optional and passed through.
portal.skill.poll and portal.skill.presentation: payload requires run_id.
portal.skill.presentation supports include_files (defaults to true).
portal.account.balance: payload is optional and ignored.
portal.account.ledger: payload may include date_from + date_to (YYYY-MM-DD, must be provided together).

Attachment normalization:

Prefer explicit image_url / audio_url / file_url / video_url.
attachment.url is mapped to target media field by capability.
When host runtime exposes attachment bytes, this published package forwards only that explicit attachment material through the public bridge and injects the returned URL before execute.
There is no separate portal.upload action in this package; for third-party agent entry, callers should keep using portal.skill.execute, and the bundled runtime will only forward explicit attachment bytes already supplied by the host for the current request.
If a host bypasses the bundled auto-upload helper and implements upload itself, use POST /agent/public-bridge/upload-file for third-party/public entry, not POST /agent/skill/bridge/upload-file.
Local file_path handling is disabled in the published public skill.
The runtime does not scan the local filesystem, guess file locations, expand directories/globs, or read local paths from payload.file_path, input.file_path, attachment.path, or attachment.file_path.
Arbitrary unmanaged local filesystem access remains unsupported; hosts should provide bytes or a bridge-managed URL instead.
Example host upload endpoint: /agent/public-bridge/upload-file.
tencent-video-face-fusion requires 2 uploaded files from the user before execution:
source video -> input.video_url
merge face image -> input.merge_infos[0].merge_face_image.url
If either Tencent face fusion file is missing, agent should ask the user to upload both files first.
Prefer a short source video for testing or smoke runs because Tencent legacy facefusion jobs are asynchronous and slower than image-only tasks.
Do not rely on a single attachment.url auto-mapping for tencent-video-face-fusion; host must pass both structured URL fields explicitly.

Error Contract

Preserve gateway envelope: request_id, data, error.
Preserve POINTS_INSUFFICIENT and pass through error.details.recharge_url.

Bundled Files

scripts/skill.mjs
scripts/agent-task-auth.mjs
scripts/base-url.mjs
scripts/attachment-normalize.mjs
scripts/telemetry.mjs (compatibility shim)
references/capabilities.json
references/openapi.json
SKILL.zh-CN.md

ai-task-hub

Installation