ai-task-hub
v3.2.29AI task hub for image analysis, background removal, speech-to-text, text-to-speech, markdown conversion, points balance/ledger lookup, and async execute/poll/presentation orchestration. Use when users need hosted AI outcomes while host runtime manages identity, credits, payment, and risk control.
Installation
AI Task Hub
Formerly skill-hub-gateway.
Public package boundary:
- Only orchestrates
portal.skill.execute,portal.skill.poll,portal.skill.presentation,portal.account.balance, andportal.account.ledger. - Does not exchange
api_keyoruserTokeninside this package. - Does not handle recharge or payment flows inside this package.
- Prefers attachment URLs, and when host runtime explicitly exposes attachment bytes for the current request, forwards only that explicit attachment material through the public bridge before execution.
- Third-party agent entry uses
POST /agent/public-bridge/invoke.
User-Facing Response Policy
- When users upload images, audio, documents, or video and ask for a capability, prefer executing immediately only when the host runtime has already supplied an explicit attachment object or explicit attachment bytes for that request.
- Do not explain
image_url,attachment.url, storage URLs, bridge layers, host uploads, input normalization, or controlled media domain details to end users unless they explicitly ask for technical debugging. - Do not ask end users to provide manual URLs, JSON field names, or upload-chain instructions; those are internal host-to-skill mechanics.
- If the runtime supports attachment handling, limit processing to the explicit attachment object supplied for the current request and keep the upload/URL handoff scoped to execute/poll/presentation for that same request.
- Only when execution actually fails and the user must intervene should you mention missing processable files, incomplete authorization, or retry guidance, using user-oriented language without exposing internal layering.
Chinese documentation: SKILL.zh-CN.md
When to Use This Skill
Use this skill when the user asks to:
- detect faces, human presence, body keypoints, image tags, or facial emotion from images
- generate person/product segmentation, mask, cutout, or matting outputs that this public package explicitly exposes
- transcribe uploaded audio into text (
speech to text,audio transcription) - generate speech from text input (
text to speech,voice generation) - convert uploaded files into markdown (
document to markdown) - start async jobs and check status later (
poll,check job status) - fetch rendered visual outputs such as
overlay,mask, andcutout - run embedding or reranking tasks for retrieval workflows
- check current account points balance or recent points ledger rows
Common Requests
Example requests that should trigger this skill:
- "Detect faces in this image and return bounding boxes."
- "Tag this image and summarize the main objects."
- "Remove the background from this product photo."
- "Create a clean cutout from this portrait image."
- "Transcribe this meeting audio into text."
- "Generate speech from this paragraph."
- "Convert this PDF file into markdown."
- "Start this job now and let me poll the run status later."
- "Fetch overlay and mask files for run_456."
- "Generate embeddings for this text list and rerank the candidates."
- "Check my current points balance."
- "Show my recent points ledger from 2026-03-01 to 2026-03-15."
Search-Friendly Capability Aliases
visionaliases: face detection, human detection, person detection, image taggingbackgroundaliases: remove background, background removal, cutout, matting, product-cutoutasraliases: speech to text, audio transcription, transcribe audiottsaliases: text to speech, voice generation, speech synthesismarkdown_convertaliases: document to markdown, file to markdown, markdown conversionpollaliases: check job status, poll long-running task, async run statuspresentationaliases: rendered output, overlay, mask, cutout filesaccount.balancealiases: points balance, credits balance, remaining pointsaccount.ledgeraliases: points ledger, credits history, points statementembeddings/rerankeraliases: vectorization, semantic vectors, relevance reranking
Public discovery boundary for visual capabilities:
- This published skill only advertises visual capabilities whose backing services are currently enabled for public delivery.
- Disabled or internally retained legacy routes are intentionally omitted from discovery references and capability manifests even if related backend code still exists.
Runtime Contract
Default API base URL: https://gateway-api.binaryworks.app
Published package policy: outbound base URL is locked to the default API base URL to reduce token exfiltration risk.
Action to endpoint mapping:
portal.skill.execute->POST /agent/skill/executeportal.skill.poll->GET /agent/skill/runs/:run_idportal.skill.presentation->GET /agent/skill/runs/:run_id/presentationportal.account.balance->GET /agent/skill/account/balanceportal.account.ledger->GET /agent/skill/account/ledger
Install Mechanism & Runtime Requirements
- This skill is instruction-first and does not define a remote installer flow.
- Runtime execution is limited to bundled local scripts under
scripts/*.mjs. - Required runtime binary is
node(as declared inmetadata.openclaw.requires.bins). - No remote download-to-exec install chain is used (
curl|wget ... | sh|bash|python|nodeis not part of this package).
Auth Contract
Third-party agent entry mode (recommended):
- Use
POST /agent/public-bridge/invokeas the first entrypoint for OpenClaw / Codex / Claude style runtimes. - Do not require end users to provide any credential.
- With
TRIAL_ENABLEDand available trial points, first-time calls may proceed without browser authorization. - On first use without an existing binding, gateway can proceed without browser authorization when TRIAL_ENABLED and trial points are available;
AUTHORIZATION_REQUIREDis returned only for conditional upgrade paths (for example trial exhausted or trial-disabled rollback). - The returned
authorization_urlmay includegateway_api_base_url; preserve it when completing browser authorization so/agent-auth/completeis posted back to the same API environment that created the auth session. - Host/runtime should show
authorization_urlto the user, persistentry_user_key, then retry the same action with that sameentry_user_key. - If gateway later returns
AUTHORIZATION_REQUIREDwithdetails.likely_cause=ENTRY_USER_KEY_NOT_REUSED,details.recovery_action=REUSE_ENTRY_USER_KEY, anddetails.reauthorization_required=false, host should restore the previously persistedentry_user_keyand retry without sending the user through browser authorization again.
Identifier format constraints used by gateway auth:
agent_uidmust match^agent_[a-z0-9][a-z0-9_-]{5,63}$.conversation_idmust match^[A-Za-z0-9._:-]{8,128}$.- In deployed bridge mode, host may pass its own stable runtime agent identifier and the gateway bridge will canonicalize it server-side.
Host-side token bridge (outside published package):
- To keep this package compliant and low-privilege, this published runtime does not issue or accept caller-managed task tokens.
- Preferred deployed bridge endpoint for third-party agent entry:
POST /agent/public-bridge/invoke. - Trusted host runtime that can safely hold bridge assertion secret may continue to use
POST /agent/skill/bridge/invoke. - These bridge endpoints are served by gateway runtime, not bundled into this published package, and do not require caller-managed credentials.
- Bridge request body should include
action,agent_uid,conversation_id, and optionalpayload. conversation_idshould be a host-generated opaque session/install identifier, not a public chat ID, raw thread ID, or PII.- Public bridge should resolve a stable external user binding when available; if the binding is missing and trial conditions are satisfied, first-time onboarding can continue without browser authorization, while conditional upgrade paths return a host-owned authorization URL plus
entry_user_key. - Cross-conversation account continuity requires reusing the same
entry_user_key; public bridge intentionally does not accept owner overrides. - Gateway bridge will canonicalize
agent_uid, repair binding when missing, issue short-lived internal task token, and run the action server-side. portal.skill.executethrough public bridge is write-capable and should sendoptions.confirm_write=trueafter user confirmation; otherwise gateway may returnACTION_CONFIRMATION_REQUIRED.base_url,gateway_api_key,api_key,user_token,agent_task_token,owner_uid_hint, andinstall_channeloverrides are rejected by the deployed bridge endpoint.- Recommended host behavior: persist
entry_user_key, normalizeagent_uid, and re-run the same bridge action after authorization completes.
Host integration modes:
interactive(recommended): host callsPOST /agent/public-bridge/invoke, surfaces the returned host-owned authorization URL to the user when needed, persists returnedentry_user_key, and retries after authorization completes.trusted host bridge(secondary): a trusted backend you control may callPOST /agent/skill/bridge/invokewith its own bridge assertion secret.- Published skill package itself does not open browser, persist credentials, or perform OAuth/token exchange flows.
- The authorization URL above is owned by deployed gateway/admin-web pages, not by this skill package runtime.
Agent Invocation Quickstart
Preferred invocation mode for third-party agent entry (recommended):
- Deployed bridge API:
{
"entry_host": "openclaw",
"action": "portal.account.balance",
"agent_uid": "support_assistant",
"conversation_id": "host_session_20260316_opaque_001",
"payload": {}
}
- Send that body to
POST /agent/public-bridge/invoke. - This is the recommended production entrypoint for third-party agent-friendly integration.
- With
TRIAL_ENABLEDand available trial points, first-time onboarding can complete without browser authorization. - On first use, gateway may return
AUTHORIZATION_REQUIREDwithauthorization_urlandentry_user_keyonly when conditional authorization upgrade is required (for example trial exhausted). - Persist
entry_user_keyand retry with the same value after user authorization completes. - Preserve any
gateway_api_base_urlembedded in the authorization flow so the completion request lands on the same gateway API environment. agent_uidshould be your host-defined stable runtime agent identifier.conversation_idshould be your host-generated opaque session/install identifier; it is not tied to Telegram or any single tool and does not determine account ownership.- Use the same
entry_user_keyacross conversations when those conversations should share one account.
Trusted host runtime secondary mode:
- If you control the upstream backend and it can safely hold bridge assertion secret, use
POST /agent/skill/bridge/invoke. - This path is for trusted host runtime only, not OpenClaw / Codex / Claude style third-party entry.
Action payload templates (same for public bridge and trusted host bridge mode):
portal.skill.execute
{
"capability": "human_detect",
"input": { "image_url": "https://files.example.com/demo.png" },
"request_id": "optional_request_id"
}
portal.skill.poll
{ "run_id": "run_123" }
portal.skill.presentation
{ "run_id": "run_123", "channel": "web", "include_files": true }
portal.account.balance
{}
portal.account.ledger
{ "date_from": "2026-03-01", "date_to": "2026-03-15" }
Agent-side decision flow:
- Always prefer
POST /agent/public-bridge/invokefor third-party agent entry so first-time authorization can returnauthorization_urlplusentry_user_key. - New task: call
portal.skill.execute, then poll withportal.skill.polluntildata.terminal=true, then fetchportal.skill.presentation. - Account query: call
portal.account.balanceorportal.account.ledgerdirectly. - Keep
conversation_idas session context only; do not use it as the account key. - For cross-conversation continuity in third-party entry mode, persist and reuse the same
entry_user_key; do not passowner_uid_hintto the public bridge endpoint. - If
AUTHORIZATION_REQUIREDis returned, showauthorization_url, persistentry_user_key, then retry the same action after user authorization completes. - If
AUTHORIZATION_REQUIREDincludesdetails.likely_cause=ENTRY_USER_KEY_NOT_REUSED, do not open a new auth flow yet; first restore the previously persistedentry_user_keyand retry the same bridge call. - Treat
details.reauthorization_required=falseas a recovery hint that browser re-login is unnecessary for this failure mode. - If
AUTH_UNAUTHORIZED+agent_uid claim format is invalid: use canonicalagent_uid(agent_...) instead of a short host alias (assistant,planner). - If
SYSTEM_NOT_FOUND+agent binding not found: restart the same bridge flow once and let gateway repair binding.
Output parsing contract:
- Always parse standard gateway envelope:
request_id,data,error. - Treat non-empty
erroras failure even when HTTP tooling hides status code.
Visualization Playbooks (Agent Guidance)
- For successful visual actions (
portal.skill.execute,portal.skill.poll,portal.skill.presentation), the script enriches responses withdata.agent_guidance.visualization.playbook. - Playbook mapping covers the visual capabilities currently exposed by this published skill (detection/classification/keypoints/segmentation/matting families).
- Global rendering guardrail for all visual capabilities:
- Must use skill-native rendered assets first (
overlay/mask/cutout/view_url) when available. - Manual local drawing fallback is disabled by default (
allow_manual_draw=false) to avoid inconsistent agent-side rendering. - If rendered assets are missing, fallback is summary-only from structured output (
raw/visual.spec), not local drawing. - Example special rule:
body-contour-63pt-> when both rendered assets and geometry are absent, playbook marksstatus=degradedand recommends fallback capabilitybody-keypoints-2d.
Payload Contract
portal.skill.execute: payload requirescapabilityandinput.payload.request_idis optional and passed through.portal.skill.pollandportal.skill.presentation: payload requiresrun_id.portal.skill.presentationsupportsinclude_files(defaults totrue).portal.account.balance: payload is optional and ignored.portal.account.ledger: payload may includedate_from+date_to(YYYY-MM-DD, must be provided together).
Attachment normalization:
- Prefer explicit
image_url/audio_url/file_url/video_url. attachment.urlis mapped to target media field by capability.- When host runtime exposes attachment bytes, this published package forwards only that explicit attachment material through the public bridge and injects the returned URL before execute.
- There is no separate
portal.uploadaction in this package; for third-party agent entry, callers should keep usingportal.skill.execute, and the bundled runtime will only forward explicit attachment bytes already supplied by the host for the current request. - If a host bypasses the bundled auto-upload helper and implements upload itself, use
POST /agent/public-bridge/upload-filefor third-party/public entry, notPOST /agent/skill/bridge/upload-file. - Local
file_pathhandling is disabled in the published public skill. - The runtime does not scan the local filesystem, guess file locations, expand directories/globs, or read local paths from
payload.file_path,input.file_path,attachment.path, orattachment.file_path. - Arbitrary unmanaged local filesystem access remains unsupported; hosts should provide bytes or a bridge-managed URL instead.
- Example host upload endpoint:
/agent/public-bridge/upload-file. tencent-video-face-fusionrequires 2 uploaded files from the user before execution:- source video ->
input.video_url - merge face image ->
input.merge_infos[0].merge_face_image.url - If either Tencent face fusion file is missing, agent should ask the user to upload both files first.
- Prefer a short source video for testing or smoke runs because Tencent legacy facefusion jobs are asynchronous and slower than image-only tasks.
- Do not rely on a single
attachment.urlauto-mapping fortencent-video-face-fusion; host must pass both structured URL fields explicitly.
Error Contract
- Preserve gateway envelope:
request_id,data,error. - Preserve
POINTS_INSUFFICIENTand pass througherror.details.recharge_url.
Bundled Files
scripts/skill.mjsscripts/agent-task-auth.mjsscripts/base-url.mjsscripts/attachment-normalize.mjsscripts/telemetry.mjs(compatibility shim)references/capabilities.jsonreferences/openapi.jsonSKILL.zh-CN.md