SkillHub

desktop-automation-ultra

v2.0.0

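Cross-platform desktop task automation for Windows/macOS/Linux, using safe, loggable mouse and keyboard control, OCR, image recognition, and macro recording and playback.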

Sourced from ClawHub, Authored by JordaneParis

Installation

Please help me install the skill `desktop-automation-ultra` from the official SkillHub store.

npx skills add JordaneParis/desktop-automation-ultra

Desktop Automation Skill v2.0

License: MIT OpenClaw

Complete desktop automation for Windows/macOS/Linux. Zero-error edition.


⚠️ Privacy & Security

CRITICAL: This skill captures ALL keyboard and mouse events.

  • NEVER record while entering passwords, credit card numbers, or other secrets
  • Recorded macros are stored as JSON in the recorded_macro/ directory
  • Always test with dry_run=true before actual execution
  • Store macros in secure locations only
  • Keep safe mode enabled (it is on by default)


🎯 What It Does

Automate desktop interactions without APIs:

  • ✅ Click, type, drag, scroll
  • ✅ Capture screenshots
  • ✅ Recognize images (OpenCV template matching)
  • ✅ Extract text (Tesseract OCR)
  • ✅ Record and replay macros
  • ✅ Find windows by title
  • ✅ Clipboard operations
  • ✅ Safe mode with dry_run for testing


🔐 Safety Features (Built-In)

1. Safe Mode (Default: ON)

Blocks dangerous actions when enabled:

  • type, press_key, click, and drag are monitored
  • Parameters are scanned for dangerous patterns: rm, del, C:\Windows, /etc/, sudo, etc.
  • Blocked actions are logged
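
The parameter scan described above could look roughly like the following. This is a minimal sketch, not the skill's implementation: `DANGEROUS_PATTERNS`, `MONITORED_ACTIONS`, and `is_blocked` are hypothetical names, and the pattern list only mirrors the examples given in this section. The real rules live in lib/safety_manager.py and may differ.

```python
import re

# Hypothetical pattern list mirroring the examples in this section.
# Word boundaries keep "rm" from matching inside words such as "format".
DANGEROUS_PATTERNS = [
    r"\brm\b",
    r"\bdel\b",
    r"c:\\windows",
    r"/etc/",
    r"\bsudo\b",
]

# Actions the section lists as monitored by safe mode.
MONITORED_ACTIONS = {"type", "press_key", "click", "drag"}


def is_blocked(action, params, safe_mode=True):
    """Return True if a monitored action carries a dangerous-looking parameter."""
    if not safe_mode or action not in MONITORED_ACTIONS:
        return False
    for value in params.values():
        text = str(value).lower()
        if any(re.search(pattern, text) for pattern in DANGEROUS_PATTERNS):
            return True
    return False
```

With this sketch, `is_blocked("type", {"text": "sudo rm -rf /"})` is true, while ordinary text such as "Hello World" passes through.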

2. Dry-Run Mode

All actions support dry_run=true:

  • The action is logged but NOT executed
  • Use it for testing before running real automation
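
To illustrate the "logged but not executed" behavior, here is a small sketch of a dry-run wrapper. The function `run_action`, its `execute` and `log` callables, and the status strings are assumptions made for the example, not names taken from the skill.

```python
def run_action(name, params, execute, log):
    """Log the action; call `execute` only when dry_run is not set.

    `execute` and `log` are hypothetical callables supplied by the caller.
    """
    dry_run = bool(params.get("dry_run", False))
    # dry_run is a control flag, so it is not passed on to the action itself.
    args = {key: value for key, value in params.items() if key != "dry_run"}
    if dry_run:
        log(f"[DRY RUN] {name} {args}")
        return {"status": "dry_run", **args}
    log(f"{name} {args}")
    execute(**args)
    return {"status": "ok", **args}
```

Calling `run_action("click", {"x": 100, "y": 100, "dry_run": True}, execute=..., log=print)` prints the log line and returns without touching the mouse.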

3. Audit Logging

Every action is logged to ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log

4. Thread Safety

All modules use locks to prevent race conditions.
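
As a rough picture of what lock-based protection means here, the sketch below serializes actions so that two threads never drive the mouse or keyboard at the same time. `SerializedExecutor` is a hypothetical class for illustration; the skill's modules may organize their locks differently.

```python
import threading


class SerializedExecutor:
    """Hypothetical helper: run one action at a time, even across threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.executed = 0

    def run(self, action):
        # Only one thread may hold the lock, so actions never interleave
        # and the counter update cannot be lost.
        with self._lock:
            result = action()
            self.executed += 1
            return result
```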


📦 Installation

1. Extract Files

Place desktop-automation-ultra-local/ in:

  • Windows: C:\Users\<User>\.openclaw\workspace\skills\
  • Linux/macOS: ~/.openclaw/workspace/skills/

2. Install Dependencies

pip install -r requirements.txt

3. Optional: Tesseract for OCR

For find_text_on_screen functionality:

  • Windows: Download the installer from https://github.com/UB-Mannheim/tesseract/wiki
  • Linux: sudo apt install tesseract-ocr
  • macOS: brew install tesseract

4. Restart OpenClaw

openclaw gateway restart

🚀 Quick Start

Basic Click

action: click
params:
  x: 100
  y: 100
  dry_run: true  # Test first!

Type Text

action: type
params:
  text: "Hello World"
  interval: 0.05  # Delay between keys
  dry_run: false

Find Image

action: find_image
params:
  template_path: "templates/button.png"
  confidence: 0.95

Extract Text (OCR)

action: read_text_ocr
params:
  lang: "fra"  # French

📖 Core Actions

Mouse & Keyboard

| Action | Parameters | Returns |
| --- | --- | --- |
| click | x, y, button="left", dry_run | {status, x, y} |
| type | text, interval=0.05, dry_run | {status, text} |
| press_key | key, dry_run | {status, key} |
| move_mouse | x, y, duration=0.5, dry_run | {status, x, y} |
| scroll | amount=5, dry_run | {status, amount} |
| drag | start_x, start_y, end_x, end_y, duration=0.5, dry_run | {status} |
| copy_to_clipboard | text, dry_run | {status} |
| paste_from_clipboard | dry_run | {status, length} |

Screenshots & Windows

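| Action | Parameters | Returns |
| --- | --- | --- |
| screenshot | path="~/Desktop/screenshot.png", dry_run | {status, path} |
| get_active_window | dry_run | {status, title, x, y, width, height} |
| list_windows | dry_run | {status, windows[], count} |
| activate_window | title_substring, dry_run | {status, title} |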
activate_window title_substring, dry_run {status, title}

Image Recognition (requires OpenCV)

| Action | Parameters | Returns |
| --- | --- | --- |
| find_image | template_path, confidence=0.9, dry_run | {status, x, y, confidence} |
| find_image_multiscale | template_path, confidence, scale_factors, dry_run | {status, x, y, confidence, scale} |
| wait_for_image | template_path, timeout=30.0, interval=0.5, confidence=0.9, dry_run | {status, x, y, confidence} |
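
The timeout and interval parameters of wait_for_image suggest a polling loop. The sketch below shows one plausible shape for it, assuming a caller-supplied `find` function that returns a match dictionary or None. The name `wait_for`, the injectable `clock` and `sleep` arguments, and the status strings are hypothetical.

```python
import time


def wait_for(find, timeout=30.0, interval=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll `find()` until it returns a match or `timeout` seconds elapse.

    `find` is a hypothetical callable returning a dict such as
    {"x": ..., "y": ..., "confidence": ...}, or None when nothing matched.
    """
    deadline = clock() + timeout
    while True:
        match = find()
        if match is not None:
            return {"status": "found", **match}
        if clock() >= deadline:
            return {"status": "timeout"}
        sleep(interval)
```

Checking the deadline after each attempt guarantees at least one search even with a timeout of zero.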

OCR / Text Recognition (requires Tesseract)

| Action | Parameters | Returns |
| --- | --- | --- |
| find_text_on_screen | text, lang="fra", dry_run | {status, locations[], count} |
| find_all_text_on_screen | text, lang="fra", dry_run | {status, data[], count} |
| read_text_ocr | lang="fra", dry_run | {status, text, length} |
| read_text_region | x, y, width, height, lang="fra", dry_run | {status, text, length} |
| extract_screen_data | region={}, output_format="json", lang="fra", dry_run | {status, data[], count} |

Macros

| Action | Parameters | Returns |
| --- | --- | --- |
| play_macro | macro_path, speed=1.0, dry_run | {status, executed, total, errors[]} |
| stop_macro | (none) | {status} |
| play_macro_with_subroutines | macro_path, speed=1.0, sub_macros_dir, dry_run | {status, executed, total, errors[]} |

Safety Management

| Action | Parameters | Returns |
| --- | --- | --- |
| set_safe_mode | enabled=true | {status, safe_mode} |
| get_safety_status | (none) | {status, safe_mode_enabled, dangerous_patterns, dangerous_actions[]} |

📝 Macro Format

Recorded macros are JSON with this structure:

{
  "events": [
    {
      "action": "click",
      "params": {"x": 100, "y": 50},
      "wait": 500
    },
    {
      "action": "type",
      "params": {"text": "Hello"},
      "wait": 200
    },
    {
      "action": "press_key",
      "params": {"key": "return"},
      "wait": 100
    }
  ]
}
  • action — action name
  • params — action parameters
  • wait — milliseconds to wait before next action
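
To show how this structure can be consumed, here is a minimal replay sketch. It assumes that `speed` divides each `wait` (so 2.0 halves the pauses) and that the caller passes a `dispatch` callable that performs a single action; both are assumptions for illustration, as are the function name and the status strings. The skill's own player is lib/macro_player.py.

```python
import json
import time


def replay_macro(macro_json, dispatch, speed=1.0, sleep=time.sleep):
    """Replay the events of a macro given as a JSON string.

    `dispatch(action, params)` is a hypothetical callable that runs one action.
    """
    events = json.loads(macro_json)["events"]
    executed, errors = 0, []
    for index, event in enumerate(events):
        try:
            dispatch(event["action"], event.get("params", {}))
            executed += 1
        except Exception as exc:  # keep going and report the failure
            errors.append(f"event {index}: {exc}")
        # "wait" is in milliseconds; assume a higher speed shortens the pause.
        sleep(event.get("wait", 0) / 1000.0 / speed)
    status = "ok" if not errors else "error"
    return {"status": status, "executed": executed, "total": len(events), "errors": errors}
```

The returned dictionary follows the {status, executed, total, errors[]} shape listed for play_macro.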

🔧 Advanced: Mouse Move Debouncing

To avoid recording hundreds of move_mouse events during a smooth drag, the recorder uses debouncing:

  • When you move the mouse, events are suppressed during movement
  • After you stop moving for N seconds (default: 1 sec), the final position is recorded
  • This reduces macro size dramatically while preserving intended end positions
  • Configurable via GUI: set debounce time (0.1–10 seconds)

Example:

  • Fast horizontal line → 1 move_mouse event (end coordinates)
  • Slow, stop-and-go → multiple move_mouse events (one per "stop")
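
The debouncing rule can be sketched offline over a list of timestamped samples. This is an illustration only: a live recorder would more likely use a timer, and the function name `debounce_moves` and the sample format are hypothetical. It also assumes the last sample is flushed when recording stops.

```python
def debounce_moves(samples, debounce=1.0):
    """Keep only the position where the mouse paused for `debounce` seconds.

    `samples` is a time-sorted sequence of (timestamp_seconds, x, y) tuples.
    """
    samples = list(samples)
    recorded = []
    for current, following in zip(samples, samples[1:] + [None]):
        timestamp, x, y = current
        # A sample is a "stop" if nothing follows it, or the next sample
        # arrives only after the debounce window has passed.
        if following is None or following[0] - timestamp >= debounce:
            recorded.append({"action": "move_mouse", "params": {"x": x, "y": y}})
    return recorded
```

A fast stroke with samples a few milliseconds apart collapses to one event, while a stroke with a 1.4 second pause in the middle yields two.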


🧪 Testing

Run the unit test suite:

python scripts/test_automation.py

Output:

test_dry_run_click ... ok
test_get_active_window ... ok
test_safe_mode_blocks_dangerous ... ok
...
Ran 13 tests
OK

📊 Logging

All actions are logged to: ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log

Example:

[2026-03-15 10:23:45] [INFO] ActionManager: ActionManager initialized with safe_mode=True
[2026-03-15 10:23:46] [INFO] ActionManager: Clicked at (100, 50) with left button
[2026-03-15 10:23:47] [INFO] ActionManager: Typed: Hello World
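
For auditing, lines in this format can be split into fields with a regular expression. The sketch below is inferred from the three sample lines above only. The log format is not formally specified here, so the pattern and the name `parse_log_line` are assumptions.

```python
import re
from datetime import datetime

# Pattern inferred from the sample lines: [timestamp] [LEVEL] Component: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[(?P<level>[A-Z]+)\] (?P<component>\w+): (?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%Y-%m-%d %H:%M:%S")
    return entry
```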

⚙️ Configuration

Environment Variables

# Override log directory
export AUTOMATION_LOG_DIR=~/my_logs

# Disable safe mode globally (NOT recommended)
export AUTOMATION_SAFE_MODE=false
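
A sketch of how these two variables might be read, with the documented defaults applied when they are unset. The function `load_settings` is hypothetical, and treating "0", "no", and "off" as false is an assumption; only the value false is documented above.

```python
import os

# Default taken from the logging section of this document.
DEFAULT_LOG_DIR = "~/.openclaw/skills/desktop-automation-logs"


def load_settings(environ=os.environ):
    """Read AUTOMATION_LOG_DIR and AUTOMATION_SAFE_MODE with safe defaults."""
    log_dir = os.path.expanduser(environ.get("AUTOMATION_LOG_DIR", DEFAULT_LOG_DIR))
    raw = environ.get("AUTOMATION_SAFE_MODE", "true").strip().lower()
    # Assumption: anything other than an explicit "off" spelling keeps safe mode on.
    safe_mode = raw not in {"false", "0", "no", "off"}
    return {"log_dir": log_dir, "safe_mode": safe_mode}
```

Defaulting to safe mode on for unrecognized values means a typo in the variable cannot silently disable the protection.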

🐛 Troubleshooting

"pyautogui failsafe triggered"

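PyAutoGUI raises this error when the mouse pointer reaches a corner of the screen; it acts as an emergency stop. Move the mouse to a screen corner to abort a running automation. If it triggers unexpectedly, keep the pointer and any target coordinates away from the corners.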

OCR returns empty text

  • Ensure Tesseract is installed correctly
  • Check image quality (high contrast helps)
  • Try read_text_ocr instead of find_text_on_screen

Image recognition not finding template

  • Ensure the template image exists and is in a supported format (PNG, JPG)
  • Try lower confidence threshold (e.g., 0.85 instead of 0.95)
  • Use find_image_multiscale to detect at different scales

Actions blocked by safe mode

This is intentional. To run dangerous actions:

action: set_safe_mode
params:
  enabled: false

Then execute your action. Re-enable safe mode immediately after:

action: set_safe_mode
params:
  enabled: true
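
If the actions are driven from a script, the disable and re-enable steps can be paired so that safe mode comes back on even when the action in between fails. The sketch assumes a hypothetical `send(action, params)` callable that submits one action to the skill; it is not part of the documented interface.

```python
from contextlib import contextmanager


@contextmanager
def safe_mode_disabled(send):
    """Turn safe mode off for the duration of a `with` block, then back on.

    `send` is a hypothetical callable that submits an action and its params.
    """
    send("set_safe_mode", {"enabled": False})
    try:
        yield
    finally:
        # Runs even if the body raises, so safe mode is always restored.
        send("set_safe_mode", {"enabled": True})
```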

📄 License

MIT License. See LICENSE file.


📚 File Structure

desktop-automation-ultra-local/
├── SKILL.md                          (This file)
├── requirements.txt                  (Python dependencies)
├── lib/
│   ├── actions.py                   (Core click/type/drag actions)
│   ├── image_recognition.py         (OpenCV template matching)
│   ├── ocr_engine.py                (Tesseract OCR)
│   ├── macro_player.py              (Record/playback macros)
│   ├── safety_manager.py            (Safe mode, blocking)
│   └── utils.py                     (Logging, helpers)
├── scripts/
│   └── test_automation.py           (Unit tests)
└── recorded_macro/                  (Output: saved macros)

Validation Checklist

  • [x] All modules have proper error handling
  • [x] Thread safety implemented (locks)
  • [x] Safe mode enabled by default
  • [x] Dry-run mode on all actions
  • [x] Comprehensive logging
  • [x] Unit tests (13 tests)
  • [x] UTF-8 encoding for all text
  • [x] No hardcoded paths (uses expanduser)
  • [x] Graceful fallbacks for missing dependencies
  • [x] Documentation complete

Status: PRODUCTION READY


Last updated: 2026-03-15
Version: 2.0.0