desktop-automation-ultra
v2.0.0通过安全可记录的鼠标键盘、OCR、图像识别及宏录制回放功能,实现Windows/macOS/Linux跨平台桌面任务自动化。
Installation
Desktop Automation Skill v2.0
Complete desktop automation for Windows/macOS/Linux. Zero-error edition.
⚠️ Privacy & Security
CRITICAL: This skill captures ALL keyboard and mouse events.
- NEVER record while entering passwords, credit cards, or secrets
- Recorded macros are stored as JSON in recorded_macro/ directory
- Always use dry_run=true to test before actual execution
- Store macros in secure locations only
- Enable safe mode by default (it is)
🎯 What It Does
Automate desktop interactions without APIs: - ✅ Click, type, drag, scroll - ✅ Capture screenshots - ✅ Recognize images (OpenCV template matching) - ✅ Extract text (Tesseract OCR) - ✅ Record and replay macros - ✅ Find windows by title - ✅ Clipboard operations - ✅ Safe mode with dry_run for testing
🔐 Safety Features (Built-In)
1. Safe Mode (Default: ON)
Blocks dangerous actions when enabled:
- type, press_key, click, drag are monitored
- Parameters are scanned for dangerous patterns: rm, del, C:Windows, /etc/, sudo, etc.
- Blocked actions are logged
2. Dry-Run Mode
All actions support dry_run=true:
- Action is logged but NOT executed
- Use for testing before running real automation
3. Audit Logging
Every action logged to ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log
4. Thread Safety
All modules use locks to prevent race conditions.
📦 Installation
1. Extract Files
Place desktop-automation-ultra-local/ in:
- Windows: C:Users<User>.openclawworkspaceskills
- Linux/macOS: ~/.openclaw/workspace/skills/
2. Install Dependencies
pip install -r requirements.txt
3. Optional: Tesseract for OCR
For find_text_on_screen functionality:
- Windows: Download installer from https://github.com/UB-Mannheim/tesseract/wiki
- Linux: sudo apt install tesseract-ocr
- macOS: brew install tesseract
4. Restart OpenClaw
openclaw gateway restart
🚀 Quick Start
Basic Click
action: click
params:
x: 100
y: 100
dry_run: true # Test first!
Type Text
action: type
params:
text: "Hello World"
interval: 0.05 # Delay between keys
dry_run: false
Find Image
action: find_image
params:
template_path: "templates/button.png"
confidence: 0.95
Extract Text (OCR)
action: read_text_ocr
params:
lang: "fra" # French
📖 Core Actions
Mouse & Keyboard
| Action | Parameters | Returns |
|---|---|---|
click |
x, y, button="left", dry_run |
{status, x, y} |
type |
text, interval=0.05, dry_run |
{status, text} |
press_key |
key, dry_run |
{status, key} |
move_mouse |
x, y, duration=0.5, dry_run |
{status, x, y} |
scroll |
amount=5, dry_run |
{status, amount} |
drag |
start_x, start_y, end_x, end_y, duration=0.5, dry_run |
{status} |
copy_to_clipboard |
text, dry_run |
{status} |
paste_from_clipboard |
dry_run |
{status, length} |
Screenshots & Windows
| Action | Parameters | Returns |
|---|---|---|
screenshot |
path="~/Desktop/screenshot.png", dry_run |
{status, path} |
get_active_window |
dry_run |
{status, title, x, y, width, height} |
list_windows |
dry_run |
{status, windows[], count} |
activate_window |
title_substring, dry_run |
{status, title} |
Image Recognition (requires OpenCV)
| Action | Parameters | Returns |
|---|---|---|
find_image |
template_path, confidence=0.9, dry_run |
{status, x, y, confidence} |
find_image_multiscale |
template_path, confidence, scale_factors, dry_run |
{status, x, y, confidence, scale} |
wait_for_image |
template_path, timeout=30.0, interval=0.5, confidence=0.9, dry_run |
{status, x, y, confidence} |
OCR / Text Recognition (requires Tesseract)
| Action | Parameters | Returns |
|---|---|---|
find_text_on_screen |
text, lang="fra", dry_run |
{status, locations[], count} |
find_all_text_on_screen |
text, lang="fra", dry_run |
{status, data[], count} |
read_text_ocr |
lang="fra", dry_run |
{status, text, length} |
read_text_region |
x, y, width, height, lang="fra", dry_run |
{status, text, length} |
extract_screen_data |
region={}, output_format="json", lang="fra", dry_run |
{status, data[], count} |
Macros
| Action | Parameters | Returns |
|---|---|---|
play_macro |
macro_path, speed=1.0, dry_run |
{status, executed, total, errors[]} |
stop_macro |
— | {status} |
play_macro_with_subroutines |
macro_path, speed=1.0, sub_macros_dir, dry_run |
{status, executed, total, errors[]} |
Safety Management
| Action | Parameters | Returns |
|---|---|---|
set_safe_mode |
enabled=true |
{status, safe_mode} |
get_safety_status |
— | {status, safe_mode_enabled, dangerous_patterns, dangerous_actions[]} |
📝 Macro Format
Recorded macros are JSON with this structure:
{
"events": [
{
"action": "click",
"params": {"x": 100, "y": 50},
"wait": 500
},
{
"action": "type",
"params": {"text": "Hello"},
"wait": 200
},
{
"action": "press_key",
"params": {"key": "return"},
"wait": 100
}
]
}
action— action nameparams— action parameterswait— milliseconds to wait before next action
🔧 Advanced: Mouse Move Debouncing
To avoid recording hundreds of move_mouse events during a smooth drag, the recorder uses debouncing:
- When you move the mouse, events are suppressed during movement
- After you stop moving for
Nseconds (default: 1 sec), the final position is recorded - This reduces macro size dramatically while preserving intended end positions
- Configurable via GUI: set debounce time (0.1–10 seconds)
Example:
- Fast horizontal line → 1 move_mouse event (end coordinates)
- Slow, stop-and-go → multiple move_mouse events (one per "stop")
🧪 Testing
Run the unit test suite:
python scripts/test_automation.py
Output:
test_dry_run_click ... ok
test_get_active_window ... ok
test_safe_mode_blocks_dangerous ... ok
...
Ran 13 tests
OK
📊 Logging
All actions logged to: ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log
Example:
[2026-03-15 10:23:45] [INFO] ActionManager: ActionManager initialized with safe_mode=True
[2026-03-15 10:23:46] [INFO] ActionManager: Clicked at (100, 50) with left button
[2026-03-15 10:23:47] [INFO] ActionManager: Typed: Hello World
⚙️ Configuration
Environment Variables
# Override log directory
export AUTOMATION_LOG_DIR=~/my_logs
# Disable safe mode globally (NOT recommended)
export AUTOMATION_SAFE_MODE=false
🐛 Troubleshooting
"pyautogui failsafe triggered"
Move mouse to corner of screen to stop.
OCR returns empty text
- Ensure Tesseract is installed correctly
- Check image quality (high contrast helps)
- Try
read_text_ocrinstead offind_text_on_screen
Image recognition not finding template
- Ensure template image exists and is correct format (PNG, JPG)
- Try lower confidence threshold (e.g., 0.85 instead of 0.95)
- Use
find_image_multiscaleto detect at different scales
Actions blocked by safe mode
This is intentional. To run dangerous actions:
action: set_safe_mode
params:
enabled: false
Then execute your action. Re-enable safe mode immediately after:
action: set_safe_mode
params:
enabled: true
📄 License
MIT License. See LICENSE file.
📚 Files Structure
desktop-automation-ultra-local/
├── SKILL.md (This file)
├── requirements.txt (Python dependencies)
├── lib/
│ ├── actions.py (Core click/type/drag actions)
│ ├── image_recognition.py (OpenCV template matching)
│ ├── ocr_engine.py (Tesseract OCR)
│ ├── macro_player.py (Record/playback macros)
│ ├── safety_manager.py (Safe mode, blocking)
│ └── utils.py (Logging, helpers)
├── scripts/
│ └── test_automation.py (Unit tests)
└── recorded_macro/ (Output: saved macros)
✅ Validation Checklist
- [x] All modules have proper error handling
- [x] Thread safety implemented (locks)
- [x] Safe mode enabled by default
- [x] Dry-run mode on all actions
- [x] Comprehensive logging
- [x] Unit tests (13 tests)
- [x] UTF-8 encoding for all text
- [x] No hardcoded paths (uses expanduser)
- [x] Graceful fallbacks for missing dependencies
- [x] Documentation complete
Status: PRODUCTION READY ✅
Last updated: 2026-03-15 Version: 2.0.0