PAHF - Continual Personalization Framework

Based on the paper "Learning Personalized Agents from Human Feedback" (arXiv:2602.16173)

⚠️ Privacy & Consent Notice

Before using this skill, understand that PAHF will:

Action	Files	Data Type
Read	MEMORY.md, USER.md, IDENTITY.md, memory/*.md	Preferences, identity, personal info
Write	MEMORY.md, memory/YYYY-MM-DD.md, memory/users/*.md	Preference updates, change logs

All preference updates are:
- Logged with a [LEARNED: date, source] marker
- Tracked in the Preference Change Log table
- Stored locally in ~/.openclaw/workspace/memory/

User consent is required for persistent preference storage. If you prefer not to have preferences stored, this skill should not be used.

Core Philosophy

The Problem: Traditional AI relies on static datasets and cannot adapt to changing user preferences. You correct it once, and it makes the same mistake again.

The Solution: PAHF enables continual personalization through dual feedback channels plus explicit memory:
- 🎯 Pre-action Clarification: Ask when uncertain, don't guess
- 💾 Preference Memory: Store user preferences explicitly rather than encoding them implicitly
- 🔄 Post-action Feedback: Treat every piece of feedback as a learning opportunity

Dependencies

This skill requires the following tools to be available:

Tool	Purpose	Fallback
`memory_search`	Semantic search across memory files	Use `read` + grep
`memory_get`	Safe snippet retrieval	Use `read` directly

If these tools are unavailable, the skill will fall back to direct file reading, which may be slower.
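
The `read` + grep fallback amounts to a keyword scan over the memory files. The following is a minimal Python sketch of that idea, assuming the workspace layout described in this document. `grep_memory` and its parameters are hypothetical names for illustration, not part of the skill.

```python
from pathlib import Path


def grep_memory(workspace: Path, keyword: str) -> list[tuple[str, int, str]]:
    """Case-insensitive keyword scan over the memory files (hypothetical helper).

    Returns (relative path, line number, stripped line) for every matching line.
    """
    # Top-level files first, then everything under memory/ (daily logs, users/).
    candidates = [workspace / name for name in ("MEMORY.md", "USER.md", "IDENTITY.md")]
    candidates += sorted((workspace / "memory").rglob("*.md"))

    needle = keyword.lower()
    hits = []
    for path in candidates:
        if not path.is_file():
            continue  # a missing file simply contributes no matches
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(path.relative_to(workspace)), lineno, line.strip()))
    return hits
```

A plain substring scan like this has no semantic matching, so it may miss preferences phrased with different words than the query.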

The PAHF Loop (Three Steps)

Step 1: Pre-action Clarification

When to Ask:
- Task has multiple reasonable options (e.g., what format to reply in)
- Preference information is missing or incomplete
- User's previous behavior patterns are inconsistent

How to Ask:

❌ Wrong: Silently guess and get it wrong
✅ Right: Briefly list options, let user confirm

Example:
"Regarding this report, would you like:
A) Detailed version (includes all details)
B) Summary version (key points only)
C) Let me decide?"

When NOT to Ask:
- Task is urgent and obvious
- Clear preference is already recorded
- Asking would disrupt the flow
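
The ask / don't-ask conditions above can be combined into one decision function. The sketch below is one possible reading, not the skill's actual logic: the `TaskContext` fields are hypothetical signals the agent would have to estimate, and the precedence among the rules (the "don't ask" conditions checked first) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TaskContext:
    """Signals about the current request (all fields are hypothetical)."""
    option_count: int              # number of reasonable ways to do the task
    has_recorded_preference: bool  # a clear preference exists in memory
    history_is_consistent: bool    # past behavior does not contradict itself
    is_urgent_and_obvious: bool
    would_disrupt_flow: bool


def should_ask(ctx: TaskContext) -> bool:
    """Return True when a clarifying question is warranted."""
    # Assumed precedence: urgency and flow override everything else.
    if ctx.is_urgent_and_obvious or ctx.would_disrupt_flow:
        return False
    # Inconsistent past behavior means a recorded preference cannot be trusted.
    if not ctx.history_is_consistent:
        return True
    # A clear, consistent recorded preference answers the question already.
    if ctx.has_recorded_preference:
        return False
    # No preference on file: ask only if there is a real choice to make.
    return ctx.option_count > 1
```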

Step 2: Preference-grounded Action

Retrieve Preferences: Find relevant preferences from memory files

Memory File Locations:
- MEMORY.md: Long-term preferences, core values
- memory/YYYY-MM-DD.md: Recent preference changes
- USER.md: Basic user information
- IDENTITY.md: Your identity settings
- memory/users/{user}.md: User-specific preferences

Retrieval Method:
1. Preferred: Use the memory_search tool to search by keyword
2. Fallback: Use memory_get for safe snippet retrieval
3. Manual: Read the relevant files directly

When No Preference Found:
- Use reasonable defaults
- Record this decision for future adjustment

Step 3: Post-action Feedback Integration

Identify Feedback:
- Direct correction: "No, I wanted..."
- Implicit feedback: User repeats explanation, tone changes
- Positive confirmation: "Yes, exactly like that"

Update Memory (with confirmation for significant changes):

# Feedback Type Judgment
if user explicitly corrects:
    This is an important preference → Update MEMORY.md
    Ask: "Should I remember this for future interactions?"

elif user expresses new habit:
    This is a variable preference → Update memory/YYYY-MM-DD.md
    Record without asking (daily log)

elif user simply confirms:
    Validated preference → Optionally record
    No explicit confirmation needed
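
The routing rules above can be sketched as a small Python function. This is illustrative only: the `Feedback` enum and `route_feedback` are hypothetical names, classifying a raw user message into one of the three types is left to the agent, and treating a simple confirmation as "do not record" is an assumed default for the "optionally record" case.

```python
from enum import Enum


class Feedback(Enum):
    CORRECTION = "explicit correction"
    NEW_HABIT = "new habit"
    CONFIRMATION = "simple confirmation"


def route_feedback(kind: Feedback, today: str) -> dict:
    """Map a feedback type to a target file and a confirmation requirement.

    `today` is an ISO date string such as "2026-03-05".
    """
    if kind is Feedback.CORRECTION:
        # Important preference: long-term memory, ask the user first.
        return {"target": "MEMORY.md", "ask_first": True, "record": True}
    if kind is Feedback.NEW_HABIT:
        # Variable preference: daily log, recorded without asking.
        return {"target": f"memory/{today}.md", "ask_first": False, "record": True}
    # Simple confirmation: recording is optional; assumed default is to skip it.
    return {"target": None, "ask_first": False, "record": False}
```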

Preference Change Tracking: Use [LEARNED: date, source] and [UPDATED: date] markers
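
Because the markers follow a regular shape, they can be written and read back mechanically. The sketch below assumes ISO dates (YYYY-MM-DD), as used in this document's examples, and treats the source as optional since some examples omit it. `MARKER_RE`, `learned_marker`, and `parse_markers` are hypothetical helpers.

```python
import re
from datetime import date

# Matches "[LEARNED: 2026-03-05, explicit instruction]" and "[UPDATED: 2026-03-05]".
# The ", source" part is optional.
MARKER_RE = re.compile(
    r"\[(?P<kind>LEARNED|UPDATED):\s*"
    r"(?P<date>\d{4}-\d{2}-\d{2})"
    r"(?:,\s*(?P<source>[^\]]+))?\]"
)


def learned_marker(day: date, source: str) -> str:
    """Build a LEARNED marker for a new preference."""
    return f"[LEARNED: {day.isoformat()}, {source}]"


def parse_markers(line: str) -> list[dict]:
    """Return every marker on a line as {'kind', 'date', 'source'} dicts."""
    return [m.groupdict() for m in MARKER_RE.finditer(line)]
```

For example, `parse_markers` applied to a line carrying both a LEARNED and an UPDATED marker returns two dictionaries in the order they appear, with `source` set to `None` where it was omitted.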

Write Confirmation Policy

To balance personalization with user control:

Change Type	Confirmation Required	Example
New core preference	Yes	"Should I remember you prefer PDF reports?"
Preference update	No (logged)	User: "Actually, I prefer Word now"
Daily observation	No	"Noticed you prefer morning meetings"
Sensitive data	N/A (never stored)	Passwords, credentials, etc.

Sensitive data is NEVER stored:
- ❌ Passwords, API keys, tokens
- ❌ Financial details (account numbers, etc.)
- ❌ Health information
- ❌ Any data explicitly marked as sensitive
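
One way to enforce this rule is a guard that runs before any write to memory. The sketch below is a rough keyword heuristic, not a complete detector: the patterns are hypothetical examples, they will produce both false positives and false negatives, and `is_sensitive` is not part of the skill.

```python
import re

# Hypothetical patterns for illustration; tune them to your own data.
SENSITIVE_PATTERNS = [
    re.compile(r"\b(password|passwd|api[_ -]?key|secret|token)\b", re.IGNORECASE),
    re.compile(r"\b(account number|iban|credit card)\b", re.IGNORECASE),
    re.compile(r"\b(diagnos\w*|prescription|medical)\b", re.IGNORECASE),
    re.compile(r"\b\d{13,19}\b"),  # long digit runs that could be card numbers
]


def is_sensitive(text: str, user_marked_sensitive: bool = False) -> bool:
    """Return True if the text should never be written to memory."""
    # Anything the user flags as sensitive is blocked regardless of content.
    if user_marked_sensitive:
        return True
    return any(pattern.search(text) for pattern in SENSITIVE_PATTERNS)
```

A guard like this is best treated as a last line of defense; the agent should still decline to record sensitive details even when no pattern matches.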

Preference Memory Structure

See references/preference-schema.md

Core Fields

## Preference Categories

### Communication Style
- Concise/Detailed preference
- Formal/Casual level
- Language preference

### Work Habits
- Time scheduling
- Priority judgment
- Decision style

### Technical Preferences
- Tool selection
- Format preferences
- Process habits

### Content Preferences
- News interests
- Content depth
- Source preferences
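
The authoritative schema lives in references/preference-schema.md, which is not reproduced here. As an illustration only, a single preference entry in one of the four categories above could be modeled as follows. The `Category` enum, the `Preference` fields, and `to_markdown` are assumptions, with the rendered line imitating the MEMORY.md entries shown in this document's examples.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    COMMUNICATION_STYLE = "Communication Style"
    WORK_HABITS = "Work Habits"
    TECHNICAL = "Technical Preferences"
    CONTENT = "Content Preferences"


@dataclass(frozen=True)
class Preference:
    category: Category
    name: str        # e.g. "Report Format"
    value: str       # e.g. "PDF"
    learned_on: str  # ISO date, e.g. "2026-03-05"
    source: str      # e.g. "explicit instruction"

    def to_markdown(self) -> str:
        """Render the entry as a blockquote line with a LEARNED marker."""
        return (
            f"> **{self.name}**: {self.value} "
            f"[LEARNED: {self.learned_on}, {self.source}]"
        )
```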

Practical Examples

Example 1: Learning New Preference

User: "From now on, always send reports in PDF format"

PAHF Response:
1. Pre-action: ✓ Clear instruction, no clarification needed
2. Retrieve: Check MEMORY.md → No such preference recorded
3. Execute: Confirm receipt, update with consent

Agent: "Got it! I'll remember to send reports in PDF format. This preference has been saved to your profile."

Update MEMORY.md:
> **Report Format**: PDF [LEARNED: 2026-03-05, explicit instruction]

Example 2: Handling Ambiguity

User: "Help me organize these files"

PAHF Response:
1. Pre-action: Detected ambiguity
   - Organize by what? Date/Type/Project?
   - Output format? List/Table/Report?

2. Ask:
   "To organize these files, I need to know:
   - How to organize? (Date/Type/Project)
   - Output format? (List/Table)"

User: "By date, in table format"

3. Feedback Integration:
   - Execute organization
   - Record preference to memory/YYYY-MM-DD.md
   - No confirmation needed (daily observation)

Example 3: Preference Drift Detection

Historical Preference (MEMORY.md):
> **Communication Style**: Concise, direct [LEARNED: 2026-02-20]

Recent Change (memory/2026-03-03.md):
> User emphasized wanting detailed explanations today

PAHF Behavior:
1. Detected preference conflict
2. Use recent preference (detailed)
3. Observe subsequent feedback
4. If change persists → Ask: "Should I update your default to detailed explanations?"
5. If confirmed → Update long-term preference with [UPDATED: date]
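
The drift-handling steps above can be sketched as a function that picks which value to act on and decides when to ask about changing the default. The document does not define "persists", so the `persist_after` threshold (a count of consecutive recent observations) is an assumption, and `Observation` and `resolve_drift` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    day: str    # ISO date of the daily log entry, e.g. "2026-03-03"
    value: str  # preference value observed that day, e.g. "detailed"


def resolve_drift(long_term: str, recent: list[Observation], persist_after: int = 3):
    """Return (value_to_use_now, should_ask_to_update_default)."""
    if not recent:
        return long_term, False
    # ISO date strings sort chronologically.
    ordered = sorted(recent, key=lambda o: o.day)
    latest = ordered[-1].value
    if latest == long_term:
        return long_term, False  # no conflict
    # Count how many of the most recent observations agree with the latest one.
    streak = 0
    for obs in reversed(ordered):
        if obs.value != latest:
            break
        streak += 1
    # Use the recent preference now; ask only once the change has persisted.
    return latest, streak >= persist_after
```

With the Example 3 data (long-term "concise", one recent "detailed" entry), this returns the recent value without asking; the question about updating the default is raised only after the streak reaches the threshold.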

Importance of Dual Feedback Channels

The PAHF paper reports that using both channels together (pre-action + post-action) outperforms using either channel alone:

Mode	Learning Speed	Adaptation Ability
No memory	Slow	Poor
Post-action only	Medium	Medium
Pre-action only	Medium	Medium
Dual-channel PAHF	Fast	Strong

Why Dual Channels Work:
- Pre-action: Proactively avoid errors, clarify intent
- Post-action: Capture implicit preferences, adapt to changes

Best Practices

✅ Good Practices

1. Layered Preference Storage
   - Core preferences → MEMORY.md (stable)
   - Recent changes → memory/YYYY-MM-DD.md (dynamic)
   - User-specific → memory/users/{user}.md
2. Regular Review
   - Check for preference conflicts during heartbeat
   - Identify preference drift trends
3. Explicitly Record Sources
   ```markdown
   Preference: Concise replies [LEARNED: 2026-02-20, user feedback]
   Preference: PDF format [LEARNED: 2026-03-05, explicit instruction]
   ```
4. Ask Before Storing Sensitive Preferences
   - When in doubt, ask for confirmation
   - Never store credentials or secrets

❌ Practices to Avoid

Don't Implicitly Assume: Ask if uncertain
Don't Over-record: Recording every detail creates noise
Don't Ignore Changes: "This time is different" is an important signal
Don't Store Without Consent: Ask for significant new preferences

Integration with Existing Memory System

PAHF enhances rather than replaces the existing memory system:

File	Original Purpose	PAHF Enhancement
MEMORY.md	Event records	+ Preference storage (with source markers)
memory/YYYY-MM-DD.md	Daily logs	+ Preference change tracking
USER.md	User information	+ Basic preferences
memory/users/{user}.md	User records	+ PAHF preference format
HEARTBEAT.md	Periodic checks	+ Preference consistency checks

Audit & Transparency

All preference updates are logged and traceable:

Source Marker: Every preference has [LEARNED: date, source]
Change Log: Preference Change Log table tracks all changes
Date Stamps: [UPDATED: date] for modifications
User Review: Users can inspect memory files at any time

To review your stored preferences:

Read MEMORY.md for long-term preferences
Read memory/YYYY-MM-DD.md for recent changes
Read memory/users/{your-name}.md for user-specific preferences

Remember: The essence of PAHF is treating users as teachers. Every interaction is a learning opportunity. Ask when uncertain, record after confirmation, adapt when things change.
