XCLAW

Extract posts from X (Twitter) pages with full metadata including text, author, timestamp, engagement metrics, media URLs, and download local copies.

When to Use

List recent posts from a user's profile
Extract posts from home timeline
Capture post URLs without navigating into individual posts
Gather engagement metrics (likes, reposts, replies, views)
Download images and video thumbnails
Analyze content patterns across multiple posts

File Structure

intel/x/
├── {date}-{time}-X.md      # Intel file
└── media/
    └── {YYYYMMDD}-{HHMMSS}/        # Media folder (same timestamp)
        ├── image_{postId}_{index}.jpg
        └── video_{postId}_poster.jpg

File Naming

Item	Format	Example
Intel file	`{date}-{time}-X.md`	`2026-03-16-143052-X.md`
Media folder	`{YYYYMMDD}-{HHMMSS}`	`20260316-143052`
Image	`image_{postId}_{index}.jpg`	`image_2032741269689213289_0.jpg`
Video thumbnail	`video_{postId}_poster.jpg`	`video_2032741269689213289_poster.jpg`

OUTPUT FORMAT — EXACT FORMAT REQUIRED

WRITE THE INTEL FILE IN THIS EXACT FORMAT. FOLLOW EVERY COMPONENT PRECISELY.

# X Intelligence Report

**Scraped:** {date} {time} (Asia/Shanghai)
**Source:** X Home Timeline
**Posts Collected:** {count}

---

## Post {n}: {Topic Title}

**Author:** {Name} (@{handle})
**Posted:** {relative time}
**URL:** {post_url}

**Content:**
{post_text_content}

**Metrics:** ❤️ {likes} 🔁 {reposts} 💬 {replies} 👁 {views}

**Media:**
🖼️ media/{folder}/image_{postId}_{index}.jpg

**Signal:** {🔴 HIGH / 🟡 MEDIUM / 🟡 LOW / ⚪ SKIP} — {Brief signal assessment}

---

## Post {n+1}: ...

(MORE POSTS...)

---

## Summary

**High Signal (Tier 1):**
1. {Topic} — {One-line summary}
2. {Topic} — {One-line summary}

**Medium Signal (Tier 2):**
- {Topic} — {One-line summary}
- {Topic} — {One-line summary}

**Skip:**
- {Topic} — {Reason}

Media in Output Format (Conditional)

IF THE POST HAS MEDIA, INCLUDE THE MEDIA LINE AFTER METRICS:

**Media:**
🖼️ media/{folder}/image_{postId}_{index}.jpg

OR FOR VIDEO THUMBNAILS:

**Media:**
🎬 media/{folder}/video_{postId}_poster.jpg

IF THE POST HAS NO MEDIA, OMIT THE MEDIA SECTION ENTIRELY.

WORKFLOW

Step 1: Create Media Folder FIRST

BEFORE EXTRACTING ANY POSTS, create the media folder:

# Create folder with current timestamp
# Format: YYYYMMDD-HHMMSS
mkdir -p ../../intel/x/media/{YYYYMMDD}-{HHMMSS}

# Example
mkdir -p ../../intel/x/media/20260316-143052

Step 2: Navigate to Target Page

// For home timeline
browser.navigate({ url: "https://x.com/home" })

// For a specific user profile
browser.navigate({ url: "https://x.com/username" })

Step 3: Scroll to Load More Posts

browser.act({ 
  fn: "() => { for(let i=0; i<3; i++) { window.scrollBy(0, window.innerHeight); } return 'scrolled'; }",
  kind: "evaluate"
})

Step 4: Extract Posts with Media URLs

browser.act({
  fn: "() => {
    const articles = document.querySelectorAll('article[data-testid="tweet"]');
    const posts = [];
    articles.forEach(article => {
      const url = article.querySelector('a[href*="/status/"]')?.href;
      if (!url || url.includes('/analytics') || url.includes('/photo')) return;

      const text = article.querySelector('[data-testid="tweetText"]')?.innerText || '';
      const timeEl = article.querySelector('time');
      const timestamp = timeEl?.innerText || '';
      const datetime = timeEl?.dateTime || '';

      // Get author info
      const nameEl = article.querySelector('[data-testid="User-Name"]');
      const name = nameEl?.querySelector('span')?.innerText || '';
      const handle = nameEl?.querySelector('a[href*="/"]')?.innerText || '';

      // Get engagement metrics
      const metricButtons = article.querySelectorAll('button[role="button"]');
      let replies = '0', reposts = '0', likes = '0', views = '0';

      metricButtons.forEach(btn => {
        const txt = btn.innerText || '';
        const label = btn.getAttribute('aria-label') || '';
        if (label && label.includes('Reply')) {
          const match = txt.match(/(d+)/);
          if (match) replies = match[1];
        }
        if (label && label.includes('Repost')) {
          const match = txt.match(/(d+)/);
          if (match) reposts = match[1];
        }
        if (label && label.includes('Like')) {
          const match = txt.match(/(d+)/);
          if (match) likes = match[1];
        }
      });

      // Views are in an <a> link (not button), e.g. 71K
      const viewLink = article.querySelector('a[href*="/analytics"]');
      if (viewLink) {
        const v = viewLink.innerText.trim();
        views = v.replace(/[^0-9KMB.]/g, '') || '0';
      }

      // Get media URLs - images and video thumbnails
      const mediaUrls = [];

      // Image posts: pbs.twimg.com/media/
      const images = article.querySelectorAll('img[src*="pbs.twimg.com/media/"]');
      images.forEach(img => {
        if (img.src && !mediaUrls.includes(img.src)) {
          mediaUrls.push(img.src);
        }
      });

      // Also check for image links in <a> tags
      const imageLinks = article.querySelectorAll('a[href*="pbs.twimg.com/media/"]');
      imageLinks.forEach(link => {
        if (link.href && !mediaUrls.includes(link.href)) {
          mediaUrls.push(link.href);
        }
      });

      // Video thumbnails: pbs.twimg.com/amplify_video_thumb/
      const videoThumbs = article.querySelectorAll('img[src*="pbs.twimg.com/amplify_video_thumb/"]');
      videoThumbs.forEach(img => {
        if (img.src && !mediaUrls.includes(img.src)) {
          mediaUrls.push(img.src);
        }
      });

      const postId = url.match(//status/(d+)/)?.[1] || '';

      posts.push({ 
        name,
        handle,
        text: text.substring(0, 500),
        timestamp, 
        datetime,
        url,
        postId,
        mediaUrls,
        metrics: { replies, reposts, likes, views }
      });
    });
    return posts.slice(0, 10);
  }",
  kind: "evaluate"
})

Step 5: Download Media Files

FOR EACH POST WITH MEDIA, DOWNLOAD THE FILES:

# Download image - replace URL with actual media URL from extraction
curl -L -o "../../intel/x/media/{folder}/image_{postId}_{index}.jpg" "https://pbs.twimg.com/media/xxx?format=jpg&name=large"

# Download video thumbnail
curl -L -o "../../intel/x/media/{folder}/video_{postId}_poster.jpg" "https://pbs.twimg.com/amplify_video_thumb/xxx"

IMPORTANT: - USE -L FLAG TO FOLLOW REDIRECTS - ADD &name=large FOR FULL-RESOLUTION IMAGES - USE THE ACTUAL URLS EXTRACTED FROM STEP 4

Step 6: Write Intel File

WRITE THE MARKDOWN FILE TO ../../intel/x/{date}-{time}-X.md

FOLLOW THE EXACT OUTPUT FORMAT SHOWN ABOVE. INCLUDE: - Header with Scraped, Source, Posts Collected - Each post with Author, Posted, URL, Content, Signal - Media references if present - Summary section at the end

Step 7: Close Chrome Tab

browser.close()

Key Selectors

Element	Selector
Article container	`article[data-testid="tweet"]`
Post text	`[data-testid="tweetText"]`
Timestamp	`time`
User name	`[data-testid="User-Name"]`
Post URL	`a[href*="/status/"]`
Analytics link (views)	`a[href*="/analytics"]`
Metric buttons	`button[role="button"]`
Images	`img[src*="pbs.twimg.com/media/"]`
Video thumbnails	`img[src*="pbs.twimg.com/amplify_video_thumb/"]`

Tips

Don't click into posts - Extract URLs directly from the timeline
Filter URLs - Exclude /analytics and /photo URLs
Scroll before extracting - X loads posts dynamically
Create folder FIRST - Media folder must exist before downloads
Use -L flag - curl must follow redirects for pbs.twimg.com URLs
Add &name=large - Get full-resolution images, not thumbnails
Always close the tab - Task ends when tab closes

x-openclaw

Installation