SkillHub

text

v1.0.0

Transform, format, and process text with patterns for writing, data cleaning, localization, citations, and copywriting.

Sourced from ClawHub, Authored by Iván

Installation

Please help me install the skill `text` from SkillHub official store. npx skills add ivangdavila/text

Quick Reference

Task Load
Creative writing (voice, dialogue, POV) writing.md
Data processing (CSV, regex, encoding) data.md
Academic/citations (APA, MLA, Chicago) academic.md
Marketing copy (headlines, CTA, email) copy.md
Translation/localization localization.md

Universal Text Rules

Encoding

  • Always verify encoding first: file -bi document.txt
  • Normalize line endings: tr -d 'r'
  • Remove BOM if present: sed -i '1s/^xEFxBBxBF//'

Whitespace

  • Collapse multiple spaces: sed 's/[[:space:]]+/ /g'
  • Trim leading/trailing: sed 's/^[[:space:]]*//;s/[[:space:]]*$//'

Common Traps

  • Smart quotes (" ") break parsers → normalize to "
  • Em/en dashes ( ) break ASCII → normalize to -
  • Zero-width chars invisible but break comparisons → strip them
  • String length ≠ byte length in UTF-8 ("café" = 4 chars, 5 bytes)

Format Detection

# Detect encoding
file -I document.txt

# Detect line endings
cat -A document.txt | head -1
# ^M at end = Windows (CRLF)
# No ^M = Unix (LF)

# Detect delimiter (CSV/TSV)
head -1 file | tr -cd ',;t|' | wc -c

Quick Transformations

Task Command
Lowercase tr '[:upper:]' '[:lower:]'
Remove punctuation tr -d '[:punct:]'
Count words wc -w
Count unique lines sort -u | wc -l
Find duplicates sort | uniq -d
Extract emails grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}'
Extract URLs grep -oE 'https?://[^[:space:]<>"{}|\^[]]+'`

Before Processing Checklist

  • [ ] Encoding verified (UTF-8?)
  • [ ] Line endings normalized
  • [ ] Delimiter identified (for structured text)
  • [ ] Target format/style defined
  • [ ] Edge cases considered (empty, Unicode, special chars)