jpocr — Japanese OCR Skill

Local Japanese OCR powered by NDLOCR-Lite from Japan's National Diet Library (NDL). Runs entirely on the CPU (Apple Silicon or x86); no GPU or API key is required.

Capabilities

Target	Quality
Printed Japanese (活字)	Excellent
Vertical text (縦書き)	Excellent
English text	Good
Handwritten Japanese (手書き)	Experimental

How to call

Run scripts/ocr-cli.sh, located under the skill root directory (<SKILL_ROOT>):

<SKILL_ROOT>/scripts/ocr-cli.sh <image_path>              # → plain text to stdout
<SKILL_ROOT>/scripts/ocr-cli.sh <image_path> --json        # → JSON with bounding boxes
<SKILL_ROOT>/scripts/ocr-cli.sh <image_path> --viz         # → also saves a bounding-box visualization image
<SKILL_ROOT>/scripts/ocr-cli.sh <dir_path>                 # → batch-process every image in the directory
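To call the CLI from a Python program, a thin subprocess wrapper is enough. This is a minimal sketch under two assumptions that the usage lines above do not confirm. First, that --json writes the JSON document to stdout, as plain-text mode does. Second, that --json and --viz can be combined. The helper names build_ocr_command and run_ocr are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List, Union

PathLike = Union[str, Path]


def build_ocr_command(skill_root: PathLike, target: PathLike,
                      as_json: bool = False, viz: bool = False) -> List[str]:
    """Build the argv list for scripts/ocr-cli.sh (hypothetical helper)."""
    cmd = [str(Path(skill_root) / "scripts" / "ocr-cli.sh"), str(target)]
    if as_json:
        cmd.append("--json")
    if viz:
        cmd.append("--viz")
    return cmd


def run_ocr(skill_root: PathLike, target: PathLike,
            as_json: bool = False, viz: bool = False) -> Any:
    """Run the CLI; return plain text, or parsed JSON when as_json=True.

    Assumption: in --json mode the JSON document is printed to stdout.
    """
    result = subprocess.run(
        build_ocr_command(skill_root, target, as_json=as_json, viz=viz),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout) if as_json else result.stdout
```

For example, run_ocr(skill_root, "scan.png") returns the default plain-text result as a single string, one region per line.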

Output formats

text (default): one line per detected text region.

json:

{
  "contents": [[
    {
      "boundingBox": [[x1,y1],[x1,y2],[x2,y1],[x2,y2]],
      "text": "recognized text",
      "confidence": 0.95,
      "isVertical": "true"
    }
  ]],
  "imginfo": { "img_width": 1920, "img_height": 1080 }
}
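A minimal Python sketch for consuming the JSON output works as follows:

* It flattens the nested contents list.
* It reduces each boundingBox to (x_min, y_min, x_max, y_max) by taking the min and max over its corner points, so it does not depend on corner order.
* It converts the string-valued isVertical flag to a bool.
* It optionally drops low-confidence regions.

The names TextLine, parse_ocr_json and min_confidence are hypothetical. Treating each inner list of contents as one group of regions (for example, one image) is an assumption.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextLine:
    text: str
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    confidence: float
    vertical: bool


def parse_ocr_json(raw: str, min_confidence: float = 0.0) -> List[TextLine]:
    """Flatten the --json output into TextLine records (hypothetical helper)."""
    data = json.loads(raw)
    lines: List[TextLine] = []
    for group in data.get("contents", []):  # assumed: one inner list per group
        for item in group:
            confidence = float(item.get("confidence", 0.0))
            if confidence < min_confidence:
                continue  # skip low-confidence regions
            xs = [point[0] for point in item["boundingBox"]]
            ys = [point[1] for point in item["boundingBox"]]
            # isVertical is a string ("true"/"false") in the sample output.
            vertical = str(item.get("isVertical", "false")).lower() == "true"
            lines.append(TextLine(
                text=item["text"],
                bbox=(min(xs), min(ys), max(xs), max(ys)),
                confidence=confidence,
                vertical=vertical,
            ))
    return lines
```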

viz (--viz): additionally saves a bounding-box overlay image named viz_<filename> to the output directory.

Performance

Speed: ~2–3 seconds per image on Apple Silicon (CPU only)
Formats: JPG, PNG, TIFF, JP2, BMP
Charset: ~7000 characters (JIS kanji + kana + ASCII + Greek)

Tech stack

Layout detection: DEIMv2 (ONNX)
Text recognition: PARSeq cascade (ONNX models for lines of up to 30, 50, and 100 characters)
Reading order: XY-cut algorithm
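Recursive XY-cut orders text regions by repeatedly splitting the page at empty gaps. It projects the boxes onto the x and y axes, cuts along the axis with the widest gap, and recurses into each part.

The Python sketch below is a generic textbook version, not NDLOCR-Lite's own implementation. It assumes boxes are (x_min, y_min, x_max, y_max) tuples. The right_to_left flag is hypothetical; it orders side-by-side groups from right to left, the way vertical Japanese columns are read.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def _split(boxes: Sequence[Box], indices: List[int],
           axis: int) -> Tuple[List[List[int]], float]:
    """Group boxes whose projections onto one axis overlap.

    axis=0 projects onto x, axis=1 onto y. Returns the groups in ascending
    coordinate order and the widest empty gap found between them.
    """
    lo, hi = axis, axis + 2
    order = sorted(indices, key=lambda i: boxes[i][lo])
    groups = [[order[0]]]
    end = boxes[order[0]][hi]
    widest = 0.0
    for i in order[1:]:
        start = boxes[i][lo]
        if start > end:  # empty gap before this box: start a new group
            widest = max(widest, start - end)
            groups.append([i])
        else:
            groups[-1].append(i)
        end = max(end, boxes[i][hi])
    return groups, widest


def xy_cut(boxes: Sequence[Box], right_to_left: bool = False) -> List[int]:
    """Return box indices in reading order using recursive XY-cut."""

    def recurse(indices: List[int]) -> List[int]:
        if len(indices) <= 1:
            return list(indices)
        rows, row_gap = _split(boxes, indices, axis=1)
        cols, col_gap = _split(boxes, indices, axis=0)
        if len(rows) == 1 and len(cols) == 1:
            # No empty gap on either axis: fall back to a positional sort.
            if right_to_left:
                return sorted(indices, key=lambda i: (-boxes[i][2], boxes[i][1]))
            return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))
        if row_gap >= col_gap:
            groups = rows  # horizontal cut: read bands top to bottom
        else:
            # vertical cut: read columns left to right, or right to left
            groups = cols[::-1] if right_to_left else cols
        ordered: List[int] = []
        for group in groups:
            ordered.extend(recurse(group))
        return ordered

    return recurse(list(range(len(boxes))))
```

On a page of vertical text, each line's box is tall and narrow. The only empty gaps therefore run between lines, so with right_to_left=True the lines come out right to left. Real pages usually need extra tuning that this sketch omits, such as a minimum gap width so that tiny gaps are ignored.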