pdf-brand-extraction

PDF Brand Standards Extraction

Extract colors, typography, logos, and imagery from a brand standards PDF into an organized style guide with CSS design tokens and extracted assets.

When to Use

User sends a brand standards/guidelines PDF
Need exact color values, font specs, and logo assets from a design document
Building a style guide for a web app from a PDF reference

Prerequisites

pymupdf must be installed in the Hermes venv:

uv pip install pymupdf --python /home/avalon/.hermes/hermes-agent/venv/bin/python3

After installation, import pymupdf works from the venv's python3.
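To confirm the package landed in the interpreter Hermes actually runs, a check like the one below can be run with the venv's python3. It is a minimal sketch using only the standard library; the helper name module_available is illustrative and not part of pymupdf.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the running interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which interpreter is running and whether it can see pymupdf.
    print(f"Interpreter: {sys.executable}")
    print(f"pymupdf importable: {module_available('pymupdf')}")
```

If the second line prints False, the install most likely went to a different Python than the one Hermes uses.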

Step 1: Extract Text from Every Page

import pymupdf
doc = pymupdf.open('/path/to/brand.pdf')
print(f'Pages: {len(doc)}')
for i, page in enumerate(doc):
    text = page.get_text()
    if text.strip():
        print(f'\n=== PAGE {i+1} ===')
        print(text)

This captures all text content: color names, hex values, font names, and usage rules. Warning: PDF text extraction is unreliable for spec values. Hex, RGB, and CMYK values extracted from text are often wrong (duplicated, garbled, or pulled from overlapping text layers). Always verify them with pixel sampling (Step 4).
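One cheap signal that the extracted specs need checking is the same hex code appearing under more than one color name. The sketch below collects hex-looking strings from a page's text and reports any that occur more than once. The function names and the regex are assumptions for illustration, and the result is only a list of candidates to verify, not a set of trusted values.

```python
import re
from collections import Counter

# Matches six-digit hex colors such as #F4EDE0 (assumed format of the printed specs).
HEX_RE = re.compile(r"#[0-9A-Fa-f]{6}\b")


def hex_candidates(page_text: str) -> Counter:
    """Count each hex-looking string found in the extracted page text."""
    return Counter(match.upper() for match in HEX_RE.findall(page_text))


def duplicated_hexes(page_text: str) -> list:
    """Return hex codes that appear more than once, a hint that the specs are suspect."""
    return sorted(h for h, n in hex_candidates(page_text).items() if n > 1)


if __name__ == "__main__":
    sample = "Charcoal\n#F4EDE0\nWhite Sand\n#F4EDE0\n"
    print(duplicated_hexes(sample))
```

Any code flagged this way should be pixel-sampled before it goes into the style guide.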

Step 2: Extract Embedded Images

import os
out_dir = '/path/to/app/public/kb/brand-assets'
os.makedirs(out_dir, exist_ok=True)

doc = pymupdf.open('/path/to/brand.pdf')
for page_num in range(len(doc)):
    page = doc[page_num]
    images = page.get_images(full=True)
    for img_idx, img in enumerate(images):
        xref = img[0]
        base_image = doc.extract_image(xref)
        if base_image and len(base_image["image"]) > 500:  # skip tiny images
            ext = base_image["ext"]
            fname = f"page{page_num+1:02d}_img{img_idx:02d}.{ext}"
            with open(os.path.join(out_dir, fname), "wb") as f:
                f.write(base_image["image"])

Note: Vector graphics (logos, icons) won't appear as extractable images. They're embedded as PDF drawing commands. Use page rendering (Step 3) to capture them.
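To find where vector artwork sits on a page, PyMuPDF's page.get_drawings() can be used to list the drawn paths, and the union of their rectangles can then be rendered at high resolution. The sketch below assumes each drawing dict carries a "rect" entry that unpacks to (x0, y0, x1, y1). The helper names are hypothetical, and a full-page background rectangle would inflate the box to the whole page.

```python
def drawings_bbox(drawings):
    """Union bounding box (x0, y0, x1, y1) of vector paths, or None if there are none."""
    rects = [tuple(d["rect"]) for d in drawings]
    if not rects:
        return None
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


def render_vector_region(pdf_path, page_index, out_path, dpi=300):
    """Render only the area covered by vector drawings; returns False if the page has none."""
    import pymupdf  # imported lazily so drawings_bbox stays usable without it

    doc = pymupdf.open(pdf_path)
    page = doc[page_index]
    bbox = drawings_bbox(page.get_drawings())
    if bbox is None:
        return False
    pix = page.get_pixmap(dpi=dpi, clip=pymupdf.Rect(*bbox))
    pix.save(out_path)
    return True
```

The cropped PNG is a starting point for tracing or for describing the logo to recreate it as SVG.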

Step 3: Render Pages as PNGs

# Reuses doc, os, and out_dir from Step 2
for i, page in enumerate(doc):
    pix = page.get_pixmap(dpi=150)  # 150 for reference, 300 for pixel sampling
    pix.save(os.path.join(out_dir, f"page_{i+1:02d}.png"))

These rendered PNGs can be analyzed with vision_analyze and used for pixel color sampling.

Step 4: Pixel-Sample Exact Colors (CRITICAL)

DO NOT trust text-extracted hex/RGB values. In real brand PDFs, the printed color specs are often wrong — duplicated across swatches, garbled by text layers, or simply production errors.

Instead, render the color swatch pages at 300dpi and sample the actual pixel values:

doc = pymupdf.open('/path/to/brand.pdf')
page = doc[COLOR_PAGE_INDEX]  # 0-indexed
pix = page.get_pixmap(dpi=300)
w, h = pix.width, pix.height

# Sample multiple points within each color swatch area
# Adjust x/y percentages based on where the swatches are on the page
samples = []
for x in range(int(w*0.1), int(w*0.35), int(w*0.05)):
    for y in range(int(h*0.3), int(h*0.7), int(h*0.1)):
        r, g, b = pix.pixel(x, y)[:3]
        if max(r, g, b) < 100:  # keep only dark pixels when sampling a dark swatch
            samples.append((r, g, b))

if samples:
    avg = tuple(sum(c)//len(samples) for c in zip(*samples))
    print(f"Color: RGB({avg[0]}, {avg[1]}, {avg[2]}) = #{avg[0]:02X}{avg[1]:02X}{avg[2]:02X}")

Sample 10-20+ points per swatch and average them. Filter by brightness range to avoid accidentally sampling background, text, or borders within the swatch area.
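The brightness filter and averaging can be pulled into a small reusable helper. This is a sketch, not the only way to do it: the function name, the use of the plain channel mean as "brightness", and the default thresholds are all assumptions. The samples would come from calls such as pix.pixel(x, y)[:3].

```python
def average_swatch_color(samples, min_brightness=0, max_brightness=255):
    """Average the RGB samples whose brightness falls inside the given range.

    Brightness here is the integer mean of the three channels (an assumed,
    simple measure). Returns a hex string like '#414242', or None if no
    sample passes the filter.
    """
    kept = [
        (r, g, b)
        for r, g, b in samples
        if min_brightness <= (r + g + b) // 3 <= max_brightness
    ]
    if not kept:
        return None
    n = len(kept)
    avg = tuple(sum(channel) // n for channel in zip(*kept))
    return "#{:02X}{:02X}{:02X}".format(*avg)


if __name__ == "__main__":
    # Three dark swatch pixels plus one stray light pixel (e.g. text or border).
    samples = [(65, 66, 66)] * 3 + [(244, 237, 224)]
    print(average_swatch_color(samples, max_brightness=100))
```

For a light swatch, set min_brightness instead so that dark text or borders inside the sampled area are dropped.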

Step 5: Analyze Visual Pages with Vision

Use vision_analyze on the rendered page PNGs for:

Logo descriptions and variations
Typography specimens (font names, weights, usage rules)
Layout patterns and design rules
Photography style guidelines

Use delegate_task with parallel subagents for batch analysis (same pattern as screenshot-knowledge-base skill).

Step 6: Write the Style Guide

Compile everything into a structured markdown document with:

Color Palette — Every color with name, verified hex, RGB. Note primary vs secondary vs seasonal.
Typography — Font names, weights, tracking values, usage hierarchy (headers/body/accent).
Logo System — Variations, usage rules, badge structures.
CSS Design Tokens — Ready-to-use CSS custom properties.
Tailwind Config — Theme extension block if using Tailwind.
Asset Index — Table mapping extracted files to their content.
Brand Extension — Photography style, social media rules, seasonal strategy.
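For the CSS Design Tokens section, the verified palette can be turned into custom properties mechanically. The sketch below assumes a --color-<slug> naming scheme and a plain name-to-hex dict, neither of which is prescribed by the brand PDF. Only the Charcoal value comes from pixel sampling; the White Sand value is the printed spec and is shown as a placeholder.

```python
import re


def token_name(color_name: str) -> str:
    """Turn a color name like 'White Sand' into '--color-white-sand' (assumed scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", color_name.lower()).strip("-")
    return f"--color-{slug}"


def css_tokens(palette: dict) -> str:
    """Render a :root block of CSS custom properties from a name -> hex mapping."""
    lines = [":root {"]
    for name, hex_value in palette.items():
        lines.append(f"  {token_name(name)}: {hex_value.upper()};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    palette = {
        "Charcoal": "#414242",    # pixel-verified value
        "White Sand": "#F4EDE0",  # printed spec, placeholder until pixel-verified
    }
    print(css_tokens(palette))
```

The same mapping can feed a Tailwind theme extension, so the palette is defined in one place.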

Step 7: Apply to App

When applying brand tokens to an existing app:

Update CSS custom properties / Tailwind @theme tokens
Replace generic grays with brand charcoal/neutrals
Replace generic accents with brand copper/earth tones
Update font-family stacks
Add letter-spacing (tracking) per brand specs
Update component-level classes (sidebar, cards, headers, buttons)
Build logo as SVG component if it's geometric (don't use raster)

Pitfalls

pymupdf install location — On this VPS, pip install goes to python3.12 but Hermes runs python3.11. Use uv pip install pymupdf --python /path/to/venv/python3 to target the right environment.
Text-extracted color values are UNRELIABLE — In the Jungle brand PDF, both Charcoal and White Sand had identical printed specs (#F4EDE0). Pixel sampling revealed Charcoal is actually #414242 (dark gray, not beige). Always pixel-verify.
Vector logos don't extract as images — page.get_images() only returns raster images. Logos drawn with PDF vector commands need to be either: (a) rendered as high-res PNGs and traced, (b) recreated as SVG from the visual description, or (c) extracted with specialized tools. For simple geometric logos (like concentric arches), recreating as SVG is fastest.
vision_analyze can't read small text on rendered pages — Hex codes and RGB values printed small on color swatches are often misread by vision. Don't rely on vision for exact numeric specs — use text extraction for the names and pixel sampling for the actual values.
Brand PDFs have repeated sidebar/nav text — Most brand guide PDFs have a persistent table of contents sidebar on every page. The extracted text will contain this repeated navigation text on every page. Filter it out or ignore it when parsing.
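The repeated sidebar text can be filtered out by dropping any line that shows up on most pages. The sketch below works on the per-page strings returned by page.get_text(). The function name and the 80% threshold are assumptions, and a legitimate line that repeats on nearly every page would be removed too.

```python
from collections import Counter


def strip_repeated_lines(page_texts, min_fraction=0.8):
    """Remove lines that appear on at least `min_fraction` of pages (minimum 2 pages)."""
    pages = [
        [line.strip() for line in text.splitlines() if line.strip()]
        for text in page_texts
    ]
    # Count each distinct line once per page it appears on.
    counts = Counter(line for lines in pages for line in set(lines))
    threshold = max(2, int(len(pages) * min_fraction))
    repeated = {line for line, n in counts.items() if n >= threshold}
    return ["\n".join(l for l in lines if l not in repeated) for lines in pages]


if __name__ == "__main__":
    pages = [
        "Colors\nTypography\nCharcoal #414242",
        "Colors\nTypography\nHeadline font",
        "Colors\nTypography\nLogo usage",
    ]
    for cleaned in strip_repeated_lines(pages):
        print(cleaned)
```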