youtube-content


YouTube Video Intelligence Extractor

Extract everything from a YouTube video: metadata, transcript, chapters, description links, social profiles, tags, engagement stats — saved to ~/.hermes/youtube/ for reuse across tasks.

Use This Together With video-watch

This skill now keeps the best parts of the original Hermes YouTube extractor and borrows the strongest workflow ideas from Brad Bonanno's claude-video /watch skill.

Use youtube-content when you need:

  - structured metadata archival
  - transcript/chapter extraction
  - description-link mining
  - reusable JSON artifacts in ~/.hermes/youtube/
  - durable source records that pair cleanly with the separate video-watch visual-analysis workflow

Use video-watch when you need:

  - URL-or-local-file video analysis beyond YouTube
  - bug debugging from a screen recording
  - hook analysis from the first seconds of a video
  - focused timestamp-window inspection with denser frame extraction
  - answers grounded in extracted frames rather than only transcript/metadata

Best combined workflow:

  1. Run youtube-content first for YouTube videos to get durable metadata, transcript, chapters, and link extraction.
  2. If the user's question depends on what is visibly on screen, follow with a video-watch-style frame workflow focused on the specific chapter or timestamp range.
  3. For long videos, do not trust a sparse whole-video visual pass if the user only cares about one moment. Re-run on a bounded window.

Provider Priority

  1. Supadata API (primary provider) — reliable transcript + metadata, no IP blocks, 1-2 credits per call
  2. web_extract (metadata fallback) — free, bypasses IP blocks
  3. youtube-transcript-api + Tor (transcript fallback) — free but unreliable from VPS
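
The priority order above amounts to a fallback chain. A minimal sketch, assuming each provider is wrapped in a callable that takes a video URL and either returns a dict or raises; the function name, the `Provider` alias, and the returned `errors` field are hypothetical and not part of the extractor script:

```python
from typing import Callable, Optional

# Hypothetical provider signature: takes a video URL, returns a result dict or raises.
Provider = Callable[[str], dict]


def extract_with_fallback(url: str, providers: list[tuple[str, Provider]]) -> Optional[dict]:
    """Try each (name, provider) pair in priority order; return the first non-empty result."""
    errors = {}
    for name, fetch in providers:
        try:
            result = fetch(url)
        except Exception as exc:  # a failed provider should not abort the chain
            errors[name] = str(exc)
            continue
        if result:
            # Record which provider answered and what failed before it.
            return {"source": name, "errors": errors, **result}
    return None
```

Passing the providers as an ordered list keeps the priority explicit and makes it easy to drop the Tor-based fallback on hosts where it is known to fail.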

Visual-analysis handoff

This skill is now intentionally metadata/transcript-first.

When the user needs actual visual understanding of the video, switch to video-watch instead of trying to do URL-level Gemini analysis inside this extractor. That keeps the workflow closer to Brad Bonanno's /watch approach.

Dependencies

# Required
export SUPADATA_API_KEY=sd_...  # In ~/.hermes/.env

# Optional (legacy fallback)
pip install youtube-transcript-api pysocks

Quick Start

Standalone CLI

SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "https://youtube.com/watch?v=VIDEO_ID"
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" --json
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" -l es
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" --no-save

Agent Workflow (execute_code)

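from hermes_tools import terminal
import os, shlex

url = "https://www.youtube.com/watch?v=VIDEO_ID"
key = os.environ.get("SUPADATA_API_KEY", "")
script = "~/.hermes/skills/media/youtube-content/scripts/youtube_extract.py"
# Quote the key and URL so shell metacharacters such as & or ? are passed through literally
cmd = f"SUPADATA_API_KEY={shlex.quote(key)} python3 {script} {shlex.quote(url)} --json"
result = terminal(cmd)
print(result["output"])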

Direct API Call (simple transcript only)

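import json, os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Quick transcript via Supadata; no script needed
video_id = "dQw4w9WgXcQ"
key = os.environ.get("SUPADATA_API_KEY", "")
# URL-encode the query so the nested watch URL (with its own ? and =) survives intact
query = urlencode({"url": f"https://www.youtube.com/watch?v={video_id}", "mode": "auto"})
req = Request(f"https://api.supadata.ai/v1/transcript?{query}", headers={"x-api-key": key})
data = json.loads(urlopen(req, timeout=60).read())
content = data["content"]
# content may be a plain string or a list of segments, depending on the request options
if isinstance(content, list):
    content = " ".join(seg.get("text", "") for seg in content)
print(content[:500])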

Storage Structure

All extractions are saved to ~/.hermes/youtube/:

~/.hermes/youtube/
├── index.json                           # Master index of all extracted videos
├── VIDEO_ID_title-slug.json             # Full structured extraction (JSON)
├── VIDEO_ID_transcript.txt              # Human-readable transcript with chapters
└── ...
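
Because the files are keyed by video ID, a task can check for an existing extraction before spending API credits. A minimal sketch that relies only on the VIDEO_ID_title-slug.json naming shown above; the helper name is hypothetical, and it deliberately avoids index.json, whose internal layout is not described here:

```python
from pathlib import Path
from typing import Optional

DEFAULT_ROOT = Path.home() / ".hermes" / "youtube"


def find_extraction(video_id: str, root: Path = DEFAULT_ROOT) -> Optional[Path]:
    """Return the saved extraction JSON for video_id, or None if nothing is archived."""
    # Files are named VIDEO_ID_title-slug.json, so index.json never matches this pattern.
    matches = sorted(root.glob(f"{video_id}_*.json"))
    return matches[0] if matches else None
```

If the function returns a path, load it with `json.load` and skip the extraction call.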

Full Extraction JSON Structure

{
  "extracted_at": "ISO timestamp",
  "video_id": "11-char ID",
  "url": "full YouTube URL",
  "title": "Video title",
  "channel": {"name": "...", "id": "...", "url": "...", "handle": "..."},
  "description": "Full description text",
  "duration_seconds": 213,
  "duration_string": "3:33",
  "upload_date": "2024-01-15T...",
  "category": "Education",
  "tags": ["tag1", "tag2"],
  "stats": {"views": 1000000, "likes": 50000},
  "chapters": [{"title": "Intro", "time": "0:00", "start_seconds": 0}],
  "thumbnail": "https://i.ytimg.com/...",
  "description_links": ["https://..."],
  "description_timestamps": [{"time": "0:00", "seconds": 0, "label": "Intro"}],
  "social_links": {"twitter": ["@handle"], "github": ["repo"]},
  "endscreen_videos": [{"title": "...", "url": "..."}],
  "transcript": {
    "available": true,
    "source": "supadata",
    "segment_count": 250,
    "full_text": "complete transcript as one string",
    "timestamped_text": "0:00 first line\n0:05 second line...",
    "segments": [{"text": "...", "start": 0.0, "duration": 2.5}]
  }
}
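
One way to reuse this structure is to split the transcript by chapter, which helps when only a specific chapter needs to be handed to a later step. A minimal sketch that assumes only the `chapters[].start_seconds`, `chapters[].title`, and `transcript.segments[].start` / `.text` fields shown above; the function name is hypothetical:

```python
from bisect import bisect_right


def segments_by_chapter(extraction: dict) -> dict:
    """Map each chapter title to the transcript text spoken during that chapter."""
    chapters = sorted(extraction.get("chapters", []), key=lambda c: c["start_seconds"])
    starts = [c["start_seconds"] for c in chapters]
    grouped = {c["title"]: [] for c in chapters}
    for seg in extraction.get("transcript", {}).get("segments", []):
        # Index of the last chapter that starts at or before this segment.
        i = bisect_right(starts, seg["start"]) - 1
        if i >= 0:
            grouped[chapters[i]["title"]].append(seg["text"])
    return {title: " ".join(parts) for title, parts in grouped.items()}
```

Chapters that share a title are merged into one entry, and segments that start before the first chapter are dropped.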

Why visual analysis moved out

We removed Gemini/OpenRouter video-url enrichment from the default YouTube extraction flow.

Reasons:

  - it created provider/config friction in Hermes
  - it blurred the line between archival extraction and actual video watching
  - Brad-style analysis is better served by the dedicated video-watch skill, which works from frames, contact sheets, timestamps, and captions

Use youtube-content to archive and structure the source. Use video-watch to actually watch the video.

Supadata API Reference

Base URL: https://api.supadata.ai/v1
Auth: x-api-key header

Transcript (GET /transcript)

YouTube Video Metadata (GET /youtube/video)

Unified Metadata (GET /metadata)

Async Job Polling (GET /job/{jobId})

URL Formats Supported
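
A minimal sketch of normalizing input to the 11-character video ID, assuming the usual watch, youtu.be, shorts, embed, and live URL shapes plus a bare ID. The helper name and the exact set of shapes are assumptions; the set that youtube_extract.py actually accepts may differ.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def video_id_from(value: str) -> Optional[str]:
    """Return the 11-character video ID from a bare ID or a common YouTube URL shape."""
    value = value.strip()
    if _ID.fullmatch(value):
        return value
    # Accept URLs written without a scheme, e.g. "youtube.com/shorts/...".
    parsed = urlparse(value if "://" in value else f"https://{value}")
    host = (parsed.hostname or "").lower().removeprefix("www.").removeprefix("m.")
    parts = [p for p in parsed.path.split("/") if p]
    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host == "youtube.com":
        if parts[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            candidate = parts[1]
    return candidate if candidate and _ID.fullmatch(candidate) else None
```

Unrecognized hosts return None rather than a guess, so a caller can fail early instead of sending a bad URL to a paid API.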

Manual fallback: YouTube page initial data

When Supadata is not configured/available and normal transcript tools fail, do not stop at a sparse noembed result. Fetch the watch page HTML directly and parse ytInitialData / ytcfg to recover useful structured data:

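python3 - <<'PY'
import json, re, urllib.request
video_id = 'VIDEO_ID'
url = f'https://www.youtube.com/watch?v={video_id}&hl=en&gl=US'
req = urllib.request.Request(url, headers={
  'User-Agent': 'Mozilla/5.0',
  'Accept-Language': 'en-US,en;q=0.9',
})
html = urllib.request.urlopen(req, timeout=20).read().decode('utf-8', 'ignore')
with open('/tmp/yt.html', 'w', encoding='utf-8') as f:
  f.write(html)
decoder = json.JSONDecoder()
for name, var in [('player', 'ytInitialPlayerResponse'), ('data', 'ytInitialData')]:
  # Find "<var> = {" and let the JSON decoder locate the matching closing brace;
  # a lazy regex up to the first "};" can cut the object short.
  m = re.search(re.escape(var) + r'\s*=\s*(?=\{)', html)
  if not m:
    continue
  try:
    obj, _ = decoder.raw_decode(html, m.end())
  except json.JSONDecodeError:
    continue
  with open(f'/tmp/{name}.json', 'w', encoding='utf-8') as f:
    json.dump(obj, f, indent=2)
PY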

Then inspect /tmp/data.json recursively for:

  - videoPrimaryInfoRenderer: title, views, likes
  - videoSecondaryInfoRenderer: channel name/id/handle/subscriber count
  - structuredDescriptionContentRenderer → videoDescriptionHeaderRenderer: publish date, views, likes
  - expandableVideoDescriptionBodyRenderer.attributedDescriptionBodyText.content: full visible description and chapter timestamps
  - engagementPanels[] with panelIdentifier == engagement-panel-searchable-transcript: transcript availability and getTranscriptEndpoint.params
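
The recursive inspection can be sketched as a small generator that walks nested dicts and lists and yields every value stored under a given renderer key. The helper name is hypothetical, and the nesting depth of YouTube's page data is assumed to stay well within Python's recursion limit:

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth of nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key can appear again deeper in the tree.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)
```

After loading /tmp/data.json with `json.load`, `next(find_key(data, "videoPrimaryInfoRenderer"), None)` returns the first match, or None when the renderer is absent.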

Use this fallback to update the saved JSON with metadata, description, timestamps, and extraction notes even when transcript text cannot be retrieved.
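
Chapter timestamps recovered from the description text can be turned into entries shaped like `description_timestamps`. A minimal sketch, assuming each chapter sits on its own line as `M:SS Label` or `H:MM:SS Label`, optionally with a dash or colon separator; the function name and the regular expression are illustrative and not taken from the extractor script:

```python
import re

# Leading timestamp, optional separator, then the label.
_LINE = re.compile(r"^\s*((?:\d{1,2}:)?\d{1,2}:\d{2})\s*[-–:]?\s*(.+?)\s*$")


def parse_description_timestamps(description: str) -> list[dict]:
    """Extract {"time", "seconds", "label"} entries from lines that start with a timestamp."""
    out = []
    for line in description.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        time_str, label = m.groups()
        seconds = 0
        for part in time_str.split(":"):
            # Works for both M:SS and H:MM:SS.
            seconds = seconds * 60 + int(part)
        out.append({"time": time_str, "seconds": seconds, "label": label})
    return out
```

Lines with a timestamp in the middle of a sentence are ignored on purpose, since they are usually references and not chapter markers.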

Pitfalls