---
name: youtube-content
description: Extract ALL structured data from YouTube videos (metadata, description, chapters, links, social profiles, transcript) and store for later use. Comprehensive video intelligence extraction.
tags: [youtube, transcript, metadata, video, supadata]
---

# YouTube Video Intelligence Extractor

Extract everything from a YouTube video (metadata, transcript, chapters, description links, social profiles, tags, engagement stats) and save it to `~/.hermes/youtube/` for reuse across tasks.

## Use This Together With `video-watch`

This skill now keeps the best parts of the original Hermes YouTube extractor **and** borrows the strongest workflow ideas from Brad Bonanno's `claude-video` `/watch` skill.

Use **`youtube-content`** when you need:
- structured metadata archival
- transcript/chapter extraction
- description-link mining
- reusable JSON artifacts in `~/.hermes/youtube/`
- durable source records that pair cleanly with the separate `video-watch` visual-analysis workflow

Use **`video-watch`** when you need:
- URL-or-local-file video analysis beyond YouTube
- bug debugging from a screen recording
- hook analysis from the first seconds of a video
- focused timestamp-window inspection with denser frame extraction
- answers grounded in extracted frames rather than only transcript/metadata

**Best combined workflow:**
1. Run `youtube-content` first for YouTube videos to get durable metadata, transcript, chapters, and link extraction.
2. If the user's question depends on what is visibly on screen, follow with a `video-watch`-style frame workflow focused on the specific chapter or timestamp range.
3. For long videos, do **not** trust a sparse whole-video visual pass if the user only cares about one moment. Re-run on a bounded window.

## Provider Priority

1. **Supadata API** (primary provider): reliable transcript + metadata, no IP blocks, 1-2 credits per call
2. **web_extract** (metadata fallback): free, bypasses IP blocks
3. **youtube-transcript-api + Tor** (transcript fallback): free but unreliable from VPS

## Visual-analysis handoff

This skill is now intentionally **metadata/transcript-first**.

When the user needs actual visual understanding of the video, switch to **`video-watch`** instead of trying to do URL-level Gemini analysis inside this extractor. That keeps the workflow closer to Brad Bonanno's `/watch` approach:
- extract frames
- inspect bounded timestamp windows
- use captions/transcript as support, not as a substitute for visual evidence
- ground the answer in what is visibly on screen

## Dependencies

```bash
# Required
export SUPADATA_API_KEY=sd_...  # In ~/.hermes/.env

# Optional (legacy fallback)
pip install youtube-transcript-api pysocks
```

## Quick Start

### Standalone CLI

```bash
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "https://youtube.com/watch?v=VIDEO_ID"
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" --json
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" -l es
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" --no-save
```

### Agent Workflow (execute_code)

```python
from hermes_tools import terminal
import os

url = "https://www.youtube.com/watch?v=VIDEO_ID"
key = os.environ.get("SUPADATA_API_KEY", "")

# Run the extractor script and print its JSON output
result = terminal(f'SUPADATA_API_KEY={key} python3 ~/.hermes/skills/media/youtube-content/scripts/youtube_extract.py "{url}" --json')
print(result["output"])
```

### Direct API Call (simple transcript only)

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Quick transcript via Supadata; no script needed
video_id = "dQw4w9WgXcQ"
key = os.environ.get("SUPADATA_API_KEY", "")

# URL-encode the nested watch URL so its own "?v=" cannot break the query string
query = urlencode({"url": f"https://www.youtube.com/watch?v={video_id}", "mode": "auto"})
req = Request(
    f"https://api.supadata.ai/v1/transcript?{query}",
    headers={"x-api-key": key},
)
data = json.loads(urlopen(req, timeout=60).read())
print(data["content"][:500])
```

## Storage Structure

All extractions are saved to `~/.hermes/youtube/`:

```
~/.hermes/youtube/
├── index.json                  # Master index of all extracted videos
├── VIDEO_ID_title-slug.json    # Full structured extraction (JSON)
├── VIDEO_ID_transcript.txt     # Human-readable transcript with chapters
└── ...
```

### Full Extraction JSON Structure

```json
{
  "extracted_at": "ISO timestamp",
  "video_id": "11-char ID",
  "url": "full YouTube URL",
  "title": "Video title",
  "channel": {"name": "...", "id": "...", "url": "...", "handle": "..."},
  "description": "Full description text",
  "duration_seconds": 213,
  "duration_string": "3:33",
  "upload_date": "2024-01-15T...",
  "category": "Education",
  "tags": ["tag1", "tag2"],
  "stats": {"views": 1000000, "likes": 50000},
  "chapters": [{"title": "Intro", "time": "0:00", "start_seconds": 0}],
  "thumbnail": "https://i.ytimg.com/...",
  "description_links": ["https://..."],
  "description_timestamps": [{"time": "0:00", "seconds": 0, "label": "Intro"}],
  "social_links": {"twitter": ["@handle"], "github": ["repo"]},
  "endscreen_videos": [{"title": "...", "url": "..."}],
  "transcript": {
    "available": true,
    "source": "supadata",
    "segment_count": 250,
    "full_text": "complete transcript as one string",
    "timestamped_text": "0:00 first line\n0:05 second line...",
    "segments": [{"text": "...", "start": 0.0, "duration": 2.5}]
  }
}
```

## Why visual analysis moved out

We removed Gemini/OpenRouter video-url enrichment from the default YouTube extraction flow.

Reasons:
- it created provider/config friction in Hermes
- it blurred the line between archival extraction and actual video watching
- Brad-style analysis is better served by the dedicated `video-watch` skill, which works from frames, contact sheets, timestamps, and captions

Use `youtube-content` to archive and structure the source.
Use `video-watch` to actually *watch* the video.
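
Once a video is archived, a later task can pick the record back up instead of re-extracting it. The following is a minimal sketch, assuming only the `VIDEO_ID_title-slug.json` naming and the `transcript` fields described under Storage Structure. The helper names `find_extraction` and `transcript_text` are hypothetical and not part of the skill's script, and the sketch deliberately does not read `index.json`, since its schema is not documented here.

```python
import json
from pathlib import Path


def find_extraction(video_id, root="~/.hermes/youtube"):
    """Return the newest VIDEO_ID_*.json file under root, or None if absent."""
    folder = Path(root).expanduser()
    if not folder.is_dir():
        return None
    # The .json suffix excludes VIDEO_ID_transcript.txt, which shares the prefix.
    matches = sorted(folder.glob(f"{video_id}_*.json"), key=lambda p: p.stat().st_mtime)
    return matches[-1] if matches else None


def transcript_text(record):
    """Return the full transcript string, or None when no transcript was saved."""
    transcript = record.get("transcript") or {}
    if not transcript.get("available"):
        return None
    return transcript.get("full_text")


if __name__ == "__main__":
    path = find_extraction("dQw4w9WgXcQ")
    if path is None:
        print("no saved extraction for this video yet")
    else:
        record = json.loads(path.read_text(encoding="utf-8"))
        print(record.get("title"), "|", (transcript_text(record) or "")[:200])
```

Returning `None` rather than raising keeps the caller's decision simple: run the extractor when nothing is cached, otherwise reuse the saved record.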

## Supadata API Reference

Base URL: `https://api.supadata.ai/v1`

Auth: `x-api-key` header

### Transcript (`GET /transcript`)
- `url`: YouTube URL (required)
- `lang`: language code (optional)
- `mode`: `native` (1 credit), `generate` (2 credits/min), `auto` (tries native first)
- Videos >20 min return a `jobId` for async polling

### YouTube Video Metadata (`GET /youtube/video`)
- `videoId`: 11-char video ID (required)
- Returns: title, description, duration, views, likes, tags, chapters, channel info
- 1 credit per request

### Unified Metadata (`GET /metadata`)
- `url`: any social media URL (YouTube, TikTok, Instagram, X, Facebook)
- Returns: standardized metadata across platforms
- 1 credit per request

### Async Job Polling (`GET /job/{jobId}`)
- Poll every 1-2 seconds
- Results expire after 1 hour

## URL Formats Supported

- `https://www.youtube.com/watch?v=VIDEO_ID`
- `https://youtu.be/VIDEO_ID`
- `https://youtube.com/shorts/VIDEO_ID`
- `https://youtube.com/embed/VIDEO_ID`
- `https://youtube.com/live/VIDEO_ID`
- Raw 11-character video ID

## Manual fallback: YouTube page initial data

When Supadata is not configured/available and normal transcript tools fail, do not stop at a sparse `noembed` result.
Fetch the watch page HTML directly and parse `ytInitialData` / `ytcfg` to recover useful structured data:

```bash
python3 - <<'PY'
import json, re, urllib.request

video_id = 'VIDEO_ID'
url = f'https://www.youtube.com/watch?v={video_id}&hl=en&gl=US'
req = urllib.request.Request(url, headers={
    'User-Agent': 'Mozilla/5.0',
    'Accept-Language': 'en-US,en;q=0.9',
})
html = urllib.request.urlopen(req, timeout=20).read().decode('utf-8', 'ignore')
with open('/tmp/yt.html', 'w', encoding='utf-8') as f:
    f.write(html)

decoder = json.JSONDecoder()
for name, var in [('player', 'ytInitialPlayerResponse'), ('data', 'ytInitialData')]:
    # Find "<var> = {" and stop right before the opening brace
    m = re.search(var + r'\s*=\s*(?=\{)', html)
    if not m:
        continue
    # raw_decode parses exactly one JSON value, so a "};" inside a string
    # cannot truncate the object the way a non-greedy regex capture would
    obj, _ = decoder.raw_decode(html, m.end())
    with open(f'/tmp/{name}.json', 'w', encoding='utf-8') as f:
        json.dump(obj, f, indent=2)
PY
```

Then inspect `/tmp/data.json` recursively for:
- `videoPrimaryInfoRenderer`: title, views, likes
- `videoSecondaryInfoRenderer`: channel name/id/handle/subscriber count
- `structuredDescriptionContentRenderer` → `videoDescriptionHeaderRenderer`: publish date, views, likes
- `expandableVideoDescriptionBodyRenderer.attributedDescriptionBodyText.content`: full visible description and chapter timestamps
- `engagementPanels[]` with `panelIdentifier == engagement-panel-searchable-transcript`: transcript availability and `getTranscriptEndpoint.params`

Use this fallback to update the saved JSON with metadata, description, timestamps, and extraction notes even when transcript text cannot be retrieved.

## Pitfalls

- **Supadata API key**: Must be set as `SUPADATA_API_KEY` env var. Key format: `sd_...`
- **Sparse noembed fallback**: If the script falls back to noembed, it may save only title/channel/thumbnail. Immediately try the watch-page `ytInitialData` fallback above before reporting partial extraction.
- **Transcript panel ≠ transcript access**: The watch page may show a transcript panel and `getTranscriptEndpoint.params`, but `/youtubei/v1/get_transcript` can still fail with `400 FAILED_PRECONDITION` from VPS/cloud environments.
  Record that the panel exists and save chapters/description; do not claim transcript extraction succeeded.
- **yt-dlp bot wall**: `yt-dlp --dump-single-json` may fail with “Sign in to confirm you’re not a bot” on cloud hosts. Treat it as another blocked path, not a final failure.
- **Async for long videos**: Videos >20 min trigger async processing. The script handles polling automatically but has a 120s timeout.
- **Credit costs**: Transcripts are 1-2 credits, metadata is 1 credit. AI-generated transcripts cost 2 credits per minute of video.
- **youtube-transcript-api v1.2+ breaking change**: Instance methods, not class methods. `.snippets` attribute, not dict keys.
- **Tor is legacy fallback**: Still works but unreliable. Supadata should handle 99% of cases.
- **Private/age-restricted videos**: Neither Supadata nor legacy methods can access these.
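
To make the youtube-transcript-api pitfall above concrete, here is a minimal sketch, assuming youtube-transcript-api 1.2 or newer is installed for the legacy path. It calls `fetch` on an instance and reads `.text`, `.start`, and `.duration` as attributes of each snippet, then maps them onto the `segments` shape used in the saved JSON. The helper names `snippets_to_segments` and `fetch_segments` are hypothetical and not part of the skill's script.

```python
from types import SimpleNamespace


def snippets_to_segments(snippets):
    """Map snippet objects (attribute access, not dict keys) to segment dicts."""
    return [
        {"text": s.text, "start": float(s.start), "duration": float(s.duration)}
        for s in snippets
    ]


def fetch_segments(video_id, languages=("en",)):
    """Fetch a transcript via an API instance (v1.2+ style) and return segment dicts."""
    # Imported lazily so snippets_to_segments works without the optional dependency.
    from youtube_transcript_api import YouTubeTranscriptApi

    fetched = YouTubeTranscriptApi().fetch(video_id, languages=list(languages))
    return snippets_to_segments(fetched.snippets)


if __name__ == "__main__":
    # Stand-in objects with the same attributes, so the demo needs no network access.
    fake = [SimpleNamespace(text="hello", start=0.0, duration=2.5)]
    print(snippets_to_segments(fake))
```

Keeping the conversion separate from the network call means the mapping can be checked offline, which matters because the fetch itself is unreliable from VPS hosts.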