---
name: youtube-content
description: Extract ALL structured data from YouTube videos — metadata, description, chapters, links, social profiles, transcript — and store for later use. Comprehensive video intelligence extraction.
tags: [youtube, transcript, metadata, video, supadata]
---

# YouTube Video Intelligence Extractor

Extract everything from a YouTube video: metadata, transcript, chapters, description links, social profiles, tags, engagement stats — saved to ~/.hermes/youtube/ for reuse across tasks.

## Use This Together With `video-watch`

This skill now keeps the best parts of the original Hermes YouTube extractor **and** borrows the strongest workflow ideas from Brad Bonanno's `claude-video` `/watch` skill.

Use **`youtube-content`** when you need:
- structured metadata archival
- transcript/chapter extraction
- description-link mining
- reusable JSON artifacts in `~/.hermes/youtube/`
- durable source records that pair cleanly with the separate `video-watch` visual-analysis workflow

Use **`video-watch`** when you need:
- URL-or-local-file video analysis beyond YouTube
- bug debugging from a screen recording
- hook analysis from the first seconds of a video
- focused timestamp-window inspection with denser frame extraction
- answers grounded in extracted frames rather than only transcript/metadata

**Best combined workflow:**
1. Run `youtube-content` first for YouTube videos to get durable metadata, transcript, chapters, and link extraction.
2. If the user's question depends on what is visibly on screen, follow with a `video-watch`-style frame workflow focused on the specific chapter or timestamp range.
3. For long videos, do **not** trust a sparse whole-video visual pass if the user only cares about one moment. Re-run on a bounded window.

## Provider Priority

1. **Supadata API** (primary provider) — reliable transcript + metadata, no IP blocks, 1-2 credits per call
2. **web_extract** (metadata fallback) — free, bypasses IP blocks
3. **youtube-transcript-api + Tor** (transcript fallback) — free but unreliable from VPS
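
The priority order above amounts to a simple fallback chain. The sketch below is only an illustration of that idea, not the extractor script's actual code: `first_successful` and the provider callables are hypothetical names, and each provider is assumed to take a URL and either return a dict or raise.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical provider signature: takes a video URL, returns a dict or raises.
Provider = Callable[[str], dict]


def first_successful(url: str, providers: Sequence[Tuple[str, Provider]]) -> dict:
    """Try providers in priority order and return the first non-empty result."""
    errors: Dict[str, str] = {}
    for name, fetch in providers:
        try:
            result = fetch(url)
        except Exception as exc:
            # A blocked or failing provider should not stop the chain.
            errors[name] = str(exc)
            continue
        if result:
            return {"source": name, "data": result, "errors": errors}
    # Nothing worked: report every failure so the caller can record it.
    return {"source": None, "data": None, "errors": errors}
```

Keeping the per-provider errors makes it possible to note in the saved extraction which paths were blocked.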

## Visual-analysis handoff

This skill is now intentionally **metadata/transcript-first**.

When the user needs actual visual understanding of the video, switch to **`video-watch`** instead of trying to do URL-level Gemini analysis inside this extractor. That keeps the workflow closer to Brad Bonanno's `/watch` approach:

- extract frames
- inspect bounded timestamp windows
- use captions/transcript as support, not as a substitute for visual evidence
- ground the answer in what is visibly on screen

## Dependencies

```bash
# Required
export SUPADATA_API_KEY=sd_...  # In ~/.hermes/.env

# Optional (legacy fallback)
pip install youtube-transcript-api pysocks
```

## Quick Start

### Standalone CLI
```bash
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "https://youtube.com/watch?v=VIDEO_ID"
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" --json
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" -l es
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" --no-save
```

### Agent Workflow (execute_code)
```python
from hermes_tools import terminal
import os

url = "https://www.youtube.com/watch?v=VIDEO_ID"
key = os.environ.get("SUPADATA_API_KEY", "")
result = terminal(f'SUPADATA_API_KEY={key} python3 ~/.hermes/skills/media/youtube-content/scripts/youtube_extract.py "{url}" --json')
print(result["output"])
```

### Direct API Call (simple transcript only)
```python
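import json, os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Quick transcript via Supadata; no script needed
video_id = "dQw4w9WgXcQ"
key = os.environ.get("SUPADATA_API_KEY", "")
# URL-encode the query so the nested YouTube URL (with its own ?v=) survives intact
query = urlencode({"url": f"https://www.youtube.com/watch?v={video_id}", "mode": "auto"})
req = Request(f"https://api.supadata.ai/v1/transcript?{query}", headers={"x-api-key": key})
data = json.loads(urlopen(req, timeout=60).read())
if "jobId" in data:
    # Long videos return an async job instead of content (see Async Job Polling)
    print("async job:", data["jobId"])
else:
    print(data["content"][:500])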
```

## Storage Structure

All extractions saved to `~/.hermes/youtube/`:

```
~/.hermes/youtube/
├── index.json                           # Master index of all extracted videos
├── VIDEO_ID_title-slug.json             # Full structured extraction (JSON)
├── VIDEO_ID_transcript.txt              # Human-readable transcript with chapters
└── ...
```
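
As a rough illustration of the naming scheme above, the following sketch builds the three paths for one video. The slug rule (lowercase, non-alphanumerics collapsed to hyphens, 60-character cap) is an assumption; the extractor script may slug titles differently, and `title_slug` / `extraction_paths` are hypothetical helper names.

```python
import re
from pathlib import Path

STORE = Path.home() / ".hermes" / "youtube"


def title_slug(title: str, max_len: int = 60) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens (assumed rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def extraction_paths(video_id: str, title: str) -> dict:
    """Return the expected file locations for one extracted video."""
    return {
        "json": STORE / f"{video_id}_{title_slug(title)}.json",
        "transcript": STORE / f"{video_id}_transcript.txt",
        "index": STORE / "index.json",
    }
```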

### Full Extraction JSON Structure
```json
{
  "extracted_at": "ISO timestamp",
  "video_id": "11-char ID",
  "url": "full YouTube URL",
  "title": "Video title",
  "channel": {"name": "...", "id": "...", "url": "...", "handle": "..."},
  "description": "Full description text",
  "duration_seconds": 213,
  "duration_string": "3:33",
  "upload_date": "2024-01-15T...",
  "category": "Education",
  "tags": ["tag1", "tag2"],
  "stats": {"views": 1000000, "likes": 50000},
  "chapters": [{"title": "Intro", "time": "0:00", "start_seconds": 0}],
  "thumbnail": "https://i.ytimg.com/...",
  "description_links": ["https://..."],
  "description_timestamps": [{"time": "0:00", "seconds": 0, "label": "Intro"}],
  "social_links": {"twitter": ["@handle"], "github": ["repo"]},
  "endscreen_videos": [{"title": "...", "url": "..."}],
  "transcript": {
    "available": true,
    "source": "supadata",
    "segment_count": 250,
    "full_text": "complete transcript as one string",
    "timestamped_text": "0:00 first line\n0:05 second line...",
    "segments": [{"text": "...", "start": 0.0, "duration": 2.5}]
  }
}
```
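
To make the `description_timestamps` and `description_links` fields concrete, here is a minimal sketch of how they could be derived from the raw description text. The regular expressions and the `parse_description` helper are assumptions for illustration; the extractor script may use different rules.

```python
import re

# A line that starts with M:SS, MM:SS or H:MM:SS followed by a label (assumed format).
TIMESTAMP_LINE = re.compile(r"^\s*((?:\d{1,2}:)?\d{1,2}:\d{2})\s*[-:]?\s*(.+?)\s*$")
URL = re.compile(r"https?://[^\s)\]>]+")


def to_seconds(stamp: str) -> int:
    """Convert '1:02:03' or '4:05' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_description(description: str) -> dict:
    """Pull chapter-style timestamps and unique links out of a description."""
    timestamps = []
    for line in description.splitlines():
        match = TIMESTAMP_LINE.match(line)
        if match:
            stamp, label = match.group(1), match.group(2)
            timestamps.append({"time": stamp, "seconds": to_seconds(stamp), "label": label})
    # dict.fromkeys removes duplicates while preserving first-seen order.
    links = list(dict.fromkeys(URL.findall(description)))
    return {"description_timestamps": timestamps, "description_links": links}
```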

## Why visual analysis moved out

We removed Gemini/OpenRouter video-url enrichment from the default YouTube extraction flow.

Reasons:
- it created provider/config friction in Hermes
- it blurred the line between archival extraction and actual video watching
- Brad-style analysis is better served by the dedicated `video-watch` skill, which works from frames, contact sheets, timestamps, and captions

Use `youtube-content` to archive and structure the source.
Use `video-watch` to actually *watch* the video.

## Audio-download vs audio-recipe workflow

This skill does **not** download YouTube media assets. Supadata covers metadata and transcripts/generated transcripts; it does not provide raw YouTube audio files in this workflow. If the user asks “did we get the audio?” or expects an MP3, be explicit: the saved `~/.hermes/youtube/*.json` is metadata/transcript intelligence, not an `.mp3`/`.m4a` asset.

When the user does **not** need the exact copyrighted YouTube audio and only wants a functional binaural/hemi-sync-style MP3:
1. Run normal `youtube-content` extraction first.
2. Inspect the description for frequency recipes, e.g. `185 Hz / 193 Hz = 8 Hz`, carrier frequencies, brainwave bands, chapter notes, or app/Patreon/lossless links.
3. Generate an original approximation from the stated recipe with Python + `wave`/`numpy` and encode via `ffmpeg` to MP3. Keep wording clear: “generated approximation from the public frequency recipe, not a rip.”
4. Save generated audio under a media-vault path such as `~/hermes-media-vault/generated/hemisync/<descriptive-name>.mp3`.
5. If the user wants a downloadable link rather than Telegram media, hand off to `hetzner-s3-storage` and upload/verify the object.

Practical binaural generation notes:
- Use stereo only; headphones are required for the effect.
- Left/right frequency difference is the binaural beat: `185/193 Hz` gives `8 Hz`; `185/199 Hz` gives `14 Hz`.
- Use fades (10–15s) to avoid clicks, modest amplitude (`~0.18–0.25`) to avoid clipping, and optional low-volume harmonic bed/slow isochronic-style amplitude modulation for listenability.
- Do not describe generated tracks as official Monroe Institute “Hemi-Sync”; use “hemi-sync style”, “binaural entrainment”, or “dual hemisphere sync style”.
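
The generation notes above can be sketched with the standard library alone. This is a minimal example under stated assumptions, not the workflow's actual generator: `write_binaural_wav` is a hypothetical helper, the envelope is a plain linear fade, and the optional harmonic bed and amplitude modulation are left out.

```python
import array
import math
import sys
import wave


def write_binaural_wav(path, left_hz=185.0, right_hz=193.0, seconds=60.0,
                       fade_seconds=10.0, amplitude=0.2, rate=44100):
    """Write a 16-bit stereo WAV with a different sine tone in each ear."""
    total = int(seconds * rate)
    fade = max(1, min(int(fade_seconds * rate), total // 2))
    samples = array.array("h")  # signed 16-bit samples, interleaved left/right
    for n in range(total):
        # Linear fade-in and fade-out to avoid clicks at the edges.
        envelope = min(1.0, n / fade, (total - 1 - n) / fade)
        scale = amplitude * envelope * 32767
        t = n / rate
        samples.append(int(scale * math.sin(2 * math.pi * left_hz * t)))
        samples.append(int(scale * math.sin(2 * math.pi * right_hz * t)))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(2)   # stereo only
        wav.setsampwidth(2)   # 16-bit
        wav.setframerate(rate)
        wav.writeframes(samples.tobytes())
    return abs(right_hz - left_hz)  # binaural beat frequency in Hz
```

With the defaults this produces the `185/193 Hz` pair, an 8 Hz beat. The resulting WAV can then be encoded to MP3 with `ffmpeg` as a separate step.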

## Supadata API Reference

Base URL: `https://api.supadata.ai/v1`
Auth: `x-api-key` header

### Transcript (`GET /transcript`)
- `url` — YouTube URL (required)
- `lang` — language code (optional)
- `mode` — `native` (1 credit), `generate` (2 credits/min), `auto` (tries native first)
- Videos >20 min return a `jobId` for async polling

### YouTube Video Metadata (`GET /youtube/video`)
- `videoId` — 11-char video ID (required)
- Returns: title, description, duration, views, likes, tags, chapters, channel info
- 1 credit per request

### Unified Metadata (`GET /metadata`)
- `url` — any social media URL (YouTube, TikTok, Instagram, X, Facebook)
- Returns: standardized metadata across platforms
- 1 credit per request
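
As an illustration of how the endpoints above are called, the sketch below builds an authenticated request from the base URL, a path, and query parameters. `supadata_request` is a hypothetical helper name; only the base URL, the paths, the parameter names, and the `x-api-key` header come from this reference.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.supadata.ai/v1"


def supadata_request(path: str, params: dict, api_key: str) -> Request:
    """Build a GET request for a Supadata endpoint, skipping unset parameters."""
    # urlencode percent-encodes values, which matters when a parameter is itself a URL.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return Request(f"{BASE_URL}{path}?{query}", headers={"x-api-key": api_key})
```

For example, `supadata_request("/youtube/video", {"videoId": "dQw4w9WgXcQ"}, key)` can be passed to `urllib.request.urlopen` to fetch metadata.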

### Async Job Polling (`GET /job/{jobId}`)
- Poll every 1-2 seconds
- Results expire after 1 hour
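
A polling loop for async jobs might look like the sketch below. It is a minimal example under stated assumptions: `poll_job` is a hypothetical helper, the `status` field and its `completed` / `failed` values are assumed rather than taken from this reference, and the HTTP call is injected as `fetch_job` so the loop stays independent of the exact endpoint.

```python
import time
from typing import Callable


def poll_job(fetch_job: Callable[[], dict], timeout: float = 120.0,
             interval: float = 1.5,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_job until the job finishes, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        status = job.get("status")  # assumed field name
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job failed: {job.get('error')}")
        # Stop before sleeping past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job not finished after {timeout:.0f}s")
        sleep(interval)
```

The default interval sits inside the 1-2 second range given above, and the default timeout mirrors the script's 120s limit.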

## URL Formats Supported

- `https://www.youtube.com/watch?v=VIDEO_ID`
- `https://youtu.be/VIDEO_ID`
- `https://youtube.com/shorts/VIDEO_ID`
- `https://youtube.com/embed/VIDEO_ID`
- `https://youtube.com/live/VIDEO_ID`
- Raw 11-character video ID
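
One way to normalize all of the formats above to a video ID is sketched below. `extract_video_id` is a hypothetical helper, not necessarily how the extractor script parses URLs, and it only accepts the hosts listed in this section (with or without `www.`).

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID from a URL or raw ID, or None."""
    value = value.strip()
    if VIDEO_ID.fullmatch(value):
        return value  # already a raw ID
    parsed = urlparse(value if "://" in value else f"https://{value}")
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]
    candidate = None
    if host == "youtu.be" and segments:
        candidate = segments[0]
    elif host == "youtube.com":
        if segments[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(segments) >= 2 and segments[0] in ("shorts", "embed", "live"):
            candidate = segments[1]
    if candidate and VIDEO_ID.fullmatch(candidate):
        return candidate
    return None
```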

## Manual fallback: YouTube page initial data

When Supadata is not configured/available and normal transcript tools fail, do not stop at a sparse `noembed` result. Fetch the watch page HTML directly and parse the embedded `ytInitialPlayerResponse` and `ytInitialData` objects to recover useful structured data:

```bash
python3 - <<'PY'
import json, re, urllib.request
video_id = 'VIDEO_ID'
url = f'https://www.youtube.com/watch?v={video_id}&hl=en&gl=US'
req = urllib.request.Request(url, headers={
  'User-Agent': 'Mozilla/5.0',
  'Accept-Language': 'en-US,en;q=0.9',
})
html = urllib.request.urlopen(req, timeout=20).read().decode('utf-8', 'ignore')
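open('/tmp/yt.html', 'w', encoding='utf-8').write(html)
decoder = json.JSONDecoder()
for name, var in [('player', 'ytInitialPlayerResponse'), ('data', 'ytInitialData')]:
  # Find the assignment, then let the JSON decoder read exactly one object;
  # a non-greedy regex can cut the JSON short at a "};" inside a string value.
  m = re.search(var + r'\s*=\s*(?=\{)', html)
  if m:
    obj, _ = decoder.raw_decode(html, m.end())
    open(f'/tmp/{name}.json', 'w', encoding='utf-8').write(json.dumps(obj, indent=2))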
PY
```

Then inspect `/tmp/data.json` recursively for:
- `videoPrimaryInfoRenderer`: title, views, likes
- `videoSecondaryInfoRenderer`: channel name/id/handle/subscriber count
- `structuredDescriptionContentRenderer` → `videoDescriptionHeaderRenderer`: publish date, views, likes
- `expandableVideoDescriptionBodyRenderer.attributedDescriptionBodyText.content`: full visible description and chapter timestamps
- `engagementPanels[]` with `panelIdentifier == engagement-panel-searchable-transcript`: transcript availability and `getTranscriptEndpoint.params`
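
Because these renderers sit at varying depths, a small recursive search is usually enough to locate them. The sketch below is one possible helper; `find_key` is a hypothetical name, and it simply yields every value stored under a given key anywhere in the parsed JSON.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth of dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key can appear again further down.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)
```

After loading `/tmp/data.json` with `json.load`, `next(find_key(data, "videoPrimaryInfoRenderer"), None)` returns the first matching renderer, or `None` if the page layout did not include it.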

Use this fallback to update the saved JSON with metadata, description, timestamps, and extraction notes even when transcript text cannot be retrieved.

## Pitfalls

- **Supadata API key**: Must be set as `SUPADATA_API_KEY` env var. Key format: `sd_...`
- **Sparse noembed fallback**: If the script falls back to noembed, it may save only title/channel/thumbnail. Immediately try the watch-page `ytInitialData` fallback above before reporting partial extraction.
- **Transcript panel ≠ transcript access**: The watch page may show a transcript panel and `getTranscriptEndpoint.params`, but `/youtubei/v1/get_transcript` can still fail with `400 FAILED_PRECONDITION` from VPS/cloud environments. Record that the panel exists and save chapters/description; do not claim transcript extraction succeeded.
- **yt-dlp bot wall**: `yt-dlp --dump-single-json` may fail with "Sign in to confirm you're not a bot" on cloud hosts. Treat it as another blocked path, not a final failure. When this happens, do NOT waste time on yt-dlp variations — the IP is blocked at the YouTube level. Supadata API handles this correctly because it routes through its own infrastructure. If the user needs actual audio/video download, they must provide cookies from a logged-in YouTube session (`--cookies /path/to/cookies.txt` in Netscape format).
- **Async for long videos**: Videos >20 min trigger async processing. The script handles polling automatically but has a 120s timeout.
- **Credit costs**: Native transcripts cost 1 credit and metadata requests cost 1 credit each. AI-generated transcripts (`mode=generate`) cost 2 credits per minute of video, so their cost scales with video length.
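- **youtube-transcript-api v1.2+ breaking change**: Transcripts are fetched through instance methods (e.g. `YouTubeTranscriptApi().fetch(video_id)`), not class methods, and the result exposes its segments through a `.snippets` attribute whose items use attribute access (`.text`, `.start`, `.duration`), not dict keys.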
- **Tor is legacy fallback**: Still works but is unreliable. Supadata should handle the vast majority of cases.
- **Private/age-restricted videos**: Neither Supadata nor legacy methods can access these.