youtube-content

YouTube Video Intelligence Extractor

Extract the text-level intelligence from a YouTube video: metadata, transcript, chapters, description links, social profiles, tags, and engagement stats. Results are saved to ~/.hermes/youtube/ for reuse across tasks.

Use This Together With `video-watch`

This skill now keeps the best parts of the original Hermes YouTube extractor and borrows the strongest workflow ideas from Brad Bonanno's claude-video /watch skill.

Use youtube-content when you need:
- structured metadata archival
- transcript/chapter extraction
- description-link mining
- reusable JSON artifacts in ~/.hermes/youtube/
- durable source records that pair cleanly with the separate video-watch visual-analysis workflow

Use video-watch when you need:
- URL-or-local-file video analysis beyond YouTube
- bug debugging from a screen recording
- hook analysis from the first seconds of a video
- focused timestamp-window inspection with denser frame extraction
- answers grounded in extracted frames rather than only transcript/metadata

Best combined workflow:
1. Run youtube-content first for YouTube videos to get durable metadata, transcript, chapters, and link extraction.
2. If the user's question depends on what is visibly on screen, follow with a video-watch-style frame workflow focused on the specific chapter or timestamp range.
3. For long videos, do not trust a sparse whole-video visual pass if the user only cares about one moment. Re-run on a bounded window.
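The bounded window in step 2 can be derived from the saved chapters. The following is a minimal sketch, assuming chapter entries carry `title` and `start_seconds` as in the extraction JSON; `chapter_window` is a hypothetical helper, not part of the skill's scripts.

```python
def chapter_window(chapters, title_query, duration_seconds):
    """Return (start, end) in seconds for the first chapter whose title contains title_query."""
    ordered = sorted(chapters, key=lambda c: c["start_seconds"])
    for i, chapter in enumerate(ordered):
        if title_query.lower() in chapter["title"].lower():
            start = chapter["start_seconds"]
            # A chapter ends where the next one starts; the last one ends with the video.
            if i + 1 < len(ordered):
                end = ordered[i + 1]["start_seconds"]
            else:
                end = duration_seconds
            return start, end
    return None


# Illustrative data in the shape of the "chapters" field.
chapters = [
    {"title": "Intro", "time": "0:00", "start_seconds": 0},
    {"title": "Live demo", "time": "1:30", "start_seconds": 90},
    {"title": "Wrap-up", "time": "3:00", "start_seconds": 180},
]
print(chapter_window(chapters, "demo", 213))
```

The resulting pair can then be handed to the video-watch workflow as the timestamp range to inspect.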

Provider Priority

1. Supadata API (primary provider): reliable transcript + metadata, no IP blocks; 1 credit per native transcript or metadata call, 2 credits per minute for generated transcripts
2. web_extract (metadata fallback): free, bypasses IP blocks
3. youtube-transcript-api + Tor (transcript fallback): free but unreliable from a VPS
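One way to express this priority order is a small fallback chain. This is only a sketch: `first_successful` and the two stub providers are hypothetical stand-ins, and the real script may structure its fallbacks differently.

```python
def first_successful(providers, video_id):
    """Try each (name, fetch) pair in priority order; return the first non-empty result."""
    errors = {}
    for name, fetch in providers:
        try:
            result = fetch(video_id)
        except Exception as exc:
            # A blocked or misconfigured provider should not stop the chain.
            errors[name] = str(exc)
            continue
        if result:
            return {"source": name, "data": result, "errors": errors}
    return {"source": None, "data": None, "errors": errors}


# Hypothetical stand-ins for the real providers.
def supadata_stub(video_id):
    raise RuntimeError("SUPADATA_API_KEY not set")


def web_extract_stub(video_id):
    return {"title": "Example", "video_id": video_id}


outcome = first_successful(
    [("supadata", supadata_stub), ("web_extract", web_extract_stub)],
    "dQw4w9WgXcQ",
)
print(outcome["source"], outcome["errors"])
```

Keeping the per-provider errors makes it possible to record in the saved JSON why a higher-priority path was skipped.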

Visual-analysis handoff

This skill is now intentionally metadata/transcript-first.

When the user needs actual visual understanding of the video, switch to video-watch instead of trying to do URL-level Gemini analysis inside this extractor. That keeps the workflow closer to Brad Bonanno's /watch approach:

extract frames
inspect bounded timestamp windows
use captions/transcript as support, not as a substitute for visual evidence
ground the answer in what is visibly on screen

Dependencies

# Required
export SUPADATA_API_KEY=sd_...  # In ~/.hermes/.env

# Optional (legacy fallback)
pip install youtube-transcript-api pysocks
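If the key lives only in ~/.hermes/.env and has not been exported into the current process, it can be read from the file. A minimal sketch, assuming the file holds plain `KEY=value` lines (optionally prefixed with `export`) and no inline comments; both function names are hypothetical.

```python
import os


def load_env_file(path):
    """Parse KEY=value lines (optionally prefixed with 'export ') into a dict."""
    values = {}
    with open(os.path.expanduser(path), encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            name, _, value = line.partition("=")
            # Drop surrounding quotes; inline comments are not handled.
            values[name.strip()] = value.strip().strip("'\"")
    return values


def supadata_key(env_path="~/.hermes/.env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("SUPADATA_API_KEY")
    if key:
        return key
    try:
        return load_env_file(env_path).get("SUPADATA_API_KEY", "")
    except FileNotFoundError:
        return ""
```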

Quick Start

Standalone CLI

SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "https://youtube.com/watch?v=VIDEO_ID"
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" --json
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" -l es
SUPADATA_API_KEY=sd_... python3 SKILL_DIR/scripts/youtube_extract.py "VIDEO_URL" --no-save

Agent Workflow (execute_code)

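from hermes_tools import terminal
import os, shlex

url = "https://www.youtube.com/watch?v=VIDEO_ID"
key = os.environ.get("SUPADATA_API_KEY", "")
script = "~/.hermes/skills/media/youtube-content/scripts/youtube_extract.py"
# Quote the key and URL so characters such as & and ? are not interpreted by the shell
cmd = f"SUPADATA_API_KEY={shlex.quote(key)} python3 {script} {shlex.quote(url)} --json"
result = terminal(cmd)
print(result["output"])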

Direct API Call (simple transcript only)

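import json, os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Quick transcript via Supadata; no script needed
video_id = "dQw4w9WgXcQ"
key = os.environ.get("SUPADATA_API_KEY", "")
# urlencode escapes the nested YouTube URL so its query string cannot be confused with Supadata's parameters
query = urlencode({"url": f"https://www.youtube.com/watch?v={video_id}", "mode": "auto"})
req = Request(f"https://api.supadata.ai/v1/transcript?{query}", headers={"x-api-key": key})
data = json.loads(urlopen(req, timeout=60).read())
# Long videos may return a jobId instead of content (see Async Job Polling)
print(data["content"][:500] if "content" in data else data)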

Storage Structure

All extractions are saved to ~/.hermes/youtube/:

~/.hermes/youtube/
├── index.json                           # Master index of all extracted videos
├── VIDEO_ID_title-slug.json             # Full structured extraction (JSON)
├── VIDEO_ID_transcript.txt              # Human-readable transcript with chapters
└── ...
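Because saved files start with the video ID, an earlier extraction can be located before spending credits on a new one. A sketch under that assumption; `find_extraction` is a hypothetical helper, and the layout of index.json is not relied on here.

```python
from pathlib import Path


def find_extraction(video_id, root="~/.hermes/youtube"):
    """Return the newest saved extraction JSON for video_id, or None if absent."""
    base = Path(root).expanduser()
    if not base.is_dir():
        return None
    # Files are named VIDEO_ID_title-slug.json, so match on the ID prefix.
    matches = [p for p in base.glob(f"{video_id}_*.json") if p.is_file()]
    if not matches:
        return None
    return max(matches, key=lambda p: p.stat().st_mtime)
```

A caller could check `find_extraction(video_id)` first and only run the extractor when it returns None.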

Full Extraction JSON Structure

{
  "extracted_at": "ISO timestamp",
  "video_id": "11-char ID",
  "url": "full YouTube URL",
  "title": "Video title",
  "channel": {"name": "...", "id": "...", "url": "...", "handle": "..."},
  "description": "Full description text",
  "duration_seconds": 213,
  "duration_string": "3:33",
  "upload_date": "2024-01-15T...",
  "category": "Education",
  "tags": ["tag1", "tag2"],
  "stats": {"views": 1000000, "likes": 50000},
  "chapters": [{"title": "Intro", "time": "0:00", "start_seconds": 0}],
  "thumbnail": "https://i.ytimg.com/...",
  "description_links": ["https://..."],
  "description_timestamps": [{"time": "0:00", "seconds": 0, "label": "Intro"}],
  "social_links": {"twitter": ["@handle"], "github": ["repo"]},
  "endscreen_videos": [{"title": "...", "url": "..."}],
  "transcript": {
    "available": true,
    "source": "supadata",
    "segment_count": 250,
    "full_text": "complete transcript as one string",
    "timestamped_text": "0:00 first line\n0:05 second line...",
    "segments": [{"text": "...", "start": 0.0, "duration": 2.5}]
  }
}
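The `description_timestamps` entries can be produced from the raw description with a line-based parser. This is an illustrative sketch, not the extractor's actual implementation; the regular expression and the function name are assumptions.

```python
import re

# Matches lines such as "0:00 Intro", "1:30 - Live demo" or "1:02:03 Q&A".
TIMESTAMP_LINE = re.compile(r"^\s*((?:\d{1,2}:)?\d{1,2}:\d{2})\s*[-:]?\s*(.+?)\s*$")


def parse_description_timestamps(description):
    """Return [{"time", "seconds", "label"}] for each timestamped line."""
    entries = []
    for line in description.splitlines():
        match = TIMESTAMP_LINE.match(line)
        if not match:
            continue
        time_text, label = match.groups()
        seconds = 0
        for part in time_text.split(":"):
            # Works for both M:SS and H:MM:SS.
            seconds = seconds * 60 + int(part)
        entries.append({"time": time_text, "seconds": seconds, "label": label})
    return entries


sample = "Chapters:\n0:00 Intro\n1:30 - Live demo\n1:02:03 Q&A\nhttps://example.com"
for entry in parse_description_timestamps(sample):
    print(entry)
```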

Why visual analysis moved out

We removed Gemini/OpenRouter video-url enrichment from the default YouTube extraction flow.

Reasons:
- it created provider/config friction in Hermes
- it blurred the line between archival extraction and actual video watching
- Brad-style analysis is better served by the dedicated video-watch skill, which works from frames, contact sheets, timestamps, and captions

Use youtube-content to archive and structure the source. Use video-watch to actually watch the video.

Audio-download vs audio-recipe workflow

This skill does not download YouTube media assets. Supadata covers metadata and transcripts/generated transcripts; it does not provide raw YouTube audio files in this workflow. If the user asks “did we get the audio?” or expects an MP3, be explicit: the saved ~/.hermes/youtube/*.json is metadata/transcript intelligence, not an .mp3/.m4a asset.

When the user does not need the exact copyrighted YouTube audio and only wants a functional binaural/hemi-sync-style MP3:
1. Run normal youtube-content extraction first.
2. Inspect the description for frequency recipes, e.g. 185 Hz / 193 Hz = 8 Hz, carrier frequencies, brainwave bands, chapter notes, or app/Patreon/lossless links.
3. Generate an original approximation from the stated recipe with Python + wave/numpy and encode via ffmpeg to MP3. Keep wording clear: “generated approximation from the public frequency recipe, not a rip.”
4. Save generated audio under a media-vault path such as ~/hermes-media-vault/generated/hemisync/<descriptive-name>.mp3.
5. If the user wants a downloadable link rather than Telegram media, hand off to hetzner-s3-storage and upload/verify the object.

Practical binaural generation notes:
- Use stereo only; headphones are required for the effect.
- Left/right frequency difference is the binaural beat: 185/193 Hz gives 8 Hz; 185/199 Hz gives 14 Hz.
- Use fades (10–15s) to avoid clicks, modest amplitude (~0.18–0.25) to avoid clipping, and optional low-volume harmonic bed/slow isochronic-style amplitude modulation for listenability.
- Do not describe generated tracks as official Monroe Institute “Hemi-Sync”; use “hemi-sync style”, “binaural entrainment”, or “dual hemisphere sync style”.
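The notes above can be sketched as a small stereo tone generator. This version uses only the standard library (`wave`, `math`, `array`) so it runs anywhere; numpy would vectorize the loop for long tracks. The function name and defaults are assumptions chosen to match the 185/193 Hz example, and the harmonic bed and amplitude modulation are left out.

```python
import math
import sys
import wave
from array import array


def write_binaural_wav(path, left_hz=185.0, right_hz=193.0, seconds=60.0,
                       fade_s=10.0, amplitude=0.2, rate=44100):
    """Write a 16-bit stereo WAV with one sine per ear; return the beat frequency in Hz."""
    n = int(seconds * rate)
    fade_n = min(int(fade_s * rate), n // 2)
    samples = array("h")
    for i in range(n):
        # Linear fade-in and fade-out to avoid clicks at the edges.
        if fade_n and i < fade_n:
            env = i / fade_n
        elif fade_n and i >= n - fade_n:
            env = (n - 1 - i) / fade_n
        else:
            env = 1.0
        t = i / rate
        scale = amplitude * env * 32767
        samples.append(int(scale * math.sin(2 * math.pi * left_hz * t)))   # left channel
        samples.append(int(scale * math.sin(2 * math.pi * right_hz * t)))  # right channel
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(samples.tobytes())
    return abs(right_hz - left_hz)
```

Calling `write_binaural_wav("/tmp/alpha-8hz.wav")` would produce a one-minute 8 Hz track; the WAV can then be encoded to MP3 with ffmpeg as a separate step.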

Supadata API Reference

Base URL: https://api.supadata.ai/v1
Auth: x-api-key header

Transcript (`GET /transcript`)

url — YouTube URL (required)
lang — language code (optional)
mode — native (1 credit), generate (2 credits/min), auto (tries native first)
Videos >20 min return a jobId for async polling

YouTube Video Metadata (`GET /youtube/video`)

videoId — 11-char video ID (required)
Returns: title, description, duration, views, likes, tags, chapters, channel info
1 credit per request

Unified Metadata (`GET /metadata`)

url — any social media URL (YouTube, TikTok, Instagram, X, Facebook)
Returns: standardized metadata across platforms
1 credit per request
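The endpoints above share the same base URL and auth header, so one request builder covers them. A sketch using only the paths and parameter names listed in this reference; `supadata_request` is a hypothetical helper and the key shown is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.supadata.ai/v1"


def supadata_request(path, params, api_key):
    """Build (but do not send) a GET request for a Supadata endpoint."""
    # Skip unset optional parameters such as lang.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return Request(f"{BASE_URL}{path}?{query}", headers={"x-api-key": api_key})


video_req = supadata_request("/youtube/video", {"videoId": "dQw4w9WgXcQ"}, "sd_example")
transcript_req = supadata_request(
    "/transcript",
    {"url": "https://youtu.be/dQw4w9WgXcQ", "lang": None, "mode": "native"},
    "sd_example",
)
print(video_req.full_url)
print(transcript_req.full_url)
```

Passing the result to `urllib.request.urlopen` sends the call; building it separately keeps the URL easy to log and inspect without spending credits.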

Async Job Polling (`GET /job/{jobId}`)

Poll every 1-2 seconds
Results expire after 1 hour
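A polling loop for these jobs might look like the sketch below. The HTTP call is injected as `fetch_status` so the loop itself stays testable. The `status` values ("completed", "failed") and the `error` field are assumptions about the response shape and should be checked against the Supadata documentation. The 120-second default mirrors the script timeout mentioned under Pitfalls.

```python
import time


def poll_job(fetch_status, job_id, interval=1.5, timeout=120.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(job_id) until the job completes, fails, or times out."""
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        status = job.get("status")  # assumed field name
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)


# Fake responses standing in for GET /job/{jobId}.
responses = iter([
    {"status": "queued"},
    {"status": "active"},
    {"status": "completed", "content": "hello"},
])
result = poll_job(lambda job_id: next(responses), "job-1", sleep=lambda s: None)
print(result["content"])
```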

URL Formats Supported

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://youtube.com/shorts/VIDEO_ID
https://youtube.com/embed/VIDEO_ID
https://youtube.com/live/VIDEO_ID
Raw 11-character video ID
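Normalizing these formats to a video ID can be sketched as follows. `extract_video_id` is a hypothetical helper that handles only the six forms listed above; the script's own parser may accept more.

```python
import re
from urllib.parse import urlparse, parse_qs

VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value):
    """Return the 11-character video ID from a supported URL or raw ID, else None."""
    value = value.strip()
    if VIDEO_ID.fullmatch(value):
        return value
    parsed = urlparse(value if "://" in value else f"https://{value}")
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]
    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host == "youtube.com":
        if parts[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            candidate = parts[1]
    if candidate and VIDEO_ID.fullmatch(candidate):
        return candidate
    return None


for example in (
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://youtube.com/shorts/dQw4w9WgXcQ",
    "dQw4w9WgXcQ",
):
    print(extract_video_id(example))
```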

Manual fallback: YouTube page initial data

When Supadata is not configured or available and the normal transcript tools fail, do not stop at a sparse noembed result. Fetch the watch page HTML directly and parse the embedded ytInitialPlayerResponse / ytInitialData objects (and ytcfg if needed) to recover useful structured data:

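python3 - <<'PY'
import json, re, urllib.request

video_id = 'VIDEO_ID'
url = f'https://www.youtube.com/watch?v={video_id}&hl=en&gl=US'
req = urllib.request.Request(url, headers={
    'User-Agent': 'Mozilla/5.0',
    'Accept-Language': 'en-US,en;q=0.9',
})
html = urllib.request.urlopen(req, timeout=20).read().decode('utf-8', 'ignore')
with open('/tmp/yt.html', 'w', encoding='utf-8') as f:
    f.write(html)

# Locate each embedded object, then let the JSON decoder find where it ends.
# A non-greedy regex up to the first "};" can cut the object short when that
# sequence occurs inside a string value.
decoder = json.JSONDecoder()
for name, var in [('player', 'ytInitialPlayerResponse'), ('data', 'ytInitialData')]:
    m = re.search(var + r'\s*=\s*\{', html)
    if not m:
        continue
    try:
        obj, _ = decoder.raw_decode(html, m.end() - 1)
    except json.JSONDecodeError:
        continue
    with open(f'/tmp/{name}.json', 'w', encoding='utf-8') as f:
        json.dump(obj, f, indent=2)
PY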

Then inspect /tmp/data.json recursively for:
- videoPrimaryInfoRenderer: title, views, likes
- videoSecondaryInfoRenderer: channel name/id/handle/subscriber count
- structuredDescriptionContentRenderer → videoDescriptionHeaderRenderer: publish date, views, likes
- expandableVideoDescriptionBodyRenderer.attributedDescriptionBodyText.content: full visible description and chapter timestamps
- engagementPanels[] with panelIdentifier == engagement-panel-searchable-transcript: transcript availability and getTranscriptEndpoint.params
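The recursive inspection can be done with a small generator that walks nested dicts and lists. A minimal sketch; `find_key` is a hypothetical helper and the sample below is a trimmed, made-up stand-in for the real page data.

```python
def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Made-up fragment shaped loosely like ytInitialData.
sample = {
    "contents": {
        "results": [
            {"videoPrimaryInfoRenderer": {"title": {"runs": [{"text": "Example title"}]}}},
            {"videoSecondaryInfoRenderer": {"owner": {"title": "Example channel"}}},
        ]
    }
}
primary = next(find_key(sample, "videoPrimaryInfoRenderer"), None)
print(primary["title"]["runs"][0]["text"])
```

With the real file, `json.load(open('/tmp/data.json'))` would replace `sample`, and each renderer name from the list above can be passed as `key`.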

Use this fallback to update the saved JSON with metadata, description, timestamps, and extraction notes even when transcript text cannot be retrieved.

Pitfalls

Supadata API key: Must be set as SUPADATA_API_KEY env var. Key format: sd_...
Sparse noembed fallback: If the script falls back to noembed, it may save only title/channel/thumbnail. Immediately try the watch-page ytInitialData fallback above before reporting partial extraction.
Transcript panel ≠ transcript access: The watch page may show a transcript panel and getTranscriptEndpoint.params, but /youtubei/v1/get_transcript can still fail with 400 FAILED_PRECONDITION from VPS/cloud environments. Record that the panel exists and save chapters/description; do not claim transcript extraction succeeded.
yt-dlp bot wall: yt-dlp --dump-single-json may fail with "Sign in to confirm you're not a bot" on cloud hosts. Treat it as another blocked path, not a final failure. When this happens, do NOT waste time on yt-dlp variations — the IP is blocked at the YouTube level. Supadata API handles this correctly because it routes through its own infrastructure. If the user needs actual audio/video download, they must provide cookies from a logged-in YouTube session (--cookies /path/to/cookies.txt in Netscape format).
Async for long videos: Videos >20 min trigger async processing. The script handles polling automatically but has a 120s timeout.
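Credit costs: Native transcripts and metadata requests cost 1 credit each. AI-generated transcripts cost 2 credits per minute of video, so mode=auto may cost more when a video has no native captions.
youtube-transcript-api v1.2+ breaking change: The library uses instance methods (e.g. YouTubeTranscriptApi().fetch(video_id)) instead of class methods, and results expose a .snippets attribute whose items are read by attribute (.text, .start, .duration) rather than by dict key.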
Tor is legacy fallback: Still works but unreliable. Supadata should handle 99% of cases.
Private/age-restricted videos: Neither Supadata nor legacy methods can access these.
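For the youtube-transcript-api pitfall above, the newer object-style result can be converted into the `segments` shape used in the extraction JSON. This sketch assumes the fetched object exposes `.snippets` with `.text`, `.start`, and `.duration`, and uses a stand-in object so it runs without the library or network access.

```python
from types import SimpleNamespace


def snippets_to_segments(fetched):
    """Convert an object with .snippets into a list of segment dicts."""
    return [
        {"text": s.text, "start": float(s.start), "duration": float(s.duration)}
        for s in fetched.snippets
    ]


# Stand-in for the fetched transcript; with the library installed this object
# would presumably come from YouTubeTranscriptApi().fetch(video_id).
fake = SimpleNamespace(snippets=[
    SimpleNamespace(text="first line", start=0.0, duration=2.5),
    SimpleNamespace(text="second line", start=2.5, duration=3.0),
])
print(snippets_to_segments(fake))
```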