The user asks you to download videos from a website where:
- The <video> element's src is a blob: URL (not a real CDN URL)
- The site uses Squarespace, Wistia, JWPlayer, Vimeo, or a custom HLS player
- Right-click → save doesn't work
- The user has legitimate access (logged-in, paid, owns the content)
Do not use for: YouTube (use yt-dlp directly with the page URL), DRM-protected streams (Widevine/PlayReady — yt-dlp cannot decrypt these), or content the user doesn't have rights to.
A blob: URL on <video> means the player builds the stream client-side from a master playlist. The real source is exposed somewhere on the page — usually in a data-* attribute, a <script> JSON config, or a network request the page already made. You extract that, then hand it to yt-dlp.
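To make the "master playlist" idea concrete, here is a minimal sketch of how an HLS master playlist lists its variants. It assumes you already have the playlist text in hand; the function name `list_variants` and the sample playlist in the usage note are illustrative, not taken from any particular player.

```python
import re


def list_variants(master_text):
    """Return (bandwidth, resolution, uri) tuples from an HLS master playlist,
    highest bandwidth first. Returns [] for a media (segment) playlist."""
    lines = [ln.strip() for ln in master_text.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        attrs = line.split(":", 1)[1]
        # Anchor on start-or-comma so AVERAGE-BANDWIDTH is not matched by mistake.
        bw = re.search(r"(?:^|,)BANDWIDTH=(\d+)", attrs)
        res = re.search(r"RESOLUTION=(\d+x\d+)", attrs)
        # The variant URI is the first non-tag line after the STREAM-INF tag.
        uri = next((l for l in lines[i + 1:] if not l.startswith("#")), None)
        if uri:
            variants.append((int(bw.group(1)) if bw else 0,
                             res.group(1) if res else None,
                             uri))
    return sorted(variants, key=lambda v: v[0], reverse=True)
```

If the result is empty, the text is probably a variant (segment) playlist rather than the master, which is a hint to keep looking for the master URL.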
Load the page in the browser tool and inspect what's actually in the DOM:
// browser_console expression
(() => {
  // Collect every place a real media URL might be hiding.
  const out = {videos: [], iframes: [], sources: [], dataAttrs: [], urls: []};
  document.querySelectorAll('video').forEach(v => out.videos.push({src: v.src, currentSrc: v.currentSrc, poster: v.poster}));
  document.querySelectorAll('iframe').forEach(f => out.iframes.push({src: f.src}));
  document.querySelectorAll('source').forEach(s => out.sources.push({src: s.src, type: s.type}));
  // Player config attributes and inline videoUrl fields (first 10 matches).
  const html = document.body.innerHTML;
  const dataMatches = html.match(/data-config[^=]*="[^"]+"|data-video[^=]*="[^"]+"|videoUrl[^,}]+/gi);
  out.dataAttrs = dataMatches ? dataMatches.slice(0,10) : [];
  // Direct media or manifest URLs anywhere in the markup (deduplicated, first 20).
  const urlMatches = html.match(/https?:[^"'\s]+\.(?:mp4|m3u8|mpd|webm)[^"'\s]*/gi);
  out.urls = urlMatches ? [...new Set(urlMatches)].slice(0,20) : [];
  return out;
})()
The player already fetched the playlist while the page loaded; performance entries reveal the real URL:
// browser_console expression
performance.getEntriesByType('resource')
  .filter(e => /\.m3u8|\.mpd|\.mp4|cdn|video/.test(e.name))
  .map(e => e.name)
This is usually the fastest path — the m3u8 master playlist URL is sitting right there.
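The console filter above is deliberately broad, so the result usually needs narrowing. A possible post-processing sketch, assuming you have copied the returned URL list out of the browser: it keeps only URLs whose path ends in a known media extension and puts manifests first. The function name `rank_stream_urls` and the priority order are assumptions for illustration.

```python
from urllib.parse import urlparse


def rank_stream_urls(urls):
    """Keep URLs whose path ends in a media extension; manifests come first."""
    priority = {".m3u8": 0, ".mpd": 1, ".mp4": 2, ".webm": 3}
    ranked = []
    for url in urls:
        # Compare against the path only, so query strings such as
        # ?Signature=... do not hide the extension.
        path = urlparse(url).path.lower()
        for ext, rank in priority.items():
            if path.endswith(ext):
                ranked.append((rank, url))
                break
    # sorted() is stable, so URLs of equal rank keep their original order.
    return [url for _, url in sorted(ranked, key=lambda pair: pair[0])]
```

Note that this is stricter than the regex in the console expression: segment files (.ts) and anything that merely contains "cdn" or "video" are dropped.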
See references/players.md for the per-player extraction recipes (Squarespace, Wistia, JWPlayer, Vimeo). Add a new entry there whenever you encounter a new player.
For HLS:
yt-dlp \
  --referer 'https://site.com/' \
  -o 'output-name.%(ext)s' \
  --merge-output-format mp4 \
  'https://cdn.example.com/path/playlist.m3u8'
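When the same invocation has to run from a script, it can be built as an argument list so that no shell quoting is involved. This is a sketch using only the flags shown above; the function names `build_ytdlp_cmd` and `download` are hypothetical.

```python
import subprocess


def build_ytdlp_cmd(playlist_url, referer, output_stem):
    """Return the yt-dlp argv for an HLS download, mirroring the command above."""
    return [
        "yt-dlp",
        "--referer", referer,
        "-o", f"{output_stem}.%(ext)s",
        "--merge-output-format", "mp4",
        playlist_url,
    ]


def download(playlist_url, referer, output_stem):
    """Run yt-dlp; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_ytdlp_cmd(playlist_url, referer, output_stem), check=True)
```

Passing a list to `subprocess.run` avoids a shell entirely, which matters when the playlist URL carries `&` or other characters a shell would interpret.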
yt-dlp handles AES-128 encrypted segments natively — no extra flags needed. For login-gated streams where the playlist URL contains a Signature query param, the URL itself is the auth — copy it fresh from the browser, do not store it long-term.
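Because a signed playlist URL works as a credential, one way to avoid storing it is to redact the query string before the URL reaches a log or manifest. The sketch below assumes the whole query is sensitive; `redact_signed_url` is an illustrative name.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_signed_url(url):
    """Replace the query string (where Signature-style params live) with a marker."""
    parts = urlsplit(url)
    query = "REDACTED" if parts.query else ""
    # The fragment is dropped as well; it is never needed to identify the asset.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

Log the redacted form and pass the original, freshly copied URL only to the yt-dlp process itself.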
- Pass --referer 'https://origin-site.com/' to yt-dlp.
- Use the master playlist (playlist.m3u8 without mpegts- prefix). yt-dlp will pick the best variant. If you grab a variant directly, you lock yourself to that resolution.
- A blob: URL is never downloadable. Don't waste time on it. It's a MediaSource Extensions handle that only exists inside the page's JS context.
- --cookies-from-browser or --add-headers.
- Use browser_console JS evaluation; DOM parsing in shell is fragile.
After downloading, always:
ffprobe -v error -show_entries format=duration,bit_rate -show_entries stream=codec_name,width,height file.mp4
to confirm you got a real video, not a 4KB error page renamed to .mp4.
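The same check can be scripted. A minimal sketch, assuming ffprobe is on PATH: it asks ffprobe for JSON output and applies a simple rule (at least one video stream and a non-trivial duration). The function names and the one-second threshold are assumptions, not part of any pipeline described here.

```python
import json
import subprocess


def probe(path):
    """Return ffprobe's format and stream info for a file as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def looks_like_video(info, min_duration=1.0):
    """True if the probe result has a video stream and a plausible duration."""
    streams = info.get("streams", [])
    has_video = any(s.get("codec_type") == "video" for s in streams)
    try:
        # ffprobe reports duration as a string such as "12.500000".
        duration = float(info.get("format", {}).get("duration", 0))
    except (TypeError, ValueError):
        duration = 0.0
    return has_video and duration >= min_duration
```

A renamed error page typically makes ffprobe exit non-zero, so `probe` raises `CalledProcessError` in that case; treat the exception as a failed download too.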
For uploading downloaded videos to Hetzner Object Storage, see the hetzner-s3-storage skill for bucket creation, access controls, and presigned URLs.
rclone is not always installed on the VPS. The reliable path is python3 + boto3 with creds sourced from a known-good app's .env (/home/avalon/apps/video-story/.env). See templates/catalog-to-s3-pipeline.sh for the full bash+python pipeline that:
- creates the bucket with a public-read policy if missing,
- iterates a TSV manifest of (filename, asset_id) rows,
- dispatches per row to the right yt-dlp invocation (Squarespace native vs Vimeo iframe — extend for other players),
- ffprobes each output for sanity,
- uploads each MP4, then builds and uploads a .zip archive.
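The manifest step in the list above can be sketched as follows. This is not the code in templates/catalog-to-s3-pipeline.sh; it assumes a two-column, tab-separated file with no header row, and treating lines that start with `#` as comments is an added assumption.

```python
import csv
import io


def read_manifest(stream):
    """Parse (filename, asset_id) rows from a TSV manifest stream."""
    rows = []
    for lineno, row in enumerate(csv.reader(stream, delimiter="\t"), start=1):
        # Skip blank lines and (by assumption) '#' comment lines.
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(row)}")
        rows.append((row[0].strip(), row[1].strip()))
    return rows


# Usage with an in-memory manifest; a real run would pass an open file.
example = io.StringIO("intro.mp4\t12345\n\nlesson-1.mp4\t67890\n")
manifest_rows = read_manifest(example)
```

Failing loudly on a malformed row is a deliberate choice: a silently skipped row in a long catalog job is hard to notice afterwards.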
A multi-video catalog download will take minutes to hours. Foreground tool calls in this environment get interrupted whenever the user sends a new message, and an interrupted download leaves partial files. Always launch catalog downloads as a terminal(background=true, notify_on_complete=true) job so that a new user message cannot interrupt the download and you are notified when it finishes.
Write the manifest TSV to a stable path (e.g. ~/.hermes/jobs/<name>-manifest.tsv) so the bg script can read it, and use a .done marker file per video so a re-run is idempotent if the job is restarted.
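The .done marker idea can be sketched like this. The marker naming (`<filename>.done` next to the output) and the helper names are assumptions; the point is that the marker is written only after a video has fully succeeded.

```python
from pathlib import Path


def pending(filenames, out_dir):
    """Return the filenames that do not yet have a .done marker in out_dir."""
    out_dir = Path(out_dir)
    return [name for name in filenames
            if not (out_dir / f"{name}.done").exists()]


def mark_done(filename, out_dir):
    """Create the marker; call this only after download and verification succeed."""
    (Path(out_dir) / f"{filename}.done").touch()
```

On a restart, the job loops over `pending(...)` instead of the full manifest, so finished videos are skipped and a partially downloaded one is retried.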
When the user provides login credentials to access gated content, after the task is complete strip them from any logs you wrote. Common locations to scrub:
- Job logs (~/.hermes/jobs/<name>.log): usually clean since yt-dlp doesn't log cookies, but check.
- set -x traces that may have captured env vars.
- Shell history (history -d for individual entries; not a permanent rewrite but covers the common case).
- /tmp or staging dirs.
Do NOT save the credentials to memory or skills. They're per-task secrets, not durable user facts.
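Scrubbing a log file can be sketched as a literal search-and-replace over the secrets the user supplied. The function name `scrub_file` and the placeholder text are illustrative; the secrets list should live only in memory for the duration of the call.

```python
from pathlib import Path


def scrub_file(path, secrets, placeholder="[REDACTED]"):
    """Replace each secret in the file with a placeholder; return the hit count."""
    p = Path(path)
    text = p.read_text(encoding="utf-8", errors="replace")
    hits = 0
    for secret in secrets:
        if not secret:
            continue  # an empty string would match everywhere
        hits += text.count(secret)
        text = text.replace(secret, placeholder)
    if hits:
        # Rewrite only when something was found, leaving clean files untouched.
        p.write_text(text, encoding="utf-8")
    return hits
```

A return value of zero is a useful confirmation that a log was already clean. Literal matching will not catch URL-encoded or base64-encoded copies of a password, so a manual look is still worthwhile.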