Alex runs two multi-tenant Hermes control planes that share ~80% of their architecture. Treat them as siblings: the same patterns, pitfalls, and conventions apply to both.
| | Astral Hermes | Hermes Spawn |
|---|---|---|
| Scope | Vertical (astrology) | Horizontal (general-purpose Hermes) |
| App path | /home/avalon/apps/astral-hermes-platform | /home/avalon/apps/hermes-spawn |
| PM2 process | astral-hermes-web | hermes-spawn |
| Public URL | astral.apps.poofc.com | spawn.apps.poofc.com |
| Port (control plane) | varies (Express) | 4031 |
| Tenant root (worker) | /srv/astral/tenants/<id> | /home/avalon/hermes-spawn-tenants/<id> |
| Tenant container name | astral-tenant-<id> | spawn-tenant-<id> |
| Worker host | avalon@5.78.199.26 | avalon@5.78.199.26 |
| GitHub | firemountain/astral-hermes-platform | firemountain/hermes-spawn |
| Domain skill bundles | astral-core + astral-hd (versioned) | user picks skills à la carte |
Astral preceded Spawn architecturally; many Spawn patterns originated in Astral. Pre-existing implementations of "new" features are common — always grep first before delegating or writing fresh code.
Each tenant has a hermes-home dir on the worker (<TENANT_ROOT>/<id>/hermes-home/) containing:
- config.yaml: Hermes config (model provider, skills, toolsets, storage, etc.)
- .env: tenant secrets (API keys, Telegram tokens)
- auth.json: provider OAuth/credential pool (see "Auth shapes" below)
- data/hermes/knowledge/: tenant KB (Spawn; Astral has its own variant)
- data/hermes/scripts/: guard/util scripts (S3 quota guard etc.)

The container runs the astral-hermes-runner (Astral) or spawn-hermes-runner (Spawn) image and is started/stopped via the control plane's SSH-exec layer.
Both apps use a providerConfig({ provider, model }) function returning { envName, config, note? }:
- openai (or openai-api in Spawn): envName: OPENAI_API_KEY, provider: custom w/ base_url: https://api.openai.com/v1, api_mode: chat_completions
- openai-codex: ChatGPT subscription OAuth. envName: '' (no env var). provider: openai-codex. Requires post-create device-auth.
- anthropic: envName: ANTHROPIC_API_KEY. provider: anthropic, api_mode: anthropic_messages, default claude-sonnet-4-5. Working in both apps as of 2026-05-22.
- anthropic-oauth: Claude Pro/Max subscription via PKCE OAuth. Astral has helpers built (installAnthropicOauthCredential, exchangeAnthropicAuthorization, buildAnthropicAuthorizeUrl, generateAnthropicPkce) but no completion Express route or UI flow yet. Spawn marks it oauth-unavailable with a clear "use API-key path for now" message pointing users to hermes auth add anthropic --type oauth inside the container.
- openrouter: envName: OPENROUTER_API_KEY. provider: openrouter.

When adding a new provider, update providerConfig, the betaProviderMatrix / publicProviderOptions (Astral: src/provider-matrix.mjs), the wizard UI, the allowed-providers gate in /api/provision (Spawn: server.mjs ~line 818, "allowed = new Set([...])"), and security regression tests.
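The provider mapping above can be sketched as a small lookup. This is a Python illustration only (chosen to match the auth.json snippets in this document); the real providerConfig lives in the Node codebase. The placement of the model under a "model" key and the wording of the note are assumptions, not the apps' actual output.

```python
from typing import Optional


def provider_config(provider: str, model: Optional[str] = None) -> dict:
    """Return {envName, config, note?} for a provider id (illustrative sketch)."""
    if provider in ("openai", "openai-api"):
        return {
            "envName": "OPENAI_API_KEY",
            "config": {
                "provider": "custom",
                "base_url": "https://api.openai.com/v1",
                "api_mode": "chat_completions",
                "model": model,  # assumption: model sits inside the config block
            },
        }
    if provider == "openai-codex":
        return {
            "envName": "",  # subscription OAuth: no env var
            "config": {"provider": "openai-codex", "model": model},
            "note": "Requires post-create device-auth.",
        }
    if provider == "anthropic":
        return {
            "envName": "ANTHROPIC_API_KEY",
            "config": {
                "provider": "anthropic",
                "api_mode": "anthropic_messages",
                "model": model or "claude-sonnet-4-5",
            },
        }
    if provider == "openrouter":
        return {
            "envName": "OPENROUTER_API_KEY",
            "config": {"provider": "openrouter", "model": model},
        }
    # Anything else should be rejected, mirroring the allowed-providers gate.
    raise ValueError(f"unsupported provider: {provider}")
```

Rejecting unknown ids in the same function keeps the lookup and the gate from drifting apart, which is one reason to touch every listed location when adding a provider.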
Hermes core resolves OAuth credentials from two locations in auth.json. Both must be written or some code paths (gateway, cron, CLI probes) silently fail.
For openai-codex (Codex device-auth):
# credential_pool shape — used by `hermes auth list` and status probes
store['credential_pool']['openai-codex'] = [entry, ...]
# providers shape — used by gateway/cron/model calls at runtime
store['providers']['openai-codex'] = {
'tokens': {'access_token': ..., 'refresh_token': ...},
'last_refresh': ...,
'auth_mode': 'chatgpt',
'base_url': ...,
}
store['active_provider'] = 'openai-codex'
Spawn writes both shapes in server.mjs ~line 274 (search for the Python heredoc that writes auth.json). Astral writes both shapes via installCodexCredential. If a tenant reports "No Codex credentials stored" while hermes auth list shows the credential exists, the runtime shape is missing — re-run device-auth completion or hand-patch. See references/codex-auth-shape-fix.md for the codyguy/cody1 incident.
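A minimal sketch of the dual-write, plus a check for the "credential listed but runtime shape missing" symptom. It is not the code in server.mjs or installCodexCredential: the function names, the contents of the pool entry, and the last_refresh timestamp format are assumptions made for illustration.

```python
import json
import os
import tempfile
import time


def install_codex_credential(auth_path, entry, tokens, base_url):
    """Write both openai-codex shapes into auth.json in one atomic update."""
    store = {}
    if os.path.exists(auth_path):
        with open(auth_path, "r", encoding="utf-8") as fh:
            store = json.load(fh)

    # Shape 1: credential_pool, used by `hermes auth list` and status probes.
    store.setdefault("credential_pool", {})["openai-codex"] = [entry]

    # Shape 2: providers, used by gateway/cron/model calls at runtime.
    store.setdefault("providers", {})["openai-codex"] = {
        "tokens": {
            "access_token": tokens["access_token"],
            "refresh_token": tokens["refresh_token"],
        },
        # Assumption: the real timestamp format may differ.
        "last_refresh": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "auth_mode": "chatgpt",
        "base_url": base_url,
    }
    store["active_provider"] = "openai-codex"

    # Temp file + rename so a reader never sees a half-written auth.json.
    directory = os.path.dirname(os.path.abspath(auth_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".auth-", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(store, fh, indent=2)
    os.chmod(tmp_path, 0o600)
    os.replace(tmp_path, auth_path)
    return store


def missing_codex_shapes(store):
    """Return the names of the required pieces absent from a loaded auth.json."""
    missing = []
    if not store.get("credential_pool", {}).get("openai-codex"):
        missing.append("credential_pool")
    if not store.get("providers", {}).get("openai-codex", {}).get("tokens"):
        missing.append("providers")
    if store.get("active_provider") != "openai-codex":
        missing.append("active_provider")
    return missing
```

A store that only has the credential_pool entry would report `["providers", "active_provider"]`, which corresponds to the "No Codex credentials stored" case described above.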
For Anthropic OAuth (Astral only, helpers built):
# credential_pool shape
store['credential_pool']['anthropic'] = [entry, ...]
# Plus a separate .anthropic_oauth.json file at hermes-home root with
# {accessToken, refreshToken, expiresAt} — Hermes core reads this directly.
- [...] (Astral: INVITE_CODE; Spawn: gated user registration).
- [...] <TENANT_ROOT>/<id>/hermes-home/, write config.yaml + .env.
- [...] astral / spawn CLI helper.
- [...] (src/entitlement-ledger.mjs in Astral; equivalent DB in Spawn).

Astral ships versioned bundles (astral-core@0.2.0, astral-hd@0.2.0) installed via node bin/astral.mjs bundle install --tenant <id> --bundle <name> --version <v> [--replace] [--restart]. Use this for any persona-shaped tenant set: write the bundle once, install across many tenants, version-pin.
Spawn lets users pick individual skills. There is a known cross-pollination idea: bring versioned bundles to Spawn so users can install "the trading bundle" or "the writing bundle" as a unit. Not built yet.
When changing what bundles are installed by default, update installProvisionBundles() in Astral (web/server.mjs ~line 804) AND backfill existing tenants by running astral bundle install per tenant. See the bundle-v0.2.0 backfill session for the loop pattern.
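A per-tenant backfill can be sketched as a loop over the documented CLI invocation. This is not the loop from the bundle-v0.2.0 session; the function names and the injectable `run` parameter are assumptions for illustration. Only the command and its flags come from the text above.

```python
import subprocess

ASTRAL_APP_DIR = "/home/avalon/apps/astral-hermes-platform"


def bundle_install_argv(tenant_id, bundle, version, replace=True, restart=True):
    """Build the argv for one `astral bundle install` call."""
    argv = [
        "node", "bin/astral.mjs", "bundle", "install",
        "--tenant", tenant_id,
        "--bundle", bundle,
        "--version", version,
    ]
    if replace:
        argv.append("--replace")
    if restart:
        argv.append("--restart")
    return argv


def backfill(tenant_ids, bundles, run=subprocess.run, cwd=ASTRAL_APP_DIR):
    """Install every (bundle, version) pair on every tenant; return failures."""
    failures = []
    for tenant_id in tenant_ids:
        for bundle, version in bundles:
            result = run(bundle_install_argv(tenant_id, bundle, version), cwd=cwd)
            if result.returncode != 0:
                # Keep going so one odd tenant does not block the rest.
                failures.append((tenant_id, bundle))
    return failures
```

Collecting failures instead of aborting makes it easy to give the odd tenant a manual pass afterwards.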
Astral previously used a shared ASTRAL_ADMIN_TOKEN bearer. It has been replaced with role-based auth driven by the ASTRAL_ADMIN_EMAILS env var.
Key pieces:
- src/entitlement-ledger.mjs: accounts have a role field ('admin' | 'user'), default 'user'. Methods: setAccountRole(id, role); safeAccount() includes role. upsertAccount preserves the existing role; new accounts default to 'user'.
- web/server.mjs: ADMIN_EMAILS = new Set(...) is parsed at module load from ASTRAL_ADMIN_EMAILS (comma-separated, lowercased). The requireAdmin middleware checks the session cookie and account.role === 'admin': 401 if logged out, 403 if logged in but not admin. The middleware name is preserved so all /api/admin/* routes are unchanged.
- applyAdminRoleFromEnv(account) runs in /api/auth/register AND /api/auth/login, so accounts that pre-date being added to the env list get promoted at next login.
- AdminApp fetches /api/auth/me on mount. Logged out → redirect to /account?next=/admin. Logged in but role !== 'admin' → "Not authorized" screen. Admin → normal UI with a "View as user" button. Clicking it sets localStorage.astralAdminViewAsUser=true and routes to /account. The ReturnToAdminPill component renders at the app root and shows a fixed top-right "← Return to admin" pill on /account and /chat whenever the flag is set AND /api/auth/me confirms admin role. Clicking the pill clears the flag.
- The ASTRAL_ADMIN_TOKEN env var is still read so old deploys don't crash, but it grants nothing. The "Unlock admin" form, astralAdminToken localStorage, and Bearer header injection were fully removed.

To bootstrap on the live host: set ASTRAL_ADMIN_EMAILS=firemountain@gmail.com in .env, then run pm2 restart astral-hermes-web --update-env.
If extending the same pattern to Hermes Spawn, mirror the env-var-seeded role + applyAdminRoleFromEnv() approach.
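The env-seeded role logic is small enough to sketch in a few functions. This is a Python illustration of the described behavior, not the code in web/server.mjs. The account dict shape is an assumption, and so is the choice to promote only: the text does not say whether the real helper ever demotes an account.

```python
def parse_admin_emails(raw):
    """Parse a comma-separated env value into a lowercased set of emails."""
    return {part.strip().lower() for part in (raw or "").split(",") if part.strip()}


def apply_admin_role_from_env(account, admin_emails):
    """Promote an account whose email is on the env list.

    Assumption: this sketch never demotes an existing admin.
    """
    email = account.get("email", "").strip().lower()
    if email in admin_emails:
        account["role"] = "admin"
    else:
        account.setdefault("role", "user")
    return account


def require_admin_status(session_account):
    """Return the HTTP status the middleware check would produce."""
    if session_account is None:
        return 401  # logged out
    if session_account.get("role") != "admin":
        return 403  # logged in, but not an admin
    return 200
```

Running the promotion at both register and login is what lets an account created before it was added to the env list pick up the admin role on its next login.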
Things one app does well that the other can adopt:
- [...] svghanddraw); post-provision capability smoke loop ("can this tenant call its KB? S3? Telegram bot?"); shared backend services pattern (Astral's transit-list-demo as a tenant-callable shared service).

The admin-role pattern just landed in Astral; if Spawn is asked for the same, port it directly.
# Astral
cd /home/avalon/apps/astral-hermes-platform/web && npm run build && cd .. && npm run test:security
git add -A && git commit -m "..." && git push origin main
pm2 restart astral-hermes-web --update-env
curl -s -o /dev/null -w "%{http_code}\n" https://astral.apps.poofc.com/api/health # expect 200
# Spawn
cd /home/avalon/apps/hermes-spawn && npm test && npm run build
git add -A && git commit -m "..." && git push origin main
pm2 restart hermes-spawn --update-env
curl -s -o /dev/null -w "%{http_code}\n" https://spawn.apps.poofc.com/api/health # expect 200
Always pm2 restart --update-env after any .env change, not just pm2 restart.
Provisioning seems fine but chat fails with "quota exceeded" on Codex tenant: Voice transcription fell back to control-plane OpenAI key that's quota-exhausted. Voice transcription is not automatically routed through Codex subscription — it uses ASTRAL_TRANSCRIPTION_OPENAI_KEY / VOICE_TOOLS_OPENAI_KEY. If that quota is dry, transcription silently fails for ALL subscription-auth tenants. Astral now shows a friendly "Voice transcription is temporarily unavailable" instead of the raw OpenAI billing URL.
Tenant says "No Codex credentials stored" but hermes auth list shows them: The dual-write was incomplete. Re-run device-auth completion or hand-patch providers['openai-codex'].tokens + active_provider in auth.json. See references/codex-auth-shape-fix.md.
Skills bundles "missing" after backfill: The skills were installed but terminal was disabled in platform_toolsets, OR web.backend: firecrawl had no credits. The skills can load but cannot execute the API calls they wrap. Verify terminal is enabled AND the relevant API keys are populated. The 2026-05-17 backfill (commit faa92e8 in Astral) re-enabled terminal for tenants and injected ASTRAL_TENANT_FIRECRAWL_API_KEY.
Subagent timeout when adding a feature to one of these apps: see subagent-driven-development skill's pre-flight discovery section. Both repos accumulate "almost finished" features that need enabling rather than rebuilding. Always grep first.
Telegram bot prefills credentials: Public onboarding/login UIs MUST NOT prefill remembered IDs, emails, or default passwords. Alex is security-conscious about this — explicit user-profile note.
mayaastral tenant has a config shape difference: discovered during 2026-05-17 backfill. If a backfill loop succeeds for most tenants but mayaastral shows missing platform_toolsets.terminal, that tenant needs a manual pass.
Agent claims to write to KB but file never appears on host: The astral-tenant-kb skill reads ASTRAL_KB_ROOT from the process environment, not the .env file. Hermes does NOT auto-load /data/hermes/.env into the child Python processes that skills spawn. Result: if -e ASTRAL_KB_ROOT=/data/hermes/knowledge is missing from docker run, the skill silently falls back to ~/.hermes/astral-kb-prototype/knowledge/ inside the writable container layer and the agent reports success while nothing persists to the mounted volume.
- Fix: bake the env var into the docker run line in BOTH web/server.mjs (provisioning script ~line 1608) AND bin/astral.mjs (bundle install --restart path, line 178). Adding to .env is necessary but not sufficient; existing containers need docker rm -f && docker run (not just docker restart) because env can only be set at container creation.
- Verify: docker exec <container> env | grep ASTRAL_KB_ROOT → must show the path. Then ssh write a probe file into the tenant's knowledge/raw/.probe and confirm the container sees it via docker exec <container> ls /data/hermes/knowledge/raw/.
- Detection signal: the agent says "saved Kathleen Brown's chart to your knowledge base" but find /srv/astral/tenants/<id>/hermes-home/knowledge/entities/people/ is empty. Same root cause every time. Commit 457b9aa in Astral.
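The silent fallback can be illustrated with a short sketch. It is not the astral-tenant-kb skill's code; the function names are hypothetical. Only the variable name, the expected path, and the fallback path come from the description above.

```python
import os

EXPECTED_KB_ROOT = "/data/hermes/knowledge"
FALLBACK_KB_ROOT = "~/.hermes/astral-kb-prototype/knowledge/"


def resolve_kb_root(environ):
    """Return (path, persisted) the way a process-env-only lookup would.

    The .env file is never consulted, so a missing variable falls back
    to a path inside the container's writable layer.
    """
    root = environ.get("ASTRAL_KB_ROOT")
    if root:
        return root, True
    return os.path.expanduser(FALLBACK_KB_ROOT), False


def kb_root_in_container_env(env_output, expected=EXPECTED_KB_ROOT):
    """Check the output of `docker exec <container> env` for the variable."""
    for line in env_output.splitlines():
        key, _, value = line.partition("=")
        if key == "ASTRAL_KB_ROOT":
            return value == expected
    return False
```

Feeding the second function the captured output of the verify command gives a yes/no answer that could be scripted across all tenants.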
Tenant has no knowledge/ directory at all: Older tenants (notably mayaastral, the CHAT_DEFAULT_TENANT) were provisioned before the KB scaffold became a default. The scaffold is created by createKnowledgeScaffold({ tenantId, tenantName }) in web/server.mjs. Backfill via the admin endpoint:
    curl -X POST -b "$ADMIN_COOKIE" \
      https://astral.apps.poofc.com/api/admin/tenants/<id>/ensure-knowledge

Or via the one-off helper at scripts/scaffold-kb.mjs. The endpoint uses if not p.exists() guards, so it is idempotent and safe to re-run.
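The idempotent guard pattern looks roughly like this. It is a sketch, not the createKnowledgeScaffold implementation: the directory list contains only the two paths named elsewhere in this document (raw/ and entities/people/), and the real scaffold may create more.

```python
from pathlib import Path

# Assumption: only the directories mentioned in this document are listed here.
SCAFFOLD_DIRS = ("raw", "entities/people")


def ensure_knowledge_scaffold(knowledge_root):
    """Create missing scaffold directories; return the ones that were created."""
    created = []
    root = Path(knowledge_root)
    for rel in SCAFFOLD_DIRS:
        p = root / rel
        if not p.exists():  # guard makes repeated calls a no-op
            p.mkdir(parents=True)
            created.append(rel)
    return created
```

A second call against the same root returns an empty list, which is the property that makes the backfill safe to re-run.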
Cross-tenant isolation is real but verify after changes: Each tenant has its own <TENANT_ROOT>/<id>/hermes-home/ mounted at /data/hermes inside its dedicated container (astral-tenant-<id>). No shared volumes. Containers run with --cap-drop=ALL, --security-opt no-new-privileges:true, and --memory=1g. If you ever rewrite container creation, verify isolation by: (a) writing a probe file in tenant A's KB on the host, (b) listing the same path from tenant B's container — must be empty. requireChatTenantAccess and requireTenantOwner middleware also enforce app-level gates.
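If container creation is ever rewritten, a builder like the following keeps the hardening flags, the per-tenant mount, and the KB env var in one place. It is a sketch only: the function names, the `-d` flag, and the untagged image name are assumptions, and the real docker run line in web/server.mjs may carry more options.

```python
def tenant_docker_run_argv(tenant_id,
                           tenant_root="/srv/astral/tenants",
                           image="astral-hermes-runner"):
    """Build a docker run argv for one Astral tenant (illustrative)."""
    return [
        "docker", "run", "-d",  # assumption: detached mode
        "--name", f"astral-tenant-{tenant_id}",
        "--cap-drop=ALL",
        "--security-opt", "no-new-privileges:true",
        "--memory=1g",
        # Each tenant mounts only its own hermes-home.
        "-v", f"{tenant_root}/{tenant_id}/hermes-home:/data/hermes",
        # Env can only be set at creation time, so it belongs here.
        "-e", "ASTRAL_KB_ROOT=/data/hermes/knowledge",
        image,
    ]


def host_mounts(argv):
    """Return the set of host paths bind-mounted by an argv."""
    return {
        argv[i + 1].split(":", 1)[0]
        for i, arg in enumerate(argv[:-1])
        if arg == "-v"
    }
```

Comparing `host_mounts` for two tenants is a cheap static check that no volume is shared; it complements the probe-file test rather than replacing it.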
Pre-existing helpers exist for "new" features, so search before delegating: A recurring pattern in this codebase: Anthropic OAuth PKCE helpers (installAnthropicOauthCredential, exchangeAnthropicAuthorization, buildAnthropicAuthorizeUrl, generateAnthropicPkce) were already built but disabled via enabled: false in src/provider-matrix.mjs. Similarly, KB ingestion sessions existed in Spawn before they were surfaced. Always grep -rn '<feature>' web/server.mjs src/ BEFORE delegating implementation work; the subagent timeouts in this session were directly caused by re-exploring existing code.
Alex's explicit preference for any multi-step wizard in these apps:
- provisioning (spinner) → providerAuth (conditional) → billing (conditional Stripe) → ready (brief confirmation + auto-redirect to /chat). The wizard never ends with a menu; it always proceeds to the next screen or to the actual product.
- [...] <details> for the actual stderr. Seeing raw output such as astral-tenant-hhfggg Up 3 seconds astral-hermes-runner:dev in the UI is the bug.
- success_url: /onboarding?tenant=<id>&checkout=ok, cancel_url: /onboarding?tenant=<id>&checkout=cancel. The wizard reads these query params on mount and either advances to the ready step or stays on billing.
- [...] /chat and /account ("Want Telegram access? Set it up in settings →") deep-links to /account/tenant/<id>#telegram for tenants without it configured. Telegram is fiddly (BotFather + numeric user IDs) and shouldn't gate first-chat.

Route: /account/tenant/<tenantId>. Tabs: provider, telegram, danger zone.
Endpoints (all gated by requireTenantOwner which allows either the owning account OR any admin):
- GET /api/tenant/:tenantId/settings: masked snapshot (provider id + model + has-api-key + masked-fingerprint, telegram-configured boolean, entitlement status). DO NOT return the raw token or user-ID list.
- POST /api/tenant/:tenantId/provider: change provider (writes new config.yaml + .env, restarts container). Returns providerAuthRequired: true if OAuth-based so the UI prompts re-auth next.
- POST /api/tenant/:tenantId/provider/key: update API key only. Validates that the current provider is api-key-based.
- POST /api/tenant/:tenantId/telegram: write or replace bot config. Calls lookupTelegramBotUsername(token) to verify the token works before saving.
- DELETE /api/tenant/:tenantId/telegram: wipe the three TELEGRAM_* keys from .env, restart container.

Key helpers:
- mergeEnvFile(existing, updates, removeKeys) preserves comments and key ordering, deduplicates keys.
- readTenantEnv(tenantId) / writeTenantEnvAndRestart(tenantId, newContent) use base64 + python heredoc + atomic temp+rename for safe writes with 0600 perms.
- docker restart astral-tenant-<id> (faster than rm+run) UNLESS you're changing env vars, in which case do a full recreate.

OAuth re-auth from the settings panel just calls the existing /api/provider/device-auth + /api/provider/complete (Codex) and /api/provider/anthropic/start + /api/provider/anthropic/complete (Anthropic) endpoints. They already work for existing tenants, not just new ones. installCodexCredential and installAnthropicOauthCredential overwrite the auth.json credential, which is the correct behavior for re-auth.
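The merge behavior described for mergeEnvFile (preserve comments and ordering, deduplicate, remove keys) can be sketched as follows. This is a Python illustration, not the helper itself. Keeping the first occurrence of a duplicated key is an assumption, and lines using `export KEY=...` or quoted multi-line values are not handled.

```python
def merge_env_file(existing, updates, remove_keys=()):
    """Merge updates into .env text, keeping comments and key order."""
    remaining = dict(updates)
    remove = set(remove_keys)
    seen = set()
    out = []
    for line in existing.splitlines():
        stripped = line.strip()
        # Blank lines, comments, and anything without "=" pass through as-is.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            out.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        if key in remove or key in seen:
            continue  # drop removed keys and later duplicates
        seen.add(key)
        if key in remaining:
            out.append(f"{key}={remaining.pop(key)}")  # update in place
        else:
            out.append(line)
    # Keys that were not already present are appended at the end.
    for key, value in remaining.items():
        if key not in remove:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

Updating a key where it already sits, rather than appending, is what keeps diffs of a tenant's .env small and readable after a provider or Telegram change.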
Alex's explicit preferences for the chat surface:
- [...] white-space: pre-wrap. CSS class targets: .chat-bubble.assistant:not(.audio):not(.error):not(.typing) gets width:100%, transparent background, no border/shadow, minimal padding.
- uploading… (pre-fetch, client) → transcribing… (server transcribing event) → sent ✓ (server transcribed event) → received ✓ (server done event). Drive this via SSE events, not artificial timers.
- /api/chat/message and /api/chat/voice support SSE when Accept: text/event-stream. The server drops the -Q (quiet) flag on hermes chat, spawns via sshStream() (line-buffered stdout), parses verbose output via parseHermesProgressLine(), and emits events transcribing | transcribed | start | tool_call | progress | text | done | error. The JSON contract is preserved when the Accept header is absent, so the endpoints stay backward compatible. The client shows progress events as small italic gray lines below the assistant response (Telegram-style activity log).
- ReturnToAdminPill self-hides on /chat to avoid double-rendering. Account + Settings buttons live next to each other in .sidebar-action-links for easy access.

Reference files:

- references/codex-auth-shape-fix.md: the codyguy/cody1 "No Codex credentials stored" root cause and dual-write fix.
- references/provider-config-cheatsheet.md: quick reference for the providerConfig() shape per provider (envName, config block, post-create requirements).
- references/admin-role-overhaul-astral.md: file-by-file checklist of the role-based admin migration in Astral, useful when porting to Spawn.
- references/kb-env-var-in-docker-run.md: root cause and fix for "agent claims to save to KB but file never appears on host" (the ASTRAL_KB_ROOT .env-vs-docker run lesson).
- references/typeform-wizard-principles.md: the principles Alex explicitly approved for the Astral onboarding rewrite. Apply to any future wizard in either app.