Use this skill for production chat interfaces where the details matter: composer behavior, mobile keyboard ergonomics, voice-note controls, thread history, sidebar layout, and visual parity with a high-fidelity mockup. The goal is not merely that messages send; the chat should feel like a polished consumer product.
For Alex's apps, assume mobile-first verification and brand-safe copy are part of the feature, not optional polish. Do not expose implementation or infrastructure language in user-facing chat UI unless the user explicitly asks for an admin/debug surface.
See references/astral-live-chat-parity-2026-05.md for a condensed session example covering Astral Hermes mockup-to-production parity work.
See references/voice-note-end-to-end-2026-05.md for the specific debugging pattern where a local voice bubble rendered correctly but never reached transcription/chat backend processing.
Don't use this skill for generic API-only chat backend work unless it affects user-visible chat behavior.
- Replace technical terms such as tenant, container, instance, session secret, worker, or live sandbox with user-facing language such as Guide, Connected, Active, Thread, or Memory.
- A MediaRecorder handler is incomplete until recorded audio appears in the transcript and reaches the same backend/chat pipeline that text messages use. A pretty local bubble that never transcribes or sends is still broken.
- Every async action shows visible state, for example uploading… → transcribing… → sent ✓ → received ✓. For streamed responses, show tool calls and progress lines as they happen, not as a single final blob.
- Use a textarea, not a single-line input, for normal chat composition.
- Preserve line breaks in rendered messages (white-space: pre-wrap).
- […] MediaRecorder start.
- […] audio/voice kind.
- […] (sending…, pulse/waveform motion, disabled state if needed).
- […] (FormData preferred; base64 JSON acceptable for small notes with a strict body limit).
- […] (not sent) and append a human-readable error bubble; do not leave the UI silently idle.
- […] New thread button.
- Do not use <details> as a large settings container inside an overflow-constrained sidebar. Use an explicit sidebar view/state machine instead (for example sidebarView: 'main' | 'settings'): tapping Settings replaces the sidebar content with a dedicated settings pane, shows a sticky header/back button, and gives the settings pane its own scroll area.
- Use min-width: 0, text-overflow: ellipsis, and breakpoint-specific label hiding to avoid layout blowouts.
- live tenant → Guide, Connected, or the guide name.
- isolated container → omit, or say private space only if privacy is the point.
- Hermes instance → Astral Hermes or your guide.
- session when user-facing → thread or conversation.
The default modern pattern Alex expects: user messages bubbled, assistant text responses unbubbled.
- Assistant text responses: full width (align-self: stretch; width: 100%), transparent background, no border, no shadow, minimal padding. Markdown rendering with white-space: pre-wrap for paragraph breaks. Links auto-detected.
CSS shape (Tailwind-agnostic, CSS-vars-friendly):
.chat-bubble.user { /* keep existing bubble styles */ }
.chat-bubble.assistant:not(.audio):not(.error):not(.typing) {
  align-self: stretch;
  width: 100%;
  background: transparent;
  border: none;
  box-shadow: none;
  padding: 0.5rem 0;
  white-space: pre-wrap;
}
.chat-bubble.assistant.audio { /* keep bubble */ }
.chat-bubble.assistant.error { /* keep bubble */ }
Apply to ALL chat surfaces in the app (landing/demo chat, real tenant chat, embedded chat). If these surfaces do not already share message-rendering code, factor out a common renderer.
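
One way to keep every surface consistent is to route all of them through a single helper that decides which classes a message gets. The sketch below assumes a hypothetical message shape of `{ role, kind }`; only the class names come from the CSS above.

```javascript
// Hypothetical message shape: { role: 'user' | 'assistant', kind?: 'text' | 'audio' | 'error' | 'typing' }.
const BUBBLED_KINDS = ['audio', 'error', 'typing'];

// Returns the class list that the CSS selectors in this section key off.
function chatBubbleClassName(message) {
  const classes = ['chat-bubble', message.role === 'user' ? 'user' : 'assistant'];
  // Audio, error, and typing messages carry a modifier class so the
  // :not(.audio):not(.error):not(.typing) rule leaves their bubble intact.
  if (BUBBLED_KINDS.includes(message.kind)) classes.push(message.kind);
  return classes.join(' ');
}

// True when the message should render full-width and unbubbled.
function isUnbubbled(message) {
  return message.role === 'assistant' && !BUBBLED_KINDS.includes(message.kind);
}
```

Landing, tenant, and embedded chat can then all call `chatBubbleClassName(message)` instead of each hard-coding its own bubble rules.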
When the backend takes more than ~1s to respond and especially when it makes intermediate tool calls (web search, chart calc, RAG, etc.), single-shot request/response makes the chat feel dead. Stream events via SSE.
Server pattern (keep JSON contract for backward compat):
- Each existing endpoint (POST /api/chat/message, POST /api/chat/voice) detects Accept: text/event-stream and branches.
- Event types:
  - transcribing / transcribed (voice only)
  - start (tenant/agent acknowledged the prompt)
  - tool_call (with tool name + brief args)
  - progress (free-form "Searching the web…", "Calling chart-api…")
  - text (the final assistant text; a buffered emit is fine if the agent doesn't token-stream)
  - done (with sessionId, any final metadata)
  - error
- Events are sent as data: <json>\n\n lines.
Client pattern (vanilla fetch streaming, no new deps):
// Stream a chat request over SSE with fetch; onEvent receives each parsed event object.
async function streamChatSSE(url, body, { onEvent, onError } = {}) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Accept': 'text/event-stream' },
    body: JSON.stringify(body),
  });
  // Fail loudly on HTTP errors instead of parsing an error page as events.
  if (!res.ok || !res.body) throw new Error(`Chat stream failed: HTTP ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buf = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    // Events are separated by a blank line; keep the trailing partial event in buf.
    const events = buf.split('\n\n');
    buf = events.pop();
    for (const evt of events) {
      const dataLine = evt.split('\n').find(l => l.startsWith('data: '));
      if (!dataLine) continue;
      try { onEvent?.(JSON.parse(dataLine.slice(6))); } catch (e) { onError?.(e); }
    }
  }
}
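
The server side of the same contract can be sketched without committing to a framework. This is a minimal sketch, not the actual backend: `respond`, `runAgent`, `write`, and `end` are hypothetical names, and the `type` field on each event is an assumption (the text above names the event types but not the payload shape).

```javascript
// Decide whether the caller asked for a stream; otherwise the JSON contract applies.
function wantsEventStream(headers) {
  return String(headers.accept || '').toLowerCase().includes('text/event-stream');
}

// Serialize one event as a single "data: <json>\n\n" frame.
function formatSSE(event) {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Hypothetical handler core. runAgent is an assumed async function that calls
// emit(event) for intermediate events and resolves to { text, sessionId }.
async function respond({ headers, write, end, runAgent }) {
  if (!wantsEventStream(headers)) {
    const result = await runAgent(() => {});
    write(JSON.stringify(result)); // unchanged JSON contract for older clients
    return end();
  }
  try {
    write(formatSSE({ type: 'start' }));
    const result = await runAgent((event) => write(formatSSE(event)));
    write(formatSSE({ type: 'text', text: result.text }));
    write(formatSSE({ type: 'done', sessionId: result.sessionId }));
  } catch (err) {
    // Keep the user-facing message friendly; log err server-side.
    write(formatSSE({ type: 'error', message: 'Something went wrong. Please try again.' }));
  } finally {
    end();
  }
}
```

In a real handler, `write` and `end` would wrap the response object, and the streaming branch would also set `Content-Type: text/event-stream` before the first frame.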
Append a single assistant placeholder message immediately on send, then live-update its text, progress[], and pending flags from incoming events. Render progress[] as small italic gray lines beneath the response.
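
The placeholder update can be kept as a pure function so it is easy to test. The sketch below assumes events shaped like `{ type, text?, tool?, message? }`; the `type`, `tool`, `message`, and `error` field names are assumptions, while `text`, `progress`, and `pending` follow the description above.

```javascript
// Placeholder appended immediately on send.
function createPlaceholder() {
  return { role: 'assistant', text: '', progress: [], pending: true, error: null };
}

// Pure reducer: returns the next message state for one stream event.
function applyStreamEvent(message, event) {
  switch (event.type) {
    case 'tool_call':
      return { ...message, progress: [...message.progress, `Using ${event.tool}…`] };
    case 'progress':
      return { ...message, progress: [...message.progress, event.text] };
    case 'text':
      return { ...message, text: event.text };
    case 'done':
      return { ...message, pending: false };
    case 'error':
      return { ...message, pending: false, error: event.message || 'Something went wrong.' };
    default:
      return message; // start, transcribing, and unknown events leave the bubble unchanged
  }
}
```

Wiring it to the client above is then a matter of calling `applyStreamEvent` inside `onEvent` and re-rendering the one placeholder message.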
Voice lifecycle states drive off the same SSE stream:
- uploading… — set client-side BEFORE the fetch starts.
- transcribing… — on transcribing event.
- sent ✓ — on transcribed event (prompt is now flowing to the agent).
- received ✓ — on done event.
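
The mapping above fits in a small transition function. This is a sketch: the state names and the `failed` state with its `not sent` label are assumptions layered on the four states listed, and event types are assumed to arrive as plain strings.

```javascript
// User-visible label for each lifecycle state.
const VOICE_LABELS = {
  uploading: 'uploading…',
  transcribing: 'transcribing…',
  sent: 'sent ✓',
  received: 'received ✓',
  failed: 'not sent',
};

// Map an SSE event type to the next lifecycle state.
// Unrelated events (start, tool_call, progress, text) leave the state unchanged.
function nextVoiceState(state, eventType) {
  if (state === 'received' || state === 'failed') return state; // terminal states
  switch (eventType) {
    case 'transcribing': return 'transcribing';
    case 'transcribed': return 'sent';
    case 'done': return 'received';
    case 'error': return 'failed';
    default: return state;
  }
}
```

The client sets `uploading` itself before calling fetch, then feeds every event type through `nextVoiceState` and renders `VOICE_LABELS[state]` on the voice bubble.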
Parser specifics (confirmed for current Hermes verbose CLI): the live hermes chat (without -Q) emits progress lines in the form ┊ <emoji> <text>… — for example ┊ 💻 preparing terminal…, ┊ ⚙ calling web_search…, ┊ 🔍 searching for transits. The leading whitespace can vary. Match the ┊ (U+250A) prefix, optionally consume one emoji code point, and capture the trailing text (with optional terminating …):
const re = /^\s*[┊│|]\s*[\p{Emoji_Presentation}\p{Extended_Pictographic}\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}]?\s*(.+?)…?\s*$/u;
When the captured text starts with preparing|calling|using|running|invoking|executing <name>, surface it as a structured tool_call event with tool: <name>; otherwise emit it as progress. The older patterns (→ tool:, [tool:name], leading Searching|Calling|Fetching|...) still exist as legacy fallbacks but ┊-prefixed lines are the dominant real-world format. Verify against actual output before assuming any other pattern — guessing without grounding wasted an iteration in past sessions.
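
The classification rule above can be sketched as one function. The event shape (`type`, `tool`, `text`) is an assumption, and the regex here also tolerates leading whitespace and an optional U+FE0F variation selector after the emoji, which is a defensive guess and not confirmed Hermes output.

```javascript
// Matches "┊ <emoji> <text>…" with flexible whitespace; the emoji and trailing ellipsis are optional.
const PROGRESS_RE = /^\s*[┊│|]\s*[\p{Emoji_Presentation}\p{Extended_Pictographic}\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}]?\uFE0F?\s*(.+?)…?\s*$/u;
// Verbs that signal a tool invocation; the next whitespace-delimited word is taken as the tool name.
const TOOL_RE = /^(?:preparing|calling|using|running|invoking|executing)\s+(\S+)/i;

// Returns a tool_call event, a progress event, or null when the line is not a progress line.
function parseProgressLine(line) {
  const match = PROGRESS_RE.exec(line);
  if (!match) return null;
  const text = match[1];
  const tool = TOOL_RE.exec(text);
  return tool
    ? { type: 'tool_call', tool: tool[1], text }
    : { type: 'progress', text };
}
```

Lines that return `null` fall through to the legacy patterns or are treated as ordinary output.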
Strip the Hermes TUI panel from the final reply: when you drop -Q to enable streaming, you also start receiving the full verbose framing: a ╭─ ⚕ Hermes ──╮ panel box around the reply, an "Initializing agent..." banner above it, an echo of the prompt (visible as Query: <prompt> before the panel), and a footer with Resume this session with: hermes --resume ..., Session: ..., Duration: ..., Messages: ..., plus a trailing Listening for chart and memory context line. The final assistant text the user sees must be ONLY the body inside the panel. Algorithm:
- Locate the panel's opening border ╭─ (or ┌─) and the matching closing border ╰─ (or └─).
- Drop the border lines (╭┌┐╮), and strip leading/trailing │| and whitespace from every other line.
- Drop lines matching ^session_id:, ^↻ Resumed session, ^Initializing agent, ^Resume this session with:, ^hermes --resume, ^Session:\s+\d, ^Duration:, ^Messages:\s+\d, ^Listening for chart and memory context, ^─{4,}$.
- Collapse \n{3,} to \n\n, trim.
This is non-optional once -Q is dropped; otherwise users see panel borders rendered as ╭───╮ inline in their chat replies. Capture the full original output in a debug field for admin-only inspection.
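
A minimal sketch of that cleanup, assuming the whole CLI output is available as one string. `stripHermesPanel` and the `{ text, debug }` return shape are hypothetical names; the patterns are the ones listed above.

```javascript
const NOISE_PATTERNS = [
  /^session_id:/, /^↻ Resumed session/, /^Initializing agent/,
  /^Resume this session with:/, /^hermes --resume/, /^Session:\s+\d/,
  /^Duration:/, /^Messages:\s+\d/, /^Listening for chart and memory context/,
  /^─{4,}$/,
];

function stripHermesPanel(raw) {
  let lines = raw.split('\n');
  // 1. Keep only the lines between the panel's top and bottom borders, when both exist.
  const top = lines.findIndex((l) => /^\s*[╭┌]─/.test(l));
  const bottom = lines.findIndex((l, i) => i > top && /^\s*[╰└]─/.test(l));
  if (top !== -1 && bottom !== -1) lines = lines.slice(top + 1, bottom);
  const body = lines
    // 2. Strip side borders and surrounding whitespace from each line.
    .map((l) => l.replace(/^[\s│|]+/, '').replace(/[\s│|]+$/, ''))
    // Drop any border line left over when only one border was found.
    .filter((l) => !/^[╭┌╰└]/.test(l))
    // 3. Drop banner and footer lines.
    .filter((l) => !NOISE_PATTERNS.some((re) => re.test(l)))
    .join('\n');
  // 4. Collapse runs of blank lines and trim; keep the raw output for admin-only debugging.
  return { text: body.replace(/\n{3,}/g, '\n\n').trim(), debug: raw };
}
```

Only `text` should reach the chat bubble; `debug` belongs behind an admin surface. Because the source rule strips every leading `|`, a reply that legitimately starts lines with a pipe (for example a markdown table) would need special handling.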
When a spinner has nothing else to say — initial connection, provisioning, queued work, a stream that just went quiet — fill the gap with domain-flavored captions. Never let the user stare at a bare spinner or Loading… for more than ~2 seconds.
Strict hierarchy (Alex calls generic filler "dumb and cheesy", so get this right):
Source the captions from real domain data, not invented flavor text. For Astral Hermes that means real current transits (Moon degree+sign, planetary day/hour, decan/face ruler, applying aspects). For other apps, derive from something true about the user's current context. Bad: "Asking Mercury for safe passage…", "Checking the decans for a good omen…", "Aligning fixed stars over your container…" — these are made-up flavor text and the user will (correctly) call them cheesy. Good: "Moon 4° Virgo — Mercury's domicile, Sun-faced decan.", "Sun and Mercury both in Gemini today.", "Day of Saturn. Moon waxing through Virgo." — factual, derived from real ephemeris data, written in the user's astrology register (traditional rulerships, decans, no woo).
Implementation shape (React, ~30 lines):
import { useEffect, useState } from 'react';
// Shows a rotating, data-derived caption while nothing else is happening.
function FillerCaption({ lastProgressAt, endpoint = '/api/daily-captions', stallMs = 4000 }) {
  const [pool, setPool] = useState([]);
  const [caption, setCaption] = useState('');
  const [stalled, setStalled] = useState(lastProgressAt == null);
  // Load the caption pool once per endpoint; ignore failures and late responses
  useEffect(() => {
    let cancelled = false;
    fetch(endpoint).then(r => r.ok ? r.json() : null).then(data => {
      if (cancelled || !data?.captions?.length) return;
      setPool(data.captions);
      setCaption(data.captions[Math.floor(Math.random() * data.captions.length)]);
    }).catch(() => {});
    return () => { cancelled = true; };
  }, [endpoint]);
  // Track stalled state when wrapping live-progress UIs
  useEffect(() => {
    if (lastProgressAt == null) { setStalled(true); return undefined; }
    setStalled(false);
    const t = window.setTimeout(() => setStalled(true), stallMs);
    return () => window.clearTimeout(t);
  }, [lastProgressAt, stallMs]);
  // Rotate while visible
  useEffect(() => {
    if (!stalled || pool.length < 2) return undefined;
    const t = window.setInterval(() => {
      setCaption(pool[Math.floor(Math.random() * pool.length)]);
    }, 3200);
    return () => window.clearInterval(t);
  }, [stalled, pool]);
  if (!stalled || !caption) return null;
  return <p className="filler-caption" aria-live="polite">{caption}</p>;
}
Pass lastProgressAt={undefined} for pure-spinner cases (always show). Pass lastProgressAt={Date.now()} on every real progress event in streaming UIs (hides until next stall). Render nothing while loading the caption pool — better than a flicker of fallback text.
Caption pool source: a backend endpoint fed by a daily/hourly cron that computes the real domain state and emits a small array. For Astral Hermes: /api/daily-captions served from JSON written by scripts/daily-transits.mjs running on a cron. See astral-transit-calendar skill for the transit pipeline and caption-generation rules.
When the chat product includes an onboarding/setup flow (create a guide, connect a provider, billing, etc.), make it Typeform-style: one screen, one decision, one primary CTA. The wizard NEVER ends in a dashboard with parallel "Open chat / Open account / Start billing / Connect provider" buttons — that's the anti-pattern.
Required shape:
- […] <StepShell> / progress bar pattern works.
- On failure, show a friendly retry message, with an admin-gated <details> for the raw stderr. Never render raw stdout to users.
- Checkout returns to /onboarding?...&checkout=ok so the wizard reads the query string on mount and jumps straight to the ready step. Don't exempt admins from sandbox billing: they need to validate the flow end-to-end before live keys flip on (request from Alex).
Wizard navigation:
- Hide Back/Next on terminal steps (provisioning, providerAuth, billing, ready); they advance themselves.
- The progress bar recomputes its total based on conditionals (admin skips invite step, API-key providers skip credential step, OAuth providers add provider-auth step, etc.).
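
The navigation rules above can be sketched as plain functions. The step ids provisioning, providerAuth, billing, and ready come from the text; welcome, invite, and credential, the step order, and the `providerAuth: 'oauth' | 'apiKey'` option are assumptions for illustration.

```javascript
// Build the step list for this user; the progress bar total is simply its length.
function buildWizardSteps({ isAdmin, providerAuth }) {
  const steps = ['welcome'];
  if (!isAdmin) steps.push('invite');                      // admin skips the invite step
  if (providerAuth !== 'apiKey') steps.push('credential'); // API-key providers skip the credential step
  if (providerAuth === 'oauth') steps.push('providerAuth'); // OAuth providers add a provider-auth step
  steps.push('provisioning', 'billing', 'ready');
  return steps;
}

// Terminal steps advance themselves, so Back/Next stay hidden on them.
const SELF_ADVANCING = new Set(['provisioning', 'providerAuth', 'billing', 'ready']);
function showBackNext(step) {
  return !SELF_ADVANCING.has(step);
}

// On mount, jump straight to ready when checkout redirected back with checkout=ok.
function initialStep(search, steps) {
  return new URLSearchParams(search).get('checkout') === 'ok' ? 'ready' : steps[0];
}

function wizardProgress(step, steps) {
  return { current: steps.indexOf(step) + 1, total: steps.length };
}
```

In the browser, `initialStep(window.location.search, steps)` would run once when the wizard mounts.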
Don't put fiddly steps in onboarding. Telegram bot setup, Slack integration, custom domain — anything that requires the user to open another app, copy something, paste something — does NOT belong in the first-run flow. Push it to a settings panel and surface a dismissable nudge banner on /chat and /account for tenants without it configured:
{tenant.metadata?.telegramConfigured === false && !dismissed && (
  <TelegramNudgeBanner tenant={tenant} onDismiss={...} />
)}
Banner copy is a single line + arrow to deep-link into the relevant settings tab. Dismissal persists per-tenant in localStorage (<feature>NudgeDismissed:<tenantId>).
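
A small sketch of the dismissal persistence, with the storage object passed in so it can be tested without a browser. The helper names and the stored value `'1'` are assumptions; the key format is the one given above.

```javascript
// Key format: <feature>NudgeDismissed:<tenantId>
function nudgeKey(feature, tenantId) {
  return `${feature}NudgeDismissed:${tenantId}`;
}

function isNudgeDismissed(storage, feature, tenantId) {
  try {
    return storage.getItem(nudgeKey(feature, tenantId)) === '1';
  } catch {
    return false; // storage can throw when unavailable; treat as not dismissed
  }
}

function dismissNudge(storage, feature, tenantId) {
  try {
    storage.setItem(nudgeKey(feature, tenantId), '1');
  } catch {
    // Ignore: the banner simply reappears on the next visit.
  }
}
```

In the app this would be called as `dismissNudge(window.localStorage, 'telegram', tenant.id)`, where `tenant.id` is whatever identifier the tenant object actually exposes.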
Run both functional and visual checks:
- Run npm run build or project equivalent.
- Check /, /chat, and /api/health or project equivalents.
- Check user-facing copy for technical terms: tenant, container, instance, worker, debug, secret.
Fixing the handler but not the renderer. The record button can successfully create audio data while the chat appears broken because the message list ignores audio messages.
Fixing the renderer but not the backend. A polished voice bubble with waveform/playback can still be a no-op if it never transcribes the blob or calls the real chat API. When the user says “I sent an audio note but nothing happened,” inspect both the UI message insertion and the network/backend handoff before polishing more UI.
Persisting threads without listing them. LocalStorage/database saves are not enough. Users need visible thread navigation and active-state feedback.
Letting settings dominate the sidebar. In chat products, thread history and knowledge/context usually deserve top priority; settings should sit at the bottom unless the app is primarily an admin console.
Assuming native mobile keyboards can show both Send and Return. Browsers/platforms differ. The reliable pattern is Return for newline and an on-screen send button for send.
Shipping technical copy after backend work. Implementation labels often leak into placeholders, empty states, badges, and tests. Update tests to assert polished copy so the regression does not return.
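
One way to make that regression test cheap is a helper that scans rendered copy for leaked terms. A minimal sketch: `findLeakedTerms` is a hypothetical name, and the term list is the one this skill already flags (tenant, container, instance, worker, debug, secret).

```javascript
const BANNED_TERMS = ['tenant', 'container', 'instance', 'worker', 'debug', 'secret'];

// Returns the banned terms that appear as whole words (singular or plural) in the given copy.
function findLeakedTerms(copy) {
  return BANNED_TERMS.filter((term) => new RegExp(`\\b${term}s?\\b`, 'i').test(copy));
}
```

A test can then render each chat surface, collect its visible text, and assert that `findLeakedTerms(text)` is empty. Whole-word matching still flags ordinary phrases such as "for instance", so review hits before rewriting copy.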
Desktop-only validation. Chat layout bugs often appear only at <380px widths: squeezed pills, hidden send buttons, broken drawers, or composer overlap.
Using <details> for mobile settings inside a drawer. Native details can work for tiny groups, but it is easy for custom CSS/overflow to reserve hidden space, trap lower content below the viewport, or make settings feel like a drawer within a drawer. For substantial settings, use a dedicated nested sidebar view with a back button and independent scroll area. Verify on an iPhone-sized viewport that the main thread list is replaced while settings is open, the back button is visible, and settings content scrolls without moving the chat/composer.
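
The nested sidebar view can be sketched as a tiny reducer. `sidebarView: 'main' | 'settings'` comes from this skill; the `drawerOpen` flag and the action names are assumptions.

```javascript
const initialSidebar = { sidebarView: 'main', drawerOpen: false };

function sidebarReducer(state, action) {
  switch (action.type) {
    case 'openDrawer':
      return { ...state, drawerOpen: true };
    case 'openSettings':
      return { ...state, sidebarView: 'settings' }; // replaces the thread list
    case 'back':
      return { ...state, sidebarView: 'main' };     // back button in the sticky header
    case 'closeDrawer':
      return { sidebarView: 'main', drawerOpen: false }; // reopen on the thread list next time
    default:
      return state;
  }
}
```

The sidebar then renders exactly one pane based on `state.sidebarView`, and the settings pane gets its own scroll container.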
Mockup drift. If the user says the live UI should match the mockup, treat animation, icon shape, spacing, and placement as acceptance criteria, not decoration.
Bubbling the assistant. Defaulting every message to the same bubble component makes the chat feel like SMS, not a modern assistant. User messages bubbled, assistant text full-width unbubbled, audio/error messages keep bubbles. See "Assistant message layout" above.
Silent async UI. Voice send button that goes "sending…" and stays "sending…" until the assistant replies 8 seconds later is broken UX even if the backend works. Every async action needs visible state changes within ~100ms and a sensible state machine per phase (uploading → transcribing → sent → received). Implement via SSE if the backend has multiple phases; don't fake it with setTimeout.
Post-action button salads. A success screen with 5+ parallel CTAs ("Open chat / Continue to billing / Connect provider / Open account / Provision another") is a wizard that failed to finish. Convert each action into its own sequential step with one CTA each. Auto-redirect to the actual product at the end.
Putting fiddly setup in onboarding. Telegram bot, Slack webhook, custom domain — anything requiring "open BotFather, copy this, paste here" — does NOT belong in first-run flow. Push to settings + nudge banner. Onboarding should only ask for things the user can answer in their head or paste from one tab.
Raw stderr in user-facing UI. Provisioning failures, deploy logs, astral-tenant-foo Up 3 seconds — never render these to end users. Wrap in admin-gated <details> blocks. Show users a friendly retry message.
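
A minimal sketch of that split between user copy and admin detail. The function name, the returned field names, and the retry wording are all hypothetical.

```javascript
// Convert a provisioning/deploy failure into what the UI is allowed to show.
function toUserFacingError(err, { isAdmin = false } = {}) {
  const view = { message: 'Setup hit a snag. Please try again.', canRetry: true };
  if (isAdmin) {
    // Raw output is attached only for admins, to be rendered inside a collapsed <details>.
    view.adminDetails = String((err && (err.stderr || err.message)) || err);
  }
  return view;
}
```

The component renders `message` and a retry button for everyone, and adds the `<details>` block only when `adminDetails` is present.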
Cheesy invented filler text under spinners. When a spinner has nothing else to say, the temptation is to invent flavor text ("Asking Mercury for safe passage…", "Checking the decans for a good omen…", "Aligning fixed stars over your container…"). Don't. Alex calls this "dumb and cheesy" and he's right — it reads as fake. Derive captions from real domain data via a cron-fed endpoint (current transits, planetary hour, decan ruler, etc.) and write them in the user's actual domain register, not as cute filler. See "Filler caption layer" above.
Filler captions competing with real progress. If real tool-call events / streaming tokens / phase labels are flowing, captions must be hidden. They are a fallback for blank or stalled states only, and they hide the moment a concrete event resumes. A caption rotating underneath a live "Searching the web…" line is double-talk.
No bare spinner or Loading… for more than ~2s; a domain-flavored caption layer fills the gap.