chat-interface-development

/home/avalon/.hermes/skills/software-development/chat-interface-development/SKILL.md

Chat Interface Development

Overview

Use this skill for production chat interfaces where the details matter: composer behavior, mobile keyboard ergonomics, voice-note controls, thread history, sidebar layout, and visual parity with a high-fidelity mockup. The goal is not merely that messages send; the chat should feel like a polished consumer product.

For Alex's apps, assume mobile-first verification and brand-safe copy are part of the feature, not optional polish. Do not expose implementation or infrastructure language in user-facing chat UI unless the user explicitly asks for an admin/debug surface.

See references/astral-live-chat-parity-2026-05.md for a condensed session example covering Astral Hermes mockup-to-production parity work. See references/voice-note-end-to-end-2026-05.md for the specific debugging pattern where a local voice bubble rendered correctly but never reached transcription/chat backend processing.

When to Use

Don't use this skill for generic API-only chat backend work unless it affects user-visible chat behavior.

Working Principles

  1. Parity before novelty. If the user references a mockup, inspect the mockup behavior and port it deliberately. Do not invent a different interaction because it is easier.
  2. Mobile is the truth surface. Verify on narrow viewports and, when possible, a real phone/incognito session. Desktop success does not prove mobile chat UX.
  3. Consumer copy only. Replace terms like tenant, container, instance, session secret, worker, or live sandbox with user-facing language such as Guide, Connected, Active, Thread, or Memory.
  4. Thread visibility is a UX requirement. If conversations are persisted but not listed in navigation, the feature is effectively broken.
  5. Voice notes need end-to-end action, not just rendering. A working MediaRecorder handler is incomplete until recorded audio appears in the transcript and reaches the same backend/chat pipeline that text messages use. A pretty local bubble that never transcribes/sends is still broken.
  6. Don't over-bind Enter on mobile. For multiline composers, Return/Enter should insert a newline unless the product explicitly requires keyboard-send. Use the visible send button as the reliable send action.
  7. Assistant messages are not bubbles. ChatGPT-style asymmetry is the default for modern assistant UIs: user messages get bubbles (right-aligned, max-width ~75%, background tint), while assistant text responses get NO bubble (the full width of the message column, transparent background, minimal padding). Keep bubbles for user audio (the player needs containment) and for errors/system notices (they need a styled box to read as alerts). See "Assistant message layout" below.
  8. No feedback is a bug. Every async action — send, transcribe, stream, save — needs a visible state change within ~100ms. A button that goes from "Send" to nothing-until-2-seconds-later is broken UX even if the backend is fine. For voice notes: uploading… → transcribing… → sent ✓ → received ✓. For streamed responses: show tool calls and progress lines as they happen, not as a single final blob.
  9. Loading-state hierarchy: real progress > filler captions; filler captions > blank spinner. When something concrete is happening, show what's happening (tool call name, phase label, streamed token). When nothing is happening yet (initial spinner, queue wait) OR a real progress event hasn't updated in >~4s (stalled), drop in a domain-flavored caption beneath/alongside the spinner. Captions must NEVER displace or compete with real progress signals — they are a fallback layer that hides itself as soon as concrete events resume. See "Filler caption layer" below.
  10. One screen, one decision, one CTA — for wizards. When the chat product wraps an onboarding/setup flow (provision a tenant, connect a provider, billing), use a Typeform-style sequential wizard, not a post-action dashboard with a wall of buttons. See "Typeform-style sequential onboarding" below.
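Principle 6 can be sketched as a keydown predicate for the composer. This is a minimal sketch: shouldSendOnKeyDown and its keyboardSend option are hypothetical names, and the Cmd/Ctrl+Enter desktop shortcut is an assumption rather than a stated requirement. It also ignores Enter while an IME is composing, because CJK input methods use Enter to confirm a candidate.

```javascript
// Hypothetical option: keyboardSend = true only when the product explicitly wants Enter-to-send.
function shouldSendOnKeyDown(event, { keyboardSend = false } = {}) {
  if (event.key !== 'Enter') return false;
  // Never send mid-composition; keyCode 229 is the legacy "IME processing" value.
  if (event.isComposing || event.keyCode === 229) return false;
  // Keyboard-send products: Enter sends, Shift+Enter keeps the newline.
  if (keyboardSend) return !event.shiftKey;
  // Default: plain Enter inserts a newline; Cmd/Ctrl+Enter is an assumed desktop shortcut.
  return Boolean(event.metaKey || event.ctrlKey);
}
```

In React, wire it as onKeyDown={e => { if (shouldSendOnKeyDown(e.nativeEvent)) { e.preventDefault(); send(); } }}, where send is the existing send-button handler. Plain Enter then falls through to the textarea's default newline, and the visible send button stays the reliable path on mobile.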

Implementation Checklist

Composer

Voice recording

Threads and navigation

Mobile layout

Copy and branding

Assistant message layout (ChatGPT-style asymmetric bubbles)

The default modern pattern Alex expects: user messages bubbled, assistant text responses unbubbled.

CSS shape (Tailwind-agnostic, CSS-vars-friendly):

.chat-bubble.user { /* keep existing bubble styles */ }
.chat-bubble.assistant:not(.audio):not(.error):not(.typing) {
  align-self: stretch;
  width: 100%;
  max-width: none; /* undo any ~75% cap inherited from the shared bubble class */
  background: transparent;
  border: none;
  box-shadow: none;
  padding: 0.5rem 0;
  white-space: pre-wrap;
}
.chat-bubble.assistant.audio { /* keep bubble */ }
.chat-bubble.assistant.error { /* keep bubble */ }

Apply this to ALL chat surfaces in the app (landing/demo chat, real tenant chat, embedded chat). If they don't already share message-rendering code, factor out a common renderer so the layout cannot drift between surfaces.
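One small piece of such a shared renderer is a single function that decides the class names the CSS keys on, so every surface emits the same ones. A minimal sketch, assuming a hypothetical message shape { role: 'user' | 'assistant', kind?: 'text' | 'audio' | 'error' | 'typing' }; messageClassName and keepsBubble are illustrative names.

```javascript
// Builds the class list the bubble CSS selects on, e.g. "chat-bubble assistant audio".
function messageClassName(msg) {
  const classes = ['chat-bubble', msg.role === 'user' ? 'user' : 'assistant'];
  if (msg.kind && msg.kind !== 'text') classes.push(msg.kind);
  return classes.join(' ');
}

// Mirrors the CSS rule: users always keep a bubble; assistants only for audio, errors, typing.
function keepsBubble(msg) {
  return msg.role === 'user' || ['audio', 'error', 'typing'].includes(msg.kind);
}
```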

Streaming feedback (SSE) for real-time tool calls and multi-phase actions

When the backend takes more than ~1s to respond and especially when it makes intermediate tool calls (web search, chart calc, RAG, etc.), single-shot request/response makes the chat feel dead. Stream events via SSE.

Server pattern (keep JSON contract for backward compat):
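A minimal sketch of the server side, assuming a Node http-style (req, res) handler and a hypothetical runChat(body) async generator that yields event objects such as { type: 'progress', text } and finally { type: 'done', reply }. Clients that ask for text/event-stream get one data: frame per event; every other client keeps the old single JSON response. The { reply } payload stands in for whatever JSON shape the endpoint already returns.

```javascript
// One SSE frame per event; the blank line terminates the event.
function sseFrame(event) {
  return `data: ${JSON.stringify(event)}\n\n`;
}

async function handleChat(req, res, runChat, body) {
  const wantsStream = (req.headers.accept || '').includes('text/event-stream');
  if (!wantsStream) {
    // Backward-compatible JSON contract: run to completion, return one payload.
    let reply = '';
    for await (const evt of runChat(body)) {
      if (evt.type === 'done') reply = evt.reply;
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ reply }));
    return;
  }
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  try {
    for await (const evt of runChat(body)) res.write(sseFrame(evt));
  } catch {
    // Surface failures as an event the client can render instead of a silently cut stream.
    res.write(sseFrame({ type: 'error', message: 'Something went wrong. Please try again.' }));
  } finally {
    res.end();
  }
}
```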

Client pattern (vanilla fetch streaming, no new deps):

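async function streamChatSSE(url, body, { onEvent, onError } = {}) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Accept': 'text/event-stream' },
    body: JSON.stringify(body),
  });
  // Fail loudly on HTTP errors instead of parsing an error page as a stream.
  if (!res.ok || !res.body) throw new Error(`chat stream failed: HTTP ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buf = '';
  // One SSE event block -> one JSON payload. Multiple data: lines are joined with '\n'
  // and the optional single space after "data:" is dropped, as the SSE format specifies.
  const dispatch = (evt) => {
    const data = evt.split('\n')
      .filter(l => l.startsWith('data:'))
      .map(l => l.slice(5).replace(/^ /, ''))
      .join('\n');
    if (!data) return;
    try { onEvent?.(JSON.parse(data)); } catch (e) { onError?.(e); }
  };
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // Normalize CRLF after appending so a \r\n split across chunks is still caught.
    buf = (buf + decoder.decode(value, { stream: true })).replace(/\r\n/g, '\n');
    const events = buf.split('\n\n');
    buf = events.pop(); // the trailing partial event waits for the next chunk
    events.forEach(dispatch);
  }
  buf += decoder.decode(); // flush any buffered multi-byte sequence
  if (buf.trim()) dispatch(buf); // final event sent without a trailing blank line
}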

Append a single assistant placeholder message immediately on send, then live-update its text, progress[], and pending flags from incoming events. Render progress[] as small italic gray lines beneath the response.
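That update loop can be written as a pure reducer over the placeholder message. A minimal sketch, assuming a message shape { id, role, text, progress, pending } and event types progress, tool_call, token, done and error; the token and error types and all field names are assumptions about the server contract, not something this skill fixes.

```javascript
function createAssistantPlaceholder(id) {
  return { id, role: 'assistant', text: '', progress: [], pending: true };
}

// Returns a new message object so React state updates stay immutable.
function applyChatEvent(msg, evt) {
  switch (evt.type) {
    case 'progress':
      return { ...msg, progress: [...msg.progress, evt.text] };
    case 'tool_call':
      return { ...msg, progress: [...msg.progress, `calling ${evt.tool}…`] };
    case 'token':
      return { ...msg, text: msg.text + evt.text };
    case 'done':
      // Prefer the server's final text; fall back to what was streamed.
      return { ...msg, text: evt.reply ?? msg.text, pending: false };
    case 'error':
      return { ...msg, pending: false, error: evt.message ?? 'Something went wrong.' };
    default:
      return msg; // unknown events never break the message
  }
}
```

Usage: call setMessages(ms => ms.map(m => (m.id === placeholderId ? applyChatEvent(m, evt) : m))) from the onEvent callback of the streaming client.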

Voice lifecycle states are driven by the same SSE stream:

  1. uploading…: set client-side BEFORE the fetch starts.
  2. transcribing…: on the transcribing event.
  3. sent ✓: on the transcribed event (the prompt is now flowing to the agent).
  4. received ✓: on the done event.
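A minimal sketch of that mapping, assuming each event carries a type field named after the events above; VOICE_PHASES, nextVoiceStatus and VOICE_LABELS are hypothetical names. It also refuses to move backwards, so a late or duplicated event cannot flip received ✓ back to sent ✓.

```javascript
const VOICE_PHASES = ['uploading', 'transcribing', 'sent', 'received'];
const VOICE_LABELS = {
  uploading: 'uploading…',
  transcribing: 'transcribing…',
  sent: 'sent ✓',
  received: 'received ✓',
};
// Which server event advances the voice note to which phase.
const EVENT_TO_PHASE = { transcribing: 'transcribing', transcribed: 'sent', done: 'received' };

function nextVoiceStatus(current, evt) {
  const target = EVENT_TO_PHASE[evt.type];
  if (!target) return current; // progress/tool events don't change the voice state
  // Only move forward through the phases.
  return VOICE_PHASES.indexOf(target) > VOICE_PHASES.indexOf(current) ? target : current;
}
```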

Parser specifics (confirmed for the current Hermes verbose CLI): live hermes chat (without -Q) emits progress lines of the form ┊ <emoji> <text>…, for example ┊ 💻 preparing terminal…, ┊ ⚙ calling web_search…, or ┊ 🔍 searching for transits. Leading whitespace can vary. Match the ┊ (U+250A) prefix after optional whitespace, optionally consume one emoji code point (plus a U+FE0F variation selector if present), and capture the trailing text, with an optional terminating …:

const re = /^\s*[┊│|]\s*(?:[\p{Emoji_Presentation}\p{Extended_Pictographic}\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}]\uFE0F?)?\s*(.+?)…?\s*$/u;

When the captured text starts with preparing|calling|using|running|invoking|executing <name>, surface it as a structured tool_call event with tool: <name>; otherwise emit it as progress. The older patterns (→ tool:, [tool:name], leading Searching|Calling|Fetching|...) still exist as legacy fallbacks, but ┊-prefixed lines are the dominant real-world format. Verify against actual output before assuming any other pattern; guessing without grounding wasted an iteration in past sessions.
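The match-then-classify step can be sketched as a single function. It is a sketch under assumptions: classifyHermesLine and the returned { type, tool, text } shape are hypothetical, the pattern tolerates leading whitespace and a U+FE0F variation selector after the emoji, and the tool-name charset [\w.:-] is a guess. Check it against real hermes chat output before relying on it.

```javascript
// ┊-style prefix, optional emoji (+ variation selector), captured text, optional trailing "…".
const PROGRESS_RE = /^\s*[┊│|]\s*(?:[\p{Emoji_Presentation}\p{Extended_Pictographic}]\uFE0F?)?\s*(.+?)…?\s*$/u;
// Verbs that introduce a tool name, per the rule above.
const TOOL_RE = /^(?:preparing|calling|using|running|invoking|executing)\s+([\w.:-]+)/i;

// Returns a tool_call or progress event for a CLI line, or null for non-progress output.
function classifyHermesLine(line) {
  const m = PROGRESS_RE.exec(line);
  if (!m) return null;
  const text = m[1].trim();
  const tool = TOOL_RE.exec(text);
  return tool ? { type: 'tool_call', tool: tool[1], text } : { type: 'progress', text };
}
```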

Strip the Hermes TUI panel from the final reply. Dropping -Q to enable streaming also turns on the full verbose framing: a ╭─ ⚕ Hermes ──╮ panel box around the reply, an "Initializing agent..." banner above it, an echo of the prompt (visible as Query: <prompt> before the panel), a footer with Resume this session with: hermes --resume ..., Session: ..., Duration: ..., and Messages: ..., plus a trailing Listening for chart and memory context line. The final assistant text the user sees must be ONLY the body inside the panel. Algorithm:

  1. Find first ╭─ (or ┌─) and matching ╰─ (or └─).
  2. Take only the substring between them.
  3. Split by lines, drop the top border line (contains ╭┌┐╮), strip leading/trailing │| and whitespace from every other line.
  4. From the remaining body, also drop lines matching: ^session_id:, ^↻ Resumed session, ^Initializing agent, ^Resume this session with:, ^hermes --resume, ^Session:\s+\d, ^Duration:, ^Messages:\s+\d, ^Listening for chart and memory context, ^─{4,}$.
  5. Collapse \n{3,} to \n\n, trim.

This is non-optional once -Q is dropped; otherwise users see panel borders rendered as ╭───╮ inline in their chat replies. Capture the full original output in a debug field for admin-only inspection.
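The five steps above can be sketched as one function. Assumptions: the name stripHermesPanel is hypothetical, the output contains a single Hermes panel (so the last bottom border is the one that closes it), and when no panel is found the whole output is cleaned with the same noise filter instead of being returned raw.

```javascript
// Step 4: lines to drop from the panel body.
const NOISE = [
  /^session_id:/, /^↻ Resumed session/, /^Initializing agent/, /^Resume this session with:/,
  /^hermes --resume/, /^Session:\s+\d/, /^Duration:/, /^Messages:\s+\d/,
  /^Listening for chart and memory context/, /^─{4,}$/,
];

function stripHermesPanel(raw) {
  const text = raw.replace(/\r\n/g, '\n');
  const top = text.search(/[╭┌]─/);
  const bottom = Math.max(text.lastIndexOf('╰─'), text.lastIndexOf('└─'));
  // Steps 1-3: keep only the panel and drop its top border line.
  const lines = top !== -1 && bottom > top
    ? text.slice(top, bottom).split('\n').slice(1)
    : text.split('\n'); // assumption: no panel, clean everything
  const cleaned = lines
    .map(l => l.replace(/^[\s│|]+/, '').replace(/[\s│|]+$/, '')) // strip side borders
    .filter(l => !NOISE.some(re => re.test(l)));
  // Step 5: collapse runs of blank lines and trim.
  return cleaned.join('\n').replace(/\n{3,}/g, '\n\n').trim();
}
```

Keep the untouched raw string alongside the result in an admin-only debug field, as described above.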

Filler caption layer (for blank or stalled loading states)

When a spinner has nothing else to say — initial connection, provisioning, queued work, a stream that just went quiet — fill the gap with domain-flavored captions. Never let the user stare at a bare spinner or Loading… for more than ~2 seconds.

Strict hierarchy (Alex calls cheesy generic filler "dumb and cheesy" — get this right):

  1. Real progress event → render it verbatim. Caption layer is hidden.
  2. No event yet, or last event was >~4s ago → show a caption. Rotate every ~3.2s.
  3. New real event arrives → caption hides immediately.

Source the captions from real domain data, not invented flavor text. For Astral Hermes that means real current transits (Moon degree+sign, planetary day/hour, decan/face ruler, applying aspects). For other apps, derive from something true about the user's current context. Bad: "Asking Mercury for safe passage…", "Checking the decans for a good omen…", "Aligning fixed stars over your container…" — these are made-up flavor text and the user will (correctly) call them cheesy. Good: "Moon 4° Virgo — Mercury's domicile, Sun-faced decan.", "Sun and Mercury both in Gemini today.", "Day of Saturn. Moon waxing through Virgo." — factual, derived from real ephemeris data, written in the user's astrology register (traditional rulerships, decans, no woo).

Implementation shape (React, ~30 lines):

import { useEffect, useState } from 'react';

function FillerCaption({ lastProgressAt, endpoint = '/api/daily-captions', stallMs = 4000 }) {
  const [pool, setPool] = useState([]);
  const [caption, setCaption] = useState('');
  const [stalled, setStalled] = useState(lastProgressAt == null);

  useEffect(() => {
    let cancelled = false;
    fetch(endpoint).then(r => r.ok ? r.json() : null).then(data => {
      if (cancelled || !data?.captions?.length) return;
      setPool(data.captions);
      setCaption(data.captions[Math.floor(Math.random() * data.captions.length)]);
    }).catch(() => {});
    return () => { cancelled = true; };
  }, [endpoint]);

  // Track stalled state when wrapping live-progress UIs
  useEffect(() => {
    if (lastProgressAt == null) { setStalled(true); return undefined; }
    setStalled(false);
    const t = window.setTimeout(() => setStalled(true), stallMs);
    return () => window.clearTimeout(t);
  }, [lastProgressAt, stallMs]);

  // Rotate while visible
  useEffect(() => {
    if (!stalled || pool.length < 2) return undefined;
    const t = window.setInterval(() => {
      setCaption(pool[Math.floor(Math.random() * pool.length)]);
    }, 3200);
    return () => window.clearInterval(t);
  }, [stalled, pool]);

  if (!stalled || !caption) return null;
  return <p className="filler-caption" aria-live="polite">{caption}</p>;
}

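For pure-spinner cases, pass lastProgressAt={undefined} (always show). In streaming UIs, store a timestamp in state whenever a real progress event arrives (for example, setLastProgressAt(Date.now()) inside the event handler) and pass that value; the caption then hides until the next stall. Don't write lastProgressAt={Date.now()} directly in JSX: it changes on every parent render, so any unrelated re-render resets the stall timer. Render nothing while the caption pool is loading; that is better than a flicker of fallback text.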

Caption pool source: a backend endpoint fed by a daily/hourly cron that computes the real domain state and emits a small array. For Astral Hermes: /api/daily-captions served from JSON written by scripts/daily-transits.mjs running on a cron. See astral-transit-calendar skill for the transit pipeline and caption-generation rules.
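A minimal sketch of the serving side, assuming the cron job writes a JSON file shaped like { generatedAt, captions }. parseCaptionPool, handleDailyCaptions, the 36-hour freshness window and the file path in the comment are all hypothetical. Treating a stale pool as empty matters here: yesterday's transits presented as today's would break the real-data rule, and an empty pool makes FillerCaption render nothing.

```javascript
function parseCaptionPool(jsonText, { now = Date.now(), maxAgeMs = 36 * 60 * 60 * 1000 } = {}) {
  let data;
  try { data = JSON.parse(jsonText); } catch { return []; }
  const age = now - Date.parse(data?.generatedAt);
  // Missing, unparseable, or stale timestamps yield an empty pool.
  if (!Number.isFinite(age) || age > maxAgeMs) return [];
  return Array.isArray(data.captions)
    ? data.captions.filter(c => typeof c === 'string' && c.trim() !== '')
    : [];
}

// Node http-style handler; readPoolText is injected, e.g.
// () => fs.readFileSync('data/daily-captions.json', 'utf8') (hypothetical path).
function handleDailyCaptions(res, readPoolText) {
  let text = '';
  try { text = readPoolText(); } catch { /* missing file: serve an empty pool */ }
  res.writeHead(200, { 'Content-Type': 'application/json', 'Cache-Control': 'max-age=300' });
  res.end(JSON.stringify({ captions: parseCaptionPool(text) }));
}
```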

Typeform-style sequential onboarding (kill the button-salad success screen)

When the chat product includes an onboarding/setup flow (create a guide, connect a provider, billing, etc.), make it Typeform-style: one screen, one decision, one primary CTA. The wizard NEVER ends in a dashboard with parallel "Open chat / Open account / Start billing / Connect provider" buttons — that's the anti-pattern.

Required shape:

  1. Input steps — name, provider pick, credentials, etc. Existing <StepShell> / progress bar pattern works.
  2. Provisioning step — spinner only, no buttons, calls the create API once, auto-advances on success. On failure: friendly retry message + admin-gated <details> for the raw stderr. Never render raw stdout to users.
  3. Provider auth step (conditional) — only if the chosen provider needs post-create OAuth. Single CTA flow per provider (device code button, then "I approved it"; or "Open authorize →" + code paste + "Complete"). Auto-advance on success.
  4. Billing step (conditional) — single "Start trial →" button → redirect to payment provider. Success URL returns to /onboarding?...&checkout=ok so the wizard reads the query string on mount and jumps straight to the ready step. Don't exempt admins from sandbox billing — they need to validate the flow end-to-end before live keys flip on (request from Alex).
  5. Ready step — exactly ONE big CTA ("Open chat →"), 2-second auto-redirect to the actual product, ONE small text link to the dashboard as escape hatch.

Wizard navigation:

  - Hide Back/Next on terminal steps (provisioning, providerAuth, billing, ready); they advance themselves.
  - Recompute the progress-bar total from the conditionals (admin skips invite step, API-key providers skip credential step, OAuth providers add provider-auth step, etc.).
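The step list, the ?checkout=ok resume and the Back/Next rule can all derive from one computed array. A minimal sketch: buildSteps, initialStep, progressFor, the step ids and the boolean flags are hypothetical, the position of the invite step is a guess, and the real conditions belong to the app's provider and tenant config.

```javascript
// Self-advancing steps: the wizard hides Back/Next on these.
const SELF_ADVANCING = new Set(['provisioning', 'providerAuth', 'billing', 'ready']);

function buildSteps({ isAdmin, needsCredentialStep, needsPostCreateAuth, needsBilling }) {
  return [
    !isAdmin && 'invite',
    'name',
    'provider',
    needsCredentialStep && 'credentials',
    'provisioning',
    needsPostCreateAuth && 'providerAuth',
    needsBilling && 'billing',
    'ready',
  ].filter(Boolean); // drop the steps whose condition is false
}

// After the payment redirect, ?checkout=ok jumps straight to the ready step.
function initialStep(search, steps) {
  return new URLSearchParams(search).get('checkout') === 'ok' ? 'ready' : steps[0];
}

function progressFor(step, steps) {
  return { current: steps.indexOf(step) + 1, total: steps.length, showNav: !SELF_ADVANCING.has(step) };
}
```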

Don't put fiddly steps in onboarding. Telegram bot setup, Slack integration, custom domain — anything that requires the user to open another app, copy something, paste something — does NOT belong in the first-run flow. Push it to a settings panel and surface a dismissable nudge banner on /chat and /account for tenants without it configured:

{tenant.metadata?.telegramConfigured === false && !dismissed && (
  <TelegramNudgeBanner tenant={tenant} onDismiss={...} />
)}

Banner copy is a single line + arrow to deep-link into the relevant settings tab. Dismissal persists per-tenant in localStorage (<feature>NudgeDismissed:<tenantId>).
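A minimal sketch of the dismissal persistence using the key format above. nudgeKey, isNudgeDismissed and dismissNudge are hypothetical helpers; storage is window.localStorage in the browser, injected so it can be tested. The try/catch exists because localStorage access can throw under some browser privacy settings, and a broken store should fall back to showing the nudge rather than crashing /chat.

```javascript
// Key format from the text: <feature>NudgeDismissed:<tenantId>.
function nudgeKey(feature, tenantId) {
  return `${feature}NudgeDismissed:${tenantId}`;
}

function isNudgeDismissed(storage, feature, tenantId) {
  try {
    return storage.getItem(nudgeKey(feature, tenantId)) === '1';
  } catch {
    return false; // storage unavailable: show the nudge
  }
}

function dismissNudge(storage, feature, tenantId) {
  try {
    storage.setItem(nudgeKey(feature, tenantId), '1');
  } catch {
    /* best effort: the banner simply reappears next visit */
  }
}
```

In the banner above, dismissed would be initialized from isNudgeDismissed(window.localStorage, 'telegram', tenant.id), and onDismiss would call dismissNudge before updating local state.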

Testing and Verification

Run both functional and visual checks:

  1. Build: npm run build or project equivalent.
  2. Unit/readiness/security tests: run the app's established scripts, not only the touched test.
  3. Route smoke: verify /, /chat, and /api/health or project equivalents.
  4. Mobile browser smoke: narrow viewport, open drawer/sidebar, create multiple threads, switch between them.
  5. Voice smoke: start recording, stop, confirm message appears, play/pause works, processing state appears, transcript/reply arrives, and no console errors.
  6. API smoke for voice routes: verify auth failures return controlled JSON/401, oversized/empty audio is rejected, and tests mock provider transcription without leaking API keys.
  7. Copy smoke: search and inspect for leaked technical words such as tenant, container, instance, worker, debug, secret.
  8. Deployment verification: restart the correct process and test the public URL after deploy.
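The copy smoke in step 7 can be partly automated inside the existing test suite. A minimal sketch, assuming you feed it rendered text or user-facing string literals; findLeakedCopy is a hypothetical helper, its default term list is taken from step 7, and admin/debug surfaces should be excluded before scanning. Matching whole words (with an optional plural s) keeps words like "Containerized" from tripping it.

```javascript
const LEAKY_TERMS = ['tenant', 'container', 'instance', 'worker', 'debug', 'secret'];

// Returns each leaked term with its match and position, in document order.
function findLeakedCopy(text, terms = LEAKY_TERMS) {
  const hits = [];
  for (const term of terms) {
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const re = new RegExp(`\\b${escaped}s?\\b`, 'gi'); // whole word, case-insensitive
    for (const m of text.matchAll(re)) hits.push({ term, match: m[0], index: m.index });
  }
  return hits.sort((a, b) => a.index - b.index);
}
```

A test can then assert findLeakedCopy(renderedEmptyState).length === 0, which also covers the "update tests to assert polished copy" advice below.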

Common Pitfalls

  1. Fixing the handler but not the renderer. The record button can successfully create audio data while the chat appears broken because the message list ignores audio messages.

  2. Fixing the renderer but not the backend. A polished voice bubble with waveform/playback can still be a no-op if it never transcribes the blob or calls the real chat API. When the user says “I sent an audio note but nothing happened,” inspect both the UI message insertion and the network/backend handoff before polishing more UI.

  3. Persisting threads without listing them. LocalStorage/database saves are not enough. Users need visible thread navigation and active-state feedback.

  4. Letting settings dominate the sidebar. In chat products, thread history and knowledge/context usually deserve top priority; settings should sit at the bottom unless the app is primarily an admin console.

  5. Assuming native mobile keyboards can show both Send and Return. Browsers/platforms differ. The reliable pattern is Return for newline and an on-screen send button for send.

  6. Shipping technical copy after backend work. Implementation labels often leak into placeholders, empty states, badges, and tests. Update tests to assert polished copy so the regression does not return.

  7. Desktop-only validation. Chat layout bugs often appear only at <380px widths: squeezed pills, hidden send buttons, broken drawers, or composer overlap.

  8. Using <details> for mobile settings inside a drawer. Native details can work for tiny groups, but it is easy for custom CSS/overflow to reserve hidden space, trap lower content below the viewport, or make settings feel like a drawer within a drawer. For substantial settings, use a dedicated nested sidebar view with a back button and independent scroll area. Verify on an iPhone-sized viewport that the main thread list is replaced while settings is open, the back button is visible, and settings content scrolls without moving the chat/composer.

  9. Mockup drift. If the user says the live UI should match the mockup, treat animation, icon shape, spacing, and placement as acceptance criteria, not decoration.

  10. Bubbling the assistant. Defaulting every message to the same bubble component makes the chat feel like SMS, not a modern assistant. User messages bubbled, assistant text full-width unbubbled, audio/error messages keep bubbles. See "Assistant message layout" above.

  11. Silent async UI. Voice send button that goes "sending…" and stays "sending…" until the assistant replies 8 seconds later is broken UX even if the backend works. Every async action needs visible state changes within ~100ms and a sensible state machine per phase (uploading → transcribing → sent → received). Implement via SSE if the backend has multiple phases; don't fake it with setTimeout.

  12. Post-action button salads. A success screen with 5+ parallel CTAs ("Open chat / Continue to billing / Connect provider / Open account / Provision another") is a wizard that failed to finish. Convert each action into its own sequential step with one CTA each. Auto-redirect to the actual product at the end.

  13. Putting fiddly setup in onboarding. Telegram bot, Slack webhook, custom domain — anything requiring "open BotFather, copy this, paste here" — does NOT belong in first-run flow. Push to settings + nudge banner. Onboarding should only ask for things the user can answer in their head or paste from one tab.

  14. Raw stderr in user-facing UI. Provisioning failures, deploy logs, astral-tenant-foo Up 3 seconds — never render these to end users. Wrap in admin-gated <details> blocks. Show users a friendly retry message.

  15. Cheesy invented filler text under spinners. When a spinner has nothing else to say, the temptation is to invent flavor text ("Asking Mercury for safe passage…", "Checking the decans for a good omen…", "Aligning fixed stars over your container…"). Don't. Alex calls this "dumb and cheesy" and he's right — it reads as fake. Derive captions from real domain data via a cron-fed endpoint (current transits, planetary hour, decan ruler, etc.) and write them in the user's actual domain register, not as cute filler. See "Filler caption layer" above.

  16. Filler captions competing with real progress. If real tool-call events / streaming tokens / phase labels are flowing, captions must be hidden. They are a fallback for blank or stalled states only, and they hide the moment a concrete event resumes. A caption rotating underneath a live "Searching the web…" line is double-talk.

Verification Checklist