Chat Interface Development

Overview

Use this skill for production chat interfaces where the details matter: composer behavior, mobile keyboard ergonomics, voice-note controls, thread history, sidebar layout, and visual parity with a high-fidelity mockup. The goal is not merely that messages send; the chat should feel like a polished consumer product.

For Alex's apps, assume mobile-first verification and brand-safe copy are part of the feature, not optional polish. Do not expose implementation or infrastructure language in user-facing chat UI unless the user explicitly asks for an admin/debug surface.

See references/astral-live-chat-parity-2026-05.md for a condensed session example covering Astral Hermes mockup-to-production parity work. See references/voice-note-end-to-end-2026-05.md for the specific debugging pattern where a local voice bubble rendered correctly but never reached transcription/chat backend processing.

When to Use

The user reports chat UI regressions: broken record button, cramped icons, missing threads, awkward settings placement, composer behavior, scrolling issues.
A mockup/prototype chat exists and the live app must match its UI, motion, and interaction details.
Building or changing voice input/recording UX in a web chat.
Implementing local or server-backed thread/session history.
Removing technical/developer copy from a consumer-facing assistant interface.
Polishing mobile chat layout: fixed composer, newest message visibility, keyboard-safe spacing, compact topbars.

Don't use this skill for generic API-only chat backend work unless it affects user-visible chat behavior.

Working Principles

Parity before novelty. If the user references a mockup, inspect the mockup behavior and port it deliberately. Do not invent a different interaction because it is easier.
Mobile is the truth surface. Verify on narrow viewports and, when possible, a real phone/incognito session. Desktop success does not prove mobile chat UX.
Consumer copy only. Replace terms like tenant, container, instance, session secret, worker, or live sandbox with user-facing language such as Guide, Connected, Active, Thread, or Memory.
Thread visibility is a UX requirement. If conversations are persisted but not listed in navigation, the feature is effectively broken.
Voice notes need end-to-end action, not just rendering. A working MediaRecorder handler is incomplete until recorded audio appears in the transcript and reaches the same backend/chat pipeline that text messages use. A pretty local bubble that never transcribes/sends is still broken.
Don't over-bind Enter on mobile. For multiline composers, Return/Enter should insert a newline unless the product explicitly requires keyboard-send. Use the visible send button as the reliable send action.
Assistant messages are not bubbles. ChatGPT-style asymmetry is the default for modern assistant UIs: user messages get bubbles (right-aligned, max-width ~75%, background tint), assistant text responses get NO bubble — full viewport width, transparent background, minimal padding. Keep bubbles for user audio (player needs containment) and for errors/system notices (they need a styled box to read as alerts). See "Assistant message layout" below.
No feedback is a bug. Every async action — send, transcribe, stream, save — needs a visible state change within ~100ms. A button that goes from "Send" to nothing-until-2-seconds-later is broken UX even if the backend is fine. For voice notes: uploading… → transcribing… → sent ✓ → received ✓. For streamed responses: show tool calls and progress lines as they happen, not as a single final blob.
Loading-state hierarchy: real progress > filler captions; filler captions > blank spinner. When something concrete is happening, show what's happening (tool call name, phase label, streamed token). When nothing is happening yet (initial spinner, queue wait) OR a real progress event hasn't updated in >~4s (stalled), drop in a domain-flavored caption beneath/alongside the spinner. Captions must NEVER displace or compete with real progress signals — they are a fallback layer that hides itself as soon as concrete events resume. See "Filler caption layer" below.
One screen, one decision, one CTA — for wizards. When the chat product wraps an onboarding/setup flow (provision a tenant, connect a provider, billing), use a Typeform-style sequential wizard, not a post-action dashboard with a wall of buttons. See "Typeform-style sequential onboarding" below.

Implementation Checklist

Composer

Use textarea, not single-line input, for normal chat composition.
Decide Enter behavior explicitly:
Multiline-first: Enter inserts newline; send button sends.
Desktop power-user mode, if desired: Cmd/Ctrl+Enter sends.
Auto-grow to a small max height, then scroll internally.
Preserve multiline rendering in chat bubbles (white-space: pre-wrap).
Keep the latest message visible above the composer after send/receive/record.
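
A minimal sketch of the Enter decision and the auto-grow rule from the list above. The helper names (`shouldSend`, `nextHeight`, `attachComposer`), the `mode` values, and the 160px cap are illustrative assumptions, not existing project code.

```javascript
// Hypothetical helpers; names, modes, and the 160px cap are assumptions.

// Decide whether a keydown should send the message.
// mode "multiline": Enter only inserts a newline; the send button sends.
// mode "power": Cmd/Ctrl+Enter sends; plain Enter still inserts a newline.
function shouldSend(event, mode = 'multiline') {
  if (event.key !== 'Enter' || event.isComposing) return false;
  if (mode === 'power') return Boolean(event.metaKey || event.ctrlKey);
  return false;
}

// Grow with content up to maxPx; beyond that the textarea scrolls internally.
function nextHeight(scrollHeight, maxPx = 160) {
  return Math.min(scrollHeight, maxPx);
}

// Browser wiring for a <textarea> element.
function attachComposer(textarea, { mode = 'multiline', onSend, maxPx = 160 } = {}) {
  textarea.addEventListener('keydown', (event) => {
    if (shouldSend(event, mode)) {
      event.preventDefault();
      onSend?.(textarea.value);
    }
  });
  textarea.addEventListener('input', () => {
    textarea.style.height = 'auto'; // reset so scrollHeight reflects the content
    textarea.style.height = `${nextHeight(textarea.scrollHeight, maxPx)}px`;
    textarea.style.overflowY = textarea.scrollHeight > maxPx ? 'auto' : 'hidden';
  });
}
```

Keeping the decision in a pure function makes the Enter behavior easy to unit test without a browser.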

Voice recording

Check both state and UI path:
Permission request and MediaRecorder start.
Stop/cancel transition.
Blob URL creation and cleanup where appropriate.
Message object insertion with an audio/voice kind.
Transcript renderer recognizes that kind.
Check the backend path too; after the local bubble appears, the voice note must visibly advance:
Show a processing state (sending…, pulse/waveform motion, disabled state if needed).
Convert the blob to the server's expected transport format (FormData preferred; base64 JSON acceptable for small notes with a strict body limit).
Authenticate the same way text chat does.
Transcribe audio server-side or via a trusted endpoint; never expose provider keys in the browser.
Feed the transcript into the same session/thread/chat function as typed messages.
Return/display the transcript and assistant reply, then persist the completed thread state.
On failure, update the audio bubble (not sent) and append a human-readable error bubble; do not leave the UI silently idle.
Provide a custom voice-note bubble when native controls clash with the design:
Play/pause button.
Duration or status label.
Waveform/progress affordance.
Clear focus/disabled states.
Verify icon geometry at actual rendered size. Many microphone SVGs look balanced at 32px but become scrunched at 18-22px. Prefer stroke-based, symmetric, rounded icons.
Include recording animation parity where specified: pulse, glowing ring, waveform motion, or timer.
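
The failure rule above (update the audio bubble and append an error bubble) might look like the sketch below. The message shape and the `applyVoiceFailure` name are assumptions made for illustration.

```javascript
// Hypothetical message shape: { id, kind, role, status, text }.

// Mark the audio bubble as not sent and append a human-readable error bubble.
// Returns a new array so React-style state updates stay immutable.
function applyVoiceFailure(
  messages,
  audioId,
  reason = 'We could not send your voice note. Please try again.'
) {
  const updated = messages.map((m) =>
    m.id === audioId ? { ...m, status: 'not sent' } : m
  );
  return [
    ...updated,
    { id: `${audioId}-error`, kind: 'error', role: 'assistant', text: reason },
  ];
}
```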

Threads and sidebar

Thread creation should not erase the previous visible conversation unless that is the explicit product model.
Persist active thread identity separately from the thread list.
For local-only prototypes, namespace storage keys by user/guide/tenant to avoid cross-account leakage.
Sidebar should show thread titles/previews and active state, not merely a New thread button.
Settings belongs below primary navigation/history in most chat apps; it should not displace active conversations.
On mobile, avoid using <details> as a large settings container inside an overflow-constrained sidebar. Use an explicit sidebar view/state machine instead (for example sidebarView: 'main' | 'settings'): tapping Settings replaces the sidebar content with a dedicated settings pane, shows a sticky header/back button, and gives the settings pane its own scroll area.
If moving from local to durable server persistence, preserve the same UI contract first, then swap the storage backend.
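
For a local-only prototype, the namespacing and separate active-thread rules could be sketched as follows. The key layout and function names are assumptions; `storage` is any object with `getItem`/`setItem`, such as `window.localStorage`.

```javascript
// Hypothetical key layout, namespaced by user and guide to avoid cross-account leakage.
const threadsKey = (userId, guideId) => `chat:${userId}:${guideId}:threads`;
const activeKey = (userId, guideId) => `chat:${userId}:${guideId}:activeThread`;

function loadThreads(storage, userId, guideId) {
  try {
    return JSON.parse(storage.getItem(threadsKey(userId, guideId)) ?? '[]');
  } catch {
    return []; // corrupted JSON should not break the sidebar
  }
}

// Creating a thread appends to the list; existing threads are never erased.
function createThread(storage, userId, guideId, thread) {
  const threads = [...loadThreads(storage, userId, guideId), thread];
  storage.setItem(threadsKey(userId, guideId), JSON.stringify(threads));
  storage.setItem(activeKey(userId, guideId), thread.id);
  return threads;
}

// Active thread identity lives under its own key, separate from the list.
function getActiveThreadId(storage, userId, guideId) {
  return storage.getItem(activeKey(userId, guideId));
}
```

Because every function takes `storage` as a parameter, the same UI contract can later be backed by a server-side store without changing callers.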

Mobile layout

Keep topbar controls short and truncatable.
Hide non-essential pills/labels under narrow breakpoints rather than squeezing primary actions.
Make sidebar/drawer targets thumb-friendly.
Ensure composer and send/record controls remain reachable above safe-area and keyboard offsets.
Use min-width: 0, text-overflow: ellipsis, and breakpoint-specific label hiding to avoid layout blowouts.

Copy and branding

Audit visible strings after code changes with grep/search and browser smoke tests.
Replace infrastructure words:
live tenant → Guide, Connected, or the guide name.
isolated container → omit or say private space only if privacy is the point.
Hermes instance → Astral Hermes or your guide.
session when user-facing → thread or conversation.
Keep empty states action-oriented and brand-aligned.
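
Part of the string audit can be automated. The sketch below is one possible shape, with a hypothetical `findLeakedTerms` helper and a starter word list drawn from the terms named in this document; treat the list as incomplete.

```javascript
// Starter list only; extend it per project. "session" is left out because it is
// only a problem when user-facing, which needs human judgment.
const LEAKY_TERMS = ['tenant', 'container', 'instance', 'worker', 'session secret', 'sandbox'];

// Return every leaked term found in each user-facing string (singular or plural).
function findLeakedTerms(strings, terms = LEAKY_TERMS) {
  const hits = [];
  for (const text of strings) {
    for (const term of terms) {
      const re = new RegExp(`\\b${term}s?\\b`, 'i');
      if (re.test(text)) hits.push({ text, term });
    }
  }
  return hits;
}
```

A test can feed this the app's visible strings and assert that the result is empty, so leaked copy fails the build rather than reaching users.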

Assistant message layout (ChatGPT-style asymmetric bubbles)

The default modern pattern Alex expects: user messages bubbled, assistant text responses unbubbled.

User text and audio messages: bubbled, right-aligned, max-width ~75%, tinted background. Audio MUST stay bubbled because the player UI needs visual containment.
Assistant text messages: full viewport width (align-self: stretch; width: 100%), transparent background, no border, no shadow, minimal padding. Markdown rendering with white-space: pre-wrap for paragraph breaks. Links auto-detected.
Assistant error or system messages: keep a subtle styled box. They need to read as alerts, not as normal replies.
Typing/placeholder state: render as full-width too (a small pulse or dots), not a bubble. The transition from placeholder to final reply should be seamless.

CSS shape (Tailwind-agnostic, CSS-vars-friendly):

.chat-bubble.user { /* keep existing bubble styles */ }
.chat-bubble.assistant:not(.audio):not(.error) {
  align-self: stretch;
  width: 100%;
  background: transparent;
  border: none;
  box-shadow: none;
  padding: 0.5rem 0;
  white-space: pre-wrap;
}
.chat-bubble.assistant.audio { /* keep bubble */ }
.chat-bubble.assistant.error { /* keep bubble */ }

Apply to ALL chat surfaces in the app (landing/demo chat, real tenant chat, embedded chat). If those surfaces do not already share message-rendering code, factor out a common renderer.

Streaming feedback (SSE) for real-time tool calls and multi-phase actions

When the backend takes more than ~1s to respond and especially when it makes intermediate tool calls (web search, chart calc, RAG, etc.), single-shot request/response makes the chat feel dead. Stream events via SSE.

Server pattern (keep JSON contract for backward compat):

The same endpoint (POST /api/chat/message, POST /api/chat/voice) detects Accept: text/event-stream and branches.
SSE branch spawns the agent process with line-buffered stdout (don't await full output; pipe and parse).
Drop "quiet mode" flags — you need the verbose tool-preview lines.
Parse each line into structured events. Useful event types:
transcribing / transcribed (voice only)
start (tenant/agent acknowledged the prompt)
tool_call (with tool name + brief args)
progress (free-form "Searching the web…", "Calling chart-api…")
text (the final assistant text — buffered emit is fine if the agent doesn't token-stream)
done (with sessionId, any final metadata)
error
Emit as data: <json>\n\n lines.
Run all event bodies through the project's redact() helper before emitting.
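
The branching and framing steps above can be sketched with two small helpers. The names are hypothetical, and the `redact` parameter stands in for the project's own redact() helper, whose real signature is not shown in this document.

```javascript
// True when the client asked for a stream; otherwise keep the existing JSON contract.
function wantsEventStream(acceptHeader = '') {
  return acceptHeader.toLowerCase().includes('text/event-stream');
}

// Apply redact() to every string field so the emitted JSON stays well-formed.
function redactEvent(event, redact) {
  const out = {};
  for (const [key, value] of Object.entries(event)) {
    out[key] = typeof value === 'string' ? redact(value) : value;
  }
  return out;
}

// Serialize one structured event as an SSE frame: "data: <json>\n\n".
function formatSSE(event, redact = (s) => s) {
  return `data: ${JSON.stringify(redactEvent(event, redact))}\n\n`;
}
```

In a Node handler, each parsed line would then be written with something like `res.write(formatSSE(event, redact))` after the response's Content-Type is set to text/event-stream.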

Client pattern (vanilla fetch streaming, no new deps):

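async function streamChatSSE(url, body, { onEvent, onError } = {}) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Accept': 'text/event-stream' },
    body: JSON.stringify(body),
  });
  // Surface HTTP failures (401, 500, ...) instead of parsing an error page as a stream.
  if (!res.ok || !res.body) {
    onError?.(new Error(`Chat stream failed with status ${res.status}`));
    return;
  }
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buf = '';
  // Parse one SSE event block and hand its JSON payload to onEvent.
  const handle = (evt) => {
    const dataLine = evt.split('\n').find(l => l.startsWith('data: '));
    if (!dataLine) return;
    try { onEvent?.(JSON.parse(dataLine.slice(6))); } catch (e) { onError?.(e); }
  };
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    const events = buf.split('\n\n');
    buf = events.pop(); // keep the trailing partial event for the next chunk
    events.forEach(handle);
  }
  // Flush a final event that arrived without a closing blank line.
  buf += decoder.decode();
  if (buf.trim()) handle(buf);
}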

Append a single assistant placeholder message immediately on send, then live-update its text, progress[], and pending flags from incoming events. Render progress[] as small italic gray lines beneath the response.
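
One way to fold incoming events into the placeholder is a small reducer, sketched below. The message shape and the event field names (`tool`, `text`) are assumptions based on the event types listed earlier.

```javascript
// Hypothetical message shape: { role, text, progress: string[], pending: boolean }.
function createPlaceholder() {
  return { role: 'assistant', text: '', progress: [], pending: true };
}

// Fold one stream event into the placeholder; returns a new object each time.
function applyStreamEvent(message, event) {
  switch (event.type) {
    case 'tool_call':
      return { ...message, progress: [...message.progress, `Using ${event.tool}…`] };
    case 'progress':
      return { ...message, progress: [...message.progress, event.text] };
    case 'text':
      // Appending works for both a single buffered emit and token streaming.
      return { ...message, text: message.text + event.text };
    case 'done':
      return { ...message, pending: false };
    case 'error':
      return { ...message, pending: false, error: event.text ?? 'Something went wrong.' };
    default:
      return message; // ignore unknown event types
  }
}
```

With React state this could be wired as `onEvent: (e) => setMessage((m) => applyStreamEvent(m, e))`.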

Voice lifecycle states drive off the same SSE stream:
uploading… (set client-side BEFORE the fetch starts).
transcribing… (on the transcribing event).
sent ✓ (on the transcribed event; the prompt is now flowing to the agent).
received ✓ (on the done event).
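
The voice lifecycle can be expressed as a lookup from event type to status label. This is a sketch with hypothetical names; the `not sent` label for errors follows the failure rule in the voice recording checklist.

```javascript
// Map stream event types to the status label shown on the audio bubble.
const VOICE_STATUS_BY_EVENT = new Map([
  ['transcribing', 'transcribing…'],
  ['transcribed', 'sent ✓'],
  ['done', 'received ✓'],
  ['error', 'not sent'],
]);

// "uploading…" is set client-side before the fetch starts, so it is the initial value.
const INITIAL_VOICE_STATUS = 'uploading…';

// Events that do not affect the voice bubble (tool_call, progress, ...) keep the current label.
function nextVoiceStatus(current, eventType) {
  return VOICE_STATUS_BY_EVENT.get(eventType) ?? current;
}
```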

Parser specifics (confirmed for current Hermes verbose CLI): the live hermes chat (without -Q) emits progress lines in the form ┊ <emoji> <text>… — for example ┊ 💻 preparing terminal…, ┊ ⚙ calling web_search…, ┊ 🔍 searching for transits. The leading whitespace can vary. Match the ┊ (U+250A) prefix, optionally consume one emoji code point, and capture the trailing text (with optional terminating …):

const re = /^\s*[┊│|]\s*(?:[\p{Emoji_Presentation}\p{Extended_Pictographic}\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}]\uFE0F?)?\s*(.+?)…?\s*$/u;

When the captured text starts with preparing|calling|using|running|invoking|executing <name>, surface it as a structured tool_call event with tool: <name>; otherwise emit it as progress. The older patterns (→ tool:, [tool:name], leading Searching|Calling|Fetching|...) still exist as legacy fallbacks but ┊-prefixed lines are the dominant real-world format. Verify against actual output before assuming any other pattern — guessing without grounding wasted an iteration in past sessions.
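
A sketch of the classification step, using a variant of the pattern above that also tolerates leading whitespace and an emoji variation selector. The `parseProgressLine` name and the returned event shape are assumptions; verify the patterns against real CLI output before relying on them.

```javascript
// Optional leading whitespace, the ┊ prefix (or │ / |), one optional emoji, then the text.
const PROGRESS_RE = /^\s*[┊│|]\s*(?:[\p{Emoji_Presentation}\p{Extended_Pictographic}\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}]\uFE0F?)?\s*(.+?)…?\s*$/u;
// Verbs that mark a tool invocation; the next token is taken as the tool name.
const TOOL_RE = /^(?:preparing|calling|using|running|invoking|executing)\s+(\S+)/i;

// Turn one stdout line into a structured event, or null if it is not a progress line.
function parseProgressLine(line) {
  const match = PROGRESS_RE.exec(line);
  if (!match) return null;
  const text = match[1];
  const tool = TOOL_RE.exec(text);
  return tool
    ? { type: 'tool_call', tool: tool[1], text }
    : { type: 'progress', text };
}
```

Lines that return null are candidates for the final reply body or for the legacy fallback patterns.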

Strip the Hermes TUI panel from the final reply: when you drop -Q to enable streaming, you also start receiving the full verbose framing: a ╭─ ⚕ Hermes ──╮ panel box around the reply, an "Initializing agent..." banner above it, an echo of the prompt (visible as Query: <prompt> before the panel), and a footer with Resume this session with: hermes --resume ..., Session: ..., Duration: ..., Messages: ..., plus a trailing Listening for chart and memory context line. The final assistant text the user sees must be ONLY the body inside the panel. Algorithm:

Find first ╭─ (or ┌─) and matching ╰─ (or └─).
Take only the substring between them.
Split by lines, drop the top border line (contains ╭┌┐╮), strip leading/trailing │| and whitespace from every other line.
From the remaining body, also drop lines matching: ^session_id:, ^↻ Resumed session, ^Initializing agent, ^Resume this session with:, ^hermes --resume, ^Session:\s+\d, ^Duration:, ^Messages:\s+\d, ^Listening for chart and memory context, ^─{4,}$.
Collapse \n{3,} to \n\n, trim.
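
The five steps could be implemented roughly as below. The `extractPanelBody` name is hypothetical, and cleaning the whole output when no complete panel is found is an assumption, not something the steps specify.

```javascript
// Banner and footer lines that must never reach the user, per step 4.
const DROP_PATTERNS = [
  /^session_id:/, /^↻ Resumed session/, /^Initializing agent/,
  /^Resume this session with:/, /^hermes --resume/, /^Session:\s+\d/,
  /^Duration:/, /^Messages:\s+\d/, /^Listening for chart and memory context/,
  /^─{4,}$/,
];

function extractPanelBody(raw) {
  const lines = raw.split('\n');
  // Steps 1-2: locate the first top border and the bottom border after it.
  const top = lines.findIndex((l) => /[╭┌]─/.test(l));
  const bottom = lines.findIndex((l, i) => i > top && /[╰└]─/.test(l));
  // Assumption: with no complete panel, clean the whole output instead of failing.
  const body = top !== -1 && bottom !== -1 ? lines.slice(top + 1, bottom) : lines;
  return body
    // Step 3: strip side borders and surrounding whitespace from each line.
    .map((l) => l.replace(/^\s*[│|]?\s*/, '').replace(/\s*[│|]?\s*$/, ''))
    // Step 4: drop banner and footer lines.
    .filter((l) => !DROP_PATTERNS.some((re) => re.test(l)))
    .join('\n')
    // Step 5: collapse runs of blank lines, then trim.
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}
```

Note that step 3 also strips pipes from Markdown table rows inside the reply, so replies containing tables may need a narrower border match.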

This is non-optional once -Q is dropped; otherwise users see panel borders rendered as ╭───╮ inline in their chat replies. Capture the full original output in a debug field for admin-only inspection.

Filler caption layer (for blank or stalled loading states)

When a spinner has nothing else to say — initial connection, provisioning, queued work, a stream that just went quiet — fill the gap with domain-flavored captions. Never let the user stare at a bare spinner or Loading… for more than ~2 seconds.

Strict hierarchy (Alex calls generic filler "dumb and cheesy", so get this right):

Real progress event → render it verbatim. Caption layer is hidden.
No event yet, or last event was >~4s ago → show a caption. Rotate every ~3.2s.
New real event arrives → caption hides immediately.

Source the captions from real domain data, not invented flavor text. For Astral Hermes that means real current transits (Moon degree+sign, planetary day/hour, decan/face ruler, applying aspects). For other apps, derive from something true about the user's current context. Bad: "Asking Mercury for safe passage…", "Checking the decans for a good omen…", "Aligning fixed stars over your container…" — these are made-up flavor text and the user will (correctly) call them cheesy. Good: "Moon 4° Virgo — Mercury's domicile, Sun-faced decan.", "Sun and Mercury both in Gemini today.", "Day of Saturn. Moon waxing through Virgo." — factual, derived from real ephemeris data, written in the user's astrology register (traditional rulerships, decans, no woo).

Implementation shape (React, ~30 lines):

import { useEffect, useState } from 'react';

function FillerCaption({ lastProgressAt, endpoint = '/api/daily-captions', stallMs = 4000 }) {
  const [pool, setPool] = useState([]);
  const [caption, setCaption] = useState('');
  const [stalled, setStalled] = useState(lastProgressAt == null);

  useEffect(() => {
    let cancelled = false;
    fetch(endpoint).then(r => r.ok ? r.json() : null).then(data => {
      if (cancelled || !data?.captions?.length) return;
      setPool(data.captions);
      setCaption(data.captions[Math.floor(Math.random() * data.captions.length)]);
    }).catch(() => {});
    return () => { cancelled = true; };
  }, [endpoint]);

  // Track stalled state when wrapping live-progress UIs
  useEffect(() => {
    if (lastProgressAt == null) { setStalled(true); return undefined; }
    setStalled(false);
    const t = window.setTimeout(() => setStalled(true), stallMs);
    return () => window.clearTimeout(t);
  }, [lastProgressAt, stallMs]);

  // Rotate while visible
  useEffect(() => {
    if (!stalled || pool.length < 2) return undefined;
    const t = window.setInterval(() => {
      setCaption(pool[Math.floor(Math.random() * pool.length)]);
    }, 3200);
    return () => window.clearInterval(t);
  }, [stalled, pool]);

  if (!stalled || !caption) return null;
  return <p className="filler-caption" aria-live="polite">{caption}</p>;
}

Pass lastProgressAt={undefined} for pure-spinner cases (always show). Pass lastProgressAt={Date.now()} on every real progress event in streaming UIs (hides until next stall). Render nothing while loading the caption pool — better than a flicker of fallback text.

Caption pool source: a backend endpoint fed by a daily/hourly cron that computes the real domain state and emits a small array. For Astral Hermes: /api/daily-captions served from JSON written by scripts/daily-transits.mjs running on a cron. See astral-transit-calendar skill for the transit pipeline and caption-generation rules.

Typeform-style sequential onboarding (kill the button-salad success screen)

When the chat product includes an onboarding/setup flow (create a guide, connect a provider, billing, etc.), make it Typeform-style: one screen, one decision, one primary CTA. The wizard NEVER ends in a dashboard with parallel "Open chat / Open account / Start billing / Connect provider" buttons — that's the anti-pattern.

Required shape:

Input steps — name, provider pick, credentials, etc. Existing <StepShell> / progress bar pattern works.
Provisioning step — spinner only, no buttons, calls the create API once, auto-advances on success. On failure: friendly retry message + admin-gated <details> for the raw stderr. Never render raw stdout to users.
Provider auth step (conditional) — only if the chosen provider needs post-create OAuth. Single CTA flow per provider (device code button, then "I approved it"; or "Open authorize →" + code paste + "Complete"). Auto-advance on success.
Billing step (conditional) — single "Start trial →" button → redirect to payment provider. Success URL returns to /onboarding?...&checkout=ok so the wizard reads the query string on mount and jumps straight to the ready step. Don't exempt admins from sandbox billing — they need to validate the flow end-to-end before live keys flip on (request from Alex).
Ready step — exactly ONE big CTA ("Open chat →"), 2-second auto-redirect to the actual product, ONE small text link to the dashboard as escape hatch.
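
Reading the checkout result on mount can be a one-line check. In this sketch only the `checkout=ok` parameter comes from the text above; the function name and the step ids `ready` and `name` are hypothetical.

```javascript
// Pick the wizard's starting step from the page's query string.
function initialStepFromQuery(search, defaultStep = 'name') {
  const params = new URLSearchParams(search);
  return params.get('checkout') === 'ok' ? 'ready' : defaultStep;
}
```

In the browser this would be called once with `window.location.search` when the wizard mounts.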

Wizard navigation:
Hide Back/Next on terminal steps (provisioning, providerAuth, billing, ready); they advance themselves.
Progress bar recomputes total based on conditionals (admin skips invite step, API-key providers skip credential step, OAuth providers add provider-auth step, etc.).
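
The conditional step list and the self-advancing rule can be derived from one function, as in this sketch. The step ids and option names (`needsCredentials`, `needsOAuth`, `needsBilling`) are hypothetical; map them to however the real wizard models providers.

```javascript
// Steps that advance themselves, so Back/Next are hidden on them.
const SELF_ADVANCING = new Set(['provisioning', 'providerAuth', 'billing', 'ready']);

// Build the ordered step list from the conditionals.
function buildSteps({
  isAdmin = false,
  needsCredentials = true,
  needsOAuth = false,
  needsBilling = true,
} = {}) {
  const steps = ['name'];
  if (!isAdmin) steps.push('invite'); // admin skips the invite step
  steps.push('provider');
  if (needsCredentials) steps.push('credentials');
  steps.push('provisioning');
  if (needsOAuth) steps.push('providerAuth'); // only for post-create OAuth
  if (needsBilling) steps.push('billing');
  steps.push('ready');
  return steps;
}

// Progress bar values plus whether to render Back/Next for the current step.
function progressFor(steps, current) {
  return {
    index: steps.indexOf(current) + 1,
    total: steps.length,
    showNav: !SELF_ADVANCING.has(current),
  };
}
```

Because the total comes from the built list, the progress bar stays correct whenever a conditional adds or removes a step.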

Don't put fiddly steps in onboarding. Telegram bot setup, Slack integration, custom domain — anything that requires the user to open another app, copy something, paste something — does NOT belong in the first-run flow. Push it to a settings panel and surface a dismissable nudge banner on /chat and /account for tenants without it configured:

{tenant.metadata?.telegramConfigured === false && !dismissed && (
  <TelegramNudgeBanner tenant={tenant} onDismiss={...} />
)}

Banner copy is a single line + arrow to deep-link into the relevant settings tab. Dismissal persists per-tenant in localStorage (<feature>NudgeDismissed:<tenantId>).
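
Per-tenant dismissal could be persisted as sketched below. The key format follows the text above; the helper names, the stored value `'1'`, and the `tenant.id` field are assumptions.

```javascript
// Key format: <feature>NudgeDismissed:<tenantId>
const nudgeKey = (feature, tenantId) => `${feature}NudgeDismissed:${tenantId}`;

function isNudgeDismissed(storage, feature, tenantId) {
  return storage.getItem(nudgeKey(feature, tenantId)) === '1';
}

function dismissNudge(storage, feature, tenantId) {
  storage.setItem(nudgeKey(feature, tenantId), '1');
}

// Show the banner only when the feature is explicitly unconfigured and not dismissed.
function shouldShowNudge(storage, tenant, feature = 'telegram') {
  return tenant.metadata?.[`${feature}Configured`] === false
    && !isNudgeDismissed(storage, feature, tenant.id);
}
```

In the browser, `storage` would be `window.localStorage`, and the banner's dismiss handler would call `dismissNudge` before hiding itself.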

Testing and Verification

Run both functional and visual checks:

Build: npm run build or project equivalent.
Unit/readiness/security tests: run the app's established scripts, not only the touched test.
Route smoke: verify /, /chat, and /api/health or project equivalents.
Mobile browser smoke: narrow viewport, open drawer/sidebar, create multiple threads, switch between them.
Voice smoke: start recording, stop, confirm message appears, play/pause works, processing state appears, transcript/reply arrives, and no console errors.
API smoke for voice routes: verify auth failures return controlled JSON/401, oversized/empty audio is rejected, and tests mock provider transcription without leaking API keys.
Copy smoke: search and inspect for leaked technical words such as tenant, container, instance, worker, debug, secret.
Deployment verification: restart the correct process and test the public URL after deploy.
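
The voice-route API smoke checks imply a small set of guard conditions on the server. The sketch below shows one possible shape; the function name, the 10 MB limit, the response bodies, and the choice of 400 and 413 status codes are assumptions (only the 401 for auth failures comes from the list above).

```javascript
// Hypothetical upload limit; use the project's real body limit.
const MAX_VOICE_BYTES = 10 * 1024 * 1024;

// Return a controlled JSON response for each rejection case, in user-facing language.
function checkVoiceUpload({ authenticated, byteLength }, maxBytes = MAX_VOICE_BYTES) {
  if (!authenticated) {
    return { status: 401, body: { error: 'Please sign in again.' } };
  }
  if (!byteLength) {
    return { status: 400, body: { error: 'That voice note was empty.' } };
  }
  if (byteLength > maxBytes) {
    return { status: 413, body: { error: 'That voice note is too long.' } };
  }
  return { status: 200, body: { ok: true } };
}
```

A smoke test can call the route once per case and assert both the status code and that the body contains no infrastructure wording.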

Common Pitfalls

Fixing the handler but not the renderer. The record button can successfully create audio data while the chat appears broken because the message list ignores audio messages.
Fixing the renderer but not the backend. A polished voice bubble with waveform/playback can still be a no-op if it never transcribes the blob or calls the real chat API. When the user says “I sent an audio note but nothing happened,” inspect both the UI message insertion and the network/backend handoff before polishing more UI.
Persisting threads without listing them. LocalStorage/database saves are not enough. Users need visible thread navigation and active-state feedback.
Letting settings dominate the sidebar. In chat products, thread history and knowledge/context usually deserve top priority; settings should sit at the bottom unless the app is primarily an admin console.
Assuming native mobile keyboards can show both Send and Return. Browsers/platforms differ. The reliable pattern is Return for newline and an on-screen send button for send.
Shipping technical copy after backend work. Implementation labels often leak into placeholders, empty states, badges, and tests. Update tests to assert polished copy so the regression does not return.
Desktop-only validation. Chat layout bugs often appear only at <380px widths: squeezed pills, hidden send buttons, broken drawers, or composer overlap.
Using <details> for mobile settings inside a drawer. Native details can work for tiny groups, but it is easy for custom CSS/overflow to reserve hidden space, trap lower content below the viewport, or make settings feel like a drawer within a drawer. For substantial settings, use a dedicated nested sidebar view with a back button and independent scroll area. Verify on an iPhone-sized viewport that the main thread list is replaced while settings is open, the back button is visible, and settings content scrolls without moving the chat/composer.
Mockup drift. If the user says the live UI should match the mockup, treat animation, icon shape, spacing, and placement as acceptance criteria, not decoration.
Bubbling the assistant. Defaulting every message to the same bubble component makes the chat feel like SMS, not a modern assistant. User messages bubbled, assistant text full-width unbubbled, audio/error messages keep bubbles. See "Assistant message layout" above.
Silent async UI. A voice send button that switches to "sending…" and stays on "sending…" until the assistant replies 8 seconds later is broken UX even if the backend works. Every async action needs visible state changes within ~100ms and a sensible state machine per phase (uploading → transcribing → sent → received). Implement via SSE if the backend has multiple phases; don't fake it with setTimeout.
Post-action button salads. A success screen with 5+ parallel CTAs ("Open chat / Continue to billing / Connect provider / Open account / Provision another") is a wizard that failed to finish. Convert each action into its own sequential step with one CTA each. Auto-redirect to the actual product at the end.
Putting fiddly setup in onboarding. Telegram bot, Slack webhook, custom domain — anything requiring "open BotFather, copy this, paste here" — does NOT belong in first-run flow. Push to settings + nudge banner. Onboarding should only ask for things the user can answer in their head or paste from one tab.
Raw stderr in user-facing UI. Provisioning failures, deploy logs, astral-tenant-foo Up 3 seconds — never render these to end users. Wrap in admin-gated <details> blocks. Show users a friendly retry message.
Cheesy invented filler text under spinners. When a spinner has nothing else to say, the temptation is to invent flavor text ("Asking Mercury for safe passage…", "Checking the decans for a good omen…", "Aligning fixed stars over your container…"). Don't. Alex calls this "dumb and cheesy" and he's right — it reads as fake. Derive captions from real domain data via a cron-fed endpoint (current transits, planetary hour, decan ruler, etc.) and write them in the user's actual domain register, not as cute filler. See "Filler caption layer" above.
Filler captions competing with real progress. If real tool-call events / streaming tokens / phase labels are flowing, captions must be hidden. They are a fallback for blank or stalled states only, and they hide the moment a concrete event resumes. A caption rotating underneath a live "Searching the web…" line is double-talk.

Verification Checklist

[ ] Composer newline/send behavior matches the requested product behavior.
[ ] Multiline messages render correctly in bubbles.
[ ] Recording can start and stop; the recorded voice note appears in the transcript.
[ ] Voice-note playback UI works and matches the product style.
[ ] Mic icon is visually balanced at final rendered size.
[ ] Multiple threads can be created, listed, selected, and restored.
[ ] Settings/navigation hierarchy matches the mockup or product expectation.
[ ] Mobile topbar does not crowd primary actions at narrow widths.
[ ] User-facing copy has no infrastructure language.
[ ] Build/tests/smoke pass locally and against the deployed public URL when deployment is part of the task.
[ ] Assistant text messages render full-width without bubble; user + audio + error messages keep bubbles.
[ ] Voice send shows distinct lifecycle states (uploading → transcribing → sent → received), not a single static label.
[ ] Long-running assistant replies stream progress (tool calls / "searching…") rather than showing a frozen "thinking…" spinner for the whole duration.
[ ] Onboarding wizard has exactly one primary CTA per screen; no post-provision dashboard with parallel buttons.
[ ] First-run flow does not require the user to open another app (Telegram, Slack, BotFather, etc.); those configurations live in settings + nudge banners.
[ ] Raw stderr/stdout from provisioning is never rendered to non-admin users.
[ ] Spinner/loading states never sit blank or show bare Loading… for more than ~2s; a domain-flavored caption layer fills the gap.
[ ] Filler captions are sourced from real domain data (cron-fed endpoint), not invented flavor text.
[ ] Filler captions hide automatically when real progress events resume.