
fix(conductor): show native-swarm worker progress and fix dispatch bugs #451

Open
waylonkenning wants to merge 19 commits into outsourc-e:main from waylonkenning:fix-conductor-session-key-prefix


@waylonkenning

Summary

Fixes the Conductor native-swarm mode so the UI shows worker progress during mission execution instead of a permanent "Spawning workers..." state. Also fixes several dispatch bugs that prevented workers from starting.

Changes

Conductor dispatch fixes (swarm-dispatch.ts)

  • Oneshot fallback: The no-wrapper path passed -q without a query argument, causing hermes chat to fail immediately with "error: argument -q/--query: expected one argument". Fixed by always including the prompt after -q.
  • Tmux startup sentinel: The includes('[Hermes worker exited with status') check matched the echoed shell command itself (the printf format string contains that text), falsely rejecting healthy hermes startups. Fixed by using a regex anchored to the start of a line.
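A minimal sketch of why the anchored regex avoids the false positive. It assumes the wrapper prints the sentinel as `[Hermes worker exited with status <code>]` at the start of its own line; `workerExitStatus`, `naiveExited` and the sample pane text are hypothetical, not the actual swarm-dispatch.ts code.

```typescript
// Assumed sentinel format (not confirmed by the PR):
//   [Hermes worker exited with status <code>]
// The `m` flag makes `^` match at the start of every line, so text that merely
// contains the sentinel mid-line (the echoed printf command) is ignored.
const EXIT_SENTINEL = /^\[Hermes worker exited with status (\d+)\]/m;

// Returns the exit status if the sentinel line is present, otherwise null.
function workerExitStatus(paneText: string): number | null {
  const match = EXIT_SENTINEL.exec(paneText);
  return match ? Number(match[1]) : null;
}

// The old approach: a plain substring test, which also matches the echoed command.
function naiveExited(paneText: string): boolean {
  return paneText.includes("[Hermes worker exited with status");
}

// Hypothetical pane capture right after startup: tmux echoes the command line,
// whose printf format string contains the sentinel text after a literal "\n".
const startupPane = [
  "$ hermes chat -q 'do the task'; printf '\\n[Hermes worker exited with status %s]\\n' \"$?\"",
  "Hermes is starting...",
].join("\n");

console.log(naiveExited(startupPane)); // true (false positive)
console.log(workerExitStatus(startupPane)); // null (worker still running)
console.log(workerExitStatus(startupPane + "\n[Hermes worker exited with status 0]")); // 0
```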

Checkpoint parsing fix (swarm-checkpoints.ts)

  • The tmux worker writes checkpoints as **STATE:** DONE (bold markdown), but parseSwarmCheckpoint expected the plain STATE: format. Fixed by stripping ** before parsing.
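A hedged sketch of the normalisation step: strip markdown bold markers, then match the STATE label. `parseCheckpointState` is a hypothetical stand-in, not parseSwarmCheckpoint itself; the regex and the return shape are assumptions.

```typescript
// Normalise "**STATE:** DONE" to "STATE: DONE", then capture the state word
// from a STATE: label at the start of any line.
function parseCheckpointState(checkpoint: string): string | null {
  const plain = checkpoint.replace(/\*\*/g, ""); // drop every "**" marker
  const match = /^\s*STATE:\s*([A-Z_]+)/m.exec(plain);
  return match ? match[1] : null;
}

console.log(parseCheckpointState("**STATE:** DONE")); // DONE
console.log(parseCheckpointState("STATE: DONE")); // DONE
console.log(parseCheckpointState("Progress notes only")); // null
```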

Conductor UI progress feedback (conductor-spawn.ts, use-conductor-gateway.ts)

  • GET handler: Now polls the worker's SQLite database for checkpoints when the mission is active, then updates the mission state via recordMissionCheckpoint. Re-reads the mission from the store after sync so the response reflects the updated state.
  • UI hook: Builds virtual ConductorWorker cards from native-swarm mission assignments when no Gateway sessions match. Shows worker status (Running / Complete) instead of permanent "Spawning workers...".
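The fallback in the UI hook can be sketched as a pure mapping from mission assignments to card models. The field names (`workerId`, `task`, `checkpointState`) and the `VirtualWorkerCard` shape are hypothetical; the PR only states that cards are built from assignments when no Gateway session matches and show Running or Complete.

```typescript
// Hypothetical assignment shape; the real mission record may differ.
interface MissionAssignment {
  workerId: string;
  task: string;
  checkpointState?: string; // e.g. "DONE" once the worker has checkpointed
}

// Hypothetical card model standing in for ConductorWorker.
interface VirtualWorkerCard {
  id: string;
  label: string;
  status: "Running" | "Complete";
}

// Only synthesise cards when no Gateway session matched; real session-backed
// cards take precedence when they exist.
function buildVirtualWorkers(
  assignments: MissionAssignment[],
  matchedSessionCount: number,
): VirtualWorkerCard[] {
  if (matchedSessionCount > 0) return [];
  return assignments.map((a): VirtualWorkerCard => ({
    id: a.workerId,
    label: a.task,
    status: a.checkpointState === "DONE" ? "Complete" : "Running",
  }));
}

const cards = buildVirtualWorkers(
  [
    { workerId: "w1", task: "Write tests", checkpointState: "DONE" },
    { workerId: "w2", task: "Fix dispatch" },
  ],
  0,
);
console.log(cards.map((c) => `${c.id}: ${c.status}`).join(", ")); // w1: Complete, w2: Running
```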

Tests

  • Adds a Playwright e2e test that launches a native-swarm mission via the API and polls until it transitions to completed with DONE checkpoints.
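The poll-until-completed pattern in the e2e test can be sketched as a generic helper. `pollUntil`, its defaults and the fake status source are illustrative assumptions, not the test's actual code; Playwright's own `expect.poll` is an alternative for the same pattern.

```typescript
// Call fetchStatus until done() accepts the value or the timeout expires.
// The default interval and timeout are illustrative only.
async function pollUntil<T>(
  fetchStatus: () => Promise<T>,
  done: (value: T) => boolean,
  { intervalMs = 2000, timeoutMs = 120000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await fetchStatus();
    if (done(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms; last value: ${String(value)}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage with a fake status source that reports "completed" on the third poll.
const statuses = ["running", "running", "completed"];
let calls = 0;
pollUntil(
  async () => statuses[Math.min(calls++, statuses.length - 1)],
  (s) => s === "completed",
  { intervalMs: 10, timeoutMs: 1000 },
).then((s) => console.log(s)); // completed
```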

Closes #448

Waylon Kenning added 19 commits May 13, 2026 07:37
sessionKeyPrefix was hardcoded to null in conductor-spawn.ts, breaking
async session resolution when the dashboard backend returns a prefix.
Now mirrors the sessionKey pattern and passes through the value from
the spawn result.

Co-authored-by: Hermes Agent
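A minimal sketch of passing both keys through the same way, assuming a spawn result with optional `sessionKey` and `sessionKeyPrefix` fields; `SpawnResult`, `toResponseKeys` and the example prefix value are hypothetical.

```typescript
// Hypothetical shapes: field names follow the commit message, types are assumed.
interface SpawnResult {
  sessionKey?: string | null;
  sessionKeyPrefix?: string | null;
}

interface SpawnResponseKeys {
  sessionKey: string | null;
  sessionKeyPrefix: string | null;
}

// Both keys are passed through identically, defaulting to null when absent,
// instead of hardcoding sessionKeyPrefix to null.
function toResponseKeys(result: SpawnResult): SpawnResponseKeys {
  return {
    sessionKey: result.sessionKey ?? null,
    sessionKeyPrefix: result.sessionKeyPrefix ?? null,
  };
}

// A prefix without a resolved session key survives the mapping.
console.log(toResponseKeys({ sessionKeyPrefix: "example-prefix" }));
```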
…y-center, responsive OfficeView

Three rendering bugs fixed in the Conductor component:

1. Missing overflow-y-auto on active, preview, and complete phase
   containers — content was silently clipped instead of scrolling
   on mobile viewports.

2. justify-center on the active phase's main flex-column container
   fought with the natural top-to-bottom flow when content overflowed.

3. OfficeView fixed at h-[360px] on mobile — changed to
   max-h-[clamp(200px,40vh,360px)] so it adapts to viewport height.

Fixes outsourc-e#445
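A worked example of what max-h-[clamp(200px,40vh,360px)] resolves to: the Tailwind arbitrary value becomes max-height: clamp(200px, 40vh, 360px), i.e. 40% of the viewport height bounded to the range 200 to 360 px. The function name is illustrative.

```typescript
// CSS clamp(MIN, VAL, MAX) = max(MIN, min(VAL, MAX)); here VAL = 40vh.
function officeViewMaxHeight(viewportHeightPx: number): number {
  const preferred = (viewportHeightPx * 40) / 100; // 40vh in px
  return Math.max(200, Math.min(preferred, 360));
}

console.log(officeViewMaxHeight(400)); // 200 (40vh = 160, raised to the minimum)
console.log(officeViewMaxHeight(600)); // 240
console.log(officeViewMaxHeight(1000)); // 360 (40vh = 400, capped at the maximum)
```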
…ile layout

- Replaced absolute-positioned action buttons with flex layout using
  flex-1 spacers — prevents the Conductor badge from overlapping with
  the action buttons on narrow screens.
- Added truncate to badge text and shrink-0 to the green dot for
  graceful overflow.
- Hid token count column on mobile in recent missions rows to give
  the mission title more room (reduces truncation).
- Reduced gap and fixed-width column sizes slightly for tighter mobile
  layout.
…h to dashboard logic with null session key

The ConductorSpawnResponse type declared native-swarm mode but the
sendMission handler had no case for it. When the server returned
mode: 'native-swarm', the hook fell through to the dashboard fallback
with null sessionKey/sessionKeyPrefix/missionId/jobId, throwing a
generic error.

Now native-swarm is handled with its own branch that:
- Sets missionId and jobId for mission status polling
- Uses missionId as the orchestrator session key proxy
- Sets descriptive plan text about swarm workers
- Immediately transitions to running phase

Also added the assignments field to ConductorSpawnResponse type.
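The new branch can be sketched as a switch over a discriminated union. Only `mode`, `missionId`, `jobId` and `assignments` come from the commit; the simplified response union, `MissionStart`, the `pending` phase and the plan text are hypothetical.

```typescript
// Hypothetical, simplified response union; the real type has more fields.
type ConductorSpawnResponse =
  | { mode: "dashboard"; sessionKey: string | null }
  | { mode: "native-swarm"; missionId: string; jobId: string; assignments: string[] };

interface MissionStart {
  phase: "running" | "pending";
  orchestratorKey: string | null;
  missionId: string | null;
  jobId: string | null;
  planText: string;
}

function startMission(res: ConductorSpawnResponse): MissionStart {
  switch (res.mode) {
    case "native-swarm":
      // missionId doubles as the orchestrator session key proxy, and the
      // mission goes straight to the running phase.
      return {
        phase: "running",
        orchestratorKey: res.missionId,
        missionId: res.missionId,
        jobId: res.jobId,
        planText: `Dispatching ${res.assignments.length} swarm worker(s)`,
      };
    case "dashboard":
      // Simplified dashboard path: wait until a session key is known.
      return {
        phase: res.sessionKey ? "running" : "pending",
        orchestratorKey: res.sessionKey,
        missionId: null,
        jobId: null,
        planText: "Waiting for orchestrator session",
      };
  }
}

console.log(startMission({ mode: "native-swarm", missionId: "m1", jobId: "j1", assignments: ["a", "b"] }).planText);
// Dispatching 2 swarm worker(s)
```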
… missions could get stuck

The missionStatusQuery used default react-query retry (3 attempts),
which could exhaust before the SwarmMission store had created the
mission record. For native-swarm missions, the dispatch is async
(void-promise), so the GET /api/conductor-spawn?missionId=... call
could arrive before the swarm mission was stored, returning 404.

With retry exhausted, the query stopped polling and the mission
remained stuck in the 'running' phase forever.

Changed to retry: Infinity with exponential backoff
(2s, 4s, 8s, then capped at 10s) so the query keeps polling until
the swarm mission is available.
…— was never invoked, leaving new profiles stuck at setup required
…ent clip

The home page Conductor view used md:justify-center on the main container,
which vertically centers flex content in the available space. When content
height exceeds viewport height, flex centering pushes the top off-screen
(y=-51px), clipping the Conductor badge and header.

Fix: remove md:justify-center so content starts at justify-start (y=24px),
keeping the badge and header visible. Only the home phase had this class;
preview, active, and complete phases were already justify-start.
…sent even without initial sessionKey

When the dashboard returns a sessionKeyPrefix without a sessionKey
(the session hasn't resolved yet), the async session key resolver at
line 1581 was gated on both orchestratorKey and sessionKeyPrefix. Since
orchestratorKey was null in this case, the resolver never started
and the session key was never resolved.

Fix: remove the orchestratorKey check — the resolver only needs the
prefix to start polling for a matching session.
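A hedged sketch of the relaxed gate and one polling step. Both helpers are hypothetical, and matching a session by `startsWith(prefix)` is an assumption about what "a matching session" means here.

```typescript
// The gate now depends only on the prefix, not on orchestratorKey.
function shouldStartResolver(sessionKeyPrefix: string | null): boolean {
  return sessionKeyPrefix !== null && sessionKeyPrefix.length > 0;
}

// One polling step: pick the first session whose key starts with the prefix.
function findSessionByPrefix(sessionKeys: string[], prefix: string): string | null {
  return sessionKeys.find((key) => key.startsWith(prefix)) ?? null;
}

console.log(shouldStartResolver("conductor:")); // true
console.log(findSessionByPrefix(["other:1", "conductor:abc"], "conductor:")); // conductor:abc
```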
…d header

1. Notification dropdown: absolute right-0 clipped left side off-screen
   on mobile (causing 'Swarm updates' → 'ARM UPDATES', 'Mission' → 'ission').
   Changed to left-0 on mobile, right-0 on sm+.

2. Worker card header: px-28 (112px each side) left only ~126px for the
   worker name between two absolutely-positioned elements on mobile.
   Reduced to px-16 on mobile, keeping px-28 on sm+.

3. Swarm description: truncate class cut the worker count description
   mid-sentence on mobile. Changed to wrap on mobile, truncate on sm+.
…bot avatar

The absolute left-0 navigation tabs (Control/Board/Inbox/Runtime)
stretched across the card width on mobile, overlapping the centered
PixelAvatar below. Added flex-wrap so tabs wrap to a second line
on narrow viewports, keeping sm:flex-nowrap on desktop.
…t mobile widths

- OpsStrip gateway block: add flex-wrap so '0 active runs' and
  'pulse 4h ago' wrap properly on narrow viewports instead of
  colliding horizontally.
- AttentionMarquee mask: extend fade zone from 92% to 96% so
  marquee items aren't prematurely masked on 390px viewport.
Change action bar from justify-end to justify-start on mobile
so that NEW CHAT / TERMINAL / SKILLS buttons don't clip the
right edge of the 390px viewport. Desktop remains justify-end.
The code viewer element used its default (non-wrapping) styling,
which caused line overflow on narrow screens. Added a wrapping style so long code comments wrap at a 390px viewport instead of
clipping off-screen.
…ontent clipping across all pages

styles.css hardcoded --tabbar-h: 0px, overriding the 80px fallback
in var(--tabbar-h, 80px) used by most pages. This caused bottom
content to be hidden behind the fixed mobile tab bar (~80px tall)
on every non-chat page. The MobileTabBar JavaScript does dynamically
measure and set this variable, but the CSS default was wrong.
iPhone SE (375px) couldn't fit NEW CHAT + TERMINAL + SKILLS
on one row because the primary button used desktop-sized padding
and text. Reduced New Chat to px-3 py-1.5 text-xs on mobile
(default), restored to px-3.5 py-2 text-sm at sm: breakpoint.

Also removed redundant justify-start (flex-start is the default)
to keep flex-wrap clean.
…failure

dispatchSwarmAssignments ran hermes chat -q <prompt> with the
full prompt as a command-line argument. For long prompts this
exceeds ARG_MAX or contains unescaped chars, causing execFile
to fail silently ("Command failed"). The worker was marked
dispatched but never actually started, leaving it stuck in
'idle' with 0/1 tasks complete forever.

Fix: pass the prompt through execFile's stdin ('input' option)
instead of as a positional arg. hermes chat -q reads from stdin
when no query argument follows the flag.
## Changes

### Conductor dispatch fixes (swarm-dispatch.ts)
- Fix oneshot fallback: pass the prompt as the -q argument (not just via stdin) so
  hermes chat gets a valid query string
- Fix tmux startup sentinel: use a regex anchored to line-start so the echoed
  shell printf command doesn't cause a false positive
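A sketch of an argv builder for the oneshot path that always places the prompt after -q, which is what the argparse error "expected one argument" requires. `buildOneshotArgs` is hypothetical. When such an array is passed to Node's execFile, which does not spawn a shell by default, each element reaches the hermes binary as a single argument without shell parsing.

```typescript
// Hypothetical argv builder: never emit a bare -q.
function buildOneshotArgs(prompt: string): string[] {
  if (prompt.trim() === "") {
    throw new Error("oneshot dispatch needs a non-empty prompt");
  }
  return ["chat", "-q", prompt];
}

console.log(buildOneshotArgs("Summarise the open PRs")); // [ 'chat', '-q', 'Summarise the open PRs' ]
```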

### Checkpoint parsing fix (swarm-checkpoints.ts)
- Strip markdown bold markers (**) before parsing checkpoint labels so
  **STATE:** DONE is recognized as STATE: DONE

### Conductor UI progress feedback (conductor-spawn.ts, use-conductor-gateway.ts)
- GET handler now polls the worker's SQLite database for checkpoints when
  the mission is active, then updates the mission state via
  recordMissionCheckpoint
- Re-read mission from store after checkpoint sync so the response reflects
  the updated state
- UI hook builds virtual worker cards from native-swarm mission assignments
  when no Gateway sessions match, showing worker status instead of
  permanent 'Spawning workers...'
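An in-memory sketch of the GET flow: sync checkpoints for an active mission, then re-read the mission so the response is not stale. `MissionStore`, `Mission`, `handleGet` and `readCheckpoints` are hypothetical stand-ins for the real store and the worker's SQLite database; only the role of recordMissionCheckpoint comes from the PR.

```typescript
interface Mission {
  id: string;
  state: "active" | "completed";
  workerIds: string[];
  doneWorkers: Set<string>;
}

type Checkpoint = { workerId: string; state: string };

class MissionStore {
  private missions = new Map<string, Mission>();

  save(m: Mission): void {
    this.missions.set(m.id, m);
  }

  // Returns a snapshot copy, so an object read before a sync goes stale;
  // this is why the handler re-reads after recording checkpoints.
  get(id: string): Mission | undefined {
    const m = this.missions.get(id);
    return m ? { ...m, doneWorkers: new Set(m.doneWorkers) } : undefined;
  }

  // Marks a worker done; the mission completes once every worker is done.
  recordMissionCheckpoint(id: string, workerId: string, state: string): void {
    const m = this.missions.get(id);
    if (!m || state !== "DONE") return;
    m.doneWorkers.add(workerId);
    if (m.workerIds.every((w) => m.doneWorkers.has(w))) m.state = "completed";
  }
}

function handleGet(
  store: MissionStore,
  id: string,
  readCheckpoints: (missionId: string) => Checkpoint[],
): Mission | undefined {
  const mission = store.get(id);
  if (mission?.state !== "active") return mission;
  for (const cp of readCheckpoints(id)) {
    store.recordMissionCheckpoint(id, cp.workerId, cp.state);
  }
  return store.get(id); // re-read so the response reflects the synced state
}

const store = new MissionStore();
store.save({ id: "m1", state: "active", workerIds: ["w1", "w2"], doneWorkers: new Set<string>() });
const stale = store.get("m1");
const fresh = handleGet(store, "m1", () => [
  { workerId: "w1", state: "DONE" },
  { workerId: "w2", state: "DONE" },
]);
console.log(stale?.state, fresh?.state); // active completed
```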

### Tests
- Add Playwright e2e test for native-swarm conductor lifecycle

Closes outsourc-e#448

Successfully merging this pull request may close this issue: enhancement(conductor): show native-swarm worker progress during mission execution
