PROJECT STATUS  ·  2026-05-28  ·  phase 2c shipped  ·  terminal polish next

What's built, what's next, how to use it.

Working project status as of today. Separate from the architectural overview; this is the practical "where are we, what works, what's broken, what's next" snapshot updated as each phase ships.

148 tests passing
32 test files
4 / 10 phases shipped
0 known prod blockers

Two diagrams side by side. Mentally diff them to see what's left.

Current workflow (phase 2c shipped)

Single channel input, single provider, three-layer Push Gate, audit log. Everything in this diagram is shipped and tested.

[Diagram: current workflow. Columns: INPUT, MODERATE + GATE, EXECUTE, PERSIST + NOTIFY]
Boxes, in reading order:
  User chat: dashboard composer, ws: chat_send
  Moderator (Hono): two-phase prompt prepended
  Push Gate matcher: tier 1 floor + tier 2 blacklist.yml + best-effort normalize
  Worker Manager: spawn / cap 8 concurrent
  Worker (Claude): claude code subprocess, stream-json events, opus 4-7 default
  Permission MCP: stdio sidecar + research tool
  SQLite (WAL): sessions / events / queue / audit
  WebSocket Hub: topics: agents / gate / dev_servers
  /audit endpoint: every decision queryable
  Dashboard notify: sonner toast + push gate bar
Arrow labels: send, spawn, tool, ask, decide, event, gate, write, ws: events back to operator
SAFETY INVARIANT (provable by test): Tier 1 hardcoded floor cannot be overridden by any operator config  |  floor-cannot-be-lifted.test.ts
green = shipped  ·  arrows = real data flow today
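The three-layer gate in the diagram (normalize, then floor, then blacklist) can be sketched roughly as follows. This is a minimal illustration, not the contents of floor.ts, normalize.ts, or matcher.ts: the rule names, regular expressions, the `Decision` shape, and the glob handling are all assumptions, and the one-level base64 decode is left out.

```typescript
// Sketch only: every name and pattern below is a hypothetical stand-in.
type Layer = "floor" | "blacklist";

interface Decision {
  gated: boolean;
  layer: Layer | null; // which layer matched, if any
  rule: string | null; // which rule matched, if any
}

// Tier 1: hardcoded floor. It takes no config input, so a blank or
// hostile blacklist cannot lift it.
const FLOOR_RULES: ReadonlyArray<{ rule: string; test: RegExp }> = [
  { rule: "ssh-dir", test: /~\/\.ssh\b/ },
  { rule: "aws-dir", test: /~\/\.aws\b/ },
  { rule: "dotenv", test: /(^|[\s\/])\.env\b/ },
  { rule: "sudo", test: /(^|\s)sudo\s/ },
  { rule: "rm-rf", test: /\brm\s+-rf\b/ },
  { rule: "curl-pipe-sh", test: /\bcurl\b[^|]*\|\s*(ba)?sh\b/ },
];

// Best-effort normalization: strip ANSI escapes and null bytes, fold
// fullwidth characters to ASCII via NFKC, unwrap one level of `bash -c`.
function normalize(command: string): string {
  let out = command
    .replace(/\u001b\[[0-9;]*[A-Za-z]/g, "")
    .replace(/\u0000/g, "")
    .normalize("NFKC")
    .trim();
  const wrapped = out.match(/^(?:ba)?sh\s+-c\s+(['"])([\s\S]*)\1$/);
  if (wrapped) out = wrapped[2] ?? out;
  return out;
}

// Tier 2 patterns such as "git push *": treat * as "anything".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function decide(rawCommand: string, blacklist: readonly string[]): Decision {
  const command = normalize(rawCommand);
  // The floor is checked first and unconditionally.
  for (const { rule, test } of FLOOR_RULES) {
    if (test.test(command)) return { gated: true, layer: "floor", rule };
  }
  for (const pattern of blacklist) {
    if (globToRegExp(pattern).test(command)) {
      return { gated: true, layer: "blacklist", rule: pattern };
    }
  }
  return { gated: false, layer: null, rule: null };
}
```

Calling `decide("sudo rm -rf /", [])` with an empty blacklist still returns a floor match, which is the property the invariant test is described as proving.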

Full vision (all phases complete)

What the same request looks like when every phase has shipped. Same trigger column flows through more layers; new always-on processes can inject their own triggers; memory persists across all sessions; multiple worker providers run in parallel; reviewers gate quality before persist.

[Diagram: full vision. Columns: TRIGGER, ROUTE, MODERATE, EXECUTE, REVIEW, PERSIST]
Boxes, in reading order:
  User chat
  Discord DM
  Email
  CI / runtime hook
  Cron tick
  Channel Router: classify / dedupe
  Moderator: clarify, decompose
  Memory load: core + project
  Skill lookup: replay if match
  Push Gate: blacklist.yml + floor
  Worker (Claude): opus 4-7
  Worker (Claude): sonnet 4-6
  Worker (Codex): image / fast code
  Worker (Gemini): long context
  Critic
  Vision Critic
  Debugger
  Self-Editor
  Memory Keeper
  Skills Librarian
  Sync Server
  Notify operator
ALWAYS-ON OVERSIGHT (re-injects triggers): Watchdog  |  Scheduler  |  Error Ingest  |  Self-Editor. Independent processes that detect drift, fire crons, ingest webhooks, inject new triggers without operator action.
MEMORY (PERSISTENT ACROSS ALL SESSIONS): Core (frozen), Project (per repo), Session (FTS), Skill (procedural), Multi-Machine Sync
WORKER TOOL SURFACE (via MCP): GitHub MCP, Playwright (browser), Postgres / SQLite MCP, Filesystem MCP, Mem0 / Honcho
blue = planned  ·  final state when every phase ships

The delta is the roadmap. Every box in the blue diagram that isn't in the green one is something the self-iteration loop will build, in the order listed below.

4 shipped, 6 to go.

1    moderator
2A   backend gate
2B   dashboard
2C   trust
2.5  errors + crons
3    critic
4    multi-page
5    remote
6    memory
7    self-edit

What actually works right now.

1
Minimum Viable Moderator: Hono HTTP server, spawn claude as subprocess, SQLite WAL, SSE event stream, basic CLI
Phase 1
2A
Push Gate backend + multi-agent + WebSocket: Permission MCP shim, per-worker config, WS hub with topic subscriptions, dev-server detection, multi-agent concurrency cap
896ade2
2B
Cursor-style dashboard: Vite + React 19 + Tailwind v4 + Sonner, mac chrome, tabbed agents, collapsible chat, layout modes, settings modal, browser notifications, WS auto-reconnect
10 commits
2C
Tier-1 hardcoded floor: Pure function blocking ~/.ssh, ~/.aws, .env, sudo, rm -rf, curl | sh regardless of config. Cannot be overridden.
33d0c6b
2C
blacklist.yml loader + hot-reload: js-yaml parser, fs.watch hot-reload, defensive parsing of malformed configs
3002955
2C
Command normalization (best-effort): Strips ANSI escapes, null bytes, fullwidth chars; unwraps bash -c; one-level base64 decode. KNOWN_GAPS.md documents limits.
9704e88
2C
Three-layer matcher wired: floor + normalize + blacklist, with the matching layer and rule reported on every decision. An invariant test proves that a blank config still blocks everything on the floor.
653ae52
2C
Universal audit log: audit_log SQLite table, GET /audit endpoint, writes on every gate decision (subject, action, layer, rule, decided_by, ts)
4ace857
2C
Two-phase Moderator prompt: moderator-system.md prepended to every spawn. Phase 1 clarifies (up to 3 questions), Phase 2 runs autonomously. sessions.phase column tracks state.
c2df7d7
2C
Dashboard approval card upgraded: Shows matched rule + layer chip (red floor / amber blacklist). Operator-forbidden fields explicitly absent (no "recommended", no "confidence").
cc2a9f8
post-2C
Real-claude end-to-end pipeline: Bundle of 5 bug fixes (claude args reorder, --verbose, stream-json parser, tsx absolute path, MCP server name match). The worker now actually talks to Claude through the Moderator. Verified live: a worker read status.html and wrote a self-introduction.
50dada9
post-2C
Researcher sub-tool + permission decision schema fix: New mcp__orchestrator__research tool that off-loads web lookups to a short-lived Claude sub-process. Also included: a fix for Claude's permission-decision schema, which needs updatedInput on allow; pnpm dev:safe (no tsx watch) for self-iteration runs; and the prompts/self-iteration.md guardrail template.
d17eb29
post-2C
Rebrand to Nyx + plain-English explainer + visuals: Whole product renamed Nyx (Greek goddess of night, works while you sleep, pairs with Atlas). New /simple page with dashboard screenshot, terminal mockup, AI-agent definition, comparison table vs Cursor / Devin / OpenHands / Aider / Hermes.
b84d505
item 2
Audit log dashboard tab: 480px right drawer with kind filter + 5s auto-refresh + expandable details JSON. Replaces curl + jq for inspecting gate decisions during the FBM trial. ScrollText icon in the titlebar.
37b8350 + 4b8048d
item 3
Numbered DB migration runner: Replaces the "rename moderator.db to backup" workaround with a tracked, transactional migration system. Baseline migration captures current schema verbatim. Per-migration transactions roll back cleanly on failure. For future schema changes, drop a numbered file in db/migrations/ and ship.
b79a0de
item 4
Error webhook (POST /events/error): External systems (CI, runtime crash reporters, other crons) POST here and Nyx auto-spawns an investigation agent. Writes a trigger_received entry to the audit log. Investigation-only by default; no commits, no deploys without operator approval.
aff751b
item 4
Repo priming (auto-load CLAUDE.md): When a worker enters a repo, root CLAUDE.md and .claude/CLAUDE.md are auto-loaded and prepended to the prompt under a PROJECT MEMORY header. 8000-char cap with truncation note. Interoperates with the existing Claude Code memory convention; no new config needed.
f519b85
item 4
Cron scheduler: Schedules table + in-process tick loop (60s). Supports @every 5m/1h/1d, @hourly, @daily HH:MM. REST endpoints for CRUD. Cron fires write trigger_received audit entries with schedule context. Unlocks automations like "every morning check FBM CI status".
f42602d
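The schedule syntax listed for the cron scheduler (@every 5m/1h/1d, @hourly, @daily HH:MM) could be resolved to a next run time along these lines. This is a sketch under stated assumptions, not the shipped scheduler: the function name `nextRun` is hypothetical, times are treated as UTC, and `@every` is assumed to mean "interval after the reference time".

```typescript
// Sketch only: nextRun and its semantics are assumptions for illustration.
const UNIT_MS: Record<"m" | "h" | "d", number> = {
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

function nextRun(expr: string, after: Date): Date {
  // "@every 5m", "@every 1h", "@every 1d": fixed interval after `after`.
  const every = expr.match(/^@every (\d+)([mhd])$/);
  if (every) {
    const amount = Number(every[1]);
    const unit = every[2] as "m" | "h" | "d";
    return new Date(after.getTime() + amount * UNIT_MS[unit]);
  }

  // "@hourly": the next top of the hour (UTC assumed).
  if (expr === "@hourly") {
    const next = new Date(after.getTime());
    next.setUTCMinutes(0, 0, 0);
    next.setUTCHours(next.getUTCHours() + 1);
    return next;
  }

  // "@daily HH:MM": today at HH:MM if still ahead, otherwise tomorrow.
  const daily = expr.match(/^@daily ([01]\d|2[0-3]):([0-5]\d)$/);
  if (daily) {
    const next = new Date(after.getTime());
    next.setUTCHours(Number(daily[1]), Number(daily[2]), 0, 0);
    if (next.getTime() <= after.getTime()) {
      next.setUTCDate(next.getUTCDate() + 1);
    }
    return next;
  }

  throw new Error(`unsupported schedule: ${expr}`);
}
```

A 60-second tick loop would then only need to compare each schedule's stored next run time against the current time and fire the ones that are due.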

14 items the loop will work through, in order.

Ordered smallest-and-safest first so the self-iteration loop validates itself on easy wins before touching anything load-bearing. The total is roughly 30-35 hours of subagent execution to ship every box in the "full vision" diagram above.

Operator note: The self-iteration loop also needs three guardrails before it can run unattended: (a) a pnpm dev:safe script that runs the moderator without tsx watch (so worker edits don't trigger mid-task restarts), (b) a prompt guardrail "if tests fail, STOP and report; do not try to fix forward", (c) a per-task git commit + audit log entry so any single iteration can be rolled back. Run the loop separately from this list once they're in place.

How to use it today.

1. Start the full stack

One command starts the moderator on :3000 and the dashboard on :5173 concurrently, with labeled output.

cd ~/projects/agent-orchestrator
pnpm dev
2. Open the dashboard

Visit http://localhost:5173. You should see the mac-window chrome and a green "live" indicator in the titlebar.

3. Create an agent

Click the + button in the tab bar. A new agent tab appears. It is empty (no worker spawned yet).

4. Send your first message

Type in the right-side chat panel. cmd+enter sends. This spawns a worker with your message as the prompt. The Moderator prompt template, prepended automatically, asks for clarification if needed; otherwise the worker goes straight to work.

5. Watch what happens

Worker output streams into the chat panel. If it spins up a localhost dev server, the middle pane iframes it (live, interactive). If it tries something gated (git push, edit ~/.env, etc.), the bottom bar lights up amber with the matched rule and layer chip.

6. Approve or deny gated actions

Click approve or deny on the bottom bar. The decision is logged to audit_log and broadcast back to the worker over WS so it can continue (or handle the denial).

7. Customize what gets gated

Click the gear icon. Edit the blacklist tab. Add patterns. Save. The loader hot-reloads, and the matcher uses the new patterns on the next decision.

# ~/.orchestrator/blacklist.yml
bash:
  - "git push *"
  - "fly deploy *"
paths:
  - "**/.env*"
  - "~/personal-notes/**"
8. Inspect the audit log

API-only for now. UI tab coming.

curl http://localhost:3000/audit | jq
# Quote the URL: an unquoted & would make the shell background the command and drop limit=20
curl 'http://localhost:3000/audit?kind=gate_decision&limit=20' | jq
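For scripted use, the same audit queries can be built and summarized from code. The sketch below is illustrative only: `auditUrl` and `countByLayer` are hypothetical helpers, the `AuditEntry` field names come from the audit log description (subject, action, layer, rule, decided_by, ts), and the field types are guesses.

```typescript
// Sketch only: helper names and field types are assumptions.
interface AuditEntry {
  subject: string;
  action: string;
  layer: string | null; // e.g. "floor" or "blacklist"; null if no layer applies
  rule: string | null;
  decided_by: string;
  ts: string;
}

interface AuditQuery {
  kind?: string;
  limit?: number;
}

// Build the /audit URL with properly encoded query parameters.
function auditUrl(base: string, query: AuditQuery = {}): string {
  const params: string[] = [];
  if (query.kind !== undefined) {
    params.push(`kind=${encodeURIComponent(query.kind)}`);
  }
  if (query.limit !== undefined) {
    params.push(`limit=${encodeURIComponent(String(query.limit))}`);
  }
  return params.length > 0
    ? `${base}/audit?${params.join("&")}`
    : `${base}/audit`;
}

// Count decisions per gate layer, e.g. to see how often the floor fires.
function countByLayer(entries: readonly AuditEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    const key = entry.layer ?? "none";
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

With the stack running, `auditUrl("http://localhost:3000", { kind: "gate_decision", limit: 20 })` yields the same URL as the second curl command, and the parsed JSON response could be passed to `countByLayer` if it matches the assumed shape.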

Honest list. None are blockers, all are scoped.

med
No real DB migration story.
Schema additions (e.g., the audit_log table, sessions.phase column) work for fresh DBs and use try/catch ALTER for upgrades. Older DBs from before a column existed can fail. Workaround: rename apps/moderator/data/moderator.db to backup and let it recreate.
med
tsx watch restarts moderator on every save.
When the moderator restarts, in-flight WS connections drop. Dashboard auto-reconnects within ~500ms. Active claude subprocesses are orphaned but continue writing to the DB.
minor
Audit log is API-only; no dashboard UI yet.
Inspect with curl until the UI tab ships (planned soon, low effort).
minor
Memory layers not built.
Workers spawn fresh each time and only see what Claude Code reads from CLAUDE.md natively. The Hermes-inspired 5-layer stack (core + project + session + skill + sync) is Phase 6.
minor
Approval card is functional but minimal.
Shows matched rule + layer + command. Does NOT yet show diff summary, files changed, test status, or recent worker actions (operator spec for the full evidence card). Add as needed during the FBM trial.
minor
No critic, no vision review.
Worker output is not reviewed before being marked done. UI work cannot be auto-iterated against a screenshot yet. Both are Phase 3.
minor
No remote access.
Dashboard is localhost:5173 only. Cloud Tunnel for phone access is Phase 5.
minor
Self-iteration is possible but janky.
You can point the orchestrator at its own repo and have a worker improve it. Restart is manual; rollback on test failure is not automatic. Phase 7 (Self-Editor) is deferred per the Self-Editor risk note in the trust spec.
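On the migration issue in this list: the numbered migration runner described in the shipped items is meant to replace the try/catch ALTER approach. Its core loop might look something like the sketch below. The `Db` interface, the `Migration` shape, and `runMigrations` are hypothetical; a real adapter would wrap the SQLite driver and load files from db/migrations/.

```typescript
// Sketch only: all names here are illustrative assumptions.
interface Migration {
  id: number; // numeric prefix of the migration file
  name: string;
  sql: string;
}

// Minimal surface the runner needs from the database layer.
interface Db {
  appliedIds(): number[];
  exec(sql: string): void;
  recordApplied(id: number, name: string): void;
  // Must roll back everything done inside fn if fn throws.
  transaction(fn: () => void): void;
}

// Apply every migration that has not been recorded yet, in id order.
// Each migration runs in its own transaction, so a failure leaves the
// earlier ones applied and the failing one fully rolled back.
function runMigrations(db: Db, migrations: readonly Migration[]): number[] {
  const applied = new Set(db.appliedIds());
  const pending = migrations
    .filter((m) => !applied.has(m.id))
    .sort((a, b) => a.id - b.id);

  const ran: number[] = [];
  for (const migration of pending) {
    db.transaction(() => {
      db.exec(migration.sql);
      db.recordApplied(migration.id, migration.name);
    });
    ran.push(migration.id);
  }
  return ran;
}
```

Because the runner throws on the first failing migration, later files are never attempted against a half-upgraded schema.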

Where to find things.

apps/moderator/src/ Backend: Hono server, DB, worker spawn, push gate
apps/moderator/src/push-gate/ floor.ts (Tier 1), blacklist-loader.ts (Tier 2), normalize.ts (best-effort), matcher.ts (orchestrator)
apps/moderator/src/routes/ agents, sessions, gate, dev-servers, internal, config, audit
apps/moderator/src/prompts/ moderator-system.md (clarify-then-run prompt)
apps/moderator/test/ 28 test files, 130 passing
apps/dashboard/src/ Frontend: React 19 + Tailwind v4 + Zustand store + ws client
apps/dashboard/src/components/ LayoutShell, Titlebar, TabBar, Workspace, AgentsSidebar, PreviewPane, ChatPanel, PushGateBar, SettingsModal, Toaster
docs/overview.html Architectural overview / pitch document (this file's sibling)
docs/status.html This file. Working project status.
docs/superpowers/specs/ Design docs: orchestrator, push gate / dashboard, trust corrections, KNOWN_GAPS
docs/superpowers/plans/ Implementation plans: Phase 1, 2A, 2B, 2C
~/.orchestrator/blacklist.yml Operator-owned push gate Tier 2 patterns (paths + bash + mcp)
~/.orchestrator/roles.json Per-role model configuration (moderator, worker, critic, watchdog)
apps/moderator/data/moderator.db SQLite WAL database (sessions, events, agents, messages, gate queue, audit log)

Why this order, not "build everything."

The trust trial gates the rest.

Phases 1-2C give you a tool you can actually point at FBM Sniper for 30 nights. That data tells us which Phase 6 memory features matter and which Part B safety features are needed first. Building Phase 6 before the trial is guessing.

Depth on a few things beats breadth on twenty.

The win isn't the longest feature list. It's being the only tool where you genuinely close the laptop and trust it. That trust is earned by Push Gate + audit log + Pending Learnings reliability, not by shipping every roadmap item.

Self-Editor stays scoped.

An orchestrator modifying its own gate logic is the single feature most likely to break the trust pitch catastrophically. Phase 7 stays planned but only for non-safety code, and is not part of the public release.

Part B is intentionally deferred.

Pending Learnings, skill provenance, token budgets, and the circuit breaker are all great features. They are also all more valuable to a tool that has lived through real incidents than to one that hasn't. Ship the trial, learn, then build them with that data.