Nyx | Project Status

Workflow today vs full vision

Two diagrams side by side. Mentally diff them to see what's left.

Current workflow (as of 2026-05-30)

All shipped components as of today. Five-column view: triggers on the left, oversight and persistence on the right. New today (9 commits): 5 deadlock root-cause fixes (planner 0-subtask early return + 45min fanout cap, archive() SIGTERMs subprocess, 60min worker runtime cap, autopilot 1h cooldown, watchdog tool_use idle tracking at 15min), GET /status/pulse (pure-SQL health broadcast every 20min on SSE, haiku escalation on anomaly only), GET /reflection/digest (rule-based audit-log analysis, 1 haiku call per digest, fires on daily scheduler). Verifier runtime gate queued. Go TUI live at apps/tui-go/nyx-tui. Multi-moderator and concierge-uses-planner still in flight.

green = shipped and running · flow lines = real data flow today · 2026-05-30

Full vision (all phases complete)

Target architecture when every phase ships. Green boxes are already shipped (as of 2026-05-30); blue boxes remain. What's still ahead: the multi-moderator process model with the 2-slot / 4-worker caps (in flight), concierge auto-routing big tasks through the planner (in flight), and the Verifier runtime gate + Visual Iteration items (queued). The Go TUI is live. Once the remaining items land, the operator gives intents and the system runs them across 1-2 project moderators with workers, planner, critic, auto-revise, verifier, janitor, and auto-restart all self-driving. Memory persists across all sessions; the operator becomes optional for everything except strategic conversation.

green = shipped as of 2026-05-30 · blue = planned · final state when every phase ships

The delta is the roadmap. Blue boxes in the full vision that are not green are what the self-iteration loop will build next, in the order listed below.

Shipped

What actually works right now.

Minimum Viable Moderator Hono HTTP server, spawn claude as subprocess, SQLite WAL, SSE event stream, basic CLI

Phase 1

Push Gate backend + multi-agent + WebSocket Permission MCP shim, per-worker config, WS hub with topic subscriptions, dev-server detection, multi-agent concurrency cap

896ade2

Cursor-style dashboard Vite + React 19 + Tailwind v4 + Sonner, mac chrome, tabbed agents, collapsible chat, layout modes, settings modal, browser notifications, WS auto-reconnect

10 commits

Tier-1 hardcoded floor Pure function blocking ~/.ssh, ~/.aws, .env, sudo, rm -rf, curl | sh regardless of config. Cannot be overridden.

33d0c6b

blacklist.yml loader + hot-reload js-yaml parser, fs.watch hot-reload, defensive parsing of malformed configs

3002955

Command normalization (best-effort) Strips ANSI escapes, null bytes, fullwidth chars; unwraps bash -c; one-level base64 decode. KNOWN_GAPS.md documents limits.

9704e88

Three-layer matcher wired floor + normalize + blacklist with layer/rule reported on every decision. Invariant test proves blank config still blocks floor.

653ae52

Universal audit log audit_log SQLite table, GET /audit endpoint, writes on every gate decision (subject, action, layer, rule, decided_by, ts)

4ace857

Two-phase Moderator prompt moderator-system.md prepended to every spawn. Phase 1 clarifies (up to 3 questions), Phase 2 runs autonomously. sessions.phase column tracks state.

c2df7d7

Dashboard approval card upgraded Shows matched rule + layer chip (red floor / amber blacklist). Operator-forbidden fields explicitly absent (no "recommended", no "confidence").

cc2a9f8

post-2C

Real-claude end-to-end pipeline 5 bug bundle (claude args reorder, --verbose, stream-json parser, tsx absolute path, MCP server name match). Worker now actually talks to Claude through Moderator. Verified live: worker read status.html and wrote a self-introduction.

50dada9

post-2C

Researcher sub-tool + permission decision schema fix New mcp__orchestrator__research tool that off-loads web lookups to a short-lived Claude sub-process. Plus fix for Claude's permission-decision schema needing updatedInput on allow. Plus pnpm dev:safe (no tsx watch) for self-iteration runs. Plus prompts/self-iteration.md guardrail template.

d17eb29

post-2C

Rebrand to Nyx + plain-English explainer + visuals Whole product renamed Nyx (Greek goddess of night, works while you sleep, pairs with Atlas). New /simple page with dashboard screenshot, terminal mockup, AI-agent definition, comparison table vs Cursor / Devin / OpenHands / Aider / Hermes.

b84d505

item 2

Audit log dashboard tab 480px right drawer with kind filter + 5s auto-refresh + expandable details JSON. Replaces curl + jq for inspecting gate decisions during the FBM trial. ScrollText icon in the titlebar.

37b8350 + 4b8048d

item 3

Numbered DB migration runner Replaces "rename moderator.db backup" with a tracked, transactional migration system. Baseline migration captures current schema verbatim. Per-migration transactions roll back cleanly on failure. Future schema changes: drop a numbered file in db/migrations/, ship.

b79a0de

item 4

Error webhook (POST /events/error) External systems (CI, runtime crash reporters, other crons) POST here and Nyx auto-spawns an investigation agent. Writes a trigger_received entry to the audit log. Investigation-only by default; no commits, no deploys without operator approval.

aff751b

item 4

Repo priming (auto-load CLAUDE.md) When a worker enters a repo, root CLAUDE.md and .claude/CLAUDE.md are auto-loaded and prepended to the prompt under a PROJECT MEMORY header. 8000-char cap with truncation note. Interop with existing Claude Code memory convention; no new config needed.

f519b85

item 4

Cron scheduler Schedules table + in-process tick loop (60s). Supports @every 5m/1h/1d, @hourly, @daily HH:MM. REST endpoints to CRUD. Cron fires write trigger_received audit entries with schedule context. Unlocks "every morning check FBM CI status" type automations.

f42602d

item 5

Memory Keeper core (Hermes frozen-snapshot) ~/.orchestrator/memory/MEMORY.md (2200 char cap) + USER.md (1375 char cap). Loaded once per worker spawn, immutable mid-session. Prefix-cache friendly. Capacity warnings at 80%. Injected into prompt as GLOBAL MEMORY section before any per-task content.

phase 6 slice 1

item 6

Per-repo PROJECT.md memory <repo>/.orchestrator/PROJECT.md (4400 char cap) loaded when worker enters that repo. Distinct namespace from operator-managed CLAUDE.md. Workers can propose updates; operator approves (write flow in a later slice). Solves Hermes' multi-project budget overflow.

phase 6 slice 2

item 7

Skills Librarian + skill MCP tool ~/.orchestrator/skills/<name>.md. Workers see all available skill names + descriptions in their prompt prefix, and can call mcp__orchestrator__skill (action=get) to read the full body when one is relevant. /internal/skill backend route. Safe name validation prevents path traversal.

phase 6 slice 3

item 8

Discord bot (apps/discord-bot) DM the bot anything, message forwards to Nyx, worker reply DMs back. One persistent agent per Discord user (clear with /reset). discord.js + bot-logic.ts (pure, unit-tested via mock ModeratorApi). Long replies chunked at 1900 chars for Discord's 2000-char limit.

phase 5

item 8

Cloudflare Tunnel script scripts/tunnel.sh runs cloudflared against localhost:5173 (the dashboard). Free *.trycloudflare.com URL per session. Setup notes in the script comments for upgrading to a named tunnel + custom domain.

phase 5

item 9

Critic (Phase 3a) POST /sessions/:id/review spawns a short-lived claude with the session's text events + a strict reviewer prompt. Verdict: approve / revise / abstain + critique up to 300 words. 90s timeout. Manual-trigger first; auto-trigger on session.succeeded gated behind a config flag still being designed.

phase 3a

item 11

Watchdog (Phase 6 slice 4) Always-on in-process monitor. Default 5-minute scan interval. Flags any session "running" longer than 15 minutes as stuck, logs a warning, writes a kind="watchdog" entry to audit_log. First slice; drift detection and external health checks come later.

phase 6 slice 4

DEMO

Yaga.co.za scraper (built autonomously by Nyx) Operator gave Nyx a one-sentence brief; Nyx self-dispatched a discovery worker (Puppeteer one-shot), found the public JSON endpoint, wrote lib/yaga-client.js + lib/yaga-sniper.js + scripts/smoke-yaga.mjs. Returns real listings from yaga.co.za through the existing FBM Sniper Pro pipeline. Zero human code edits.

ec21188 (FBM repo)

DEMO

Yaga UI integration (also autonomously by Nyx) Nyx wired the new yaga scraper into the FBM Sniper Pro desktop app (server.cjs, shared seller-trust, ui/index.html, ui/app.js across 31 state maps, 6 UI modules, onboarding step, regression tests). 13 files, 111 insertions, 18 deletions in 12 minutes. Self-corrected when its own post-commit scan caught a forbidden arrow character in the commit message and amended. Desktop build launched first-try.

0006259 (FBM repo)

item 14

Self-Iterator v1 (Phase 7, narrow scope) Auto-dispatches a fix worker scoped to the orchestrator repo when (a) a worker terminates with status=failed or (b) the watchdog finds a stuck session. Safety rails: loop prevention (label check), mutex (one self-iter at a time), cooldown (3 attempts per signal-key per hour), branch isolation (always nyx-self/<ts>-<reason>, never main, never auto-push), protected files (push-gate/floor.ts and self-iterator.ts itself), kill switch (NYX_SELF_ITER_DISABLED env). 8 dedicated tests + full suite 196/196.

aa1d457 + c082378

item 14

First autonomous self-modification (history) On its first boot, the Self-Iterator fired against a real watchdog signal (a stuck session from before the PC crash earlier). The fix worker correctly created the nyx-self/2026-05-28T15-54-37-stuck branch, diagnosed the root cause as "(c) worker making slow but real progress" rather than a code bug, and per its prompt wrote a 104-line tuning note with two ranked recommendations (raise threshold; switch to idle-based detection) instead of changing code. Stayed on the branch, no push, no merge. Diagnosis was 80% right; safety pattern was 100% right.

ef37468 on nyx-self/...

item 14

Orphan-on-boot sweep (fix surfaced by self-iter) The self-iter's first dispatch was triggered by a false-positive: a session marked status=running whose worker process had died with the previous moderator. markOrphanRunningAsFailed() now sweeps every running session to failed at moderator boot, before Watchdog or SelfIterator start. The class of false positives is now eliminated. Regression test added.

c082378

tooling

watch-agent.mjs live event tail Pretty-printer that tails the events table for any session via sqlite polling. Colored tool_use / tool_result / assistant text / system events, exits on terminal status. Built so the operator could monitor the yaga integration run from a separate terminal without the React dashboard. Reused for every Nyx run since.

aa1d457

Self-iteration build order

14 items the loop will work through, in order.

Ordered smallest-and-safest first so the self-iteration loop validates itself on easy wins before touching anything load-bearing. The total is roughly 30-35 hours of subagent execution to ship every box in the "full vision" diagram above.

DONE

Fix research-tool response shape Shipped in d17eb29. Permission decision schema now includes updatedInput on allow.

done

DONE

Audit log dashboard UI Shipped in 37b8350 + 4b8048d. Right-drawer with kind filter, 5s auto-refresh, expandable details.

done

DONE

Real DB migration story Shipped in b79a0de. Numbered transactional runner with _migrations tracking table. Future schema changes drop a numbered file.

done

DONE

Plan 2.5: Error webhook + cron + repo priming Shipped in aff751b + f519b85 + f42602d. FBM Sniper SRE use case now possible: external systems POST errors, Nyx auto-investigates; repos auto-prime when entered; scheduled checks run on @every / @hourly / @daily.

done

DONE

Phase 6 slice 1: Memory Keeper core Shipped. Hermes-style MEMORY.md + USER.md, frozen snapshot per session, injected into prompt prefix.

done

DONE

Phase 6 slice 2: Project memory (per-repo) Shipped. PROJECT.md scoped under <repo>/.orchestrator/. Loaded alongside CLAUDE.md.

done

DONE

Phase 6 slice 3: Skills Librarian Shipped. ~/.orchestrator/skills/ + mcp__orchestrator__skill tool (list/get). Available skills auto-indexed into prompt prefix.

done

DONE

Phase 5: Cloud Tunnel + Discord bot Shipped. apps/discord-bot with DM-handling logic (unit-tested via mock ModeratorApi). scripts/tunnel.sh for cloudflared. The "control from anywhere" layer.

done

DONE

Phase 3a: Critic (code review) Shipped. POST /sessions/:id/review returns approve/revise/abstain + critique.

done

Phase 3b: Vision Critic (screenshot loop) For UI work. Screenshots the dev server, compares against intent, asks worker to iterate until visually correct. Needs Critic shipped + Playwright MCP installed.

~2-3 hr

DONE

Phase 6 slice 4: Watchdog Shipped. 5-minute scan loop flags stuck workers (>15min running). Writes kind=watchdog audit entries.

done

Phase 4: Multi-page fan-out + Git Steward Git worktrees per worker so parallel agents can edit independent branches without stepping on each other. Lower priority for backend-bot work; high impact for frontend / multi-feature work.

~2-3 hr

Part B: safety refinements Pending Learnings tab, skill provenance + kill switch, skill decay, token budget gating, circuit breaker. Intentionally last — these are the kind of features that gain their value from running through real incidents first.

~10 hr

v1 DONE

Phase 7: Self-Editor v1 (narrow scope, signal-driven) Shipped as Self-Iterator. Auto-dispatches on worker_failed + watchdog_stuck. Branch-isolated, no auto-merge, no auto-push. Protected files enforced. First self-mod completed (tuning note on a branch). What's NOT shipped yet: auto-merge with Critic approval (operator still gates merges), more signal sources (CI failures, error webhook), cost guardrails.

done

Watchdog: idle-based detection (Nyx's own recommendation) From Nyx's tuning note on the nyx-self/2026-05-28T15-54-37-stuck branch: track last_event_at per session, flag when (now - last_event_at) > 10 min instead of (now - started_at) > 15 min. Distinguishes a wedged worker from a long-but-progressing one. Nyx sketched the SQL and the test approach in its note. This is the canonical "Nyx implements Nyx's recommendation" demo task.

~30-45 min

Critic-gated auto-merge for nyx-self/* branches Currently every self-iter branch needs human review to land. To run unattended: when a self-iter session succeeds AND its branch passes pnpm test AND the existing Critic returns approve, auto-merge into main. If Critic returns revise, dispatch a follow-up self-iter to address the critique. If Critic returns abstain, leave for operator. Closes the loop.

~2-3 hr

Cost guardrails (Anthropic budget API + per-task limits) Critical for "release and walk away" mode. Per-task max_tokens cap, per-day spend ceiling, kill-switch when exceeded (pause new dispatches, alert operator). Read live spend from the Anthropic API; track per-session in audit_log. Without this, an infinite-loop bug in self-iter could burn a month's credits in an hour.

~3-4 hr

Cost-aware model routing (haiku/sonnet/opus tier) Nyx currently locks every worker to opus 4.7. Most tasks don't need it. Tier the routing: Critic uses haiku 4.5 (cheap, fast), Watchdog uses haiku, Worker default sonnet 4.6, big refactors / self-iter use opus. 3-5x cost savings on routine work. Per-task override via the dispatch shape.

~1-2 hr

Reflection step + skill candidates (AutoGPT pattern) After every successful task, worker spends 30s writing "what was non-obvious about this, what would I do differently next time" to docs/reflections/. A weekly cron promotes the best reflections into skills under ~/.orchestrator/skills/, with operator approval. The skill library grows organically from real work, not from upfront design.

~2 hr

Speculative parallel execution (beam search for code) For risky tasks, dispatch N workers in parallel on the same brief with different cwd worktrees. Critic compares the N completions and picks the best (or merges insights). Higher cost but much better quality on ambiguous tasks. Worth it for marketplace integrations like yaga, where there are 3-4 valid architectures.

~4 hr (needs item 12)

Role specialization (MetaGPT-inspired) Today Nyx has Moderator / Worker / Critic / Watchdog / Self-Iterator. Add: Product Manager (writes specs from one-line briefs before dispatch), Architect (proposes file structures for complex features), QA (writes tests after code lands but before commit). Each is a thin prompt + role file, like the moderator-system.md pattern.

~3 hr (per role)

Anomaly detection (drift beyond the floor) The Push Gate floor catches known-bad patterns. Anomaly detection catches unknown-bad: a worker that suddenly issues 10 git operations in 30s, a worker that reads 50 files outside its declared cwd, a worker whose token rate triples. Pause and ask the operator. Belt-and-suspenders for the cases blacklist.yml can't enumerate.

~4 hr

Living KB across all events (FTS5 search) SQLite FTS5 index over the events + audit_log + messages tables. Workers can call mcp__orchestrator__recall to ask "have we hit this error before? what did past-Nyx do?" Compounds. Past sessions become part of the worker's tool surface. Reduces re-investigation cost.

~2-3 hr

PostToolUse hook (Claude Code parity) Push Gate is a PreToolUse hook. Add the PostToolUse equivalent: after every tool call, fire a callback with the result. Use for: structured logging beyond raw events, drift detection input, "rate-limit this tool family" counters, side-effect reconciliation (did the git push actually land?).

~2 hr

Atlas hosting integration (multi-machine sync) Nyx today is laptop-only. To run 24/7, needs to live on Atlas (the cloud product). MEMORY / PROJECT / SKILL files sync across machines (CRDT or simple last-write-wins). Push Gate decisions and audit log writeback. Operator can dispatch from phone, work runs on the cloud box, results sync home.

~8-12 hr

GitHub webhook receiver (external trigger expansion) Today: /events/error accepts CI/runtime errors. Add: GitHub webhook for PR comments ("Nyx, please look at this"), issue assignment ("nyx-bot"), failing checks, branch protection bypass attempts. Same investigate-and-PR flow, more entry points.

~3 hr

Self-iteration is now live (v1). The three guardrails are shipped: (a) pnpm dev:safe runs the moderator without tsx watch so worker edits don't trigger mid-task restarts, (b) the self-iter prompt explicitly tells the worker "STOP and report if the diagnosis is unclear, do not guess", (c) every dispatch writes a kind=self_iter audit row plus a self_iter_attempts row, and every code change lands on a fresh nyx-self/<ts> branch with its own commit so it's rollbackable. Items 15-26 above are the next-iteration backlog that Nyx itself can work through.

Current workflow

How to use it today.

Start the full stack

One command, moderator on :3000 + dashboard on :5173, concurrently labeled.

cd ~/projects/agent-orchestrator
pnpm dev

Open the dashboard

Visit http://localhost:5173. You should see the mac-window chrome and a green "live" indicator in the titlebar.

Create an agent

Click the + button in the tab bar. A new agent tab appears. It is empty (no worker spawned yet).

Send your first message

Type in the right-side chat panel. cmd+enter sends. This spawns a worker with your message as the prompt. The Moderator prompt template prepended automatically asks for clarification if needed, otherwise goes straight to work.

Watch what happens

Worker output streams into the chat panel. If it spins up a localhost dev server, the middle pane iframes it (live, interactive). If it tries something gated (git push, edit ~/.env, etc.), the bottom bar lights up amber with the matched rule and layer chip.

Approve or deny gated actions

Click approve or deny on the bottom bar. Decision is logged to audit_log and broadcast back to the worker via WS so it can continue (or handle the denial).

Customize what gets gated

Click the gear icon. Edit the blacklist tab. Add patterns. Save. The loader hot-reloads, the matcher uses new patterns on the next decision.

# ~/.orchestrator/blacklist.yml
bash:
  - "git push *"
  - "fly deploy *"
paths:
  - "**/.env*"
  - "~/personal-notes/**"

Inspect the audit log

API-only for now. UI tab coming.

curl http://localhost:3000/audit | jq
curl http://localhost:3000/audit?kind=gate_decision&limit=20 | jq

Nyx: 5 deadlock fixes + status pulse + reflector landed today.

Two diagrams side by side. Mentally diff them to see what's left.

Current workflow (as of 2026-05-30)

Full vision (all phases complete)

7 shipped (one partial), 3 to go.

What actually works right now.

14 items the loop will work through, in order.

How to use it today.

Start the full stack

Open the dashboard

Create an agent

Send your first message

Watch what happens

Approve or deny gated actions

Customize what gets gated

Inspect the audit log

Honest list. None are blockers, all are scoped.

Where to find things.

Why this order, not "build everything."

The trust trial gates the rest.

Depth on a few things beats breadth on twenty.

Self-Editor stays scoped.

Part B is intentionally deferred.