Nyx v0.2.0 · Phase 2A shipped · Phase 2B in progress

An autonomous multi-agent operator for your codebase.

Local-first. Bring your own Claude or Codex. Push Gate-protected. Built for solo developers and small teams who need an always-on engineer that doesn't go rogue, doesn't burn credits unsupervised, and doesn't ask permission for things it should just handle.

Runtime: Node 22 + TypeScript
Model providers: Claude Code, Codex
Tests: 81 passing (single-fork)
License (planned): AGPL-3.0 open core

Built to keep one developer's bot alive across timezones.

The first user is a solo operator running a Facebook Marketplace bot (FBM Sniper) across multiple repos, build pipelines, and customer machines. Deploy errors, CI failures, and runtime crashes happen on no particular schedule and at no particular hour. This orchestrator is the on-call engineer that handles the boring 80%, escalates the interesting 20%, and never accidentally force-pushes to main.

Current agents have three failure modes.

They ask too much

Default Claude Code stops every 30 seconds to confirm trivial decisions. The interruption tax kills productivity.

They go rogue

Bypass-permissions mode is faster, but it will eventually rm -rf your repo, force-push to main, or burn a hundred dollars in tokens before you notice.

They work alone

Single agent, single thread, single line of attention. No critic, no reviewer, no second pair of eyes catching dumb mistakes.

Three layers that compose into a real operator.

Layer 1

Moderator clarifies, then runs.

Reads operator intent, asks any clarifying questions upfront (especially for UI scope or ambiguous goals), then goes silent and runs autonomously to completion. Front-loads ambiguity instead of interrupting mid-task.

Layer 2

Push Gate is permissive by default.

The agent can edit files, run commands, fetch the web. Your blacklist.yml flags specific paths and commands that need approval. A hardcoded floor blocks true secrets regardless of config.

Layer 3

Multi-agent fans out

Independent tasks run in parallel as isolated worker subprocesses. Each tab in the dashboard is a live agent. Output streams in real time.
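A minimal sketch of the fan-out idea, assuming each worker can be modelled as an async function. The `Task` type and `fanOut` helper are hypothetical names, not the orchestrator's own; the default cap of 8 is the concurrency cap quoted elsewhere on this page.

```typescript
// Hypothetical task shape; a real worker would be a spawned subprocess.
type Task<T> = () => Promise<T>;

// Run tasks in parallel, never more than `cap` at once.
// Results come back in the same order as the input tasks.
async function fanOut<T>(tasks: Task<T>[], cap = 8): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each lane pulls the next unclaimed task until none remain.
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const laneCount = Math.min(cap, tasks.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

If any task rejects, the returned promise rejects too; a real Worker Manager would more likely record the failure per worker and let the others finish.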

What we borrowed from Hermes, what we rebuilt for developers.

Nous Research's Hermes Agent shipped the best agent-memory design we have seen: bounded, frozen mid-session, prefix-cache friendly, and curated by the agent itself. It works beautifully for single-developer, single-machine, single-project use. We kept the parts that work, fixed the layers that break the moment you have more than one repo or more than one machine, and added two layers specific to coding agents.

Hermes strength we kept

Bounded core memory, frozen snapshot.

MEMORY.md (2,200 char cap) and USER.md (1,375 char cap) load into the system prompt at session start and stay immutable until the next session. Total persistent overhead under 1,300 tokens. This is what makes prefix caching work and what forces curation instead of context-stuffing.
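The frozen-snapshot rule can be sketched as follows. Only the two character caps come from the text above; the `freezeCoreMemory` name, the section headers in the prefix, and the choice to reject oversized files rather than truncate them are assumptions made for illustration.

```typescript
// Caps quoted in the text above; everything else here is illustrative.
const CAPS = { "MEMORY.md": 2200, "USER.md": 1375 } as const;
type CoreFile = keyof typeof CAPS;

interface CoreSnapshot {
  readonly prefix: string; // text placed at the start of the system prompt
  readonly chars: number;
}

// Build the snapshot once at session start. Oversized files are rejected
// (an assumption), so curation has to happen before the session begins.
function freezeCoreMemory(files: Record<CoreFile, string>): CoreSnapshot {
  const parts: string[] = [];
  for (const name of Object.keys(CAPS) as CoreFile[]) {
    const body = files[name];
    if (body.length > CAPS[name]) {
      throw new Error(`${name} is ${body.length} chars, cap is ${CAPS[name]}`);
    }
    parts.push(`## ${name}\n${body}`);
  }
  const prefix = parts.join("\n\n");
  // Frozen so nothing can mutate the prompt prefix mid-session.
  return Object.freeze({ prefix, chars: prefix.length });
}
```

Because the prefix never changes during a session, every request in that session shares the same leading bytes, which is the property prefix caching depends on.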

Hermes weaknesses we fixed

Per-repo memory + multi-machine sync.

Hermes stores memory globally at ~/.hermes/, so two projects compete for the same 2,200 char budget and entries from one repo leak into another. We add PROJECT.md scoped per-repo (interoperates with existing CLAUDE.md). Multi-machine sync ships in Pro Cloud; team namespacing ships in the Team tier.

Five-layer memory stack

1. Core Memory (frozen snapshot): MEMORY.md + USER.md, hard character caps, loaded once per session and never re-fetched. Prefix-cache friendly. Identical to Hermes design, kept verbatim because it is correct. Status: Phase 6
2. Project Memory (per-repo): PROJECT.md plus existing CLAUDE.md auto-loaded when a worker enters a repo. Scoped to the project root so 20 projects can each have their own budget without competing for one global cap. Status: Phase 6
3. Session Memory (durable, searchable): Every conversation in SQLite with FTS5. Not in the system prompt. Searched on demand via session_search(query). Long sessions never bloat context, but nothing is forgotten. Status: Live
4. Skill Memory (self-authored procedures): The agent writes its own skills. After 3 successful runs of the same shape (or operator approval), the Skills Librarian generalizes the pattern into a named, parameterized skill file under ~/.orchestrator/skills/. Future sessions invoke it by name with fresh arguments. Hermes hand-authors skills; ours are self-written from real successful sessions and refined as the underlying task evolves. Status: Phase 6
5. Multi-Machine Sync: Memory files sync across operator devices via encrypted server. Solves the Hermes single-machine constraint. Team tier adds per-user / per-team namespacing on top. Status: Pro Cloud
+ Soul file (identity, not memory): SOUL.md defines the Moderator's personality, risk tolerance, escalation rules, and communication style. Loaded at orchestrator boot, never modified per-session. Prevents behavioral drift across hundreds of runs. Status: Phase 6
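The promotion rule for Skill Memory (three successful runs of the same shape, or operator approval) might look roughly like this. How the Skills Librarian decides two runs share a "shape" is not specified here, so this sketch assumes the ordered list of tool names; `RunRecord` and `SkillCandidateTracker` are hypothetical.

```typescript
// Hypothetical record of one finished run.
interface RunRecord {
  tools: string[]; // ordered tool names; used here as the run's "shape" (assumption)
  succeeded: boolean;
}

const PROMOTE_AFTER = 3; // threshold quoted in the text

class SkillCandidateTracker {
  private successes = new Map<string, number>();
  private promoted = new Set<string>();

  // Returns true exactly once per shape: on the run that earns promotion.
  record(run: RunRecord, operatorApproved = false): boolean {
    if (!run.succeeded) return false;
    const shape = run.tools.join(">");
    const count = (this.successes.get(shape) ?? 0) + 1;
    this.successes.set(shape, count);
    if (this.promoted.has(shape)) return false;
    if (operatorApproved || count >= PROMOTE_AFTER) {
      this.promoted.add(shape);
      return true;
    }
    return false;
  }
}
```

Failed runs are ignored rather than resetting the count; whether a failure should reset a candidate is a design choice this sketch leaves open.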

The tool surface (Hermes-shaped, dev-extended)

memory(action, scope, content) with actions add / replace / remove and scopes core / user / project. Substring matching for entries, same as Hermes. No read action because core memory is auto-injected.
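As an illustration of the substring-matching behaviour, here is a small in-memory version of the tool. The `match` field (the substring that selects an entry for replace / remove) and the rule that ambiguous matches are refused are assumptions; the text only fixes the action and scope names.

```typescript
type Scope = "core" | "user" | "project";

// `match` is a hypothetical parameter: the substring that picks the target entry.
type MemoryCall =
  | { action: "add"; scope: Scope; content: string }
  | { action: "replace"; scope: Scope; match: string; content: string }
  | { action: "remove"; scope: Scope; match: string };

type Store = Record<Scope, string[]>;

function applyMemory(store: Store, call: MemoryCall): string {
  const entries = store[call.scope];
  if (call.action === "add") {
    entries.push(call.content);
    return "added";
  }
  // replace / remove locate their target by substring.
  const needle = call.match;
  const hits: number[] = [];
  entries.forEach((entry, i) => {
    if (entry.includes(needle)) hits.push(i);
  });
  // Assumption: zero or multiple matches are refused instead of guessed at.
  if (hits.length !== 1) return `refused: ${hits.length} entries match`;
  if (call.action === "replace") {
    entries[hits[0]] = call.content;
    return "replaced";
  }
  entries.splice(hits[0], 1);
  return "removed";
}
```

There is deliberately no read branch, mirroring the note above that core memory is injected automatically.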

session_search(query) for FTS over past sessions. Returns ranked snippets, not full transcripts. Cheap to call.

skill(action, name) for procedural memory. save captures a successful sequence; invoke replays it; list shows what is available.

Capacity warnings fire at 80% fullness on any bounded layer, prompting consolidation. Decay archives entries that go unused for 30 sessions. Per-repo memory is archived with the repo when the project is removed, so old context never haunts new work.
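The two housekeeping rules reduce to simple arithmetic. A sketch, assuming each entry records the session in which it was last used; the `Entry` shape and function names are hypothetical, while the 80% and 30-session thresholds are the ones stated above.

```typescript
interface Entry {
  text: string;
  lastUsedSession: number; // session counter when the entry was last touched
}

const WARN_AT = 0.8;       // 80% fullness
const DECAY_SESSIONS = 30; // unused for 30 sessions

// Fraction of the layer's character budget currently in use.
function fullness(entries: Entry[], capChars: number): number {
  const used = entries.reduce((sum, e) => sum + e.text.length, 0);
  return used / capChars;
}

function needsConsolidation(entries: Entry[], capChars: number): boolean {
  return fullness(entries, capChars) >= WARN_AT;
}

// Split entries into those kept live and those moved to the archive.
function decay(entries: Entry[], currentSession: number): { keep: Entry[]; archive: Entry[] } {
  const keep: Entry[] = [];
  const archive: Entry[] = [];
  for (const e of entries) {
    const idle = currentSession - e.lastUsedSession;
    (idle >= DECAY_SESSIONS ? archive : keep).push(e);
  }
  return { keep, archive };
}
```

With the 2,200 char cap on MEMORY.md, the warning would fire once entries total 1,760 characters or more.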

How the pieces fit together.

A Hono HTTP+WebSocket server (the Moderator) owns a SQLite database and a pool of spawned Claude Code subprocesses (the Workers). Workers receive a permission MCP config that round-trips every sensitive tool call back to the Moderator for the Push Gate to evaluate. The operator watches everything live through a WebSocket-backed dashboard.

[Architecture diagram]
CLIENT: operator dashboard (browser) · iPad / phone · Discord / CLI (planned). Talks to the Moderator over WS + REST.
MODERATOR PROCESS (NODE): Hono Server (HTTP + WebSocket + SSE) · Worker Manager (spawn / cap 8) · SQLite (WAL; sessions / events / gate) · WebSocket Hub (topics: agents, gate, dev_servers) · Push Gate (matcher + decision waiter; blacklist.yml + hardcoded floor). Connects to each worker via spawn and events / gate.
WORKER PROCESSES: Worker 1 (Claude Code subprocess + permission MCP sidecar, opus 4-7) · Worker 2 (Claude Code subprocess + permission MCP sidecar, sonnet 4-6) · Worker N (Codex / Gemini CLI, planned).
REVIEWERS (planned): Critic (phase 3 / code review) · Vision Critic (phase 3 / screenshot loop) · Debugger (phase 6 / triage) · Self-Editor (phase 7).
ALWAYS-ON (planned): Watchdog (phase 6) · Scheduler (phase 2.5 / crons) · Error Ingest (phase 2.5 / webhooks) · Memory Keeper (phase 6 / 5-layer) · Channel Router (phase 5).
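Workers emit JSON over stdout one object per line, and a chunk from a pipe can end mid-line, so the Moderator side needs a small line buffer. A minimal sketch, with `JsonLineParser` as a hypothetical name and the handling of non-JSON lines as an assumption:

```typescript
// Accumulates stdout chunks and emits one parsed object per complete line.
class JsonLineParser {
  private buffer = "";

  // Feed a chunk; returns the events completed by this chunk.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // last piece may be a partial line
    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        events.push(JSON.parse(trimmed));
      } catch {
        // Assumption: non-JSON lines are surfaced, not dropped silently.
        events.push({ type: "unparsed", raw: trimmed });
      }
    }
    return events;
  }
}
```

In practice one parser instance would sit on each worker's stdout `data` handler, with every returned event persisted and broadcast to the dashboard.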

Final state, end to end

What a single request looks like when every phase is shipped. Left to right: a trigger fires (from any channel), the router classifies it, the Moderator clarifies and decomposes against live memory, workers execute in parallel across providers, reviewers verify, and the Memory Keeper persists everything that should outlive the session. Always-on processes (Watchdog, Scheduler, Error Ingest) inject new triggers back into the flow without operator action.

[Final-state flow diagram: TRIGGER -> ROUTE -> MODERATE -> EXECUTE -> REVIEW -> PERSIST]
TRIGGER: user chat · Discord DM · email · CI / runtime webhook · cron tick.
ROUTE: Channel Router (classify / dedupe, map to session).
MODERATE: Moderator (clarify, decompose) · Memory (load core + project) · Skill lookup (replay if match).
EXECUTE: Push Gate (blacklist.yml + floor) · Worker (Claude) opus 4-7 · Worker (Claude) sonnet 4-6 · Worker (Codex) image / fast code · Worker (Gemini) long context.
REVIEW: Critic (diff review) · Vision Critic (screenshot loop) · Debugger (on error result) · Self-Editor (on improvement asks).
PERSIST: Memory Keeper (write 5 layers) · Skills Librarian (save if 3x success) · Sync Server (multi-machine) · Notify operator (push / toast / DM).
Edge labels between stages: route · dispatch · result · approved.
ALWAYS-ON OVERSIGHT (re-injects triggers): Watchdog | Scheduler | Error Ingest | Self-Editor. Independent processes that detect drift, fire crons, receive webhooks, and inject new triggers back into the leftmost column without operator action.
MEMORY (PERSISTENT ACROSS ALL SESSIONS): Core (frozen; MEMORY.md + USER.md) · Project (per repo; PROJECT.md + CLAUDE.md) · Session (FTS; SQLite full text) · Skill (procedural; self-authored files) · Multi-Machine Sync (Pro Cloud server).
WORKER TOOL SURFACE (via MCP): GitHub MCP · Playwright (browser) · Postgres / SQLite MCP · Filesystem MCP · Mem0 / Honcho.
all components active  ·  flow proceeds left to right per request  ·  always-on layer re-enters at the leftmost column

What happens when you type "ship this fix" from your phone.

1

Operator sends intent

Vague natural-language request from dashboard chat, Discord DM, or HTTP API. No need to specify files, commands, or steps.

POST /sessions | ws: chat_send | discord webhook (planned)
2

Moderator clarifies (front-loaded), then decomposes

Reads the intent against repo context (CLAUDE.md, recent commits, memory layers). Asks any clarifying questions upfront, especially around UI scope, ambiguous goals, or unfamiliar repos. Once aligned, goes silent: produces a task list and decides one worker or several. No further interruptions until done or genuinely blocked.

opus-4-7 xhigh effort | intake -> aligned -> running
3

Workers spawn in parallel

Worker Manager spawns one Claude Code subprocess per task with a per-worker MCP config baking in the session and agent IDs. Output streams as JSON over stdout, parsed line-by-line.

child_process.spawn | stream-json | cap 8 concurrent
4

Worker hits a sensitive operation

A worker wants to call Bash, Write, Edit, or WebFetch, or to Read a sensitive path. The permission MCP intercepts the call and POSTs it to /internal/can-use-tool on the Moderator.

mcp__orchestrator__can_use_tool | permission-prompt-tool flag
5

Push Gate matches against your blacklist

The matcher checks the operation against your blacklist.yml and the hardcoded floor. Most things pass through instantly. Only paths or commands that match a blacklist pattern are queued in push_gate_queue, and the worker blocks until a decision arrives. A WebSocket event fires to your dashboard or phone.

push-gate/matcher.ts | blacklist.yml | ws topic: gate
6

Operator approves from any device

The dashboard shows a toast notification with the exact command and the rule that matched. Approve or deny with one tap. The decision is written to the DB and signalled back to the waiting worker.

ws: gate_decide | decideGate(db, queueId, "approved" | "denied", "user")
7

Worker continues, Critic reviews (Phase 3+)

Approved tool call returns "allow" and the worker proceeds. When the worker finishes, a Critic agent reviews the diff (and for UI work, a Vision Critic screenshots the result and iterates).

phase 3 roadmap | opus-4-7 with vision input
8

Moderator reports back, Watchdog keeps an eye

Session status flips to succeeded. Dashboard updates live. Watchdog continues monitoring for runtime errors, CI failures, or drift signals that warrant a new task.

phase 6 roadmap | haiku-4-5 (cheap always-on)
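The gate round trip in steps 4 through 6 comes down to parking a request until a decision arrives. The sketch below keeps the pending queue in memory for brevity, whereas the text describes a push_gate_queue table in SQLite; `GateWaiter` and `toToolResult` are hypothetical names, and "deny" as the counterpart of "allow" is an assumption.

```typescript
type Decision = "approved" | "denied";

class GateWaiter {
  private nextId = 1;
  private pending = new Map<number, (d: Decision) => void>();

  // Called when a tool call matches the blacklist: park the worker's request.
  enqueue(): { queueId: number; decision: Promise<Decision> } {
    const queueId = this.nextId++;
    const decision = new Promise<Decision>((resolve) => {
      this.pending.set(queueId, resolve);
    });
    return { queueId, decision };
  }

  // Called when the operator taps approve or deny.
  // Returns false for unknown or already-decided ids.
  decide(queueId: number, decision: Decision): boolean {
    const resolve = this.pending.get(queueId);
    if (!resolve) return false;
    this.pending.delete(queueId);
    resolve(decision);
    return true;
  }
}

// Map the operator's decision to what the permission tool tells the worker.
function toToolResult(decision: Decision): "allow" | "deny" {
  return decision === "approved" ? "allow" : "deny";
}
```

The handler for /internal/can-use-tool would await `decision` before responding, which is what keeps the worker blocked while the operator decides.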

Permissive by default. Blacklist what you don't want touched.

The Push Gate is not a deny-by-default trap door. The default is permissive: the agent can edit files, run shell commands, fetch the web, and use the system the way you would. You maintain a small blacklist of paths and command patterns that always require your approval, and there is a hardcoded floor of true-secret paths that nothing can override regardless of your config. Everything else runs at full speed without round-tripping for approval.

Tier 1 // Always free

Read and discovery tools

No round trip, no config, no questions. These tools cannot modify state, and reads of Tier 3 secret paths are still stopped by the hardcoded floor.

Glob · Grep · Read · NotebookRead · WebSearch · WebFetch · Task
Tier 2 // Free unless blacklisted

Modify the system

Run by default. Round trips only when the target matches a pattern in your blacklist.yml.

Bash · Write · Edit · NotebookEdit · MCP tool calls
Tier 3 // Hardcoded block

True secrets and destructive ops

Always gated. Cannot be overridden by config. Protects you from your own typos.

~/.ssh/** · ~/.aws/credentials · ~/.gnupg/** · sudo * · rm -rf / · curl | sh

Your blacklist.yml (you own this)

Lives at ~/.orchestrator/blacklist.yml globally, with per-repo overrides at <repo>/.orchestrator/blacklist.yml. Pattern syntax is glob for paths, shell-style for bash. Anything not listed runs without prompting.

# ~/.orchestrator/blacklist.yml
# Tier 2 operations are free by default. List patterns here that
# should ALWAYS require your approval, even though the file or
# command would otherwise be permitted.

paths:
  # File writes / edits that need approval
  - "**/.env*"
  - "**/.git/config"
  - "**/package.json"            # prevent silent dep bumps
  - "~/personal-notes/**"      # private journal
  - "~/projects/fbm-sniper/pro/src/license/**"

bash:
  # Commands that need approval
  - "git push *"                 # memory:no_push_default
  - "gh release create *"        # memory:no_public_release
  - "gh pr create *"
  - "fly deploy *"
  - "npm publish *"
  - "pnpm publish *"
  - "docker push *"

mcp:
  # Specific MCP tool calls that need approval
  - "mcp__*__delete_*"
  - "mcp__github__merge_pull_request"

Tier 3 is enforced regardless. Even if you accidentally blank your blacklist.yml, the agent still cannot read ~/.ssh/id_rsa, run sudo, or pipe untrusted scripts into a shell. The hardcoded floor exists so a misconfiguration cannot become a security incident.
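The evaluation order (floor first, then blacklist, otherwise allow) can be sketched as below. This is not the contents of push-gate/matcher.ts: the glob translation is deliberately simplified, the floor list is abbreviated, and treating a trailing " *" as "zero or more arguments" is an assumption made so that "git push *" also covers a bare "git push".

```typescript
type Verdict = "allow" | "ask" | "block";
interface Rules { paths: string[]; bash: string[] }
type Op = { kind: "path" | "bash"; target: string };

const ESCAPE = /[.+?^${}()|[\]\\]/g;

// Path globs: "**" crosses directory separators, "*" does not.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .replace(ESCAPE, "\\$&")
    .replace(/\*\*\//g, "\u0000") // "**/" may match zero or more directories
    .replace(/\*\*/g, "\u0001")
    .replace(/\*/g, "[^/]*")
    .replace(/\u0001/g, ".*")
    .replace(/\u0000/g, "(?:.*/)?");
  return new RegExp(`^${pattern}$`);
}

// Command patterns: "*" matches anything; a trailing " *" also matches no args.
function commandToRegExp(pattern: string): RegExp {
  const trailing = pattern.endsWith(" *");
  const head = trailing ? pattern.slice(0, -2) : pattern;
  const body = head.replace(ESCAPE, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${body}${trailing ? "(?: .*)?" : ""}$`);
}

// Hardcoded floor, abbreviated from the Tier 3 list above.
const FLOOR: Rules = {
  paths: ["~/.ssh/**", "~/.aws/credentials", "~/.gnupg/**"],
  bash: ["sudo *"],
};

function matches(rules: Rules, op: Op, home: string): boolean {
  if (op.kind === "path") {
    return rules.paths.some((g) => {
      const expanded = g.startsWith("~/") ? home + g.slice(1) : g;
      return globToRegExp(expanded).test(op.target);
    });
  }
  return rules.bash.some((p) => commandToRegExp(p).test(op.target));
}

function evaluate(op: Op, blacklist: Rules, home: string): Verdict {
  if (matches(FLOOR, op, home)) return "block";   // floor wins regardless of config
  if (matches(blacklist, op, home)) return "ask"; // queue for operator approval
  return "allow";                                 // everything else runs freely
}
```

Note that an empty blacklist still returns "block" for floor paths, which is the misconfiguration guarantee described above.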

One brain, many hands.

The system is a small society of specialized agents. Each has a different model, effort level, and tool budget. Cheap models do repetitive work. Opus 4.7 does the thinking. Vision-capable models do the seeing.

Moderator: Two-phase. First clarifies: asks any questions it needs upfront (especially around UI scope, ambiguous goals, or unfamiliar repos). Then runs: dispatches workers and goes autonomous until done or genuinely blocked. Owns the session lifecycle. Status: Live
Worker: Spawned Claude Code or Codex subprocess that performs the actual file edits, commands, and tests. Status: Live
Push Gate: Matcher plus human-approval queue. Sits between every worker and every destructive operation. Status: Live
Critic: Reviews completed worker output, requests revisions, blocks the session from being marked done if quality fails. Status: Phase 3
Vision Critic: For UI work. Screenshots the dev server, compares against intent, asks the worker to iterate until visually correct. Status: Phase 3
Git Steward: Manages git worktrees so multiple workers can edit different branches simultaneously without stepping on each other. Status: Phase 4
Debugger: Triages errors from webhooks, CI runs, and runtime crashes. Decides whether to auto-fix, escalate, or ignore. Status: Phase 6
Memory Keeper: Owns the five-layer memory stack. Decides what gets written to MEMORY.md vs PROJECT.md, enforces character caps, triggers consolidation at 80% fullness, archives unused entries after 30 sessions. Status: Phase 6
Skills Librarian: Captures successful tool-call sequences as named, replayable skills. Inspired by Hermes Skills but auto-extracted from successful sessions instead of hand-authored. Status: Phase 6
Scheduler (Crons): Cron-style proactive execution. "Every morning at 8am, check FBM Sniper CI status." "Every hour, scan error webhooks." Shifts the system from reactive to proactive. OpenClaw / Hermes pattern. Status: Phase 2.5
Channel Router: Routes operator input from any source (dashboard chat, Discord DM, email, webhook, voice) to the right session. One agent, many inboxes. OpenClaw pattern, adapted for developers. Status: Phase 5
Watchdog: Always-on monitor. Detects stuck workers, drift in agent behavior, and external events that warrant a new session. Status: Phase 6
Self-Editor: Reads the orchestrator's own codebase, proposes improvements, runs them through the same safety gates as any other worker. Includes Hermes-style self-improvement: learns from past corrections. Status: Phase 7

Workers can talk to anything that speaks MCP.

The Push Gate itself is implemented as a stdio MCP server (one per worker, with session and agent IDs baked into each config). Additional MCPs plug in the same way. Anything stdio-compatible works without code changes.
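A sketch of what "session and agent IDs baked into each config" could look like. The `mcpServers` layout and the CLI flags reflect the Claude Code CLI as we understand it and should be checked against `claude --help`; the sidecar path, environment variable names, and Moderator URL are placeholders invented for illustration.

```typescript
interface WorkerIds {
  sessionId: string;
  agentId: string;
}

// One config object per worker; it would be written to its own JSON file.
function buildWorkerMcpConfig(ids: WorkerIds, moderatorUrl: string) {
  return {
    mcpServers: {
      orchestrator: {
        command: "node",
        args: ["./dist/permission-mcp.js"], // hypothetical sidecar entry point
        env: {
          // Hypothetical variable names: the sidecar reads these so every
          // gate request is attributed to the right session and agent.
          ORCH_SESSION_ID: ids.sessionId,
          ORCH_AGENT_ID: ids.agentId,
          ORCH_MODERATOR_URL: moderatorUrl,
        },
      },
    },
  };
}

// Arguments for one non-interactive worker run that streams JSON events
// and routes permission prompts through the orchestrator MCP tool.
function buildClaudeArgs(prompt: string, configPath: string): string[] {
  return [
    "-p", prompt,
    "--output-format", "stream-json",
    "--verbose",
    "--mcp-config", configPath,
    "--permission-prompt-tool", "mcp__orchestrator__can_use_tool",
  ];
}
```

Adding another stdio MCP for a worker would then mean adding one more key under `mcpServers`, with no change to the spawn logic.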

orchestrator (permission MCP): Forwards every gated tool call to the Moderator's /internal/can-use-tool endpoint. Status: Live
filesystem: Scoped read/write to specific project roots. Bypasses the Push Gate for paths inside the scope. Status: Planned
git: Branch, worktree, and commit operations as first-class tool calls (instead of shelling out to Bash). Status: Phase 4
playwright (browser): Screenshot capture and DOM inspection for the Vision Critic loop. Status: Phase 3
postgres / sqlite: Direct query access for the Debugger when triaging data-layer errors. Status: Phase 6
github: PR creation, issue comments, CI status checks. Same Push Gate rules apply to writes. Status: Phase 5
discord: Inbound message routing (operator-to-Moderator from phone) and outbound notifications. Status: Phase 5
context7: Live library documentation lookups for any framework or SDK the worker is touching. Status: Phase 3

Seven phases, currently two and a half deep.

01

Minimum Viable Moderator

Single worker, stream-JSON parsing, SQLite persistence, basic HTTP API and SSE. Foundation everything else builds on.

Shipped
02A

Backend: Push Gate + Multi-Agent + WebSocket

Permission MCP, matcher with sensitive-path detection, parallel workers with concurrency cap, WS hub with topic subscriptions, dev-server detection, per-worker MCP config files.

Shipped
02B

Dashboard

Vite + React + Tailwind + shadcn UI with Cursor-style chrome, multi-tab agents, collapsible chat, live dev-server iframe, toast notifications, model-per-role settings.

In Progress
2.5

Error Ingestion + Scheduled Jobs

Webhook receiver for CI failures and runtime errors, cron-style scheduled sessions, repo-specific priming (auto-load CLAUDE.md from any project root). FBM Sniper SRE use case.

Proposed
03

Critic + Vision Loop

Reviewer agent for code diffs. Vision-capable critic screenshots dev-server output, evaluates against intent, asks worker to iterate.

Planned
04

Multi-page Fan-out + Git Steward

Git worktrees per worker so parallel agents can edit independent branches. Steward role coordinates merges and resolves conflicts.

Planned
05

Remote Access + Discord

Cloudflare Tunnel for dashboard access from anywhere. Cloudflare Access for auth. Discord bot for phone-driven operator input.

Planned
06

Resilience: Watchdog, Debugger, Memory Keeper

Always-on monitoring layer. Detects stuck workers, triages incoming errors, maintains durable context across sessions. The "SRE for your bot" layer.

Planned
07

Polish + Self-Editor

Marketplace for community-built agent recipes. Self-improvement loop where the orchestrator can propose and ship changes to its own code (through the same gates).

Planned

How this differs from existing agentic tools.

Capability | This | Cursor | Devin | OpenHands | Aider | Hermes | OpenClaw
Multi-agent parallel | Yes | No | No | Limited | No | No | Sub-agents
Granular Push Gate | Yes | Per-tool | No | No | Per-edit | No | Plugin-level
Bounded frozen-snapshot memory | Yes | No | No | No | No | Yes (origin) | State store
Per-repo memory scoping | Yes | Project rules | No | Workspace | CONVENTIONS.md | Global only | Plugin state
Procedural skill capture | Yes | No | No | No | No | Hand-authored | Plugins
Scheduled / cron execution | Planned | No | No | No | No | Yes | Yes
Multi-channel input (Discord, email, webhook) | Planned | No | Slack | No | No | Limited | Yes (origin)
Always-on monitoring (Watchdog) | Planned | No | No | No | No | Crons only | Crons only
Phone / remote operator input | Planned | Cloud | Web | No | No | No | Telegram
BYO Claude Max / Pro (post-Apr 2026) | Yes (subprocess) | Yes | API only | Workarounds | Workarounds | API only | Blocked
Local-first (your machine) | Yes | Hybrid | Cloud | Yes | Yes | Yes | Yes
Source available | Yes (planned) | No | No | Yes | Yes | Yes | Yes
Per-role model selection | Yes | Per-mode | No | Per-session | Yes | Per-skill | Yes

"Origin" marks where an idea was pioneered. We borrow Hermes' frozen-snapshot memory and Skills design, and OpenClaw's multi-channel input and cron pattern, then extend each for the developer use case (per-repo scoping, multi-machine sync, official-CLI subprocess auth, push-gate safety).

Boring, proven, no novel infrastructure.

Runtime: Node 22 + TypeScript
HTTP server: Hono + @hono/node-server
WebSocket: @hono/node-ws
Database: better-sqlite3 (WAL, FK on)
Validation: Zod discriminated unions
Tests: Vitest (single-fork)
Agent runtime: Claude Code CLI subprocess
MCP transport: stdio (Anthropic SDK)
Frontend (2B): Vite + React + Tailwind
UI components: shadcn + Sonner toasts
Remote (Phase 5): Cloudflare Tunnel + Access
Distribution: Tauri desktop app

Open core. Community is fully featured. Pro pays for hosted infrastructure.

The codebase is planned for AGPL-3.0 release. Community Edition runs single-machine with every role, the full Push Gate, and the dashboard. Pro Cloud is a paid tier focused on infrastructure the user cannot easily run themselves.

Community Edition (free)

Everything that runs locally.

Moderator, all roles, multi-agent, Push Gate, dashboard, model-per-role configuration, full five-layer memory stack, single-machine operation. AGPL-3.0 license, source-available, modify freely.

Pro Cloud (paid)

Infrastructure you don't want to run.

Hosted Cloudflare Tunnel for phone access, hosted Discord bot, multi-machine memory sync, encrypted cloud backup of agent history, auto-updater, priority support. The infrastructure layer the Community Edition cannot replicate trivially.

Why we still work with Claude Pro / Max (post-April 2026)

In April 2026 Anthropic blocked Pro and Max OAuth tokens from working in third-party tools, breaking BYO-subscription auth for OpenClaw, custom harnesses, and most "Devin-but-mine" attempts. Those tools called the Anthropic API directly with the user's token, which Anthropic now refuses.

We do not call the API. We spawn the official claude CLI as a subprocess. The CLI is Anthropic's own client and retains full Pro / Max access. Our orchestrator never sees a token, never sends a request, never violates ToS. We just listen on the CLI's stdout and route its tool calls through the Push Gate.

Practical result: Claude Pro and Max users save real money. Usage that would cost hundreds of dollars at API rates stays at a flat $20-200/mo subscription. This is the single largest cost advantage we have over any post-April-2026 competitor that takes the API-key route.