Overview
OpenSRE treats every other AI agent running on your machine — Claude Code,
Cursor, Aider, Codex CLI, Gemini, and friends — as a microservice and applies
normal SRE practice: golden signals, SLOs, and incident response. The whole
fleet view lives behind one slash command in the interactive shell, /agents.
Subcommands drill into specific surfaces. Below: how the agent fleet is
discovered (opensre agents scan), the dashboard itself (/agents),
then live tail of agent stdout (/agents trace), then the cross-agent
context bus (/agents bus).
opensre agents scan — discover running agent sessions
opensre agents scan enumerates running AI-coding-agent sessions visible to
the current user. The classifier inspects the process table (via ps -axo pid,ppid,args) and labels each candidate by executable plus known argv
shapes — no PID is registered until you ask for it with --register.
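As a rough illustration of the first step, one line of `ps -axo pid,ppid,args` output can be split into a candidate record as below. The name `parse_ps_line` is hypothetical and not part of OpenSRE; the sketch assumes only the three-column layout named above.

```python
from typing import Optional, Tuple


def parse_ps_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Split one `ps -axo pid,ppid,args` row into (pid, ppid, args).

    Hypothetical helper for illustration. Returns None for the header
    row, blank lines, and anything that does not start with two integers.
    """
    parts = line.split(None, 2)  # args may contain spaces, so split at most twice
    if len(parts) < 3:
        return None
    pid, ppid, args = parts
    if not (pid.isdigit() and ppid.isdigit()):
        return None  # e.g. the "PID PPID ARGS" header
    return int(pid), int(ppid), args.rstrip()
```

The returned `args` string is what a classifier would then inspect for the executable and known argv shapes.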
Strict mode (default)
Recognizes the typical CLI invocations for Claude Code, Cursor, Aider, Codex,
and Gemini. For Claude Code specifically, all of the following process
shapes are treated as a session:
claude
claude --resume <session-id>
claude -r <session-id>
claude --prefill "<prompt>"
claude --print "<prompt>"
claude -p "<prompt>"
claude --continue
claude -c
claude code …
claude --input-format stream-json …
claude --output-format stream-json …
Equals-form flags (e.g. claude --resume=<session-id>,
claude --print=<prompt>) are accepted for any flag in the list above.
The Claude Desktop GUI and its Electron helpers are filtered out
cross-platform — Claude.app/Contents/, Claude Helper (Renderer),
/snap/claude/, /usr/lib/claude-desktop, AppImage mount points
(/.mount_Claude…), and the Windows Program Files\Claude\ /
AppData\Local\Programs\Claude\ install locations are recognized as
desktop artifacts and never labeled as claude-code. The negative filter
matches against argv[0] only, so a CLI installed under a Claude-flavored
prefix (for example /opt/Claude/claude) is still surfaced.
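The strict Claude Code rules above can be sketched as a small predicate. This is an illustration only, not OpenSRE's classifier: `classify_claude`, `DESKTOP_HINTS`, and `SESSION_FLAGS` are hypothetical names, the hint list is a subset of the paths mentioned above, and only the first argument after the executable is inspected.

```python
from typing import Optional, Sequence

# Hypothetical subset of the desktop hints listed above.
DESKTOP_HINTS = (
    "Claude.app/Contents/",
    "Claude Helper",
    "/snap/claude/",
    "/usr/lib/claude-desktop",
    "/.mount_Claude",
)

# Flags from the strict-mode list; equals-forms are handled by splitting on "=".
SESSION_FLAGS = {
    "--resume", "-r", "--prefill", "--print", "-p",
    "--continue", "-c", "--input-format", "--output-format",
}


def classify_claude(argv: Sequence[str]) -> Optional[str]:
    """Return "claude-code" for a recognized CLI shape, else None."""
    if not argv:
        return None
    exe = argv[0]
    # The negative filter looks at argv[0] only.
    if any(hint in exe for hint in DESKTOP_HINTS):
        return None
    if exe.rsplit("/", 1)[-1] != "claude":
        return None
    if len(argv) == 1:
        return "claude-code"  # bare `claude`
    first = argv[1].split("=", 1)[0]  # `--resume=<id>` -> `--resume`
    if first == "code" or first in SESSION_FLAGS:
        return "claude-code"
    return None
```

Because the filter checks `argv[0]` against specific desktop paths rather than the word "Claude", a CLI at `/opt/Claude/claude` still passes.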
The same cross-platform desktop filter also rejects Codex Desktop
(Codex.app/Contents/MacOS/Codex, /snap/codex/, /.mount_Codex…,
Flatpak com.openai.codex, Windows Program Files\Codex\ and
AppData\Local\Programs\codex\) and Cursor Desktop
(Cursor.app/Contents/MacOS/Cursor, /snap/cursor/, /.mount_Cursor…,
Flatpak com.cursor.cursor, Windows Program Files\Cursor\ and
AppData\Local\Programs\cursor\), so neither GUI is mislabeled as the
codex or cursor CLI in strict or --all mode. The macOS hints
deliberately target only the main bundle binary so Electron helper
subprocesses (e.g. Cursor Helper (Plugin) running Cursor’s AI agent)
remain eligible for the loose --all matcher.
Loose mode (--all)
opensre agents scan --all relaxes the argv requirements so that helper and
broker processes whose argv contains agent-shaped tokens are also surfaced.
The Claude / Codex / Cursor Desktop negative filter is still applied —
--all never mislabels desktop processes as claude-code, codex, or
cursor.
Registering discovered sessions
opensre agents scan --register
writes every discovered session into the local agent registry so that the
rest of the fleet surface (/agents, /agents trace, /agents bus) can
target it by PID. Without --register, the command is read-only.
/agents — fleet dashboard
The dashboard renders a seven-column table of every registered or
discovered local AI agent. Run from the interactive REPL:
> /agents
agents
agent pid uptime cpu% tokens/min $/hr status
claude-code-8421 8421 2h12m 18.4 320 $0.08 running
codex-13442 13442 11m 4.2 175 $0.04 running
cursor-agent-9999 9999 47m 0.6 - - running
Column data sources
| Column | Source | Notes |
|---|---|---|
| agent | AgentRecord.name | Registered name or discovery-generated <provider>-<pid>. |
| pid | AgentRecord.pid | OS process id. |
| uptime | sampler probe (psutil create_time) | Compact form: 45s / 12m / 2h12m / 3d4h. |
| cpu% | sampler probe (psutil) | Trailing 100 ms cpu_percent. |
| tokens/min | per-PID 60 s rolling window | Real for claude-code and codex; - for the rest. |
| $/hr | observed cost from token rate × model pricing | Renders - when the model is unknown. |
| status | sampler probe (psutil) | running / sleeping / zombie / etc. |
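The compact uptime form in the table can be produced with a formatter along these lines. This is a sketch: `format_uptime` is a hypothetical name, and the choice to always show the two largest units above one hour (for example 2h0m) is an assumption, since the table gives only four samples.

```python
def format_uptime(seconds: float) -> str:
    """Render elapsed seconds in the compact 45s / 12m / 2h12m / 3d4h style."""
    total = int(seconds)
    if total < 60:
        return f"{total}s"
    minutes = total // 60
    if minutes < 60:
        return f"{minutes}m"
    hours, minutes = divmod(minutes, 60)
    if hours < 24:
        return f"{hours}h{minutes}m"
    days, hours = divmod(hours, 24)
    return f"{days}d{hours}h"
```

With psutil, the input would typically be `time.time() - psutil.Process(pid).create_time()`.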
tokens/min semantics
The cell shows the sum of tokens emitted in the trailing 60 seconds,
scaled to a per-minute figure. Three states:
- Real value (320, 1.2k): the agent’s provider has a working meter and the on-disk session log was readable. Today this is claude-code (reads ~/.claude/projects/<mangled-cwd>/<session>.jsonl) and codex (reads $CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl, default $CODEX_HOME=~/.codex).
- 0: the agent was observed at least once but emitted nothing in the last 60 s. This distinguishes an “idle session” from one that is “not observable”.
- -: never observed. Either the provider has no meter yet (cursor, aider, gemini-cli, opencode, kimi, copilot today), the session log is unreadable (rare; macOS hardened-runtime processes can deny psutil.cwd()), or the interactive REPL is not running (non-interactive opensre agents list never starts the sampler).
Claude Code cache-read/cache-creation tokens are included in the
visible activity count because they are separate input work. Codex
cached_input_tokens are treated as a discounted subset of
input_tokens, so they affect $/hr but are not added again to the
visible tokens/min total.
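The three cell states can be modelled with a small rolling window. The sketch below is illustrative only: `TokenWindow` is a hypothetical class, and the `1.2k` rounding rule is inferred from the sample values rather than taken from the implementation.

```python
from collections import deque


class TokenWindow:
    """Per-PID rolling window that renders the tokens/min cell."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self.samples = deque()  # (timestamp_seconds, token_count) pairs
        self.observed = False   # flips to True on the first sample, never resets

    def record(self, now: float, tokens: int) -> None:
        self.observed = True
        self.samples.append((now, tokens))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop samples that are window_s seconds old or older.
        while self.samples and self.samples[0][0] <= now - self.window_s:
            self.samples.popleft()

    def render(self, now: float) -> str:
        if not self.observed:
            return "-"  # never observed
        self._evict(now)
        total = sum(tokens for _, tokens in self.samples)
        if total >= 1000:
            return f"{total / 1000:.1f}k"
        return str(total)  # "0" when observed but idle
```

Since the window is exactly 60 s wide, the sum over the window is already a per-minute figure.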
$/hr semantics
$/hr is a projected hourly burn rate derived from the same
trailing 60 s window the tokens/min column reports — not the
actual spend over the last hour:
$/hr = cost_of_usage_buckets_in_the_trailing_60s_window(model) × 60
Reads as “if the agent sustains the current pace for one hour, it
will cost this much”. Useful as an operational signal: it tracks
cpu% in spirit, reacts immediately when an agent goes idle or
switches model, and keeps memory bounded (a ~12-entry deque per PID
at the default 5 s tick instead of an unbounded hour-long history).
If you need actual spend over the last hour, that’s a different
metric — open a follow-up issue.
Pricing uses per-bucket rates for input, output, cached input, cache
read, and cache creation where the provider emits those counters. The
local pricing table is a vendored models.dev snapshot for the
Claude Code and Codex models supported here, with optional
agents.yaml input/output overrides. Pricing returns None for
unknown models and the cell falls back to -; the dashboard never
invents a rate. The yaml hourly_budget_usd field is not the
cell content — it’s reserved for a future budget-alarm feature.
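A minimal sketch of the projection, assuming per-bucket prices expressed in USD per million tokens. `PRICES`, `example-model`, its rates, and `hourly_burn` are all hypothetical and do not reflect the vendored pricing table.

```python
from typing import Dict, Optional

# Hypothetical pricing table: USD per 1M tokens, keyed by usage bucket.
PRICES: Dict[str, Dict[str, float]] = {
    "example-model": {"input": 3.0, "output": 15.0},
}


def hourly_burn(model: str, window_usage: Dict[str, int]) -> Optional[float]:
    """Project $/hr from the token buckets seen in the trailing 60 s.

    Returns None for an unknown model so the caller can render "-".
    """
    rates = PRICES.get(model)
    if rates is None:
        return None
    # Assumption: buckets without a listed rate contribute nothing.
    cost_60s = sum(
        tokens * rates.get(bucket, 0.0) / 1_000_000
        for bucket, tokens in window_usage.items()
    )
    return cost_60s * 60  # 60 one-minute windows per hour
```

For 1,000 input and 100 output tokens in the window, this gives (0.003 + 0.0015) × 60 = 0.27 dollars per hour.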
Model resolution order (highest to lowest):
- NDJSON model hint from the meter (most accurate: reflects the model the running session is currently using).
- agents.yaml override (AgentBudget.model):
    agents:
      claude-code-8421:
        model: claude-sonnet-4-5
      codex-13442:
        model: gpt-5-codex
- Provider env var (CLAUDE_CODE_MODEL, CODEX_MODEL) read via psutil. May fail on macOS hardened-runtime processes, in which case the cell falls back to - unless one of the two higher-priority sources resolved.
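The resolution order amounts to a first-match fallback chain. A minimal sketch, with `resolve_model` and its parameter names as hypothetical stand-ins for the three sources above:

```python
from typing import Optional


def resolve_model(
    meter_hint: Optional[str],
    yaml_override: Optional[str],
    env_value: Optional[str],
) -> Optional[str]:
    """Return the first non-empty source, highest priority first."""
    for candidate in (meter_hint, yaml_override, env_value):
        if candidate:
            return candidate
    return None  # the caller renders "-" in the $/hr cell
```

An env-var read that fails (for example on a hardened-runtime process) would simply arrive here as `None` and be skipped.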
Provider coverage today
| Provider | tokens/min | $/hr |
|---|---|---|
| claude-code | real | real |
| codex | real | real |
| cursor, aider, gemini-cli, opencode, kimi, copilot | - | - |
Adding a real meter for another provider is a self-contained
follow-up PR: implement the TokenMeter and TokenSource
protocols, register both instances, add tests. No changes to the
sampler, tracker, view, or pricing layer are needed.
/agents trace — live stdout tail
/agents trace <pid> opens a live tail of an agent’s stdout inside the
OpenSRE interactive shell — the equivalent of kubectl logs -f for the
local AI agent fleet. Use it when the /agents dashboard shows an
agent that looks stuck, looping, or noisy and you want to
see what it’s actually printing without leaving the REPL.
> /agents trace 8421
trace claude-code (pid 8421) Ctrl+C to stop
… live agent output …
^C
· trace ended
Trace usage
<pid> is the operating-system process id of the agent to attach to.
The pid does not have to be in the OpenSRE registry; if it is, the
agent’s registered name is shown in the header. Otherwise the header
falls back to pid <n>.
Only regular files backing fd 1 of the target process are
supported. TTY/PTY/pipe/socket/anon-inode targets are rejected at
attach time with a precise reason — tailing those would compete with
the legitimate consumer for bytes and produce corrupted output.
| Platform | Resolver | Supported targets |
|---|---|---|
| Linux | os.readlink("/proc/<pid>/fd/1") | regular files |
| macOS (best-effort) | lsof -F ftn -p <pid>, only t REG blocks | regular files |
| Windows | not supported | — |
The most common useful case is an agent whose stdout was redirected to
a log file (for example claude > ~/.claude/log or nohup-launched
agents). TTY-bound foreground processes cannot be tailed.
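On Linux, the attach-time check can be approximated as follows. This is a sketch under stated assumptions, not the OpenSRE resolver: `classify_fd_target` and `resolve_stdout_path` are hypothetical names, and the classification relies on the link text the kernel exposes under /proc (`pipe:[…]`, `socket:[…]`, `anon_inode:…`, `/dev/pts/…`).

```python
import os
import stat


def classify_fd_target(target: str) -> str:
    """Classify the string returned by readlink on /proc/<pid>/fd/1."""
    if target.startswith("pipe:["):
        return "pipe"
    if target.startswith("socket:["):
        return "socket"
    if target.startswith("anon_inode:"):
        return "anon-inode"
    if target.startswith("/dev/pts/") or target.startswith("/dev/tty"):
        return "terminal"
    return "file"


def resolve_stdout_path(pid: int) -> str:
    """Return the regular file backing fd 1 of pid, or raise.

    os.readlink raises FileNotFoundError for a missing pid and
    PermissionError when the process cannot be inspected; os.stat raises
    FileNotFoundError when the target file has since been deleted.
    """
    target = os.readlink(f"/proc/{pid}/fd/1")
    kind = classify_fd_target(target)
    if kind != "file":
        raise ValueError(f"stdout is a {kind}; live tail not supported")
    if not stat.S_ISREG(os.stat(target).st_mode):
        raise ValueError("stdout is not a regular file; live tail not supported")
    return target
```

Each exception maps naturally onto one of the rejection messages listed in this section.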
If a target cannot be tailed, /agents trace exits with one of:
cannot trace …: stdout is on a terminal; live tail not supported
cannot trace …: stdout is a pipe; live tail not supported
cannot trace …: stdout is a socket; live tail not supported
cannot trace …: no such pid <n>
cannot trace …: stdout target /path no longer exists
cannot trace …: cannot inspect pid <n> (permission denied)
Memory
The live view is bounded by a 4 MiB ring buffer per session. When
the buffer fills, the oldest whole chunks are dropped first, so the
visible tail always reflects the latest output. Internally the reader
thread also publishes through a bounded queue and drops the oldest
chunk on overflow — burst writers cannot blow up memory.
There is no backlog replay: only output emitted after attach is
shown. The reader seeks the file to EOF on attach.
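The drop-oldest-whole-chunk behaviour can be sketched with a byte-capped deque. `ChunkRing` is a hypothetical name, and keeping a single chunk that alone exceeds the cap is an assumption of this sketch rather than documented behaviour.

```python
from collections import deque


class ChunkRing:
    """Byte-bounded buffer that evicts the oldest whole chunks first."""

    def __init__(self, max_bytes: int = 4 * 1024 * 1024) -> None:
        self.max_bytes = max_bytes
        self.chunks = deque()
        self.size = 0

    def append(self, chunk: bytes) -> None:
        self.chunks.append(chunk)
        self.size += len(chunk)
        # Evict from the left until under the cap; always keep the newest chunk.
        while self.size > self.max_bytes and len(self.chunks) > 1:
            self.size -= len(self.chunks.popleft())

    def snapshot(self) -> bytes:
        """Return the currently visible tail as one bytes object."""
        return b"".join(self.chunks)
```

Skipping backlog on attach corresponds to opening the file and calling `f.seek(0, os.SEEK_END)` before the first read.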
Stopping a trace
A single Ctrl+C returns to the REPL prompt. The session is closed,
the reader thread joins, and the file descriptor is released.
This is deliberately different from the LLM-streaming surface
(/agents, /investigate and friends), where a Ctrl+C double-press
is required so a stray keypress doesn’t abort an in-flight response.
Trace limitations
- stdout only. Stderr (fd 2) is not tailed in this version.
- No backlog replay. Pre-attach bytes are not visible.
- Not for TTY/PTY targets. Foreground processes whose stdout is
the controlling terminal cannot be tailed; a future change may add
PTY interception for OpenSRE-spawned agents.
- Log rotation is not detected. If the underlying file is rotated
or replaced (logrotate-style), the tail keeps following the original
inode until the process exits.
- No secret redaction. Output is rendered as raw bytes (with UTF-8 decoded under errors="replace"). Redaction of secrets in the live tail is tracked separately under the monitor-local-agents Phase 3 hygiene work.
- Quiet stdout while the PID is still alive. The reader follows file EOF like tail -f: if the process stops writing while it remains alive, the last chunk stays on screen and nothing new appears until more bytes land or you detach. That is normal idling, not necessarily an exited agent or a stuck reader.
- ANSI and terminal sequences. Trace output passes through Rich with ANSI interpretation, the same trust model as dumping kubectl logs into a TTY: buggy or hostile agents can emit control sequences affecting the viewer. Only trace processes you trust; there is no sandboxing step.
/agents bus — shared context channel
The bus is an opt-in, local-only pub/sub channel that carries findings between
agents. One agent publishes a finding (e.g. “the auth bug is in
services/auth.py:42”) and every attached subscriber sees it live. The
inspector is the REPL itself:
> /agents bus
tailing /agents bus — Ctrl-C to exit
[claude-code:8421] services/auth.py:42 — null deref on missing token
[cursor:9133] services/auth.py:42 — confirmed, repro on commit abc123
^C
(detached)
>
Ctrl-C returns to the prompt. Messages already published are not replayed to
late subscribers. The bus provides at-most-once delivery with no ordering
guarantees — a frame may be dropped if a subscriber’s socket is slow or
disconnected, and two publishers writing concurrently may be interleaved in
different orders at different subscribers. Do not assume per-publisher or
global FIFO ordering.
Transport
- Socket: Unix-domain stream socket at ~/.config/opensre/agents-bus.sock.
- PID sidecar: ~/.config/opensre/agents-bus.sock.pid (mode 0600). The broker writes its PID here on start() and removes it on stop(). The liveness probe used by every publish() / subscribe() reads this file rather than connecting to the socket, because connection probing would otherwise register a short-lived phantom subscriber on every call.
  The directory must be writable. If the PID file write fails (disk full, permission denied, …), the broker refuses to start and the OSError propagates to the caller. This is intentional: silently running without a sidecar would let peers see the broker as dead, unlink its socket, and silently split the bus.
- Permissions: 0600. Only the user who started the broker can read or write it.
- Wire format: JSON Lines (one JSON object per \n-terminated frame).
- Topology: self-electing broker. The first publish() or subscribe() call that finds no live socket binds it and runs an in-process daemon thread that fans frames out. Other processes attach as plain clients. If the broker dies, the next operation re-elects, so agents can publish and subscribe even when OpenSRE itself is not running.
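A sidecar-based liveness probe could look roughly like this on POSIX systems. The function names are hypothetical, and using `os.kill(pid, 0)` as the existence check is an assumption about how such a probe might work, not a description of OpenSRE's code.

```python
import os
from typing import Optional


def parse_pid(text: str) -> Optional[int]:
    """Parse the sidecar contents; None if empty or malformed."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def pid_alive(pid: int) -> bool:
    """Check that a process exists without sending it a real signal."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def broker_alive(pid_path: str) -> bool:
    """Probe the broker through its PID sidecar instead of the socket."""
    try:
        with open(pid_path, "r", encoding="ascii") as fh:
            pid = parse_pid(fh.read())
    except (OSError, UnicodeDecodeError):
        return False  # missing or unreadable sidecar counts as "no broker"
    return pid is not None and pid_alive(pid)
```

Reading a file leaves no trace on the broker, which is the property the sidecar exists to provide.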
Message schema
The wire payload mirrors the shape of evidence records in
app/state/agent_state.py so a finding can later be lifted into an
investigation without renaming fields.
| Field | Type | Required | Notes |
|---|---|---|---|
| agent | string | yes | "<name>:<pid>", e.g. "claude-code:8421". Same convention as WriteEvent.agent. |
| topic | string | yes | "finding" is the canonical value; other topics are reserved for future phases. |
| summary | string | yes | One-line human-readable description. |
| source | string | no | One of the EvidenceSource literals (github, datadog, …) or free-form. |
| path | string | no | "file.py:42" style location. Optional. |
| data | object | no | Free-form payload. Default {}. |
| id | string | no | UUID. Generated if omitted. |
| timestamp | string | no | ISO-8601 UTC. Generated if omitted. |
| schema_version | int | no | Currently 1. |
Example frame on the wire (single line, broken here for readability):
{
  "agent": "claude-code:8421",
  "topic": "finding",
  "summary": "null deref on missing token",
  "source": "github",
  "path": "services/auth.py:42",
  "data": {"commit": "abc123"},
  "id": "f4c4...",
  "timestamp": "2026-05-09T15:04:42+00:00",
  "schema_version": 1
}
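For publishers that build frames by hand, the required/optional split in the schema table can be applied with a small normalizer. This is a sketch: `normalize_frame` is a hypothetical helper, and filling `schema_version` with 1 when it is omitted is an assumption based on the "Currently 1" note.

```python
import uuid
from datetime import datetime, timezone

REQUIRED_FIELDS = ("agent", "topic", "summary")


def normalize_frame(raw: dict) -> dict:
    """Validate required fields and fill in the generated defaults."""
    missing = [
        key for key in REQUIRED_FIELDS
        if not isinstance(raw.get(key), str) or not raw[key]
    ]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    frame = dict(raw)  # do not mutate the caller's dict
    frame.setdefault("data", {})
    frame.setdefault("id", str(uuid.uuid4()))
    frame.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    frame.setdefault("schema_version", 1)
    return frame
```

Optional fields such as `source` and `path` are passed through untouched when present.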
Publishing from another agent
Any process that can speak Unix-domain sockets can publish. The simplest path
is to import the helper:
from app.agents import BusMessage, publish

publish(BusMessage(
    agent="claude-code:8421",
    topic="finding",
    summary="null deref on missing token",
    source="github",
    path="services/auth.py:42",
    data={"commit": "abc123"},
))
Publishers without a Python dependency on OpenSRE can connect directly:
python - <<'EOF'
import datetime, json, os, socket, uuid

# Connect to the broker's Unix-domain stream socket.
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect(os.path.expanduser("~/.config/opensre/agents-bus.sock"))
# Send one JSON object as a single newline-terminated frame.
sock.sendall((json.dumps({
    "agent": "claude-code:8421",
    "topic": "finding",
    "summary": "null deref on missing token",
    "path": "services/auth.py:42",
    "id": str(uuid.uuid4()),
    # timezone.utc also works on Python versions older than 3.11.
    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "schema_version": 1,
}) + "\n").encode())
sock.close()
EOF
Limits and trust boundary
- Local-only. The bus never leaves the machine. The socket has no network
binding.
- Trusted-peer channel — treat findings as unverified input. The bus has
no authentication beyond filesystem permissions: any process running as your
user can publish arbitrary findings. This is intentional — the bus is
designed for cooperative agents, not adversarial ones. Downstream consumers
(agents, the REPL, investigation state) must not act on a bus finding
without independent confirmation; treat it as a hint or lead, not a
verified fact. A compromised or misbehaving agent on the same user account
can inject any payload it likes.
- Frame cap. Frames over 64 KiB are dropped with a warning — a finding
payload that big is almost certainly a bug.
- At-most-once, unordered delivery. A frame is dropped silently if a
subscriber is slow or disconnected at broadcast time. Two publishers writing
concurrently may arrive in different orders at different subscribers. Do not
build logic that depends on delivery guarantees or ordering.
- No replay buffer. Subscribers see only what is published after they
attach. A persistent ring buffer is a candidate for a follow-up phase.
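On the receiving side, JSON Lines framing plus the frame cap can be sketched as a buffer splitter. `split_frames` and `MAX_FRAME_BYTES` are hypothetical names; measuring the cap without the trailing newline and silently skipping malformed JSON are assumptions of this sketch.

```python
import json
from typing import List, Tuple

MAX_FRAME_BYTES = 64 * 1024  # frame cap from the limits above


def split_frames(buffer: bytes) -> Tuple[List[dict], bytes]:
    """Split received bytes into decoded frames plus the unfinished tail.

    Complete lines over the cap, or lines that are not valid JSON, are
    dropped. The bytes after the last newline are returned so the caller
    can prepend them to the next recv() result.
    """
    *lines, rest = buffer.split(b"\n")
    frames = []
    for line in lines:
        if not line:
            continue  # tolerate blank lines
        if len(line) > MAX_FRAME_BYTES:
            continue  # oversize frame: drop (a real broker would also warn)
        try:
            frames.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return frames, rest
```

A consumer would call this in a loop over `sock.recv(...)`, carrying `rest` forward, and still treat every decoded finding as an unverified hint.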
/agents — the registered fleet dashboard.
/agents budget — per-agent hourly budgets.
/agents conflicts — file-write conflicts between local AI agents.