<!-- Source of truth: DOCS.md · canonical page: https://delfhos.com/docs/explanation · v0.8.7 -->
<!-- This is the raw-markdown rendition for LLMs and tools. -->

# Explanation

*Explanations are understanding-oriented. They discuss the "why" and "how" behind Delfhos's design, helping you build a mental model.*

---

## The two-package architecture

Delfhos is split into two Python packages in the same repository:

- **`delfhos/`** — The public API. Everything a user imports comes from here. It is intentionally thin: it validates inputs, sets up types, and delegates to `cortex`. This separation keeps the public API stable while internal implementation details can evolve.

- **`cortex/`** — The internal engine. It contains the orchestrator, LLM integration, tool execution sandbox, approval manager, connection implementations, and the OpenAPI compiler for REST API tools. Users never import from `cortex` directly.

This design means the internal engine can be improved, refactored, or even replaced without changing how users write their code.

---

## How the orchestration loop works

When you call `agent.run("task")`, the following pipeline executes:

```
1. Memory retrieval
   └── If a Memory is attached, run semantic search against stored facts.
       Top-k facts are injected into the system prompt.

2. Tool prefiltering (optional)
   └── If enable_prefilter=True, the light_llm reads the task and the list
       of available tools, then selects the relevant subset. This reduces
       the number of tool API docs included in the next step.

3. Schema loading (SQL only)
   └── If a SQL connection is in the selected tools, the actual table schemas
       are fetched from the database and included in the code generation prompt.

4. Code generation
   └── The heavy_llm receives:
         - The system prompt
         - Known facts from memory
         - The chat history (if Chat is attached)
         - The task description
         - API documentation for the selected tools
         - The actions allowed by each connection's `allow` list
         - Workspace file paths (if files= was provided)
       It responds with a Python code block.

5. Approval gate (optional)
   └── If any tool has confirm=True (or confirm=[action list]) and the
       generated code calls that action, execution pauses and an approval
       request is created. The agent waits until a human approves or rejects.

6. Sandboxed execution
   └── The generated code runs inside an isolated execution environment.
       If Docker is available (sandbox="auto" or sandbox="docker"), the code
       runs in a disposable container with no network, read-only filesystem,
       memory/CPU caps, and zero Linux capabilities.
       All tool calls (gmail.send, sql.query, etc.) are proxied back to the
       host process via a Unix socket, so API credentials never enter the container.
       Workspace files (from files=) are bind-mounted at /workspace/<filename>:ro.
       Output files written via add_to_output_files() are collected from /output/.
       If Docker is unavailable, the local sandbox is used: restricted builtins,
       import whitelist, and a timeout are enforced in-process.
       A timeout is always enforced regardless of sandbox mode.

7. Rerun loop (adaptive replanning)
   └── If the generated code calls rerun(context=..., remaining=...) during execution,
       the current script exits cleanly — no error, no failure.
       The orchestrator runs a focused code-generation pass using the runtime context
       the script reported and the remaining-work description it provided.
       The same sandbox namespace is reused, so any variables already assigned
       (fetched data, partial results) are available to the new script without re-fetching.
       This repeats up to rerun_count times (default 2).
       See the rerun() guide for when to use this vs. normal Python control flow.

8. Retry loop
   └── If execution raises an exception, the error is fed back to the LLM
       for a corrected code generation. This repeats up to retry_count times.
       The retry loop runs independently after the rerun loop — if a rerun pass
       produces code that then raises an error, auto-retry applies to it as well.

9. Result composition and return
   └── The final output (stdout, return value, or error) is collected.
       Any output files registered via add_to_output_files() are resolved to
       host paths and stored in Response.files.
       Token counts and cost are calculated.
       The result is added to Chat history (if enabled).
       A Response object is returned.
```

---

## How memory retrieval works

Delfhos uses **semantic search** (not keyword search) for memory retrieval:

1. When you call `memory.save("some fact")`, the text is embedded using a Sentence Transformers model and stored in SQLite alongside the original text.
2. When the agent starts a task, the task string is embedded using the same model.
3. Cosine similarity is computed between the task embedding and every stored fact embedding.
4. The top-K facts above a similarity threshold (default 0.3) are returned.
5. These facts are injected into the LLM's system prompt.

This means facts are retrieved based on *meaning*, not exact wording. A task that says "email the VP of Sales" will retrieve facts about "Alice Chen, VP Sales, alice@acme.com" even though neither string appears in the task.

The embedding model runs locally (no API call required). The default model (`all-MiniLM-L6-v2`, ~90 MB) is downloaded from Hugging Face on first use.

---

## How the `allow` and `confirm` permission model works

Delfhos has a two-layer permission system on every tool. The layers run at different points in the execution pipeline and serve fundamentally different purposes.

**Layer 1 — `allow` (compile time, before code generation)**

When `allow` is set, the restricted actions are stripped from the tool's API documentation before it is sent to the LLM. The model never learns those actions exist. It cannot plan them, cannot generate code that calls them, and will not attempt them even if instructed to. This is not a guardrail — it is an information boundary.

This happens at step 4 of the orchestration loop ("Schema loading" → "Code generation"). The LLM only ever sees the actions you have permitted.

**Layer 2 — `confirm` (runtime, before execution)**

When `confirm` is set, the generated code is inspected after the LLM produces it but before it runs. If any call matches a confirmed action, execution is suspended and an approval request is created. The sandboxed execution only begins once a human approves.

This happens at step 5 of the orchestration loop ("Approval gate"). The agent has already decided *what* to do; you are deciding whether it is allowed to *do it now*.

**Why both?**

- Use `allow` to shrink the attack surface permanently: the agent is structurally incapable of calling actions outside the list, regardless of how the task is phrased.
- Use `confirm` for actions that are legitimate but high-stakes: the agent needs the capability (e.g., `send`, `delete`) but a human should review before it fires.
- They compose: `allow=["read", "send"], confirm=["send"]` means the agent can only read and send, and all sends require approval.

## How the approval system works

The approval system is designed to be both developer-friendly and production-ready.

**Granularity:** Approval is configured per-connection (`Gmail(confirm=["send"])`), meaning you can require approval only for destructive or sensitive actions while allowing safe actions to run automatically.

**Three modes:**

1. **Interactive (default):** When the agent pauses, a terminal prompt appears showing the tool name, the action, and a preview of the parameters. The developer selects Approve or Reject.

2. **Custom callback:** `on_confirm=fn` lets you integrate with external systems — Slack, email, a web dashboard — by writing a function that returns `True` (approve), `False` (reject), or `None` (fall back to default UI).

3. **Programmatic:** When using `submit()`, you poll `agent.get_pending_approvals()` and call `agent.approve()` or `agent.reject()` from your own code (e.g., a web API handler).

**What the LLM sees:** When a request is rejected, the rejection reason is fed back to the LLM as context so it can revise its approach.

---

## How tool code generation works

Delfhos does not call tools through function-calling APIs. Instead, it uses **code generation**: the LLM writes a short Python script that calls the tool library, and Delfhos executes that script in a sandbox.

This approach has several advantages:

- **Composability:** The LLM can write loops, conditionals, list comprehensions, and multi-step logic combining multiple tools in a single generated script.
- **Transparency:** The generated code is human-readable and can be inspected or logged.
- **Retry with context:** When code fails, the error traceback is fed back to the LLM, which often generates a correct fix on the next attempt.
- **Flexibility:** The LLM is not constrained to predefined call patterns; it can use any Python construct to accomplish the task.

The sandbox restricts the execution environment: only the tool library objects are in scope, dangerous builtins are removed, and a timeout is enforced. When Docker is available, the sandbox runs as a disposable container with full OS-level isolation (see [sandbox configuration](#how-to-configure-the-execution-sandbox)).

---

## How the dual-LLM architecture reduces costs

Many tasks involve two distinct cognitive loads:

1. **Routing** — which tools are relevant to this task?
2. **Generation** — what code should those tools execute?

Routing is simple (fast, cheap model is fine). Generation requires deep reasoning and knowledge of the tool APIs (expensive model needed).

By splitting these across a `light_llm` and `heavy_llm`:
- Simple routing is handled cheaply
- The heavy model only sees a small, focused context (thanks to prefiltering)
- Total cost is significantly lower than using a powerful model for everything

The `vision_llm` override lets you use a specialized model for multimodal tasks without changing the main model.

---

## How Chat auto-summarization works

Without summarization, conversation history grows linearly. With it:

1. After each message, Delfhos checks if the message count exceeds `keep`.
2. If it does, the oldest `(count - keep)` messages are extracted.
3. A separate LLM call (using `summarizer_llm`) produces a concise summary.
4. The original messages are discarded; the summary is stored as a special system message.
5. Future LLM calls receive: the summary + the most recent `keep` messages.

This keeps context size bounded at `~keep` messages regardless of conversation length, dramatically reducing token costs in long sessions.

---

## How APITool works

`APITool` connects any REST API to a Delfhos agent through a pipeline with two optional quality layers on top:

1. **Compilation** *(always, no LLM):* The `OpenAPICompiler` reads the OpenAPI 3.x spec (JSON or YAML, local file or URL), resolves all `$ref` pointers, and transforms every operation into a Delfhos-native tool entry. Each entry contains a Python function signature, parameter descriptions, and a compressed API doc for code generation. Large specs are compiled in parallel using a thread pool.

2. **LLM Enrichment** *(optional, `enrich=True`):* After compilation, the `OpenAPICompiler.enrich()` method sends all endpoint descriptions to an LLM in a single call. The LLM rewrites descriptions to be more actionable for an AI agent and infers response schemas for endpoints where the spec left them undocumented. The enriched manifest is written back to the cache — on all subsequent runs the manifest loads from disk and the LLM is never called again, so enrichment cost is incurred exactly once per spec version.

3. **Registration:** Compiled (and optionally enriched) entries are registered into three internal stores — `TOOL_REGISTRY` (for the prefilter LLM), `TOOL_ACTION_SUMMARIES` (for prefilter ranking), and `COMPRESSED_API_DOCS` (for code generation prompts).

4. **Execution:** The `APIExecutor` receives calls from the agent's generated code and maps every argument to the correct HTTP location based on the spec (`path`, `query`, `header`, or `body`). Before sending, it injects the three categories of auth/config values supplied at construction time:
   - `headers` → merged into the request headers
   - `params` → merged into the query string
   - `path_params` → substituted into `{placeholder}` segments in the URL template (URL-encoded automatically)

   The `desc` keyword argument used internally by the approval system is stripped before the request is built. The HTTP call is made via `httpx` with a 30-second timeout and redirect following enabled.

5. **Background Schema Sampling** *(optional, `sample=True`, default):* After each successful API call that returns JSON, a daemon thread infers the exact response schema from the real data using `_infer_schema()` and saves it to `sampled_schemas.json` in the cache directory. On the next compile or cache load, sampled schemas are merged back into the manifest's `response_hint` fields — the agent's knowledge of each endpoint's output improves automatically with use, at zero token cost and zero latency.

### Token tracking and cost attribution

When `enrich=True` is set on any `APITool`, the execution trace separates setup cost from task cost:

- **Setup cost** — tokens spent on LLM enrichment during `Agent` startup. Shown as `$0.000000 (cached)` on subsequent runs.
- **Task cost** — tokens spent on code generation and execution for the actual user task.

Both are visible in the `Trace.summary()` output and accessible on `trace.api_enrichment` (`EnrichmentTrace`) and `trace.total_cost_usd`.

The result: any API with an OpenAPI spec is fully usable as an agent tool with zero hand-written adapter code, progressively better response-schema knowledge, and transparent cost attribution.

---

## How the execution sandbox works

Delfhos uses a layered sandbox with a pluggable backend system. The `SandboxExecutor` selects the strongest isolation available at runtime.

### Two backends

**Local backend (process-level)**

Used when Docker is not available (or `sandbox="local"`). The generated code is compiled and `exec()`'d inside the host Python process with:

- Restricted `__builtins__` — only a safe subset of built-in functions is exposed; `eval`, `exec`, `compile`, `open`, `__import__`, and other dangerous functions are removed or replaced
- Import whitelist — only `asyncio`, `datetime`, `json`, `math`, `pathlib`, `re`, `statistics`, and `time` may be imported; any other `import` raises an error
- Tool library injection — `gmail`, `sql`, `sheets`, etc. are pre-injected as async objects; the LLM code calls them directly
- Timeout enforcement via `asyncio.wait_for()`

**Docker backend (container-level)**

Used when Docker is available (or `sandbox="docker"`). The generated code runs in a disposable container:

```
┌─────────────────── Host Process ────────────────────────┐
│                                                          │
│  Orchestrator ──▶ SandboxExecutor ──▶ DockerSandbox     │
│       ↑                                      │          │
│       │                          Unix Socket │          │
│       │                          (RPC bridge)│          │
│       │                                      ▼          │
│       │                  ┌── Docker Container ──────┐   │
│       │                  │  agent_runner.py          │   │
│       │                  │    exec(agent_code)       │   │
│       │◄─── tool result  │    proxy.gmail.send() ───┼───┘
│       │                  │                           │
│       │                  │  Resource limits:         │
│       │                  │  • No network             │
│       │                  │  • Read-only filesystem   │
│       │                  │  • 512 MB RAM cap         │
│       │                  │  • 1 CPU core             │
│       │                  │  • PID limit 64           │
│       │                  └───────────────────────────┘
└──────────────────────────────────────────────────────────┘
```

The key architectural point: **API credentials never enter the container**. When agent code calls `await gmail.send(...)`, the call travels over a Unix socket to the host process, where the real Gmail library executes it (including any human approval step). The container receives only the serialized result.

This means:
- A container escape cannot exfiltrate OAuth tokens or database credentials
- All existing approval flows work unchanged — the host pauses mid-execution as usual
- The container can be fully network-isolated even though the agent makes API calls

### Backend selection

```python
# At runtime: picks best available
SandboxExecutor(mode="auto")   # Docker → local fallback
SandboxExecutor(mode="docker") # Docker required
SandboxExecutor(mode="local")  # Always in-process
```

The selection is logged: when Docker is unavailable in `"auto"` mode, a warning is emitted (`Docker unavailable — falling back to local backend`) so you can see which isolation level is active.

---

## How `rerun()` works

`rerun()` is a cooperative escape hatch built into the sandbox. It bridges a fundamental limitation of code generation: the LLM writes a complete script before any tool is called, but sometimes the correct code for step N cannot be written without first seeing the output of step N-1.

### The mechanism

When generated code calls `rerun(context, remaining)`, the sandbox raises an internal sentinel exception (`_RerunSignal`). This is caught before the normal error handler, so it is never treated as a failure. The execution result carries three new fields:

```
{ "rerun_requested": True, "rerun_context": "...", "rerun_remaining": "..." }
```

Back in the orchestrator's execution loop, this result triggers a second code-generation call. The prompt for this call is a focused continuation prompt — not a re-statement of the original task — with three key injections:

1. **The original task** — so the LLM understands what was being accomplished.
2. **The runtime context** — what the script learned (actual column names, a sample row, a schema snippet).
3. **The preserved variables** — a list of variable names that are still in the sandbox namespace from the first pass. The LLM can reference them directly without re-fetching.

The new code is executed on the **same `PythonExecutor` instance**, so the namespace (and all variables in it) is shared. Only the code changes; the data does not.

### `rerun()` vs. auto-retry

These two mechanisms solve different problems:

| | `rerun()` | Auto-retry |
|---|---|---|
| Triggered by | Agent code calling `rerun()` proactively | An unhandled exception during execution |
| Signal | Intentional, clean exit | Error |
| Prompt style | Continuation: "here is what I learned, do the remaining work" | Fix: "here is the error, generate corrected code" |
| Namespace | Preserved — script chose which data to keep | Preserved — error-recovery variables injected |
| Token cost | One full code-gen call per iteration | One full code-gen call per retry |

They compose: a rerun pass can be followed by an auto-retry if the rerun's code itself raises an error.

### Why not just write more general code?

In many cases the LLM can write generic handlers (`if isinstance(value, list): ...`, `.get(key, default)`) that work across all possible response shapes. When that is possible, it is preferable — no rerun needed.

`rerun()` is the right choice when the response structure is genuinely unknowable at codegen time AND correct handling requires specific, non-generic code. A common example: building a markdown table where every column header must match an actual data key. Generic code would either fail or produce incorrect output; a targeted second pass gets it right.

---

## Permission model philosophy

Every connection has two independent controls:

- **`allow`** — what the agent *can* do. Actions not in this list are invisible to the LLM; it cannot generate code that calls them. This is an absolute restriction.

- **`confirm`** — what the agent must *ask before* doing. Actions in this list are available to the LLM but require human approval before execution. This is a guardrail for sensitive operations.

The typical pattern for a safe deployment:

```python
# Can read anything without asking; must ask before sending
Gmail(oauth_credentials="...", allow=["read", "send"], confirm=["send"])

# Can read and query without asking; cannot write or modify anything
SQL(url="...", allow=["schema", "query"])

# Can do everything except delete; must ask before sharing
Drive(oauth_credentials="...", allow=["search","get","create","update","share"], confirm=["share"])
```

This two-layer model gives you both capability control (what can it do at all?) and safety control (what does it need to ask about?).
