<!-- Source of truth: DOCS.md · canonical page: https://delfhos.com/docs/how-to · v0.8.7 -->
<!-- This is the raw-markdown rendition for LLMs and tools. -->

# How-to Guides

*How-to guides are problem-oriented. They show you how to accomplish a specific goal, assuming you already have basic familiarity with Delfhos.*

---

## How to connect to a SQL database

```python
from delfhos import Agent, SQL

# Option A: connection URL
db = SQL(url="postgresql://user:password@localhost:5432/mydb")

# Option B: individual parameters
db = SQL(
    host="db.example.supabase.co",
    port=5432,
    database="postgres",
    user="postgres",
    password="secret",
    db_type="postgresql",   # or "mysql" / "mariadb"
)

agent = Agent(tools=[db], llm="gemini-3.5-flash")
result = agent.run("How many users signed up last week?")
print(result.text)
agent.stop()
```

**Supported databases:** PostgreSQL, MySQL, MariaDB.

**Available actions:** `schema` (inspect tables), `query` (SELECT), `write` (INSERT/UPDATE/DELETE).

```python
# Restrict to read-only
db = SQL(url="...", allow=["schema", "query"])
```

---

## How to connect to Google Sheets

```python
from delfhos import Agent, SQL, Sheets

sheets = Sheets(oauth_credentials="client_secrets.json")
db = SQL(url="postgresql://...")

agent = Agent(tools=[db, sheets], llm="gemini-3.5-flash")

result = agent.run(
    "Pull last month's revenue by region from the database "
    "and write it to the 'Revenue Q3' sheet, creating it if it doesn't exist."
)
agent.stop()
```

**Available actions:** `read`, `write`, `create`, `format`, `chart`, `batch`.

---

## How to connect to Google Drive

```python
from delfhos import Agent, Drive, Gmail

drive = Drive(
    oauth_credentials="client_secrets.json",
    allow=["search", "get", "create", "update"],   # No delete or share
    confirm=["create", "update"],                   # Approve writes
)

agent = Agent(tools=[drive, Gmail(oauth_credentials="client_secrets.json")], llm="gemini-3.5-flash")

result = agent.run(
    "Find all PDF files in the 'Reports/Q3' folder and email them to finance@company.com"
)
agent.stop()
```

**Available actions:** `search`, `get`, `create`, `update`, `delete`, `list_permissions`, `share`, `unshare`.

---

## How to connect to Google Docs, Calendar, and more

```python
from delfhos import Agent, Docs, Calendar, WebSearch

docs     = Docs(oauth_credentials="client_secrets.json")
calendar = Calendar(oauth_credentials="client_secrets.json")
search   = WebSearch(llm="gemini-3.5-flash")

agent = Agent(
    tools=[docs, calendar, search],
    llm="gemini-3.5-flash",
)

agent.run(
    "Research the latest Python packaging best practices online "
    "and write a summary document called 'Python Packaging Guide'."
)

agent.run(
    "Find a free 30-minute slot this Friday afternoon "
    "and create a calendar event called 'Team Sync'."
)

agent.stop()
```

**Docs actions:** `read`, `create`, `update`, `format`, `delete`.

**Calendar actions:** `list`, `get`, `create`, `update`, `delete`, `respond`.

---

## How to connect to any REST API

`APITool` turns any OpenAPI 3.x specification into a set of callable agent actions. No code generation needed — the compiler reads the spec and registers every endpoint automatically.

```python
from delfhos import Agent, APITool

# From a public OpenAPI spec URL
petstore = APITool(
    spec="https://petstore3.swagger.io/api/v3/openapi.json",
    allow=["list_pets", "get_pet_by_id"],   # Restrict to specific endpoints
    confirm=["add_pet", "delete_pet"],       # Require approval for writes
)

agent = Agent(tools=[petstore], llm="gemini-3.5-flash")
agent.run("List all available pets and show their names")
agent.stop()
```

```python
# From a local spec file with auth headers
internal = APITool(
    spec="./openapi.yaml",
    base_url="https://api.internal.corp/v1",
    headers={"Authorization": "Bearer sk_..."},
)

# With query-param auth (appended to every request URL)
external = APITool(
    spec="https://api.example.com/openapi.json",
    params={"api_key": "my-key"},
)

# With fixed path parameters (substituted into URL templates)
# e.g. spec has paths like /api/{globalCompanyId}/orders
multitenant = APITool(
    spec="https://api.example.com/openapi.json",
    headers={"Authorization": "Bearer sk_..."},
    path_params={"globalCompanyId": "acme-corp"},
    # The LLM never sees globalCompanyId — it's auto-injected into every URL
)
```
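
To make "substituted into URL templates" concrete, here is a minimal standard-library sketch of filling `{name}` placeholders in a path. The helper `fill_path` and the `orderId` parameter are hypothetical and are not Delfhos internals. The sketch also assumes that fixed values take precedence over per-call values.

```python
import re
from urllib.parse import quote


def fill_path(template: str, fixed: dict, call_args: dict) -> str:
    """Replace {name} placeholders; fixed values win over per-call values."""
    values = {**call_args, **fixed}

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing path parameter: {name}")
        # Percent-encode so a value cannot inject extra path segments.
        return quote(str(values[name]), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


url_path = fill_path(
    "/api/{globalCompanyId}/orders/{orderId}",
    fixed={"globalCompanyId": "acme-corp"},   # configured once, like path_params
    call_args={"orderId": 42},                # supplied per call
)
print(url_path)
```

Only `orderId` would need to come from the caller in this sketch; the tenant segment is filled in from configuration.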

### Discover available endpoints before connecting

```python
# Class-level inspect — no Agent needed
print(APITool.inspect(spec="https://petstore3.swagger.io/api/v3/openapi.json"))
# → {"tool": "petstore3", "methods": ["list_pets", "add_pet", ...], "total": 19}

# Verbose mode shows method + path + description
print(APITool.inspect(spec="./openapi.yaml", verbose=True))
```

### Caching compiled specs

For large specs (Stripe, GitHub, etc.) that rarely change, enable the disk cache so the spec is only parsed once:

```python
api = APITool(
    spec="https://api.stripe.com/openapi.json",
    headers={"Authorization": "Bearer sk_live_..."},
    cache=True,   # Saved to ~/delfhos/api_cache/
)
```

### LLM enrichment — improve descriptions automatically

Pass `enrich=True` to have an LLM rewrite every endpoint description and infer missing response schemas before the agent runs. The enriched manifest is cached so the LLM is only called once per spec version.

If you do not pass `llm=`, the agent automatically uses the `light_llm` (or `llm` in single-model mode) for enrichment — no extra configuration needed:

```python
import os
from delfhos import APITool

finnhub = APITool(
    spec="https://finnhub.io/static/swagger.json",
    headers={"X-Finnhub-Token": os.environ["FINNHUB_API_KEY"]},
    cache=True,   # Required to persist enriched manifest
    enrich=True,  # Model is inferred from Agent's light_llm automatically
)

# Or specify a model explicitly:
finnhub = APITool(
    spec="https://finnhub.io/static/swagger.json",
    headers={"X-Finnhub-Token": os.environ["FINNHUB_API_KEY"]},
    cache=True,
    enrich=True,
    llm="gemini-3.5-flash",  # Override the model used for enrichment
)
```

Token usage and cost for enrichment are tracked separately from task cost and appear in the trace summary:

```
║ API ENRICHMENT            1,823ms                         ║
║   Model                   gemini-3.5-flash                ║
║   Endpoints enriched      12                              ║
║   Tokens in/out           1,024 / 487                     ║
║   Cost USD                $0.000312                       ║
╠═══════════════════════════════════════════════════════════╣
║   Setup cost (API enrich) $0.000312                       ║
║   Task cost               $0.004521                       ║
║   Cost USD                $0.004833                       ║
```

On subsequent runs the manifest loads from cache — the setup line shows `$0.000000 (cached)` and no LLM call is made.

### Background response schema sampling

`sample=True` (the default) silently captures the real response structure after each successful API call and saves it to the cache. No LLM, no tokens, zero latency impact. On the next run the agent's view of each endpoint's return type automatically improves.

```python
finnhub = APITool(
    spec="...",
    cache=True,
    sample=True,  # Default — capture real response schemas in background
)
```

Sampled schemas are stored in `~/delfhos/api_cache/{tool}_{hash}/sampled_schemas.json` and are merged into the manifest on every subsequent `compile()` or `load_cache()` call.
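
As an illustration of what "capturing the response structure" can mean, the sketch below derives a JSON-Schema-like description from a decoded response body. It is an assumption about the general technique, not the format Delfhos writes to `sampled_schemas.json`. The function name `infer_schema` and the sample payload are hypothetical.

```python
import json


def infer_schema(value):
    """Return a JSON-Schema-like description of a decoded JSON value."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Describe items from the first element only; an empty list stays untyped.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")


response_body = json.loads(
    '{"symbol": "AAPL", "price": 189.5, "tags": ["tech"], "open": true}'
)
schema = infer_schema(response_body)
print(json.dumps(schema, indent=2))
```

No model call is involved in this kind of inference, which is consistent with the "no LLM, no tokens" description above.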

---

## How to use local or custom OpenAI-compatible models

Use `LLMConfig` to point Delfhos at any OpenAI-compatible endpoint — local models, open-source servers, or third-party providers.

```python
from delfhos import Agent, LLMConfig

# Local Ollama model
agent = Agent(
    tools=[...],
    llm=LLMConfig(model="llama3.2", base_url="http://localhost:11434/v1"),
)

# LM Studio
agent = Agent(
    tools=[...],
    llm=LLMConfig(
        model="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
        base_url="http://localhost:1234/v1",
    ),
)

# Groq (cloud, OpenAI-compatible)
agent = Agent(
    tools=[...],
    llm=LLMConfig(
        model="llama-3.3-70b-versatile",
        base_url="https://api.groq.com/openai/v1",
        api_key="gsk_...",
    ),
)

# Enterprise server with multiple required auth headers
agent = Agent(
    tools=[...],
    llm=LLMConfig(
        model="llama-3-70b",
        base_url="https://llm.corp.internal/v1",
        headers={
            "X-Tenant-ID": "acme-prod",
            "X-User-Token": "tok_abc123",
            "X-Request-Source": "delfhos",
        },
    ),
)

# Mix: cheap local model for prefilter, strong cloud model for generation
agent = Agent(
    tools=[...],
    light_llm=LLMConfig(model="qwen2.5:7b", base_url="http://localhost:11434/v1"),
    heavy_llm="gpt-5.5",
)

# Per-model generation settings (temperature, top_k, max_tokens, etc.)
agent = Agent(
    tools=[...],
    llm=LLMConfig(
        model="gemini-3.5-flash",
        settings={
            "temperature": 0.8,
            "top_k": 40,
            "max_tokens": 1200,
        },
    ),
)

# Pythonic helper form
cfg = LLMConfig(model="gemini-3.5-flash")
cfg.with_settings(temperature=0.8, top_k=40, max_tokens=1200)

agent = Agent(tools=[...], llm=cfg)
```

`LLMConfig` works wherever a model string is accepted: `llm`, `light_llm`, `heavy_llm`, `vision_llm`.

> **Note on `headers` vs `api_key`:** Use `api_key` for a single bearer token (`Authorization: Bearer ...`). Use `headers` when your server requires additional fields — tenant IDs, session tokens, routing keys, etc. You can use both together: `api_key` sets the `Authorization` header and `headers` adds anything else on top.

> **Note on `settings` keys:** Use Python keys like `top_k` and `max_tokens`. Aliases like `"top-k"` and `"max-tokens"` are also accepted.

> **Provider inference:** If `base_url` is not set, `LLMConfig` infers the native provider from the model name (`gemini-*` → Google, `gpt-*`/`o*` → OpenAI, `claude-*` → Anthropic). If `base_url` is set, it uses the OpenAI-compatible protocol for custom endpoints.
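
The inference rule in the note can be sketched in a few lines. The patterns are copied from the note; the helper name `infer_provider`, the returned labels, and the error on unknown names are assumptions made for illustration, not the library's actual behavior.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns as written in the note above, mapped to illustrative provider labels.
PROVIDER_PATTERNS = [
    ("gemini-*", "google"),
    ("gpt-*", "openai"),
    ("o*", "openai"),
    ("claude-*", "anthropic"),
]


def infer_provider(model: str, base_url: Optional[str] = None) -> str:
    """Hypothetical helper: choose a provider the way the note describes."""
    if base_url:
        # A custom endpoint always means the OpenAI-compatible protocol;
        # the model name is not inspected in that case.
        return "openai-compatible"
    for pattern, provider in PROVIDER_PATTERNS:
        if fnmatchcase(model, pattern):
            return provider
    raise ValueError(f"cannot infer provider for {model!r}; set base_url")


print(infer_provider("gemini-3.5-flash"))
print(infer_provider("llama3.2", base_url="http://localhost:11434/v1"))
```

In this sketch a local model name such as `llama3.2` only resolves because `base_url` is given, which is why the examples above always pair local models with a `base_url`.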

---

## How to use multiple LLMs for different tasks

```python
from delfhos import Agent, SQL, Gmail

agent = Agent(
    tools=[SQL(url="..."), Gmail(oauth_credentials="...")],
    light_llm="gemini-3.1-flash-lite",   # Fast, cheap model for tool routing
    heavy_llm="claude-opus-4-8",         # Powerful model for code generation
    vision_llm="gpt-5.5",                # Override for image/multimodal tasks
)
```

Rules:
- If you specify only `llm`, it is used for everything.
- `light_llm` and `heavy_llm` must be specified together.
- `vision_llm` is an optional override on top of `heavy_llm`.

Task routing map:

| Model field | Used for these tasks | Fallback behavior |
|-------------|----------------------|-------------------|
| `llm` | Simple mode shortcut. Handles all tasks when you do not split models. | Internally sets both `light_llm` and `heavy_llm` to the same model. |
| `light_llm` | Lightweight tasks: tool prefilter/routing (`enable_prefilter=True`), small parsing/classification helpers, and chat summarization when `Chat.summarizer_llm` is not set. | No fallback at config time: it must be provided together with `heavy_llm` (unless you use `llm`). |
| `heavy_llm` | Main reasoning model: Python code generation, retry/fix loops after execution errors, and text-only calls through `llm.call(...)`. | Base default for specialized models. |
| `vision_llm` | Image/multimodal analysis (for example `llm.call(file_data=[...], prompt=...)`). | Falls back to `heavy_llm` if not set. |

In short: keep `light_llm` cheap/fast for routing and `heavy_llm` strong for reasoning. Override `vision_llm` only when a specialized model improves multimodal workloads.
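
The rules and fallbacks above can be summarized as a small resolution function. This is a sketch of the documented behavior, not Delfhos source code: `resolve_models` is a hypothetical name, and letting `llm` take precedence when both styles are passed is an assumption the text does not confirm.

```python
def resolve_models(llm=None, light_llm=None, heavy_llm=None, vision_llm=None) -> dict:
    """Hypothetical helper mirroring the routing table above."""
    if llm is not None:
        # Simple mode: one model for everything.
        # Assumption: `llm` wins if split models are also passed.
        light_llm = heavy_llm = llm
    elif light_llm is None or heavy_llm is None:
        raise ValueError("provide `llm`, or both `light_llm` and `heavy_llm`")
    return {
        "light": light_llm,
        "heavy": heavy_llm,
        "vision": vision_llm or heavy_llm,   # vision falls back to heavy
    }


single = resolve_models(llm="gemini-3.5-flash")
split = resolve_models(light_llm="gemini-3.1-flash-lite", heavy_llm="claude-opus-4-8")
print(single)
print(split)
```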

---

## How to control what a tool can do with `allow` and `confirm`

`allow` and `confirm` are two independent permission layers present on every built-in connection (`Gmail`, `SQL`, `Drive`, etc.) and on the `@tool` decorator. They are separate controls that serve different purposes:

- **`allow`** — defines which actions the agent is *permitted to use at all*. Actions not in the list are hidden from the LLM entirely: it cannot plan them, generate code for them, or call them. This is an enforcement boundary, not a prompt.
- **`confirm`** — defines which actions must be *approved by a human before they execute*. The agent can plan and generate code for these actions, but execution pauses until you approve or reject.

### Accepted values

Both parameters accept the same values:

| Value | Meaning |
|-------|---------|
| `None` (default) | All actions available / no approval required |
| `True` | All actions affected (all allowed / all need approval) |
| `False` | None affected (for `confirm`: skip approval entirely) |
| `["action1", "action2"]` | Only the listed actions affected |
| `"action1"` | Shorthand for a single-item list |
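
One way to read the table is as a normalization step that turns any accepted value into a concrete set of action names. The sketch below is an assumption about that logic, with a hypothetical helper `affected_actions`. Rejecting unknown action names is also an assumption and is not documented behavior.

```python
def affected_actions(value, all_actions, default_all: bool) -> set:
    """Expand an allow/confirm value into the set of affected actions.

    default_all encodes what None means: True for `allow` (everything is
    available), False for `confirm` (nothing needs approval).
    """
    if value is None:
        return set(all_actions) if default_all else set()
    if value is True:
        return set(all_actions)
    if value is False:
        return set()
    names = [value] if isinstance(value, str) else list(value)
    unknown = set(names) - set(all_actions)
    if unknown:
        # Assumption: a typo in an action name should fail loudly.
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    return set(names)


SQL_ACTIONS = ["schema", "query", "write"]
allowed = affected_actions(["schema", "query"], SQL_ACTIONS, default_all=True)
needs_approval = affected_actions(None, SQL_ACTIONS, default_all=False)
print(sorted(allowed), sorted(needs_approval))
```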

### `allow` — restrict what the agent can do

```python
from delfhos import Agent, SQL

# Read-only: the agent can inspect the schema and run SELECTs, but cannot write
db = SQL(url="postgresql://...", allow=["schema", "query"])

agent = Agent(tools=[db], llm="gemini-3.5-flash")
agent.run("How many users signed up last week?")   # OK
agent.run("Delete all rows from the logs table")   # LLM never sees the write action
```

When `allow` is set, the LLM's tool documentation only includes the listed actions. The restricted actions don't exist from the model's perspective.

### `confirm` — require approval before sensitive actions

```python
from delfhos import Agent, Gmail

gmail = Gmail(
    oauth_credentials="client_secrets.json",
    confirm=["send"],   # Reading is automatic; sending pauses for approval
)

agent = Agent(tools=[gmail], llm="gemini-3.5-flash")
agent.run("Summarize my inbox")           # Runs automatically
agent.run("Send a reply to Alice")        # Pauses — terminal prompt appears
```

### Using both together

`allow` and `confirm` compose naturally. A common pattern is to allow only safe actions and require approval on the ones that mutate state:

```python
from delfhos import Agent, Drive, Sheets

drive = Drive(
    oauth_credentials="client_secrets.json",
    allow=["search", "get", "create", "update"],  # delete and share are off-limits
    confirm=["create", "update"],                  # writes need approval
)

sheets = Sheets(
    oauth_credentials="client_secrets.json",
    allow=["read", "write", "create"],
    confirm=["create"],                            # only creating new sheets needs approval
)

agent = Agent(tools=[drive, sheets], llm="gemini-3.5-flash")
```

### Common patterns

```python
# Read-only — no writes, no approval prompts
Gmail(oauth_credentials="...", allow=["read"], confirm=False)

# Full access, approve everything
Drive(oauth_credentials="...", confirm=True)

# Full access, fully autonomous (no approval at all)
SQL(url="...", confirm=False)

# Allow all actions, but approve only destructive ones
Calendar(oauth_credentials="...", confirm=["delete"])
```

### On `@tool` functions

The `confirm` parameter on `@tool` works the same way — `True` (default) means the function will pause for approval before running; `False` skips the prompt:

```python
from delfhos import tool

@tool(confirm=False)          # always runs automatically
def get_account_balance(account_id: str) -> float:
    """Return the current balance for an account."""
    ...

@tool(confirm=True)           # always pauses for approval
def transfer_funds(from_id: str, to_id: str, amount: float) -> bool:
    """Transfer funds between accounts."""
    ...
```

> **Note:** `@tool` functions do not have an `allow` parameter — they are either registered with the agent or not.

---

## How to require human approval before actions

### Default interactive approval (terminal prompt)

```python
from delfhos import Agent, Gmail

gmail = Gmail(
    oauth_credentials="client_secrets.json",
    confirm=["send"],    # Only "send" needs approval; "read" does not
)

agent = Agent(tools=[gmail], llm="gemini-3.5-flash")
agent.run("Send a weekly digest to team@company.com")
# → Terminal prompt appears: Approve / Reject
```

### Custom approval handler

```python
from delfhos import Agent, Gmail

def slack_approval(request):
    """Return True to approve, False to reject, None for default UI."""
    # request.message contains a human-readable description
    if "delete" in request.message.lower():
        return False      # Always auto-reject deletes
    return True           # Auto-approve everything else

agent = Agent(
    tools=[Gmail(oauth_credentials="...", confirm=True)],
    llm="claude-sonnet-4-6",
    on_confirm=slack_approval,
)
```

### Programmatic approval (background agents)

```python
agent = Agent(tools=[...], llm="gemini-3.5-flash")
agent.start()

agent.submit("Draft and send weekly reports")   # Returns immediately (task_id)

# Later, in a web handler or another thread:
pending = agent.get_pending_approvals()
for req in pending:
    agent.approve(req["request_id"], response="Looks good!")
    # or, to decline the request instead:
    # agent.reject(req["request_id"], reason="Wrong recipient")
```

---

## How to run a task: blocking, async, or background

Delfhos has three task entry points. Pick one based on whether you want to wait
for the result:

| Method | Blocks? | Returns | Use when |
|--------|---------|---------|----------|
| `run(task, timeout=60)`  | yes          | `Response`     | simple scripts — you want the answer now |
| `arun(task, timeout=60)` | yes (`await`)| `Response`     | inside `async`/`await` code (FastAPI, asyncio apps) |
| `submit(task)`           | no           | `task_id` (str)| fire-and-forget — track with `poll()`, or run many at once |

`run_chat()` is separate: an interactive terminal chat loop (see
[Tutorial 4](#tutorial-4--chat-mode-and-memory)). The only recent rename is
`run_async()`, which is now `submit()`.

### Blocking — `run()`

```python
from delfhos import Agent, Gmail

agent = Agent(tools=[Gmail(oauth_credentials="client_secrets.json")], llm="gemini-3.5-flash")

resp = agent.run("Summarize unread emails")     # waits up to timeout (default 60s)
print(resp.text)
print(resp.cost_usd, resp.duration_ms)
agent.stop()
```

`run()` returns a **`Response`** (`text`, `status`, `error`, `cost_usd`,
`duration_ms`, `files`, `trace`). If the timeout elapses first you get a
`Response` with `status=False` and `error="Timeout..."`.

### Async — `arun()`

Same contract as `run()`, but awaitable — use it from async code:

```python
import asyncio
from delfhos import Agent, Gmail

async def main():
    agent = Agent(tools=[Gmail(oauth_credentials="client_secrets.json")], llm="gemini-3.5-flash")
    result = await agent.arun("Summarize unread emails", timeout=60.0)
    print(result.text)
    agent.stop()

asyncio.run(main())
```

Or with a context manager for automatic cleanup:

```python
async def main():
    with Agent(tools=[Gmail(oauth_credentials="...")], llm="gemini-3.5-flash") as agent:
        result = await agent.arun("Summarize unread emails")
        print(result.text)
```

### Background — `submit()`

Returns a `task_id` immediately and runs the task in the background. You then
inspect progress with `poll()` (see the next section), or fire several at once
(see [running tasks concurrently](#how-to-run-many-tasks-concurrently)).

```python
task_id = agent.submit("Draft and send the weekly report")   # returns at once
# ... do other work, then check on it later via agent.poll(task_id)
```

---

## How to poll a running request

`run()` blocks until a task finishes. To watch a task *while it runs* — for a
progress UI, a dashboard, or a TTS pipeline — submit it with `submit()` and
poll for live snapshots with `poll()`.

```python
import time
from delfhos import Agent, Gmail

agent = Agent(tools=[Gmail(oauth_credentials="...")], llm="claude-sonnet-4-6")

task_id = agent.submit("Summarize my unread emails and draft replies")

while True:
    snap = agent.poll(task_id)            # -> StreamSnapshot
    print(snap.state, "|", snap.output_so_far[-80:])
    for ev in snap.events:
        print(f"  [{ev.kind}] {ev.label} ({ev.status})")
    if snap.is_terminal:                  # "done" or "error"
        break
    time.sleep(0.2)

print("final:", snap.result or snap.error)
agent.stop()
```

`poll()` returns a **`StreamSnapshot`** that unifies everything known about the
request at that instant:

| Field            | Meaning                                                            |
|------------------|--------------------------------------------------------------------|
| `state`          | `"queued"` → `"running"` → `"done"` / `"error"` (queued is brief)  |
| `task`           | The task text                                                      |
| `elapsed_ms`     | Time since the request started                                     |
| `events`         | Unified timeline — list of `StreamEvent`                          |
| `output_so_far`  | `print()` output captured so far (grows during the run)            |
| `result`         | Final answer once `state == "done"`                               |
| `error`          | Error message once `state == "error"`                             |
| `cost_usd`, `tokens_used`, `files`, `trace` | Populated once terminal               |
| `is_terminal`    | `True` when `state` is `"done"` or `"error"`                       |

Each **`StreamEvent`** has `kind` (`"phase"` for internal pipeline steps like
planning/prefilter, `"tool"` for a tool call labelled by its `desc=`, or
`"say"` for a line the agent printed), plus `label`, `status`, `started_at`,
and `duration_ms`. This is the same information the trace records — surfaced
live instead of only at the end.
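
If you poll in several places, the loop can be wrapped in a helper with a deadline. The sketch below relies only on the `state`, `is_terminal`, and `result` fields described above. The helper name `wait_for` is hypothetical, and `fake_poll` is a stand-in for `agent.poll` so the example runs without Delfhos installed.

```python
import time
from types import SimpleNamespace


def wait_for(poll, task_id, timeout_s=60.0, interval_s=0.2):
    """Poll until the snapshot is terminal or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        snap = poll(task_id)
        if snap.is_terminal:
            return snap
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {snap.state} after {timeout_s}s")
        time.sleep(interval_s)


# Stand-in for agent.poll: walks through the documented states in order.
_states = iter(["queued", "running", "done"])


def fake_poll(task_id):
    state = next(_states)
    return SimpleNamespace(
        state=state,
        is_terminal=state in ("done", "error"),
        result="ok" if state == "done" else None,
        error=None,
    )


final = wait_for(fake_poll, "task-1", timeout_s=5.0, interval_s=0.01)
print(final.state, final.result)
```

With a real agent the call would presumably look like `wait_for(agent.poll, task_id)`.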

---

## How to run many tasks concurrently

A single agent can run **multiple tasks at the same time**. Submit them with
`submit()` (which returns immediately) and each one runs concurrently; you poll
each by its own `task_id` without them interfering with one another.

```python
import time
from delfhos import Agent, Gmail

agent = Agent(tools=[Gmail(oauth_credentials="...")], llm="gemini-3.5-flash")

# Kick off three tasks at once: they run concurrently, not one after another.
task_ids = [
    agent.submit("Summarize today's unread emails"),
    agent.submit("Find any pending invoices"),
    agent.submit("Draft a reply to Ana's last email"),
]

# Poll each independently until all are done.
pending = set(task_ids)
while pending:
    for tid in list(pending):
        snap = agent.poll(tid)
        if snap.is_terminal:
            print(tid, "->", snap.result or snap.error)
            pending.discard(tid)
    time.sleep(0.3)

agent.stop()
```

The same applies over HTTP: fire several `POST /run` calls and poll each
returned `task_id` separately.

**How it works.** Each submitted task runs as its own concurrent unit on the
agent's background scheduler. All per-task state — the live trace, tool
timeline, captured output, result, cost, and tokens — is keyed by `task_id`, so
two tasks running at the same time never overwrite each other's progress or
final result. `poll(task_id)` always reflects exactly that task.

**What "concurrent" means here.** Tasks interleave while they wait on I/O —
LLM calls, tool/API requests, network. This is the dominant cost in agent work,
so in practice several tasks make progress together. It is *not* CPU
parallelism: if a task's generated code does heavy synchronous computation, it
holds the line until it yields. For CPU-bound parallelism, run separate `Agent`
instances (e.g. one per process).
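
The difference between I/O interleaving and CPU parallelism can be seen with a small standard-library demonstration. It uses `asyncio` purely as an illustration; how the Delfhos scheduler is implemented is not stated here. `fake_task` is a hypothetical stand-in for a task that spends its time waiting on an LLM or API call.

```python
import asyncio
import time


async def fake_task(name: str, wait_s: float) -> str:
    await asyncio.sleep(wait_s)   # stands in for waiting on an LLM or API call
    return name


async def run_all():
    start = time.perf_counter()
    # Three waits overlap, so the total is close to one wait, not three.
    results = await asyncio.gather(
        fake_task("a", 0.2),
        fake_task("b", 0.2),
        fake_task("c", 0.2),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_all())
print(results, f"{elapsed:.2f}s")
```

If `fake_task` instead ran a long synchronous computation, the three tasks would execute back to back, which is the "holds the line" case described above.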

**Things to keep in mind:**

- **Shared connections must tolerate concurrent use.** If several tasks use the
  *same* tool instance (e.g. one `Gmail()`), they'll call it at the same time.
  The built-in tools are I/O clients and handle this fine; if you write a custom
  tool with mutable shared state, make it reentrant.
- **Console output interleaves.** Logs from concurrent tasks mix in the terminal
  (each line is tagged with its `task_id`). This is cosmetic — the structured
  data in each `poll()` snapshot stays clean and separated.
- **`run()` / `arun()` are unaffected.** They block until their single task
  finishes, so they don't introduce concurrency on their own. Concurrency only
  comes from issuing multiple `submit()` calls (or multiple `POST /run`).

---

## How to expose the agent over HTTP

Call `serve()` to run a small embedded HTTP API (FastAPI + uvicorn, both bundled
with Delfhos — no extra install). Everything is reachable from the `Agent`
object; you never import internal modules.

```python
from delfhos import Agent, Gmail

agent = Agent(tools=[Gmail(oauth_credentials="...")], llm="gpt-5.5")

# Local only, open (fine for development)
agent.serve(port=8080)

# Public, authenticated (production)
agent.serve(host="0.0.0.0", port=8080, api_key="sk-my-secret")
```

| Endpoint                     | Purpose                                                       | Auth |
|------------------------------|---------------------------------------------------------------|------|
| `POST /run`                  | Body `{"task": "..."}` → `{"task_id": "..."}`                | ✔︎   |
| `GET  /tasks/{id}`           | JSON `StreamSnapshot` (state, events, output, cost, tokens)  | ✔︎   |
| `GET  /health`               | `{"ok": true}` — always public (for load balancers)          | —    |

Submit with `POST /run`, then poll `GET /tasks/{id}` until `is_terminal` is true.
Multiple `POST /run` calls are accepted and run
[concurrently](#how-to-run-many-tasks-concurrently) — poll each `task_id` on its own.

### Authentication

Pass `api_key` (a string or a list of keys), or set the `DELFHOS_API_KEY` env
var (comma-separated for multiple keys). Clients authenticate with either header:

```bash
curl -H "Authorization: Bearer sk-my-secret" \
     -X POST localhost:8080/run -H 'content-type: application/json' \
     -d '{"task": "Summarize my unread emails"}'
# {"task_id": "..."}

curl -H "X-API-Key: sk-my-secret" localhost:8080/tasks/<task_id>   # poll until is_terminal
```
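
The same submit call can be made from Python with only the standard library. This is a sketch under stated assumptions: the helper names `build_run_request` and `submit_task` are hypothetical, and the base URL and key are the placeholder values from the `curl` example. `submit_task` is defined but not called here because it needs a running server.

```python
import json
import urllib.request


def build_run_request(base_url: str, task: str, api_key: str) -> urllib.request.Request:
    """Build the POST /run request with a bearer token and a JSON body."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/run",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit_task(base_url: str, task: str, api_key: str) -> str:
    """Send the request and return the task_id from the JSON response."""
    request = build_run_request(base_url, task, api_key)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)["task_id"]


req = build_run_request("http://localhost:8080", "Summarize my unread emails", "sk-my-secret")
print(req.get_method(), req.full_url)
```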

**Fail-closed:** binding to a non-loopback interface (e.g. `0.0.0.0`) without any
key raises an error rather than serving openly. Always put HTTPS (a reverse proxy
such as Caddy/Nginx/Cloudflare) in front when exposing the API publicly, because
the API key travels in cleartext over plain HTTP.

### Mounting inside your own ASGI app

Use `asgi_app()` to get the FastAPI app and compose it with an existing server —
still entirely through `Agent`:

```python
from fastapi import FastAPI

app = FastAPI()
app.mount("/agent", agent.asgi_app(api_key="sk-my-secret"))
# -> POST /agent/run, GET /agent/tasks/{id}, GET /agent/health
```

---

## How to use two Gmail accounts in one agent

```python
from delfhos import Agent, Gmail

work = Gmail(
    oauth_credentials="work_oauth.json",
    name="work_email",            # Unique name required
)
personal = Gmail(
    oauth_credentials="personal_oauth.json",
    name="personal_email",
)

agent = Agent(tools=[work, personal], llm="gemini-3.5-flash")
agent.run(
    "Forward the invoice from my work inbox to my personal email address."
)
agent.stop()
```

Any built-in connection type can be instantiated multiple times as long as each has a unique `name`.

---

## How to enable tool prefiltering to reduce costs

When you have many tools, prefiltering uses a fast (cheap) LLM to select only the relevant subset before the expensive code-generation call. This keeps the code-gen prompt short and focused, typically cutting 40–70% of context tokens.

Control prefiltering with the `prefilter_mode` parameter:

| Mode | When to use | What it does |
|------|-------------|--------------|
| `"auto"` *(default)* | Always — let Delfhos decide | `"off"` for <10 actions, `"filter"` for 10–49, `"search"` for ≥50 |
| `"filter"` | 10–49 tool actions | One fast LLM call reads the full tool list and selects relevant tools |
| `"search"` | ≥50 tool actions | Iterative search loop: LLM browses tool summaries up to 5 rounds before finalising |
| `"off"` | <10 actions or debugging | No routing — the heavy LLM sees every tool on every call |
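
The `"auto"` thresholds in the table amount to a simple lookup on the total number of tool actions. A minimal sketch, with the hypothetical helper name `pick_prefilter_mode`:

```python
def pick_prefilter_mode(action_count: int) -> str:
    """Map a total action count to a mode using the thresholds in the table."""
    if action_count < 10:
        return "off"
    if action_count < 50:
        return "filter"
    return "search"


for count in (4, 10, 49, 50, 110):
    print(count, pick_prefilter_mode(count))
```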

```python
from delfhos import Agent, Gmail, Sheets, Drive, SQL, WebSearch

# "auto" (default) — Delfhos picks the right mode automatically
agent = Agent(
    tools=[Gmail(...), Sheets(...), Drive(...), SQL(...), WebSearch(...)],
    light_llm="gemini-3.1-flash-lite",   # Used for prefiltering
    heavy_llm="gpt-5.5",                 # Used for code generation
)

agent.run("What is the weather in London?")
# auto selects "filter" (5 tools → <10 actions each → 25–40 total)
# Prefilter selects: [WebSearch]  — Gmail/Sheets/Drive/SQL excluded
```

You can also pin a specific mode:

```python
# Force search mode (best for APITool with 50+ endpoints)
agent = Agent(
    tools=[my_api_tool],             # e.g. 110 Finnhub endpoints
    llm="gemini-3.5-flash",
    prefilter_mode="search",
)

# Disable prefiltering entirely (fastest for tiny tool sets)
agent = Agent(
    tools=[my_single_tool],
    llm="gemini-3.5-flash",
    prefilter_mode="off",
)
```

### How each mode works internally

**`filter` mode** — single-pass routing:
1. The `light_llm` receives the full list of available `tool:ACTION` pairs and the task description.
2. It returns the subset of actions to include; the rest are excluded from the code-gen context.

**`search` mode** — iterative discovery (up to 5 rounds):
1. Round 1: The `light_llm` sees a compact inventory (tool names + action names only).
2. If it needs detail, it emits `SEARCH: <keywords>` — Delfhos runs a ranked keyword search over tool summaries and returns the top matches.
3. When confident, it emits `DONE: tool:ACTION, …` to finalise the selection.
4. If all 5 rounds are exhausted, the union of every tool surfaced by SEARCH rounds is used as a fallback.
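
The four steps above can be sketched as a small loop. Everything here is illustrative: the `SEARCH:` and `DONE:` prefixes come from the description, but the function names, the keyword ranking, the sample summaries, and the scripted "LLM" replies are hypothetical stand-ins rather than Delfhos internals.

```python
def keyword_search(summaries: dict, keywords: list, top_n: int = 3) -> list:
    """Rank actions by how many keywords appear in their summary."""
    scored = []
    for action, summary in summaries.items():
        text = summary.lower()
        score = sum(1 for word in keywords if word.lower() in text)
        if score:
            scored.append((score, action))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # best score first
    return [action for _, action in scored[:top_n]]


def run_search_rounds(ask_llm, search, max_rounds: int = 5) -> list:
    """Drive the SEARCH/DONE exchange; ask_llm(context) returns one reply line."""
    surfaced = []                    # union of everything SEARCH rounds returned
    context = "inventory"
    for _ in range(max_rounds):
        reply = ask_llm(context).strip()
        if reply.startswith("DONE:"):
            picked = reply[len("DONE:"):].split(",")
            return [item.strip() for item in picked if item.strip()]
        if reply.startswith("SEARCH:"):
            hits = search(reply[len("SEARCH:"):].split())
            surfaced.extend(hit for hit in hits if hit not in surfaced)
            context = "matches: " + ", ".join(hits)
    return surfaced                  # fallback once all rounds are used up


SUMMARIES = {
    "finnhub:QUOTE": "Get the latest stock quote for a symbol",
    "finnhub:NEWS": "List market news articles",
    "finnhub:PROFILE": "Get company profile data",
}
scripted = iter(["SEARCH: stock quote", "DONE: finnhub:QUOTE"])
selection = run_search_rounds(
    lambda context: next(scripted),
    lambda keywords: keyword_search(SUMMARIES, keywords),
)
fallback = run_search_rounds(
    lambda context: "SEARCH: company news",   # never finalises
    lambda keywords: keyword_search(SUMMARIES, keywords),
)
print(selection, fallback)
```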

**`off` mode** — no prefilter:
The heavy LLM receives every tool's full documentation on every call. Fine for small agents; expensive at scale.

---

## How to add long-term memory to an agent

```python
from delfhos import Agent, Memory

memory = Memory(
    namespace="crm_agent",
    embedding_model="all-MiniLM-L6-v2",   # Default; ~90 MB download on first use
)

# Populate once (survives restarts)
memory.save("""
Alice Chen — VP Sales, alice@acme.com, Enterprise tier
Bob Torres — Dev Lead, bob@acme.com, Pro tier
Our SLA: Enterprise 2hr response, Pro 8hr response
""")

agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    memory=memory,
)

agent.run("Draft a response to Alice's support ticket")
# Memory retrieves: Alice's role, email, tier, and SLA expectations
```

You can also load memory from a text or markdown file:

```python
memory.add("knowledge_base.md")   # Reads and stores the file contents
```

---

## How to inspect a connection's available actions

```python
from delfhos import Gmail, Drive, APITool

# Class-level (no auth required)
print(Gmail.inspect())

# Instance-level (includes connection details)
gmail = Gmail(oauth_credentials="client_secrets.json")
print(gmail.inspect())
print(gmail.inspect(verbose=True))   # Full action descriptions

# REST API endpoints
print(APITool.inspect(spec="https://petstore3.swagger.io/api/v3/openapi.json"))
print(APITool.inspect(spec="./openapi.yaml", verbose=True))
```

---

## How to configure cost tracking

Delfhos tracks token usage and estimates costs automatically. Pricing is stored in `~/delfhos/pricing.json`, which is created on first run.

Edit it to add new models or update rates:

```json
{
    "_comment": "USD per 1M tokens",
    "models": {
        "gemini-3.1-flash-lite": {
            "input_per_million": 0.10,
            "output_per_million": 0.40
        },
        "gemini-3.5-flash": {
            "input_per_million": 1.50,
            "output_per_million": 9.00
        },
        "gpt-5.5": {
            "input_per_million": 2.50,
            "output_per_million": 10.00
        },
        "gpt-*": {
            "input_per_million": 1.00,
            "output_per_million": 4.00
        }
    }
}
```

Wildcards (`gpt-*`) match any model whose name starts with the prefix.
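
The lookup rule can be pictured with a small stand-alone sketch. It is not Delfhos's actual resolver: `resolve_rate` and `estimate_cost` are hypothetical names, and the sketch assumes that an exact model name wins over a wildcard and that the longest matching prefix wins among wildcards.

```python
# Illustrative only: a stand-alone model of prefix-wildcard price lookup.
PRICING = {
    "gpt-5.5": {"input_per_million": 2.50, "output_per_million": 10.00},
    "gpt-*": {"input_per_million": 1.00, "output_per_million": 4.00},
}


def resolve_rate(model, pricing):
    """Return the rate entry for `model`, or None if nothing matches."""
    if model in pricing:  # assumption: exact names take priority
        return pricing[model]
    # A key such as "gpt-*" matches any model name starting with "gpt-".
    prefixes = [
        key[:-1]
        for key in pricing
        if key.endswith("*") and model.startswith(key[:-1])
    ]
    if not prefixes:
        return None
    return pricing[max(prefixes, key=len) + "*"]  # assumption: longest prefix wins


def estimate_cost(model, input_tokens, output_tokens, pricing=PRICING):
    """Estimate USD cost from token counts and per-million-token rates."""
    rate = resolve_rate(model, pricing)
    if rate is None:
        raise KeyError(f"no pricing entry for {model!r}")
    return (
        input_tokens * rate["input_per_million"]
        + output_tokens * rate["output_per_million"]
    ) / 1_000_000
```

With this table, `gpt-5.5` resolves to its own entry, while any other `gpt-` model falls through to the `gpt-*` rates.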

After a run:

```python
result = agent.run("...")
print(f"${result.cost_usd:.5f}")    # Cost for this task
print(agent.usage)                   # Cumulative token counts
```

### Cost guardrails and budgets

You can set a hard limit on the amount of money an `Agent` instance can spend.

1. **At Initialization:** Pass `budget_usd` when instantiating the `Agent`. Cost estimations are computed locally against `~/delfhos/pricing.json`.

```python
agent = Agent(
    tools=[...],
    llm="gpt-5.5",
    budget_usd=0.50,  # Block new tasks if accumulated cost reaches $0.50 USD
)

# Performs multiple calls...
result = agent.run("Perform an expensive multi-step task")

# If the total agent spend reaches the limit, new .run() calls will
# be immediately rejected, raising an AGT-006 error.
print(f"Total spent so far: ${agent.total_cost_usd:.5f}")
```

2. **Dynamic Resets:** If a task runs out of budget or you want to give the agent more allowance mid-execution (or between requests), you can reset the counter or increase the limit:

```python
# Reset the accumulated cost to $0, keeping the current $0.50 limit
agent.reset_budget()

# Reset the accumulated cost to $0 AND assign a new $1.00 limit
agent.reset_budget(1.00)
```

3. **Status Checks:** You can query the current budget status programmatically before or after a task:

```python
status = agent.status()
print(status["budget"]["limit_usd"])      # Configured budget limit
print(status["budget"]["remaining_usd"])  # Limit minus total_cost_usd
print(status["budget"]["is_exhausted"])   # True if remaining <= 0
```
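
The budget rules in this section can be summarized in a toy model. This is a sketch, not Delfhos code: `BudgetModel` and `BudgetExhausted` are hypothetical names, and the model assumes the budget check happens before a task starts, so a task that is already allowed to run may push the total past the limit.

```python
# Illustrative only: a toy model of the budget rules described above.
class BudgetExhausted(RuntimeError):
    """Stand-in for the AGT-006 error; the real exception type may differ."""


class BudgetModel:
    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.total_cost_usd = 0.0

    def status(self):
        remaining = self.limit_usd - self.total_cost_usd
        return {
            "limit_usd": self.limit_usd,
            "remaining_usd": remaining,
            "is_exhausted": remaining <= 0,
        }

    def run(self, task_cost_usd):
        # New tasks are rejected once accumulated cost has reached the limit.
        if self.status()["is_exhausted"]:
            raise BudgetExhausted("budget exhausted; call reset_budget()")
        self.total_cost_usd += task_cost_usd

    def reset_budget(self, new_limit_usd=None):
        # Zero the counter; optionally assign a new limit at the same time.
        self.total_cost_usd = 0.0
        if new_limit_usd is not None:
            self.limit_usd = new_limit_usd


budget = BudgetModel(limit_usd=0.50)
budget.run(0.30)
budget.run(0.30)          # allowed: the check happens before the task starts
try:
    budget.run(0.10)      # rejected: 0.60 spent against a 0.50 limit
except BudgetExhausted:
    budget.reset_budget(1.00)
```

If you need a strict ceiling, set `budget_usd` somewhat below the amount you are actually willing to spend.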

---

## How to pass API keys programmatically

Instead of environment variables, pass keys directly:

```python
agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    providers={
        "google": "GOOGLE_API_KEY_HERE",
        "openai": "OPENAI_API_KEY_HERE",
    },
)
```
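
In practice you will rarely hard-code keys. The sketch below builds the `providers` mapping from any key-value source, such as a secrets-manager response or a parsed config file. `build_providers` is a hypothetical helper, not part of Delfhos, and this guide does not say whether Delfhos falls back to environment variables for providers you leave out.

```python
def build_providers(secrets, wanted=("google", "openai")):
    """Return a providers mapping containing only the keys that are set."""
    return {name: secrets[name] for name in wanted if secrets.get(name)}


# Stand-in for a secrets-manager response or a parsed config file.
secrets = {"google": "GOOGLE_API_KEY_HERE", "openai": ""}
providers = build_providers(secrets)
```

Pass the result as `providers=providers` when constructing the `Agent`.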

---

## How to add a system prompt

```python
agent = Agent(
    tools=[SQL(url="..."), Gmail(oauth_credentials="...")],
    llm="gemini-3.5-flash",
    system_prompt="""
You are a data analyst for Acme Corp.
- Always cite the SQL query you used.
- Prefer charts over raw numbers when sharing results.
- Never email results to external addresses without explicit confirmation.
""",
)
```

The system prompt is injected into every LLM call.

---

## How to configure the execution sandbox

Delfhos executes LLM-generated code in an isolated sandbox. By default it automatically picks the strongest isolation available on your machine.

### Sandbox modes

| Mode | Behaviour |
|------|-----------|
| `"auto"` (default) | Use Docker if available, fall back to the local in-process sandbox |
| `"docker"` | Require Docker — raise an error if Docker is not running |
| `"local"` | Always use the in-process sandbox (the behaviour before v0.6.8) |

```python
from delfhos import Agent, SQL

# Default — auto-detects Docker, falls back gracefully
agent = Agent(tools=[SQL(url="...")], llm="gemini-3.5-flash")

# Force Docker (fails if Docker is not running)
agent = Agent(
    tools=[SQL(url="...")],
    llm="gemini-3.5-flash",
    sandbox="docker",
)

# Pin to local sandbox (no Docker required)
agent = Agent(
    tools=[SQL(url="...")],
    llm="gemini-3.5-flash",
    sandbox="local",
)
```

### Resource limits (Docker mode only)

Override the defaults with `sandbox_config`:

```python
agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    sandbox="docker",
    sandbox_config={
        "memory_limit": "1g",   # Default: "512m"
        "cpu_limit":    2.0,    # Default: 1.0 CPU core
        "timeout":      600,    # Default: 300 seconds
        "pids_limit":   128,    # Default: 64 (fork-bomb protection)
    },
)
```
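
A typo in a `sandbox_config` key is easy to miss. The hypothetical helper below merges your overrides onto the defaults listed in the comments above and rejects unknown keys. It is a sketch for your own code: whether Delfhos itself validates keys or merges partial dictionaries is not stated in this guide.

```python
# Defaults as listed in the comments of the example above.
DEFAULTS = {"memory_limit": "512m", "cpu_limit": 1.0, "timeout": 300, "pids_limit": 64}


def make_sandbox_config(**overrides):
    """Merge overrides onto the documented defaults, rejecting unknown keys."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown sandbox_config keys: {unknown}")
    return {**DEFAULTS, **overrides}


config = make_sandbox_config(memory_limit="1g", timeout=600)
```

The resulting dictionary always contains all four keys, so it can be passed as `sandbox_config=config` without relying on implicit defaults.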

### What Docker mode enforces

When Docker is available and `sandbox="docker"` or `sandbox="auto"`, each task execution runs in a disposable container with:

- **Code-level network isolation** — the container has a route to the host (required for the RPC bridge) but generated code cannot make arbitrary network calls — `urllib`, `requests`, `httpx`, `socket`, and similar modules are blocked by the import allowlist
- **Read-only filesystem** — the container rootfs is immutable; only `/tmp` (50 MB, `noexec`) and the task upload directory are writable
- **Memory cap** — the kernel kills the process if it exceeds the limit
- **CPU quota** — prevents runaway computation from starving the host
- **PID limit** — prevents fork bombs
- **No Linux capabilities** — `ALL` capabilities dropped; `no-new-privileges` set
- **Unprivileged user** — agent code runs as a non-root `sandbox` user

All API calls (Gmail, SQL, etc.) are forwarded to the host process over a TCP bridge (`host.docker.internal`) — generated code never touches the network directly.

Even if generated code escapes the Python-level sandbox (e.g. via object introspection), the container boundary contains the damage.

### Building the Docker image

The sandbox image is built automatically the first time a task runs with Docker enabled. It is also **rebuilt automatically** whenever the bundled container sources change (e.g. after a Delfhos upgrade), so you never run a stale image without realising it. To pre-build manually (recommended for CI or production deployments):

```python
from cortex._engine.core.sandbox.docker_sandbox import build_image

build_image()            # Skips if image is up to date
build_image(force=True)  # Rebuild unconditionally
```

The image is version-tagged to match the installed Delfhos version (e.g. `delfhos-sandbox:0.8.7`) so upgrades automatically use a fresh image.

### Checking sandbox status

```python
from cortex._engine.core.sandbox.executor import _docker_available

print("Docker available:", _docker_available())
```
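
`_docker_available` is a private function (note the leading underscore) and may change between releases. If you prefer not to depend on it, a stand-alone check using only the standard library and the Docker CLI might look like the sketch below. `docker_is_running` is a hypothetical helper, and its answer may not match Delfhos's own detection in every environment.

```python
import shutil
import subprocess


def docker_is_running(timeout=5.0):
    """Return True if the `docker` CLI exists and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False
    try:
        proc = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `docker info` exits non-zero when the daemon cannot be reached.
    return proc.returncode == 0


print("Docker running:", docker_is_running())
```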

---

## How to allow extra Python libraries in the sandbox

By default the sandbox only permits a safe subset of the Python standard library. Any `import` statement that references a module outside the allowlist below raises an error.

### Default allowed modules

| Module | What it provides |
|---|---|
| `json` | JSON encode / decode |
| `re` | Regular expressions |
| `datetime` | Dates, times, timedeltas |
| `math` | Arithmetic, trigonometry, logarithms |
| `statistics` | Mean, median, stdev, variance |
| `csv` | CSV reading and writing |
| `io` | In-memory byte / text streams |
| `pathlib` | Object-oriented filesystem paths (`Path`) |
| `asyncio` | Async / await primitives (proxied, no raw event-loop access) |
| `time` | `time()`, `sleep()`, `monotonic()` |

In addition to these importable modules, the sandbox exposes a curated set of built-in functions (`int`, `str`, `list`, `dict`, `sorted`, `zip`, `map`, `filter`, `enumerate`, …) and common exception types (`ValueError`, `KeyError`, `TypeError`, …) directly in the execution namespace — no import needed.

Use `allowed_libs` to extend the allowlist with additional packages. Pass PyPI package names exactly as you would to `pip install`.

```python
from delfhos import Agent, SQL

agent = Agent(
    tools=[SQL(url="postgresql://...")],
    llm="gemini-3.5-flash",
    allowed_libs=["pandas", "numpy"],
)
agent.run("Load the sales table and compute monthly totals")
```

The agent's generated code can now run `import pandas` and `import numpy` without hitting the import block.
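
To see what an import allowlist does, here is a minimal static check built on the standard `ast` module. It only illustrates the idea: `blocked_imports` is a hypothetical function, Delfhos may enforce its allowlist differently (for example at runtime), and this sketch does not catch dynamic imports such as `__import__("os")`.

```python
import ast

# The default allowlist from the table above.
DEFAULT_ALLOWED = {"json", "re", "datetime", "math", "statistics",
                   "csv", "io", "pathlib", "asyncio", "time"}


def blocked_imports(source, extra_allowed=()):
    """Return the sorted top-level modules that `source` imports but may not."""
    allowed = DEFAULT_ALLOWED | set(extra_allowed)
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return sorted(found - allowed)


code = "import json\nimport pandas as pd\nfrom numpy.linalg import norm\n"
print(blocked_imports(code))                                     # ['numpy', 'pandas']
print(blocked_imports(code, extra_allowed=["pandas", "numpy"]))  # []
```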

### Local sandbox behaviour

In local (in-process) mode the packages must already be installed in your Python environment. `allowed_libs` only lifts the import restriction — it does not install anything automatically:

```bash
pip install pandas numpy   # install first
```

```python
agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    sandbox="local",
    allowed_libs=["pandas", "numpy"],
)
```
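
Before starting a local-mode agent, you can check that every entry in `allowed_libs` is importable. The sketch below uses `importlib.util.find_spec` from the standard library. `missing_packages` and the `IMPORT_NAMES` mapping are hypothetical; the mapping is needed because some PyPI names differ from their import names (for example, `scikit-learn` is imported as `sklearn`).

```python
from importlib.util import find_spec

# PyPI names and import names can differ; extend this mapping as needed.
IMPORT_NAMES = {"scikit-learn": "sklearn", "pillow": "PIL", "pyyaml": "yaml"}


def missing_packages(allowed_libs):
    """Return the entries of `allowed_libs` that cannot be imported locally."""
    missing = []
    for package in allowed_libs:
        module = IMPORT_NAMES.get(package.lower(), package.replace("-", "_"))
        try:
            found = find_spec(module) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(package)
    return missing


print(missing_packages(["pandas", "numpy"]))
```

An empty list means every package resolved. Anything else should be installed with `pip` before you call `agent.run()`.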

### Docker sandbox behaviour

In Docker mode (`sandbox="docker"` or `sandbox="auto"` with Docker available) Delfhos automatically pip-installs the requested packages inside the container before executing the task. You do not need them installed on the host:

```python
agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    sandbox="docker",
    allowed_libs=["pandas", "scikit-learn", "openpyxl"],
)
agent.run("Read the uploaded Excel file and train a simple classifier")
```

Packages are installed into a dedicated `/packages` volume inside the container and are discarded when the container exits. Each task starts from a clean install.

### Security note

Only add libraries you trust. `allowed_libs` expands what the LLM-generated code can import. Network-capable packages (`requests`, `httpx`, `urllib3`) remain blocked at the OS level in Docker mode — adding them to `allowed_libs` only unlocks the Python import; actual outbound connections still cannot reach the internet.

---

## How to pass input files to the agent workspace

Pass a list of absolute host paths via `files=` when constructing the agent. The files are injected into the sandbox as read-only workspace files so the agent's generated code can read them directly.

```python
from delfhos import Agent, SQL

agent = Agent(
    tools=[SQL(url="postgresql://...")],
    llm="gemini-3.5-flash",
    files=[
        "/data/sales_q3.csv",
        "/data/product_catalog.xlsx",
        "/config/mapping.json",
    ],
)

result = agent.run(
    "Read the sales CSV, join it with the product catalog, "
    "and write a summary to the database."
)
agent.stop()
```

**How the paths are exposed to generated code:**

| Sandbox mode | Path seen by the agent |
|---|---|
| `"docker"` (or `"auto"` with Docker) | `/workspace/<filename>` (e.g. `/workspace/sales_q3.csv`) |
| `"local"` | Original host path (e.g. `/data/sales_q3.csv`) |

The exact paths are injected into the code-generation prompt automatically — the LLM knows which files are available and uses the correct path for the active sandbox mode.
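
The mapping in the table can be written as a small function, which is handy when you build prompts or log messages that mention file paths. Both helpers below are hypothetical. Because Docker mode exposes only the file name, two host files with the same name could clash; this guide does not say how Delfhos handles that case, so `colliding_names` simply flags it.

```python
from pathlib import PurePosixPath


def sandbox_path(host_path, mode):
    """Map a host path to the path generated code would see, per the table above."""
    if mode == "docker":
        return str(PurePosixPath("/workspace") / PurePosixPath(host_path).name)
    if mode == "local":
        return host_path
    raise ValueError(f"unsupported mode: {mode!r}")


def colliding_names(host_paths):
    """Return file names that appear more than once across `host_paths`."""
    seen, dupes = set(), set()
    for path in host_paths:
        name = PurePosixPath(path).name
        (dupes if name in seen else seen).add(name)
    return sorted(dupes)


print(sandbox_path("/data/sales_q3.csv", "docker"))   # /workspace/sales_q3.csv
```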

Files can be opened with standard Python I/O or through the built-in `files` tool:

```python
# Both work inside generated agent code:
import csv
with open("/workspace/sales_q3.csv") as f:
    reader = csv.DictReader(f)
    rows = list(reader)

# Or via the files tool:
data = await files.read("sales_q3.csv")   # → List[Dict] for CSV/Excel
```

> **Important:** Files passed via `files=` are **read-only**. The agent cannot modify them. To produce new files, use `add_to_output_files()` (see the next guide).

---

## How to extract output files from a task result

When the agent needs to return a file (a CSV export, a generated report, a transformed dataset), it calls `add_to_output_files(name, content)` inside the generated code. After the task completes, the files are available on `result.files` as a `{label: host_path}` mapping.

```python
from delfhos import Agent, SQL

agent = Agent(
    tools=[SQL(url="postgresql://...")],
    llm="gemini-3.5-flash",
)

result = agent.run(
    "Query the top 100 customers by revenue last month "
    "and export the data as a CSV file."
)

# Access the generated files
if result.files:
    for label, path in result.files.items():
        print(f"{label}: {path}")
        # → top_customers: /tmp/delfhos_out_abc123/top_customers.csv

agent.stop()
```

**Instructing the agent to produce a file**

Mention the desired output format in your task description. The LLM is pre-instructed to use `add_to_output_files` when it needs to return data too large to print or when a file format is requested:

```python
agent.run("Generate a monthly sales report as an Excel file.")
agent.run("Export all user records to a CSV.")
agent.run("Produce a JSON summary of the API response.")
```

You can also ask for multiple files in one task:

```python
result = agent.run(
    "Export the raw orders to orders.csv and the summary stats to summary.json."
)
for label, path in result.files.items():
    print(f"{label} → {path}")
# → orders → /tmp/.../orders.csv
# → summary → /tmp/.../summary.json
```

**Auto-conversion rules**

The `add_to_output_files` function applies these conversions automatically when the agent code calls it:

| Content type | Output format |
|---|---|
| `str` | Written as-is; extension inferred from the label |
| `bytes` | Written as binary |
| `dict` | Serialized as JSON |
| `list` | JSON, unless the label ends in `.csv`/`.xlsx` and items are dicts — then CSV or Excel |
| pandas `DataFrame` | CSV (`.csv`) or Excel (`.xlsx`) based on the label extension |
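
The rules in the table can be sketched with the standard library alone. This is an illustration, not the code behind `add_to_output_files`: `to_file_bytes` is a hypothetical name, Excel and `DataFrame` handling are omitted because they need third-party packages, and UTF-8 encoding and identical keys across all rows are assumptions.

```python
import csv
import io
import json


def to_file_bytes(label, content):
    """Serialize `content` following a stdlib-only subset of the table above."""
    if isinstance(content, bytes):
        return content                      # written as binary
    if isinstance(content, str):
        return content.encode("utf-8")      # written as-is
    is_rows = (
        isinstance(content, list)
        and bool(content)
        and all(isinstance(row, dict) for row in content)
    )
    if is_rows and label.lower().endswith(".csv"):
        buffer = io.StringIO(newline="")
        writer = csv.DictWriter(buffer, fieldnames=list(content[0]))
        writer.writeheader()
        writer.writerows(content)
        return buffer.getvalue().encode("utf-8")
    if isinstance(content, (dict, list)):
        return json.dumps(content, indent=2).encode("utf-8")
    raise TypeError(f"unsupported content type: {type(content).__name__}")
```

Note how the label drives the format: the same list of dicts becomes CSV under `orders.csv` and JSON under `orders.json`.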

**`result.files` structure**

```python
result.files   # Dict[str, str] — {label: absolute_host_path}
```

Keys are the logical labels the agent chose (e.g. `"report"`, `"orders.csv"`). Values are absolute paths to the files on the host machine. Files persist for the lifetime of the process; copy them to a permanent location if you need to keep them.
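
A minimal way to keep the outputs is to copy them as soon as the task returns. `persist_outputs` below is a hypothetical helper that uses only the standard library; it keeps the original file names and overwrites existing files with the same name.

```python
import shutil
from pathlib import Path


def persist_outputs(files, dest_dir):
    """Copy each {label: host_path} entry into `dest_dir`; return the new paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = {}
    for label, src in files.items():
        target = dest / Path(src).name
        shutil.copy2(src, target)   # copy2 also preserves timestamps
        saved[label] = str(target)
    return saved
```

Call it as `persist_outputs(result.files or {}, "exports")`; the `or {}` guards against a task that produced no files.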

---

## How to retry on failure

```python
agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    retry_count=3,   # Retry up to 3 times on non-fatal errors
)
```

On each failure, the error message is fed back to the LLM so it can generate corrected code.
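
The feedback loop can be pictured as follows. This is a generic sketch of retry-with-error-feedback, not Delfhos's orchestrator: `run_with_feedback`, `fake_generate`, and `fake_execute` are hypothetical, and the sketch assumes `retry_count=3` means one initial attempt plus up to three retries.

```python
def run_with_feedback(generate, execute, retry_count=3):
    """Call `generate(error)` then `execute(code)`, retrying on failure.

    `generate` receives the previous error message (or None on the first
    attempt) so it can produce corrected code.
    """
    error = None
    for _ in range(retry_count + 1):         # first try plus `retry_count` retries
        code = generate(error)
        try:
            return execute(code)
        except Exception as exc:             # a real system would skip fatal errors
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"gave up after {retry_count} retries; last error: {error}")


def fake_generate(error):
    # Stand-in for the LLM: the second attempt "fixes" the KeyError.
    return "row['total']" if error is None else "row.get('total', 0)"


def fake_execute(code):
    # Stand-in for the sandbox: evaluate the expression against an empty row.
    return eval(code, {"row": {}})


print(run_with_feedback(fake_generate, fake_execute))   # 0
```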

---

## How to use `rerun()` for adaptive replanning

`rerun()` is a built-in function available inside every generated agent script. It lets the script stop mid-way, hand back what it learned at runtime, and ask for a fresh code-generation pass that handles the remaining work — all without raising an error.

### When to use it

Use `rerun()` when the agent cannot write correct code for the next step without first inspecting a tool's response. Common cases:

- **Dynamic column names** — a reporting API returns column headers that vary by region, period, or data availability.
- **Unknown nesting depth** — an API can return flat or nested objects depending on the record type.
- **Response-driven routing** — the next tool to call depends on a field in the first response (e.g. `type`, `category`, `format`).

Do **not** use `rerun()` for retries, error recovery, loops, or any situation that standard Python control flow handles correctly. Normal `if/elif`, `.get()`, and `isinstance()` checks are almost always sufficient.

### Signature

```python
rerun(
    context:   str,           # What you learned — paste actual keys, a sample row, or a schema snippet
    remaining: str,           # What work is still left — be specific, not the full original task
    carry:     list = None,   # Optional: variable names you want to highlight as preserved
)
```

`rerun()` exits the current script immediately. No code after the call runs. The sandbox namespace — including any variables already assigned — is preserved and made available to the next generated script.
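
One way to get these semantics in plain Python is a control-flow exception plus a shared namespace. The sketch below shows that mechanism only; how Delfhos implements `rerun()` internally is not described in this guide, and `RerunRequested`, `PASS_ONE`, and `PASS_TWO` are hypothetical names.

```python
class RerunRequested(BaseException):
    """Control-flow signal; BaseException so `except Exception` won't swallow it."""

    def __init__(self, context, remaining, carry=None):
        super().__init__(remaining)
        self.context = context
        self.remaining = remaining
        self.carry = carry or []


def rerun(context, remaining, carry=None):
    raise RerunRequested(context, remaining, carry)


PASS_ONE = """
data = {"columns": ["month", "revenue"], "rows": [["Jan", 10], ["Feb", 12]]}
rerun(context=f"columns={data['columns']}", remaining="sum the revenue column")
data = None   # never reached
"""
PASS_TWO = "total = sum(row[1] for row in data['rows'])"

namespace = {"rerun": rerun}
try:
    exec(PASS_ONE, namespace)
except RerunRequested as request:
    # A real orchestrator would send request.context and request.remaining
    # to the LLM here; this sketch uses a hard-coded second script instead.
    exec(PASS_TWO, namespace)

print(namespace["total"])   # 22
```

Because both scripts run against the same `namespace` dictionary, `data` from the first pass is still available in the second.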

### Basic pattern

```python
# Pass 1: fetch to discover structure, then hand off
async def main():
    data = await my_api(params, desc="fetching report to inspect format")

    sample = data["rows"][0] if data.get("rows") else {}
    rerun(
        context=f"columns={data['columns']}, sample_row={repr(sample)[:400]}",
        remaining="Format data['rows'] as a markdown table using the exact columns listed above, "
                  "then print it with a totals row.",
    )

await main()
```

```python
# Pass 2 (auto-generated with your context injected):
# The LLM now knows the exact columns and row format.
cols = ["month", "revenue_eur", "units_sold", "support_tickets_opened", "csat_score"]
rows = data["rows"]            # `data` is preserved from Pass 1
totals = data["totals"]

header = "| " + " | ".join(cols) + " |"
sep    = "| " + " | ".join(["---"] * len(cols)) + " |"
body   = "\n".join("| " + " | ".join(str(r[i]) for i in range(len(cols))) + " |" for r in rows)
total  = "| **Total** | " + " | ".join(str(totals.get(c, "")) for c in cols[1:]) + " |"
print(f"### Sales Report\n\n{header}\n{sep}\n{body}\n{total}")
```

### Full working example

See `examples/rerun_example.py` in the repository for a self-contained runnable demo using a `@tool` that returns dynamic columns.

### Inspecting rerun iterations in the trace

```python
result = agent.run("Fetch the EMEA sales report and format it as a table.")

if result.trace and result.trace.reruns:
    print(f"Rerun iterations used: {len(result.trace.reruns)}")
    for r in result.trace.reruns:
        print(f"  #{r.attempt}  {r.duration_ms}ms")
        print(f"  context learned: {r.learned_context[:80]}")
        print(f"  remaining:       {r.remaining_task[:80]}")
else:
    print("No rerun — model handled it in a single pass.")
```

### Limiting the number of rerun iterations

By default the orchestrator allows up to 2 rerun iterations per task. Each rerun triggers a full LLM code-generation call, so keep the cap small.

```python
# The cap is set on the orchestrator, not on the public Agent constructor.
# To change it, set it after construction:
agent = Agent(tools=[...], llm="gemini-3.5-flash")
agent.orchestrator.rerun_count = 1   # Allow at most 1 rerun per task
```

### How `rerun()` interacts with auto-retry

The rerun loop runs first. If the code produced by a rerun pass itself raises a recoverable error (e.g. a `KeyError` or `AttributeError`), the normal `retry_count` auto-retry kicks in on top of it. The two mechanisms are independent and compose naturally.

---
