Delfhos
How-to Guides

Goal-oriented guides for accomplishing specific tasks. Assumes basic familiarity with Delfhos. Jump directly to the section that solves your problem.

Connect to a SQL Database

Connect an agent to PostgreSQL, MySQL, or MariaDB. The agent can inspect schemas, run SELECT queries, and execute writes.

python
from delfhos import Agent, SQL

# Option A: connection URL
db = SQL(url="postgresql://user:password@localhost:5432/mydb")

# Option B: individual parameters
db = SQL(
    host="db.example.supabase.co",
    port=5432,
    database="postgres",
    user="postgres",
    password="secret",
    db_type="postgresql",   # or "mysql" / "mariadb"
)

agent = Agent(tools=[db], llm="claude-opus-4-8")
result = agent.run("How many users signed up last week?")
print(result.text)
agent.stop()

Available actions:
  • schema: inspect table schemas and column definitions
  • query: execute SELECT queries
  • write: execute INSERT, UPDATE, DELETE statements

Restrict to read-only

python
db = SQL(url="...", allow=["schema", "query"])

Connect to Google Sheets

Read and write spreadsheet data, create new sheets, apply formatting, and build charts.

python
from delfhos import Agent, SQL, Sheets

sheets = Sheets(oauth_credentials="client_secrets.json")
db = SQL(url="postgresql://...")

agent = Agent(tools=[db, sheets], llm="gpt-5.5")

result = agent.run(
    "Pull last month's revenue by region from the database "
    "and write it to the 'Revenue Q3' sheet, creating it if it doesn't exist."
)
agent.stop()

Sheets actions: read, write, create, format, chart, batch

Connect to Google Drive

Search, upload, share, and manage files. Use allow and confirm to scope permissions precisely.

python
from delfhos import Agent, Drive, Gmail

drive = Drive(
    oauth_credentials="client_secrets.json",
    allow=["search", "get", "create", "update"],
    confirm=["create", "update"],
)

agent = Agent(
    tools=[drive, Gmail(oauth_credentials="client_secrets.json")],
    llm="gemini-3.1-flash-lite",
)

result = agent.run(
    "Find all PDF files in the 'Reports/Q3' folder and email them to finance@company.com"
)
agent.stop()

Drive actions: search, get, create, update, delete, list_permissions, share, unshare

Connect to Google Docs & Calendar

Create and edit documents, manage calendar events, and combine these with web search in a single agent.

python
from delfhos import Agent, Docs, Calendar, WebSearch

docs     = Docs(oauth_credentials="client_secrets.json")
calendar = Calendar(oauth_credentials="client_secrets.json")
search   = WebSearch(llm="gpt-5.5")

agent = Agent(
    tools=[docs, calendar, search],
    llm="claude-sonnet-4-6",
)

agent.run(
    "Research the latest Python packaging best practices online "
    "and write a summary document called 'Python Packaging Guide'."
)

agent.run(
    "Find a free 30-minute slot this Friday afternoon "
    "and create a calendar event called 'Team Sync'."
)

agent.stop()

Docs actions: read, create, update, format, delete
Calendar actions: list, get, create, update, delete, respond

Connect to any REST API

APITool turns any OpenAPI 3.x specification into a set of callable agent actions. The compiler reads the spec and registers every endpoint automatically.

From a public spec URL

python
from delfhos import Agent, APITool

petstore = APITool(
    spec="https://petstore3.swagger.io/api/v3/openapi.json",
    allow=["list_pets", "get_pet_by_id", "add_pet", "delete_pet"],
    confirm=["add_pet", "delete_pet"],  # writes pause for human approval
)

agent = Agent(tools=[petstore], llm="claude-sonnet-4-6")
agent.run("List all available pets and show their names")
agent.stop()

From a local spec with auth headers

python
# Local YAML spec with Bearer token auth
internal = APITool(
    spec="./openapi.yaml",
    base_url="https://api.internal.corp/v1",
    headers={"Authorization": "Bearer sk_..."},
)

# Query-parameter auth
external = APITool(
    spec="https://api.example.com/openapi.json",
    params={"api_key": "my-key"},
)

Fixed path parameters (multi-tenant APIs)

Use path_params to inject fixed values into URL path templates. The values are URL-encoded and substituted automatically — the LLM never sees or passes them.

python
multitenant = APITool(
    spec="https://api.example.com/openapi.json",
    headers={"Authorization": "Bearer sk_..."},
    path_params={"globalCompanyId": "acme-corp"},
)
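As a rough illustration of what that substitution involves, here is a standalone sketch using only the standard library. It is not Delfhos's code: fill_path is a hypothetical helper and the /companies/{globalCompanyId}/employees/{employeeId} template is an invented example. Fixed values are merged last so the model cannot override them, and quote(..., safe="") also escapes "/" so a value stays inside its own path segment.

```python
import re
from urllib.parse import quote

# Hypothetical helper illustrating path_params; not Delfhos's implementation.
def fill_path(template: str, fixed: dict, llm_args: dict) -> str:
    """Replace {name} placeholders, with fixed values taking precedence."""
    values = {**llm_args, **fixed}  # fixed values last, so the LLM cannot override them

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing path parameter: {name}")
        # safe="" also encodes "/", keeping each value inside its own path segment
        return quote(str(values[name]), safe="")

    return re.sub(r"\{([^{}]+)\}", substitute, template)

path = fill_path(
    "/companies/{globalCompanyId}/employees/{employeeId}",  # invented example template
    fixed={"globalCompanyId": "acme corp/eu"},
    llm_args={"employeeId": "42"},
)
print(path)  # /companies/acme%20corp%2Feu/employees/42
```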

Discover available endpoints

python
print(APITool.inspect(spec="https://petstore3.swagger.io/api/v3/openapi.json"))
# → {"tool": "petstore3", "methods": ["list_pets", "add_pet", ...], "total": 19}

print(APITool.inspect(spec="./openapi.yaml", verbose=True))

Cache compiled specs for large APIs

python
api = APITool(
    spec="https://api.stripe.com/openapi.json",
    headers={"Authorization": "Bearer sk_live_..."},
    cache=True,  # Saved to ~/delfhos/api_cache/
)

LLM enrichment

Pass enrich=True to have an LLM rewrite endpoint descriptions before the agent runs. If you don't pass llm=, the agent automatically uses the light_llm (or llm in single-model mode) — no extra configuration needed.

python
import os
from delfhos import Agent, APITool

# Model is inferred from Agent's light_llm automatically
finnhub = APITool(
    spec="https://finnhub.io/static/swagger.json",
    headers={"X-Finnhub-Token": os.environ["FINNHUB_API_KEY"]},
    cache=True,
    enrich=True,  # uses light_llm by default
)

agent = Agent(tools=[finnhub], llm="gemini-3.5-flash")
agent.run("What is the current price of AAPL?")
agent.stop()

# Or specify a model explicitly:
finnhub = APITool(
    spec="https://finnhub.io/static/swagger.json",
    headers={"X-Finnhub-Token": os.environ["FINNHUB_API_KEY"]},
    cache=True,
    enrich=True,
    llm="gemini-3.5-flash",  # override the enrichment model
)

Use Local or Custom OpenAI-compatible Models

Use LLMConfig to configure native providers and any OpenAI-compatible custom endpoint — local models, open-source providers, or enterprise servers.

python
from delfhos import Agent, LLMConfig

# Local Ollama model
agent = Agent(tools=[...], llm=LLMConfig(model="llama3.2", base_url="http://localhost:11434/v1"))

# LM Studio
agent = Agent(tools=[...], llm=LLMConfig(
    model="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
    base_url="http://localhost:1234/v1",
))

# Groq (cloud, OpenAI-compatible)
agent = Agent(tools=[...], llm=LLMConfig(
    model="llama-3.3-70b-versatile",
    base_url="https://api.groq.com/openai/v1",
    api_key="gsk_...",
))

# Enterprise server with multiple auth headers
agent = Agent(tools=[...], llm=LLMConfig(
    model="llama-3-70b",
    base_url="https://llm.corp.internal/v1",
    headers={"X-Tenant-ID": "acme-prod", "X-User-Token": "tok_abc123"},
))

Mix local and cloud in a single agent

python
agent = Agent(
    tools=[...],
    light_llm=LLMConfig(model="qwen2.5:7b", base_url="http://localhost:11434/v1"),
    heavy_llm="gemini-3.1-flash-lite",
)

Use Multiple LLMs for Different Tasks

Save money by routing different tasks to different models. Use a fast, cheap model for tool selection and a powerful model for code generation.

Quick recipe — cost optimization

python
from delfhos import Agent, SQL, Gmail, Sheets, Drive

agent = Agent(
    tools=[SQL(...), Gmail(...), Sheets(...), Drive(...)],
    light_llm="claude-sonnet-4-6",
    heavy_llm="gpt-5.5",
    # prefilter_mode="auto" by default — picks "filter" for this tool count
)
# Typical result: ~60% fewer tokens in code generation and 2–3× lower cost (varies by workload)

With specialized overrides

python
agent = Agent(
    tools=[...],
    light_llm="claude-sonnet-4-6",
    heavy_llm="gpt-5.5",
    vision_llm="gpt-5.5",
)

Control Tool Permissions with allow and confirm

Two independent parameters that let you define what a tool can do and whether a human must approve it before it runs.

allow

Defines which actions the agent is permitted to use at all. Actions not in the list are hidden from the LLM, so this check is enforced before code generation.

confirm

Defines which actions must be approved by a human before they execute. The agent can plan them, but execution pauses until you approve or reject, so this check is enforced before execution.
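To see how the two checks differ, here is a toy model in plain Python. It is a hypothetical sketch, not Delfhos's enforcement code (ToolPolicy, visible_actions, needs_approval, and execute are invented names): allow filters what the code generator can see at all, while confirm gates each individual call at run time.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

@dataclass
class ToolPolicy:
    """Hypothetical toy model of allow/confirm; not Delfhos's enforcement code."""
    actions: list                          # everything the tool can do
    allow: Optional[list] = None           # None: every action is permitted
    confirm: Union[bool, list] = False     # True: all, list: only these, False: none

    def visible_actions(self) -> list:
        """Check 1, before code generation: drop actions that are not allowed."""
        if self.allow is None:
            return list(self.actions)
        return [a for a in self.actions if a in self.allow]

    def needs_approval(self, action: str) -> bool:
        if isinstance(self.confirm, bool):
            return self.confirm
        return action in self.confirm

    def execute(self, action: str, approve: Callable[[str], bool]) -> str:
        """Check 2, at execution: pause for a human decision when required."""
        if action not in self.visible_actions():
            # Generated code never sees hidden actions; raising models that here.
            raise PermissionError(f"{action!r} is not allowed")
        if self.needs_approval(action) and not approve(action):
            return f"{action}: rejected"
        return f"{action}: executed"

policy = ToolPolicy(
    actions=["search", "get", "create", "update", "delete"],
    allow=["search", "get", "create", "update"],
    confirm=["create", "update"],
)
print(policy.visible_actions())                           # "delete" is never shown to the LLM
print(policy.execute("get", approve=lambda a: False))     # runs without asking
print(policy.execute("create", approve=lambda a: False))  # pauses; the reviewer rejects it
```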

Common patterns

python
from delfhos import Gmail, SQL, Drive, Calendar

# Read-only, no prompts
Gmail(oauth_credentials="...", allow=["read"], confirm=False)

# Full access, fully autonomous
SQL(url="...", confirm=False)

# Full access, approve everything
Drive(oauth_credentials="...", confirm=True)

# Full access, approve only destructive actions
Calendar(oauth_credentials="...", confirm=["delete"])

On @tool functions

python
from delfhos import tool

@tool(confirm=False)         # always runs automatically
def get_account_balance(account_id: str) -> float:
    """Return the current balance for an account."""
    ...

@tool(confirm=True)          # always pauses for approval
def transfer_funds(from_id: str, to_id: str, amount: float) -> bool:
    """Transfer funds between accounts."""
    ...

Require Human Approval Before Actions

Three modes: interactive terminal prompt, custom callback, or programmatic API for background agents.

Custom approval handler

python
from delfhos import Agent, Gmail

def my_approval_handler(request):
    if "delete" in request.message.lower():
        return False      # Always auto-reject deletes
    return True           # Auto-approve everything else

agent = Agent(
    tools=[Gmail(oauth_credentials="...", confirm=True)],
    llm="claude-sonnet-4-6",
    on_confirm=my_approval_handler,
)
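Because on_confirm takes a plain function, you can unit-test an approval policy without running an agent. This sketch assumes only what the handler above already uses, a request object with a message attribute; SimpleNamespace stands in for the real request type, and the keyword rules are illustrative examples, not Delfhos defaults.

```python
from types import SimpleNamespace

# Example rules (not Delfhos defaults): the first keyword found decides.
RULES = [
    ("delete", False),    # never auto-approve deletions
    ("external", False),  # block anything mentioning external recipients
    ("send", True),       # routine sends are fine
]

def policy_handler(request) -> bool:
    """Return True to approve, False to reject, based on request.message."""
    text = request.message.lower()
    for keyword, decision in RULES:
        if keyword in text:
            return decision
    return True  # same default as my_approval_handler: approve anything unmatched

# SimpleNamespace stands in for the real request object in this test.
for msg in ["Delete draft #12", "Send weekly report", "Send file to an external address"]:
    print(msg, "->", policy_handler(SimpleNamespace(message=msg)))
```

Pass it as on_confirm=policy_handler. Rules are checked in order, so list the most restrictive keywords first; returning False as the final default gives a fail-closed policy instead.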

Programmatic approval (background agents)

python
agent = Agent(tools=[...], llm="claude-sonnet-4-6")
agent.start()
agent.submit("Draft and send weekly reports")

# In a web handler or another thread:
pending = agent.get_pending_approvals()
for req in pending:
    agent.approve(req["request_id"], response="Looks good!")
    # or agent.reject(req["request_id"], reason="Wrong recipient")

Run a Task: Blocking, Async, or Background

Three task entry points — pick one based on whether you want to wait for the result. run() blocks, arun() is awaitable, submit() returns a task_id immediately.

| Method | Blocks? | Returns | Use when |
|---|---|---|---|
| run(task, timeout=60) | yes | Response | simple scripts: you want the answer now |
| arun(task, timeout=60) | yes (await) | Response | inside async/await code (FastAPI, asyncio apps) |
| submit(task) | no | task_id (str) | fire-and-forget: track with poll(), or run many at once |

run_chat() is a separate entry point: an interactive terminal chat loop. Note the rename: the former run_async() is now submit().

Blocking — run()

python
from delfhos import Agent, Gmail

agent = Agent(tools=[Gmail(oauth_credentials="client_secrets.json")], llm="gemini-3.5-flash")

resp = agent.run("Summarize unread emails")   # waits up to timeout (default 60s)
print(resp.text)
print(resp.cost_usd, resp.duration_ms)
agent.stop()

run() returns a Response (text, status, error, cost_usd, duration_ms, files, trace). If the timeout elapses first, you get a Response with status=False and error="Timeout...".
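The "return a failed Response instead of raising" timeout contract is easy to mirror in your own async helpers. A minimal asyncio sketch that assumes nothing about Delfhos internals: FakeResponse, run_with_timeout, and fake_task are hypothetical names that mimic a few of the Response fields listed above.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Optional

@dataclass
class FakeResponse:
    """Hypothetical stand-in mimicking a few Response fields."""
    text: str = ""
    status: bool = True
    error: Optional[str] = None
    duration_ms: float = 0.0

async def run_with_timeout(job: Awaitable[str], timeout: float) -> FakeResponse:
    """Await job; on timeout, return a failed response instead of raising."""
    start = time.monotonic()
    try:
        text = await asyncio.wait_for(job, timeout)
        return FakeResponse(text=text, duration_ms=(time.monotonic() - start) * 1000)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the job at this point
        return FakeResponse(
            status=False,
            error=f"Timeout after {timeout}s",
            duration_ms=(time.monotonic() - start) * 1000,
        )

async def fake_task(seconds: float) -> str:
    await asyncio.sleep(seconds)   # stands in for the agent's work
    return "done"

fast = asyncio.run(run_with_timeout(fake_task(0.01), timeout=1.0))
slow = asyncio.run(run_with_timeout(fake_task(1.0), timeout=0.05))
print(fast.status, fast.text)   # True done
print(slow.status, slow.error)  # False Timeout after 0.05s
```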

Async — arun()

Same contract as run(), but awaitable — use it from async code.

python
import asyncio
from delfhos import Agent, Gmail

async def main():
    agent = Agent(tools=[Gmail(oauth_credentials="client_secrets.json")], llm="gemini-3.5-flash")
    result = await agent.arun("Summarize unread emails", timeout=60.0)
    print(result.text)
    agent.stop()

asyncio.run(main())

Or with a context manager for automatic cleanup:

python
import asyncio
from delfhos import Agent, Gmail

async def main():
    with Agent(tools=[Gmail(oauth_credentials="...")], llm="gemini-3.5-flash") as agent:
        result = await agent.arun("Summarize unread emails")
        print(result.text)
    # agent.stop() is called automatically when the with block exits

asyncio.run(main())

Background — submit()

Returns a task_id immediately and runs the task in the background. Inspect its progress with poll() (next section), or submit several tasks at once (see Run Many Tasks Concurrently).

python
task_id = agent.submit("Draft and send the weekly report")   # returns at once
# ... do other work, then check on it later via agent.poll(task_id)

Poll a Running Request

Watch a task while it runs — for a progress UI, a dashboard, or a TTS pipeline. Submit with submit() and poll for live snapshots with poll().

python
import time
from delfhos import Agent, Gmail

agent = Agent(tools=[Gmail(oauth_credentials="...")], llm="claude-sonnet-4-6")

task_id = agent.submit("Summarize my unread emails and draft replies")

while True:
    snap = agent.poll(task_id)            # → StreamSnapshot
    print(snap.state, "|", snap.output_so_far[-80:])
    for ev in snap.events:
        print(f"  [{ev.kind}] {ev.label} ({ev.status})")
    if snap.is_terminal:                  # "done" or "error"
        break
    time.sleep(0.2)

print("final:", snap.result or snap.error)
agent.stop()

StreamSnapshot fields

Each call returns a StreamSnapshot — a point-in-time view of the request at that instant.

| Field | Meaning |
|---|---|
| state | "queued" → "running" → "done" / "error" (queued is brief) |
| task | The task text |
| elapsed_ms | Time since the request started |
| events | Unified live timeline: a list of StreamEvent |
| output_so_far | print() output captured so far (grows during the run) |
| result | Final answer once state == "done" |
| error | Error message once state == "error" |
| cost_usd, tokens_used, files, trace | Populated once terminal |
| is_terminal | True when state is "done" or "error" |

Each StreamEvent has kind ("phase", "tool", or "say"), plus label, status, started_at, and duration_ms. This is the same information the trace records — surfaced live instead of only at the end.

Run Many Tasks Concurrently

A single agent can run multiple tasks at the same time. Submit them with submit() and poll each by its own task_id — they never interfere with one another.

python
import time
from delfhos import Agent, Gmail

agent = Agent(tools=[Gmail(oauth_credentials="...")], llm="gemini-3.5-flash")

# Kick off three tasks at once — they run in parallel, not one after another.
task_ids = [
    agent.submit("Summarize today's unread emails"),
    agent.submit("Find any pending invoices"),
    agent.submit("Draft a reply to Ana's last email"),
]

# Poll each independently until all are done.
pending = set(task_ids)
while pending:
    for tid in list(pending):
        snap = agent.poll(tid)
        if snap.is_terminal:
            print(tid, "->", snap.result or snap.error)
            pending.discard(tid)
    time.sleep(0.3)

agent.stop()

The same applies over HTTP: fire several POST /run calls and poll each returned task_id separately.

How it works

Each submitted task runs as its own concurrent unit on the agent's background scheduler. All per-task state — the live trace, tool timeline, captured output, result, cost, and tokens — is keyed by task_id, so two tasks running at the same time never overwrite each other's progress or final result. poll(task_id) always reflects exactly that task.
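The keyed-state idea can be illustrated with a small asyncio model. This is a sketch of the pattern described above, not Delfhos's scheduler: MiniScheduler, TaskState, and the simulated steps are hypothetical. Each task's status, output, and result live in one dict entry under its own task_id, so tasks running at the same time cannot overwrite each other.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskState:
    """Everything one task owns; stored under its own task_id."""
    task: str
    state: str = "queued"
    output: list = field(default_factory=list)
    result: Optional[str] = None
    error: Optional[str] = None

    @property
    def is_terminal(self) -> bool:
        return self.state in ("done", "error")

class MiniScheduler:
    """Hypothetical toy scheduler; not Delfhos's implementation."""

    def __init__(self) -> None:
        self.tasks = {}     # task_id -> TaskState
        self._jobs = set()  # strong refs so running jobs are not garbage-collected

    def submit(self, task: str) -> str:
        task_id = uuid.uuid4().hex[:8]
        self.tasks[task_id] = TaskState(task)
        job = asyncio.create_task(self._run(task_id))
        self._jobs.add(job)
        job.add_done_callback(self._jobs.discard)
        return task_id

    def poll(self, task_id: str) -> TaskState:
        return self.tasks[task_id]

    async def _run(self, task_id: str) -> None:
        st = self.tasks[task_id]
        st.state = "running"
        try:
            for step in range(3):
                await asyncio.sleep(0.01)  # stands in for an LLM or API call
                st.output.append(f"{st.task}: step {step}")
            st.result = f"finished {st.task}"
            st.state = "done"
        except Exception as exc:
            st.error = str(exc)
            st.state = "error"

async def main() -> dict:
    sched = MiniScheduler()
    ids = [sched.submit(name) for name in ("emails", "invoices", "reply")]
    while not all(sched.poll(i).is_terminal for i in ids):
        await asyncio.sleep(0.005)
    return {i: sched.poll(i) for i in ids}

states = asyncio.run(main())
for task_id, st in states.items():
    print(task_id, st.state, st.output)
```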

What "concurrent" means here

Tasks interleave while they wait on I/O: LLM calls, tool/API requests, network. Waiting on I/O dominates agent work, so in practice several tasks make progress together. This is not CPU parallelism: if a task's generated code does heavy synchronous computation, it blocks the other tasks until it yields. For CPU-bound parallelism, run separate Agent instances (e.g. one per process).
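The contrast is easy to reproduce with plain asyncio, independent of Delfhos. In this standalone sketch, io_bound yields at every await, so two copies interleave; cpu_bound never awaits inside its loop, so once it starts it runs to completion before the other task gets a turn. That second pattern is what heavy synchronous code does to the other tasks sharing a scheduler.

```python
import asyncio
from typing import Awaitable, Callable

async def io_bound(name: str, log: list) -> None:
    for i in range(3):
        log.append(f"{name}{i}")
        await asyncio.sleep(0)      # yields, like waiting on an LLM or API response

async def cpu_bound(name: str, log: list) -> None:
    for i in range(3):
        log.append(f"{name}{i}")
        sum(range(200_000))         # synchronous work: no other task can run meanwhile

async def trace(worker: Callable[[str, list], Awaitable[None]]) -> list:
    log: list = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log

io_log = asyncio.run(trace(io_bound))
cpu_log = asyncio.run(trace(cpu_bound))
print(io_log)   # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
print(cpu_log)  # ['A0', 'A1', 'A2', 'B0', 'B1', 'B2']
```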

Things to keep in mind

  • Shared connections must tolerate concurrent use. If several tasks use the same tool instance (e.g. one Gmail()), they'll call it at the same time. The built-in tools are I/O clients and handle this fine; if you write a custom tool with mutable shared state, make it reentrant.
  • Console output interleaves. Logs from parallel tasks mix in the terminal (each line is tagged with its task_id). This is cosmetic — the structured data in each poll() snapshot stays clean and separated.
  • run() / arun() are unaffected. They block until their single task finishes, so they don't introduce concurrency on their own. Concurrency only comes from issuing multiple submit() calls (or multiple POST /run).

Expose the Agent over HTTP

agent.serve() spins up a small embedded FastAPI + uvicorn server. No extra install — both are bundled with Delfhos. Everything is reachable from the Agent object.

python
from delfhos import Agent, Gmail

agent = Agent(tools=[Gmail(oauth_credentials="...")], llm="gpt-5.5")

# Choose one of the two calls below.

# Development: local only, no API key required
agent.serve(port=8080)

# Production: public interface, authenticated
agent.serve(host="0.0.0.0", port=8080, api_key="sk-my-secret")

| Endpoint | Purpose | Auth required |
|---|---|---|
| POST /run | Body {"task": "..."} → returns {"task_id": "..."} | yes |
| GET /tasks/{id} | JSON StreamSnapshot (state, events, output, cost, tokens) | yes |
| GET /health | {"ok": true}; always public (for load balancers) | no |

Submit with POST /run, then poll GET /tasks/{id} until is_terminal is true. Multiple POST /run calls are accepted and run concurrently — poll each task_id on its own.

Authentication

Pass api_key (a string or a list of keys), or set the DELFHOS_API_KEY env var (comma-separated for multiple keys). Clients authenticate with either header:

bash
curl -H "Authorization: Bearer sk-my-secret" \
     -X POST localhost:8080/run -H 'content-type: application/json' \
     -d '{"task": "Summarize my unread emails"}'
# {"task_id": "..."}

curl -H "X-API-Key: sk-my-secret" localhost:8080/tasks/<task_id>   # poll until is_terminal

Fail-closed: binding to a non-loopback interface (e.g. 0.0.0.0) without any key raises an error instead of serving openly. When exposing the server publicly, always put HTTPS in front of it (a reverse proxy such as Caddy, Nginx, or Cloudflare): over plain HTTP, the API key travels in cleartext.
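A Python client for the same submit-then-poll flow needs only the standard library. This sketch relies only on what is documented above (POST /run, GET /tasks/{id}, the Authorization: Bearer header, and an is_terminal field in the JSON snapshot); make_fetch and run_remote are hypothetical helper names. The transport is passed in as a function so the polling logic can be checked offline.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

Fetch = Callable[..., dict]

def make_fetch(base_url: str, api_key: str) -> Fetch:
    """Build a function that sends an authenticated JSON request and decodes the reply."""
    def fetch(method: str, path: str, body: Optional[dict] = None) -> dict:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
        req.add_header("Authorization", f"Bearer {api_key}")
        if data is not None:
            req.add_header("Content-Type", "application/json")
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    return fetch

def run_remote(fetch: Fetch, task: str, interval: float = 0.5, max_polls: int = 600) -> dict:
    """Submit a task, then poll its snapshot until it reports is_terminal."""
    task_id = fetch("POST", "/run", {"task": task})["task_id"]
    for _ in range(max_polls):
        snapshot = fetch("GET", f"/tasks/{task_id}")
        if snapshot.get("is_terminal"):
            return snapshot
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")

# Real use against agent.serve(port=8080, api_key="sk-my-secret"):
#   snap = run_remote(make_fetch("http://localhost:8080", "sk-my-secret"), "Summarize my unread emails")

# Offline check with a fake transport standing in for the HTTP server.
_replies = iter([{"state": "running", "is_terminal": False},
                 {"state": "done", "is_terminal": True, "result": "ok"}])
calls = []

def fake_fetch(method, path, body=None):
    calls.append((method, path))
    return {"task_id": "t1"} if method == "POST" else next(_replies)

snapshot = run_remote(fake_fetch, "Summarize my unread emails", interval=0)
print(calls, snapshot["result"])
```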

Mount inside your own ASGI app

Use asgi_app() to get the FastAPI app and compose it with an existing server — still entirely through Agent.

python
from fastapi import FastAPI

app = FastAPI()
app.mount("/agent", agent.asgi_app(api_key="sk-my-secret"))
# -> POST /agent/run, GET /agent/tasks/{id}, GET /agent/health

Use Two Accounts of the Same Type

Instantiate any connection type multiple times by giving each instance a unique name.

python
from delfhos import Agent, Gmail

work = Gmail(oauth_credentials="work_oauth.json", name="work_email")
personal = Gmail(oauth_credentials="personal_oauth.json", name="personal_email")

agent = Agent(tools=[work, personal], llm="claude-sonnet-4-6")
agent.run("Forward the invoice from my work inbox to my personal email address.")
agent.stop()

Any built-in connection type can be instantiated multiple times as long as each has a unique name.

Enable Tool Prefiltering to Reduce Costs

When you have many tools, prefiltering uses a fast model to select only the relevant subset before the expensive code-generation call — typically cutting 40–70% of context tokens.

Control prefiltering with the prefilter_mode parameter:

| Mode | When to use | What it does |
|---|---|---|
| "auto" (default) | Always; let Delfhos decide | "off" for <10 actions, "filter" for 10–49, "search" for ≥50 |
| "filter" | 10–49 tool actions | One fast LLM call reads the full tool list and selects relevant tools |
| "search" | ≥50 tool actions | Iterative search loop: the LLM browses tool summaries for up to 5 rounds before finalising |
| "off" | <10 actions or debugging | No routing: the heavy LLM sees every tool on every call |

Auto mode (default)

python
from delfhos import Agent, Gmail, Sheets, Drive, SQL, WebSearch

# "auto" — Delfhos picks the right mode based on action count
agent = Agent(
    tools=[Gmail(...), Sheets(...), Drive(...), SQL(...), WebSearch(...)],
    light_llm="gemini-3.5-flash",   # Used for prefiltering
    heavy_llm="gemini-3.5-flash",     # Used for code generation
)

agent.run("What is the weather in London?")
# auto selects "filter" (5 tools → 25–40 total actions)
# Prefilter selects: [WebSearch] — Gmail/Sheets/Drive/SQL excluded
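The "auto" thresholds from the table reduce to a simple function of the total action count (actions, not tools). This sketch is only for reasoning about which mode you will get; pick_prefilter_mode is a hypothetical name, and the real decision happens inside Delfhos. The action lists are the ones documented earlier on this page.

```python
# Hypothetical helper mirroring the documented "auto" thresholds.
def pick_prefilter_mode(action_count: int) -> str:
    if action_count < 10:
        return "off"
    if action_count < 50:
        return "filter"
    return "search"

# Action lists as documented in the connection sections above.
tool_actions = {
    "SQL": ["schema", "query", "write"],
    "Drive": ["search", "get", "create", "update", "delete",
              "list_permissions", "share", "unshare"],
    "Calendar": ["list", "get", "create", "update", "delete", "respond"],
}
total = sum(len(actions) for actions in tool_actions.values())
print(total, pick_prefilter_mode(total))  # 17 filter
```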

Pin a specific mode

python
# Force search mode (best for APITool with 50+ endpoints)
agent = Agent(
    tools=[my_api_tool],             # e.g. 110 Finnhub endpoints
    llm="gemini-3.5-flash",
    prefilter_mode="search",
)

# Disable prefiltering entirely (fastest for tiny tool sets)
agent = Agent(
    tools=[my_single_tool],
    llm="gemini-3.5-flash",
    prefilter_mode="off",
)

How each mode works internally

filter mode — single-pass routing
  1. The light_llm receives the full list of tool:ACTION pairs and the task description.
  2. It returns the subset of actions to include; the rest are excluded from the code-gen context.
search mode — iterative discovery (up to 5 rounds)
  1. Round 1: The light_llm sees a compact inventory (tool names + action names only).
  2. If it needs detail, it emits SEARCH: <keywords> — Delfhos runs a ranked keyword search and returns top matches.
  3. When confident, it emits DONE: tool:ACTION, … to finalise the selection.
  4. If all 5 rounds are exhausted, the union of every surfaced tool is used as a fallback.
off mode — no prefilter
  1. The heavy LLM receives every tool's full documentation on every call. Fine for small agents; expensive at scale.
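The search-mode loop can be sketched as follows. Everything here is hypothetical: light_llm is any callable that returns one text reply, keyword_search is a naive word-overlap ranking standing in for Delfhos's ranked search, and the prompt layout is invented. The control flow mirrors the steps above: a compact inventory first, SEARCH: requests answered with top matches, DONE: to finish, and the union of surfaced actions as the fallback once five rounds are used up.

```python
from typing import Callable

MAX_ROUNDS = 5

def keyword_search(catalog: dict, keywords: str, top_k: int = 3) -> list:
    """Naive ranking: count how many query words occur in each action's text."""
    words = keywords.lower().split()
    scored = []
    for name, description in catalog.items():
        text = f"{name} {description}".lower()
        score = sum(word in text for word in words)
        if score:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by name
    return [name for _, name in scored[:top_k]]

def search_prefilter(task: str, catalog: dict, light_llm: Callable[[str], str]) -> list:
    transcript = [f"TASK: {task}", "INVENTORY: " + ", ".join(catalog)]  # names only
    surfaced = []
    for _ in range(MAX_ROUNDS):
        reply = light_llm("\n".join(transcript)).strip()
        if reply.startswith("DONE:"):
            return [item.strip() for item in reply[len("DONE:"):].split(",") if item.strip()]
        if reply.startswith("SEARCH:"):
            hits = keyword_search(catalog, reply[len("SEARCH:"):])
            surfaced += [hit for hit in hits if hit not in surfaced]
            transcript.append("RESULTS: " + (", ".join(hits) or "(none)"))
    return surfaced  # rounds exhausted: fall back to everything surfaced so far

catalog = {
    "gmail:SEND": "send an email message",
    "gmail:READ": "read inbox emails",
    "sql:QUERY": "run a SELECT query on the database",
    "websearch:SEARCH": "search the web for current information",
}
scripted = iter(["SEARCH: weather web", "DONE: websearch:SEARCH"])  # fake LLM replies
selected = search_prefilter("What is the weather in London?", catalog, lambda prompt: next(scripted))
print(selected)  # ['websearch:SEARCH']
```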

Add Long-term Memory to an Agent

Persist facts across program restarts using semantic search. Relevant facts are injected automatically before each task.

python
from delfhos import Agent, Memory

memory = Memory(
    namespace="crm_agent",
    embedding_model="all-MiniLM-L6-v2",
)

memory.save("""
Alice Chen — VP Sales, alice@acme.com, Enterprise tier
Bob Torres — Dev Lead, bob@acme.com, Pro tier
""")

agent = Agent(tools=[...], llm="claude-sonnet-4-6", memory=memory)
agent.run("Draft a response to Alice's support ticket")
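The retrieve-then-inject step can be illustrated with a toy version. This sketch is not how Memory works internally: it swaps the all-MiniLM-L6-v2 sentence embeddings for simple word-count vectors so it runs with no dependencies, and embed, recall, and with_memory are hypothetical names. The shape is the same, though: score every stored fact against the task, keep the best matches, and prepend them to the prompt.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recall(facts: list, query: str, top_k: int = 1, min_score: float = 0.1) -> list:
    """Return up to top_k facts whose similarity to the query clears min_score."""
    q = embed(query)
    ranked = sorted(((cosine(embed(f), q), f) for f in facts), reverse=True)
    return [f for score, f in ranked[:top_k] if score >= min_score]

def with_memory(task: str, facts: list) -> str:
    """Prepend relevant facts to the task, as a memory layer might before each run."""
    relevant = recall(facts, task)
    if not relevant:
        return task
    return "Relevant facts:\n" + "\n".join(relevant) + "\n\nTask: " + task

facts = [
    "Alice Chen, VP Sales, alice@acme.com, Enterprise tier",
    "Bob Torres, Dev Lead, bob@acme.com, Pro tier",
]
print(with_memory("Draft a response to Alice's support ticket", facts))
```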

Load from a file

python
memory.add("knowledge_base.md")

Inspect a Connection's Available Actions

Use inspect() to list the actions any connection exposes — at the class level (no auth) or on a configured instance — and turn any OpenAPI spec into a readable action list.

python
from delfhos import Gmail, Drive, APITool

# Class-level (no auth required)
print(Gmail.inspect())

# Instance-level (includes connection details)
gmail = Gmail(oauth_credentials="client_secrets.json")
print(gmail.inspect())
print(gmail.inspect(verbose=True))   # Full action descriptions

# REST API endpoints
print(APITool.inspect(spec="https://petstore3.swagger.io/api/v3/openapi.json"))
print(APITool.inspect(spec="./openapi.yaml", verbose=True))

Cost Tracking & Budgets

Delfhos tracks token usage and estimates costs automatically. Pricing lives in ~/delfhos/pricing.json.

Read cost after a run

python
result = agent.run("...")
print(f"${result.cost_usd:.5f}")
print(agent.usage)
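Estimates like cost_usd are commonly computed per model from input and output token counts multiplied by a per-million-token price. A sketch under that assumption; the PRICING layout, model names, and prices below are invented placeholders and are not the actual format or contents of ~/delfhos/pricing.json.

```python
# Hypothetical prices in USD per 1M tokens (placeholders, not real rates).
PRICING = {
    "light-model": {"input": 0.10, "output": 0.40},
    "heavy-model": {"input": 2.00, "output": 8.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price = PRICING[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

# One task = a cheap prefilter call plus an expensive code-generation call.
total = estimate_cost("light-model", 4_000, 200) + estimate_cost("heavy-model", 6_000, 1_500)
print(f"${total:.5f}")  # $0.02448
```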

Set a budget limit

python
agent = Agent(tools=[...], llm="gpt-5.5", budget_usd=0.50)

# Raises an error with code AGT-006 once accumulated cost reaches $0.50
agent.reset_budget()       # Reset counter, keep limit
agent.reset_budget(1.00)   # Reset counter and set new limit

Check budget status

python
status = agent.status()
print(status["budget"]["limit_usd"])      # Configured budget limit
print(status["budget"]["remaining_usd"])  # Limit minus total_cost_usd
print(status["budget"]["is_exhausted"])   # True if remaining <= 0
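The budget bookkeeping above (limit, remaining, exhausted, and the two reset_budget forms) can be modeled in a few lines. This is a hypothetical BudgetTracker sketch, not Delfhos's class; it raises a plain BudgetExceeded exception where Delfhos reports AGT-006.

```python
from typing import Optional

class BudgetExceeded(RuntimeError):
    """Stand-in for Delfhos's AGT-006 budget error."""

class BudgetTracker:
    """Hypothetical model of budget_usd, reset_budget(), and the status fields."""

    def __init__(self, limit_usd: Optional[float] = None) -> None:
        self.limit_usd = limit_usd
        self.total_cost_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        self.total_cost_usd += cost_usd
        if self.limit_usd is not None and self.total_cost_usd >= self.limit_usd:
            raise BudgetExceeded(f"spent ${self.total_cost_usd:.2f} of ${self.limit_usd:.2f}")

    def reset_budget(self, new_limit: Optional[float] = None) -> None:
        self.total_cost_usd = 0.0          # always clear the counter
        if new_limit is not None:
            self.limit_usd = new_limit     # optionally replace the limit

    def status(self) -> dict:
        if self.limit_usd is None:
            return {"limit_usd": None, "remaining_usd": None, "is_exhausted": False}
        remaining = self.limit_usd - self.total_cost_usd
        return {"limit_usd": self.limit_usd, "remaining_usd": remaining,
                "is_exhausted": remaining <= 0}

budget = BudgetTracker(limit_usd=0.50)
budget.charge(0.20)
try:
    budget.charge(0.30)                    # total reaches the $0.50 limit
except BudgetExceeded as err:
    print("stopped:", err)
exhausted_before_reset = budget.status()["is_exhausted"]
budget.reset_budget(1.00)                  # clear the counter and set a new limit
print(budget.status())
```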

Pass API Keys Programmatically

Instead of environment variables, pass provider keys directly to the Agent via the providers parameter.

python
agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    providers={
        "google": "GOOGLE_API_KEY_HERE",
        "openai": "OPENAI_API_KEY_HERE",
    },
)

Add a System Prompt

Inject a persistent persona, behavioral guardrails, or output format instructions into every LLM call.

python
agent = Agent(
    tools=[SQL(url="..."), Gmail(oauth_credentials="...")],
    llm="claude-sonnet-4-6",
    system_prompt="""
You are a data analyst for Acme Corp.
- Always cite the SQL query you used.
- Prefer charts over raw numbers when sharing results.
- Never email results to external addresses without explicit confirmation.
""",
)

Configure the Execution Sandbox

Delfhos executes LLM-generated code in an isolated sandbox. By default it auto-detects Docker and uses the strongest isolation available.

python
# Default: uses Docker when it is available, otherwise falls back to the local sandbox
agent = Agent(tools=[SQL(url="...")], llm="claude-sonnet-4-6")

# Force Docker (fails if Docker is not running)
agent = Agent(tools=[SQL(url="...")], llm="claude-sonnet-4-6", sandbox="docker")

# Pin to local sandbox (no Docker required)
agent = Agent(tools=[SQL(url="...")], llm="claude-sonnet-4-6", sandbox="local")
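The relationship between the three settings can be summarized as a small decision function. This is an illustration under assumptions, not Delfhos's detection logic: docker_available treats Docker as usable when the docker CLI is on PATH and docker info exits successfully, resolve_sandbox is a hypothetical name, and None stands for leaving sandbox unset.

```python
import shutil
import subprocess
from typing import Callable, Optional

def docker_available() -> bool:
    """Assumed check: docker CLI on PATH and `docker info` succeeds."""
    if shutil.which("docker") is None:
        return False
    try:
        proc = subprocess.run(["docker", "info"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0

def resolve_sandbox(requested: Optional[str] = None,
                    probe: Callable[[], bool] = docker_available) -> str:
    """Hypothetical resolver; None means the sandbox argument was left unset."""
    if requested == "local":
        return "local"                           # never touches Docker
    if requested == "docker":
        if not probe():
            raise RuntimeError("sandbox='docker' requested but Docker is not available")
        return "docker"
    if requested is None:
        return "docker" if probe() else "local"  # strongest isolation available
    raise ValueError(f"unknown sandbox mode: {requested!r}")

# Offline examples with a fake probe instead of a real Docker check:
print(resolve_sandbox(None, probe=lambda: False))    # local
print(resolve_sandbox("local", probe=lambda: True))  # local
```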

Resource limits (Docker mode only)

python
agent = Agent(
    tools=[...],
    llm="claude-sonnet-4-6",
    sandbox="docker",
    sandbox_config={
        "memory_limit": "1g",
        "cpu_limit":    2.0,
        "timeout":      600,
        "network":      False,
        "pids_limit":   128,
    },
)
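For intuition about what each key controls, here is how comparable limits are expressed as standard docker run flags (--memory, --cpus, --pids-limit, --network none). The key-to-flag mapping is an assumption made for illustration, since this page does not say how Delfhos applies sandbox_config; docker_run_args and the python:3.12-slim image are hypothetical choices.

```python
# Hypothetical mapping from sandbox_config-style keys to standard docker run flags.
def docker_run_args(config: dict, image: str = "python:3.12-slim") -> list:
    args = ["docker", "run", "--rm"]
    if "memory_limit" in config:
        args += ["--memory", str(config["memory_limit"])]    # e.g. "1g"
    if "cpu_limit" in config:
        args += ["--cpus", str(config["cpu_limit"])]         # fractional CPUs allowed
    if "pids_limit" in config:
        args += ["--pids-limit", str(config["pids_limit"])]  # caps process/thread count
    if config.get("network") is False:
        args += ["--network", "none"]                        # no outbound networking
    # "timeout" has no docker run flag; a caller would enforce it separately,
    # e.g. subprocess.run(args, timeout=config["timeout"]).
    return args + [image, "python", "-c", "print('hello from the sandbox')"]

config = {
    "memory_limit": "1g",
    "cpu_limit": 2.0,
    "timeout": 600,
    "network": False,
    "pids_limit": 128,
}
print(" ".join(docker_run_args(config)))
```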

Allow Extra Python Libraries in the Sandbox

By default the sandbox only permits a safe subset of the standard library. Use allowed_libs to extend the import allowlist with additional packages.

Default allowed modules

| Module | What it provides |
|---|---|
| json | JSON encode / decode |
| re | Regular expressions |
| datetime | Dates, times, timedeltas |
| math | Arithmetic, trigonometry, logarithms |
| statistics | Mean, median, stdev, variance |
| csv | CSV reading and writing |
| io | In-memory byte / text streams |
| pathlib | Object-oriented filesystem paths (Path) |
| asyncio | Async / await primitives (proxied, no raw event-loop access) |
| time | time(), sleep(), monotonic() |

In addition, built-in functions (int, str, list, dict, sorted, zip, map, filter, enumerate, …) and common exception types (ValueError, KeyError, TypeError, …) are available directly — no import needed.

python
from delfhos import Agent, SQL

agent = Agent(
    tools=[SQL(url="postgresql://...")],
    llm="gemini-3.5-flash",
    allowed_libs=["pandas", "numpy"],
)
agent.run("Load the sales table and compute monthly totals")
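One common way to build an import allowlist in Python is to replace __import__ in the builtins seen by the executed code. The sketch below shows that idea using the default module list from the table; run_restricted and make_guarded_import are hypothetical names, Delfhos's actual enforcement is not described on this page, and an import hook like this is not a security boundary on its own.

```python
import builtins

# Default allowlist taken from the table above.
DEFAULT_ALLOWED = {"json", "re", "datetime", "math", "statistics",
                   "csv", "io", "pathlib", "asyncio", "time"}

def make_guarded_import(allowed: set):
    real_import = builtins.__import__

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        top_level = name.split(".")[0]
        if level == 0 and top_level not in allowed:
            raise ImportError(f"import of {name!r} is not allowed in the sandbox")
        return real_import(name, globals, locals, fromlist, level)

    return guarded_import

def run_restricted(code: str, extra_libs=()) -> dict:
    """Execute code with imports limited to the allowlist plus extra_libs."""
    sandbox_builtins = dict(vars(builtins))
    sandbox_builtins["__import__"] = make_guarded_import(DEFAULT_ALLOWED | set(extra_libs))
    namespace = {"__builtins__": sandbox_builtins}
    exec(code, namespace)
    return namespace

print(run_restricted("import math\nroot = math.sqrt(16)")["root"])  # 4.0
try:
    run_restricted("import os")
except ImportError as err:
    print(err)  # import of 'os' is not allowed in the sandbox
```

Adding a name to extra_libs plays the role of allowed_libs here: it only lifts the import check and installs nothing.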

Local sandbox

In local mode the packages must already be installed in your Python environment. allowed_libs only lifts the import restriction — it does not install anything automatically.

python
# pip install pandas numpy  ← install first
agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    sandbox="local",
    allowed_libs=["pandas", "numpy"],
)
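Since local mode does not install anything, a quick pre-flight check can fail fast with a clear message before the agent starts. A sketch using the standard importlib.util.find_spec; missing_modules is a hypothetical helper. It checks top-level import names, which can differ from pip package names (scikit-learn, for example, is imported as sklearn).

```python
import importlib.util

def missing_modules(names: list) -> list:
    """Return the top-level import names that are not installed in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(["pandas", "numpy"])
if missing:
    print("Install before using sandbox='local':", ", ".join(missing))
else:
    print("All requested libraries are importable.")
```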

Docker sandbox

In Docker mode Delfhos automatically pip-installs the requested packages inside the container before executing the task. You do not need them installed on the host.

python
agent = Agent(
    tools=[...],
    llm="gemini-3.5-flash",
    sandbox="docker",
    allowed_libs=["pandas", "scikit-learn", "openpyxl"],
)
agent.run("Read the uploaded Excel file and train a simple classifier")

Security note: Only add libraries you trust. In Docker mode, outbound network access stays blocked at the OS level, so adding network-capable packages such as requests or httpx to allowed_libs only unlocks the Python import; their connections still cannot reach the internet.

Pass input files to the agent workspace

Inject local files into the sandbox so the agent's generated code can read them directly.

python
from delfhos import Agent, SQL

agent = Agent(
    tools=[SQL(url="postgresql://...")],
    llm="claude-sonnet-4-6",
    files=[
        "/data/sales_q3.csv",
        "/data/product_catalog.xlsx",
        "/config/mapping.json",
    ],
)

result = agent.run(
    "Read the sales CSV, join it with the product catalog, "
    "and write a summary to the database."
)
agent.stop()

Note: Files passed via files= are read-only. To produce new files, use add_to_output_files().

Extract output files from a task result

When the agent needs to return a file, it calls add_to_output_files() inside generated code. After the task completes, the files are available on result.files.

python
result = agent.run(
    "Query the top 100 customers by revenue last month "
    "and export the data as a CSV file."
)

if result.files:
    for label, path in result.files.items():
        print(f"{label} → {path}")
        # → top_customers: /tmp/delfhos_out_abc123/top_customers.csv
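Because result.files maps a label to a file path, keeping outputs beyond the temporary output directory is ordinary file handling. A sketch that assumes only that label-to-path mapping (as in the example above); collect_outputs is a hypothetical helper, and the demo uses a throwaway file in place of a real result.

```python
import shutil
import tempfile
from pathlib import Path

def collect_outputs(files: dict, dest_dir: str) -> dict:
    """Hypothetical helper: copy each label -> path entry into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = {}
    for label, src in files.items():
        target = dest / Path(src).name   # keep the original file name
        shutil.copy2(src, target)        # copy2 also preserves timestamps
        copied[label] = target
    return copied

# With a real result: collect_outputs(result.files or {}, "./exports")
# Self-contained demo with a throwaway file standing in for result.files:
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "top_customers.csv"
    src.write_text("id,revenue\n1,100\n")
    saved = collect_outputs({"top_customers": str(src)}, str(Path(tmp) / "exports"))
    copied_name = saved["top_customers"].name
    copied_text = saved["top_customers"].read_text()
print(copied_name, repr(copied_text))
```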

Retry on Failure

On each failure, the error message is fed back to the LLM so it can generate corrected code.

python
agent = Agent(
    tools=[...],
    llm="gemini-3.1-flash-lite",
    retry_count=3,   # Up to 3 attempts in total (first try + 2 retries) on non-fatal errors
)

The default is retry_count=1 (no retry).
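The feedback loop can be sketched as follows, assuming retry_count counts total attempts, as the default of 1 meaning "no retry" implies. run_with_retries, fake_generate, and fake_execute are hypothetical stand-ins for the LLM call and the sandbox; the point is that each failure's message is appended to the prompt for the next attempt.

```python
from typing import Callable, Optional

def run_with_retries(task: str,
                     generate_code: Callable[[str], str],
                     execute: Callable[[str], str],
                     retry_count: int = 1) -> str:
    """Hypothetical retry loop: retry_count is the total number of attempts."""
    prompt = task
    last_error: Optional[Exception] = None
    for attempt in range(1, retry_count + 1):
        code = generate_code(prompt)
        try:
            return execute(code)
        except Exception as exc:          # every error is treated as non-fatal here
            last_error = exc
            prompt = f"{task}\n\nAttempt {attempt} failed with: {exc!r}\nFix the code and try again."
    raise RuntimeError(f"gave up after {retry_count} attempt(s)") from last_error

prompts = []

def fake_generate(prompt: str) -> str:
    prompts.append(prompt)
    return "good" if "failed" in prompt else "bad"   # only fixes code once it sees an error

def fake_execute(code: str) -> str:
    if code == "bad":
        raise KeyError("revenue")
    return "42 rows updated"

print(run_with_retries("Update totals", fake_generate, fake_execute, retry_count=3))
```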

Use rerun() for Replanning

Stop a generated script midway, hand back what it learned at runtime, and request a fresh code-generation pass for the remaining work.

rerun() is available as a built-in in every generated script. Use it when the agent cannot write correct code for the next step without first inspecting an API's dynamic response.

python
# Basic pattern
async def main():
    data = await my_api(params, desc="fetching report...")
    sample = data["rows"][0] if data.get("rows") else {}

    rerun(
        context=f"columns={data['columns']}, sample={repr(sample)[:400]}",
        remaining="Format data['rows'] as a markdown table using the exact columns."
    )

await main()