Goal-oriented guides for accomplishing specific tasks. Assumes basic familiarity with Delfhos. Jump directly to the section that solves your problem.
Connect to a SQL Database
Connect an agent to PostgreSQL, MySQL, or MariaDB. The agent can inspect schemas, run SELECT queries, and execute writes.
- schema: inspect table schemas and column definitions
- query: execute SELECT queries
- write: execute INSERT, UPDATE, and DELETE statements
Restrict to read-only
Connect to Google Sheets
Read and write spreadsheet data, create new sheets, apply formatting, and build charts.
Connect to Google Drive
Search, upload, share, and manage files. Use allow and confirm to scope permissions precisely.
Connect to Google Docs & Calendar
Create and edit documents, manage calendar events, and combine these with web search in a single agent.
Connect to any REST API
APITool turns any OpenAPI 3.x specification into a set of callable agent actions. The compiler reads the spec and registers every endpoint automatically.
From a public spec URL
From a local spec with auth headers
Fixed path parameters (multi-tenant APIs)
Use path_params to inject fixed values into URL path templates. The values are URL-encoded and substituted automatically — the LLM never sees or passes them.
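The substitution step can be pictured with a minimal stdlib sketch. This is not Delfhos's implementation, and `fill_path_params` and the example template are hypothetical. Each fixed value is percent-encoded with `urllib.parse.quote(..., safe="")`, so a value such as `a/b` cannot add extra path segments. Any placeholder not listed in `path_params` is left in place for the agent to supply.

```python
from urllib.parse import quote


def fill_path_params(template: str, path_params: dict[str, str]) -> str:
    """Substitute fixed {placeholder} values, URL-encoding each one (hypothetical helper)."""
    path = template
    for name, value in path_params.items():
        # safe="" also encodes "/", so a value can never add extra path segments
        path = path.replace("{" + name + "}", quote(str(value), safe=""))
    return path


# The tenant is fixed by configuration; {order_id} is still left for the agent
print(fill_path_params("/tenants/{tenant_id}/orders/{order_id}", {"tenant_id": "acme corp"}))
# /tenants/acme%20corp/orders/{order_id}
```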
Discover available endpoints
Cache compiled specs for large APIs
LLM enrichment
Pass enrich=True to have an LLM rewrite endpoint descriptions before the agent runs. If you don't pass llm=, the agent automatically uses the light_llm (or llm in single-model mode) — no extra configuration needed.
Use Local or Custom OpenAI-compatible Models
Use LLMConfig to configure native providers and any OpenAI-compatible custom endpoint — local models, open-source providers, or enterprise servers.
Mix local and cloud in a single agent
Use Multiple LLMs for Different Tasks
Save money by routing different tasks to different models. Use a fast cheap model for tool selection and a powerful model for code generation.
Quick recipe — cost optimization
With specialized overrides
Control Tool Permissions with allow and confirm
Two independent parameters that let you define what a tool can do and whether a human must approve it before it runs.
allow
Defines which actions the agent is permitted to use at all. Actions not in the list are hidden from the LLM.
Enforced before code generation
confirm
Defines which actions must be approved by a human before they execute. The agent can plan them, but execution pauses until you approve or reject.
Enforced before execution
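The two gates act at different moments, and a toy model makes that easier to see. This is a sketch under stated assumptions, not the Delfhos API. `ToolPolicy`, `visible_actions`, `execute`, the `approve` callback and the action names are all hypothetical. Only the semantics come from the text above: `allow` hides actions before code generation, and `confirm` pauses listed actions before execution.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolPolicy:
    """Toy model of the two independent gates (hypothetical, not the Delfhos API)."""
    actions: set[str]                      # every action the tool exposes
    allow: set[str] | None = None          # None: all actions are visible
    confirm: set[str] = field(default_factory=set)

    def visible_actions(self) -> set[str]:
        # Gate 1, before code generation: actions outside `allow` are hidden
        if self.allow is None:
            return set(self.actions)
        return self.actions & self.allow

    def execute(self, action: str, approve: Callable[[str], bool]) -> str:
        if action not in self.visible_actions():
            raise PermissionError(f"{action} is not allowed")
        # Gate 2, before execution: listed actions wait for a human decision
        if action in self.confirm and not approve(action):
            return f"{action}: rejected"
        return f"{action}: executed"


policy = ToolPolicy(actions={"SEARCH", "SEND", "DELETE"},
                    allow={"SEARCH", "SEND"},
                    confirm={"SEND"})
print(sorted(policy.visible_actions()))                 # ['SEARCH', 'SEND']
print(policy.execute("SEND", approve=lambda a: False))  # SEND: rejected
```

Because the gates are independent, an action can be allowed without confirmation (SEARCH), allowed with confirmation (SEND), or hidden entirely (DELETE).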
Common patterns
On @tool functions
Require Human Approval Before Actions
Three modes: interactive terminal prompt, custom callback, or programmatic API for background agents.
Custom approval handler
Programmatic approval (background agents)
Run a Task: Blocking, Async, or Background
Three task entry points — pick one based on whether you want to wait for the result. run() blocks, arun() is awaitable, submit() returns a task_id immediately.
| Method | Blocks? | Returns | Use when |
|---|---|---|---|
| run(task, timeout=60) | yes | Response | simple scripts — you want the answer now |
| arun(task, timeout=60) | yes (await) | Response | inside async/await code (FastAPI, asyncio apps) |
| submit(task) | no | task_id (str) | fire-and-forget — track with poll(), or run many at once |
run_chat() is separate: it starts an interactive terminal chat loop. Note that the old run_async() has been renamed to submit(); this is the only recent rename.
Blocking — run()
run() returns a Response (text, status, error, cost_usd, duration_ms, files, trace). If the timeout elapses first you get a Response with status=False and error="Timeout...".
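The timeout contract reports a timeout as data instead of raising an exception. A generic sketch of that pattern with `concurrent.futures` follows. It is not how Delfhos implements run(). `run_blocking` is hypothetical, and this `Response` is a stand-in carrying only three of the documented fields.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    """Hypothetical stand-in holding a subset of the documented Response fields."""
    text: str = ""
    status: bool = True
    error: str | None = None


_pool = ThreadPoolExecutor(max_workers=4)


def run_blocking(work: Callable[[], str], timeout: float = 60) -> Response:
    future = _pool.submit(work)
    try:
        return Response(text=future.result(timeout=timeout))
    except FutureTimeout:
        # Reported, not raised. The worker thread is not killed; it keeps running.
        return Response(status=False, error=f"Timeout after {timeout}s")
    except Exception as exc:
        return Response(status=False, error=str(exc))


print(run_blocking(lambda: "done", timeout=1))
print(run_blocking(lambda: time.sleep(0.3) or "late", timeout=0.05))
```

The caller always gets a value back, so one `if not response.status:` check covers both timeouts and failures.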
Async — arun()
Same contract as run(), but awaitable — use it from async code.
Or with a context manager for automatic cleanup:
Background — submit()
Returns a task_id immediately and runs the task in the background. Inspect progress with poll() (see Poll a Running Request), or fire several tasks at once (see Run Many Tasks Concurrently).
Poll a Running Request
Watch a task while it runs — for a progress UI, a dashboard, or a TTS pipeline. Submit with submit() and poll for live snapshots with poll().
StreamSnapshot fields
Each call returns a StreamSnapshot — a point-in-time view of the request at that instant.
| Field | Meaning |
|---|---|
| state | "queued" → "running" → "done" / "error" (queued is brief) |
| task | The task text |
| elapsed_ms | Time since the request started |
| events | Unified live timeline — list of StreamEvent |
| output_so_far | print() output captured so far (grows during the run) |
| result | Final answer once state == "done" |
| error | Error message once state == "error" |
| cost_usd, tokens_used, files, trace | Populated once terminal |
| is_terminal | True when state is "done" or "error" |
Each StreamEvent has kind ("phase", "tool", or "say"), plus label, status, started_at, and duration_ms. This is the same information the trace records — surfaced live instead of only at the end.
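A progress display built on these fields needs a poll loop that prints only new events and stops at a terminal state. The sketch below models the two types as plain dataclasses with the field names listed above. It is not the Delfhos classes: `is_terminal` is modelled as a property, and `watch` is hypothetical. It also assumes `events` only grows between snapshots. `watch` takes any zero-argument callable, so with a real agent you would wrap something like a `poll(task_id)` call. Here a scripted iterator stands in for it.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StreamEvent:
    kind: str                         # "phase", "tool", or "say"
    label: str
    status: str
    started_at: float
    duration_ms: float | None = None


@dataclass
class StreamSnapshot:
    state: str                        # "queued" -> "running" -> "done" / "error"
    task: str
    elapsed_ms: float
    events: list[StreamEvent] = field(default_factory=list)
    output_so_far: str = ""
    result: str | None = None
    error: str | None = None

    @property
    def is_terminal(self) -> bool:
        return self.state in ("done", "error")


def watch(poll: Callable[[], StreamSnapshot], interval: float = 0.5) -> StreamSnapshot:
    """Print each event the first time it appears; return the terminal snapshot."""
    seen = 0
    while True:
        snap = poll()
        for event in snap.events[seen:]:
            print(f"[{snap.elapsed_ms:>6.0f} ms] {event.kind}: {event.label} ({event.status})")
        seen = len(snap.events)
        if snap.is_terminal:
            return snap
        time.sleep(interval)


# Scripted snapshots standing in for successive poll() results
scripted = iter([
    StreamSnapshot("running", "Summarise inbox", 120,
                   [StreamEvent("phase", "planning", "running", 0.0)]),
    StreamSnapshot("done", "Summarise inbox", 950,
                   [StreamEvent("phase", "planning", "done", 0.0, 300.0),
                    StreamEvent("say", "3 unread threads", "done", 0.9)],
                   result="3 unread threads"),
])
final = watch(lambda: next(scripted), interval=0)
print(final.result)
```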
Run Many Tasks Concurrently
A single agent can run multiple tasks at the same time. Submit them with submit() and poll each by its own task_id — they never interfere with one another.
The same applies over HTTP: fire several POST /run calls and poll each returned task_id separately.
How it works
Each submitted task runs as its own concurrent unit on the agent's background scheduler. All per-task state — the live trace, tool timeline, captured output, result, cost, and tokens — is keyed by task_id, so two tasks running at the same time never overwrite each other's progress or final result. poll(task_id) always reflects exactly that task.
What "concurrent" means here
Tasks interleave while they wait on I/O: LLM calls, tool/API requests, and network traffic. This is the dominant cost in agent work, so in practice several tasks make progress together. It is not CPU parallelism. If a task's generated code does heavy synchronous computation, it blocks the other tasks until it yields. For CPU-bound parallelism, run separate Agent instances (e.g. one per process).
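The interleaving behaviour can be reproduced with a toy asyncio scheduler that keeps all per-task state keyed by task_id. This only illustrates the idea and is not how Delfhos implements its scheduler. `MiniScheduler` and its methods are hypothetical, and `asyncio.sleep` stands in for LLM or API latency.

```python
import asyncio
import time
import uuid


class MiniScheduler:
    """Toy background scheduler: one asyncio task per task_id, all state keyed by id."""

    def __init__(self) -> None:
        self.state: dict[str, dict] = {}
        self._tasks: dict[str, asyncio.Task] = {}

    def submit(self, name: str, io_seconds: float) -> str:
        # Must be called while an event loop is running
        task_id = uuid.uuid4().hex[:8]
        self.state[task_id] = {"state": "queued", "result": None}
        self._tasks[task_id] = asyncio.create_task(self._run(task_id, name, io_seconds))
        return task_id

    async def _run(self, task_id: str, name: str, io_seconds: float) -> None:
        self.state[task_id]["state"] = "running"
        await asyncio.sleep(io_seconds)  # stands in for an LLM or API call
        self.state[task_id].update(state="done", result=f"{name} finished")

    def poll(self, task_id: str) -> dict:
        return dict(self.state[task_id])  # a copy, like a point-in-time snapshot

    async def wait_all(self) -> None:
        await asyncio.gather(*self._tasks.values())


async def main() -> tuple[list[dict], float]:
    scheduler = MiniScheduler()
    start = time.perf_counter()
    ids = [scheduler.submit(f"task-{i}", 0.2) for i in range(3)]
    await scheduler.wait_all()
    elapsed = time.perf_counter() - start
    return [scheduler.poll(task_id) for task_id in ids], elapsed


snapshots, elapsed = asyncio.run(main())
print(snapshots)
print(f"3 tasks x 0.2 s of I/O took {elapsed:.2f} s")  # roughly 0.2 s, not 0.6 s
```

Replacing `await asyncio.sleep(...)` with `time.sleep(...)` blocks the event loop. The three tasks then run one after another (about 0.6 s), which is the CPU-bound caveat described above.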
Things to keep in mind
- Shared connections must tolerate concurrent use. If several tasks use the same tool instance (e.g. one Gmail()), they'll call it at the same time. The built-in tools are I/O clients and handle this fine; if you write a custom tool with mutable shared state, make it reentrant.
- Console output interleaves. Logs from parallel tasks mix in the terminal (each line is tagged with its task_id). This is cosmetic: the structured data in each poll() snapshot stays clean and separated.
- run() / arun() are unaffected. They block until their single task finishes, so they don't introduce concurrency on their own. Concurrency only comes from issuing multiple submit() calls (or multiple POST /run requests).
Expose the Agent over HTTP
agent.serve() spins up a small embedded FastAPI + uvicorn server. No extra install — both are bundled with Delfhos. Everything is reachable from the Agent object.
| Endpoint | Purpose | Auth |
|---|---|---|
| POST /run | Body {"task": "..."} → {"task_id": "..."} | ✔︎ |
| GET /tasks/{id} | JSON StreamSnapshot (state, events, output, cost, tokens) | ✔︎ |
| GET /health | {"ok": true} — always public (for load balancers) | — |
Submit with POST /run, then poll GET /tasks/{id} until is_terminal is true. Multiple POST /run calls are accepted and run concurrently — poll each task_id on its own.
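A client for this submit-then-poll flow needs only the standard library. The endpoints, the `{"task": ...}` body, the `task_id` response and the `is_terminal` flag come from the table above. `BASE_URL` and the helper names are hypothetical. Auth headers are passed in through `headers` because their names are not shown here.

```python
from __future__ import annotations

import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # hypothetical; use the host/port your server binds


def run_request(base_url: str, task: str, headers: dict[str, str] | None = None) -> urllib.request.Request:
    return urllib.request.Request(
        f"{base_url}/run",
        data=json.dumps({"task": task}).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )


def snapshot_request(base_url: str, task_id: str, headers: dict[str, str] | None = None) -> urllib.request.Request:
    return urllib.request.Request(f"{base_url}/tasks/{task_id}", headers=headers or {}, method="GET")


def submit_task(base_url: str, task: str, headers: dict[str, str] | None = None) -> str:
    with urllib.request.urlopen(run_request(base_url, task, headers)) as resp:
        return json.load(resp)["task_id"]


def wait_for_task(base_url: str, task_id: str, headers: dict[str, str] | None = None,
                  interval: float = 1.0, timeout: float = 300.0) -> dict:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(snapshot_request(base_url, task_id, headers)) as resp:
            snapshot = json.load(resp)
        if snapshot.get("is_terminal"):
            return snapshot
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {timeout}s")


# Usage against a running server (AUTH is a placeholder for your auth header dict):
# ids = [submit_task(BASE_URL, t, AUTH) for t in ("task A", "task B")]
# results = [wait_for_task(BASE_URL, task_id, AUTH) for task_id in ids]
```

Each task_id is polled on its own, so several submissions can run concurrently on the server while the client waits.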
Authentication
Pass api_key (a string or a list of keys), or set the DELFHOS_API_KEY env var (comma-separated for multiple keys). Clients authenticate with either header:
Fail-closed: binding to a non-loopback interface (e.g. 0.0.0.0) without any key raises an error rather than serving openly. When exposing the server publicly, always put HTTPS in front of it (a reverse proxy such as Caddy, Nginx, or Cloudflare): over plain HTTP the API key travels in cleartext.
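The key-resolution and fail-closed rules can be checked in isolation with a short stdlib sketch. Only the rules come from the text: an explicit key or list of keys, the comma-separated `DELFHOS_API_KEY` variable, and refusing a keyless non-loopback bind. The helper names are hypothetical, not Delfhos internals. The sketch also compares keys with `hmac.compare_digest`, a common choice for constant-time comparison.

```python
from __future__ import annotations

import hmac
import ipaddress
import os


def resolve_api_keys(api_key: str | list[str] | None = None) -> list[str]:
    """Explicit api_key wins; otherwise read DELFHOS_API_KEY (comma-separated)."""
    if api_key is None:
        api_key = os.environ.get("DELFHOS_API_KEY", "")
    if isinstance(api_key, str):
        api_key = api_key.split(",")
    return [key.strip() for key in api_key if key.strip()]


def is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames are treated as potentially public


def check_bind(host: str, keys: list[str]) -> None:
    # Fail closed: refuse to serve a non-loopback interface without any key
    if not keys and not is_loopback(host):
        raise RuntimeError(f"refusing to serve on {host} without an API key")


def key_is_valid(presented: str, keys: list[str]) -> bool:
    # Constant-time comparison avoids leaking key prefixes through timing
    return any(hmac.compare_digest(presented.encode(), key.encode()) for key in keys)
```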
Mount inside your own ASGI app
Use asgi_app() to get the FastAPI app and compose it with an existing server — still entirely through Agent.
Use Two Accounts of the Same Type
Any built-in connection type can be instantiated multiple times, as long as each instance has a unique name.
Enable Tool Prefiltering to Reduce Costs
When you have many tools, prefiltering uses a fast model to select only the relevant subset before the expensive code-generation call — typically cutting 40–70% of context tokens.
Control prefiltering with the prefilter_mode parameter:
| Mode | When to use | What it does |
|---|---|---|
| "auto" (default) | Always — let Delfhos decide | "off" for <10 actions, "filter" for 10–49, "search" for ≥50 |
| "filter" | 10–49 tool actions | One fast LLM call reads the full tool list and selects relevant tools |
| "search" | ≥50 tool actions | Iterative search loop: LLM browses tool summaries up to 5 rounds before finalising |
| "off" | <10 actions or debugging | No routing — the heavy LLM sees every tool on every call |
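The "auto" column of the table reduces to two thresholds. A hypothetical helper that resolves the effective mode makes the boundaries explicit: 9 actions is "off", 10 and 49 are "filter", 50 is "search". It mirrors the table and is not Delfhos source code.

```python
VALID_MODES = {"auto", "filter", "search", "off"}


def choose_prefilter_mode(n_actions: int, prefilter_mode: str = "auto") -> str:
    """Resolve the effective routing mode from the thresholds in the table above."""
    if prefilter_mode not in VALID_MODES:
        raise ValueError(f"unknown prefilter_mode: {prefilter_mode!r}")
    if prefilter_mode != "auto":
        return prefilter_mode  # a pinned mode is used as-is
    if n_actions < 10:
        return "off"
    if n_actions < 50:
        return "filter"
    return "search"


for n in (5, 10, 49, 50, 120):
    print(n, choose_prefilter_mode(n))
```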
Auto mode (default)
Pin a specific mode
How each mode works internally
filter mode
- The light_llm receives the full list of tool:ACTION pairs and the task description.
- It returns the subset of actions to include; the rest are excluded from the code-gen context.
search mode
- Round 1: the light_llm sees a compact inventory (tool names + action names only).
- If it needs more detail, it emits SEARCH: <keywords>, and Delfhos runs a ranked keyword search and returns the top matches.
- When confident, it emits DONE: tool:ACTION, … to finalise the selection.
- If all 5 rounds are exhausted, the union of every surfaced tool is used as a fallback.
off mode
- The heavy LLM receives every tool's full documentation on every call. This is fine for small agents but expensive at scale.
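The search-mode loop (SEARCH, then DONE, with a union fallback after 5 rounds) can be simulated with a scripted stand-in for the light_llm. `rank`, `search_loop` and the word-overlap scoring are hypothetical simplifications, and Delfhos's actual ranking is not described here. The inventory entries are illustrative.

```python
import re
from typing import Callable


def rank(inventory: dict[str, str], keywords: str, top_k: int = 3) -> list[str]:
    """Score each tool:ACTION by word overlap with the keywords (toy ranking)."""
    words = set(re.findall(r"\w+", keywords.lower()))
    scored = []
    for name, doc in inventory.items():
        overlap = len(words & set(re.findall(r"\w+", f"{name} {doc}".lower())))
        if overlap:
            scored.append((-overlap, name))
    return [name for _, name in sorted(scored)[:top_k]]


def search_loop(inventory: dict[str, str], ask_llm: Callable[[str], str], max_rounds: int = 5) -> list[str]:
    surfaced: list[str] = []
    context = "INVENTORY: " + ", ".join(inventory)  # round 1: names only
    for _ in range(max_rounds):
        reply = ask_llm(context).strip()
        if reply.startswith("DONE:"):
            return [a.strip() for a in reply[len("DONE:"):].split(",") if a.strip()]
        if reply.startswith("SEARCH:"):
            hits = rank(inventory, reply[len("SEARCH:"):])
            surfaced += [h for h in hits if h not in surfaced]
            context = "RESULTS: " + ", ".join(hits)
    return surfaced  # rounds exhausted: fall back to the union of everything surfaced


inventory = {
    "gmail:SEND": "Send an email message",
    "gmail:SEARCH": "Search email threads",
    "drive:UPLOAD": "Upload a file to Drive",
}
replies = iter(["SEARCH: send email", "DONE: gmail:SEND"])
print(search_loop(inventory, lambda ctx: next(replies)))  # ['gmail:SEND']
```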
Add Long-term Memory to an Agent
Persist facts across program restarts using semantic search. Relevant facts are injected automatically before each task.
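The persist, recall and inject cycle can be shown with a toy store. This is not Delfhos's memory implementation. `ToyMemory` is hypothetical, and a bag-of-words cosine similarity stands in for the embedding-based semantic search a real system would use. Facts are written to a JSON file, so a fresh instance (simulating a program restart) sees them. Only facts above a similarity threshold are prepended to the task.

```python
import json
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def _bag(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMemory:
    """Facts persisted to a JSON file; bag-of-words cosine stands in for embeddings."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.facts: list[str] = json.loads(self.path.read_text()) if self.path.exists() else []

    def remember(self, fact: str) -> None:
        self.facts.append(fact)
        self.path.write_text(json.dumps(self.facts))

    def recall(self, task: str, k: int = 2, min_score: float = 0.3) -> list[str]:
        query = _bag(task)
        scored = sorted(((_cosine(query, _bag(f)), f) for f in self.facts), reverse=True)
        return [fact for score, fact in scored[:k] if score >= min_score]

    def inject(self, task: str) -> str:
        facts = self.recall(task)
        if not facts:
            return task
        bullet_list = "\n".join(f"- {fact}" for fact in facts)
        return f"Relevant facts:\n{bullet_list}\n\nTask: {task}"


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "memory.json"
    memory = ToyMemory(store)
    memory.remember("The user's timezone is Europe/Madrid")
    memory.remember("Invoices are stored in the Finance folder")
    restarted = ToyMemory(store)  # a fresh instance reloads the facts from disk
    print(restarted.inject("Where are the invoices stored?"))
```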
Load from a file
Inspect a Connection's Available Actions
Use inspect() to list the actions any connection exposes — at the class level (no auth) or on a configured instance — and turn any OpenAPI spec into a readable action list.
Cost Tracking & Budgets
Delfhos tracks token usage and estimates costs automatically. Pricing lives in ~/delfhos/pricing.json.
Read cost after a run
Set a budget limit
Check budget status
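Cost estimates of this kind come from token counts multiplied by per-token prices, checked against a budget. The pricing table below is hypothetical: the model names and prices are invented, and the real pricing.json format is not shown here. `call_cost` and `Budget` are illustrative helpers that refuse a call which would overshoot the limit.

```python
import json

# Hypothetical pricing in USD per 1M tokens (invented names and prices)
PRICING = json.loads("""
{
  "fast-model":  {"input": 0.15, "output": 0.60},
  "smart-model": {"input": 2.50, "output": 10.00}
}
""")


def call_cost(model: str, input_tokens: int, output_tokens: int, pricing: dict = PRICING) -> float:
    price = pricing[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


class Budget:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse a call that would push total spending past the limit
        if self.spent_usd + cost_usd > self.limit_usd:
            raise RuntimeError(f"budget exceeded: {self.spent_usd + cost_usd:.4f} > {self.limit_usd}")
        self.spent_usd += cost_usd

    @property
    def remaining_usd(self) -> float:
        return self.limit_usd - self.spent_usd


light = call_cost("fast-model", 10_000, 2_000)   # about $0.0027
heavy = call_cost("smart-model", 3_000, 1_000)   # about $0.0175
budget = Budget(limit_usd=0.05)
budget.charge(light)
budget.charge(heavy)
print(f"spent ${budget.spent_usd:.4f}, remaining ${budget.remaining_usd:.4f}")
```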
Pass API Keys Programmatically
Instead of environment variables, pass provider keys directly to the Agent via the providers parameter.
Add a System Prompt
Inject a persistent persona, behavioral guardrails, or output format instructions into every LLM call.
Configure the Execution Sandbox
Delfhos executes LLM-generated code in an isolated sandbox. By default it auto-detects Docker and uses the strongest isolation available.
Resource limits (Docker mode only)
Allow Extra Python Libraries in the Sandbox
By default the sandbox only permits a safe subset of the standard library. Use allowed_libs to extend the import allowlist with additional packages.
Default allowed modules
| Module | What it provides |
|---|---|
| json | JSON encode / decode |
| re | Regular expressions |
| datetime | Dates, times, timedeltas |
| math | Arithmetic, trigonometry, logarithms |
| statistics | Mean, median, stdev, variance |
| csv | CSV reading and writing |
| io | In-memory byte / text streams |
| pathlib | Object-oriented filesystem paths (Path) |
| asyncio | Async / await primitives (proxied, no raw event-loop access) |
| time | time(), sleep(), monotonic() |
In addition, built-in functions (int, str, list, dict, sorted, zip, map, filter, enumerate, …) and common exception types (ValueError, KeyError, TypeError, …) are available directly — no import needed.
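An import allowlist can be checked statically with Python's `ast` module. In this sketch, `allowed_libs` simply extends the default module set from the table. `blocked_imports` is a hypothetical helper, not Delfhos's sandbox code. A static scan alone is not a sandbox: a call such as `__import__("os")` slips past it, so real enforcement must also happen at runtime.

```python
import ast
from collections.abc import Iterable

DEFAULT_ALLOWED = {"json", "re", "datetime", "math", "statistics",
                   "csv", "io", "pathlib", "asyncio", "time"}


def blocked_imports(code: str, allowed_libs: Iterable[str] = ()) -> list[str]:
    """Return imported module names whose top-level package is not allowlisted."""
    allowed = DEFAULT_ALLOWED | set(allowed_libs)
    blocked = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = ["." * node.level + (node.module or "")]  # relative imports start with "."
        else:
            continue
        for name in names:
            if name.split(".")[0] not in allowed:
                blocked.append(name)
    return blocked


generated = "import json\nimport pandas as pd\nfrom os import path"
print(blocked_imports(generated))                          # ['pandas', 'os']
print(blocked_imports(generated, allowed_libs=["pandas"]))  # ['os']
```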
Local sandbox
In local mode the packages must already be installed in your Python environment. allowed_libs only lifts the import restriction — it does not install anything automatically.
Docker sandbox
In Docker mode Delfhos automatically pip-installs the requested packages inside the container before executing the task. You do not need them installed on the host.
Network access stays blocked at the OS level in Docker mode. Network libraries (e.g. requests, httpx) can be added to allowed_libs, but that only unlocks the Python import; outbound connections still cannot reach the internet.
Pass input files to the agent workspace
Inject local files into the sandbox so the agent's generated code can read them directly.
Files passed via files= are read-only. To produce new files, use add_to_output_files().
Extract output files from a task result
When the agent needs to return a file, it calls add_to_output_files() inside generated code. After the task completes, the files are available on result.files.
Retry on Failure
On each failure the error message is fed back to the LLM so it can generate corrected code.
The default is retry_count=1 (no retry).
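The retry-with-feedback loop can be sketched generically. `run_with_retries` and its `generate`/`execute` callables are hypothetical. The sketch assumes `retry_count` counts total attempts, which is consistent with retry_count=1 meaning no retry. Each failure's error text is passed back to the generator for the next attempt. `eval` is used only to keep the demo short; never evaluate generated code outside a sandbox.

```python
from typing import Any, Callable, Optional


def run_with_retries(generate: Callable[[str, Optional[str]], str],
                     execute: Callable[[str], Any],
                     task: str,
                     retry_count: int = 1) -> Any:
    """retry_count = total attempts (assumption), so 1 means a single try."""
    feedback: Optional[str] = None
    last_error: Optional[Exception] = None
    for attempt in range(1, retry_count + 1):
        code = generate(task, feedback)
        try:
            return execute(code)
        except Exception as exc:
            last_error = exc
            # The error message becomes input for the next generation pass
            feedback = f"Attempt {attempt} failed with {type(exc).__name__}: {exc}"
    raise RuntimeError(f"all {retry_count} attempt(s) failed") from last_error


def fake_generate(task: str, feedback: Optional[str]) -> str:
    # First draft divides by zero; once it sees the error it "fixes" the code
    return "1/0" if feedback is None else "6 * 7"


print(run_with_retries(fake_generate, eval, "compute the answer", retry_count=2))  # 42
```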
Use rerun() for Replanning
rerun() lets the agent stop partway through a script, hand back what it learned at runtime, and request a fresh code-generation pass for the remaining work.
rerun() is available as a built-in in every generated script. Use it when the agent cannot write correct code for the next step without first inspecting an API's dynamic response.
