Apache 2.0 · Open Source

The agent framework built for real work.

Secure by design. Efficient by architecture. Precise in execution. Delfhos is the AI agent framework you can actually trust in production.

Quickstart
2.2×
Faster execution — precise, no wasted steps
2.0×
Fewer tokens — efficient by architecture
100%
Success rate — secure, auditable, reliable
production_agent.py
from delfhos import Agent
from delfhos.connections import GmailConnection, SQLConnection

agent = Agent(connections=[
    GmailConnection(allow=["read"]),          # read-only: can never send
    SQLConnection(url=DB_URL, confirm=True),  # asks before writes
])

result = agent.run("Find the 5 biggest open invoices")

Works with the tools you already use

Gmail
SQL
Sheets
Drive
Docs
Calendar
WebSearch
MCP
@tool

// efficient. Reproducible benchmarks.

Efficient by design. Lower cost, lower latency, less noise.

The Struggle

Token costs spiral out of control

Verbose frameworks often include excessive metadata and redundant internal loops, causing context windows to fill rapidly and costs to balloon.

Delfhos: 2.0x Fewer Tokens

Efficient Code-Act architecture.

Delfhos: 1,141 tokens
PydanticAI: 2,304 tokens
The Struggle

Latency that users actually feel

Complex chain abstractions create significant overhead. Users often face long "thinking" pauses that break the flow of real-time applications.

Delfhos: 2.2x Faster Execution

Streamlined task orchestration.

Delfhos: 2.0 s avg
Agno: 4.4 s avg
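The two headline ratios can be checked directly from the figures quoted above. This is only arithmetic on the published numbers, not a re-run of the benchmark; the variable names are illustrative. Note that the token comparison is against PydanticAI and the latency comparison is against Agno, so the two ratios use different baselines.

```python
# Figures quoted in the comparisons above.
delfhos_tokens, pydanticai_tokens = 1141, 2304
delfhos_latency_s, agno_latency_s = 2.0, 4.4

token_ratio = pydanticai_tokens / delfhos_tokens      # roughly 2.02
latency_ratio = agno_latency_s / delfhos_latency_s    # roughly 2.2

print(f"{token_ratio:.1f}x fewer tokens, {latency_ratio:.1f}x faster")
```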
The Struggle

Opaque execution, debugging unknowns

Black-box abstractions in ReAct-style frameworks make it hard to trace what actually ran and to handle exceptions cleanly. Code-Act frameworks like Delfhos generate a plain Python script for each task, which narrows the room for error and makes every step auditable.

Delfhos: Full Code-Level Auditability

Native Python traceability and robust exception handling.

System Audit    Delfhos    LangChain
Trace Depth     Full       Limited
Success Rate    100%       100%

// precise. Less code, more control.

langchain_agent.py ~60 lines · complex
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_community.tools.gmail import GmailGetMessage
from langchain_community.tools.gmail.utils import build_resource_service
from langchain_openai import ChatOpenAI

# Manual credential management and tool binding
gmail_service = build_resource_service(credentials=...)
tools = [GmailGetMessage(api_resource=gmail_service), ...]
llm = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = hub.pull("hwchase17/openai-functions-agent")
agent = create_openai_functions_agent(llm, tools, prompt)
# ~45 more lines of orchestration...
delfhos_agent.py 9 lines · clear
from delfhos import Agent
from delfhos.connections import GmailConnection, SheetsConnection

agent = Agent(connections=[
    GmailConnection(api_key=GMAIL_KEY),
    SheetsConnection(api_key=SHEETS_KEY),
])

agent.run("Find the invoices from December and upload their details to a new sheet")

// precise. One reasoning step. Real results.

Why Code-Act is the precise way to run agents

Real-world task: "Find the 5 biggest open invoices in the database and add them to a new Google Sheet."

Traditional Agent Loop (ReAct)

4-6 LLM Calls
Call 1: Planning Step

"I need to query the database. First I must search for an SQL tool and learn how to use it."

High friction: Frequent context-switching between user intent and internal discovery logic.

Call 2: Execution & Observation

"Fetching results... OK, I have 5 rows. Now I must look for a Google Sheets tool to proceed."

High friction: Forcing the model to pause and parse raw outputs back into text summaries repeatedly.

Call 3: Context Reconciliation

"Preparing the sheet... I hope the row data formatting from two steps ago is still in context."

Risk: Accumulated prompt history leads to 'hallucinations' or loss of specific data points.

Sequential reasoning forces the model to pause, parse text output, and re-invoke the LLM for every small logical step, so the context grows with every call.
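A rough way to see the cost of re-sending history is the simple model below. It assumes every call adds about the same number of tokens and that the full history is re-sent each time; under that assumption the prompt for each call grows linearly and the cumulative total grows roughly quadratically. The token counts are made-up illustrative values, not measurements.

```python
def react_prompt_sizes(base_tokens: int, tokens_per_step: int, calls: int) -> list[int]:
    """Prompt size for each call when the full history is re-sent every time."""
    return [base_tokens + tokens_per_step * i for i in range(calls)]


def total_prompt_tokens(base_tokens: int, tokens_per_step: int, calls: int) -> int:
    """Sum of all prompt tokens sent across the whole loop."""
    return sum(react_prompt_sizes(base_tokens, tokens_per_step, calls))


# Illustrative numbers only: 500-token task prompt, 300 tokens added per step.
react_total = total_prompt_tokens(500, 300, calls=5)
single_call_total = total_prompt_tokens(500, 300, calls=1)
print(react_total, single_call_total)
```

With these example values, five sequential calls send 5,500 prompt tokens in total, against 500 for a single call.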

Delfhos Code-Act Engine

1 LLM Call
Generated script — 1 LLM call
# Tools pre-selected. No discovery overhead.
rows = await sql.query("""
    SELECT id, amount FROM invoices
    WHERE status = 'open'
    ORDER BY amount DESC LIMIT 5
""")
sheet = await sheets.create("Top Open Invoices", rows)
# Variables stay in scope. No re-prompting.
return sheet.url  # ✓ Done.
Pre-filter tools
One reasoning step
Sandbox safety
Native Python speed
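The general Code-Act idea can be sketched in a few lines. This is not Delfhos's actual engine: `FakeSQL`, `FakeSheets`, and `run_generated` are hypothetical names made up for illustration, and plain `exec` is not a sandbox. The sketch only shows why a generated script can use `await` and `return` at the top level (it is wrapped in an async function) and why variables stay in scope between tool calls.

```python
import asyncio
import textwrap


class FakeSQL:
    """Stand-in for a SQL tool; returns canned rows instead of querying a database."""

    async def query(self, statement: str) -> list[dict]:
        return [{"id": 1, "amount": 900}, {"id": 2, "amount": 750}]


class FakeSheets:
    """Stand-in for a spreadsheet tool; reports what it was asked to create."""

    async def create(self, title: str, rows: list[dict]) -> dict:
        return {"title": title, "row_count": len(rows)}


async def run_generated(source: str, tools: dict) -> object:
    """Wrap generated code in an async function so `await` and `return` are valid."""
    wrapped = "async def __task__():\n" + textwrap.indent(source, "    ")
    namespace = dict(tools)      # only the pre-selected tools are visible
    exec(wrapped, namespace)     # NOTE: plain exec, not a real sandbox
    return await namespace["__task__"]()


# A script of the kind an LLM might produce in a single call.
generated = '''
rows = await sql.query("SELECT id, amount FROM invoices")
sheet = await sheets.create("Top Open Invoices", rows)
return sheet["row_count"]
'''

result = asyncio.run(run_generated(generated, {"sql": FakeSQL(), "sheets": FakeSheets()}))
print(result)
```

Because `rows` lives in ordinary Python scope, the second tool call receives it directly and nothing has to be re-serialized into a prompt.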

// secure. Production-safe by default.

Secure by design. Not bolted on.

Most frameworks hand your agent a full set of keys and hope for the best. Delfhos forces you to declare exactly which actions each connection is allowed to take — and which require human approval before they run.

Without guardrails
# Agent has full, unrestricted access
agent = Agent(connections=[
    GmailConnection(),          # can read AND send
    SQLConnection(url=DB_URL),  # can DELETE
    DriveConnection(),          # can delete files
])
# "Summarise overdue invoices"
# → agent reads, then decides to send
# → reminder emails to ALL customers
With Delfhos controls
# Restrict exactly what each tool can do
agent = Agent(connections=[
    GmailConnection(allow=["read"]),          # sends blocked
    SQLConnection(url=DB_URL, confirm=True),  # writes need approval
    DriveConnection(allow=["list", "read"]),  # no delete
])
# "Summarise overdue invoices"
# → reads emails, queries SQL
# → returns summary. Nothing more.
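One way an allow list could be enforced is to check every action against it before dispatching. The sketch below is a generic illustration, not Delfhos's implementation: `GuardedConnection`, `ActionNotAllowed`, and the handler dictionary are hypothetical names invented for this example.

```python
class ActionNotAllowed(PermissionError):
    """Raised when an action is outside a connection's allow list."""


class GuardedConnection:
    """Hypothetical wrapper: checks the allow list before dispatching any action."""

    def __init__(self, name: str, handlers: dict, allow=None):
        self.name = name
        self._handlers = handlers
        # allow=None means "no restriction", mirroring the unguarded example.
        self._allow = set(handlers) if allow is None else set(allow)

    def call(self, action: str, *args):
        if action not in self._allow:
            raise ActionNotAllowed(f"{self.name}: '{action}' is not permitted")
        return self._handlers[action](*args)


sent = []  # records anything that actually gets "sent"
mail = GuardedConnection(
    "gmail",
    handlers={"read": lambda: ["invoice #12 overdue"], "send": sent.append},
    allow=["read"],
)

print(mail.call("read"))
try:
    mail.call("send", "reminder")
except ActionNotAllowed as err:
    print("blocked:", err)
```

The check sits in the dispatch path rather than in the prompt, so a disallowed action fails regardless of what the model was asked to do.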

Three ways to handle human approval

Set confirm=True on any connection. Choose the mode that fits your workflow.

Interactive

Agent pauses and prompts you in the terminal before any flagged action. Zero setup — works out of the box.

SQLConnection(
    url=DB_URL,
    confirm=True,  # interactive terminal prompt is the default mode
)
Callback

Pass an async function. Route approvals to Slack, email, a web UI — any system you control.

async def my_approver(action):
    return await slack.ask(action)

SQLConnection(on_confirm=my_approver)
Programmatic

Auto-approve or deny based on your own logic. Write tests that verify what your agent asks for before it acts.

async def test_approver(a):
    return a.action == "read"

SQLConnection(on_confirm=test_approver)
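The callback and programmatic modes both come down to an async function that returns a yes/no decision. The sketch below shows that pattern in isolation so it can be run without any services. `PendingAction`, `gated`, and `read_only_approver` are hypothetical names; the object Delfhos actually passes to `on_confirm` may have a different shape (the snippets above only show an `action` attribute).

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PendingAction:
    """Hypothetical shape of what an approver receives."""
    connection: str
    action: str
    detail: str


async def read_only_approver(pending: PendingAction) -> bool:
    # Programmatic policy: approve reads, deny everything else.
    return pending.action == "read"


async def gated(pending: PendingAction, approver, operation):
    """Run `operation` only if the approver returns True."""
    if not await approver(pending):
        raise PermissionError(f"denied: {pending.connection}.{pending.action}")
    return operation()


async def main():
    log = []
    read = PendingAction("sql", "read", "SELECT id FROM invoices")
    write = PendingAction("sql", "write", "DELETE FROM invoices")
    log.append(await gated(read, read_only_approver, lambda: "rows"))
    try:
        await gated(write, read_only_approver, lambda: "deleted")
    except PermissionError as err:
        log.append(str(err))
    return log


print(asyncio.run(main()))
```

Swapping `read_only_approver` for a function that posts to a chat channel and awaits a reply would give the callback mode; the gate itself does not change.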

// secure · efficient · precise — everything you need to ship.

Per-Action Allow Lists

Restrict any connection to only the actions you permit. allow=["read"] means the agent literally cannot send emails, no matter what it's told.

Human-in-the-Loop

Set confirm=True and every flagged action pauses for your approval — interactively, via callback, or programmatically in tests.

Code-Act Engine

The LLM generates a Python script, not a list of function calls. One reasoning pass handles multi-step tasks — no context ballooning across tool calls.

Built-in Tool Suite

Gmail, SQL, Google Sheets, Drive, Docs, Calendar, WebSearch, and MCP servers — all production-ready, no glue code required.

Auto Retry on Failure

When generated code fails, Delfhos feeds the traceback back to the LLM and retries automatically. No silent failures.
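A retry loop of this kind can be sketched as follows. It is a generic illustration under stated assumptions, not Delfhos's code: `run_with_retry` and `fake_generate` are hypothetical, the "LLM" is a stub that fixes its bug on the second attempt, and plain `exec` stands in for whatever sandbox a real engine would use.

```python
import traceback


def run_with_retry(generate, task: str, max_attempts: int = 3):
    """Generate code, run it, and pass any traceback into the next generation."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        source = generate(task, feedback)
        scope = {}
        try:
            exec(source, scope)  # NOTE: plain exec, not a real sandbox
            return scope["result"], attempt
        except Exception:
            # Keep the full traceback so the next attempt can see what went wrong.
            feedback = traceback.format_exc()
    raise RuntimeError(f"gave up after {max_attempts} attempts:\n{feedback}")


def fake_generate(task: str, feedback):
    """Stand-in for an LLM call: the first draft has a bug, the second fixes it."""
    if feedback is None:
        return "result = 10 / 0"
    return "result = 10 / 2"


value, attempts = run_with_retry(fake_generate, "divide ten by two")
print(value, attempts)
```

The final `RuntimeError` carries the last traceback, so a task that never succeeds still fails loudly rather than silently.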

Any LLM, Any Provider

OpenAI, Anthropic, Google, xAI, Ollama — swap models per role. Use a lightweight model for routing and a powerful one for code generation.

From zero to a production agent in under a minute.

01
Install the library

pip install delfhos
02
Connect your tools

Connect Gmail, SQL, Drive, Sheets, or any custom tool with a connection object. Set allow lists and confirm gates to control exactly what your agent can do.

03
Run and monitor

Delfhos handles the observation loop, error correction, and tracing. Just call agent.run(task).

Secure. Efficient. Precise. The agent framework built for real work.

Open source · Apache 2.0 LICENSE · Python ≥ 3.9