Benchmark Results
Four frameworks. Four real-world tasks. One clear winner across speed, token efficiency, and developer experience.
// methodology
Reproducible & transparent
Each framework ran the same tasks under identical conditions — same model, same prompts, same hardware. Task 2 (Gmail Email Triage) is excluded from aggregates: native Gmail integration differences make cross-framework comparisons unreliable there.
Tasks Evaluated
| # | Task | Tools |
|---|---|---|
| 1 | Expense Categorisation | 1 custom tool |
| 2 | Quarterly ROI Analysis | 2 chained tools |
| 3 | HR Headcount Report | 2 chained tools |
| 4 | IT Ticket Prioritisation | 2 chained tools |
Speed
Average execution time per task (seconds). Lower is better.
Token Usage
Average input + output tokens per task. Lower is better — directly impacts cost.
Token Breakdown — Average per Task
| Framework | Avg Input | Avg Output | Avg Total |
|---|---|---|---|
| Delfhos | 828 | 313 | 1,141 |
| LangChain | 1,734 | 425 | 2,159 |
| PydanticAI | 1,842 | 462 | 2,304 |
| Agno | 1,760 | 451 | 2,211 |
Setup Complexity
Non-blank, non-comment lines to initialise the agent and define tools (imports excluded). Lower is better.
// aggregate
The full picture
| Framework | Avg Speed | Avg Tokens | Avg Setup LOC | Success Rate |
|---|---|---|---|---|
| Delfhos | 2.0 s | 1,141 | 8.8 | 100% |
| PydanticAI | 3.6 s | 2,304 | 9.8 | 100% |
| Agno | 4.4 s | 2,211 | 10.8 | 100% |
| LangChain | 3.8 s | 2,159 | 15.8 | 100% |
Speed
Delfhos consistently finishes first. The gap widens as tool-chaining complexity increases — up to 2.2× faster than Agno on multi-tool tasks.
Token Efficiency
Delfhos sends leaner prompts across all tasks. The advantage is most pronounced on Task 4, where competitors consume ~3× more tokens.
Developer Experience
LangChain requires roughly double the boilerplate of Delfhos. PydanticAI and Agno are more concise but still trail by 1–2 lines per task.
Leading on every metric.
All four frameworks achieved 100% success. Delfhos leads on speed, tokens, and simplicity — without sacrificing reliability.