The Practical Guide to AI Agent Tool Calling

What it is, why it breaks, and how to get it right


Tool calling is the difference between an AI that talks about doing things and one that actually does them. If you're building AI agent workflows, understanding how tool calling works — and where it fails — is non-negotiable.

This guide covers the fundamentals, common failure modes, and practical patterns used in production agent setups.


What Is Tool Calling?

Tool calling (also called "function calling") lets an AI model request the execution of external functions during a conversation. Instead of just generating text, the model can say:

"I need to look this up. Call search_web(query='AI agent patterns 2026') and bring me the result."

The host application runs the function, returns the result, and the model continues with real data.

The key insight: The model doesn't run the tool — your code does. The model just decides when and how to call it.


The Basic Flow

User message
    ↓
Model decides: "I need a tool"
    ↓
Model outputs: { tool: "get_weather", args: { city: "Denver" } }
    ↓
Your code runs get_weather("Denver") → "72°F, sunny"
    ↓
Result injected back into context
    ↓
Model generates final response

This loop can repeat — a model can call multiple tools in sequence before giving a final answer.
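The loop above can be sketched in host-application code. This is a minimal sketch, not a specific vendor's API: `call_model` and `run_tool` are hypothetical helpers standing in for your model client and your tool dispatcher.

```python
def run_agent_turn(messages, run_tool, call_model):
    """Minimal tool-calling loop: keep executing tools until the model
    produces a plain-text answer instead of a tool request."""
    while True:
        reply = call_model(messages)      # hypothetical model client
        if reply.get("tool") is None:
            return reply["text"]          # final answer; loop ends
        # The model asked for a tool: run it and feed the result back
        # into the context so the model can continue with real data.
        result = run_tool(reply["tool"], reply["args"])
        messages.append({
            "role": "tool",
            "name": reply["tool"],
            "content": result,
        })
```

Real SDKs differ in message shapes, but every tool-calling integration is some variation of this loop.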


Defining Tools Well

The quality of your tool definitions directly determines how reliably the model uses them.

Bad tool definition:

{
  "name": "get_data",
  "description": "Gets data"
}

Good tool definition:

{
  "name": "search_customer_records",
  "description": "Search the CRM for customer records by name, email, or account ID. Use this when you need to look up a specific customer's subscription status, billing history, or contact info. Do NOT use for general product questions.",
  "parameters": {
    "query": {
      "type": "string",
      "description": "Name, email address, or account ID to search for"
    },
    "limit": {
      "type": "integer",
      "description": "Max results to return (default: 5, max: 20)"
    }
  },
  "required": ["query"]
}

Rules for good tool definitions:

  1. Name it like a function — verb + noun (search_records, send_email, create_ticket)
  2. Describe when to use it, not just what it does
  3. Tell the model when NOT to use it (prevents misuse)
  4. Document every parameter clearly
  5. Mark required vs. optional parameters

Common Failure Modes

1. The Model Ignores the Tool

Symptom: You defined a tool but the model just answers from its training data instead.

Why: Vague description, or the model doesn't think it needs external data for this question.

Fix: Be explicit in your system prompt: "For any question about current customer status, you MUST use search_customer_records before answering."


2. Wrong Tool, Wrong Args

Symptom: Model calls search_products when you wanted it to call search_orders.

Why: Tool names or descriptions are too similar.

Fix: Make descriptions clearly distinguish when each tool applies. Add examples.


3. Hallucinated Tool Calls

Symptom: Model calls a tool that doesn't exist, or invents parameters.

Why: Model is pattern-matching based on training data.

Fix: Validate all tool calls before executing. Never trust the model's output blindly.

VALID_TOOLS = {"search_customers", "create_ticket", "send_email"}

def handle_tool_call(tool_name, args):
    # Reject hallucinated tool names before anything executes.
    if tool_name not in VALID_TOOLS:
        # Return the error to the model so it can self-correct.
        return {"error": f"Unknown tool: {tool_name}"}
    return dispatch_tool(tool_name, args)

4. Infinite Tool Loops

Symptom: Agent keeps calling tools in a loop, never producing a final answer.

Why: Each tool result triggers another tool call, and there's no stopping condition.

Fix: Set a hard limit on tool call iterations per conversation turn.

MAX_TOOL_CALLS = 10  # hard ceiling per conversation turn

tool_call_count = 0
while model_wants_tool() and tool_call_count < MAX_TOOL_CALLS:
    execute_tool()
    tool_call_count += 1

5. Slow / Expensive Cascades

Symptom: Simple requests trigger 5+ tool calls and take 30 seconds.

Why: Model is being overly thorough, or tools aren't returning enough data in a single call.

Fix: Design tools to return rich, complete responses. A tool that returns context, not just data, reduces follow-up calls.


Parallel vs. Sequential Tool Calls

Modern models (Claude, GPT-4o) support parallel tool calling — requesting multiple tools at once in a single response.

Sequential (slow):

→ call get_customer(id=123)
← result
→ call get_orders(customer_id=123)
← result
→ call get_subscription(customer_id=123)
← result
→ final answer

Parallel (fast):

→ call get_customer(id=123)
  call get_orders(customer_id=123)
  call get_subscription(customer_id=123)
← all results arrive simultaneously
→ final answer

If your tools are independent (one doesn't need the output of another), always enable and encourage parallel calling. It can cut latency by 60-80%.
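On the host side, independent tool calls can be executed concurrently with a thread pool. This is a sketch assuming your tools are plain Python functions; `tool_calls` here is a hypothetical list of (function, kwargs) pairs built from the model's parallel tool-call response.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tools_parallel(tool_calls):
    """Run independent tool calls concurrently.

    `tool_calls` is a list of (function, kwargs) pairs. Results come
    back in the same order as the calls, so each result can be matched
    to the tool-call ID it answers.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, **kwargs) for fn, kwargs in tool_calls]
        return [f.result() for f in futures]
```

Preserving order matters: most APIs require each tool result to be sent back tagged with the call it belongs to.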


Tool Design Patterns

The Context Loader

Return everything the model might need in one call, not just what was asked for.

def get_customer_context(customer_id):
    return {
        "customer": get_customer(customer_id),
        "subscription": get_subscription(customer_id),
        "recent_orders": get_orders(customer_id, limit=5),
        "open_tickets": get_tickets(customer_id, status="open")
    }

The Action + Confirmation Pattern

For destructive actions, use a two-step tool pattern: one tool to preview, one to execute.

tools = [
    {
        "name": "preview_refund",
        "description": "Preview what a refund would look like before processing it"
    },
    {
        "name": "process_refund", 
        "description": "Actually process the refund. Only call this after showing the user a preview and getting confirmation."
    }
]
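The host can also enforce the two-step pattern instead of trusting the description alone. A minimal sketch, using the hypothetical `preview_refund` / `process_refund` tools above and an `order_id` argument assumed for illustration:

```python
class RefundGuard:
    """Host-side guard: refuse process_refund unless the same order
    was previewed earlier in the conversation."""

    def __init__(self):
        self.previewed = set()

    def handle(self, tool_name, args):
        order_id = args["order_id"]
        if tool_name == "preview_refund":
            self.previewed.add(order_id)
            return {"preview": f"Refund for order {order_id}"}
        if tool_name == "process_refund":
            if order_id not in self.previewed:
                # Structured refusal the model can reason about.
                return {"success": False,
                        "error": "preview_required",
                        "message": "Call preview_refund first."}
            return {"success": True, "order_id": order_id}
```

The description tells the model the rule; the guard makes it impossible to break.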

The Graceful Error

Always return structured errors the model can reason about:

# Don't do this
raise Exception("Customer not found")

# Do this
return {
    "success": False,
    "error": "customer_not_found",
    "message": "No customer found with ID 123. Try searching by email instead.",
    "suggested_next_step": "call search_customers with their email address"
}

Models can reason about structured errors and recover gracefully. Exceptions crash the flow.


Security: Don't Trust the Model

This cannot be overstated: validate everything.

The model is untrusted input. Treat tool calls like user input from a web form.
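In practice that means checking model-supplied arguments against your tool's schema before executing anything. A minimal sketch, assuming the simple schema shape from the `search_customer_records` example earlier (production code would typically use a JSON Schema validator instead):

```python
def validate_args(args, schema):
    """Check model-supplied args against a simple tool schema.

    Returns None if valid, or a structured error dict the model can
    reason about, instead of raising.
    """
    type_map = {"string": str, "integer": int}
    # Every required parameter must be present.
    for name in schema.get("required", []):
        if name not in args:
            return {"error": "missing_param", "param": name}
    # No invented parameters, and every value must have the right type.
    for name, value in args.items():
        spec = schema["parameters"].get(name)
        if spec is None:
            return {"error": "unknown_param", "param": name}
        if not isinstance(value, type_map[spec["type"]]):
            return {"error": "wrong_type", "param": name}
    return None  # args are valid
```

Run this before dispatch, exactly as you would validate a web form before touching the database.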


Testing Your Tools

Before going to production:

  1. Happy path — does the tool return correct results for normal inputs?
  2. Edge cases — empty results, null values, very long strings
  3. Bad inputs — what happens when the model passes wrong types?
  4. Timeout behavior — what happens when the external API is slow?
  5. Adversarial — what if someone tries to inject instructions into tool results?

A tool that works 95% of the time will fail constantly at production scale.
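The first three checks can be ordinary assertions. A sketch against a toy in-memory tool (the real `search_customers` would hit your CRM, but the test shape is the same):

```python
def search_customers(query, records):
    """Toy tool used to illustrate the checklist; real tools will differ."""
    if not isinstance(query, str):
        # Bad input from the model: structured error, never a crash.
        return {"success": False, "error": "bad_input"}
    hits = [r for r in records if query.lower() in r["name"].lower()]
    return {"success": True, "results": hits}

RECORDS = [{"name": "Ada Lovelace"}, {"name": "Alan Turing"}]

# 1. Happy path: normal input returns the right record.
assert search_customers("ada", RECORDS)["results"] == [{"name": "Ada Lovelace"}]
# 2. Edge case: no matches yields an empty list, not an error.
assert search_customers("zzz", RECORDS) == {"success": True, "results": []}
# 3. Bad input: wrong type yields a structured error, not an exception.
assert search_customers(42, RECORDS) == {"success": False, "error": "bad_input"}
```

Timeout and adversarial checks need integration tests against the real dependency, but cheap assertions like these catch most regressions.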


Quick Reference

| Problem | Fix |
|---------|-----|
| Model ignores tool | Explicit system prompt instructions |
| Wrong tool called | Clearer descriptions + negative examples |
| Infinite loop | Hard iteration limit |
| Too many API calls | Richer tool responses, parallel calling |
| Security risk | Validate all inputs, rate limit |
| Slow responses | Parallel tool calls, caching |


What's Next

Tool calling is the foundation, but production agent systems need more — memory, orchestration, and observability, among other things.

All of this is covered in the Ask Patrick Library — battle-tested configs updated regularly.


Want the full playbook?

Get copy-paste AI templates, prompt frameworks, and agent patterns — all in one place.

Get Access — It’s Free

No credit card. No fluff. Just the good stuff.