Jonatan Mata (jonmatum.com)
© 2026 Jonatan Mata. All rights reserved.
Concepts

Function Calling

LLM capability to generate structured calls to external functions based on natural language, enabling integration with APIs, databases, and real-world tools.

evergreen · #function-calling #tool-use #llm #api #json #structured-output

What it is

Function calling is the capability of an LLM to decide when to invoke an external function and generate the necessary arguments in structured format (typically JSON). The model doesn't execute the function — it generates the call specification that the host system executes.

This capability transforms LLMs from text generators into action orchestrators.

How it works

  1. Definition: the model is provided a schema of available functions (name, description, parameters)
  2. Decision: given a query, the model decides whether to call a function or respond directly
  3. Generation: if it decides to call, it generates JSON with the function name and arguments
  4. Execution: the host system executes the actual function
  5. Continuation: the result is returned to the model to formulate the final response

Example flow

User: "What's the weather in Madrid?"

Model generates:
{
  "function": "get_weather",
  "arguments": { "city": "Madrid", "units": "celsius" }
}

System executes get_weather("Madrid", "celsius") → "18°C, partly cloudy"

Model responds: "In Madrid it's 18°C with partly cloudy skies."
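The host-side half of this flow can be sketched in a few lines of Python. Everything here is illustrative: get_weather is a stub standing in for a real API, and the registry and parsing logic are one possible design, not a provider's API.

```python
import json

def get_weather(city: str, units: str = "celsius") -> str:
    # Stub standing in for a real weather API call.
    return f"18°{'C' if units == 'celsius' else 'F'}, partly cloudy"

# Hypothetical registry mapping function names to implementations.
REGISTRY = {"get_weather": get_weather}

def execute_call(raw_call: str) -> str:
    """Parse the model-generated JSON call and run the matching function."""
    call = json.loads(raw_call)
    fn = REGISTRY[call["function"]]
    return fn(**call["arguments"])

# The JSON the model generated in the example above:
model_output = '{"function": "get_weather", "arguments": {"city": "Madrid", "units": "celsius"}}'
result = execute_call(model_output)  # "18°C, partly cloudy"
```

The result string is what the host would send back to the model in step 5 (continuation).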

Example with the Anthropic API

import anthropic
 
client = anthropic.Anthropic()
 
tools = [{
    "name": "get_weather",
    "description": "Gets the current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["city"]
    }
}]
 
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}]
)
# response.content includes a tool_use block with name and input
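Processing that response can be sketched as follows. The real SDK returns typed block objects (block.type, block.name, block.input); the dicts below simulate the equivalent wire-format shape so the example stays self-contained.

```python
def extract_tool_calls(content_blocks):
    """Return (name, input) pairs for every tool_use block in a response."""
    return [
        (block["name"], block["input"])
        for block in content_blocks
        if block["type"] == "tool_use"
    ]

# Simulated response.content: a text block followed by a tool_use block.
simulated_content = [
    {"type": "text", "text": "I'll check the weather."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Madrid", "units": "celsius"}},
]
calls = extract_tool_calls(simulated_content)
# calls == [("get_weather", {"city": "Madrid", "units": "celsius"})]
```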

Provider comparison

Feature         OpenAI                 Anthropic         Gemini              Bedrock
Parallel calls  Yes                    Yes               Yes                 Model-dependent
Tool streaming  Yes                    Yes               Yes                 Yes
Forced mode     tool_choice: required  tool_choice: any  Mode configuration  Via toolChoice
Schema format   JSON Schema            JSON Schema       Protobuf or JSON    JSON Schema

Schema design

Function description quality determines model accuracy:

  • Descriptive names: search_knowledge_base is better than search
  • Specific descriptions: "Search documents by semantic similarity in the knowledge base" is better than "Search stuff"
  • Enums over free strings: use "enum": ["celsius", "fahrenheit"] instead of "type": "string"
  • Optional parameters with defaults: reduce the model's cognitive load
  • Examples in descriptions: help the model understand expected format

Relationship with MCP

The Model Context Protocol standardizes how models discover and call tools. Function calling is the underlying mechanism; MCP is the protocol that makes it interoperable across different systems.

Usage patterns

  • Simple tools: calculator, unit conversion, date/time
  • External APIs: weather, search, databases
  • System actions: create files, send emails, execute code
  • Parallel calls: multiple functions in a single response — the model generates an array of calls that the system executes concurrently
  • Chained calls: one function's result feeds the next — the model receives the result and decides the next step
  • Structured output: using function calling not to execute functions but to force the model to generate JSON with a specific schema
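The parallel-call pattern can be sketched with a thread pool: the model emits several calls in one turn and the host runs them concurrently. Both functions here are stubs standing in for real API-backed tools.

```python
from concurrent.futures import ThreadPoolExecutor

# Stubs standing in for real API-backed tools.
def get_weather(city):
    return f"weather for {city}"

def get_time(city):
    return f"time in {city}"

REGISTRY = {"get_weather": get_weather, "get_time": get_time}

def run_parallel(calls):
    """Execute a batch of model-generated calls concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(REGISTRY[c["function"]], **c["arguments"])
            for c in calls
        ]
        return [f.result() for f in futures]

# A parallel batch as the model might generate it in one response:
batch = [
    {"function": "get_weather", "arguments": {"city": "Madrid"}},
    {"function": "get_time", "arguments": {"city": "Tokyo"}},
]
results = run_parallel(batch)
# results == ["weather for Madrid", "time in Tokyo"]
```

Results come back in the same order as the calls, which makes it straightforward to pair each result with its originating tool_use block.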

Considerations

  • Validation: always validate generated arguments before execution
  • Security: limit which functions are available based on context
  • Error handling: the model should be able to interpret and recover from errors
  • Clear descriptions: function description quality directly affects accuracy

Why it matters

Function calling is the mechanism that turns LLMs from text generators into agents that interact with the real world. Without it, models can only respond with text. With it, they can query databases, call APIs, and execute concrete actions.

References

  • Tool Use — Anthropic — Anthropic, 2024. Tool use guide with Claude.
  • Function Calling — Gemini — Google, 2024. Function calling implementation in Gemini.
  • Function Calling — OpenAI Cookbook — OpenAI, 2024. Practical guide with examples.
  • Tool Use — Amazon Bedrock — AWS, 2024. Tool use documentation for Bedrock.
  • Function Calling — Mistral AI — Mistral AI, 2024. Implementation in Mistral models.

Related content

  • AI Agents

    Autonomous systems that combine language models with reasoning, memory, and tool use to execute complex multi-step tasks with minimal human intervention.

  • Model Context Protocol (MCP)

    Open protocol created by Anthropic that standardizes how AI applications connect with external tools, data, and services through a universal interface.

  • Large Language Models

    Massive neural networks based on the Transformer architecture, trained on enormous text corpora to understand and generate natural language with emergent capabilities like reasoning, translation, and code generation.

  • Agentic Workflows

    Design patterns where AI agents execute complex multi-step tasks autonomously, combining reasoning, tool use, and iterative decision-making.

  • Building a Second Brain in Public

    Chronicle of building a second brain with a knowledge graph, bilingual pipeline, and agent endpoints — in days, not weeks, and what that teaches about the gap between theory and working systems.

  • Tool Use Patterns

    Design strategies and patterns for AI agents to select, invoke, and combine external tools effectively to complete complex tasks.

  • Chain-of-Thought

    Prompting technique that improves LLM reasoning by asking them to decompose complex problems into explicit intermediate steps before reaching a conclusion.

  • AI Orchestration

    Patterns and frameworks for coordinating multiple AI models, tools, and data sources in production pipelines, managing flow between components, memory, and error recovery.
