AI agents are one of 2025’s most important software concepts — language models that don’t just answer questions but take sequences of actions, use tools, and pursue goals over multiple steps. Here is a clear-eyed explanation of what they actually are and how to build one.
What Makes Something an Agent
A chatbot that answers a question is not an agent. An agent is a language model that: (1) receives a goal or task, (2) decides what actions to take, (3) executes those actions using tools such as web search, code execution, database queries, or API calls, (4) observes the results, (5) decides the next actions based on those results, and (6) repeats until the goal is achieved or it determines that it cannot complete the task. What separates an agent from a chatbot is this combination of tool use and a multi-step decision loop.
A Minimal Agent in Python
Using the Anthropic SDK with tool use:
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "search_web",
    "description": "Search the web for current information",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"]
    }
}]
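def execute_tool(name, tool_input):
    # Placeholder, not part of the SDK: connect this to a real tool backend
    # and return the result as a string.
    raise NotImplementedError(f"No implementation for tool {name!r}")

def run_agent(task, max_steps=10):
    messages = [{"role": "user", "content": task}]
    # Cap the number of model calls so the loop cannot run forever
    # (max_steps is added here; the default of 10 is arbitrary).
    for _ in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        if response.stop_reason != "tool_use":
            # No tool requested: return whatever text the model produced
            return "".join(b.text for b in response.content if b.type == "text")
        # Record the assistant turn once, then answer every tool call in a single user turn
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError(f"Agent did not finish within {max_steps} steps")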
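The loop calls execute_tool(block.name, block.input) for every tool the model requests. A minimal sketch of how that dispatcher might be filled in follows. The TOOL_REGISTRY dict, the search_web stub, and the convention of returning errors as strings are all assumptions made for illustration; none of them come from the Anthropic SDK, and a real search_web would call an actual search backend.

```python
from typing import Any, Callable, Dict


def search_web(query: str) -> str:
    # Hypothetical stub: replace with a call to a real search backend.
    return f"(stub) no live search configured; query was: {query}"


# Maps each tool name declared in `tools` to the function that implements it.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {
    "search_web": search_web,
}


def execute_tool(name: str, tool_input: Dict[str, Any]) -> str:
    """Run the named tool and always hand back a string the model can read."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    try:
        return handler(**tool_input)
    except Exception as exc:
        # Report the failure to the model instead of crashing the agent loop.
        return f"Error: {name} failed: {exc}"
```

Returning the error as text rather than raising is one possible design: the model sees what went wrong in the tool result and can retry with different input or give up explicitly.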
Frameworks vs. Raw APIs
LangChain, LangGraph, CrewAI, and AutoGen are agent frameworks that handle the loop, memory, and multi-agent orchestration for you. They are useful for complex multi-agent systems but add significant abstraction and complexity. For most use cases, building with the raw Anthropic/OpenAI API gives you more control and easier debugging. Use frameworks when you need: persistent memory across conversations, multi-agent workflows, or complex state management.
The Hardest Problem: Reliability
Agents fail more often than single-turn LLM calls. They take wrong actions, get stuck in loops, misinterpret tool results, and hallucinate successful completions. Production agents need explicit step limits, strong error handling, human-in-the-loop checkpoints for irreversible actions, and extensive logging of every decision and tool call. Start simple: a well-designed two-step agent is more useful than a complex ten-step one that fails unpredictably.
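The four safeguards listed above can be combined in a small wrapper around tool execution. The following is a sketch under stated assumptions, not a production design: GuardedToolRunner, StepLimitExceeded, the approve callback, and the set of irreversible tool names are all hypothetical names introduced here, and it uses only the Python standard library.

```python
import logging
from typing import Any, Callable, Dict, Set

logger = logging.getLogger("agent")


class StepLimitExceeded(RuntimeError):
    """Raised when the agent tries to make more tool calls than allowed."""


class GuardedToolRunner:
    def __init__(
        self,
        tools: Dict[str, Callable[..., str]],
        irreversible: Set[str],
        approve: Callable[[str, Dict[str, Any]], bool],
        max_steps: int = 10,
    ) -> None:
        self.tools = tools
        self.irreversible = irreversible  # tool names that need human sign-off
        self.approve = approve            # returns True if a human allows the call
        self.max_steps = max_steps
        self.steps = 0

    def call(self, name: str, tool_input: Dict[str, Any]) -> str:
        # Explicit step limit: stop runaway loops.
        self.steps += 1
        if self.steps > self.max_steps:
            raise StepLimitExceeded(f"more than {self.max_steps} tool calls")
        # Log every requested tool call before acting on it.
        logger.info("step %d: tool=%s input=%r", self.steps, name, tool_input)
        if name not in self.tools:
            return f"Error: unknown tool '{name}'"
        # Human-in-the-loop checkpoint for irreversible actions.
        if name in self.irreversible and not self.approve(name, tool_input):
            logger.warning("step %d: %s rejected by reviewer", self.steps, name)
            return f"Error: '{name}' was not approved by a human reviewer"
        # Error handling: report failures to the model instead of crashing.
        try:
            result = self.tools[name](**tool_input)
        except Exception as exc:
            logger.exception("step %d: %s failed", self.steps, name)
            return f"Error: {name} failed: {exc}"
        logger.info("step %d: result=%r", self.steps, result)
        return result
```

In an agent loop, runner.call(block.name, block.input) would take the place of a direct tool call. The approve callback could be anything from a command-line prompt to a review queue; which one fits depends on how costly a wrong action is.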




