The Claude API (provided by Anthropic) gives programmatic access to Claude models for developers building applications, tools, and workflows. This guide covers how to get started, the core concepts, and the practical patterns that most developers need in the first week of working with the API.
Setup and Authentication
Getting access: create an account at console.anthropic.com. New accounts receive free credits to start. For production use, billing is configured per-use (no fixed monthly fee). The API key: from the console, generate an API key. Store it as an environment variable — never hardcode it in code. Standard practice: `export ANTHROPIC_API_KEY=your-key-here` in your shell profile, then access it as `os.environ[“ANTHROPIC_API_KEY”]` in Python. The Python SDK: `pip install anthropic`. The JavaScript/TypeScript SDK: `npm install @anthropic-ai/sdk`. Minimum working example in Python:
“`python
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model=”claude-sonnet-4-6″,
max_tokens=1024,
messages=[{“role”: “user”, “content”: “Hello”}]
)
print(message.content[0].text)
“`
The model names: current models as of mid-2025: `claude-opus-4-8` (most capable), `claude-sonnet-4-6` (best balance of capability and speed), `claude-haiku-4-5-20251001` (fastest and cheapest). Always check docs.anthropic.com for the current model IDs — they change as new versions are released.
Core Concepts
Messages API: the primary API for multi-turn conversations. The `messages` parameter takes an array of `{role, content}` objects — alternating between `user` and `assistant`. The system prompt: passed as the `system` parameter to `messages.create()`. This is separate from the messages array and is not included in token count in the same way. Use the system prompt for: persona, instructions, constraints, formatting requirements, and context that applies to the whole conversation. Max tokens: the `max_tokens` parameter sets the maximum output length — set it higher than you think you need (truncated output is worse than paying for extra tokens). Token limits vary by model: Claude Sonnet 4.6 supports 200k input context and up to 64k output tokens. Streaming: for user-facing applications, stream the response rather than waiting for the full output. Use `client.messages.stream()` which yields events as they arrive. In Python:
“`python
with client.messages.stream(
model=”claude-sonnet-4-6″,
max_tokens=1024,
messages=[{“role”: “user”, “content”: “Write a short story”}]
) as stream:
for text in stream.text_stream:
print(text, end=””, flush=True)
“`
Tool use (function calling): define tools that Claude can call to take actions or retrieve information. Claude outputs a `tool_use` content block specifying which tool to call and with what parameters; you execute the tool and return the result as a `tool_result` content block. This is the foundation for building AI agents. Vision: Claude can process images passed as base64-encoded data or URLs in the content array. Useful for document processing, image analysis, and multimodal applications.
Practical Patterns and Costs
Prompt caching: the API supports prefix caching — if you mark a large section of your system prompt or context with `cache_control: {“type”: “ephemeral”}`, subsequent requests that share that prefix use cached computation at 10% of the normal input token cost. Significant saving for applications with large, stable system prompts. Batch processing: for non-realtime workloads, the Message Batches API processes requests asynchronously at 50% of the standard per-token price. Useful for bulk document processing, evaluation runs, and data enrichment. Current pricing (mid-2025, approximate): Claude Sonnet 4.6 — input $3/MTok, output $15/MTok ($0.003/$0.015 per 1000 tokens). A typical 1000-token input + 500-token output request costs approximately $0.01050. With caching, repeated system prompts cost significantly less. Error handling: implement retry logic with exponential backoff for 529 (overloaded) and 529-equivalent errors. The API is generally reliable but usage spikes can cause occasional rate limiting. Rate limits: new accounts have conservative rate limits; they increase automatically as usage grows. If you need higher limits immediately, contact Anthropic’s team. The anthropic Python SDK includes built-in retry logic: `anthropic.Anthropic(max_retries=3)`.



