The Claude API

The Anthropic API is how you talk to Claude from code. Not through a chat window, not through a terminal, but directly — via HTTP requests that your application constructs, sends, and parses. Where claude.ai is a conversation tool and Claude Code is an agentic assistant that edits your files, the API is raw material: stateless, programmable, and completely under your control.

Every interaction is a transaction. You send a request containing a model name, a system prompt, a list of messages, and a token budget. The API returns a completion. No session, no persistent memory, no interface overhead. The tradeoff is intentional — in exchange for building your own state management, you get full control over model selection, prompt structure, retry logic, cost routing, and everything else that a production integration requires.
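To make the shape of that transaction concrete, here is a minimal sketch of what one request looks like before it is sent. It only builds the URL, headers, and JSON body; nothing goes over the network. The endpoint, header names, and version string are assumptions based on Anthropic's public API reference at the time of writing, `build_request` is a hypothetical helper name, and the model ID is the one used later in this article.

```python
import json

# Assumed endpoint for the Messages API; confirm against the API reference.
API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key, model, system, messages, max_tokens):
    """Return (url, headers, body) for one stateless Messages API call."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",  # assumed version string
        "content-type": "application/json",
    }
    payload = {
        "model": model,            # which model handles the request
        "max_tokens": max_tokens,  # the output token budget
        "system": system,          # the system prompt
        "messages": messages,      # the full conversation so far
    }
    return API_URL, headers, json.dumps(payload)


url, headers, body = build_request(
    api_key="sk-ant-...",
    model="claude-haiku-4-5",
    system="You are a concise technical assistant.",
    messages=[{"role": "user", "content": "What is an API?"}],
    max_tokens=512,
)
print(body)
```

Everything the model knows about the request is in that body. Nothing carries over from an earlier call unless you put it in `messages` yourself.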

What You Can Build With It

The API is the right choice any time you need Claude embedded in your own system rather than accessed through someone else's:

# Things that make sense as API integrations

Your own chatbot or assistant     # you own the UI, the memory, the UX
Document analysis pipelines       # batch-process PDFs, reports, code
Backend enrichment services       # classification, summarization, extraction
Code review automation            # hook into CI, post inline comments
Customer-facing AI features       # inside your SaaS, mobile app, CLI
Research / evaluation harnesses   # run thousands of prompts, measure results

If claude.ai's conversation UI can accomplish the task, use it — it's faster to start. The API becomes the right tool when you need automation, scale, integration with your own data, or behavior that a chat interface cannot express.

Getting Access: Sign-Up and API Keys

Access is through the Anthropic Console, the control plane for API keys, usage monitoring, and billing. The sign-up process takes about five minutes:

1. Go to console.anthropic.com and create an account (email + verification).
2. Once in, navigate to API Keys in the left sidebar.
3. Click Create Key, give it a name, and copy the key immediately: it is shown only once. Anthropic does not store a recoverable copy.
4. Add a payment method under Billing. New accounts receive a small credit to start; after that, usage is pay-as-you-go.
5. Optionally, set a monthly spend limit under Billing → Usage Limits. This is worth doing before you write any code: a runaway loop or a misconfigured batch job can exhaust a budget quickly.

Store your API key in an environment variable or a secrets manager, never in source code. If a key is committed to a public repo, rotate it immediately from the Console — there is no grace period.
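One way to enforce this is to fail fast at startup when the key is missing, rather than discovering it on the first request. The sketch below assumes the key lives in the `ANTHROPIC_API_KEY` environment variable; `load_api_key` is a hypothetical helper, not part of any SDK.

```python
import os


def load_api_key(env=None):
    """Read the API key from the environment and fail loudly if it is absent."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it or load it from a secrets manager"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(load_api_key({"ANTHROPIC_API_KEY": "sk-ant-..."}))
```

Passing the environment as an argument keeps the function easy to test and avoids touching real credentials in unit tests.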

How the API Charges You

Unlike claude.ai Pro, which is a flat monthly subscription with a usage cap, the API is pure pay-per-token. You pay for exactly what you use — no more, no less. Light use can cost cents a month. Heavy production use requires deliberate budgeting.

Input vs. Output Token Pricing

Every API call involves two distinct token pools, and they are not billed at the same rate:

# Token pricing structure (illustrative — check console for current rates)

Input tokens   # everything you send: system prompt, history, your message
Output tokens  # what the model generates in response
               # priced at a multiple of the input rate (roughly 5× on recent models)
               # generation is computationally heavier than encoding

Cached input   # stable prefixes you mark for server-side caching
               # the first call writes the cache at a premium over the normal input rate
               # later hits are billed at ~10% of normal input cost
               # default TTL: 5 minutes, reset each time the cached prefix is read
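The arithmetic behind a single call can be sketched as below. The rates are placeholder values chosen only for illustration (they are not Anthropic's prices), `estimate_cost` is a hypothetical helper, and the calculation ignores any premium charged when a cache entry is first written.

```python
def estimate_cost(input_tokens, output_tokens, cached_tokens=0,
                  input_rate=1.00, output_rate=5.00, cache_read_factor=0.10):
    """Estimate the cost of one call in USD.

    input_rate and output_rate are USD per million tokens and are
    PLACEHOLDERS; substitute the current rates from the pricing page.
    cached_tokens is the part of input_tokens served from the prompt cache.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    per_input = input_rate / 1_000_000
    per_output = output_rate / 1_000_000
    uncached = input_tokens - cached_tokens
    return (
        uncached * per_input
        + cached_tokens * per_input * cache_read_factor
        + output_tokens * per_output
    )


# Same call, with and without a cached 8,000-token prefix.
print(f"{estimate_cost(10_000, 1_000):.4f}")
print(f"{estimate_cost(10_000, 1_000, cached_tokens=8_000):.4f}")
```

With a long, stable system prompt, most of the input cost moves into the discounted cached portion, which is why caching matters most for calls that resend the same prefix.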

Model Tiers and Cost Tradeoffs

Anthropic offers multiple models within the Claude family. Cost and capability scale together — there is no free lunch, but there is a smart routing decision:

# Claude model tiers (current family as of 2025)

claude-haiku-4-5       # fastest, cheapest — classification, extraction, simple Q&A
claude-sonnet-4-6      # best balance for most production tasks
claude-opus-4-7        # highest capability, highest cost — deep reasoning, complex code

# Routing principle: use the smallest model that reliably handles the task.
# Flagship → mid-tier swap can reduce cost 5–15× with no meaningful quality loss
# on well-defined, structured tasks.
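The routing principle can be sketched as a lookup table. The task categories and the `pick_model` helper are hypothetical; the model IDs are the ones listed above and should be checked against the current models overview before use.

```python
# Model IDs as listed in this article; confirm them before deploying.
MODEL_BY_TIER = {
    "small": "claude-haiku-4-5",
    "medium": "claude-sonnet-4-6",
    "large": "claude-opus-4-7",
}

# Hypothetical task categories mapped to the smallest tier assumed to handle them.
TIER_BY_TASK = {
    "classification": "small",
    "extraction": "small",
    "summarization": "medium",
    "code_review": "medium",
    "deep_reasoning": "large",
}


def pick_model(task_type, default_tier="medium"):
    """Return the model ID for a task, falling back to the default tier."""
    tier = TIER_BY_TASK.get(task_type, default_tier)
    return MODEL_BY_TIER[tier]


print(pick_model("classification"))
print(pick_model("deep_reasoning"))
print(pick_model("something_new"))
```

Keeping the mapping in one place makes it cheap to re-run an evaluation and move a task up or down a tier when quality or cost changes.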

Batch API: 50% Off for Non-Urgent Work

For workloads that don't need a real-time response — evaluation runs, content generation pipelines, analysis over a dataset — the Batch API runs requests asynchronously at half the standard token price. Results are returned within 24 hours. If a user isn't actively waiting for the response, batch is almost always the right choice.
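A batch is a list of ordinary message requests, each tagged with an ID so results can be matched back to inputs. The sketch below only builds that list. `build_batch_requests` is a hypothetical helper, and the `custom_id` / `params` shape and the submission call in the comments reflect the Message Batches documentation as understood at the time of writing, so treat them as assumptions to verify.

```python
def build_batch_requests(prompts, model="claude-haiku-4-5", max_tokens=512):
    """Turn a list of prompts into batch entries with stable custom IDs."""
    return [
        {
            "custom_id": f"item-{i}",  # used to match results to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


requests = build_batch_requests(["Summarize report A.", "Summarize report B."])
print(len(requests), requests[0]["custom_id"])

# With the SDK installed and a key configured, submission would look roughly like:
#   import anthropic
#   client = anthropic.Anthropic()
#   batch = client.messages.batches.create(requests=requests)
#   print(batch.id, batch.processing_status)
```

Results arrive out of band, so store the `custom_id` alongside whatever record each prompt came from.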

Building Your First App: A Working Python Example

The following is a complete, runnable script. It sends a single message and prints the response. That's it — no framework, no abstraction, nothing to untangle.

Prerequisites

Python 3.8+ (recent SDK releases may require a newer minimum; check the SDK README). Check your version with python --version or python3 --version
pip — bundled with Python 3.4+; check with pip --version
An Anthropic API key — from console.anthropic.com (see sign-up steps above)
A few cents of API credit — a single Haiku call costs fractions of a cent

Step 1 — Install the SDK

# In your terminal
pip install anthropic

Step 2 — Set Your API Key

Set it as an environment variable rather than hardcoding it. The SDK reads ANTHROPIC_API_KEY automatically if you set it this way.

# macOS / Linux
export ANTHROPIC_API_KEY="sk-ant-..."

# Windows (Command Prompt)
set ANTHROPIC_API_KEY=sk-ant-...

# Windows (PowerShell)
$env:ANTHROPIC_API_KEY = "sk-ant-..."

# For permanent storage, add the export line to ~/.bashrc, ~/.zshrc,
# or your shell's profile file.

Step 3 — Write the Script

Create a file called ask_claude.py:

import anthropic

client = anthropic.Anthropic()
# Reads ANTHROPIC_API_KEY from the environment automatically.
# To pass the key explicitly: Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,
    system="You are a concise technical assistant. Answer in plain text only.",
    messages=[
        {"role": "user", "content": "What is an API? One paragraph."}
    ],
)

# The response text
print(message.content[0].text)

# Token usage — always log this in production
print("\n--- usage ---")
print(f"Input tokens:  {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

Step 4 — Run It

python ask_claude.py

# Expected output (text will vary):
An API (Application Programming Interface) is a defined contract between two
pieces of software that specifies how they can communicate. One side exposes
a set of endpoints or functions; the other side calls them and receives
structured responses. APIs abstract the implementation details so callers
don't need to know how the other side works, only what to send and what to
expect back.

--- usage ---
Input tokens:  28
Output tokens: 71

What the Response Object Contains

The message object returned by the SDK gives you more than just the text:

message.id               # unique message ID (useful for logging)
message.model            # which model actually handled the request
message.stop_reason      # "end_turn", "max_tokens", "stop_sequence", etc.
message.content[0].text  # the response text (content is a list — handle multi-block)
message.usage.input_tokens   # tokens you sent
message.usage.output_tokens  # tokens generated

Always log usage in production. It's the only reliable way to understand where your token budget is going and which calls are expensive. Silent accumulation is how API bills surprise teams at the end of the month.
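A small pair of helpers covers both points: joining every text block rather than reading only `content[0]`, and turning the usage fields into a record you can log. The helper names are hypothetical, and the stand-in object below only mimics the attributes listed above so the sketch runs without an API call.

```python
from types import SimpleNamespace


def extract_text(message):
    """Join all text blocks; skip any block that is not plain text."""
    return "".join(
        block.text
        for block in message.content
        if getattr(block, "type", None) == "text"
    )


def usage_record(message):
    """Build a flat dict suitable for structured logging."""
    return {
        "id": message.id,
        "model": message.model,
        "stop_reason": message.stop_reason,
        "input_tokens": message.usage.input_tokens,
        "output_tokens": message.usage.output_tokens,
    }


# Stand-in for an SDK response, with made-up values.
fake = SimpleNamespace(
    id="msg_example",
    model="claude-haiku-4-5",
    stop_reason="end_turn",
    content=[
        SimpleNamespace(type="text", text="Hello, "),
        SimpleNamespace(type="text", text="world."),
    ],
    usage=SimpleNamespace(input_tokens=28, output_tokens=71),
)

print(extract_text(fake))
print(usage_record(fake))
```

Sending the record to your existing logging or metrics pipeline makes per-endpoint and per-model cost visible without extra tooling.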

Adding a Conversation (Multi-Turn)

The API has no built-in session state — you manage history yourself by passing previous turns in the messages list. Each call is independent; the context is only what you include:

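import anthropic

client = anthropic.Anthropic()  # same client setup as in ask_claude.py
history = []  # grows with every turn; the API keeps no state between calls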

def chat(user_input):
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("My name is Alex."))
print(chat("What is my name?"))
# "Your name is Alex." — because the first turn is in the history.

The conversation window is cumulative. Every message you send carries the entire prior history, so the input cost of each turn grows with the length of the conversation, and the total cost of a session grows much faster than the number of turns. Cap history length or summarize aggressively in production.
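Capping history can be as simple as keeping the most recent messages. The sketch below is one possible approach, not the only one: `cap_history` is a hypothetical helper, the limit of 20 is arbitrary, and the leading-turn cleanup assumes the API expects a conversation to open with a user message.

```python
def cap_history(history, max_messages=20):
    """Keep only the most recent messages, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    trimmed = history[-max_messages:]
    # Drop leading assistant turns so the conversation opens with the user.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "My name is Alex."},
    {"role": "assistant", "content": "Nice to meet you, Alex."},
    {"role": "user", "content": "What is an API?"},
    {"role": "assistant", "content": "A contract between programs."},
    {"role": "user", "content": "What is my name?"},
]

for turn in cap_history(history, max_messages=3):
    print(turn["role"], "-", turn["content"])
```

Note the cost of this approach: with a limit of three, the turn that introduced the name is gone, so the model can no longer answer the last question. Summarizing dropped turns into a short note is the usual way to keep such facts.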

What to Watch For in Production

The example above works immediately. Moving it into a production service introduces a short list of non-obvious problems worth knowing before you hit them:

Rate Limits

New accounts start at conservative rate limits (tokens per minute, requests per minute). These tiers increase automatically as spend history accumulates, or you can request a limit increase through the Console. Design with retry logic from day one — a 429 response means the current window is exhausted, not that you're banned. Exponential backoff with jitter is the standard approach.
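Exponential backoff with jitter can be sketched in a few lines. This is a generic illustration: `RateLimited` is a hypothetical stand-in for whatever exception your client raises on a 429, and the delays and attempt count are arbitrary defaults.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 error from the API client."""


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 sleep=time.sleep, retry_on=(RateLimited,)):
    """Run call(), retrying on rate-limit errors with full-jitter backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # The ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))


# Demonstration: fail twice, then succeed. Sleeping is stubbed out.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited("429")
    return "ok"


print(with_backoff(flaky, sleep=lambda seconds: None))
```

The random wait keeps many clients from retrying in lockstep after a shared limit resets. The official Python SDK is understood to retry some failures on its own and to expose a dedicated rate-limit exception; check its README before layering your own retries on top.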

max_tokens Is a Hard Ceiling

If the model's response would exceed max_tokens, it is truncated mid-sentence and stop_reason will read max_tokens instead of end_turn. Always check stop_reason. For tasks where incomplete output is worse than no output (code generation, JSON, structured data), handle this case explicitly.
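Handling truncation explicitly can be as simple as refusing to return a cut-off response. In this sketch `TruncatedResponse` and `require_complete` are hypothetical names, and the stand-in object mimics only the response attributes described earlier.

```python
from types import SimpleNamespace


class TruncatedResponse(Exception):
    """Raised when the model stopped because it hit max_tokens."""


def require_complete(message):
    """Return the response text, or raise if the output was cut off."""
    if message.stop_reason == "max_tokens":
        raise TruncatedResponse(
            f"output truncated after {message.usage.output_tokens} tokens"
        )
    return "".join(
        block.text
        for block in message.content
        if getattr(block, "type", None) == "text"
    )


# Stand-in for a response that ran out of budget mid-JSON.
cut_off = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(type="text", text='{"items": [1, 2,')],
    usage=SimpleNamespace(output_tokens=512),
)

try:
    require_complete(cut_off)
except TruncatedResponse as exc:
    print("retry with a larger budget:", exc)
```

What to do after catching the error depends on the task: retry with a higher `max_tokens`, ask for a shorter answer, or fail the job. Parsing the partial output is rarely the right choice.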

API Key Security

Anyone who holds an API key can make requests that are billed to your account. Treat it accordingly. For deployed applications, load it at runtime from a secrets manager (AWS Secrets Manager, GCP Secret Manager, Vault, etc.) rather than from environment variables baked into a container image or values committed to a config file.

Reference Links

Rather than reproduce what Anthropic documents well, go to the source for the following:

API Reference — complete endpoint documentation, all parameters, response schemas, error codes:
docs.anthropic.com/en/api
Models overview — current model IDs, context windows, and capability notes:
docs.anthropic.com/en/docs/about-claude/models
Pricing — current per-token rates for all models and the Batch API:
anthropic.com/pricing
Python SDK — source, changelog, and usage examples:
github.com/anthropics/anthropic-sdk-python
Prompt caching guide — how to mark cache breakpoints and measure cache hit rates:
docs.anthropic.com/en/docs/build-with-claude/prompt-caching
Message Batches API — async batch processing at half price:
docs.anthropic.com/en/docs/build-with-claude/message-batches
Rate limits reference — tier definitions, TPM/RPM tables, and how to request increases:
docs.anthropic.com/en/api/rate-limits
Usage limits (Console) — set spend limits and monitor consumption:
console.anthropic.com/settings/limits

A Practical Checklist

Before your first production deployment — and before you invite anyone else to use what you've built — run through this list:

Account and security:
  [ ] API key stored in environment variable or secrets manager — not in code
  [ ] Monthly spend limit set in the Console
  [ ] Key has a descriptive name and a recorded owner

API calls:
  [ ] Model selected intentionally — not defaulting to flagship for all tasks
  [ ] max_tokens set high enough to avoid mid-sentence truncation
  [ ] stop_reason checked in production code
  [ ] usage logged on every call (input + output tokens)
  [ ] Exponential backoff implemented for 429 responses

Cost control:
  [ ] System prompt trimmed — no stale examples or redundant instructions
  [ ] Conversation history capped or summarized at a defined turn limit
  [ ] Prompt caching enabled for stable system prompts or reference content
  [ ] Non-urgent batch workloads routed to the Batch API (50% discount)
  [ ] Cheap model used for simple/structured tasks; expensive model reserved for complex ones

The API is unglamorous infrastructure. It does exactly what you tell it to do, at exactly the cost of what you send and receive. That precision is the point. Get the fundamentals right — keys, limits, usage logging, retry logic — and everything else builds cleanly on top.
