Building Your First AI Agent

An agent is a program that perceives its environment, decides what to do, takes an action, observes the result, and repeats. That's the whole loop. Everything else is implementation detail.

This is how to build one that actually works.

Start with the loop, not the model

Most people start by picking a model. That's the wrong starting point. Start with the loop:

Observe — what information does the agent have?
Plan — what should it do next?
Act — execute the action (call a tool, write to memory, produce output)
Observe — what happened? Feed the result back.
Repeat until done or stuck.

The model is just the planning step. Everything else is infrastructure.

The minimal viable agent

Here's the minimum you need:

A model with tool-calling capability. Any modern frontier model works — Claude, GPT-4, Gemini. The key requirement is structured tool calls, not just text generation.

A set of tools. Start with two or three. A search tool, a write-file tool, a done tool. You can add more later. Agents with 20 tools from day one are agents that do nothing well.

A system prompt that defines the task, the rules, and how the agent knows it's finished. This is the hardest part to get right.

A loop that runs until the model calls done or hits a step limit. Always have a step limit. Agents without exit conditions are agents that run forever and cost you money.

Tools: the only part that matters

Agents are only as useful as their tools. A brilliant model with bad tools produces bad outcomes. An average model with well-designed tools produces good outcomes.

Good tool design means:

Atomic operations. Each tool does one thing. read_file(path) not read_and_analyze_file(path). The model handles composition; tools handle execution.

Clear error messages. When a tool fails, return a structured error that tells the model what went wrong and what to try instead. {"error": "file not found", "suggestion": "list files in parent directory first"} is infinitely more useful than a stack trace.

Idempotent where possible. If the agent retries a tool call, will it create duplicate records? Send duplicate emails? Design tools to be safely retried.

Bounded output. A tool that returns 10MB of database records will blow up your context window. Truncate, paginate, summarize. Return what's needed, not everything.

Memory: four kinds, not one

Most agent tutorials treat memory as a single thing. It's not. There are four distinct kinds:

In-context memory is the conversation history. Fast, expensive, has a hard limit. Use for the current task.

External memory is a database, vector store, or file system the agent can read and write. Use for facts that need to persist across sessions.

Episodic memory is a log of past agent runs. What worked, what failed, what took too long. Use for learning and debugging.

Semantic memory is an embedded knowledge base — documents, wikis, reference material. Use for domain knowledge the agent needs to reference.

Most simple agents only need in-context memory. Don't add complexity you don't need yet.

The system prompt is the agent

The model doesn't have values or goals. Your system prompt gives it values and goals. This is where most agents fail — not in the code, in the prompt.

A good agent system prompt includes:

Role and mission. What is this agent? What is it trying to accomplish? Be concrete. "You are a research agent. Your job is to answer questions about competitor pricing by searching the web, reading pages, and summarizing findings" beats "You are a helpful assistant."

Available tools. Even though tools are declared in the API call, explaining them in the system prompt improves reliability. Tell the model what each tool does and when to use it.

Decision rules. What should the agent do when uncertain? When should it ask for clarification vs. proceed with best effort? When should it stop? Make these explicit.

Output format. What should the final output look like? A structured JSON object? A markdown report? A simple yes/no? Specify it.

Step limit awareness. Tell the agent it has N steps and to be efficient. This changes behavior. Agents told they have limited steps plan better.

When to use agents vs. not

Agents are overkill for most tasks. If you can solve a problem with a single API call to a model, do that. Agents add latency, cost, and failure modes.

Use agents when:

The task requires multiple steps that depend on each other
The path from input to output isn't known in advance
Different tool calls need to happen based on intermediate results
The task runs autonomously without human oversight

Don't use agents when:

A single generation with a good prompt solves the problem
Latency matters (agents are slow — many model calls, many tool calls)
The task is simple enough to test exhaustively
You need 100% reproducibility

The failure modes to plan for

Every agent will eventually:

Loop. The model gets stuck in a cycle, calling the same tool over and over. Add step limits. Log what's happening. Add a loop-detection heuristic.

Hallucinate tool inputs. The model will call tools with arguments that don't exist, are malformed, or are wrong. Validate inputs at the tool layer. Return structured errors, not exceptions.

Get lost. After enough steps, the model loses track of what it was doing. Keep the system prompt clear. Summarize the current state in the prompt periodically for long-running agents.

Cost too much. An agent running for 20 steps with large context at frontier model prices is expensive. Track token usage. Set hard cost limits.

Ship something, then iterate

The agents worth using are the ones that shipped. A rough agent that handles 70% of cases and fails gracefully on the other 30% is more valuable than a perfect agent that never gets deployed.

Start minimal. Add tools one at a time. Fix the failure modes you actually encounter, not the ones you imagine. The loop is short. Use it.

Published under Field Notes. Not for sale. Share freely.