Running an AI-First Business

AI-first doesn't mean AI-everything. It means the default question for every repetitive task is "should a human do this, or should an agent?" Most of the time, the answer is agent. Here's how to actually run a company that way.

The operating principle

Every task in a business is either:

1. A decision that requires human judgment
2. Work that executes a decision

Category 2 is almost entirely automatable with AI agents. Category 1 is where humans should spend their time.

The failure mode of most "AI transformation" efforts is automating category 1 work (analysis, strategy, creative direction) while leaving category 2 work (data entry, status updates, report generation, routine communications) to humans. Backwards. Fix the backwards problem first.

The audit

Before you touch any tooling, audit two weeks of your own work. Log every task you do. At the end, categorize each:

D: requires my judgment, can't be delegated
A: could be automated with current AI
H: could be done by a human with less context than me (delegate to a person)

A typical knowledge worker's calendar comes out to roughly 20% D, 30% A, 50% H. Your own split will differ, which is why you run the audit. The A tasks are where you start.
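Here is a minimal sketch of tallying the audit once the two weeks are logged. It assumes the log is a simple list of (task, category) pairs; the sample entries and the `category_shares` helper are hypothetical. It counts tasks rather than hours, so weight by time spent if your log records it.

```python
from collections import Counter

# Hypothetical audit log: (task description, category) pairs.
# D = needs my judgment, A = automatable, H = delegate to a person.
audit_log = [
    ("Approve Q3 pricing change", "D"),
    ("Compile weekly status report", "A"),
    ("Answer 'where is order X?'", "A"),
    ("Schedule vendor calls", "H"),
    ("Reconcile expense receipts", "H"),
]


def category_shares(log):
    """Return the fraction of logged tasks that fall in each category."""
    counts = Counter(category for _, category in log)
    total = sum(counts.values())
    if total == 0:
        return {cat: 0.0 for cat in ("D", "A", "H")}
    return {cat: counts.get(cat, 0) / total for cat in ("D", "A", "H")}


shares = category_shares(audit_log)

# The A tasks are the starting backlog for automation.
automatable = [task for task, category in audit_log if category == "A"]
```

The `automatable` list is the useful output: it is the concrete backlog, where the percentages only tell you how big the opportunity is.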

The three loops you need to automate

The information loop. Someone in your organization asks "what's the status of X" many times per week. That person is doing work that should be done by an agent. Build an agent that monitors X and surfaces status proactively, or that can answer queries on demand.

The reporting loop. Weekly summaries, daily digests, performance reviews — these are mechanical work that compounds into hours per week. An agent with access to your data sources can generate these in seconds.

The routing loop. Inbound requests (support tickets, lead inquiries, task assignments) need to be categorized and routed. This is pattern matching, which models are very good at. Build the routing layer before you build anything else.
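A routing layer can be sketched in a few lines. The version below is an illustration under stated assumptions, not a production design: the queue names and keywords are invented, and the keyword match in `classify` is a stand-in for wherever a model call would go. The part worth copying is the shape: every request gets exactly one queue, and anything ambiguous or unrecognized goes to a human.

```python
from dataclasses import dataclass

# Hypothetical queues and trigger keywords; replace with your own.
ROUTES = {
    "support": ("refund", "broken", "error", "cancel"),
    "sales": ("pricing", "demo", "quote", "upgrade"),
}
FALLBACK_QUEUE = "human_triage"


@dataclass
class Routing:
    queue: str
    matched: tuple  # keywords that triggered the decision, kept for auditing


def classify(text: str) -> Routing:
    """Stand-in classifier: keyword matching where a model call would go."""
    lowered = text.lower()
    hits = {
        queue: tuple(k for k in keywords if k in lowered)
        for queue, keywords in ROUTES.items()
    }
    hits = {queue: matched for queue, matched in hits.items() if matched}
    # Route only when exactly one queue matches; ambiguity goes to a human.
    if len(hits) == 1:
        queue, matched = next(iter(hits.items()))
        return Routing(queue, matched)
    return Routing(FALLBACK_QUEUE, ())
```

Swapping the keyword match for a model changes the body of `classify` and nothing else, which is the point of building the routing layer as its own piece.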

How to actually delegate to agents

The mistake most people make is treating AI agents like employees you give instructions to. They're not. They're systems you design. The quality of the agent reflects the quality of the design.

Define the output first. What exactly should the agent produce? A JSON object? A Slack message in a specific format? A database record? Be specific. Vague output requirements produce vague agents.

Define the failure modes. What should the agent do when it can't complete the task? Fail loudly (good for high-stakes work). Fail silently and log (good for background processing). Ask a human (good for edge cases).

Define the boundaries. What data can the agent access? What actions can it take? What can't it do? Write this down. Then enforce it in the tool design, not just the prompt.

Start narrower than you think. An agent that handles 10% of a workflow perfectly beats an agent that handles 100% of it badly. Expand scope after you trust the narrow version.
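The first three rules above (output, failure modes, boundaries) can all be enforced in code rather than in the prompt. What follows is a rough sketch assuming a hypothetical status-reporting agent; `StatusUpdate`, `ToolBox`, `validate`, and `handle_failure` are names made up for illustration, not part of any agent framework.

```python
from dataclasses import dataclass
from enum import Enum


class OnFailure(Enum):
    RAISE = "fail loudly"
    LOG = "fail silently and log"
    ESCALATE = "ask a human"


ALLOWED_STATES = {"on_track", "at_risk", "blocked"}


@dataclass(frozen=True)
class StatusUpdate:
    """The exact output the agent must produce (hypothetical schema)."""
    project: str
    state: str  # must be one of ALLOWED_STATES
    summary: str


class BoundaryError(Exception):
    """Raised when the agent requests a tool outside its allowlist."""


class ToolBox:
    """Holds only the tools the agent may call; everything else is refused."""

    def __init__(self, tools):
        self._tools = dict(tools)

    def call(self, name, *args):
        if name not in self._tools:
            raise BoundaryError(f"tool not permitted: {name}")
        return self._tools[name](*args)


def validate(raw: dict) -> StatusUpdate:
    """Reject any agent output that does not match the declared schema."""
    update = StatusUpdate(**raw)  # TypeError on missing or unexpected fields
    if update.state not in ALLOWED_STATES:
        raise ValueError(f"unknown state: {update.state}")
    return update


def handle_failure(policy, error, log, review_queue):
    """Apply the failure mode chosen at design time for this agent."""
    if policy is OnFailure.RAISE:
        raise error
    if policy is OnFailure.LOG:
        log.append(repr(error))
    elif policy is OnFailure.ESCALATE:
        review_queue.append(repr(error))
```

Because the agent only ever receives a `ToolBox`, an action that is not in the allowlist cannot happen no matter what the prompt says. That is what enforcing boundaries in the tool design means in practice.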

The staffing implication

AI-first doesn't mean no hiring. It means hiring differently.

You stop hiring people to do repetitive execution work. You start hiring people who can:

Design and maintain agent systems
Make the judgment calls agents can't make
Communicate with customers and partners at the level that requires a real relationship
Set direction for systems and verify they're working

The ratio shifts. One person who can design and maintain agent systems plus a few humans for judgment and relationship work can do what previously required 10 people. This is real, it's happening, and it compounds.

The trust curve

The reason most AI automation projects fail is that they skip the trust-building step. An agent makes one mistake, someone loses confidence, and the project dies.

Build the trust curve intentionally:

Phase 1: Shadow mode. The agent runs but doesn't act. It produces the output it would have produced; a human reviews it and takes the action manually. You're evaluating the agent, not deploying it.

Phase 2: Supervised automation. The agent acts, but requires human approval for all consequential actions. Humans see everything before it happens.

Phase 3: Exception handling. The agent acts autonomously for high-confidence cases. Humans only see flagged cases. You define what "high-confidence" means as part of the design; the agent flags a case whenever it falls short of that bar.

Phase 4: Autonomous. The agent runs independently. Humans monitor outcomes, not individual actions.

Most tasks should never fully reach Phase 4. "Most of the way automated, with a human escalation path" is the right steady state for business-critical work.
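The four phases amount to a gate in front of every action the agent proposes. The sketch below is one way to express that gate, with several assumptions: the agent reports a confidence score between 0 and 1, each action is tagged as consequential or not, and the 0.9 threshold is an arbitrary placeholder. The `decide` function and its return values are hypothetical names.

```python
from enum import IntEnum


class Phase(IntEnum):
    SHADOW = 1
    SUPERVISED = 2
    EXCEPTION_HANDLING = 3
    AUTONOMOUS = 4


def decide(phase, confidence, consequential, threshold=0.9):
    """Return 'record_only', 'needs_approval', or 'execute' for a proposed action."""
    if phase == Phase.SHADOW:
        # The output is logged for review; a human takes the action manually.
        return "record_only"
    if phase == Phase.SUPERVISED:
        # Every consequential action waits for human approval.
        return "needs_approval" if consequential else "execute"
    if phase == Phase.EXCEPTION_HANDLING:
        # Only low-confidence cases are flagged to a human.
        return "execute" if confidence >= threshold else "needs_approval"
    # AUTONOMOUS: humans monitor outcomes, not individual actions.
    return "execute"
```

Keeping the phase as a single setting per agent makes promotion and demotion cheap: after a bad week, moving an agent from phase 3 back to phase 2 is a one-line change, not a rebuild.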

The measurement layer

If you can't measure it, you can't improve it. For every agent in production, you need:

Volume. How many tasks did it handle this week?
Success rate. What percentage completed without escalation?
Escalation rate. What percentage required human intervention, and why?
Cost. What does it cost per task in API fees?
Latency. How long does it take?

Review these weekly in the first 90 days after deploying any agent. The patterns tell you where to improve.
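All five numbers fall out of one log record per task. The following is a minimal sketch, assuming each task is logged with an escalation flag, a reason, a cost, and a latency; `TaskRecord` and `weekly_metrics` are hypothetical names, and the field layout is an assumption about what your logging captures.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    escalated: bool
    cost_usd: float
    latency_s: float
    reason: str = ""  # why it was escalated, if it was


def weekly_metrics(records):
    """Summarize one week of task records for a single agent."""
    volume = len(records)
    if volume == 0:
        return {"volume": 0}
    escalated = [r for r in records if r.escalated]
    return {
        "volume": volume,
        "success_rate": (volume - len(escalated)) / volume,
        "escalation_rate": len(escalated) / volume,
        "escalation_reasons": sorted({r.reason for r in escalated}),
        "cost_per_task_usd": sum(r.cost_usd for r in records) / volume,
        "mean_latency_s": mean(r.latency_s for r in records),
    }


# Hypothetical week of four tasks, one of which needed a human.
week = [
    TaskRecord(False, 0.02, 1.0),
    TaskRecord(False, 0.02, 2.0),
    TaskRecord(True, 0.04, 3.0, reason="ambiguous request"),
    TaskRecord(False, 0.04, 2.0),
]
report = weekly_metrics(week)
```

The escalation reasons are usually the most actionable field in the weekly review, since they point at the specific cases the agent cannot yet handle.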

The org design that works

The companies running AI-first well tend to organize like this:

Everyone owns their own automation. Every person is responsible for identifying and proposing automation for their own repetitive work. This is a hiring filter — people who can't think in systems struggle here.

One person owns the agent platform. Someone is responsible for the infrastructure: the tools, the MCP servers, the logging, the deployment. This doesn't have to be a full-time role at first.

Decisions stay with humans. Agents execute. Humans decide strategy, handle exceptions, and own relationships. This is not a temporary state; it's the design.

The companies that fail at this try to get AI to make decisions and humans to execute. Wrong direction. AI executes faster and cheaper than any human. Human judgment, deployed at the right moment, is what you can't replicate.

Published under Field Notes. Not for sale. Share freely.