Methodology

The First Draft

Why teams should build AI trust by using agents for internal preparation before delegating forward-facing work.

TLDR

  • AI is easy to try but hard to trust, so the safest adoption path is to begin with internal preparation instead of external action.
  • Treat AI work as a first draft: useful material for review, not a finished outcome.
  • Teams compound AI effectiveness by reviewing drafts, testing patterns, improving context, and promoting only narrow work from draft to recommendation to action.

AI is easy to try and hard to trust.

That is the practical problem behind most serious AI adoption. A team can ask a model to write a summary, prepare a reply, classify a request, or suggest a next step within minutes. But trust arrives more slowly, because professional work depends on context, evidence, authority, and judgment.

"The first draft" is a principle for closing that gap. It means using AI first for internal preparation, not forward-facing action. The agent prepares the brief, source map, options, checklist, draft response, meeting prep, or test case. A person reviews, refines, rejects, or uses that work before it reaches a client, customer, patient, regulator, supplier, employee, or public record.

The point is not that AI output is low value. The point is that the first durable value of AI is often not the final answer. It is the prepared material that helps a professional think faster and more clearly while still owning the outcome.

Figure: Trust compounds through reviewed preparation

  1. AI prepares (draft)
  2. Human reviews (judgment)
  3. Record notes (corrections)
  4. Reusable checks (evals + rules)
  5. Promote, then repeat (draft -> recommend -> approve -> automate)

This is closely related to AI readiness. A system may be ready to draft before it is ready to send. It may be ready to prepare a recommendation before it is ready to execute. That distinction is where trust begins.

The First Draft As A Working Principle

The first draft principle says AI should begin as preparation for human judgment.

In practical terms, that means the agent can:

  • summarise a document set;
  • extract open questions;
  • compare a request against a policy;
  • draft a client update;
  • prepare options for a decision;
  • turn notes into a task plan;
  • identify missing evidence;
  • propose test cases for a workflow;
  • create an internal brief before a meeting.

The human then reviews the draft with professional responsibility. They check what is missing, what is unsupported, what sounds plausible but wrong, what requires escalation, and what should be changed before anyone else sees it.

This is not a cosmetic review step. It is the operating boundary that makes early AI use useful without pretending the system is already ready for delegation.

Why Trust Is Hard

AI systems can produce fluent outputs before the surrounding workflow is trustworthy.

NIST's AI Risk Management Framework is useful here because it treats trustworthy AI as a lifecycle and context problem, not just a model-quality problem. It frames risk management around governance, mapping, measurement, and management, and it asks organisations to consider validity, reliability, safety, security, accountability, transparency, explainability, privacy, and fairness in context [1].

That matters because a good-looking output can still fail in professional work for reasons that are not visible in the prose:

  • the model did not have the latest source;
  • the workflow had unclear authority;
  • a policy exception was hidden in another system;
  • the tone was wrong for the relationship;
  • the recommendation was correct generally but wrong for this case;
  • the answer omitted uncertainty;
  • the reviewer had no way to inspect the evidence.

Trust is therefore not created by asking whether the output sounds good. It is created by repeatedly seeing how the system behaves under real constraints.

Why Forward-Facing Work Creates Risk Too Early

Forward-facing AI work carries external consequence. It may send a message, create a commitment, update a record, approve a step, reject a request, or expose the organisation's judgment to someone outside the review loop.

That is a poor starting point for trust. It asks the team to learn the system at the same time that the system is creating consequences.

Internal preparation changes the risk profile. A bad draft can be corrected. A weak source map can be improved. A missing assumption can be spotted. A wrong classification can become a test case. The organisation learns without making the AI's first failure visible to the outside world.

This does not mean agents should never act. It means action should come after the team understands what the agent is good at, where it fails, what context it needs, and which actions are low-risk, reversible, and observable.

Anthropic's guidance on building effective agents makes a similar operational point: start with simple, composable patterns, add complexity only when needed, and use clear workflows, tool interfaces, checkpoints, and evaluation as systems become more agentic [3]. The first draft is that principle applied to adoption: keep the early system inside preparation until the team has evidence that broader delegation is justified.

Internal Preparation Teaches The Team How The Agent Works

The first benefit of first-draft work is learning.

People often describe AI adoption as if the organisation is testing a tool. In reality, the organisation is testing a relationship between a tool, a workflow, a data environment, and professional judgment.

Internal preparation lets the team observe the agent's behaviour without hiding behind a finished output. Reviewers start to see patterns:

  • which prompts produce useful structure;
  • which source types the agent overweights;
  • when it asks clarifying questions;
  • when it fills gaps with assumptions;
  • how it handles conflicting evidence;
  • where it needs examples;
  • which tasks require narrower instructions;
  • which tasks should not use AI yet.

That knowledge is difficult to get from a one-off pilot or a generic productivity metric. It comes from repeated work where the agent prepares something and the professional compares it against the reality of the job.

McKinsey's 2025 State of AI research found that many organisations use generative AI, but relatively few report mature, scaled impact, and the companies seeing more value are more likely to redesign workflows rather than simply add tools [5]. That fits the first draft model. Adoption improves when teams use AI to understand and reshape work, not only to generate isolated outputs.

First Drafts Preserve Professional Judgment

The second benefit is judgment.

AI can reduce effort, but it can also weaken attention if the workflow turns people into passive approvers. A professional who only clicks approve on a polished answer may stop doing the work that made their review meaningful: checking evidence, noticing exceptions, reading tone, weighing risk, and deciding what the organisation can stand behind.

First-draft workflows keep that judgment active. The AI does the preparation work that is often slow, repetitive, or messy. The professional still decides what is accurate, appropriate, complete, and responsible.

This distinction is especially important for agents. A chatbot answer may end in conversation. An agent may use tools, move across systems, and prepare or take actions. Recent research on human oversight of agentic systems argues that oversight is not just a final approval step; it can include control before deployment, co-planning, real-time monitoring, and post-hoc review across the agent lifecycle [6].

The first draft sits early in that oversight model. It keeps the agent close enough to real work to be useful, but not so autonomous that people lose visibility into how the work is being formed.

How This Changes Testing

If AI is treated as a first draft, testing changes.

The team is no longer asking only, "Did the agent produce a good answer?" It asks:

  • Did the agent use the right sources?
  • Did it identify missing information?
  • Did it show assumptions clearly?
  • Did it separate fact from interpretation?
  • Did it preserve the user's decision rights?
  • Did the reviewer know what to check?
  • Did corrections become reusable examples or eval cases?

This makes testing more concrete. A draft can be compared against a known-good brief. A source map can be checked for missing documents. A recommendation can be reviewed against policy. A reviewer correction can become a future test case.
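
Two of the checks above are mechanical enough to sketch in code. The example below is a minimal illustration, not a prescribed implementation: the function names, the document labels, and the idea of matching headings by simple text search are all assumptions made for the example.

```python
def missing_documents(source_map: set[str], required: set[str]) -> list[str]:
    """Return required documents that the agent's source map did not include."""
    return sorted(required - source_map)


def missing_sections(draft_text: str, reference_headings: list[str]) -> list[str]:
    """Return headings from a known-good brief that do not appear in the draft.

    Matching is a case-insensitive substring search, which is deliberately
    crude: it flags candidates for the reviewer, it does not grade the draft.
    """
    lowered = draft_text.lower()
    return [h for h in reference_headings if h.lower() not in lowered]


# Hypothetical example data.
source_map = {"meeting-notes", "delivery-plan"}
required = {"meeting-notes", "delivery-plan", "risk-register"}

draft = "Summary\nDelivery is on track.\n\nOpen risks\nNone recorded."
known_good_headings = ["Summary", "Open risks", "Next steps"]

doc_gaps = missing_documents(source_map, required)
section_gaps = missing_sections(draft, known_good_headings)

print(doc_gaps)      # ['risk-register']
print(section_gaps)  # ['Next steps']
```

Checks like these do not replace the reviewer. They tell the reviewer where to look first.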

OpenAI's evals guidance recommends using representative examples, clear grading criteria, and expert or human-labeled reference outputs when measuring model performance [4]. First-draft adoption gives teams the raw material for that practice. Every reviewed draft can teach the system what "good" looks like in the organisation's real context.

The practical loop is simple:

  1. Use AI to prepare an internal draft.
  2. Have a responsible person review it.
  3. Record what was wrong, missing, or useful.
  4. Turn repeated corrections into prompts, examples, checks, or evals.
  5. Promote only the parts that become reliable.

That loop is how a team moves from interesting demos to dependable workflows.
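
Steps 3 and 4 of the loop can be sketched as a small review log. This is one possible shape under stated assumptions: the `ReviewNote` fields, the issue labels, and the rule that a correction becomes an eval case once it has been seen twice are all hypothetical choices for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewNote:
    """One reviewer correction on one draft (hypothetical record shape)."""
    task: str         # e.g. "client-update"
    issue: str        # short label chosen by the reviewer
    draft_input: str  # what the agent was given
    corrected: str    # what the reviewer says it should have produced


def repeated_issues(notes: list[ReviewNote], min_count: int = 2) -> list[tuple[str, str]]:
    """Return (task, issue) pairs recorded at least min_count times."""
    counts = Counter((n.task, n.issue) for n in notes)
    return sorted(pair for pair, count in counts.items() if count >= min_count)


def to_eval_cases(notes: list[ReviewNote], min_count: int = 2) -> list[dict]:
    """Turn notes for repeated issues into reusable eval cases."""
    repeated = set(repeated_issues(notes, min_count))
    return [
        {"task": n.task, "input": n.draft_input,
         "expected": n.corrected, "checks": n.issue}
        for n in notes
        if (n.task, n.issue) in repeated
    ]


# Hypothetical review log: one issue recurs, one appears only once.
notes = [
    ReviewNote("client-update", "missing-risk", "notes A", "update A with risk section"),
    ReviewNote("client-update", "missing-risk", "notes B", "update B with risk section"),
    ReviewNote("client-update", "tone", "notes C", "update C, softer tone"),
]

eval_cases = to_eval_cases(notes)
print(len(eval_cases))  # 2
```

The threshold matters less than the habit. A one-off correction stays a note. A correction that recurs becomes something the team tests for.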

How It Compounds

First-draft use compounds because each reviewed draft improves the operating environment around the agent.

The team writes better instructions because it learns what the agent misunderstands. It improves retrieval because reviewers see which sources were missing. It creates better templates because repeated drafts reveal the structure of the work. It defines clearer tool boundaries because people see where preparation should stop and action should begin. It builds eval sets because real review notes become test cases.

Over time, the organisation can promote some workflows through stages:

Stage               AI role                             Human role
Draft               Prepare material for review         Check and refine
Recommend           Propose a next step with evidence   Decide and approve
Act with approval   Prepare the action and wait         Confirm or reject
Act automatically   Execute narrow, monitored work      Review exceptions and audits

The important point is that each stage earns trust from the previous one. A team should not jump from "the model can write a good paragraph" to "the agent can act on behalf of the organisation." It should move from preparation to recommendation to approved action to narrow automation when the evidence supports that move.
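
The one-stage-at-a-time rule can be made explicit in code. The sketch below is illustrative only: the stage names follow the stages described above, but the evidence measure (reviewed drafts and how many were accepted) and the thresholds of 50 reviews and a 95% acceptance rate are assumptions, not recommended values.

```python
from enum import IntEnum


class Stage(IntEnum):
    DRAFT = 1
    RECOMMEND = 2
    ACT_WITH_APPROVAL = 3
    ACT_AUTOMATICALLY = 4


def next_stage(current: Stage, reviewed: int, accepted: int,
               min_reviewed: int = 50, min_accept_rate: float = 0.95) -> Stage:
    """Promote by at most one stage, and only when review evidence supports it.

    The thresholds are placeholder assumptions; a real team would set them
    per workflow according to risk and reversibility.
    """
    if current is Stage.ACT_AUTOMATICALLY:
        return current  # already at the last stage
    if reviewed < min_reviewed:
        return current  # not enough reviewed work yet
    if accepted / reviewed < min_accept_rate:
        return current  # reviewers are still correcting too much
    return Stage(current + 1)


print(next_stage(Stage.DRAFT, reviewed=60, accepted=58).name)       # RECOMMEND
print(next_stage(Stage.DRAFT, reviewed=10, accepted=10).name)       # DRAFT
print(next_stage(Stage.RECOMMEND, reviewed=100, accepted=90).name)  # RECOMMEND
```

Because the function can only return the current stage or the one after it, a workflow cannot skip from drafting to automatic action however good a single result looks.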

What This Is Not

The first draft is not a refusal to use AI seriously.

It is also not a claim that every AI output must be manually rewritten forever. Some tasks will become stable enough for automation. Low-risk, reversible, observable actions can often move faster than high-risk, ambiguous, external actions.

The first draft is a sequencing principle. It says trust should be built inside the organisation before the system represents the organisation outside it.

It is also not a license for vague review. If the reviewer cannot see sources, assumptions, uncertainty, and authority boundaries, the process is weak. A person should not be asked to approve what they cannot inspect.
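
One way to enforce that rule is to refuse to route a draft for approval until the reviewer has something to inspect. The sketch below assumes a hypothetical `ReviewPacket` structure; the field names mirror the four items named above and are not drawn from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewPacket:
    """What a reviewer receives alongside a draft (hypothetical shape)."""
    draft: str
    sources: list[str] = field(default_factory=list)
    # Assumption: if the agent made no assumptions, it says so explicitly,
    # e.g. ["none"], so an empty list always means "not reported".
    assumptions: list[str] = field(default_factory=list)
    uncertainty: str = ""
    authority_boundary: str = ""


def uninspectable_fields(packet: ReviewPacket) -> list[str]:
    """Return the names of the fields the reviewer has nothing to inspect for."""
    missing = []
    if not packet.sources:
        missing.append("sources")
    if not packet.assumptions:
        missing.append("assumptions")
    if not packet.uncertainty.strip():
        missing.append("uncertainty")
    if not packet.authority_boundary.strip():
        missing.append("authority_boundary")
    return missing


packet = ReviewPacket(
    draft="Proposed reply to supplier about late delivery.",
    sources=["contract-clause-7", "delivery-log"],
    assumptions=["supplier has not yet been notified"],
)

gaps = uninspectable_fields(packet)
print(gaps)  # ['uncertainty', 'authority_boundary']
```

A packet with gaps goes back to preparation, not to the approver.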

What This Looks Like In Practice

Client services

An agent prepares a client update from recent notes, open tasks, delivery dates, decisions, and unresolved risks. It does not send the update. The account owner checks tone, relationship context, commercial sensitivity, and whether the evidence supports the message.

Operations

An agent reviews incoming requests and prepares triage notes: likely category, urgency, missing information, owner, and suggested next step. A manager reviews edge cases and turns repeated corrections into routing rules.

Legal and compliance

An agent compares a request against policy and prepares a memo with relevant clauses, missing facts, and open questions. A professional decides whether the analysis is complete enough to rely on.

Product and engineering

An agent drafts test cases, failure modes, release notes, or migration checklists. Engineers review whether the draft matches the real system and add missed edge cases to the test suite.

The Conclusion

The first useful role for AI in professional work is often not the final outcome. It is the first draft.

That draft helps teams prepare, think, test, and learn. It lets them see how the agent behaves while preserving human judgment over work that carries consequence. It reduces early adoption risk because mistakes remain internal and reviewable. It creates the raw material for better prompts, better retrieval, better evals, better tool boundaries, and eventually better delegation.

AI trust is not built by handing agents forward-facing responsibility on day one. It is built through repeated, reviewed preparation until the organisation knows what the agent can do, what it cannot do, and where human judgment still matters.

Sources

  1. NIST AI Risk Management Framework 1.0
  2. NIST AI Risk Management Framework overview
  3. Anthropic, "Building effective agents"
  4. OpenAI, "Evals"
  5. McKinsey, "The State of AI: How organizations are rewiring to capture value"
  6. Dhanush Kumar, Jenna Wiens, Michael Madaio, "Human oversight of agentic systems in practice"

© 2026 Interfacing Research Laboratory
All rights reserved.