June 21, 2026 | Concepts

Agentic Behaviour

A practical guide to agentic behaviour: how systems progress from chatbots to workflows, tool use, and goal-directed systems that need context, feedback, boundaries, and review.

TLDR

Agentic behaviour is not a binary jump from chatbot to autonomy; it is a progression from response, to workflow, to tool use, to goal-directed operation.
The more agentic a system becomes, the more it needs context, feedback, permissions, checkpoints, measurable outcomes, and review.
Organisations become usable by agents when work is exposed as readable state with clear tools, boundaries, and review paths.

Agentic behaviour is the ability of software to pursue a goal through a sequence of decisions and actions, rather than only returning a single response.

In this article, we use the term to describe a progression. A chatbot responds. An automation follows a defined path. An agentic workflow combines model calls, tools, and checks inside an orchestrated process. An agentic system can decide more of its own path, use tools, learn from feedback, pause for review, and continue towards a stated outcome.

That progression matters because "agentic" is often treated as a switch: either a system is a chatbot, or it is autonomous. That framing is misleading. In practice, useful agentic behaviour is gradual. Each step towards autonomy increases the need for clearer context, stronger permissions, better feedback loops, measurable outcomes, and human checkpoints.

Agentic Behaviour As A Working Definition

Agentic behaviour means goal-directed software behaviour with some ability to choose intermediate steps.

The concept is older than current large language models. Research on autonomous agents has long examined systems that operate in an environment, pursue goals, and take actions. Recent LLM-based agent research focuses on how language models can support planning, memory, tool use, reflection, and action across longer tasks. A 2024 survey of emerging agent architectures similarly treats agents as systems assembled from model capabilities, memory, planning, action, and evaluation rather than as a single model call.

Anthropic makes a useful practical distinction between workflows and agents [1, Anthropic: Building effective agents]: workflows use LLMs and tools along predefined code paths, while agents let the LLM dynamically direct its own process and tool usage. That distinction is not just technical. It is the difference between software that follows a mapped route and software that can choose the route within defined boundaries.

This is why agentic behaviour should be understood as a design spectrum:

Stage	What the system does	Main design question
Chatbot	Responds to a prompt	Is the answer useful and grounded?
Automation	Runs a predefined process	Is the process stable enough to encode?
Agentic workflow	Uses models and tools inside controlled paths	Where should the system branch, check, and stop?
Agentic system	Chooses steps towards a goal	What autonomy is allowed, and how is it reviewed?

The point is not to make every system as autonomous as possible. The point is to match the level of agentic behaviour to the work.

From Chatbots To Agentic Systems

Chatbots respond

A chatbot is usually reactive. It receives a message and returns a response. That response may be intelligent, fluent, and useful, but the system is still centred on the conversation.

This can be enough. A support assistant that explains a policy, a drafting assistant that rewrites text, or an internal knowledge assistant that retrieves an answer may not need to act beyond the response. The value comes from language, retrieval, and clarity.

The limitation is that much organisational work does not end with an answer. It requires checking a record, updating a system, routing a task, preparing a draft, asking for approval, or monitoring whether an outcome happened.

Automations run fixed paths

Automation adds execution, but usually through a predetermined route. A rule says: when this event happens, do that action. Send the reminder. Create the ticket. Update the status. Move the file. Trigger the approval.

This is powerful when the work is stable. It is also legible: people can inspect the rule, test the path, and know what should happen.

The limitation is brittleness. The more the work depends on context, judgement, exceptions, or incomplete information, the harder it becomes to encode as a fixed sequence.

Agentic workflows add judgement inside structure

An agentic workflow keeps the benefits of structure while introducing model judgement at specific points. Anthropic's examples [1, Anthropic: Building effective agents] include prompt chaining, routing, parallelisation, orchestrator-worker patterns, and evaluator-optimizer loops. These patterns let a system decompose a task, classify work, run checks, compare outputs, or improve a draft against criteria without handing over the entire process to an autonomous agent.

This is often the practical middle ground. The organisation defines the path, tools, data sources, and stop points. The model helps where language, classification, synthesis, or planning is useful.

For example, a client update workflow might retrieve the matter record, summarise recent changes, check for missing evidence, draft the update, and route it to a reviewer. The model is doing real work, but the system still has a known shape.
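As a minimal sketch of that known shape, the workflow can be written as a fixed sequence of steps with a single model call inside it. Everything named here is hypothetical: the `Matter` record, the `summarise` stub standing in for a model call, and the evidence check are illustrative assumptions rather than a real API.

```python
from dataclasses import dataclass, field


@dataclass
class Matter:
    """Hypothetical matter record; the field names are illustrative."""
    matter_id: str
    recent_changes: list[str]
    evidence: dict[str, bool] = field(default_factory=dict)


def summarise(changes: list[str]) -> str:
    """Stand-in for a model call; a real system would call an LLM here."""
    return "; ".join(changes) if changes else "No recent changes."


def missing_evidence(matter: Matter) -> list[str]:
    """Deterministic check: list evidence items that are not yet on file."""
    return sorted(name for name, present in matter.evidence.items() if not present)


def client_update_workflow(matter: Matter) -> dict:
    """Fixed path: summarise, check, draft, then always route to a reviewer."""
    summary = summarise(matter.recent_changes)
    gaps = missing_evidence(matter)
    draft = f"Update for {matter.matter_id}: {summary}"
    if gaps:
        draft += f" Outstanding evidence: {', '.join(gaps)}."
    # The workflow never sends the update itself; it stops at review.
    return {"draft": draft, "missing_evidence": gaps, "status": "awaiting_review"}


matter = Matter(
    matter_id="M-001",
    recent_changes=["Hearing date confirmed", "Invoice issued"],
    evidence={"signed_engagement_letter": True, "id_check": False},
)
result = client_update_workflow(matter)
print(result["status"], result["missing_evidence"])
```

The order of steps lives in ordinary code, so people can inspect and test the path; the model only contributes the summary.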

Agentic systems choose more of the path

An agentic system has more freedom to decide the next step. It may inspect the state of work, select tools, create subtasks, execute actions, evaluate results, ask for clarification, and continue until it reaches a stopping condition.

The ReAct approach made a related technical move by interleaving reasoning traces and task-specific actions, so a language model can update plans while interacting with external sources or environments.

This is the point where the design problem changes. The central question is no longer only "can the model produce the right output?" It becomes "can the system pursue the right outcome, with the right context, through the right permissions, and stop at the right time?"
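The loop described in this section might look like the following sketch. It assumes a hypothetical `choose_next_step` function in place of a real model decision and two made-up tools; the part worth noticing is that the stopping conditions (a "stop" decision and a step limit) belong to the loop, not to the model.

```python
from typing import Callable


# Hypothetical tools: each takes the current state and returns an updated copy.
def fetch_record(state: dict) -> dict:
    return {**state, "record": {"status": "open"}}


def draft_reply(state: dict) -> dict:
    return {**state, "draft": f"Record is {state['record']['status']}."}


TOOLS: dict[str, Callable[[dict], dict]] = {
    "fetch_record": fetch_record,
    "draft_reply": draft_reply,
}


def choose_next_step(state: dict) -> str:
    """Stand-in for the model's decision; a real agent would ask an LLM."""
    if "record" not in state:
        return "fetch_record"
    if "draft" not in state:
        return "draft_reply"
    return "stop"


def run_agent(goal: str, max_steps: int = 5) -> dict:
    """Inspect state, pick a tool, act, and repeat until a stop condition."""
    state: dict = {"goal": goal, "trace": []}
    for _ in range(max_steps):
        step = choose_next_step(state)
        if step == "stop":
            state["outcome"] = "done"
            return state
        state = TOOLS[step](state)
        state["trace"] = state["trace"] + [step]
    # The step limit is a hard boundary, whatever the model would choose next.
    state["outcome"] = "step_limit_reached"
    return state


final = run_agent("Reply to the client about the record status")
print(final["outcome"], final["trace"])
```

The trace of chosen steps is kept in the state so that a reviewer can see which path the system took, not only the final draft.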

How To Think About Agentic Systems

Agentic systems are not defined by a dramatic user interface. They are defined by the operating contract around autonomy.

Autonomy

Autonomy should be specific. A system might be allowed to read documents, draft a response, create a task, update a status, or submit a transaction. Those are different levels of authority.

Good design separates "can prepare" from "can execute." Many useful agentic systems should prepare work for review before they are allowed to act directly.
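One way to keep "can prepare" apart from "can execute" is to make the prepared action a plain data object that does nothing until a named reviewer approves it. This is a sketch under assumptions: the `ProposedAction` type, the in-memory `records` dictionary, and the reviewer address are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedAction:
    """An action the agent has prepared but not yet executed."""
    description: str
    payload: dict
    approved_by: Optional[str] = None
    executed: bool = False


def prepare_status_update(record_id: str, new_status: str) -> ProposedAction:
    """'Can prepare': build the change without touching the record."""
    return ProposedAction(
        description=f"Set {record_id} to {new_status}",
        payload={"record_id": record_id, "status": new_status},
    )


def execute(action: ProposedAction, records: dict) -> None:
    """'Can execute': refuses to run until a reviewer has approved."""
    if action.approved_by is None:
        raise PermissionError("Action requires approval before execution")
    records[action.payload["record_id"]] = action.payload["status"]
    action.executed = True


records = {"R-7": "draft"}
action = prepare_status_update("R-7", "sent")
# At this point nothing has changed: records["R-7"] is still "draft".
action.approved_by = "reviewer@example.com"
execute(action, records)
print(records["R-7"])
```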

Tools

Tools are how an agent changes from a text system into a participant in operational systems. As IBM's description of AI agents suggests, tool definitions deserve careful design because agents depend on clear interfaces to use external services reliably.

Tool design should be narrow, documented, testable, and permissioned. A tool that says "update client record" is usually too broad. A better tool names the exact record type, allowed fields, required source evidence, validation rules, and audit behaviour.
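To make the contrast concrete, here is a sketch of a narrow tool. The function name, the allowed fields, the validation rule, and the in-memory audit log are assumptions chosen for illustration; a real tool would sit on top of an actual record store.

```python
from datetime import datetime, timezone

# Assumed policy for this one tool: only these fields may be changed.
ALLOWED_FIELDS = {"phone", "billing_email"}
AUDIT_LOG: list[dict] = []


def update_client_contact(record: dict, field: str, value: str, source_ref: str) -> dict:
    """Update one allowed contact field on a client record, with evidence."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"Field '{field}' is not editable through this tool")
    if not source_ref:
        raise ValueError("A source reference is required as evidence")
    if field == "billing_email" and "@" not in value:
        raise ValueError("billing_email must look like an email address")
    updated = {**record, field: value}
    # Every successful change leaves an audit entry with old and new values.
    AUDIT_LOG.append({
        "client_id": record["client_id"],
        "field": field,
        "old": record.get(field),
        "new": value,
        "source_ref": source_ref,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return updated


client = {"client_id": "C-1", "phone": "000"}
client = update_client_contact(client, "phone", "0123 456", "email:2026-06-01")
print(client["phone"], len(AUDIT_LOG))
```

The tool can be tested without any model involved, which is part of what makes it safe to hand to an agent.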

Feedback

Agentic behaviour depends on feedback. Without feedback, a system cannot know whether its last action moved it closer to the goal.

Feedback can come from retrieval results, test runs, workflow state, user corrections, reviewer decisions, metrics, or system errors. Anthropic's evaluator-optimizer pattern [1, Anthropic: Building effective agents] is useful here because it treats evaluation as a loop: generate, assess against criteria, then improve when the criteria justify it.
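The generate, assess, improve loop can be sketched in a few lines. Both `generate` and `evaluate` below are stand-ins for model calls, and the single criterion ("the draft names a next step") is an invented example; this illustrates the loop's shape and is not Anthropic's implementation.

```python
def generate(brief: str, feedback: list[str]) -> str:
    """Stand-in for a generator model call that reacts to feedback."""
    draft = f"Summary: {brief}"
    if "add a next step" in feedback:
        draft += " Next step: confirm the deadline."
    return draft


def evaluate(draft: str) -> list[str]:
    """Stand-in for an evaluator: returns the unmet criteria as feedback."""
    issues = []
    if "Next step:" not in draft:
        issues.append("add a next step")
    return issues


def evaluator_optimizer(brief: str, max_rounds: int = 3) -> tuple[str, int]:
    """Generate, assess, and regenerate until criteria pass or rounds run out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: list[str] = []
    for round_number in range(1, max_rounds + 1):
        draft = generate(brief, feedback)
        feedback = evaluate(draft)
        if not feedback:
            return draft, round_number
    # Criteria still unmet: return the last draft for a human to look at.
    return draft, max_rounds


draft, rounds = evaluator_optimizer("Contract renewal is due in July.")
print(rounds, draft)
```

The round limit matters as much as the criteria: without it, a generator that never satisfies the evaluator would loop indefinitely.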

Permissions

Permissions are not just a security layer. They are part of the agent's understanding of the work.

An agent should know what it can read, what it can write, what it can suggest, what it can send externally, and what always requires approval. This is especially important when the system can cross boundaries between clients, matters, departments, finance records, personal data, or regulated decisions.
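A permission model along these lines can be expressed as data the agent consults before acting. The policy entries below are hypothetical examples; the design choice being illustrated is default-deny, with a separate "needs approval" outcome for actions such as sending something externally.

```python
from enum import Enum


class Access(Enum):
    DENY = "deny"
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"


# Hypothetical policy for one agent, keyed by (action, resource type).
POLICY = {
    ("read", "matter"): Access.ALLOW,
    ("draft", "client_update"): Access.ALLOW,
    ("write", "task"): Access.ALLOW,
    ("send_external", "client_update"): Access.NEEDS_APPROVAL,
    ("write", "finance_record"): Access.DENY,
}


def check(action: str, resource: str) -> Access:
    """Default-deny: anything not listed in the policy is refused."""
    return POLICY.get((action, resource), Access.DENY)


print(check("read", "matter").value)
print(check("send_external", "client_update").value)
print(check("delete", "matter").value)
```

Because the policy is plain data, it can be shown to the agent as context and to reviewers as documentation, as well as enforced in code.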

Checkpoints

Checkpoints are deliberate moments where the system pauses. They can be mandatory approvals, confidence thresholds, budget limits, maximum iteration counts, escalation rules, or "ask a human" moments.

NIST's AI Risk Management Framework [2, NIST: Artificial Intelligence Risk Management Framework 1.0] places governance, measurement, management, accountability, documentation, monitoring, and human-AI oversight inside the lifecycle of trustworthy AI systems. For agentic systems, those controls are not paperwork after the fact. They are design requirements.
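The checkpoint types listed in this section can be combined into a single gate that runs before each step. The thresholds and action names here are assumed values for illustration; real limits depend on the work and its risk.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    iterations: int    # steps taken so far
    spend: float       # cost so far, in whatever unit the budget uses
    confidence: float  # 0.0 to 1.0, however the system estimates it
    action: str        # the action the agent wants to take next


# Assumed limits and mandatory-approval actions.
MAX_ITERATIONS = 10
BUDGET = 5.00
MIN_CONFIDENCE = 0.8
ALWAYS_APPROVE = {"send_external", "submit_transaction"}


def checkpoint(state: RunState) -> str:
    """Return 'continue', or the reason the run must pause for a human."""
    if state.action in ALWAYS_APPROVE:
        return "pause: mandatory approval"
    if state.iterations >= MAX_ITERATIONS:
        return "pause: iteration limit"
    if state.spend >= BUDGET:
        return "pause: budget limit"
    if state.confidence < MIN_CONFIDENCE:
        return "pause: low confidence"
    return "continue"


print(checkpoint(RunState(iterations=2, spend=0.40, confidence=0.93, action="draft")))
print(checkpoint(RunState(iterations=2, spend=0.40, confidence=0.93, action="send_external")))
```

Returning the reason for the pause, not just a boolean, gives the reviewer the context needed to decide what happens next.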

Measurable outcomes

Agentic systems need measurable outcomes because autonomy without evaluation becomes theatre.

Useful measures include task completion, reviewer acceptance rate, correction rate, elapsed time, error rate, escalation quality, avoided rework, user satisfaction, and downstream outcome quality. Anthropic's practical examples [1, Anthropic: Building effective agents] point to customer support and coding partly because they have clearer success criteria: resolved cases, verified tests, and reviewable outputs.
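Some of these measures fall out directly from review decisions if they are recorded consistently. The sketch below assumes a hypothetical log in which each reviewed output carries one `outcome` value ("accepted", "corrected", or "escalated"); the sample data is invented for the example.

```python
from collections import Counter


def review_metrics(reviews: list[dict]) -> dict:
    """Summarise reviewer decisions into simple rates."""
    total = len(reviews)
    if total == 0:
        return {"total": 0}
    counts = Counter(r["outcome"] for r in reviews)
    return {
        "total": total,
        "acceptance_rate": counts["accepted"] / total,
        "correction_rate": counts["corrected"] / total,
        "escalation_rate": counts["escalated"] / total,
    }


# Invented sample log of four reviewed outputs.
sample = [
    {"task": "T-1", "outcome": "accepted"},
    {"task": "T-2", "outcome": "accepted"},
    {"task": "T-3", "outcome": "corrected"},
    {"task": "T-4", "outcome": "escalated"},
]
metrics = review_metrics(sample)
print(metrics)
```

Rates like these only describe reviewer behaviour; measures such as avoided rework or downstream outcome quality need data from outside the review step.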

Organisational Legibility For Agents

An agent cannot work responsibly inside an organisation it cannot read.

Most organisations already contain the raw material agents need: documents, messages, meetings, tasks, records, approvals, deadlines, budgets, relationships, and decisions. The problem is that this material is scattered across tools and often depends on tacit knowledge.

For an agent, tacit knowledge is a missing interface. Relevant work has to enter from documents, emails, calendars, meetings, case systems, finance tools, task boards, support systems, and databases with enough source context to know who said it, when it changed, which record it belongs to, and what authority it carries.

That information then has to become concepts the organisation can trust: client, matter, owner, source, deadline, approval, risk, renewal, task, decision, and outcome. Organisational language matters here. If the system cannot distinguish a draft from an approved document, a note from a commitment, or a suggestion from a decision, agentic behaviour becomes risky.

The operating state has to be durable, queryable, permissioned, and usable through controlled tools: search, retrieval, APIs, editors, workflow actions, queues, review surfaces, and write-back paths.

This is the point where the progression becomes practical. A chatbot can answer from retrieved context. An automation can trigger from structured events. An agentic workflow can combine retrieval, checks, and tools. An agentic system can pursue a goal because the organisation exposes a readable, permissioned environment.

Boundaries Make Autonomy Useful

The more agentic a system becomes, the more explicit its boundaries need to be.

Those boundaries should cover:

Purpose: what the system is for, and what it is not for.
Scope: which records, teams, clients, domains, or processes it can touch.
Authority: what it can read, draft, update, send, approve, or only recommend.
Evidence: which sources are authoritative, stale, incomplete, or disputed.
Review: who checks outputs, when review is mandatory, and what reviewers must see.
Failure: what happens when the system is uncertain, blocked, looping, or receiving conflicting instructions.
Measurement: how the organisation knows the system is improving the work.

This is not a pessimistic view of agents. It is the practical condition for trusting them. NIST frames AI risk management [3, NIST: AI Risk Management Framework overview] as a way to improve the ability to incorporate trustworthiness into the design, development, use, and evaluation of AI systems. Agentic systems make that design discipline more important because the system is not merely producing content; it may be changing the state of work.
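The seven boundaries listed in this section can be written down as a structured record, so that a missing boundary is visible before the system runs. The `Boundaries` type and the example values are hypothetical; the sketch only shows that an undefined boundary can be detected mechanically.

```python
from dataclasses import dataclass, fields


@dataclass
class Boundaries:
    purpose: str
    scope: list[str]
    authority: dict[str, str]  # action -> "allowed", "recommend only", and so on
    evidence: list[str]        # sources treated as authoritative
    review: str
    failure: str
    measurement: list[str]


def undefined_boundaries(b: Boundaries) -> list[str]:
    """Return the names of boundary sections that were left empty."""
    return [f.name for f in fields(b) if not getattr(b, f.name)]


# Invented example in which nobody has yet defined how success is measured.
client_updates = Boundaries(
    purpose="Prepare client update drafts; not for legal advice",
    scope=["matter records", "client correspondence"],
    authority={"read": "allowed", "draft": "allowed", "send": "recommend only"},
    evidence=["matter management system"],
    review="Matter owner reviews every draft before sending",
    failure="Stop and escalate to the matter owner when sources conflict",
    measurement=[],
)
print(undefined_boundaries(client_updates))
```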

When This May Not Be Needed

Not every useful AI feature needs agentic behaviour.

If the task is a single response, a well-grounded chatbot may be enough. If the task is repeatable and stable, traditional automation may be clearer, cheaper, and more auditable. If the process is rare, low-value, or highly sensitive, human work with AI assistance may be better than delegating steps to a system. If success cannot be measured, it is usually too early to add autonomy.

Anthropic's guidance [1, Anthropic: Building effective agents] is direct on this point: start with the least complex effective solution and add agentic complexity only when it demonstrably improves outcomes. Agentic systems often trade cost and latency for better task performance, so the trade-off needs to be justified.

The practical test is:

Does the task require multiple steps?
Are the steps hard to predict in advance?
Can the system get reliable feedback from the environment?
Are tools available through safe, narrow interfaces?
Are permissions and checkpoints explicit?
Is there a measurable outcome?
Can a human review the important decisions?

If the answer is mostly no, the organisation probably needs better retrieval, clearer workflow, or narrower automation before it needs an agentic system.
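The practical test can be applied as a simple checklist. The criterion keys below paraphrase the seven questions, and treating "mostly no" as "more than half unanswered or answered no" is an assumption made for this sketch rather than a rule from the article.

```python
# Short keys for the seven questions, in the order they are asked.
CRITERIA = [
    "multiple_steps",
    "steps_hard_to_predict",
    "reliable_feedback",
    "safe_narrow_tools",
    "explicit_permissions_and_checkpoints",
    "measurable_outcome",
    "human_review_possible",
]


def practical_test(answers: dict[str, bool]) -> dict:
    """List unmet criteria and give a rough recommendation."""
    unknown = set(answers) - set(CRITERIA)
    if unknown:
        raise KeyError(f"Unknown criteria: {sorted(unknown)}")
    # A question that was not answered counts as a "no".
    unmet = [c for c in CRITERIA if not answers.get(c, False)]
    mostly_no = len(unmet) > len(CRITERIA) / 2
    return {
        "unmet": unmet,
        "recommendation": (
            "improve retrieval, workflow, or automation first"
            if mostly_no
            else "an agentic system may be worth evaluating"
        ),
    }


outcome = practical_test({"multiple_steps": True, "measurable_outcome": True})
print(outcome["recommendation"])
```

The list of unmet criteria is usually more useful than the recommendation itself, because it names what the organisation would need to put in place first.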

The Practical Argument

Agentic behaviour is not a leap from chat to autonomy. It is a ladder.

At the bottom, systems respond. Then they follow workflows. Then they use tools within structured paths. Then they pursue goals with more freedom. Each step can be useful, but each step also raises the standard for context, feedback, permissions, checkpoints, and measurement.

This is why agentic systems are organisational systems, not only AI features. The model matters, but the model is not enough. The organisation has to expose the state of work in a form the system can read and act on safely. It has to define what the system can do, what it must not do, and when people remain responsible.

The useful future is not "fully autonomous everything." It is work systems where software can read context, use tools, prepare actions, ask for judgement, learn from feedback, and move towards measurable outcomes inside clear boundaries.

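Sources

1. Anthropic: Building effective agents
2. NIST: Artificial Intelligence Risk Management Framework 1.0
3. NIST: AI Risk Management Framework overview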
