True agency: what an agent does when you let it operate

Fourth article in the cognition / doctrine series. If the real cost of inference now makes deep use of AI on critical acts sustainable, and if the throttling of consumer tools rules out that same use through Copilot or ChatGPT in a chat interface, the question becomes: through which technical object does one move, concretely, from the prompt to autonomous operation on a tender? The answer is called an agent -- provided you know which one.

In a large French IT services firm, in March 2026, the head of a business unit invited his teams to a product demonstration. The vendor had come to present "the first AI agent capable of responding to a tender on its own." The demo was convincing. The user uploaded a CCTP (the technical specification document of a French public tender), clicked a button, and three minutes later a fifty-page technical proposal appeared, peppered with references and calibrated to the criteria set out in the consultation rules. The room applauded; the director scheduled a POC.

Three months later, the POC was abandoned. The tool that had seemed magical in the demo proved incapable of holding up against a real DCE (the full tender file issued to bidders). On the first live tender, it generated a proposal that mixed in references from other clients, missed one of the eliminatory requirements, and returned, for the weighting formula, a faulty analysis that a senior bid manager would have caught in ten seconds. The internal verdict came at the debrief meeting: "that was not an agent. It was a workflow in disguise."

The conclusion is exact, and it is shared by nearly every management team that has seriously tested a product stamped "AI agent" in 2025-2026. Today the word covers three technical objects of profoundly different natures, only one of which truly changes the arithmetic of the work. This article untangles the confusion, retraces the research trajectory that made the third object viable, and proposes the concrete framework management teams should use before writing the word "agent" into a set of requirements.

Untangling what the word covers

At the bottom of the spectrum, the dressed-up chatbot. Under the hood, a prompt interface to a language model, augmented with a few instructions hidden in the system prompt and branding that speaks of an agent. The user types a request, the model answers, the cycle ends. No choice of tool. No state memory between turns. No capacity to act on an external system. Microsoft Copilot, ChatGPT in its standard interface, and Claude.ai in chat mode all belong to this class -- whatever the sophistication of the model they embed.

One notch up, the driven workflow. A product that orchestrates a sequence of predefined steps, each one possibly delegated to a language model for text production, but whose sequence itself is frozen at design time. The vendor has written a graph: read the CCTP → extract the requirements → generate a draft → produce the final response. At each step, an LLM call may intervene. But the LLM never decides whether a step should be taken, in what order, or whether one should be added. The logic is exogenous, declared, verifiable. Zapier, n8n, Make, and nearly all of the tools stamped "agent" in 2025 belong to this class.
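The frozen graph described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `fake_llm` is a hypothetical stand-in for a model call, and the four step functions simply mirror the example graph.

```python
from typing import Callable


def fake_llm(instruction: str, text: str) -> str:
    """Hypothetical stand-in for a model call: tags the text with the instruction."""
    return f"[{instruction}] {text}"


def read_cctp(doc: str) -> str:
    return doc.strip()


def extract_requirements(doc: str) -> str:
    return fake_llm("extract requirements", doc)


def generate_draft(requirements: str) -> str:
    return fake_llm("draft", requirements)


def produce_final(draft: str) -> str:
    return fake_llm("finalize", draft)


# The sequence is declared at design time; nothing at run time can reorder it.
PIPELINE: tuple[Callable[[str], str], ...] = (
    read_cctp,
    extract_requirements,
    generate_draft,
    produce_final,
)


def run_workflow(doc: str) -> tuple[str, list[str]]:
    """Run every step in the declared order; return the output and the step trace."""
    trace: list[str] = []
    data = doc
    for step in PIPELINE:
        data = step(data)
        trace.append(step.__name__)
    return data, trace
```

Whatever document is passed in, the trace is identical: the model may write the text inside a step, but it has no say over which step comes next.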

At the very top of the spectrum, the true agent -- a system in which the language model itself chooses the actions to undertake, starting from an expressed intention and an environment it observes. It has a repertoire of tools -- read a file, write a file, call an API, run code, query a database, launch a search, delegate to another agent. At each turn of the loop, it observes the state of the world, chooses the next action, executes it, observes the result, updates its mental state. The sequence emerges as it operates, without any vendor having written it in advance.
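The observe, choose, execute, update loop can be sketched as follows. This is a toy under stated assumptions: `choose_action` is a hand-written stub standing in for the model's decision, and the three tools are hypothetical placeholders that return canned strings.

```python
from typing import Callable

# Hypothetical tool repertoire; real tools would read files, call APIs, run code.
TOOLS: dict[str, Callable[[dict], str]] = {
    "read_document": lambda state: "weighting: price 60%, technical 40%",
    "run_pricing": lambda state: "formula favors low bids",
    "ask_human": lambda state: "priority is win rate",
}


def choose_action(state: dict) -> str:
    """Stub policy standing in for the model's choice at each turn."""
    seen = state["observations"]
    if "read_document" not in seen:
        return "read_document"
    if "price" in seen["read_document"] and "run_pricing" not in seen:
        return "run_pricing"
    if "ask_human" not in seen:
        return "ask_human"
    return "finish"


def run_agent(intention: str, tools: dict = TOOLS, max_turns: int = 10) -> dict:
    """Loop until the policy declares the mission finished or the budget runs out."""
    state = {"intention": intention, "observations": {}, "history": []}
    for _ in range(max_turns):
        action = choose_action(state)            # the policy picks the next step
        if action == "finish":
            break
        result = tools[action](state)            # execute the chosen tool
        state["observations"][action] = result   # observe, update persistent state
        state["history"].append(action)
    return state
```

With the default tools the history is read, price, ask. If the document says nothing about price, the pricing tool is never called: the sequence depends on what was observed, not on a graph written in advance.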

What separates these objects can be summed up in a minimal framework.

Criterion	Chatbot	Workflow	True agent
Choice of actions	None	Frozen by the vendor	Decided by the model
State memory	None between turns	Variables passed step to step	Persistent representation, kept updated
Revision loop	None	Linear or deterministic branching	Can backtrack, restart, request intervention
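The table above can be read as a decision rule. The sketch below is one possible encoding, not a standard: the field names are invented for illustration, and requiring all three criteria before granting the label "true agent" is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical description of a product under evaluation."""
    model_chooses_actions: bool         # criterion 1: choice of actions
    persistent_state: bool              # criterion 2: state memory
    can_revise: bool                    # criterion 3: revision loop
    has_predefined_steps: bool = False  # a vendor-written sequence exists


def classify(p: Product) -> str:
    """Map the three criteria of the table to one of the three classes."""
    if p.model_chooses_actions and p.persistent_state and p.can_revise:
        return "true agent"
    if p.has_predefined_steps:
        return "workflow"
    return "chatbot"
```

A product whose steps are vendor-written lands in the workflow class even if each step calls a strong model, which is the situation the demo anecdote describes.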

The decisive criterion is the first. On the very same tender, a true agent may one day choose to call a calculation tool after reading three pages, and another day to re-read the entire DCE before any calculation, because the context of the second engagement led it to judge that the stakes lay elsewhere. This sequence autonomy defines the class -- and, by construction, means the agent's behavior cannot be guaranteed by a suite of test cases.

The trajectory that made the object possible

The concept's birth certificate is precise. Yao et al., in October 2022, released "ReAct: Synergizing Reasoning and Acting in Language Models," later presented at ICLR 2023. The proposed pattern is simple to state and powerful in execution: explicitly alternating, within the model's reasoning chain, Thought steps where the model expresses what it believes it should do, Action steps where it picks a tool from a predefined list and formulates the call, and Observation steps where it receives the tool's result and incorporates it into its chain. The cycle continues until a Finish step by which the model declares the task accomplished. For the first time, the LLM stopped confining itself to producing a text -- it conducted a mission.
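The Thought / Action / Observation cycle can be sketched as below. This is a simplified illustration, not the paper's code: the model is replaced by a scripted stand-in (`make_scripted_model`, a hypothetical helper), and the `Tool[argument]` action syntax with a closing `Finish[...]` is the only convention borrowed from the pattern.

```python
import re
from typing import Callable, Optional

ACTION_RE = re.compile(r"^(\w+)\[(.*)\]$")  # matches e.g. Lookup[weighting]

Model = Callable[[list[str]], tuple[str, str]]


def make_scripted_model(steps: list[tuple[str, str]]) -> Model:
    """Hypothetical stand-in for the LLM: replays (thought, action) pairs in order."""
    def model(transcript: list[str]) -> tuple[str, str]:
        turn = sum(1 for line in transcript if line.startswith("Action:"))
        return steps[turn]
    return model


def react(model: Model, tools: dict[str, Callable[[str], str]],
          max_steps: int = 8) -> tuple[Optional[str], list[str]]:
    """Alternate Thought, Action, Observation until Finish or the step budget ends."""
    transcript: list[str] = []
    for _ in range(max_steps):
        thought, action = model(transcript)   # the model sees the whole chain so far
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}")
        match = ACTION_RE.match(action)
        if match is None:
            transcript.append("Observation: malformed action")
            continue
        name, arg = match.groups()
        if name == "Finish":
            return arg, transcript
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript.append(f"Observation: {observation}")
    return None, transcript
```

In a real system the transcript would be sent back to the language model at every turn, which is how each observation ends up shaping the next thought.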

ReAct's immediate limitation surfaces as soon as an agent strings several attempts together: it does not know it has erred, and reproduces the same mistake on every restart. Shinn et al., in "Reflexion: Language Agents with Verbal Reinforcement Learning" (NeurIPS 2023), corrected this by adding a self-critique loop. At the end of each attempt, the agent writes a report of what worked and what failed, keeps that report in long-term memory, and uses it to inform the next attempt. Performance rose sharply on reasoning benchmarks -- HotpotQA, HumanEval for code, ALFWorld for interactive environments.
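The self-critique loop can be sketched in a few lines. This is a toy rendering of the idea, not the authors' implementation: `attempt`, `evaluate`, and `reflect` are hypothetical stand-ins for the model's attempt, the success check, and the written report.

```python
from typing import Callable


def reflexion(attempt: Callable[[list[str]], str],
              evaluate: Callable[[str], bool],
              reflect: Callable[[str], str],
              max_trials: int = 3) -> tuple[str, list[str]]:
    """Retry a task, carrying written lessons from each failure into the next try."""
    memory: list[str] = []          # verbal lessons kept across attempts
    result = ""
    for _ in range(max_trials):
        result = attempt(memory)
        if evaluate(result):
            break
        memory.append(reflect(result))
    return result, memory


def attempt(memory: list[str]) -> str:
    """Toy attempt: only includes the certification section once a lesson says so."""
    sections = ["technical approach"]
    if any("certification" in lesson for lesson in memory):
        sections.append("certification")
    return ", ".join(sections)


def evaluate(result: str) -> bool:
    return "certification" in result


def reflect(result: str) -> str:
    return "Missed the eliminatory certification requirement; include it next time."
```

Without the memory, `attempt([])` returns the same incomplete answer on every restart, which is the limitation the paragraph attributes to plain ReAct.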

Wang et al. published, in May 2023, "Voyager: An Open-Ended Embodied Agent with Large Language Models," which pushed the logic into Minecraft. As it explores autonomously, Voyager progressively builds a library of reusable skills -- "how to craft a stone pickaxe," "how to find iron" -- which it accumulates and combines to solve objectives of growing complexity. The demonstration is unsettling: an agent can build its own repertoire of expertise through exploration, without any skill having been hand-coded.
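The idea of a growing, composable skill library can be illustrated as follows. This is a deliberately small sketch: in Voyager the skills are programs written by the model itself, whereas the three skills here are hand-written stand-ins, and `SkillLibrary` is a name invented for this example.

```python
from typing import Callable

Skill = Callable[[dict], dict]


class SkillLibrary:
    """Stores named skills and builds new ones by chaining existing ones."""

    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def add(self, name: str, fn: Skill) -> None:
        self._skills[name] = fn

    def compose(self, name: str, parts: list[str]) -> None:
        steps = [self._skills[p] for p in parts]   # KeyError if a part is unknown

        def composed(inventory: dict) -> dict:
            for step in steps:
                inventory = step(inventory)
            return inventory

        self.add(name, composed)

    def run(self, name: str, inventory: dict) -> dict:
        return self._skills[name](dict(inventory))  # work on a copy


def mine_stone(inv: dict) -> dict:
    inv["stone"] = inv.get("stone", 0) + 3
    return inv


def gather_sticks(inv: dict) -> dict:
    inv["sticks"] = inv.get("sticks", 0) + 2
    return inv


def craft_pickaxe(inv: dict) -> dict:
    if inv.get("stone", 0) < 3 or inv.get("sticks", 0) < 2:
        raise ValueError("missing materials")
    inv["stone"] -= 3
    inv["sticks"] -= 2
    inv["stone_pickaxe"] = inv.get("stone_pickaxe", 0) + 1
    return inv
```

Once "make a stone pickaxe" is stored as a composition of the three primitives, later objectives can call it as a single step, which is how the repertoire compounds.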

The next industrial step is less glorious. AutoGPT, launched in March 2023 and massively adopted, illustrates the limits of the first generation of consumer agents. The system loops, loses its state, hallucinates its tools, burns through API budgets without converging. The experience reports documented in 2023-2024 -- "95% of non-trivial attempts fail," "context drift makes the agent unusable beyond fifty actions" -- made the industry wary of the word agent for eighteen months.

The industrial turning point came, in 2025-2026, from a cluster of converging technical maturations that had never coexisted before. The context window, extended to a million tokens, now lets the agent hold the state of a long mission without drifting, where the ceiling at 32k or 128k tokens made it lose the thread by the fiftieth turn. Native tool use, formalized by Anthropic in "Building effective agents" (2024) and by OpenAI in the "function calling" specification, reaches a reliability above 99% on public benchmarks -- τ-bench, AgentBench, ToolBench -- whereas a 2023 agent saw its chances of success fall to 50% after ten consecutive calls. The maturity of so-called computer use architectures -- a capability released by Anthropic in October 2024 and refined since -- opens the agent to work in non-instrumented tools: moving the cursor, clicking, reading the screen, typing on the keyboard. And the cost of inference, brought under Opus 4.7 to a range of 150 to 400 dollars for a complete tender, becomes compatible with a budget whose total is counted in the tens of thousands of euros -- a range detailed in the article on the real cost of inference.
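The reliability figures above become easier to compare once they are put on the same footing. The sketch below assumes, purely for illustration, that tool calls succeed or fail independently; under that assumption, a chain that succeeds half the time after ten calls implies roughly 93% per call, while 99% per call gives roughly 90% over ten calls and roughly 61% over fifty.

```python
def chain_success(per_call: float, calls: int) -> float:
    """Probability that `calls` independent calls all succeed."""
    return per_call ** calls


def implied_per_call(chain: float, calls: int) -> float:
    """Per-call reliability implied by an observed whole-chain success rate."""
    return chain ** (1 / calls)


if __name__ == "__main__":
    # Figures taken from the text: 50% after ten calls (2023), 99% per call (today).
    print(f"2023 per-call reliability: {implied_per_call(0.5, 10):.3f}")
    print(f"99% per call, 10 calls:    {chain_success(0.99, 10):.3f}")
    print(f"99% per call, 50 calls:    {chain_success(0.99, 50):.3f}")
```

The independence assumption is a simplification, since real failures tend to cluster, but it shows why a few points of per-call reliability decide whether a fifty-action mission is viable.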

In that same period the architectural doctrine that had been missing took shape: the Supervisor-Worker pattern, by which a supervising agent orchestrates specialized sub-agents; the Planner-Executor pattern, by which a planning agent decomposes the mission before an execution agent conducts it; and the ReAct + Reflexion + memory hierarchy combination, which became the implicit standard of the serious agentic products shipped in 2026. The literature -- Wang et al. "A Survey on Large Language Model based Autonomous Agents" (2024), Xi et al. "The Rise and Potential of Large Language Model Based Agents" (2023), Anthropic's "How we built our multi-agent research system" (2025) -- now provides an operational framework that did not exist two years ago.
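The first two patterns can be combined in a small sketch. Everything here is hypothetical: `plan` is a stub where a planning model would sit, and the two workers return canned strings where specialized sub-agents would do real work.

```python
from typing import Callable

Worker = Callable[[str], str]


def pricing_worker(task: str) -> str:
    return f"pricing: analyzed '{task}'"


def compliance_worker(task: str) -> str:
    return f"compliance: checked '{task}'"


# Hypothetical registry of specialized sub-agents, keyed by role.
WORKERS: dict[str, Worker] = {
    "pricing": pricing_worker,
    "compliance": compliance_worker,
}


def plan(mission: str) -> list[tuple[str, str]]:
    """Stub planner: a real system would ask a model for this decomposition."""
    return [
        ("compliance", f"eliminatory requirements of {mission}"),
        ("pricing", f"weighting formula of {mission}"),
    ]


def supervise(mission: str, workers: dict[str, Worker] = WORKERS) -> list[str]:
    """Route each planned subtask to the worker for its role and collect reports."""
    reports: list[str] = []
    for role, task in plan(mission):
        worker = workers.get(role)
        if worker is None:
            reports.append(f"unassigned: no worker for role '{role}'")
            continue
        reports.append(worker(task))
    return reports
```

The supervisor holds the mission and the reports while each worker sees only its own subtask, which is one way of keeping individual contexts small.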

The agents of 2026 are in no way improved AutoGPTs; they belong to another generation of technical objects. Most organizations evaluating them today start from a mental representation inherited from the 2023 products -- which leads them to underestimate what a true agent can now do, while overestimating what a workflow in disguise pretends to do in its place.

The concrete framework buyers should use

The category error -- buying a workflow in disguise for a cognitive use, or a true agent for an industrial use -- has become, in 2026, the most costly mistake made by management teams investing in AI. The decision framework, however, fits in a few words.

For occasional conversational assistance -- drafting an email, summarizing a note, a first version of a short brief, brainstorming a closed question -- the dressed-up chatbot suffices. Copilot, ChatGPT, Claude.ai in chat mode cover the use legitimately, and the inference overhead of agency on these objects remains unjustified.

For the repetitive sequence with stable rules -- onboarding a new user across several systems, batch processing of homogeneous documents, automatic generation of internal memos, exporting a CRM to a reporting tool -- the driven workflow is the appropriate tool. The sequence is known, exceptions are rare, predictability prevails over adaptability. Entrusting these tasks to a true agent costs more for an equivalent result, or even one that is less reliable, because the agent retains the freedom to misinterpret an instruction that a workflow would execute without a second thought.

For the complex mission with an unpredictable sequence -- strategic analysis of a tender, transversal audit, competitive review, preparing a decision under incomplete information, conducting a tender response -- the true agent changes the arithmetic. The sequence of actions cannot be written in advance; it depends on what the agent will discover reading the first documents, on the strategic inflections it will identify by cross-referencing sources, on the points of divergence that will appear only after the fifteenth turn. On these missions, the workflow in disguise produces a smooth median deliverable; the true agent produces a deliverable that resembles the work of a competent junior supervised by a senior. The difference is measured in win rate, in margins on contracts won, in person-hours bought back.
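The three cases above reduce to two questions about the task. The function below is one possible reading of the framework, with parameter names invented for this sketch; it is a first-pass filter, not a substitute for judgment on borderline cases.

```python
from enum import Enum


class ToolClass(Enum):
    CHATBOT = "dressed-up chatbot"
    WORKFLOW = "driven workflow"
    AGENT = "true agent"


def recommend(multi_step: bool, sequence_known_in_advance: bool) -> ToolClass:
    """Pick the cheapest class of tool that fits the shape of the task."""
    if not multi_step:
        return ToolClass.CHATBOT      # occasional conversational assistance
    if sequence_known_in_advance:
        return ToolClass.WORKFLOW     # repetitive sequence with stable rules
    return ToolClass.AGENT            # complex mission, unpredictable sequence
```

For instance, drafting an email maps to `recommend(False, True)`, a CRM export to `recommend(True, True)`, and a tender response to `recommend(True, False)`.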

On a tender, what a true agent does

The bid manager expresses an initial intention -- "study this DCE, identify the appropriate response strategy, and produce a first skeleton of a technical proposal consistent with my track record." From there, the agent operates.

It opens the documents, reads them, identifies some as structuring and others as incidental. It cross-references the weighting formula with the volumes of the DQE (the estimated bill of quantities), spotting the zones of strong price sensitivity. It goes back to the CCTP to verify a requirement whose initial extraction struck it as ambiguous. It invokes a pricing simulation tool and notes that the formula structurally favors the incumbent -- it flags this as a strategic point. It consults the internal track record and identifies a few transferable references. It drafts a chapter, re-reads it, detects an internal contradiction, rewrites it. Then it stops, formulates an explicit question to the human -- "the strategy seems to require a trade-off between margin and win rate; what is the priority?" -- and waits for the answer before continuing.

None of these actions were scripted. It is the agent that decides, at each turn, what should be done -- invoke a tool, re-read a passage, stop, ask. The initial reading of the CCTP remains accessible twenty actions later, because the mental state is persistent. The contradictory draft is corrected because a self-critique loop was triggered. The question to the human emerges because the agent has identified the boundary of what it knows how to do, rather than manufacturing a confident answer on ground where human judgment is required.

This last capacity -- knowing how to name the zone where one stops -- on its own constitutes one of the most reliable marks of serious agency. Epistemic marking, meaning the explicit labeling of what is established, what is inferred, and what remains uncertain, plays a central role here. A workflow in disguise carries on to the end by construction, because no branch was coded to handle doubt. The difference ceases to be cosmetic the moment it protects the organization against the smooth, structurally insufficient deliverables that the Copilot illusion had already documented on another terrain.
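The stopping behavior described above can be sketched as a gate placed before each step. This is an illustration under strong assumptions: the `confidence` score and the `needs_human_judgment` flag are hypothetical signals, and how a real agent would produce them is left open.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    confidence: float                  # hypothetical self-assessed score in [0, 1]
    needs_human_judgment: bool = False


def agent_move(step: Step, threshold: float = 0.7) -> str:
    """Proceed only when the step is within the agent's remit and confidence."""
    if step.needs_human_judgment:
        return f"ASK: '{step.name}' requires a human decision"
    if step.confidence < threshold:
        return f"ASK: unsure about '{step.name}'"
    return f"PROCEED: {step.name}"


def workflow_move(step: Step) -> str:
    """A frozen workflow has no branch for doubt: it always carries on."""
    return f"PROCEED: {step.name}"
```

On a step such as the margin versus win rate trade-off, `agent_move` returns a question while `workflow_move` carries on regardless, which is the difference the paragraph points to.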

What management teams should stop and start

Stop calling an agent what is not one. The vocabulary was debased in 2024-2025 by vendors and the trade press. An internal arbitration request phrased as "should we buy this agent?" concerns, nine times out of ten, a workflow in disguise. The minimal framework -- choice of action, state memory, self-correction loop -- should appear in any set of requirements that speaks of an agent. If the supplier cannot, or will not, qualify its product on these criteria, the doubt is settled.

Engage true agency on critical cognitive acts: complex tender responses, transversal audit, due diligence, preparing a decision under incomplete information. The 2026-2028 trajectory is now legible: organizations that engage true agency now will hold, two years from now, a methodological lead over those that persist in conflating a chatbot demo with an operational system. The cost of inference is the entry ticket -- it remains compatible with the budgets of critical acts, as the previous article documented in detail. The cognitive ticket -- human framing upstream, epistemic operators laid down by hand, change management for bid managers and consultants -- constitutes the real half of the investment.

True agency does not substitute the machine for the human. It frees the person-days hitherto devoted to what the human should not be doing -- extracting, listing, cross-referencing, verifying, formatting, writing a first draft -- to redeploy them toward what only the human can do: strategic framing, arbitration, the signature of the final operator. A different division of labor, rather than a substitution.

The machine can now conduct the mission.

The meaning of the mission still has to be laid down oneself.

Primary sources: Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR 2023. Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning," NeurIPS 2023. Wang et al., "Voyager: An Open-Ended Embodied Agent with Large Language Models," arXiv 2305.16291, 2023. Wang et al., "A Survey on Large Language Model based Autonomous Agents," Frontiers of Computer Science, 2024. Xi et al., "The Rise and Potential of Large Language Model Based Agents: A Survey," arXiv 2309.07864, 2023. Anthropic, "Building effective agents," anthropic.com, December 2024. Anthropic, "How we built our multi-agent research system," anthropic.com, 2025. Anthropic, "Computer use," October 2024 and later updates. OpenAI, "Function calling and the Assistants API," platform.openai.com. Park et al., "Generative Agents: Interactive Simulacra of Human Behavior," UIST 2023. Yao et al., "Tree of Thoughts: Deliberate Problem Solving with Large Language Models," NeurIPS 2023. Liu et al., "AgentBench: Evaluating LLMs as Agents," ICLR 2024. τ-bench (Sierra AI), 2024.
