Making Agentic Behaviour Predictable
Agentic behaviour is stochastic, fluid, adaptive. This is a feature. But it brings challenges, particularly around security and predictability, two things that more serious applications (regulated, business-critical) cannot do without.
Google DeepMind's CaMeL system tackles agent security by enforcing deterministic execution: the model proposes a plan; a deterministic interpreter executes it, checking every tool call against security policies with conventional software.
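A minimal sketch of this pattern (illustrative names, not CaMeL's actual API): the model only proposes a plan as data, and plain deterministic code vets every tool call before anything runs. The allow-lists stand in for whatever policy a real deployment would define.

```python
# Sketch of "plan proposed by a model, executed by deterministic code".
# ALLOWED_TOOLS / ALLOWED_DOMAINS are stand-ins for a real policy.

ALLOWED_TOOLS = {"read_inbox", "draft_reply"}   # policy: tool allow-list
ALLOWED_DOMAINS = {"mail.example.com"}          # policy: network scope

def check_call(call: dict) -> bool:
    """Deterministic policy check; no LLM in the loop."""
    if call["tool"] not in ALLOWED_TOOLS:
        return False
    url = call.get("args", {}).get("url", "")
    if url and not any(d in url for d in ALLOWED_DOMAINS):
        return False
    return True

def execute_plan(plan: list[dict], tools: dict) -> list:
    """Run an approved plan; refuse any call the policy rejects."""
    results = []
    for call in plan:
        if not check_call(call):
            raise PermissionError(f"policy rejected: {call['tool']}")
        results.append(tools[call["tool"]](**call.get("args", {})))
    return results
```

The point is that the check is ordinary software: auditable, testable, and immune to prompt injection, because the model never touches the execution path.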
IBM's hybrid approach to codebase modernization arrives at a similar principle from a different domain — pairing deterministic structural analysis with agentic automation, using code analysis and dependency mapping to "ground modernization in truth" and constrain everything the AI touches. Different problems, same instinct: keep LLMs out of the execution path.
The security logic is sound. Prompt injection and agent goal hijacking remain unsolved by probabilistic defenses. The agent's behaviour is an ever-evolving surface that human permissioning systems can't easily wrap around.
Leaving LLMs out of execution gives you deterministic guarantees, at a cost.
CaMeL requires every task to be expressible as an up-front plan. IBM's approach needs extensive deterministic tooling as its foundation. Both demand significant setup, and both constrain what agents can do. Pure deterministic execution defeats much of the purpose of agentic applications: the flexibility, the adaptability, the ability to handle situations nobody pre-programmed for, which is the actual appeal.
But is this appeal about truly needing novel, agentic behaviour on every execution, or about not wanting to pre-program the task in the first place?
In other words: do you need an agent because the task is so fluid that it needs a slightly different approach every time, or because the task is not fully known yet? Where do most use cases actually fall?
The Plug-in Hybrid
Plug-in hybrid cars are an excellent metaphor for the kind of solution I'm looking for.
Pure EVs bring zero tailpipe emissions and a lower cost per km, but matching a combustion car's range demands an expensive battery. Combustion engines deliver much greater range at a lower upfront investment, but with higher per-km cost and emissions. PHEVs take the pragmatic middle ground: electric for daily commutes, gas for road trips.
The case for PHEVs comes from looking at actual usage patterns: for the average driver, most trips fall within the electric range of the average PHEV. The average German commuter drives just shy of 20 km per leg, so a PHEV with 40 km of electric range that charges overnight will have essentially the same emissions profile as a Tesla Model 3, at a fraction of the battery cost.
Look at actual use cases and design smartly, rather than _maxxing a purist architecture.
Same for agents. Not every task needs full agentic reasoning at every step. Many tasks — probably most — have a fixed structure once the approach is figured out. The intelligence is needed to design the solution, not to execute it every time. An agent that can switch modes — agentic reasoning to understand a new task, then crystallizing the solution into a deterministic workflow — gets the benefits of both without committing to either.
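One possible shape for that mode switch, sketched with hypothetical names: a dispatcher that runs a stored workflow when it already has one for the task, and falls back to agentic reasoning when it doesn't, crystallizing the result for next time.

```python
# Sketch of the two-mode dispatch. `agentic_solve` is a stand-in for a
# full reasoning loop; it returns a result plus, optionally, a callable
# workflow that captures the approach it found.

def handle(task: str, workflows: dict, agentic_solve):
    if task in workflows:
        return workflows[task]()               # electric mode: fixed, cheap, auditable
    result, crystallized = agentic_solve(task)  # gas mode: full agentic reasoning
    if crystallized is not None:
        workflows[task] = crystallized          # crystallize for next time
    return result
```

The first call on a new task pays the full reasoning cost; every repeat runs the crystallized workflow instead.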
You don't need to be a purist about the powertrain. You need to look at what trips you're actually taking.
So, what trips are we actually taking?
What's the "Average Drive" of Personal Agents?
To test this intuition, Claudio and I surveyed r/OpenClaw and cataloged 25 distinct use cases people described. The tasks cluster into clear patterns.
Communication pipelines. Email triage and auto-replies connected to Microsoft 365. Cold outreach systems linking Apollo, Hunter.io, and ZeroBounce — the agent finds leads, writes emails, verifies addresses, populates campaigns. LinkedIn connection requests and DMs. The structure: read inbox, decide, draft, send.
Content production. Batch-shoot videos, dump to Google Drive; the agent writes captions learned from 30+ top creators, uploads via Publer on a schedule. Upload a .docx, agent formats and publishes to WordPress with SEO in under 40 seconds. Diagrams and visual explanations generated for blog posts. The structure: create, format, publish.
Monitoring and digests. An agent that wakes via cron at 7 AM, crawls r/LocalLLaMA, GitHub trending, and Hugging Face papers, then delivers a curated digest via Telegram. SEO keyword reports. Weather and medication reminders. The structure: trigger, crawl, process, notify.
Business operations. Proposal generation from call transcripts — agent learned from hundreds of past proposals, builds value-based pricing, sends to PandaDoc. One user described sending $150K proposals with minimal manual review. CRM automation pushing leads through HubSpot pipelines. Full VPS management for an ecommerce agency. The structure: trigger, template, fill, submit.
Personal productivity. A Notion-based "Mission Control" managing calendar, projects, content, clients, and onboarding. Meal planning that cross-references a recipe book, produces a shopping list, and populates a Walmart cart — the user just checks out. Personalized voice briefings via ElevenLabs summarizing the day. The structure: aggregate, process, present.
Development. Coding agents that build scripts overnight. Two agents on separate machines left in a Discord channel that autonomously built a 3-layer memory architecture, negotiated roles, and synchronized via Git. Self-debugging agents fixing their own broken updates. This cluster genuinely needs agentic reasoning at every step.
The pattern is hard to miss. Roughly 20 of 25 use cases are fundamentally pipelines: trigger, fetch, process, output. The agentic intelligence was needed to design the workflow — understanding what to scrape, how to format, when to trigger. Once designed, the execution is a fixed sequence of steps with LLM calls in between. Far more fixed than a pure agentic loop.
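That shared shape can be sketched in a few lines (function names are illustrative): fetch, filter, and notify are plain deterministic code, and the single summarize step is where a model call would sit.

```python
# The trigger -> fetch -> process -> output shape shared by most of the
# surveyed use cases. `summarize` is a stub for the one LLM call; the
# score threshold is an arbitrary example of a fixed filter.

def run_digest(fetch, summarize, notify) -> str:
    items = fetch()                                           # deterministic: crawl fixed sources
    relevant = [i for i in items if i.get("score", 0) >= 10]  # deterministic: fixed filter
    digest = summarize(relevant)                              # the one model call in the pipeline
    notify(digest)                                            # deterministic: push to a channel
    return digest
```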
All of these use cases still require some long-range flexibility. Workflows go stale as trends shift, specs change, the user's needs evolve, and the business's strategic landscape moves. They all need agentic maintenance, just not agentic execution.
Most trips fall within electric range, with the occasional visit to the gas station to adjust and adapt.
Three Benefits of Crystallisation
If most agent tasks can be pipeline-shaped, there's a strong case for having agents crystallize their behaviour — shift from agentic execution to deterministic workflows once the task structure is known, and maintain workflows rather than skills or guidelines.
Three benefits make this worth pursuing:
Permissioning ergonomics
An agent asks you to approve a chain of ten three-line bash calls, each with four piped commands. You're staring at curl -s "https://..." | jq '.results[]' | grep -E "pattern" | awk '{print $2}' ten times in a row, trying to verify nothing is exfiltrating data. This is the permission fatigue problem made concrete. What the hell is in that URL? Was that jq quietly swapped for sh on the 11th iteration? Sure, containerize your agent, but if it can see both sensitive data and the outside world, it can still compromise you.
Now compare: the agent presents a documented workflow. "This script checks three listing sites for boats matching your criteria, extracts prices and locations, and writes results to a spreadsheet." You read the code once, understand the scope, approve with confidence.
Same operation. Radically different review experience. It's easier for a human to approve a clearly documented, deterministic workflow than to audit a chain of opaque shell commands. This constraint is enough to define a reasonable permissions set. The crystallized version is reviewable, editable, and bounded. The agentic version asks you to trust each step in isolation without seeing the whole picture.
Predictability
Code is deterministic. Once the needed behaviour is known, it is better for the agent to crystallize it into a workflow than to re-derive the approach from scratch every time.
A crystallized workflow does the same thing every run. Code can be made to fail loudly, and the model can make adjustments. This should be designed in: workflows that cry when they fail. Explicit error handling. Clear logging. Defined success criteria.
But code makes it possible.
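One way to design that in, as a sketch with illustrative names: wrap every step with logging and an explicit success criterion, and raise instead of degrading silently.

```python
# A workflow step that "cries when it fails": explicit logging, a defined
# success criterion, and a loud exception rather than a silent bad result.

import logging

log = logging.getLogger("workflow")

def run_step(name, fn, check, *args):
    """Run one workflow step; fail loudly if its success criterion fails."""
    log.info("step %s: start", name)
    result = fn(*args)
    if not check(result):
        log.error("step %s: success criterion failed", name)
        raise RuntimeError(f"workflow step '{name}' failed its check")
    log.info("step %s: ok", name)
    return result
```

A failed check stops the run with a named step in the traceback, which is exactly the signal the maintaining agent needs to wake up and adjust.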
Token savings
This is the most concrete benefit. Take monitoring used boat prices — a real enough task.
The agentic approach: pass a full HTML listing page from YachtWorld.com to Claude Opus every day. The model parses hundreds of kilobytes of HTML, extracts prices, compares to yesterday, drafts a summary. Every day, you're paying for large-context inference to do what a 50-line Python script with BeautifulSoup could do deterministically.
The crystallized approach: have the agent write a good crawler once. The crawler runs on a cron job, processes HTML deterministically, and stores structured data. The model only wakes on predetermined triggers: a price threshold crossed, an exception raised, or a scheduled review of the structured data to produce a summary and re-evaluate the crawling criteria and thresholds.
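A sketch of the crystallized half of that task, standard library only. The regex stands in for the more robust BeautifulSoup parsing mentioned above, and the HTML structure and threshold are assumptions for illustration.

```python
# Deterministic parsing plus a threshold trigger: the parts that run on
# cron with no model involved. The class="price" markup is an assumed
# page structure, not YachtWorld's actual HTML.

import re

PRICE_RE = re.compile(r'class="price"[^>]*>\s*\$?([\d,]+)')

def extract_prices(html: str) -> list[int]:
    """Pull listing prices out of a page; pure string processing."""
    return [int(m.replace(",", "")) for m in PRICE_RE.findall(html)]

def triggers(today: list[int], yesterday: list[int], threshold: int) -> bool:
    """Wake the model only when something crossed a predetermined line."""
    return min(today, default=threshold) < threshold or len(today) != len(yesterday)
```

Large-context inference happens only when triggers fires; every other day the pipeline costs nothing but a cron slot.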
Where This Goes
Having agents switch aggressively from agentic to deterministic behaviour, defaulting to crystallisation once a task structure stabilises (or even when it looks stable on the first pass), is a default I want to explore.
Agents that prefer workflows over repeated reasoning, that build infrastructure for themselves rather than solving the same problem fresh every day.
The plug-in hybrid doesn't agonize over which mode to use. Electric when electric works. Gas when it has to. An agent with the same disposition — agentic when exploring, deterministic when executing known patterns.
I'm building toward this. Will share results shortly.
-- A