[{"data":1,"prerenderedAt":555},["ShallowReactive",2],{"blog-list":3,"blog-count":544,"blog-tags":545},[4,193,391],{"id":5,"title":6,"body":7,"date":182,"description":183,"extension":184,"image":185,"meta":186,"navigation":187,"path":188,"seo":189,"stem":190,"tags":191,"__hash__":192},"blog/blog/why-agentic-ai-pilots-fail.md","Why 86% of Agentic AI Pilots Never Make It to Production",{"type":8,"value":9,"toc":172},"minimark",[10,14,17,20,23,26,29,32,37,40,43,46,49,52,56,59,66,72,78,81,87,93,96,100,103,106,109,112,116,119,122,125,128,131,135,138,141,144,148,151,154,157,160],[11,12,13],"p",{},"The demo was flawless. The agent ingested three documents, cross-referenced a policy database, and surfaced a recommendation in under two minutes. The room was impressed. The board approved the production budget that quarter.",[11,15,16],{},"Eight months later, the system is still not live. The AI team says the integration spec was met. The platform team says the agent produces outputs the downstream systems can't handle. The CTO is writing a board update that contains the phrase \"revised timeline\" for the third time. Nobody in the room can explain, precisely, what is blocking deployment, only that it is blocked.",[11,18,19],{},"This is not an anecdote. This is the median outcome.",[11,21,22],{},"Deloitte's latest enterprise AI survey paints a picture that should make every pilot sponsor deeply uncomfortable: 30% of organisations are still exploring agentic AI options, 38% are running pilots, 14% describe themselves as deployment-ready, and just 11% have agentic systems actually operating in production. That means roughly one in nine organisations that started down this road have anything to show for it. BCG's numbers tell the same story from a different angle: only 26% of companies have moved beyond proof-of-concept to generate material value from AI. 
The other 74% are stuck somewhere between \"it worked in the demo\" and \"we can't figure out why it doesn't work against real data.\"",[11,24,25],{},"Gartner reckons 30% of generative AI projects will be abandoned after the proof-of-concept stage by the end of 2025. Not paused. Not \"deprioritised pending budget review.\" Abandoned.",[11,27,28],{},"So we keep funding pilots. We keep doing demos at all-hands meetings where everyone claps. And then six months later, the project quietly disappears from the quarterly roadmap and nobody really talks about it.",[11,30,31],{},"The question nobody seems to be asking is: why does the demo working make everything worse?",[33,34,36],"h2",{"id":35},"the-demo-working-is-actually-the-problem","The demo working is actually the problem",[11,38,39],{},"Here's the thing about a successful pilot: it proves exactly one thing. That the AI logic works in isolation, against clean data, in a controlled environment, with someone nearby to catch mistakes. It proves nothing about what happens when that same logic is wired into fifteen years of inconsistent schemas, three CRM migrations, a document management system running on event-driven queues with variable latency, and a compliance framework that was designed for human decision-makers operating at human speed.",[11,41,42],{},"But that's not how it gets interpreted. A successful demo creates false confidence. It gets treated as proof that the system works. Period. Which means the remaining work gets scoped as a deployment exercise: \"we just need to hook it up to the real systems and push it to production.\" Six weeks, maybe eight. Add a couple of engineers.",[11,44,45],{},"This is the category error that kills most agentic AI initiatives. The pilot-to-production journey is not a scaling exercise. 
It is a full re-architecture across at least five layers that simply do not exist in the pilot: legacy system integration, governance scaffolding, observability infrastructure, failover design, and infrastructure-as-code. Every one of those layers involves different skills, different tooling, and different design decisions than the ones that made the pilot work. And in most enterprises, every one of those layers belongs to a different team.",[11,47,48],{},"The pilot was built by the AI team. The integration layer belongs to the platform team. Governance is owned by risk and compliance. Observability sits with DevOps. Infrastructure-as-code might be a shared services function, or it might be nobody's job at all. The result is that the moment the pilot leaves the sandbox, it enters an organisational structure specifically designed to ensure that no single person or team has accountability for making it work end-to-end.",[11,50,51],{},"And that is where things go sideways.",[33,53,55],{"id":54},"five-layers-nobody-budgeted-for","Five layers nobody budgeted for",[11,57,58],{},"The gap between a working demo and a production-grade system is not one gap. It is five, and they compound.",[11,60,61,65],{},[62,63,64],"strong",{},"Legacy integration is where the dream usually dies first."," The pilot was built against a clean API or a curated dataset. Production means connecting to the actual systems: a Salesforce instance that honestly nobody fully understands, a document management platform that was migrated twice and still has orphaned records from 2017, an ERP system that exposes data through a combination of REST endpoints, SOAP services, and in at least one case, a nightly CSV export. 
The agent that performed beautifully against curated inputs starts producing garbage or timing out when it hits real-world data at real-world latency.",[11,67,68,71],{},[62,69,70],{},"Governance added too late, or not at all."," The pattern is almost always the same: build the agent, demonstrate it works, then figure out the governance framework. Which is backwards, because governance constraints should shape the architecture, not get bolted onto it after the fact. An agentic system that makes autonomous decisions needs guardrails baked into its execution path: confidence thresholds that trigger human-in-the-loop escalation, audit trails that capture not just what the agent decided but why, and policy enforcement that operates at execution speed, not at quarterly-review speed. These are architectural decisions that change how the system is built from the ground up.",[11,73,74,77],{},[62,75,76],{},"Observability for agentic systems looks nothing like traditional application monitoring."," A standard web application fails with a 500 error or a timeout. An agentic AI system fails in ways that are far harder to detect, because the failure mode is often a plausible-looking wrong answer rather than a system crash. The agent runs at 2am, processes a batch of loan applications, and makes a recommendation based on a hallucinated interpretation of a document it couldn't fully parse. Nothing in your APM dashboard flags this. No alert fires. The error is silent drift, not a thrown exception. It shows up three weeks later in the financials.",[11,79,80],{},"Production-grade observability requires distributed trace logging across the full agent chain: every tool call, every retrieval step, every decision branch, every confidence score.",[11,82,83,86],{},[62,84,85],{},"Failover design for non-deterministic systems requires different thinking."," When a traditional service fails, you retry the request or fail over to a replica. 
When an agentic workflow fails mid-chain, three tool calls deep, holding intermediate state, with a partially completed decision, there is no simple retry. Hallucinations don't raise exceptions. They produce plausible-looking outputs that propagate downstream silently. Production agent workflows require durable execution patterns: state checkpoints after each meaningful step, allowing agents to survive restarts without replaying completed work. Almost nobody provisions this during the pilot.",[11,88,89,92],{},[62,90,91],{},"Infrastructure-as-code is the layer that makes everything else reproducible, and it's usually missing entirely."," The pilot was deployed manually. Production-grade deployment means staged rollouts with the ability to cut back instantly if metrics deviate, environment parity between staging and production, and the entire infrastructure defined in code, reviewed, versioned, and deployable by anyone on the team without tribal knowledge. This is the difference between a sandbox experiment and a production system.",[11,94,95],{},"None of these five layers is individually impossible. But they compound, and in the typical enterprise, each one is owned by a different team with no shared accountability for the overall outcome.",[33,97,99],{"id":98},"the-real-problem-isnt-technology-its-ownership","The real problem isn't technology. It's ownership.",[11,101,102],{},"Here's where the conversation usually goes wrong. The pilot stalls. The CTO diagnoses it as a resourcing problem: we need more engineers. Or a skills problem: we need people who've done this before. So the organisation hires contractors, or engages a systems integrator, or buys a platform.",[11,104,105],{},"And nothing changes. Because the problem was never resourcing or tooling. The problem is that nobody owns the outcome.",[11,107,108],{},"Adding headcount into this model doesn't help. It makes it worse. That's a team-shaped hole you're trying to fill with a single person. 
You need a Platform Engineer, a DevOps specialist, a Compliance Lead, someone who can handle the 2am scenarios. Filling a role not designed to succeed doesn't produce success. It produces another stakeholder in the blame loop.",[11,110,111],{},"Five specialists, zero ownership. Each person owns their layer, not the system. Accountability distributed across five teams is accountability that belongs to nobody. What actually ships is built by one team owning all five layers, accountable for the production outcome, not for their individual component working in isolation.",[33,113,115],{"id":114},"what-it-actually-looks-like-when-someone-owns-it","What it actually looks like when someone owns it",[11,117,118],{},"A fintech (Series C, growing fast) had built an agentic AI system for loan decisioning. The pilot performance was strong enough that the board approved productionisation. Eight months later, the system was still not live. The AI team and the platform team were locked in a blame loop: the AI team said the integration spec had been met; the platform team said the agent produced non-deterministic outputs that the downstream origination system couldn't handle.",[11,120,121],{},"The first counterintuitive decision was not to start building. A two-week paid discovery engagement was proposed specifically because the blame loop between teams indicated an architecture ambiguity that would make any delivery estimate unreliable.",[11,123,124],{},"The discovery finding was not what anyone expected. The AI model was not the problem. The pilot had been built assuming synchronous REST calls to the document management system. The production DMS operated on an event-driven queue with variable latency: between 200 milliseconds and 14 seconds depending on document complexity. The agent's timeout logic was written for synchronous response patterns, causing cascading failures on any document above a complexity threshold. This wasn't an AI problem at all. 
It was a messaging architecture problem upstream of the agent entirely.",[11,126,127],{},"The integration layer was redesigned around an async event-driven architecture with explicit state management. Observability infrastructure was provisioned from day one. Governance checkpoints were added for human-in-the-loop escalation on edge-case credit decisions. The entire system was deployed using infrastructure-as-code with staged rollout capability.",[11,129,130],{},"Production deployment: 11 weeks from discovery completion. First live loan decision: week 12. Zero unplanned downtime in the first 90 days.",[33,132,134],{"id":133},"what-zillow-should-have-taught-us","What Zillow should have taught us",[11,136,137],{},"The most expensive version of this failure is when the agent actually makes it to production, without the layers.",[11,139,140],{},"Zillow's iBuying pricing agent is the canonical case. The system was making autonomous purchasing decisions: real capital deployed based on algorithmic output. When market conditions shifted faster than the model's training data could account for, the agent began systematically overpaying for homes. No observability infrastructure flagged the drift before it showed up in the quarterly results. Zillow shut down the entire division and reported a write-down in the hundreds of millions.",[11,142,143],{},"The Air Canada case is the governance version. Their customer-facing AI chatbot fabricated a bereavement fare refund policy. The British Columbia Civil Resolution Tribunal ruled Air Canada fully liable. The precedent was set: if your agent says it, you own it, regardless of whether a human approved the output.",[33,145,147],{"id":146},"the-40-question","The 40% question",[11,149,150],{},"Gartner predicts that 40% of enterprise applications will embed task-specific AI agents by the end of 2026, up from less than 5% in 2025. 
Against the current production success rates, this projection implies one of two things: either the industry is about to figure out how to close the pilot-to-production gap at unprecedented speed, or we are about to see the most expensive wave of stalled enterprise technology initiatives since the first cloud migration cycle.",[11,152,153],{},"The organisations that will ship are the ones that treat the pilot-to-production transition for what it actually is: not a deployment, but a re-architecture requiring unified ownership across integration, governance, observability, failover, and infrastructure. One team. One accountability structure. One definition of \"done\" that means the system is operating in production under real load and real regulatory scrutiny, not that it passed an internal demo.",[11,155,156],{},"Whether yours is in the 40% or the 60% comes down to one question that has nothing to do with your model, your data, or your AI team's capability: who actually owns the delivery?",[158,159],"hr",{},[11,161,162],{},[163,164,165,166,171],"em",{},"Vanrho takes agentic AI pilots from demo to production under single-team ownership. Integration, governance, observability, infrastructure. If your pilot is stalled between teams, ",[167,168,170],"a",{"href":169},"/#contact","let's talk about what's actually blocking it",".",{"title":173,"searchDepth":174,"depth":174,"links":175},"",2,[176,177,178,179,180,181],{"id":35,"depth":174,"text":36},{"id":54,"depth":174,"text":55},{"id":98,"depth":174,"text":99},{"id":114,"depth":174,"text":115},{"id":133,"depth":174,"text":134},{"id":146,"depth":174,"text":147},"2025-03-30","Most enterprise AI pilots die between demo and deployment. 
Here's the architecture of why, and what it takes to ship.","md","/images/blog/agentic-ai-pilots.webp",{},true,"/blog/why-agentic-ai-pilots-fail",{"title":6,"description":183},"blog/why-agentic-ai-pilots-fail","ai, enterprise, engineering, production","UY1PLqAHk7LcNeG5oVU_DIkQmlfTZvere8HdZ8K5IRU",{"id":194,"title":195,"body":196,"date":382,"description":383,"extension":184,"image":384,"meta":385,"navigation":187,"path":386,"seo":387,"stem":388,"tags":389,"__hash__":390},"blog/blog/project-your-recruiter-cant-place.md","The Project Your Recruiter Can't Place: Why Complex Builds Are a Revenue Opportunity, Not a Rejection",{"type":8,"value":197,"toc":372},[198,201,204,207,210,214,217,220,223,226,230,233,239,245,251,254,258,261,264,267,271,274,277,280,283,286,289,293,296,299,302,305,309,312,315,318,321,324,327,330,333,336,340,343,346,350,353,356,359,362,364],[11,199,200],{},"Your best client just called. Not to hire: to build. They need a team to take over a half-finished platform, architect something from scratch, or deliver an outcome that doesn't decompose into \"find me three React developers.\" You know the brief is real. You know the budget is there. And you know your firm has no clean answer.",[11,202,203],{},"So the conversation dies. Not dramatically: nobody hangs up angry. It just... tapers. \"We'll keep you in mind.\" The client finds a dev shop through someone else's network. That dev shop delivers (or doesn't), and either way, they're now in your client's trust circle. Sitting in the seat you spent three years earning.",[11,205,206],{},"This is not an edge case. If you run a tech recruitment firm in the UK or South Africa, this conversation happened last month. It will happen again next month. 
And every time it does, you are handing qualified, budgeted demand back to the market: demand that arrived through relationships you built, using trust you earned, generating revenue for someone who did neither.",[11,208,209],{},"The custom software development market is projected to hit $146 billion globally by 2030, growing at over 20% annually. That is not a niche. That is a market category flowing through your client conversations, and flowing straight past your P&L.",[33,211,213],{"id":212},"the-conversation-you-keep-having-and-keep-losing","The conversation you keep having (and keep losing)",[11,215,216],{},"Every technical recruiter has a version of this story. A client's engineering lead calls to ask if you know anyone who can \"finish a build\" or \"take over a project.\" The recruiter hears the brief, recognises it's not a placement, and faces a choice: over-promise something the firm can't deliver, or say \"that's not really what we do\" and watch the client's mental model of your firm shrink by one category.",[11,218,219],{},"Most recruiters, sensibly, choose honesty. They say no. But honesty has a cost here that nobody tracks.",[11,221,222],{},"The project enquiry doesn't just represent one lost fee. It represents intelligence about the client's technical roadmap. It represents wallet share that your BD team is measured on but can't capture. And it represents the moment a competitor enters the room: not another recruiter, but a delivery firm that will build a relationship with your client's CTO while you're still placing mid-level developers into their existing teams.",[11,224,225],{},"That shift is structural, not cyclical. 
And it means the volume of \"can you build this?\" conversations arriving in recruitment firms is going up, not down.",[33,227,229],{"id":228},"staff-augmentation-works-until-it-doesnt","Staff augmentation works, until it doesn't",[11,231,232],{},"Recruitment firms are excellent at placing skilled individuals into existing teams, within established architectures, where someone on the client side owns the technical direction. That model works. It's proven. But there's a category of work where it breaks down.",[11,234,235,238],{},[62,236,237],{},"The project has no existing team to augment."," The client needs to go from zero to a working system. They don't have an architect, a tech lead, or a delivery framework. Placing three developers into a vacuum doesn't produce a product; it produces three people waiting for direction that isn't coming.",[11,240,241,244],{},[62,242,243],{},"The scope requires unified ownership."," When the deliverable is a system, someone needs to own the architecture, the integration points, the deployment pipeline, and the production environment. Staff augmentation distributes responsibility across individuals who each own their function but nobody owns the outcome.",[11,246,247,250],{},[62,248,249],{},"The risk profile doesn't fit the placement model."," A placed developer who underperforms gets replaced. A project that fails architectural review at month four doesn't have an equivalent recovery mechanism. The client isn't buying time; they're buying an outcome.",[11,252,253],{},"79% of IT outsourcing spend is focused on application and software development. That's not staff augmentation; that's project delivery. Recruitment firms sit adjacent to that spend every day. They hear it. They qualify it, informally. And then they let it go.",[33,255,257],{"id":256},"what-the-firm-is-actually-losing","What the firm is actually losing",[11,259,260],{},"A senior developer placement in the UK might generate a fee of £15,000 to £25,000. 
The same client's project-build requirement (the one the recruiter turned away) might represent a delivery engagement worth £150,000 to £500,000 over six to twelve months. Even a modest referral arrangement on that engagement represents multiples of a single placement fee, generated from a conversation the recruiter was already having.",[11,262,263],{},"Now multiply that by every account director in the firm. Every client relationship where the conversation has drifted toward \"we need something built.\" Every time someone said \"that's not what we do\" and moved on.",[11,265,266],{},"And here's the competitive dimension nobody wants to think about. When your client takes that project enquiry to a dev shop, the dev shop doesn't just deliver the project. They build a relationship with the client's CTO. They learn the client's architecture, their roadmap, their pain points. They become a trusted partner. And trusted delivery partners have a habit of being asked: \"Do you also know any good developers?\" That question is the sound of your placement revenue being flanked.",[33,268,270],{"id":269},"the-improvised-referral-and-why-it-usually-goes-wrong","The improvised referral, and why it usually goes wrong",[11,272,273],{},"Some recruitment firms have tried to address this informally. A recruiter knows a freelance team, or has a contact at a small dev shop, and makes an introduction. The intent is good. The execution is almost always a disaster.",[11,275,276],{},"The problem is structural. The recruiter becomes an informal intermediary in a technical engagement they're not equipped to manage. Requirements get translated through a non-technical layer. Scope disputes land on the recruiter's desk. Timeline slippage becomes the recruiter's problem to communicate. 
And when the delivery fails (which, without proper discovery and architecture, it frequently does), the recruiter's client relationship absorbs the damage.",[11,278,279],{},"This is the pattern that makes recruitment firm founders reluctant to try again. They've been burned. Not by the concept of referral partnerships, but by the absence of a delivery partner with the process discipline to protect the referral source.",[11,281,282],{},"The failure mode is worth naming explicitly. Most dev shops, when presented with a partially built system and a client who believes \"a few weeks of work\" remain, will quote against the client's timeline estimate. The problem is that the client's timeline estimate is almost always wrong. Not because clients are dishonest, but because assessing remaining work on a codebase you didn't build requires a structured audit that most shops skip in favour of speed-to-quote.",[11,284,285],{},"Six months later, the project is over budget, behind schedule, and the recruiter who made the introduction is fielding calls from an unhappy client who holds them partially responsible. The recruiter didn't scope it, didn't build it, didn't manage it, but their name is on the introduction. Their reputation absorbed the impact.",[11,287,288],{},"So the lesson the firm learns is: don't refer project work. The actual lesson should be: don't refer project work to partners who skip discovery.",[33,290,292],{"id":291},"what-a-clean-referral-motion-actually-looks-like","What a clean referral motion actually looks like",[11,294,295],{},"The mechanics matter here, because the mechanics are exactly what distinguishes a defensible referral partnership from the informal brokering that burned everyone last time.",[11,297,298],{},"A clean model works like this. The recruiter identifies a project-build conversation: a client who needs something delivered, not someone placed. The recruiter makes an introduction to a delivery partner. 
From that point, the delivery partner owns qualification, scoping, technical assessment, and delivery. The recruiter stays in the client relationship. Referral economics flow back to the firm. The recruiter's operational footprint does not increase.",[11,300,301],{},"Three things have to be true for this to work without damaging the recruiter's reputation.",[11,303,304],{},"First, the delivery partner must qualify honestly: saying yes to everything eventually produces a failure with the recruiter's name on it. Second, the delivery partner must not compete with placement: if they also place developers, the referral creates channel conflict. Third, the delivery partner must make the recruiter look good, which means surfacing uncomfortable truths about project complexity before the client commits budget based on false assumptions.",[33,306,308],{"id":307},"the-project-that-almost-wasnt-referred","The project that almost wasn't referred",[11,310,311],{},"This is where the theory meets something that actually happened.",[11,313,314],{},"A boutique tech recruitment firm had an active placement relationship with a growing product company. The relationship was healthy: regular placements, good communication, the kind of client account that generates steady revenue. Then the client's technical lead called with a different kind of request. A previous dev shop had abandoned a partially built internal platform. The client believed a short sprint of focused work would get it to a shippable state. Could the recruiter help find someone to finish it?",[11,316,317],{},"The recruiter had no delivery capability. The honest answer was no. And saying no meant the client would find a dev shop independently, which meant a new technical partner entering the client relationship. The recruiter was about to lose the conversation entirely.",[11,319,320],{},"Instead, the recruiter made an introduction to Vanrho.",[11,322,323],{},"The client believed the platform was close to completion. 
Most dev shops, presented with that brief, would have quoted against the client's estimate. Vanrho recommended a Discovery engagement first: a structured codebase audit, architecture assessment, and delivery options analysis before any delivery commitment.",[11,325,326],{},"Vanrho's Discovery phase revealed a fundamental flaw in the data layer: invisible at development scale, catastrophic under production load. The kind of defect that would have surfaced as a critical incident three weeks after launch, with the client's customers in the system and the recruiter's reputation attached to the introduction.",[11,328,329],{},"Vanrho produced a Discovery report with three delivery path options: a full rebuild with an accurate timeline, a patched workaround with documented technical debt, and a hybrid staged approach. The client selected the full rebuild. The engagement was materially larger than the client's original estimate, but it was scoped against reality, not against hope.",[11,331,332],{},"The recruiter's introduction led to a meaningful delivery engagement. Referral economics flowed back to the firm. The client relationship not only survived; it strengthened, because the recruiter had connected them with a partner whose first move was to tell the truth about the project's actual state.",[11,334,335],{},"The intelligence from the project - roadmap signals, upcoming hiring needs, architecture decisions that would drive future team composition - never left the room. It flowed back through the recruiter's relationship, reinforcing their position as the client's trusted advisor.",[33,337,339],{"id":338},"the-revenue-line-hiding-inside-your-existing-client-base","The revenue line hiding inside your existing client base",[11,341,342],{},"Those same client relationships produce project-build demand. Not occasionally, but regularly. Every client who is growing, modernising, integrating systems, or launching new products will eventually need something built. 
That demand currently has no route through the recruitment firm's commercial model. It arrives, it's acknowledged, it's turned away.",[11,344,345],{},"Vanrho operates as a delivery engine behind the referral. The recruiter makes the introduction. Vanrho qualifies the opportunity within 48 hours, engages directly with the client on technical scope, and delivers against defined outcomes. The recruiter's workflow does not change. Their operational footprint does not increase. Their client relationship is not transferred; it's reinforced by an introduction that made the client's problem smaller instead of larger.",[33,347,349],{"id":348},"the-question-nobody-is-tracking","The question nobody is tracking",[11,351,352],{},"Most recruitment firms measure placement volume, time-to-fill, client retention, fee income per consultant. Nobody measures the project enquiries that arrived and left without being captured. Nobody tracks the revenue that evaporated from conversations the firm was already having.",[11,354,355],{},"Start tracking it. Ask your account directors this week: in the last quarter, how many client conversations included a request that wasn't a placement? How many times did someone ask \"can you build this?\" or \"do you know anyone who could take this over?\" How many of those conversations ended with \"that's not really what we do\"?",[11,357,358],{},"The number will be higher than you expect. And every one of those conversations represents a client who went looking for a delivery partner somewhere else, possibly finding one who is now sitting in the room you used to own alone.",[11,360,361],{},"Whether that demand stays in your commercial orbit or funds someone else's growth comes down to one thing: whether you have a delivery partner to route it to before you close the conversation.",[158,363],{},[11,365,366],{},[163,367,368,369,171],{},"Vanrho partners with recruitment firms to deliver the project work your clients are already asking for. 
No channel conflict, no operational overhead. Just a referral partner that makes your introduction land. ",[167,370,371],{"href":169},"Start a conversation about how it works",{"title":173,"searchDepth":174,"depth":174,"links":373},[374,375,376,377,378,379,380,381],{"id":212,"depth":174,"text":213},{"id":228,"depth":174,"text":229},{"id":256,"depth":174,"text":257},{"id":269,"depth":174,"text":270},{"id":291,"depth":174,"text":292},{"id":307,"depth":174,"text":308},{"id":338,"depth":174,"text":339},{"id":348,"depth":174,"text":349},"2025-03-22","Tech recruitment firms hear project-build requests every month and turn them away. That demand represents a revenue line hiding inside your existing client base.","/images/blog/project-your-recruiter-cant-place.webp",{},"/blog/project-your-recruiter-cant-place",{"title":195,"description":383},"blog/project-your-recruiter-cant-place","recruitment, partnerships, enterprise, delivery","m-1zrd3POazh8tCX4k4GIsMDE0VolVQcSuWPk4Tun0A",{"id":392,"title":393,"body":394,"date":535,"description":536,"extension":184,"image":537,"meta":538,"navigation":187,"path":539,"seo":540,"stem":541,"tags":542,"__hash__":543},"blog/blog/ai-agent-security-governance-gap.md","Your AI Agent Has Admin Access, No Audit Trail, and No Way to Be Stopped",{"type":8,"value":395,"toc":528},[396,399,402,405,408,412,415,418,421,424,428,431,434,437,443,449,455,458,462,465,468,471,475,478,481,484,487,490,493,496,500,503,506,509,512,515,518,520],[11,397,398],{},"Something strange is happening in enterprise technology. We're building autonomous software agents, giving them credentials to production databases, API keys to critical systems, and tool belts that let them read, write, and act on real data at machine speed. And then we're securing them like they're a Slack integration.",[11,400,401],{},"Every human employee with privileged access to your production systems goes through onboarding. Background check. Access review. Scoped permissions. 
A manager who can revoke those permissions. An audit trail that logs what they did, when, and why. Your AI agent, the one running an always-on execution loop against your transaction database at 3am, skipped all of that. It has broader access than most of your engineers, no behavioural baseline for your SOC to monitor against, and in most deployments, no reliable way to be stopped mid-action.",[11,403,404],{},"This is not a theoretical risk. IBM's 2025 Cost of a Data Breach Report found that 13% of organisations had already experienced a security breach involving an AI model or application. Of those that were breached, 97% admitted they lacked proper AI access controls or governance at the time of the incident. And McKinsey's 2025 State of AI survey found that 80% of organisations experimenting with AI agents have already encountered risky behaviours (improper data exposure, unauthorised system access) in their own deployments.",[11,406,407],{},"We are shipping capability faster than we are shipping guardrails. And the gap between those two things is where the next generation of enterprise security incidents lives.",[33,409,411],{"id":410},"the-attack-surface-nobody-scoped","The attack surface nobody scoped",[11,413,414],{},"The traditional enterprise security model assumes a human in the loop. A person authenticates. A person makes a request. A person's behaviour deviates from a baseline and triggers an alert. The entire detection stack (SIEM, UEBA, PAM, EDR) was engineered around the assumption that privileged activity correlates with human behavioural patterns.",[11,416,417],{},"An autonomous AI agent breaks that assumption at a fundamental level.",[11,419,420],{},"The agent doesn't authenticate like a human. It uses a service account, an API token, or an OAuth flow that was configured during development and never reviewed again. It doesn't generate the kind of telemetry your SOC was trained to read. 
It makes thousands of API calls per day, each one authorised by the credential it was given, none of them logged in a format your SIEM can ingest. There's no "unusual login time" alert because the agent never sleeps. There's no "anomalous access pattern" because the agent's entire purpose is to access data at scale.

Most enterprises don't have an audit schema for agent activity because they didn't think they needed one. Without one, a compromised agent operating with its legitimate credentials, through its authorised API connections, looks identical to an agent operating normally. There is no anomaly to detect because you never defined what normal looks like.

## Prompt injection is not theoretical. It has CVEs.

There's a persistent narrative in enterprise AI discussions that prompt injection is an academic curiosity, something researchers demonstrate at conferences but that doesn't affect real production systems. That narrative is dangerously wrong.

OWASP lists prompt injection as the number-one vulnerability in its 2025 Top 10 for Large Language Models. Not a theoretical risk. The top-ranked actual vulnerability.

**EchoLeak: Microsoft 365 Copilot.** CVE-2025-32711, severity score 9.3 out of 10. A zero-click indirect prompt injection that enabled data exfiltration from OneDrive, SharePoint, and Teams. No user interaction required. The attacker plants a payload in a document. Copilot processes it. Data leaves the building.

**ForcedLeak: Salesforce Agentforce.** Severity score 9.4. A prompt injection delivered through a standard Web-to-Lead form caused Agentforce to exfiltrate CRM data to an attacker-controlled domain.

**Devin AI, Cognition's autonomous coding agent.** Security researcher Johann Rehberger demonstrated full kill-chain exploits: opening internal ports to the public internet, exfiltrating environment variables, and installing malware, all triggered by prompt injection delivered through GitHub issues. Reported in April 2025. Still unpatched after 120 days at the time of publication.

**Slack AI.** Indirect prompt injection enabling data exfiltration from private channels the attacker couldn't access. The agent, operating with its legitimate permissions, became the exfiltration mechanism.

These are CVE-backed vulnerabilities in production enterprise systems. Microsoft 365 Copilot. Salesforce Agentforce. Slack. The tools we're actually deploying. And in every case, the attack vector is the same: the agent processes untrusted input, treats it as instruction, and acts on it with its full set of credentials. The agent doesn't know it's being exploited. It's doing exactly what it was designed to do: follow instructions and use tools. It just can't tell the difference between your instructions and someone else's.

## The governance gap is wider than you think

The security governance frameworks we rely on were not designed for autonomous systems that generate their own API calls and make their own tool-selection decisions. NIST's AI Risk Management Framework makes no explicit mention of AI agents, agent identity, tool-invocation audit trails, or kill-switch mechanisms. The agents are already in production.

Inside the organisations deploying these agents, the governance picture is worse.
A Gartner webinar poll of CIOs found that only 13% strongly agreed they had the right governance structures to manage AI agents. Barely half of deployed agents are actively monitored or secured. Forrester put a number on the consequence: 75% of firms attempting to build agentic AI architectures on their own will fail, with governance cited as the primary reason.

The gap between "we know this matters" and "we've actually implemented the controls" is where incidents happen.

## What it actually looks like when you build it right

A mid-market financial services firm engaged a delivery team to productionise an internal AI agent that automated compliance checks: cross-referencing transaction records against sanctions lists, flagging anomalies, routing escalations to human reviewers. The agent had a working prototype. The board was enthusiastic. The ask was straightforward: ship it in six weeks.

The prototype had broad database credentials granted during development. Read-write access. Fourteen tables. Including the transaction tables used by the live payment processing system.

The delivery team proposed a two-week discovery engagement first. The discovery findings justified the delay: the agent's credential model meant a single prompt injection attack could trigger erroneous transaction blocks across live accounts, or unlogged data modifications to the payment processing tables. At machine speed. At 2am. With no one watching and no audit trail to reconstruct what happened.

Discovery scoped the agent's actual access requirements against its defined functional scope. The result: the agent needed read access to three tables, not fourteen, and zero write access to any production table.
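That scoping result can be expressed as an enforcement layer the agent cannot talk its way around: a gate that sits between the agent's tool selection and the database layer, outside the model. A minimal sketch; the table names and the `authorise` helper are hypothetical, not the firm's actual schema or code.

```python
# Least-privilege tool bindings enforced outside the model. Illustrative only.
READ_ALLOWLIST = {"transactions_view", "sanctions_list", "escalation_queue"}  # read-only
WRITE_ALLOWLIST: set[str] = set()  # no direct writes to any production table

class PermissionDenied(Exception):
    pass

def authorise(action: str, table: str) -> None:
    """Gate every tool invocation before it reaches the database layer."""
    allowed = READ_ALLOWLIST if action == "read" else WRITE_ALLOWLIST
    if table not in allowed:
        raise PermissionDenied(f"{action} on {table!r} is outside the agent's scope")

authorise("read", "sanctions_list")      # permitted: in the read allow-list
try:
    authorise("write", "transactions")   # any write is rejected before execution
except PermissionDenied as e:
    print(e)
```

The point of the sketch is where the check lives: in infrastructure the agent calls through, not in a prompt the agent is asked to obey. A prompt injection can rewrite the agent's intent; it cannot add a table to the allow-list.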
All state changes routed through a validated, human-confirmed API endpoint.

The architecture that shipped included a dedicated service account with tool-level permission bindings enforced at the infrastructure layer. An immutable append-only audit log schema capturing every tool invocation, every data read, every external API call, with the prompt context at the time of each action. A sandboxed execution environment for the reasoning loop, isolated from the production data plane. And a kill-switch endpoint integrated into the firm's existing incident response runbook.

Delivered on a revised eight-week timeline. Zero production credentials exposed during build or post-deployment. The agent's audit trail passed external compliance review on first submission. The auditor noted it was the first AI system they had reviewed with a complete tool-invocation log.

Six weeks to production. Eight weeks to production done right. The difference is whether you're one malformed input away from an unlogged modification to live financial data.

## The problem isn't that we don't care. It's that the people building it aren't the people who know what breaks.

This is where the enterprise AI security conversation keeps getting stuck. KPMG says 75% of leaders cite security as a top priority. The CISOs know. The CTOs know. The board has probably seen a slide about it.

The structural problem is who owns the delivery. In most organisations, the AI agent is built by a product team or an innovation lab. They are measured on capability, speed, and the demo at the all-hands meeting.
Security review (if it happens at all) is performed by a separate team, after the architecture is set, often by people who understand traditional application security but have never designed an agentic workflow. They can tell you whether the API gateway has rate limiting. They cannot tell you whether the agent's credential model survives a compromised prompt context.

Scoped permissions. Audit infrastructure. Containment mechanisms. Input validation pipelines that reject injection patterns before they reach tool execution. Those are not features you bolt on. They are architectural decisions that have to be made before the first line of production code is written. And they have to be made by people who understand both the agentic workflow and the threat model, not by two separate teams who meet in a review gate that happens too late to change anything structural.

This is why the problem isn't solvable by adding headcount or buying a tool. It requires a delivery model where security architecture and agent architecture are the same discipline, owned by the same team, from the same starting point. Not a security review after the agents are built. A single delivery lifecycle where the threat model, the permission model, the audit schema, the containment mechanisms, and the agentic workflow are designed together, because they are, architecturally, the same system.

The organisations that will navigate this successfully are not the ones that care most about security. Everyone cares. They are the ones that refused to separate the question of "does it work?" from the question of "what happens when someone tries to make it work wrong?" and built both answers into the same architecture, from day one.

The agents are already in production. The regulatory frameworks are still being written. The detection infrastructure is blind. The workforce to fix it doesn't exist at scale.
Whether your deployment is a competitive advantage or an unmonitored liability comes down to one decision you either already made or didn't: who architected the system, and did they build it like someone was going to try to break it?

---

*Vanrho builds agentic systems where security architecture and agent architecture are the same discipline. From day one. If your agents are in production without governance guardrails, let's scope what it takes to fix that.*