Managed AgentOps
After the Operating Implementation deploys an agent into your environment, Managed AgentOps keeps it safe, useful, and aligned as workflows, systems, policies, portfolios, and risk posture change. Continuous evals, source freshness, change control, incident response, expansion review, and retirement are part of the contract.
Operating record
An agent is not production-ready because a demo works. It is production-ready when the release can be reproduced, monitored, audited, paused, rolled back, and improved. The operating record is the evidence.
01
Every release is gated by the eval suite the agent shipped with, plus regression tests for any model, prompt, tool, or retrieval change. No release ships if regression fails.
02
Every governed source has an expected freshness rule. When a source goes stale, the agent changes behavior: it refuses, qualifies its answer, or escalates. Operators see freshness on every answer.
03
Continuous trace review across user requests, retrievals, tool calls, model calls, permission decisions, guardrail decisions, and approvals. Anomalies surface to the operating record.
04
Scheduled review of which actors and agents have access to which sources, fields, documents, and tools. Scoped credentials are rotated and revoked under documented SLAs.
05
Prompt, model, tool, retrieval, source, definition, and permission changes flow through a review with risk delta, eval impact, and approval status. Material changes require sponsor signoff.
06
Tracks adoption, user correction rate, override rate, override reason, latency, cost, and the value indicators agreed in the diagnostic value case. Reviewed at the operating cadence.
07
A runbook for security, privacy, quality, operational, and AI incidents. It covers severity classification, escalation path, containment, notification, remediation, and post-incident review.
08
Documented mechanism to limit, suspend, roll back, retire, or remove an agent capability or source connection when evidence, policy, or risk changes.
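The source freshness rule in item 02 above can be sketched in a few lines. This is a minimal illustration, not the service's implementation: the names `FreshnessRule`, `FreshnessStatus`, and `check_freshness`, the source identifier, and the idea that each source is configured with exactly one stale behavior are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class StaleBehavior(Enum):
    """The three behaviors named in the text for a stale source."""
    REFUSE = "refuse"
    QUALIFY = "qualify"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class FreshnessRule:
    source: str              # hypothetical source identifier
    max_age: timedelta       # expected freshness window (assumed per source)
    on_stale: StaleBehavior  # assumed: one configured behavior per source


@dataclass(frozen=True)
class FreshnessStatus:
    source: str
    age: timedelta
    is_stale: bool
    behavior: Optional[StaleBehavior]  # None while the source is fresh


def check_freshness(rule: FreshnessRule,
                    last_refreshed: datetime,
                    now: datetime) -> FreshnessStatus:
    """Compare the source's age with its rule and pick the agent behavior."""
    age = now - last_refreshed
    is_stale = age > rule.max_age
    behavior = rule.on_stale if is_stale else None
    return FreshnessStatus(rule.source, age, is_stale, behavior)


# Usage: a source expected to refresh weekly, last refreshed nine days ago.
rule = FreshnessRule("policy_docs", timedelta(days=7), StaleBehavior.QUALIFY)
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
status = check_freshness(rule, datetime(2024, 1, 1, tzinfo=timezone.utc), now)
```

Returning the age alongside the behavior is one way to let an operator-facing view show freshness on every answer, as the text requires.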
Lifecycle variants
Not every workflow gets a deployed agent. Some need remediation first; some should not be automated. Each path has its own lifecycle contract under the same operating discipline.
01
For agents that pass production readiness and operate at rungs 1 through 6 of the safety ladder. Covers eval regression, trace review, source freshness, change control, value monitoring, incident response, expansion review, and retirement.
Cadence
Continuous + monthly + quarterly review
02
For workflows that need source-trust remediation, ownership clarification, control hardening, or policy resolution before agent deployment. The lifecycle tracks remediation milestones, not agent operation.
Cadence
Until readiness gate passes
03
For workflows where the right answer was do-not-automate. The cadence revisits the decision as systems, regulation, source quality, sponsor commitment, or operating stability changes.
Cadence
Annual or trigger-based
Agent Safety Ladder
Every agent under Managed AgentOps operates at a documented rung. Climbing rungs requires evidence, eval thresholds, control verification, and a separate approval. Rungs 7 and 8 are reserved for capabilities that have proven themselves at lower rungs first.
01
Retrieves and presents governed information. No drafts, no actions.
02
Summarizes governed sources with citations. No new claims.
03
Categorizes intake and routes to owners. Audited.
04
Recommends with cited reasoning. Human acts.
05
Drafts replies, summaries, packets. Human reviews and sends.
06
Reads from approved APIs and document sets. Tool contracts enforced.
07
Triggers actions only after explicit human approval. Audit trail required.
08
Operates within tightly scoped guardrails after operating evidence supports it.
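The ladder above can be modeled as an ordered enum with two checks: whether a promotion request is complete, and whether an agent at a given rung may trigger an action. The sketch below is illustrative only. The rung names are hypothetical labels derived from the descriptions (the page does not name them here), and `PromotionRequest`, `promotion_blockers`, and `may_trigger_action` are assumed names.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    """Rungs 1-8; the names are hypothetical labels for the descriptions above."""
    RETRIEVE = 1
    SUMMARIZE = 2
    CLASSIFY_AND_ROUTE = 3
    RECOMMEND = 4
    DRAFT = 5
    READ_TOOLS = 6
    ACT_WITH_APPROVAL = 7
    BOUNDED_AUTONOMY = 8


@dataclass(frozen=True)
class PromotionRequest:
    current: Rung
    target: Rung
    has_evidence: bool
    evals_meet_threshold: bool
    controls_verified: bool
    approved: bool  # the separate approval the text requires


def promotion_blockers(req: PromotionRequest) -> list[str]:
    """List what is still missing before the agent may climb to the target rung."""
    blockers = []
    if req.target <= req.current:
        blockers.append("target rung must be above current rung")
    if not req.has_evidence:
        blockers.append("evidence")
    if not req.evals_meet_threshold:
        blockers.append("eval thresholds")
    if not req.controls_verified:
        blockers.append("control verification")
    if not req.approved:
        blockers.append("separate approval")
    return blockers


def may_trigger_action(rung: Rung, human_approved: bool, within_guardrails: bool) -> bool:
    """Rungs 1-6 never act; rung 7 needs explicit approval; rung 8 needs guardrails."""
    if rung < Rung.ACT_WITH_APPROVAL:
        return False
    if rung == Rung.ACT_WITH_APPROVAL:
        return human_approved
    return within_guardrails
```

Using `IntEnum` keeps the rungs comparable, so "below rung 7" is a plain comparison rather than a lookup table.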
Change control
Agents are software. Software ships through release management. Every prompt, model, tool, retrieval, source, definition, or permission change has an impact on behavior; change control documents that impact and requires the right approval.
| Change type | Required review |
|---|---|
| Prompt change | Eval regression. Risk delta. Sponsor signoff if user-facing. |
| Model swap | Full eval. Cost / latency review. Vendor and security review. |
| Tool added | Tool contract. Permission scope. Eval cases. Pilot before production. |
| Retrieval change | Document corpus diff. Citation accuracy. Permission re-check. |
| Source change | Lineage update. Truth profile re-evaluation. Mapping diff. |
| Definition change | Owner approval. Downstream metric audit. Refresh of cached views. |
| Permission change | Access review. Audit trail. SLA on revocation. |
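The table above maps naturally onto a lookup that a release pipeline could consult. The following is a sketch under stated assumptions: the review names are copied from the table, while the dictionary keys, the `user_facing` flag, and the functions `required_reviews` and `missing_reviews` are hypothetical names introduced for illustration.

```python
from typing import Iterable

# Review names taken from the table; the short keys are assumptions.
REQUIRED_REVIEWS: dict[str, list[str]] = {
    "prompt": ["Eval regression", "Risk delta"],
    "model": ["Full eval", "Cost / latency review", "Vendor and security review"],
    "tool": ["Tool contract", "Permission scope", "Eval cases", "Pilot before production"],
    "retrieval": ["Document corpus diff", "Citation accuracy", "Permission re-check"],
    "source": ["Lineage update", "Truth profile re-evaluation", "Mapping diff"],
    "definition": ["Owner approval", "Downstream metric audit", "Refresh of cached views"],
    "permission": ["Access review", "Audit trail", "SLA on revocation"],
}


def required_reviews(change_types: Iterable[str], user_facing: bool = False) -> list[str]:
    """Collect the reviews a change set needs, in table order, without duplicates."""
    reviews: list[str] = []
    for change_type in change_types:
        for review in REQUIRED_REVIEWS[change_type]:
            if review not in reviews:
                reviews.append(review)
        # The table requires sponsor signoff for prompt changes only if user-facing.
        if change_type == "prompt" and user_facing and "Sponsor signoff" not in reviews:
            reviews.append("Sponsor signoff")
    return reviews


def missing_reviews(change_types: Iterable[str],
                    completed: set[str],
                    user_facing: bool = False) -> list[str]:
    """Return the reviews still outstanding; an empty list means nothing blocks."""
    return [r for r in required_reviews(change_types, user_facing) if r not in completed]
```

An unknown change type raises `KeyError` here, which is a deliberate choice for the sketch: a change that does not fit any row of the table should stop the release rather than pass silently.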
Expansion or retirement
A working agent is not entitled to keep running. A non-working agent is not entitled to be replaced. Each cycle ends with an explicit decision to expand, redesign, retire, or hold the agent in place.
Decision: Expand
Add scope, add a workflow, add a rung. Requires evidence and approval.
Decision: Redesign
Rebuild the agent under different constraints, model, or topology.
Decision: Hold
Keep operating as-is. Defer expansion. Maintain the operating record.
Decision: Retire
Remove the agent. Document why. Archive the operating record. Plan the human-only fallback.
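The four end-of-cycle decisions carry different obligations, which a decision record can check before it is closed. This is a rough sketch, assuming a record shape that the page does not specify: `Decision`, `CycleDecision`, `open_items`, and every field name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXPAND = "expand"
    REDESIGN = "redesign"
    HOLD = "hold"
    RETIRE = "retire"


@dataclass(frozen=True)
class CycleDecision:
    decision: Decision
    rationale: str = ""            # "Document why" for a retirement
    has_evidence: bool = False     # expansion requires evidence
    approved: bool = False         # expansion requires approval
    record_archived: bool = False  # retirement archives the operating record
    fallback_plan: str = ""        # retirement plans the human-only fallback


def open_items(d: CycleDecision) -> list[str]:
    """List the obligations from the text that this decision has not yet met."""
    items: list[str] = []
    if d.decision is Decision.EXPAND:
        if not d.has_evidence:
            items.append("evidence")
        if not d.approved:
            items.append("approval")
    elif d.decision is Decision.RETIRE:
        if not d.rationale.strip():
            items.append("documented reason")
        if not d.record_archived:
            items.append("archived operating record")
        if not d.fallback_plan.strip():
            items.append("human-only fallback plan")
    # Redesign and hold carry no extra preconditions in the text above.
    return items
```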
Next step
We build the way we plan to operate. Talk to us about the operating contract before the agent ships, not after. The lifecycle object is part of the diagnostic decision packet, not an afterthought.