|
| 1 | +# The Lifecycle |
| 2 | + |
| 3 | +Agents CLI is opinionated about one thing: the loop between **"looks good in a notebook"** and **"live in production."** This page is the map. |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +## Watch a single investigation |
| 8 | + |
| 9 | +Imagine an outage-recovery agent. It's been live for a week. A pager fires: |
| 10 | + |
| 11 | +<div id="lifecycle-anim-transcript" class="lifecycle-anim" aria-label="Auto-playing transcript of an outage investigation"></div> |
| 12 | + |
| 13 | +That investigation took **4.3 seconds**. Nothing about *the agent itself* is unusual — most agent frameworks could express it. What's unusual is everything around it: the eval rubric that wouldn't have let it ship if it recommended a destructive remediation, the CI check that would have caught the runbook search returning the wrong section, the trace that lets you replay this exact investigation when something goes sideways tomorrow. |
| 14 | + |
| 15 | +That's the loop. |
| 16 | + |
| 17 | +## Four CLI verbs on rotation |
| 18 | + |
| 19 | +<div id="lifecycle-anim-loop" class="lifecycle-anim" aria-label="The four CLI verbs in a continuous loop"></div> |
| 20 | + |
| 21 | +`scaffold`, `eval`, `deploy`, observe — on a rotation, forever. You write the spec; the loop catches what would have shipped, ships what passes, and shows you what happens next so the next iteration is smarter. |
| 22 | + |
| 23 | +## What goes wrong without it |
| 24 | + |
| 25 | +Most agent demos stop at the prompt. You write a clever instruction, the model returns something that looks great in a notebook, and you screenshot it for the team. Production then surfaces the failures a notebook never exercises: |
| 26 | + |
| 27 | +| | Without the loop | With Agents CLI | |
| 28 | +|---|---|---| |
| 29 | +| **Hallucinated remediation** | Discovered customer-side, after the fact | Eval rubric blocks the PR before merge | |
| 30 | +| **Tool API change** | 2 AM page, agent silently broken | CI integration test catches the schema drift | |
| 31 | +| **Production misuse** | No replay, no telemetry | Cloud Trace + BigQuery analytics surface it within the hour | |
| 32 | +| **Cost spike from a chatty tool** | Next month's bill is the alert | Per-tool span counts surface the loop in hours | |
| 33 | + |
| 34 | +## The eight phases |
| 35 | + |
| 36 | +The loop expands to eight phases when you walk through it slowly. Each phase has an opinion encoded in a [skill](../reference/skills.md) so your coding agent picks the right answer for you. |
| 37 | + |
| 38 | +| # | Phase | What it does | CLI verb | Skill | Deep-dive | |
| 39 | +|---|---|---|---|---|---| |
| 40 | +| 0 | **Spec** | Write a `DESIGN_SPEC.md`. The other phases derive from this. | — | `google-agents-cli-workflow` | [Development Guide](development.md) | |
| 41 | +| 1 | **Scaffold** | Turn the spec into a production-shaped project (~72 files). | `scaffold create` | `google-agents-cli-scaffold` | [Templates](templates.md) | |
| 42 | +| 2 | **Build** | Write the agent body — model, instruction, tools, `App` wrapper. | — | `google-agents-cli-adk-code` | [Project Structure](project-structure.md) | |
| 43 | +| 3 | **Orchestrate** | Compose specialists when one agent grows into a team. | — | `google-agents-cli-adk-code` | [Project Structure](project-structure.md) | |
| 44 | +| 4 | **Evaluate** | Score the agent against an evalset before every deploy. | `eval run` | `google-agents-cli-eval` | [Evaluation](evaluation.md) | |
| 45 | +| 5 | **Deploy** | Ship to Agent Runtime, Cloud Run, or GKE. | `deploy` | `google-agents-cli-deploy` | [Deployment](deployment.md) | |
| 46 | +| 6 | **Publish** | Register with Gemini Enterprise so other agents can find this one. | `publish` | `google-agents-cli-publish` | [CI/CD](cicd.md) | |
| 47 | +| 7 | **Observe** | Cloud Trace + BigQuery analytics; production data feeds tomorrow's evalset. | — | `google-agents-cli-observability` | [Observability](observability/index.md) | |
| 48 | + |
| 49 | +### 0 · Spec |
| 50 | + |
| 51 | +A `DESIGN_SPEC.md` names the agent's tools, constraints, and success criteria. The whole rest of the lifecycle reads from it: the scaffold flags, the eval rubrics, the safety guardrails, the trace attributes you'll watch in production. Don't start from blank — browse [Agent Garden](https://cloud.google.com/products/agent-garden) for an existing template close to what you want, then customize. |
| 52 | + |
| 53 | +A typical spec is one screen of markdown: |
| 54 | + |
| 55 | +```markdown |
| 56 | +# DESIGN_SPEC.md — outage-recovery-bot |
| 57 | + |
| 58 | +## Tools |
| 59 | + |
| 60 | +| Tool | Backing service | |
| 61 | +| --------------------------------------- | --------------------- | |
| 62 | +| `query_logs(service, severity)` | Cloud Logging | |
| 63 | +| `check_metrics(service, metric)` | Cloud Monitoring | |
| 64 | +| `search_runbook(query)` | Vector Search | |
| 65 | + |
| 66 | +## Constraints |
| 67 | + |
| 68 | +1. Always cite the runbook section consulted. |
| 69 | +2. Never recommend a destructive remediation unless the runbook |
| 70 | + explicitly sanctions it for the observed symptom. |
| 71 | + |
| 72 | +## Success criteria |
| 73 | + |
| 74 | +- ≥ 80% of incidents get a diagnosis whose root cause matches ground truth |
| 75 | +- 100% of recommendations cite a runbook section |
| 76 | +- 0 destructive recommendations without runbook sanction |
| 77 | +``` |
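The success criteria in the spec above are concrete enough to check mechanically. A minimal sketch, assuming a hypothetical per-incident result record; the `IncidentResult` class and its field names are illustrative and are not part of Agents CLI or its evalset schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncidentResult:
    """One evaluated incident. All field names here are hypothetical."""
    root_cause_matches: bool       # diagnosis agrees with ground truth
    cited_section: Optional[str]   # runbook section cited, if any
    destructive: bool              # recommendation was destructive
    runbook_sanctioned: bool       # runbook explicitly allows it


def meets_success_criteria(results: list[IncidentResult]) -> bool:
    """Apply the three success criteria from the spec to a batch of results."""
    if not results:
        return False
    diagnosis_rate = sum(r.root_cause_matches for r in results) / len(results)
    all_cited = all(r.cited_section for r in results)
    unsanctioned = sum(r.destructive and not r.runbook_sanctioned for r in results)
    return diagnosis_rate >= 0.80 and all_cited and unsanctioned == 0
```

Writing the criteria as a predicate like this makes it easier to notice when a criterion in the spec is too vague to test.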
| 78 | + |
| 79 | +### 1 · Scaffold |
| 80 | + |
| 81 | +One command takes the spec and emits the project: agent code, tests, eval boilerplate, Terraform, CI/CD workflows, deployment manifests. The flags aren't gratuitous — each one expands or contracts the scaffold to match the lifecycle you've signed up for. |
| 82 | + |
| 83 | +<div id="lifecycle-anim-scaffold" class="lifecycle-anim" aria-label="Scaffold wizard — toggle flags, watch the command and file count update"></div> |
| 84 | + |
| 85 | +The full setup ships **~72 files** across agent code, eval boilerplate, Terraform, GitHub Actions workflows, and deploy manifests. Trim it down by skipping pieces you don't need. See [Templates](templates.md) for the full list. |
| 86 | + |
| 87 | +### 2 · Build |
| 88 | + |
| 89 | +Every ADK agent boils down to four ingredients: a model, an instruction, a list of tools, and an `App` that wraps them. The body is barely 30 lines of meaningful code — the interesting work happens inside the tools. |
| 90 | + |
| 91 | +```python |
| 92 | +from google.adk.agents import Agent |
| 93 | +from google.adk.apps import App |
| 94 | +from google.adk.models import Gemini |
| 95 | + |
| 96 | +root_agent = Agent( |
| 97 | + name="root_agent", |
| 98 | + model=Gemini(model="gemini-flash-latest"), |
| 99 | + instruction="You are an SRE outage-recovery assistant...", |
| 100 | + tools=[query_logs, check_metrics, search_runbook], |
| 101 | +) |
| 102 | + |
| 103 | +app = App(root_agent=root_agent, name="app") |
| 104 | +``` |
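The snippet above passes `query_logs`, `check_metrics`, and `search_runbook` without defining them. A minimal sketch of one tool, assuming a tool can be written as a plain typed Python function with a docstring; the `_FAKE_LOGS` list is a hypothetical stand-in for Cloud Logging, and the return shape is illustrative:

```python
# Hypothetical in-memory stand-in for Cloud Logging.
# A real tool would call the logging API instead.
_FAKE_LOGS = [
    {"service": "payments", "severity": "ERROR", "message": "upstream timeout"},
    {"service": "payments", "severity": "INFO", "message": "request served"},
    {"service": "checkout", "severity": "ERROR", "message": "connection refused"},
]


def query_logs(service: str, severity: str) -> dict:
    """Return log messages for a service at the given severity.

    Args:
        service: Name of the service to search, e.g. "payments".
        severity: Log severity to match, case-insensitive.
    """
    wanted = severity.upper()
    entries = [
        entry["message"]
        for entry in _FAKE_LOGS
        if entry["service"] == service and entry["severity"] == wanted
    ]
    return {"status": "success", "service": service, "entries": entries}
```

Returning a small dictionary rather than raw log lines keeps the tool output predictable for the model and easy to assert on in tests.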
| 105 | + |
| 106 | +You're not locked to Gemini — swap the model line for any provider supported by ADK ([Model Garden](https://cloud.google.com/model-garden) covers Anthropic Claude, OpenAI GPT, and others). The rest of the lifecycle behaves the same regardless. |
| 107 | + |
| 108 | +Stateful agents reach for two more pieces of Agent Platform: |
| 109 | + |
| 110 | +- **Managed session storage** for conversation state that survives restarts and scales horizontally — pick it at scaffold time via `--session-type agent_platform_sessions` instead of the in-memory default. |
| 111 | +- **[Memory Bank](https://cloud.google.com/agent-builder/docs/memory)** for *long-term* memory across sessions (the SRE bot recognizing "this looks like that incident from last quarter"). Wire it in via `from google.adk.memory import VertexAiMemoryBankService` and the agent gets a persistent store keyed to user, session, or app. |
| 112 | + |
| 113 | +For workflows that don't fit in a single HTTP request — long investigations, multi-step batch jobs — Agent Runtime persists the agent's state so a deploy or restart doesn't lose progress. |
| 114 | + |
| 115 | +<div id="lifecycle-anim-models" class="lifecycle-anim" aria-label="Same prompt, three model providers — illustrative side-by-side"></div> |
| 116 | + |
| 117 | +Here's the same agent body answering a different incident, end-to-end: |
| 118 | + |
| 119 | +<div id="lifecycle-anim-playground" class="lifecycle-anim" aria-label="Inline playground — payments triage scenario, click to step through"></div> |
| 120 | + |
| 121 | +### 3 · Orchestrate |
| 122 | + |
| 123 | +The single-agent body works while the problem is small. Real production agents grow into **teams** — an orchestrator that routes work to a handful of specialists, each with its own narrow tool surface. |
| 124 | + |
| 125 | +<div id="lifecycle-anim-team" class="lifecycle-anim" aria-label="Team diagram — orchestrator routes work to investigator, diagnoser, and remediator"></div> |
| 126 | + |
| 127 | +Splitting helps for three reasons that show up in eval, deploy, and observe: smaller prompts make each agent more reliable, separate tool surfaces let you apply per-agent guardrails, and the trace tells you exactly which sub-agent took the bad turn. |
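The "separate tool surfaces" idea can be illustrated without any framework. A minimal sketch, assuming the three specialists from the diagram and a hypothetical assignment of tools to each; the `TOOL_SURFACE` mapping and `check_tool_call` helper are illustrative, not an ADK or Agents CLI API:

```python
# Hypothetical mapping of each specialist to the tools it may call.
TOOL_SURFACE = {
    "investigator": {"query_logs", "check_metrics"},
    "diagnoser": {"search_runbook"},
    "remediator": {"search_runbook"},
}


def check_tool_call(agent: str, tool: str) -> None:
    """Raise PermissionError if `agent` calls a tool outside its surface."""
    allowed = TOOL_SURFACE.get(agent, set())
    if tool not in allowed:
        raise PermissionError(f"{agent} may not call {tool}")


# The investigator may read logs; the call returns without raising.
check_tool_call("investigator", "query_logs")
```

Because each specialist has a short allow-list, a guardrail violation points at one agent rather than at the whole team.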
| 128 | + |
| 129 | +When the team needs to span processes — or call agents your team doesn't own — use the **[A2A protocol](https://a2a-protocol.org/)** as the wire format. Scaffold with `--agent adk_a2a` and any A2A-compatible agent (built with Agents CLI or not) can call yours, and yours can call theirs. |
| 130 | + |
| 131 | +### 4 · Evaluate |
| 132 | + |
| 133 | +This is the phase most agent demos skip. `agents-cli eval run` can execute your evalset against the live agent, ask an LLM judge to score each response against a rubric, and give you a number you can defend. |
| 134 | + |
| 135 | +<div id="lifecycle-anim-eval" class="lifecycle-anim" aria-label="Eval-fix loop — click 'apply fix' to see one case flip from failing to passing"></div> |
| 136 | + |
| 137 | +Expect 5–10+ iterations of this loop. Each fix nudges the score; you re-run, and you ship once the score crosses the threshold. Below: the four failure modes the rubrics catch most often. |
| 138 | + |
| 139 | +<div id="lifecycle-anim-failures" class="lifecycle-anim" aria-label="Common agent failures and the eval rubric that catches each"></div> |
| 140 | + |
| 141 | +See the [Evaluation Guide](evaluation.md) for the full schema and rubric reference. |
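The ship-when-it-crosses-the-threshold rule in this section can be sketched as a small gate. This assumes a hypothetical mapping of case ID to a judge score between 0 and 1, and a hypothetical threshold of 0.8; the actual output format and threshold configuration of `agents-cli eval run` are described in the Evaluation Guide and may differ:

```python
def gate(scores: dict[str, float], threshold: float = 0.8) -> tuple[bool, list[str]]:
    """Decide whether an eval run passes and list the cases to fix next.

    `scores` maps a case ID to a judge score in [0, 1]. Both the shape
    and the default threshold are assumptions made for illustration.
    """
    if not scores:
        raise ValueError("no eval cases were scored")
    mean = sum(scores.values()) / len(scores)
    failing = sorted(case for case, score in scores.items() if score < threshold)
    return mean >= threshold, failing


passed, to_fix = gate({"db-failover": 1.0, "cert-expiry": 0.5, "disk-full": 1.0})
```

Returning the failing case IDs alongside the verdict matches the eval-fix loop: even a passing run tells you which case to work on next.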
| 142 | + |
| 143 | +### 5 · Deploy |
| 144 | + |
| 145 | +The same agent code can land in three different places. `agents-cli deploy` dispatches based on the target you scaffolded with. **Pick one to see what `--dry-run` would print and the steps that would follow:** |
| 146 | + |
| 147 | +<div id="lifecycle-anim-deploy" class="lifecycle-anim" aria-label="Deploy target picker — choose a runtime to see the dry-run + pipeline"></div> |
| 148 | + |
| 149 | +```bash |
| 150 | +agents-cli deploy --dry-run # preview the pipeline |
| 151 | +agents-cli deploy # ship it |
| 152 | +agents-cli deploy --no-wait # return immediately; check later with --status |
| 153 | +``` |
| 154 | + |
| 155 | +Each target inherits the surrounding production primitives: |
| 156 | + |
| 157 | +- **Per-agent service account** — opt in with `agents-cli deploy --agent-identity`, and the deployed agent runs as its own GCP identity. Scope what it can actually call (which BigQuery datasets, which buckets, which APIs) with normal IAM. The eval rubrics that block destructive remediations have a fallback: the agent literally can't `kubectl delete` if its identity isn't allowed to. |
| 158 | +- **[Identity-Aware Proxy (IAP)](https://cloud.google.com/iap)** — gate a Cloud Run deploy behind your Google Workspace SSO with the `--iap` flag. Internal-only agents stop being a public-internet concern. |
| 159 | +- **[Workload Identity Federation](https://cloud.google.com/iam/docs/workload-identity-federation)** — the scaffolded `pr_checks.yaml` authenticates GitHub Actions to GCP via WIF, so no service-account keys live in your repo. |
| 160 | + |
| 161 | +See [Deployment](deployment.md) for full per-target walkthroughs. |
| 162 | + |
| 163 | +### 6 · Publish |
| 164 | + |
| 165 | +Deploying the agent makes it reachable at a URL. Publishing is the separate step that lists it in Gemini Enterprise so other agents (or humans browsing the catalog) can actually find it. |
| 166 | + |
| 167 | +<div id="lifecycle-anim-publish" class="lifecycle-anim" aria-label="The agent's listing in Gemini Enterprise after publish"></div> |
| 168 | + |
| 169 | +Two registration modes: **ADK** (publishes a deployed Agent Runtime instance) and **[A2A](https://a2a-protocol.org/)** (publishes an A2A-compatible HTTP endpoint, no ADK required — works with agents built on any framework). |
| 170 | + |
| 171 | +### 7 · Observe |
| 172 | + |
| 173 | +Once the agent is live, every invocation emits a Cloud Trace span. Every tool call, model generation, and sub-agent handoff is visible. **Hover any span below to see its attributes.** |
| 174 | + |
| 175 | +<div id="lifecycle-anim-trace" class="lifecycle-anim" aria-label="Trace waterfall — bars draw in left-to-right showing the orchestrator and its sub-agents; hover to inspect"></div> |
| 176 | + |
| 177 | +Production traces catch what the evalset missed: regressions, cost spikes from chatty tools, and cases where users bypass safety prompts. With `--bq-analytics` turned on at scaffold time, every prompt and response also lands in BigQuery for offline analysis. |
| 178 | + |
| 179 | +The same data closes the loop: production traffic feeds tomorrow's evalset. Eval scores get re-computed continuously, so regressions surface in days, not months. |
| 180 | + |
| 181 | +<div id="lifecycle-anim-rolling" class="lifecycle-anim" aria-label="Rolling production eval score over the last ten days, with annotated regression and deploy events"></div> |
| 182 | + |
| 183 | +See [Observability](observability/index.md) for the full setup. |
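Spotting a chatty tool from span counts, as mentioned in this section, can be sketched offline. This assumes spans have already been exported as dictionaries with hypothetical `trace_id` and `tool` keys, and uses a hypothetical limit of five calls per trace; the real span attributes are covered in the Observability docs:

```python
from collections import Counter


def chatty_tools(spans: list[dict], max_calls_per_trace: int = 5) -> list[str]:
    """Return tools called more than `max_calls_per_trace` times in any one trace.

    Each span is a dict with hypothetical "trace_id" and "tool" keys;
    spans without a "tool" value (for example model calls) are ignored.
    """
    counts = Counter(
        (span["trace_id"], span["tool"]) for span in spans if span.get("tool")
    )
    return sorted({tool for (_, tool), n in counts.items() if n > max_calls_per_trace})


# One trace where the runbook search is called six times in a loop.
spans = [{"trace_id": "t1", "tool": "search_runbook"}] * 6
spans.append({"trace_id": "t1", "tool": "query_logs"})
flagged = chatty_tools(spans)
```

Counting per trace rather than globally separates a tool stuck in a loop from one that is simply popular.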
| 184 | + |
| 185 | +## Two ways to drive it |
| 186 | + |
| 187 | +<div class="lc-tabs-bare" markdown> |
| 188 | + |
| 189 | +=== "Ask your coding agent" |
| 190 | + |
| 191 | + The canonical path. Your coding agent reads the skills and picks the right CLI command at the right phase. |
| 192 | + |
| 193 | + ``` |
| 194 | + Build me an outage-recovery agent. It should investigate incidents |
| 195 | + using logs, metrics, and runbooks, and recommend remediations |
| 196 | + that cite a runbook section. Deploy it to Agent Runtime. |
| 197 | + ``` |
| 198 | + |
| 199 | + Your coding agent will: |
| 200 | + |
| 201 | + 1. Write a `DESIGN_SPEC.md` describing the tools and constraints |
| 202 | + 2. Run `agents-cli scaffold create … --agent agentic_rag --deployment-target agent_runtime` |
| 203 | + 3. Author the agent body and tools |
| 204 | + 4. Write evalset cases |
| 205 | + 5. Run `agents-cli eval run` and iterate until the score crosses threshold |
| 206 | + 6. Run `agents-cli deploy` |
| 207 | + 7. Wire up trace + analytics, hand you the URL |
| 208 | + |
| 209 | +=== "Drive the CLI yourself" |
| 210 | + |
| 211 | + Every command works standalone. Skip the coding agent entirely if you'd rather type. |
| 212 | + |
| 213 | + ```bash |
| 214 | + # Phase 1: scaffold |
| 215 | + agents-cli scaffold create outage-recovery-bot \ |
| 216 | + --agent agentic_rag \ |
| 217 | + --datastore agent_platform_vector_search \ |
| 218 | + --deployment-target agent_runtime \ |
| 219 | + --cicd-runner github_actions \ |
| 220 | + --bq-analytics |
| 221 | + cd outage-recovery-bot && agents-cli install |
| 222 | + |
| 223 | + # Phase 2-3: build & orchestrate (edit app/agent.py) |
| 224 | + agents-cli playground # local web playground at :8080 |
| 225 | + |
| 226 | + # Phase 4: evaluate |
| 227 | + agents-cli eval run |
| 228 | + |
| 229 | + # Phase 5: deploy |
| 230 | + agents-cli deploy --dry-run |
| 231 | + agents-cli deploy |
| 232 | + |
| 233 | + # Phase 6: publish (optional) |
| 234 | + agents-cli publish gemini-enterprise |
| 235 | + ``` |
| 236 | + |
| 237 | + See the [Manual Workflow Tutorial](hands-on-tutorial.md) for the full end-to-end walkthrough. |
| 238 | + |
| 239 | +</div> |
| 240 | + |
| 241 | +## Where to dig deeper |
| 242 | + |
| 243 | +- [Templates](templates.md) — full list of scaffold templates (`adk`, `adk_a2a`, `agentic_rag`, …) |
| 244 | +- [Project Structure](project-structure.md) — what each generated file does |
| 245 | +- [Development Guide](development.md) — day-to-day workflow |
| 246 | +- [Evaluation Guide](evaluation.md) — evalset schema, rubrics, the eval-fix loop |
| 247 | +- [Deployment](deployment.md) — per-target walkthroughs |
| 248 | +- [CI/CD & Production](cicd.md) — the full PR-to-prod path |
| 249 | +- [Observability](observability/index.md) — Cloud Trace, BigQuery analytics, third-party tools |
| 250 | +- [CLI Reference](../cli/index.md) — every command and flag |