
Commit c8a2341

committed
release v0.2.0 at 2026-05-19 01:07:13 UTC
1 parent f73062c commit c8a2341

20 files changed

Lines changed: 2309 additions & 42 deletions


.claude-plugin/plugin.json

Lines changed: 9 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,9 @@
1+
{
2+
"name": "google-agents-cli",
3+
"version": "0.2.0",
4+
"description": "Scaffold, develop, evaluate, and deploy AI agents with Google ADK. Bundles 7 skills for the agent development lifecycle.",
5+
"author": { "name": "Google LLC" },
6+
"homepage": "https://github.com/google/agents-cli",
7+
"repository": "https://github.com/google/agents-cli",
8+
"license": "Apache-2.0"
9+
}

README.md

Lines changed: 3 additions & 0 deletions
@@ -146,6 +146,9 @@ Yes. `agents-cli scaffold enhance` adds deployment and CI/CD to existing project
146146
**Can I use `agents-cli` without a coding agent?**<br>
147147
Yes. The CLI works standalone — you can run `agents-cli scaffold`, `eval`, `deploy`, and every other command directly from your terminal. The skills just make it easier for coding agents to do it for you.
148148

149+
**How can I extend `agents-cli` with other skills?**<br>
150+
`agents-cli` skills cover the agent-building lifecycle (scaffold, ADK code patterns, evals, deploy, publish, observability). For adjacent concerns, you can install another skill suite alongside it. For example, [agent-skills](https://github.com/addyosmani/agent-skills) covers general software-engineering workflows (ideation, spec gates, planning, code review), and [google/skills](https://github.com/google/skills) covers Google Cloud foundations (BigQuery, Cloud Run, Firebase, GKE).
151+
149152
## Feedback
150153

151154
We value your input — it helps us improve `agents-cli` for the community.

RELEASE_NOTES.md

Lines changed: 18 additions & 0 deletions
@@ -2,6 +2,24 @@
22

33
All notable changes to this project will be documented in this file.
44

5+
## [0.2.0] - 2026-05-15
6+
- Moved agents-cli project config into a language-independent `agents-cli-manifest.yaml` file
7+
- Old config embedded in pyproject.toml can be automatically migrated with `agents-cli scaffold upgrade`
8+
- Added `eval optimize` command
9+
- Added `--network-attachment` and `--dns-peering-*` flags to `deploy`
10+
- Misc startup performance improvements
11+
- Avoid crashes related to terminal encodings
12+
- Fixes https://github.com/google/agents-cli/issues/15
13+
- Smarter tool path resolution, especially for Windows
14+
- Fixes https://github.com/google/agents-cli/issues/14
15+
- Updated dependency version locks
16+
- Fixes https://github.com/google/agents-cli/issues/13
17+
- Added manifests for Claude and Gemini CLI plugin support
18+
- Fixed bugs in preserving config metadata when scaffolding, enhancing, or upgrading
19+
- Misc doc and skill fixes
20+
- Visual Explainer page for Agents CLI lifecycle at https://google.github.io/agents-cli/
21+
- Cleaned up some dead template code
22+
523
## [0.1.3] - 2026-05-06
624
- Default `infra` commands to terraform plan instead of apply
725
- Fix `playground` to work for Cloud Shell and other similar envs and be more transparent about the underlying command

docs/mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -85,12 +85,14 @@ extra_css:
8585

8686
extra_javascript:
8787
- javascripts/persona.js
88+
- javascripts/lifecycle.js
8889

8990
nav:
9091
- Home: index.md
9192
- Guide:
9293
- Overview:
9394
- Getting Started: guide/getting-started.md
95+
- The Lifecycle: guide/lifecycle.md
9496
- "Tutorial: Build Your First Agent": guide/quickstart-tutorial.md
9597
- "Tutorial: Manual Workflow": guide/hands-on-tutorial.md
9698
- Use Cases: guide/use-cases.md

docs/src/guide/development.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ If you're working with a coding agent, it will ask you these questions automatic
1515
3. **Safety constraints?** — What the agent must NOT do
1616
4. **Deployment preference?** — Prototype first, or full deployment (Agent Runtime, Cloud Run, GKE)?
1717

18-
Write your answers into a `DESIGN_SPEC.md` at minimum covering: overview, example use cases, tools required, constraints, and success criteria. This becomes the source of truth for everything that follows.
18+
Save your answers to `.agents-cli-spec.md` in the current directory — overview, example use cases, tools required, constraints, success criteria.
1919

2020
---
2121

docs/src/guide/lifecycle.md

Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
1+
# The Lifecycle
2+
3+
Agents CLI is opinionated about one thing: the loop between **"looks good in a notebook"** and **"live in production."** This page is the map.
4+
5+
6+
7+
## Watch a single investigation
8+
9+
Imagine an outage-recovery agent. It's been live for a week. A pager fires:
10+
11+
<div id="lifecycle-anim-transcript" class="lifecycle-anim" aria-label="Auto-playing transcript of an outage investigation"></div>
12+
13+
That investigation took **4.3 seconds**. Nothing about *the agent itself* is unusual — most agent frameworks could express it. What's unusual is everything around it: the eval rubric that wouldn't have let it ship if it recommended a destructive remediation, the CI check that would have caught the runbook search returning the wrong section, the trace that lets you replay this exact investigation when something goes sideways tomorrow.
14+
15+
That's the loop.
16+
17+
## Four CLI verbs on rotation
18+
19+
<div id="lifecycle-anim-loop" class="lifecycle-anim" aria-label="The four CLI verbs in a continuous loop"></div>
20+
21+
`scaffold`, `eval`, `deploy`, observe — on a rotation, forever. You write the spec; the loop catches what would have shipped, ships what passes, and shows you what happens next so the next iteration is smarter.
22+
23+
## What goes wrong without it
24+
25+
Most agent demos stop at the prompt. You write a clever instruction, the model returns something that looks great in a notebook, and you screenshot it for the team. Production is where that falls apart: the failures in the table below only surface once real users, changing tool APIs, and real bills are involved.
26+
27+
| | Without the loop | With Agents CLI |
28+
|---|---|---|
29+
| **Hallucinated remediation** | Discovered customer-side, after the fact | Eval rubric blocks the PR before merge |
30+
| **Tool API change** | 2 AM page, agent silently broken | CI integration test catches the schema drift |
31+
| **Production misuse** | No replay, no telemetry | Cloud Trace + BigQuery analytics surface it within the hour |
32+
| **Cost spike from a chatty tool** | Next month's bill is the alert | Per-tool span counts surface the loop in hours |
33+
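The last row of the table depends on counting spans per tool. A minimal sketch of that idea, assuming spans have already been exported as dictionaries with a hypothetical `tool` key (the actual Cloud Trace export schema is not shown on this page, so the field name and the `limit` value are illustrative):

```python
from collections import Counter


def chatty_tools(spans, limit):
    """Return (tool, count) pairs whose span count exceeds `limit`.

    `spans` is a hypothetical export format: one dict per span, with a
    "tool" key on tool-call spans. Spans without that key (model calls,
    sub-agent handoffs) are ignored. Results are sorted by count, descending.
    """
    counts = Counter(span["tool"] for span in spans if "tool" in span)
    return [(tool, n) for tool, n in counts.most_common() if n > limit]


# Illustrative data only, not real telemetry.
spans = (
    [{"tool": "query_logs"}] * 12
    + [{"tool": "check_metrics"}] * 2
    + [{"name": "model_generation"}]
)
print(chatty_tools(spans, limit=5))
```

A tool that dominates the count in a single investigation is a candidate for the "chatty tool" loop the table describes.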
34+
## The eight phases
35+
36+
The loop expands to eight phases when you walk through it slowly. Each phase has an opinion encoded in a [skill](../reference/skills.md) so your coding agent picks the right answer for you.
37+
38+
| # | Phase | What it does | CLI verb | Skill | Deep-dive |
39+
|---|---|---|---|---|---|
40+
| 0 | **Spec** | Write a `DESIGN_SPEC.md`. The other phases derive from this. || `google-agents-cli-workflow` | [Development Guide](development.md) |
41+
| 1 | **Scaffold** | Turn the spec into a production-shaped project (~72 files). | `scaffold create` | `google-agents-cli-scaffold` | [Templates](templates.md) |
42+
| 2 | **Build** | Write the agent body — model, instruction, tools, `App` wrapper. || `google-agents-cli-adk-code` | [Project Structure](project-structure.md) |
43+
| 3 | **Orchestrate** | Compose specialists when one agent grows into a team. || `google-agents-cli-adk-code` | [Project Structure](project-structure.md) |
44+
| 4 | **Evaluate** | Score the agent against an evalset before every deploy. | `eval run` | `google-agents-cli-eval` | [Evaluation](evaluation.md) |
45+
| 5 | **Deploy** | Ship to Agent Runtime, Cloud Run, or GKE. | `deploy` | `google-agents-cli-deploy` | [Deployment](deployment.md) |
46+
| 6 | **Publish** | Register with Gemini Enterprise so other agents can find this one. | `publish` | `google-agents-cli-publish` | [CI/CD](cicd.md) |
47+
| 7 | **Observe** | Cloud Trace + BigQuery analytics; production data feeds tomorrow's evalset. || `google-agents-cli-observability` | [Observability](observability/index.md) |
48+
49+
### 0 · Spec
50+
51+
A `DESIGN_SPEC.md` names the agent's tools, constraints, and success criteria. The whole rest of the lifecycle reads from it: the scaffold flags, the eval rubrics, the safety guardrails, the trace attributes you'll watch in production. Don't start from blank — browse [Agent Garden](https://cloud.google.com/products/agent-garden) for an existing template close to what you want, then customize.
52+
53+
A typical spec is one screen of markdown:
54+
55+
```markdown
56+
# DESIGN_SPEC.md — outage-recovery-bot
57+
58+
## Tools
59+
60+
| Tool | Backing service |
61+
| --------------------------------------- | --------------------- |
62+
| `query_logs(service, severity)` | Cloud Logging |
63+
| `check_metrics(service, metric)` | Cloud Monitoring |
64+
| `search_runbook(query)` | Vector Search |
65+
66+
## Constraints
67+
68+
1. Always cite the runbook section consulted.
69+
2. Never recommend a destructive remediation unless the runbook
70+
explicitly sanctions it for the observed symptom.
71+
72+
## Success criteria
73+
74+
- ≥ 80% of incidents get a diagnosis whose root cause matches ground truth
75+
- 100% of recommendations cite a runbook section
76+
- 0 destructive recommendations without runbook sanction
77+
```
78+
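The three success criteria in the example spec are concrete enough to check mechanically. The sketch below is one way to do that in plain Python; the `IncidentResult` record and its field names are hypothetical and are not part of Agents CLI or ADK.

```python
from dataclasses import dataclass


@dataclass
class IncidentResult:
    """One evaluated incident. All fields are hypothetical, for illustration."""

    root_cause_correct: bool   # diagnosis matched ground truth
    cites_runbook: bool        # recommendation cites a runbook section
    destructive: bool          # recommendation is a destructive remediation
    runbook_sanctioned: bool   # runbook explicitly sanctions it for the symptom


def meets_success_criteria(results):
    """Check the three success criteria from the example spec."""
    if not results:
        return False  # nothing evaluated, nothing proven
    diagnosis_rate = sum(r.root_cause_correct for r in results) / len(results)
    all_cited = all(r.cites_runbook for r in results)
    unsanctioned = sum(r.destructive and not r.runbook_sanctioned for r in results)
    return diagnosis_rate >= 0.80 and all_cited and unsanctioned == 0


# Illustrative run: 4 of 5 diagnoses correct, every answer cites a section,
# and the single destructive recommendation is sanctioned by the runbook.
results = [
    IncidentResult(True, True, False, False),
    IncidentResult(True, True, False, False),
    IncidentResult(True, True, True, True),
    IncidentResult(True, True, False, False),
    IncidentResult(False, True, False, False),
]
print(meets_success_criteria(results))
```

Writing the criteria as numbers in the spec is what makes a check like this possible later in the eval phase.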
79+
### 1 · Scaffold
80+
81+
One command takes the spec and emits the project: agent code, tests, eval boilerplate, Terraform, CI/CD workflows, deployment manifests. The flags aren't gratuitous — each one expands or contracts the scaffold to match the lifecycle you've signed up for.
82+
83+
<div id="lifecycle-anim-scaffold" class="lifecycle-anim" aria-label="Scaffold wizard — toggle flags, watch the command and file count update"></div>
84+
85+
The full setup ships **~72 files** across agent code, eval boilerplate, Terraform, GitHub Actions workflows, and deploy manifests. Trim it down by skipping pieces you don't need. See [Templates](templates.md) for the full list.
86+
87+
### 2 · Build
88+
89+
Every ADK agent boils down to four ingredients: a model, an instruction, a list of tools, and an `App` that wraps them. The body is barely 30 lines of meaningful code — the interesting work happens inside the tools.
90+
91+
```python
92+
from google.adk.agents import Agent
93+
from google.adk.apps import App
94+
from google.adk.models import Gemini
95+
96+
root_agent = Agent(
97+
name="root_agent",
98+
model=Gemini(model="gemini-flash-latest"),
99+
instruction="You are an SRE outage-recovery assistant...",
100+
tools=[query_logs, check_metrics, search_runbook],
101+
)
102+
103+
app = App(root_agent=root_agent, name="app")
104+
```
105+
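The snippet above passes `query_logs`, `check_metrics`, and `search_runbook` without defining them. ADK accepts plain Python functions as tools, so a rough sketch could look like the following. The in-memory dictionaries, the returned dictionary shape, and the keyword matching are all assumptions made for illustration; a real project would call Cloud Logging, Cloud Monitoring, and Vector Search instead.

```python
# Stub data standing in for the real backing services (illustrative only).
_LOGS = {
    ("checkout", "ERROR"): ["timeout calling payments", "timeout calling payments"],
}
_METRICS = {
    ("checkout", "latency_p99_ms"): 4200,
}
_RUNBOOK = {
    "4.2": "Checkout latency: roll back the most recent deploy.",
    "7.1": "Database connection pool exhausted: raise the pool size.",
}


def query_logs(service: str, severity: str) -> dict:
    """Return recent log lines for a service at the given severity."""
    return {"status": "ok", "lines": _LOGS.get((service, severity), [])}


def check_metrics(service: str, metric: str) -> dict:
    """Return the latest value of a metric, or None if it is unknown."""
    return {"status": "ok", "value": _METRICS.get((service, metric))}


def search_runbook(query: str) -> dict:
    """Return runbook sections that contain any word from the query.

    Each match carries its section id so the agent can cite it.
    """
    words = query.lower().split()
    matches = [
        {"section": section, "text": text}
        for section, text in _RUNBOOK.items()
        if any(word in text.lower() for word in words)
    ]
    return {"status": "ok", "matches": matches}


print(search_runbook("latency"))
```

Returning the section id with every match keeps the spec's "always cite the runbook section" constraint cheap for the agent to satisfy.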
106+
You're not locked to Gemini — swap the model line for any provider supported by ADK ([Model Garden](https://cloud.google.com/model-garden) covers Anthropic Claude, OpenAI GPT, and others). The rest of the lifecycle behaves the same regardless.
107+
108+
Stateful agents reach for two more pieces of Agent Platform:
109+
110+
- **Managed session storage** for conversation state that survives restarts and scales horizontally — pick it at scaffold time via `--session-type agent_platform_sessions` instead of the in-memory default.
111+
- **[Memory Bank](https://cloud.google.com/agent-builder/docs/memory)** for *long-term* memory across sessions (the SRE bot recognizing "this looks like that incident from last quarter"). Wire it in via `from google.adk.memory import VertexAiMemoryBankService` and the agent gets a persistent store keyed to user, session, or app.
112+
113+
For workflows that don't fit in a single HTTP request — long investigations, multi-step batch jobs — Agent Runtime persists the agent's state so a deploy or restart doesn't lose progress.
114+
115+
<div id="lifecycle-anim-models" class="lifecycle-anim" aria-label="Same prompt, three model providers — illustrative side-by-side"></div>
116+
117+
Here's the same agent body answering a different incident, end-to-end:
118+
119+
<div id="lifecycle-anim-playground" class="lifecycle-anim" aria-label="Inline playground — payments triage scenario, click to step through"></div>
120+
121+
### 3 · Orchestrate
122+
123+
The single-agent body works while the problem is small. Real production agents grow into **teams** — an orchestrator that routes work to a handful of specialists, each with its own narrow tool surface.
124+
125+
<div id="lifecycle-anim-team" class="lifecycle-anim" aria-label="Team diagram — orchestrator routes work to investigator, diagnoser, and remediator"></div>
126+
127+
Splitting helps for three reasons that show up in eval, deploy, and observe: smaller prompts make each agent more reliable, separate tool surfaces let you apply per-agent guardrails, and the trace tells you exactly which sub-agent took the bad turn.
128+
129+
When the team needs to span processes — or call agents your team doesn't own — use the **[A2A protocol](https://a2a-protocol.org/)** as the wire format. Scaffold with `--agent adk_a2a` and any A2A-compatible agent (built with Agents CLI or not) can call yours, and yours can call theirs.
130+
131+
### 4 · Evaluate
132+
133+
This is the phase most agent demos skip. `agents-cli eval run` can execute your evalset against the live agent, ask an LLM judge to score each response against a rubric, and give you a number you can defend.
134+
135+
<div id="lifecycle-anim-eval" class="lifecycle-anim" aria-label="Eval-fix loop — click 'apply fix' to see one case flip from failing to passing"></div>
136+
137+
Expect 5–10+ iterations of this loop. Every fix nudges the score, you re-run, you ship when it crosses the threshold. Below: the four failure modes the rubrics catch most often.
138+
139+
<div id="lifecycle-anim-failures" class="lifecycle-anim" aria-label="Common agent failures and the eval rubric that catches each"></div>
140+
141+
See the [Evaluation Guide](evaluation.md) for the full schema and rubric reference.
142+
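The eval-fix loop described in this phase reduces to "score the run, compare against a threshold, repeat". The sketch below models that in plain Python, assuming each run is a list of pass/fail judge verdicts and that the threshold is 0.9; both the data shape and the threshold are hypothetical, since the actual output of `agents-cli eval run` is not shown here.

```python
def eval_score(verdicts):
    """Fraction of evalset cases the judge marked as passing (0.0 if empty)."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def first_shippable_run(runs, threshold):
    """Return the 1-based index of the first run meeting `threshold`, else None."""
    for index, verdicts in enumerate(runs, start=1):
        if eval_score(verdicts) >= threshold:
            return index
    return None


# Illustrative verdicts from three successive iterations of the loop.
runs = [
    [True, False, False, True],  # two of four cases pass
    [True, True, False, True],   # one fix applied
    [True, True, True, True],    # all cases pass
]
print(first_shippable_run(runs, threshold=0.9))
```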
143+
### 5 · Deploy
144+
145+
The same agent code can land in three different places. `agents-cli deploy` dispatches based on the target you scaffolded with. **Pick one to see what `--dry-run` would print and the steps that would follow:**
146+
147+
<div id="lifecycle-anim-deploy" class="lifecycle-anim" aria-label="Deploy target picker — choose a runtime to see the dry-run + pipeline"></div>
148+
149+
```bash
150+
agents-cli deploy --dry-run # preview the pipeline
151+
agents-cli deploy # ship it
152+
agents-cli deploy --no-wait # return immediately; check later with --status
153+
```
154+
155+
Each target inherits the surrounding production primitives:
156+
157+
- **Per-agent service account** — opt in with `agents-cli deploy --agent-identity`, and the deployed agent runs as its own GCP identity. Scope what it can actually call (which BigQuery datasets, which buckets, which APIs) with normal IAM. The eval rubrics that block destructive remediations have a fallback: the agent literally can't `kubectl delete` if its identity isn't allowed to.
158+
- **[Identity-Aware Proxy (IAP)](https://cloud.google.com/iap)** — gate a Cloud Run deploy behind your Google Workspace SSO with the `--iap` flag. Internal-only agents stop being a public-internet concern.
159+
- **[Workload Identity Federation](https://cloud.google.com/iam/docs/workload-identity-federation)** — the scaffolded `pr_checks.yaml` authenticates GitHub Actions to GCP via WIF, so no service-account keys live in your repo.
160+
161+
See [Deployment](deployment.md) for full per-target walkthroughs.
162+
163+
### 6 · Publish
164+
165+
Deploying the agent makes it reachable at a URL. Publishing is the separate step that lists it in Gemini Enterprise so other agents (or humans browsing the catalog) can actually find it.
166+
167+
<div id="lifecycle-anim-publish" class="lifecycle-anim" aria-label="The agent's listing in Gemini Enterprise after publish"></div>
168+
169+
Two registration modes: **ADK** (publishes a deployed Agent Runtime instance) and **[A2A](https://a2a-protocol.org/)** (publishes an A2A-compatible HTTP endpoint, no ADK required — works with agents built on any framework).
170+
171+
### 7 · Observe
172+
173+
Once the agent is live, every invocation emits a Cloud Trace span. Every tool call, model generation, and sub-agent handoff is visible. **Hover any span below to see its attributes.**
174+
175+
<div id="lifecycle-anim-trace" class="lifecycle-anim" aria-label="Trace waterfall — bars draw in left-to-right showing the orchestrator and its sub-agents; hover to inspect"></div>
176+
177+
Observability is essential for any agent running in production, as it helps you catch regressions your evaluation might have missed, cost spikes from chatty tools, or cases where users bypass safety prompts. With `--bq-analytics` turned on at scaffold time, every prompt and response also lands in BigQuery for offline analysis.
178+
179+
The same data closes the loop: production traffic feeds tomorrow's evalset. Eval scores get re-computed continuously, so regressions surface in days, not months.
180+
181+
<div id="lifecycle-anim-rolling" class="lifecycle-anim" aria-label="Rolling production eval score over the last ten days, with annotated regression and deploy events"></div>
182+
183+
See [Observability](observability/index.md) for the full setup.
184+
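To make "regressions surface in days" concrete, here is a small sketch that flags a drop in a rolling eval score. It assumes one score per day in a plain list; the window size and the `max_drop` tolerance are hypothetical values, and nothing here reflects how Agents CLI itself computes the rolling score.

```python
def rolling_mean(scores, window):
    """Mean of each trailing `window`-sized slice; shorter prefixes are skipped."""
    return [
        sum(scores[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(scores))
    ]


def regression_days(daily_scores, window, max_drop):
    """Indices into `daily_scores` where the rolling mean fell by more than `max_drop`."""
    means = rolling_mean(daily_scores, window)
    return [
        i + window - 1  # map the index in `means` back to a day index
        for i in range(1, len(means))
        if means[i - 1] - means[i] > max_drop
    ]


# Illustrative scores: a regression lands on day 3 and is fixed on day 5.
daily_scores = [0.90, 0.90, 0.90, 0.60, 0.60, 0.90]
print(regression_days(daily_scores, window=2, max_drop=0.1))
```

A wider window smooths out noisy days at the cost of flagging a regression later.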
185+
## Two ways to drive it
186+
187+
<div class="lc-tabs-bare" markdown>
188+
189+
=== "Ask your coding agent"
190+
191+
The canonical path. Your coding agent reads the skills and picks the right CLI command at the right phase.
192+
193+
```
194+
Build me an outage-recovery agent. It should investigate incidents
195+
using logs, metrics, and runbooks, and recommend remediations
196+
that cite a runbook section. Deploy it to Agent Runtime.
197+
```
198+
199+
Your coding agent will:
200+
201+
1. Write a `DESIGN_SPEC.md` describing the tools and constraints
202+
2. Run `agents-cli scaffold create … --agent agentic_rag --deployment-target agent_runtime`
203+
3. Author the agent body and tools
204+
4. Write evalset cases
205+
5. Run `agents-cli eval run` and iterate until the score crosses threshold
206+
6. Run `agents-cli deploy`
207+
7. Wire up trace + analytics, hand you the URL
208+
209+
=== "Drive the CLI yourself"
210+
211+
Every command works standalone. Skip the coding agent entirely if you'd rather type.
212+
213+
```bash
214+
# Phase 1: scaffold
215+
agents-cli scaffold create outage-recovery-bot \
216+
--agent agentic_rag \
217+
--datastore agent_platform_vector_search \
218+
--deployment-target agent_runtime \
219+
--cicd-runner github_actions \
220+
--bq-analytics
221+
cd outage-recovery-bot && agents-cli install
222+
223+
# Phase 2-3: build & orchestrate (edit app/agent.py)
224+
agents-cli playground # local web playground at :8080
225+
226+
# Phase 4: evaluate
227+
agents-cli eval run
228+
229+
# Phase 5: deploy
230+
agents-cli deploy --dry-run
231+
agents-cli deploy
232+
233+
# Phase 6: publish (optional)
234+
agents-cli publish gemini-enterprise
235+
```
236+
237+
See the [Manual Workflow Tutorial](hands-on-tutorial.md) for the full end-to-end walkthrough.
238+
239+
</div>
240+
241+
## Where to dig deeper
242+
243+
- [Templates](templates.md) — full list of scaffold templates (`adk`, `adk_a2a`, `agentic_rag`, …)
244+
- [Project Structure](project-structure.md) — what each generated file does
245+
- [Development Guide](development.md) — day-to-day workflow
246+
- [Evaluation Guide](evaluation.md) — evalset schema, rubrics, the eval-fix loop
247+
- [Deployment](deployment.md) — per-target walkthroughs
248+
- [CI/CD & Production](cicd.md) — the full PR-to-prod path
249+
- [Observability](observability/index.md) — Cloud Trace, BigQuery analytics, third-party tools
250+
- [CLI Reference](../cli/index.md) — every command and flag
