feat: Switch AI benchmark to google.genai Batch API for 50% cost reduction

## Context

The AI benchmark system (`run_live_benchmark.py` + `consult_ai_gaps.py`) currently uses OpenRouter as a proxy to access Gemini Flash. For overnight/batch runs, we should switch to Google's native `google.genai` package which offers:

- **Batch API** (`client.batches.create()`) with 50% cost reduction for non-interactive workloads
- **Native structured output** via `response_schema` parameter (replaces JSON fence stripping)
- Direct API access without proxy latency

## Proposed Changes

1. **Add `google-genai` to `[project.optional-dependencies.ai]`** in `pyproject.toml`
2. **Add `make_google_caller()` factory** in `consult_ai_gaps.py` alongside existing `make_openrouter_caller()`
3. **Support `GOOGLE_API_KEY` environment variable** for authentication
4. **Implement batch mode** for overnight runs:
   - Collect all prompts upfront
   - Submit as a single batch via `client.batches.create()`
   - Poll for completion
   - Parse results
5. **Keep OpenRouter as fallback** when `GOOGLE_API_KEY` is not set
6. **Update `MODEL_REGISTRY`** to include native Gemini model IDs

## Batch API Usage Pattern

```python
from google import genai

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

# Submit batch
batch = client.batches.create(
    model="gemini-2.5-flash",
    requests=[
        genai.types.BatchRequest(
            custom_id=gap_key,
            request=genai.types.GenerateContentRequest(
                contents=prompt,
                config=genai.types.GenerateContentConfig(
                    response_schema=TypedActionSchema,
                ),
            ),
        )
        for gap_key, prompt in gap_prompts
    ],
)

# Poll for completion
while batch.state == "PENDING":
    time.sleep(30)
    batch = client.batches.get(name=batch.name)

# Parse results
for result in client.batches.list_results(name=batch.name):
    responses[result.custom_id] = result.response.text
```

## Benefits

- ~50% cost reduction on batch workloads
- Native structured output (no JSON parsing errors)
- Lower latency (no proxy hop)
- Better rate limit handling (Google's native quotas)

## Notes

- OpenRouter caller remains for interactive/debugging use
- `backend` field on `BenchmarkConfig` already supports documenting which mode was used

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Switch AI benchmark to google.genai Batch API for 50% cost reduction #1

Context

Proposed Changes

Batch API Usage Pattern

Benefits

Notes

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

feat: Switch AI benchmark to google.genai Batch API for 50% cost reduction #1

Description

Context

Proposed Changes

Batch API Usage Pattern

Benefits

Notes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions