Context
The AI benchmark system (run_live_benchmark.py + consult_ai_gaps.py) currently uses OpenRouter as a proxy to access Gemini Flash. For overnight/batch runs, we should switch to Google's native google.genai package which offers:
- Batch API (
client.batches.create()) with 50% cost reduction for non-interactive workloads
- Native structured output via
response_schema parameter (replaces JSON fence stripping)
- Direct API access without proxy latency
Proposed Changes
- Add
google-genai to [project.optional-dependencies.ai] in pyproject.toml
- Add
make_google_caller() factory in consult_ai_gaps.py alongside existing make_openrouter_caller()
- Support
GOOGLE_API_KEY environment variable for authentication
- Implement batch mode for overnight runs:
- Collect all prompts upfront
- Submit as a single batch via
client.batches.create()
- Poll for completion
- Parse results
- Keep OpenRouter as fallback when
GOOGLE_API_KEY is not set
- Update
MODEL_REGISTRY to include native Gemini model IDs
Batch API Usage Pattern
from google import genai
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
# Submit batch
batch = client.batches.create(
model="gemini-2.5-flash",
requests=[
genai.types.BatchRequest(
custom_id=gap_key,
request=genai.types.GenerateContentRequest(
contents=prompt,
config=genai.types.GenerateContentConfig(
response_schema=TypedActionSchema,
),
),
)
for gap_key, prompt in gap_prompts
],
)
# Poll for completion
while batch.state == "PENDING":
time.sleep(30)
batch = client.batches.get(name=batch.name)
# Parse results
for result in client.batches.list_results(name=batch.name):
responses[result.custom_id] = result.response.text
Benefits
- ~50% cost reduction on batch workloads
- Native structured output (no JSON parsing errors)
- Lower latency (no proxy hop)
- Better rate limit handling (Google's native quotas)
Notes
- OpenRouter caller remains for interactive/debugging use
backend field on BenchmarkConfig already supports documenting which mode was used
Context
The AI benchmark system (
run_live_benchmark.py+consult_ai_gaps.py) currently uses OpenRouter as a proxy to access Gemini Flash. For overnight/batch runs, we should switch to Google's nativegoogle.genaipackage which offers:client.batches.create()) with 50% cost reduction for non-interactive workloadsresponse_schemaparameter (replaces JSON fence stripping)Proposed Changes
google-genaito[project.optional-dependencies.ai]inpyproject.tomlmake_google_caller()factory inconsult_ai_gaps.pyalongside existingmake_openrouter_caller()GOOGLE_API_KEYenvironment variable for authenticationclient.batches.create()GOOGLE_API_KEYis not setMODEL_REGISTRYto include native Gemini model IDsBatch API Usage Pattern
Benefits
Notes
backendfield onBenchmarkConfigalready supports documenting which mode was used