Skip to content

Commit c27d3c2

Browse files
authored
Intelligent CI Test Selection for PRs (#536)
# Intelligent CI Test Selection for PRs ## Summary This PR implements intelligent test selection for pull requests, automatically determining which integration test suites to run based on changed files. This reduces CI time and costs by running only relevant tests while maintaining safety through fallback mechanisms. ## Problem Previously, all integration test suites ran on every PR regardless of what code changed. This resulted in: - Unnecessary CI execution time and costs - Slower feedback cycles for developers - Resource waste when only a small portion of the codebase changed ## Solution The implementation analyzes changed files in PRs and maps them to specific test suites. It includes: - **Automatic test selection**: Runs only test suites relevant to changed code paths - **Safety fallbacks**: Runs all tests when changes touch critical infrastructure or when analysis fails - **Manual override**: Option to force running all tests via workflow dispatch ## Changes ### 1. Test Suite Mapping Script (`.github/scripts/determine-test-suites.py`) - Analyzes git diff to identify changed files - Maps code paths to test suites: - `pinecone/db_control/` → control tests (serverless, resources/index, resources/collections, asyncio variants) - `pinecone/db_data/` → data tests (sync, asyncio, gRPC) - `pinecone/inference/` → inference tests (sync, asyncio) - `pinecone/admin/` → admin tests - `pinecone/grpc/` → gRPC-specific tests - Plugin-related files → plugin tests - Identifies critical paths that require full test suite: - `pinecone/config/`, `pinecone/core/`, `pinecone/openapi_support/` - `pinecone/utils/`, `pinecone/exceptions/` - Core interface files (`pinecone.py`, `pinecone_asyncio.py`, etc.) - Falls back to running all tests if: - Script execution fails - No files match any mapping - Critical paths are touched ### 2. Updated PR Workflow (`.github/workflows/on-pr.yaml`) - Added `determine-test-suites` job that runs before integration tests - Added `run_all_tests` input parameter for manual override via workflow dispatch - Passes selected test suites to integration test workflow - Includes error handling and validation ### 3. Updated Integration Test Workflow (`.github/workflows/testing-integration.yaml`) - Added optional inputs for each job type's test suites: - `rest_sync_suites_json` - `rest_asyncio_suites_json` - `grpc_sync_suites_json` - `admin_suites_json` - Filters test matrix based on provided suites - Skips jobs when their test suite array is empty - Maintains backward compatibility (runs all tests when inputs not provided) ## Usage ### Automatic (Default) On every PR, the workflow automatically: 1. Analyzes changed files 2. Determines relevant test suites 3. Runs only those test suites ### Manual Override To force running all tests on a PR: 1. Go to Actions → "Testing (PR)" workflow 2. Click "Run workflow" 3. Check "Run all integration tests regardless of changes" 4. Run the workflow ## Safety Features 1. **Critical path detection**: Changes to core infrastructure (config, utils, exceptions, etc.) trigger full test suite 2. **Fallback on failure**: If the analysis script fails, falls back to running all tests 3. **Empty result handling**: If no tests match, runs all tests as a safety measure 4. **Main branch unchanged**: Main branch workflows continue to run all tests ## Example Scenarios ### Scenario 1: Change only `pinecone/db_data/index.py` - **Runs**: `data`, `data_asyncio`, `data_grpc_futures` test suites - **Skips**: `control/*`, `inference/*`, `admin`, `plugins` test suites - **Result**: ~70% reduction in test execution ### Scenario 2: Change `pinecone/config/pinecone_config.py` - **Runs**: All test suites (critical path) - **Reason**: Configuration changes affect all functionality ### Scenario 3: Change `pinecone/inference/inference.py` - **Runs**: `inference/sync`, `inference/asyncio` test suites - **Skips**: Other test suites - **Result**: ~85% reduction in test execution ## Testing The implementation has been tested with: - ✅ YAML syntax validation - ✅ Python script syntax validation - ✅ Test suite mapping logic verification - ✅ Edge case handling (empty arrays, failures, etc.) ## Benefits - **Cost savings**: Reduce CI costs by running only relevant tests - **Faster feedback**: Developers get test results faster when only subset runs - **Better resource utilization**: CI runners are used more efficiently - **Maintainability**: Easy to update mappings as codebase evolves ## Backward Compatibility - Main branch workflows unchanged (still run all tests) - PR workflows backward compatible (can manually trigger full suite) - Existing test suite structure unchanged - No changes to test code itself ## Future Improvements Potential enhancements for future PRs: - Track test execution time savings - Add metrics/logging for test selection decisions - Fine-tune mappings based on actual usage patterns - Consider test dependencies (e.g., if A changes, also run B)
1 parent e872037 commit c27d3c2

File tree

4 files changed

+325
-22
lines changed

4 files changed

+325
-22
lines changed
Lines changed: 231 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,231 @@
1+
#!/usr/bin/env python3
2+
"""
3+
Determine which integration test suites to run based on changed files in a PR.
4+
5+
This script analyzes git diff to identify changed files and maps them to test suites.
6+
Critical paths trigger running all tests for safety.
7+
"""
8+
9+
import json
10+
import subprocess
11+
import sys
12+
from typing import Set
13+
14+
15+
# Define all possible test suites organized by job type
16+
ALL_REST_SYNC_SUITES = [
17+
"control/serverless",
18+
"control/resources/index",
19+
"control/resources/collections",
20+
"inference/sync",
21+
"plugins",
22+
"data",
23+
]
24+
25+
ALL_REST_ASYNCIO_SUITES = [
26+
"control_asyncio/resources/index",
27+
"control_asyncio/*.py",
28+
"inference/asyncio",
29+
"data_asyncio",
30+
]
31+
32+
ALL_GRPC_SYNC_SUITES = ["data", "data_grpc_futures"]
33+
34+
ALL_ADMIN_SUITES = ["admin"]
35+
36+
# Critical paths that require running all tests
37+
CRITICAL_PATHS = [
38+
"pinecone/config/",
39+
"pinecone/core/",
40+
"pinecone/openapi_support/",
41+
"pinecone/utils/",
42+
"pinecone/exceptions/", # Used across all test suites for error handling
43+
"pinecone/pinecone.py",
44+
"pinecone/pinecone_asyncio.py",
45+
"pinecone/pinecone_interface_asyncio.py", # Core asyncio interface
46+
"pinecone/legacy_pinecone_interface.py", # Legacy interface affects many tests
47+
"pinecone/deprecation_warnings.py", # Affects all code paths
48+
"pinecone/__init__.py",
49+
"pinecone/__init__.pyi",
50+
]
51+
52+
# Path to test suite mappings
53+
# Format: (path_pattern, [list of test suites])
54+
PATH_MAPPINGS = [
55+
# db_control mappings
56+
(
57+
"pinecone/db_control/",
58+
[
59+
"control/serverless",
60+
"control/resources/index",
61+
"control/resources/collections",
62+
"control_asyncio/resources/index",
63+
"control_asyncio/*.py",
64+
],
65+
),
66+
# db_data mappings
67+
("pinecone/db_data/", ["data", "data_asyncio", "data_grpc_futures"]),
68+
# inference mappings
69+
("pinecone/inference/", ["inference/sync", "inference/asyncio"]),
70+
# admin mappings
71+
("pinecone/admin/", ["admin"]),
72+
# grpc mappings
73+
(
74+
"pinecone/grpc/",
75+
[
76+
"data_grpc_futures",
77+
"data", # grpc affects data tests too
78+
],
79+
),
80+
# plugin mappings
81+
("pinecone/deprecated_plugins.py", ["plugins"]),
82+
("pinecone/langchain_import_warnings.py", ["plugins"]),
83+
]
84+
85+
86+
def get_changed_files(base_ref: str = "main") -> Set[str]:
87+
"""Get list of changed files compared to base branch."""
88+
try:
89+
# For PRs, compare against the base branch
90+
# For local testing, compare against HEAD
91+
result = subprocess.run(
92+
["git", "diff", "--name-only", f"origin/{base_ref}...HEAD"],
93+
capture_output=True,
94+
text=True,
95+
check=True,
96+
)
97+
files = {line.strip() for line in result.stdout.strip().split("\n") if line.strip()}
98+
return files
99+
except subprocess.CalledProcessError:
100+
# Fallback: try comparing against HEAD~1 for local testing
101+
try:
102+
result = subprocess.run(
103+
["git", "diff", "--name-only", "HEAD~1"], capture_output=True, text=True, check=True
104+
)
105+
files = {line.strip() for line in result.stdout.strip().split("\n") if line.strip()}
106+
return files
107+
except subprocess.CalledProcessError:
108+
# If git commands fail, return empty set (will trigger full suite)
109+
return set()
110+
111+
112+
def is_critical_path(file_path: str) -> bool:
113+
"""Check if a file path is in a critical area that requires all tests."""
114+
return any(file_path.startswith(critical) for critical in CRITICAL_PATHS)
115+
116+
117+
def map_file_to_test_suites(file_path: str) -> Set[str]:
118+
"""Map a single file path to its relevant test suites."""
119+
suites = set()
120+
121+
for path_pattern, test_suites in PATH_MAPPINGS:
122+
if file_path.startswith(path_pattern):
123+
suites.update(test_suites)
124+
125+
return suites
126+
127+
128+
def determine_test_suites(changed_files: Set[str], run_all: bool = False) -> dict:
129+
"""
130+
Determine which test suites to run based on changed files.
131+
132+
Returns a dict with keys: rest_sync, rest_asyncio, grpc_sync, admin
133+
Each value is a list of test suite names to run.
134+
"""
135+
if run_all or not changed_files:
136+
# Run all tests if explicitly requested or no files changed
137+
return {
138+
"rest_sync": ALL_REST_SYNC_SUITES,
139+
"rest_asyncio": ALL_REST_ASYNCIO_SUITES,
140+
"grpc_sync": ALL_GRPC_SYNC_SUITES,
141+
"admin": ALL_ADMIN_SUITES,
142+
}
143+
144+
# Check for critical paths
145+
has_critical = any(is_critical_path(f) for f in changed_files)
146+
if has_critical:
147+
# Run all tests if critical paths are touched
148+
return {
149+
"rest_sync": ALL_REST_SYNC_SUITES,
150+
"rest_asyncio": ALL_REST_ASYNCIO_SUITES,
151+
"grpc_sync": ALL_GRPC_SYNC_SUITES,
152+
"admin": ALL_ADMIN_SUITES,
153+
}
154+
155+
# Map files to test suites
156+
rest_sync_suites = set()
157+
rest_asyncio_suites = set()
158+
grpc_sync_suites = set()
159+
admin_suites = set()
160+
161+
for file_path in changed_files:
162+
# Skip non-Python files and test files
163+
if not file_path.startswith("pinecone/"):
164+
continue
165+
166+
suites = map_file_to_test_suites(file_path)
167+
168+
# Categorize suites by job type
169+
for suite in suites:
170+
if suite in ALL_REST_SYNC_SUITES:
171+
rest_sync_suites.add(suite)
172+
if suite in ALL_REST_ASYNCIO_SUITES:
173+
rest_asyncio_suites.add(suite)
174+
if suite in ALL_GRPC_SYNC_SUITES:
175+
grpc_sync_suites.add(suite)
176+
if suite in ALL_ADMIN_SUITES:
177+
admin_suites.add(suite)
178+
179+
# If no tests matched, run all (safety fallback)
180+
if not (rest_sync_suites or rest_asyncio_suites or grpc_sync_suites or admin_suites):
181+
return {
182+
"rest_sync": ALL_REST_SYNC_SUITES,
183+
"rest_asyncio": ALL_REST_ASYNCIO_SUITES,
184+
"grpc_sync": ALL_GRPC_SYNC_SUITES,
185+
"admin": ALL_ADMIN_SUITES,
186+
}
187+
188+
return {
189+
"rest_sync": sorted(list(rest_sync_suites)),
190+
"rest_asyncio": sorted(list(rest_asyncio_suites)),
191+
"grpc_sync": sorted(list(grpc_sync_suites)),
192+
"admin": sorted(list(admin_suites)),
193+
}
194+
195+
196+
def main():
197+
"""Main entry point."""
198+
import argparse
199+
200+
parser = argparse.ArgumentParser(
201+
description="Determine test suites to run based on changed files"
202+
)
203+
parser.add_argument(
204+
"--base-ref", default="main", help="Base branch/ref to compare against (default: main)"
205+
)
206+
parser.add_argument("--run-all", action="store_true", help="Force running all test suites")
207+
parser.add_argument(
208+
"--output-format",
209+
choices=["json", "json-pretty"],
210+
default="json",
211+
help="Output format (default: json)",
212+
)
213+
214+
args = parser.parse_args()
215+
216+
changed_files = get_changed_files(args.base_ref)
217+
test_suites = determine_test_suites(changed_files, run_all=args.run_all)
218+
219+
# Output as JSON
220+
if args.output_format == "json-pretty":
221+
print(json.dumps(test_suites, indent=2))
222+
else:
223+
print(json.dumps(test_suites))
224+
225+
# Exit with non-zero if no test suites selected (shouldn't happen with safety fallback)
226+
if not any(test_suites.values()):
227+
sys.exit(1)
228+
229+
230+
if __name__ == "__main__":
231+
main()

.github/workflows/on-pr.yaml

Lines changed: 69 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,13 @@ on:
1616
- '*.gif'
1717
- '*.svg'
1818
- '*.example'
19-
workflow_dispatch: {}
19+
workflow_dispatch:
20+
inputs:
21+
run_all_tests:
22+
description: 'Run all integration tests regardless of changes'
23+
required: false
24+
default: 'false'
25+
type: boolean
2026

2127
permissions: {}
2228

@@ -34,21 +40,83 @@ jobs:
3440
with:
3541
python_versions_json: '["3.9"]'
3642

43+
determine-test-suites:
44+
name: Determine test suites
45+
runs-on: ubuntu-latest
46+
outputs:
47+
rest_sync_suites: ${{ steps.determine.outputs.rest_sync_suites }}
48+
rest_asyncio_suites: ${{ steps.determine.outputs.rest_asyncio_suites }}
49+
grpc_sync_suites: ${{ steps.determine.outputs.grpc_sync_suites }}
50+
admin_suites: ${{ steps.determine.outputs.admin_suites }}
51+
steps:
52+
- uses: actions/checkout@v4
53+
with:
54+
fetch-depth: 0 # Fetch full history for git diff
55+
- name: Determine test suites
56+
id: determine
57+
run: |
58+
run_all="${{ github.event.inputs.run_all_tests == 'true' }}"
59+
if [ "${{ github.event_name }}" = "pull_request" ]; then
60+
base_ref="${{ github.event.pull_request.base.ref }}"
61+
else
62+
base_ref="main"
63+
fi
64+
65+
if [ "$run_all" = "true" ]; then
66+
echo "Running all tests (manual override)"
67+
python3 .github/scripts/determine-test-suites.py --run-all --output-format json > test_suites.json
68+
else
69+
echo "Determining test suites based on changed files (base: $base_ref)"
70+
if ! python3 .github/scripts/determine-test-suites.py --base-ref "$base_ref" --output-format json > test_suites.json 2>&1; then
71+
echo "Script failed, falling back to all tests"
72+
python3 .github/scripts/determine-test-suites.py --run-all --output-format json > test_suites.json
73+
fi
74+
fi
75+
76+
# Validate JSON was created
77+
if [ ! -f test_suites.json ] || ! jq empty test_suites.json 2>/dev/null; then
78+
echo "Error: Failed to generate valid test_suites.json, falling back to all tests"
79+
python3 .github/scripts/determine-test-suites.py --run-all --output-format json > test_suites.json
80+
fi
81+
82+
# Extract each job type's suites and set as outputs
83+
rest_sync=$(jq -c '.rest_sync' test_suites.json)
84+
rest_asyncio=$(jq -c '.rest_asyncio' test_suites.json)
85+
grpc_sync=$(jq -c '.grpc_sync' test_suites.json)
86+
admin=$(jq -c '.admin' test_suites.json)
87+
88+
echo "rest_sync_suites=$rest_sync" >> $GITHUB_OUTPUT
89+
echo "rest_asyncio_suites=$rest_asyncio" >> $GITHUB_OUTPUT
90+
echo "grpc_sync_suites=$grpc_sync" >> $GITHUB_OUTPUT
91+
echo "admin_suites=$admin" >> $GITHUB_OUTPUT
92+
93+
echo "Selected test suites:"
94+
echo "REST sync: $rest_sync"
95+
echo "REST asyncio: $rest_asyncio"
96+
echo "gRPC sync: $grpc_sync"
97+
echo "Admin: $admin"
98+
3799
create-project:
38100
uses: './.github/workflows/project-setup.yaml'
39101
secrets: inherit
40102
needs:
41103
- unit-tests
42104

43105
integration-tests:
106+
if: always() && (needs.unit-tests.result == 'success' && needs.create-project.result == 'success' && needs.determine-test-suites.result == 'success')
44107
uses: './.github/workflows/testing-integration.yaml'
45108
secrets: inherit
46109
needs:
47110
- unit-tests
48111
- create-project
112+
- determine-test-suites
49113
with:
50114
encrypted_project_api_key: ${{ needs.create-project.outputs.encrypted_project_api_key }}
51115
python_versions_json: '["3.13", "3.9"]'
116+
rest_sync_suites_json: ${{ needs.determine-test-suites.outputs.rest_sync_suites || '' }}
117+
rest_asyncio_suites_json: ${{ needs.determine-test-suites.outputs.rest_asyncio_suites || '' }}
118+
grpc_sync_suites_json: ${{ needs.determine-test-suites.outputs.grpc_sync_suites || '' }}
119+
admin_suites_json: ${{ needs.determine-test-suites.outputs.admin_suites || '' }}
52120

53121
cleanup-project:
54122
if: ${{ always() }}

0 commit comments

Comments
 (0)