The Regression Finder Tool is designed to identify the specific commit that caused a regression in a codebase. It takes a list of failed tests from a suite and maps each failure to the corresponding commit and author responsible.
- Input Handling: Accepts a list of failed tests in a suite.
- Data Export: Exports details of unit tests, commits, and authors.
- Performance Optimization: Optimized to work efficiently with large repositories.
- OpenJDK: Ensure a JDK is installed.
- Environment Variable: Set
JAVA_HOME
to the JDK installation path. - Gradle: Ensure Gradle is installed and properly configured.
To start using the Regression Finder Tool, clone the repository from GitHub:
git clone https://github.com/sohamviradiya-work/summer-internship-project
Navigate to the project directory and set up the Gradle wrapper:
cd summer-internship-project
gradle wrapper
To create the JSON configuration file for the project, follow the structure and instructions below. Each key-value pair is explained for clarity:
{
"repositoryLink" : null,
"repositoryPath" : "../resources/large-repo",
"branches" : ["master"],
"days" : 20,
"firstCommit" : "0d12dba93927a2e3b480a981ceb9d6c7415eda09",
"testInputFile" : "./tests.csv",
"tests": [
{
"testProject": "calculator",
"testClass" : "org.advanced.AdvancedCalculatorTest",
"testMethod" : "testCosNegative"
},
{
"testProject": "statistics",
"testClass" : "org.advanced.AdvancedStatisticsTest",
"testMethod" : "testCorrelationSimple"
}
],
"testSrcPath" : "/src/test/java/",
"logToConsole": true,
"jiraTickets" : true,
"teamsNotifications" : true,
"reportLastPhase": true,
"resultsPath" : "./results",
"method" : "Bisect"
}
- repositoryLink: Set this only if you need to clone the repository first. Provide the repository's SSH or HTTPS link.
- repositoryPath: Specify the path where the repository is or will be cloned. will be created if not exists.
- branches: List the branches you want to consider. Use an array to include multiple branches.
- days: Indicate the number of days for which commits should be considered.
- firstCommit: Specify the commit hash from which to start considering subsequent commits.
- testInputFile: Path to the CSV file containing tests. If this is null, the tests defined in the 'tests' array will be used.
- testSrcPath: Path to the test source folder relative to the subproject or project root.
- tests: Provide details of tests if 'testInputFile' is null. Each test should have the project, class, and method specified.
- logToConsole: Set to
true
to print logs to the terminal, orfalse
to send them to a file at./results/.log
. - resultsPath: Specify where to store the results, folder will be cleaned before execution and created if not exists.
- jiraTickets: Boolean flag to create JIRA tickets.
- reportLastPhase: Boolean flag to report tests which are failing since the first commit.
- teamsNotifications: Boolean flag to send notifications to Microsoft Teams.
- method: Choose the method to use; options include 'Bisect', 'Linear', or 'Batch XX'.
-
JIRA Configuration:
- JIRA_MAIL: The email address associated with your JIRA account. This email will be used to report issues.
- JIRA_TOKEN: The API token generated from your JIRA account for authentication.
- JIRA_SERVER: The base URL of your JIRA server, where your JIRA instance is hosted (e.g.,
https://your-domain.atlassian.net
). - JIRA_PROJECT_KEY: The key of the JIRA project you're working with (e.g.,
PROJ
). - JIRA_ISSUE_TYPE: The ID of the issue type in JIRA that you want to create (e.g.,
Bug
orTask
). - JIRA_ISSUE_TRANSITION: The ID of the transition action you want to perform on a JIRA issue (e.g.,
31
for "Start Progress").
-
Teams Bot Configuration:
- TEAMS_BOT_API_URL: The endpoint URL of your Microsoft Teams bot. This is where your bot's API is hosted (e.g.,
https://your-bot.azurewebsites.net/
).
- TEAMS_BOT_API_URL: The endpoint URL of your Microsoft Teams bot. This is where your bot's API is hosted (e.g.,
- Create a
.env
file: This file should be placed in the root directory of the project. - Format: Use the format
KEY=VALUE
for each environment variable.
JIRA_MAIL=your-email@example.com
JIRA_TOKEN=your-jira-api-token
JIRA_SERVER=https://your-domain.atlassian.net
JIRA_PROJECT_KEY=PROJ
JIRA_ISSUE_TYPE=Bug
JIRA_ISSUE_TRANSITION=31
TEAMS_BOT_API_URL=https://your-bot.azurewebsites.net/
Execute the tool using the following command:
./gradlew :regression-finder:run
The results will be generated and stored in a CSV file located at ./results/blame.csv
.
- Role: Manages Git-related operations.
- Responsibilities:
- Checking out specific commits.
- Checking if a file has changed between commits.
- Role: Manages Gradle-related operations.
- Responsibilities:
- Syncing project dependencies.
- Running tests.
- Building the project.
- Role: Acts as a coordinator between GitWorker and GradleWorker.
- Responsibilities:
- Commands GradleWorker to sync dependencies if
build.gradle
has changed. - Distributes commands from RegressionFinder to GitWorker and GradleWorker.
- Commands GradleWorker to sync dependencies if
- Role: Implements the core algorithm for finding regressions.
- Subtypes:
- BisectRegressionFinder
- BatchRegressionFinder
- LinearRegressionFinder
- Responsibilities:
- Provides commands to run specific tests for specific commits.
- Processes test results.
- If a regression is found, passes findings to BlameWriter with test and commit information.
- Repeats the process for another commit with a reduced test set according to the algorithm.
- Role: Handles the output of regression findings.
- Responsibilities:
- Passes results to configured output streams like Jira Ticket Writer, CSV File Writer, and MS Teams Client.
- Role: Writes results to a CSV file.
- Responsibilities:
- Converts results to CSV row format.
- Writes results to a file.
- Role: Manages Jira-related operations.
- Responsibilities:
- Converts results to Jira tickets.
- Assigns the ticket to the commit author using Jira APIs.
- Role: Manages Microsoft Teams notifications.
- Responsibilities:
- Converts results to MS Teams messages.
- Sends messages to the commit author using Teams APIs.
- Role: Initializes and orchestrates the regression finding process.
- Responsibilities:
- Fetching tests from the configuration.
- Fetching commits that satisfy given criteria.
- Providing tests and commits to RegressionFinder.
- Description:
- Tests each commit one by one in a linear sequence to find the commit that introduced the regression.
- Process:
- Check out and test each commit sequentially from the newest to the oldest.
- Stop when the first commit that passes the test is found.
- The next commit after the passing one is identified as the one causing the regression.
- Advantages:
- Guarantees finding the exact commit.
- Disadvantages:
- Can be very time-consuming for a large number of commits, as every commit must be tested.
- Description:
- Tests commits in groups (batches) to narrow down a smaller set that may contain the regression. Once a problematic batch is identified, individual commits within that batch are tested.
- Process:
- Divide the range of commits into smaller batches.
- Check out and test the first commit of each batch, starting from the newest to the oldest.
- Identify the batch containing the regression (where the first commit starts passing).
- Within the identified batch, run the linear algorithm to find the exact commit.
- Time Complexity:
- O(T(B + N/B)), where B is the batch size, and N is the number of commits. Best results are obtained for B ~ √N i.e. O(√N).
- Advantages:
- Reduces the number of initial tests by testing in batches.
- Disadvantages:
- Can be slow if the regression-causing commit is far in the past.
- May miss the regression if some failing commits temporarily pass the tests.
- Description:
- Uses a binary search approach to identify the regression commit, halving the search space each time based on the test results.
- Process:
- Identify the range of commits to be tested (from a known good commit to a known bad commit).
- Check out and test the midpoint commit.
- If the midpoint commit passes, discard the past half (good commits) and continue searching in the future half.
- If the midpoint commit fails, discard the future half (bad commits) and continue searching in the past half.
- Repeat until the exact commit causing the regression is found.
- Time Complexity:
- O(T log(N)), where T is the number of tests and N is the number of commits.
- Advantages:
- Efficiently narrows down the range of commits.
- Disadvantages:
- Can produce large commit jumps, leading to longer build and dependency syncing times.
- May miss the regression if some failing commits temporarily pass the tests.
- Bisect Regression Finding: Best for large ranges of commits when quick narrowing down is needed.
- Linear Regression Finding: Suitable for smaller ranges of commits or when accuracy is more important than efficiency.
- Batch Regression Finding: Better speed than Linear Regression Finding, with smaller commit jumps and higher accuracy than Bisect Regression Finding.