Bugfix/bb 2 way diff fix #1172

MaxHoecker · 2024-08-23T14:59:56Z

User description

Fixes #1131

Gets the 2-way file diff between the merging branch head and the best common ancestor

PR Type

Bug fix, Enhancement

Description

Fixed issue Bitbucket server uses 2-way diff but Codium AI uses 3-way diff #1131: Implemented correct 2-way file diff calculation for Bitbucket Server
Enhanced BitbucketServerProvider class to find the best common ancestor (merge base) between branches
Updated get_diff_files method to use the correct base commit for diff calculation
Added comprehensive unit tests to verify the correct behavior of the 2-way diff functionality
Improved code flexibility by allowing injection of a custom bitbucket_client in the constructor

Changes walkthrough 📝

Relevant files

Enhancement

bitbucket_server_provider.py `Improve BitbucketServerProvider diff calculation` pr_agent/git_providers/bitbucket_server_provider.py Added optional `bitbucket_client` parameter to the constructor Implemented `get_best_common_ancestor` method to find the merge base Updated `get_diff_files` to use the best common ancestor for diff calculation	+23/-4

Tests

test_bitbucket_provider.py `Add comprehensive tests for BitbucketServerProvider` tests/unittest/test_bitbucket_provider.py Added new test class `TestBitbucketServerProvider` Implemented multiple test cases for `get_diff_files` method Added mock implementations for Bitbucket API calls	+200/-1

💡 PR-Agent usage:
Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

codiumai-pr-agent-pro · 2024-08-23T15:00:27Z

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
🔀 Multiple PR themes Sub-PR theme: Implement 2-way diff calculation for Bitbucket Server Relevant files: pr_agent/git_providers/bitbucket_server_provider.py Sub-PR theme: Add unit tests for Bitbucket Server 2-way diff functionality Relevant files: tests/unittest/test_bitbucket_provider.py
⚡ Key issues to review Code Complexity The `get_best_common_ancestor` method might be difficult to understand and maintain. Consider adding more comments or breaking it down into smaller, more descriptive functions.

codiumai-pr-agent-pro · 2024-08-23T15:01:19Z

PR Code Suggestions ✨

Latest suggestions up to cf14e45

Category	Suggestion	Score
Maintainability	Use parameterized testing to reduce code duplication in API version tests Consider using a parameterized test for the different API versions to reduce code duplication and improve test maintainability. tests/unittest/test_bitbucket_provider.py [233-297] -def test_get_diff_files_multi_merge_diverge_60(self): - bitbucket_client = self.get_multi_merge_diverge_mock_client(60) +import pytest + +@pytest.mark.parametrize("api_version, expected_file_content", [ + (60, 'file\nwith\nmultiple\nlines\nto\nemulate\na\nreal\nfile'), + (70, 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile'), + (816, 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile') +]) +def test_get_diff_files_multi_merge_diverge(self, api_version, expected_file_content): + bitbucket_client = self.get_multi_merge_diverge_mock_client(api_version) provider = BitbucketServerProvider( "https://git.onpreminstance.com/projects/AAA/repos/my-repo/pull-requests/1", bitbucket_client=bitbucket_client ) expected = [ FilePatchInfo( - 'file\nwith\nmultiple\nlines\nto\nemulate\na\nreal\nfile', - 'readme\nwithout\nsome\nlines\nto\nsimulate\na\nreal\nfile', - '--- \n+++ \n@@ -1,9 +1,9 @@\n-file\n-with\n-multiple\n+readme\n+without\n+some\n lines\n to\n-emulate\n+simulate\n a\n real\n file', - 'Readme.md', - edit_type=EDIT_TYPE.MODIFIED, - ) - ] - - actual = provider.get_diff_files() - - assert actual == expected - -def test_get_diff_files_multi_merge_diverge_70(self): - bitbucket_client = self.get_multi_merge_diverge_mock_client(70) - - provider = BitbucketServerProvider( - "https://git.onpreminstance.com/projects/AAA/repos/my-repo/pull-requests/1", - bitbucket_client=bitbucket_client - ) - - expected = [ - FilePatchInfo( - 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile', + expected_file_content, 'readme\nwithout\nsome\nlines\nto\nsimulate\na\nreal\nfile', '--- \n+++ \n@@ -1,9 +1,9 @@\n-file\n-with\n+readme\n+without\n some\n lines\n to\n-emulate\n+simulate\n a\n real\n file', 'Readme.md', edit_type=EDIT_TYPE.MODIFIED, ) ] actual = provider.get_diff_files() assert actual == expected -def test_get_diff_files_multi_merge_diverge_816(self): - bitbucket_client = self.get_multi_merge_diverge_mock_client(816) - - provider = BitbucketServerProvider( - "https://git.onpreminstance.com/projects/AAA/repos/my-repo/pull-requests/1", - bitbucket_client=bitbucket_client - ) - - expected = [ - FilePatchInfo( - 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile', - 'readme\nwithout\nsome\nlines\nto\nsimulate\na\nreal\nfile', - '--- \n+++ \n@@ -1,9 +1,9 @@\n-file\n-with\n+readme\n+without\n some\n lines\n to\n-emulate\n+simulate\n a\n real\n file', - 'Readme.md', - edit_type=EDIT_TYPE.MODIFIED, - ) - ] - - actual = provider.get_diff_files() - - assert actual == expected - Suggestion importance[1-10]: 8 Why: Parameterized testing significantly improves test maintainability and reduces code duplication, which is a substantial improvement in the testing framework.	8
Maintainability	Extract version comparison logic into separate methods for improved readability Consider extracting the version comparison logic into a separate method for better readability and maintainability. pr_agent/git_providers/bitbucket_server_provider.py [162-186] -if self.bitbucket_api_version is not None and self.bitbucket_api_version >= LooseVersion("8.16"): +def _get_base_sha(self): + if self._is_version_gte("8.16"): + return self._get_base_sha_for_version_816_and_above() + else: + return self._get_base_sha_for_version_below_816() + +def _is_version_gte(self, version): + return self.bitbucket_api_version is not None and self.bitbucket_api_version >= LooseVersion(version) + +def _get_base_sha_for_version_816_and_above(self): try: - base_sha = self.bitbucket_client.get(self._get_merge_base())['id'] - except Exception as e: - get_logger().error(f"Failed to get the best common ancestor for PR: {self.pr_url}, \nerror: {e}") - raise e -else: + return self.bitbucket_client.get(self._get_merge_base())['id'] + except Exception as error: + get_logger().error(f"Failed to get the best common ancestor for PR: {self.pr_url}, \nerror: {error}") + raise error + +def _get_base_sha_for_version_below_816(self): source_commits_list = list(self.bitbucket_client.get_pull_requests_commits( self.workspace_slug, self.repo_slug, self.pr_num )) - # if Bitbucket api version is None or < 7.0 then do a simple diff with a guaranteed common ancestor base_sha = source_commits_list[-1]['parents'][0]['id'] - # if Bitbucket api version is 7.0-8.15 then use 2-way diff functionality for the base_sha - if self.bitbucket_api_version is not None and self.bitbucket_api_version >= LooseVersion("7.0"): + if self._is_version_gte("7.0"): try: destination_commits = list( self.bitbucket_client.get_commits(self.workspace_slug, self.repo_slug, base_sha, self.pr.toRef['latestCommit'])) base_sha = self.get_best_common_ancestor(source_commits_list, destination_commits, base_sha) - except Exception as e: + except Exception as error: get_logger().error( - f"Failed to get the commit list for calculating best common ancestor for PR: {self.pr_url}, \nerror: {e}") - raise e + f"Failed to get the commit list for calculating best common ancestor for PR: {self.pr_url}, \nerror: {error}") + raise error + return base_sha +base_sha = self._get_base_sha() + Suggestion importance[1-10]: 7 Why: Extracting complex logic into separate methods improves code maintainability and readability, which is a valuable improvement but not critical.	7
Best practice	Use more descriptive variable names in exception handling Consider using a more descriptive variable name instead of 'e' in the exception handling blocks. This will make the code more readable and maintainable. pr_agent/git_providers/bitbucket_server_provider.py [165-167] -except Exception as e: - get_logger().error(f"Failed to get the best common ancestor for PR: {self.pr_url}, \nerror: {e}") - raise e +except Exception as error: + get_logger().error(f"Failed to get the best common ancestor for PR: {self.pr_url}, \nerror: {error}") + raise error Suggestion importance[1-10]: 6 Why: Using more descriptive variable names improves code readability, but it's a minor improvement that doesn't address major issues or bugs.	6

Previous suggestions

✅ Suggestions up to commit 2a9e3ee

Category	Suggestion	Score
Best practice	✅ Refactor the mock method to use a dictionary for improved efficiency and maintainability Suggestion Impact: The commit implemented the suggestion by replacing the series of if-elif statements with a dictionary (content_map) and using the .get() method for efficient lookup code diff: - if at == '9c1cffdd9f276074bfb6fb3b70fbee62d298b058': - return 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile\n' - elif at == '2a1165446bdf991caf114d01f7c88d84ae7399cf': - return 'file\nwith\nmultiple \nlines\nto\nemulate\na\nfake\nfile\n' - elif at == 'f617708826cdd0b40abb5245eda71630192a17e3': - return 'file\nwith\nmultiple \nlines\nto\nemulate\na\nreal\nfile\n' - elif at == 'cb68a3027d6dda065a7692ebf2c90bed1bcdec28': - return 'file\nwith\nsome\nchanges\nto\nemulate\na\nreal\nfile\n' - elif at == '1905dcf16c0aac6ac24f7ab617ad09c73dc1d23b': - return 'file\nwith\nsome\nlines\nto\nemulate\na\nfake\ntest\n' - elif at == 'ae4eca7f222c96d396927d48ab7538e2ee13ca63': - return 'readme\nwithout\nsome\nlines\nto\nsimulate\na\nreal\nfile' - elif at == '548f8ba15abc30875a082156314426806c3f4d97': - return 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile' - return '' + content_map = { + '9c1cffdd9f276074bfb6fb3b70fbee62d298b058': 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile\n', + '2a1165446bdf991caf114d01f7c88d84ae7399cf': 'file\nwith\nmultiple \nlines\nto\nemulate\na\nfake\nfile\n', + 'f617708826cdd0b40abb5245eda71630192a17e3': 'file\nwith\nmultiple \nlines\nto\nemulate\na\nreal\nfile\n', + 'cb68a3027d6dda065a7692ebf2c90bed1bcdec28': 'file\nwith\nsome\nchanges\nto\nemulate\na\nreal\nfile\n', + '1905dcf16c0aac6ac24f7ab617ad09c73dc1d23b': 'file\nwith\nsome\nlines\nto\nemulate\na\nfake\ntest\n', + 'ae4eca7f222c96d396927d48ab7538e2ee13ca63': 'readme\nwithout\nsome\nlines\nto\nsimulate\na\nreal\nfile', + '548f8ba15abc30875a082156314426806c3f4d97': 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile' + } + + return content_map.get(at, '') Consider using a more efficient data structure, such as a dictionary, for the `mock_get_content_of_file` method. This would improve the readability and maintainability of the code, especially as the number of test cases grows. tests/unittest/test_bitbucket_provider.py [25-40] def mock_get_content_of_file(self, project_key, repository_slug, filename, at=None, markup=None): - if at == '9c1cffdd9f276074bfb6fb3b70fbee62d298b058': - return 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile\n' - elif at == '2a1165446bdf991caf114d01f7c88d84ae7399cf': - return 'file\nwith\nmultiple \nlines\nto\nemulate\na\nfake\nfile\n' - elif at == 'f617708826cdd0b40abb5245eda71630192a17e3': - return 'file\nwith\nmultiple \nlines\nto\nemulate\na\nreal\nfile\n' - elif at == 'cb68a3027d6dda065a7692ebf2c90bed1bcdec28': - return 'file\nwith\nsome\nchanges\nto\nemulate\na\nreal\nfile\n' - elif at == '1905dcf16c0aac6ac24f7ab617ad09c73dc1d23b': - return 'file\nwith\nsome\nlines\nto\nemulate\na\nfake\ntest\n' - elif at == 'ae4eca7f222c96d396927d48ab7538e2ee13ca63': - return 'readme\nwithout\nsome\nlines\nto\nsimulate\na\nreal\nfile' - elif at == '548f8ba15abc30875a082156314426806c3f4d97': - return 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile' - return '' + content_map = { + '9c1cffdd9f276074bfb6fb3b70fbee62d298b058': 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile\n', + '2a1165446bdf991caf114d01f7c88d84ae7399cf': 'file\nwith\nmultiple \nlines\nto\nemulate\na\nfake\nfile\n', + 'f617708826cdd0b40abb5245eda71630192a17e3': 'file\nwith\nmultiple \nlines\nto\nemulate\na\nreal\nfile\n', + 'cb68a3027d6dda065a7692ebf2c90bed1bcdec28': 'file\nwith\nsome\nchanges\nto\nemulate\na\nreal\nfile\n', + '1905dcf16c0aac6ac24f7ab617ad09c73dc1d23b': 'file\nwith\nsome\nlines\nto\nemulate\na\nfake\ntest\n', + 'ae4eca7f222c96d396927d48ab7538e2ee13ca63': 'readme\nwithout\nsome\nlines\nto\nsimulate\na\nreal\nfile', + '548f8ba15abc30875a082156314426806c3f4d97': 'file\nwith\nsome\nlines\nto\nemulate\na\nreal\nfile' + } + return content_map.get(at, '') Apply this suggestion Suggestion importance[1-10]: 9 Why: This suggestion greatly improves code maintainability and readability, especially as the number of test cases grows, making it a valuable refactoring.	9
Maintainability	Improve variable naming for better code readability and understanding Consider using a more descriptive variable name for `guaranteed_common_ancestor`. A name like `oldest_source_commit` might better describe what this variable represents, improving code readability. pr_agent/git_providers/bitbucket_server_provider.py [156-160] source_commits_list = list(self.bitbucket_client.get_pull_requests_commits(self.workspace_slug, self.repo_slug, self.pr_num)) -guaranteed_common_ancestor = source_commits_list[-1]['parents'][0]['id'] -destination_commits = list(self.bitbucket_client.get_commits(self.workspace_slug, self.repo_slug, guaranteed_common_ancestor, self.pr.toRef['latestCommit'])) +oldest_source_commit = source_commits_list[-1]['parents'][0]['id'] +destination_commits = list(self.bitbucket_client.get_commits(self.workspace_slug, self.repo_slug, oldest_source_commit, self.pr.toRef['latestCommit'])) -base_sha = self.get_best_common_ancestor(source_commits_list, destination_commits, guaranteed_common_ancestor) +base_sha = self.get_best_common_ancestor(source_commits_list, destination_commits, oldest_source_commit) Apply this suggestion Suggestion importance[1-10]: 8 Why: The suggested variable name change significantly improves code readability and understanding, which is crucial for maintainability.	8
Performance	✅ Optimize the creation of the destination_commit_hashes set using a set comprehension and union operation Suggestion Impact: The suggestion was directly implemented in the commit. The code was changed to use a set comprehension with a union operation to create destination_commit_hashes, exactly as suggested. code diff: - destination_commit_hashes = {commit['id'] for commit in destination_commits_list} - destination_commit_hashes.add(guaranteed_common_ancestor) + destination_commit_hashes = {commit['id'] for commit in destination_commits_list} \| {guaranteed_common_ancestor} The `get_best_common_ancestor` method could be optimized by using a set comprehension instead of a loop for creating `destination_commit_hashes`. This would make the code more concise and potentially more efficient. pr_agent/git_providers/bitbucket_server_provider.py [140-143] @staticmethod def get_best_common_ancestor(source_commits_list, destination_commits_list, guaranteed_common_ancestor) -> str: - destination_commit_hashes = {commit['id'] for commit in destination_commits_list} - destination_commit_hashes.add(guaranteed_common_ancestor) + destination_commit_hashes = {commit['id'] for commit in destination_commits_list} \| {guaranteed_common_ancestor} Apply this suggestion Suggestion importance[1-10]: 7 Why: This suggestion offers a minor performance improvement and enhances code readability by using more concise Python syntax.	7
Enhancement	Simplify the initialization of the Bitbucket client by using a default value in the method signature Consider using a default value for `bitbucket_client` instead of `None`. This would allow you to create the `Bitbucket` instance directly in the parameter list, reducing the need for an additional conditional statement in the method body. pr_agent/git_providers/bitbucket_server_provider.py [17-19] def __init__( self, pr_url: Optional[str] = None, incremental: Optional[bool] = False, - bitbucket_client: Optional[Bitbucket] = None, + bitbucket_client: Optional[Bitbucket] = Bitbucket(url=self._parse_bitbucket_server(pr_url), + token=get_settings().get("BITBUCKET_SERVER.BEARER_TOKEN", None)), ): Apply this suggestion Suggestion importance[1-10]: 5 Why: The suggestion improves code readability but may not be the best practice as it could lead to unexpected behavior if the URL is not provided.	5

mrT23 · 2024-08-25T14:19:50Z

@MaxHoecker
Does this PR support the different calculation methods that were used in Bitbucket Server 7.0 before and after ?

In other words, is it backward (and forward) compatible ?

MaxHoecker · 2024-08-26T13:50:38Z

@mrT23
It does not support the 3-way. Bitbucket Server 7.0 came out over 4 years ago, are you suggesting an additional config value to toggle between them?

mrT23 · 2024-08-26T17:15:15Z

@MaxHoecker I dont know.
I am not a Bitbucket server user.
Just trying to make sure this is an improvement.

@MarkRx are you interested in reviewing this PR ?

MaxHoecker · 2024-08-26T17:48:19Z

Added a config to allow the legacy 3-way diff. Although I think it's unlikely anyone using this and bitbucket server is using Bitbucket Server 6.0 or older, so I defaulted to the new diff calculation method.

MarkRx · 2024-08-27T14:44:10Z

Looks like there is an API endpoint you can call to get the base line commit for a 2-way diff. It may be better to use that to match what BB is doing. If the API client doesn't expose the endpoint you can call it manually like I did here.

As for supporting a 3-way diff unfortunately BB forces the 2-way diff from version 7+. Given that Enterprises should be keeping up to date in theory support for 6 with a 3-way diff is not necessary.

MaxHoecker · 2024-08-28T17:14:59Z

Unfortunately that endpoint is not available in my Bitbucket API version. I'll add a check for the api version and use a different strategy depending on api version though

…egy depending on API version, and expanded unit test cases

MaxHoecker · 2024-08-28T22:00:40Z

Changes pushed, @MarkRx please review when you get the chance 😄

MarkRx · 2024-08-29T19:20:28Z

pr_agent/git_providers/bitbucket_server_provider.py

@@ -407,3 +451,6 @@ def get_pr_labels(self, update=False):

    def _get_pr_comments_path(self):
        return f"rest/api/latest/projects/{self.workspace_slug}/repos/{self.repo_slug}/pull-requests/{self.pr_num}/comments"
+
+    def _get_best_common_ancestor(self):


This method name is confusing

completely agree, will fix

MarkRx · 2024-08-29T19:30:34Z

pr_agent/git_providers/bitbucket_server_provider.py

+        base_sha, head_sha = source_commits_list[-1]['parents'][0]['id'], source_commits_list[0]['id']
+
+        # if Bitbucket api version is greater than or equal to 7.0 then use 2-way diff functionality for the base_sha
+        if self.bitbucket_api_version is not None and self.bitbucket_api_version >= LooseVersion("7.0"):


it may be more clear to split this into 3 cases:
<7
<8.16
8.16+

Unfortunately I don't think this block can be cleanly split into 3 cases like that based on the shared logic between certain cases. I do think I can cut out a call to get the source_commits_list for the >8.16 condition though so I'll at least add that

mrT23 · 2024-09-03T13:24:39Z

ok, thanks for the contribution

hussam789 · 2024-09-04T14:29:10Z

@CodiumAI-Agent /improve
--pr_code_suggestions.num_code_suggestions_per_chunk="5"

mhoecke1 added 2 commits August 23, 2024 10:47

BB server 2 way diff fixes

2beefab

removing unnecessary if check

2a9e3ee

codiumai-pr-agent-pro bot added enhancement New feature or request Bug fix labels Aug 23, 2024

codiumai-pr-agent-pro bot added the Review effort [1-5]: 3 label Aug 23, 2024

implementing PR bot feedback

a99ebf8

MaxHoecker force-pushed the bugfix/bb-2-way-diff-fix branch from 9176c0d to ec5c00d Compare August 26, 2024 17:40

adding config value for old Bitbucket Server diff functionality

0442cdc

MaxHoecker force-pushed the bugfix/bb-2-way-diff-fix branch from ec5c00d to 0442cdc Compare August 26, 2024 21:12

Codium-ai deleted a comment from codiumai-pr-agent-pro bot Aug 27, 2024

implementing more feedback, choosing a different Bitbucket diff strat…

29c5075

…egy depending on API version, and expanded unit test cases

MaxHoecker force-pushed the bugfix/bb-2-way-diff-fix branch from 7539917 to 29c5075 Compare August 28, 2024 21:13

MarkRx reviewed Aug 29, 2024

View reviewed changes

further cleaned up code based on feedback

cf14e45

MarkRx approved these changes Aug 30, 2024

View reviewed changes

mrT23 approved these changes Sep 3, 2024

View reviewed changes

mrT23 merged commit d4d9a7f into Codium-ai:main Sep 3, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bugfix/bb 2 way diff fix #1172

Bugfix/bb 2 way diff fix #1172

MaxHoecker commented Aug 23, 2024 •

edited by codiumai-pr-agent-pro bot

Loading

codiumai-pr-agent-pro bot commented Aug 23, 2024

codiumai-pr-agent-pro bot commented Aug 23, 2024 •

edited by CodiumAI-Agent

Loading

mrT23 commented Aug 25, 2024

MaxHoecker commented Aug 26, 2024

mrT23 commented Aug 26, 2024

MaxHoecker commented Aug 26, 2024

MarkRx commented Aug 27, 2024

MaxHoecker commented Aug 28, 2024

MaxHoecker commented Aug 28, 2024

MarkRx Aug 29, 2024

MaxHoecker Aug 29, 2024

MarkRx Aug 29, 2024

MaxHoecker Aug 29, 2024

mrT23 commented Sep 3, 2024

hussam789 commented Sep 4, 2024

Bugfix/bb 2 way diff fix #1172

Bugfix/bb 2 way diff fix #1172

Conversation

MaxHoecker commented Aug 23, 2024 • edited by codiumai-pr-agent-pro bot Loading

User description

PR Type

Description

Changes walkthrough 📝

codiumai-pr-agent-pro bot commented Aug 23, 2024

PR Reviewer Guide 🔍

codiumai-pr-agent-pro bot commented Aug 23, 2024 • edited by CodiumAI-Agent Loading

PR Code Suggestions ✨

Previous suggestions

mrT23 commented Aug 25, 2024

MaxHoecker commented Aug 26, 2024

mrT23 commented Aug 26, 2024

MaxHoecker commented Aug 26, 2024

MarkRx commented Aug 27, 2024

MaxHoecker commented Aug 28, 2024

MaxHoecker commented Aug 28, 2024

MarkRx Aug 29, 2024

Choose a reason for hiding this comment

MaxHoecker Aug 29, 2024

Choose a reason for hiding this comment

MarkRx Aug 29, 2024

Choose a reason for hiding this comment

MaxHoecker Aug 29, 2024

Choose a reason for hiding this comment

mrT23 commented Sep 3, 2024

hussam789 commented Sep 4, 2024

MaxHoecker commented Aug 23, 2024 •

edited by codiumai-pr-agent-pro bot

Loading

codiumai-pr-agent-pro bot commented Aug 23, 2024 •

edited by CodiumAI-Agent

Loading