
[cocom] Add repository level analysis via lizard #39

Merged
merged 1 commit into from
Jun 27, 2019

inishchith
Contributor

@inishchith inishchith commented Jun 18, 2019

A repository-level analysis implementation for the CoCom backend via Lizard.
This implementation doesn't break incremental fetches (which was one of the issues with the implementation proposed in #38).

Evaluation:

| Repository | Number of Commits | *File Level (min) | Repository Level (min) |
|---|---|---|---|
| chaoss/grimoirelab-perceval | 1387 | 23.65 | 27.97 |
| chaoss/grimoirelab-sirmordred | 869 | 9.69 | 4.27 |
| chaoss/grimoirelab-graal | 169 | 1.73 | 0.90 |

@valeriocos Please do have a look when you get time. Thanks :)

WIP:

- Tests
- Docstrings

Closes #36

Signed-off-by: inishchith <inishchith@gmail.com>

@inishchith inishchith force-pushed the repository_level_cocom_lizard branch 6 times, most recently from d341d05 to 3f8a84d June 20, 2019 05:17
@inishchith
Contributor Author

@valeriocos After considering only the master branch, the results are as follows:

| Repository | Number of Commits | *File Level (mm:ss) | Repository Level (mm:ss) |
|---|---|---|---|
| chaoss/grimoirelab-perceval | 1394 | 26:22 | 26:56 |
| chaoss/grimoirelab-sirmordred | 869 | 08:51 | 3:51 |
| chaoss/grimoirelab-graal | 171 | 2:24 | 1:04 |
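To compare the two columns directly, the mm:ss timings above can be converted to minutes and turned into a speedup ratio (file-level time divided by repository-level time, so a value above 1 means the repository-level run was faster). The helper names below are hypothetical and not part of Graal; the data is exactly the set of numbers reported in the table:

```python
def to_minutes(mm_ss: str) -> float:
    """Convert an 'mm:ss' duration string to decimal minutes."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) + int(seconds) / 60


# (file-level, repository-level) timings on master, as reported above
timings = {
    "chaoss/grimoirelab-perceval": ("26:22", "26:56"),
    "chaoss/grimoirelab-sirmordred": ("08:51", "3:51"),
    "chaoss/grimoirelab-graal": ("2:24", "1:04"),
}


def speedups(data: dict) -> dict:
    """Return file-level time / repository-level time for each repository."""
    return {
        repo: round(to_minutes(file_level) / to_minutes(repo_level), 2)
        for repo, (file_level, repo_level) in data.items()
    }


for repo, ratio in speedups(timings).items():
    print(f"{repo}: {ratio}x")
```

With these numbers the ratios come out at about 0.98 for perceval (slightly slower at repository level), 2.3 for sirmordred and 2.25 for graal.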

Also, please let me know what you think of the changes. Thanks :)

Member

@valeriocos valeriocos left a comment

Overall LGTM. I left a comment about the use of the typing library.

@inishchith inishchith force-pushed the repository_level_cocom_lizard branch from 3f8a84d to e859c4f June 21, 2019 08:17
@inishchith
Contributor Author

@valeriocos I have a question.
Currently, I haven't incorporated the cloc-related analysis data (blanks, comments).
I was thinking of executing cloc at the file level and adding that information here. If you agree, I can measure the execution time after making the changes and report it here.
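One way to collect blanks and comments per file is to run cloc on each file and read its JSON report. A minimal sketch, assuming the `cloc` binary is on the PATH and that its `--json` report holds the totals under a `SUM` key with `blank`, `comment` and `code` fields; the function names are hypothetical, and Graal's own CLOC analyzer may call and parse cloc differently:

```python
import json
import subprocess

EMPTY_COUNTS = {"blanks": 0, "comments": 0, "loc": 0}


def parse_cloc_json(report: str) -> dict:
    """Extract blank, comment and code line counts from a cloc --json report."""
    if not report.strip():
        # Assumption: cloc may print nothing when it finds no recognizable source
        return dict(EMPTY_COUNTS)
    # Assumption: totals live under the 'SUM' entry of the JSON report
    totals = json.loads(report).get("SUM")
    if totals is None:
        return dict(EMPTY_COUNTS)
    return {
        "blanks": totals.get("blank", 0),
        "comments": totals.get("comment", 0),
        "loc": totals.get("code", 0),
    }


def cloc_file(file_path: str) -> dict:
    """Run cloc on a single file (cloc must be installed) and parse its report."""
    completed = subprocess.run(
        ["cloc", "--json", file_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cloc_json(completed.stdout)
```

This spawns one cloc process per file, so its cost grows with the number of files analyzed.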

Let me know what you think :)

@valeriocos
Member

Good point @inishchith. Yes, please try to integrate the cloc info, thanks!

@inishchith inishchith force-pushed the repository_level_cocom_lizard branch from e859c4f to 0a05bb6 June 21, 2019 09:02
@inishchith
Contributor Author

@valeriocos Thanks. Done!
Can you have a look again?

@inishchith inishchith marked this pull request as ready for review June 21, 2019 11:44
@inishchith inishchith changed the title from "[cocom] Add repository level analysis via lizard (WIP)" to "[cocom] Add repository level analysis via lizard" Jun 22, 2019
@inishchith
Contributor Author

inishchith commented Jun 22, 2019

@valeriocos Adding cloc at the file level has considerably slowed down the repository-level analysis.

Evaluation:

| Repository | Number of Commits | *File Level (mm:ss) | Repository Level (mm:ss) |
|---|---|---|---|
| chaoss/grimoirelab-perceval | 1394 | --- | --- |
| chaoss/grimoirelab-graal | 171 | 2:22 | 28:35 |

I can think of two options now:

  1. Incorporate the current implementation, then evaluate and optimize it during the later phase (3rd Coding Phase) as discussed; this gives us the complete data for metrics visualization. (preferred)
  2. Or leave out comments and blanks for now (because of the execution-time issue) and incorporate them in the later phase (3rd Coding Phase).

Let me know what you think :)

Member

@valeriocos valeriocos left a comment

LGTM. @inishchith, please squash the commits and improve the commit message and description.
Thanks!

@valeriocos
Member

> Incorporate current implementation and then evaluate, optimize current implementation during the later phases(3rd Coding Phase) as discussed, which helps us with the entire data for metrics visualization. ( preferred )

+1! thanks!

@inishchith
Contributor Author

@valeriocos Thanks for the review.
I've made the changes requested above. Please have a look and let me know if any more changes are required.

Thanks!

'ccn': analysis.CCN,
'tokens': analysis.token_count,
'num_funs': len(analysis.function_list),
'file_path': analysis.filename,
Member

The local repository path should be stripped out, for instance with:
'file_path': analysis.filename.replace(repository_path, '')

WDYT?
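`str.replace` removes every occurrence of `repository_path` and leaves the leading separator in place, so the stored path starts with `/`. A minimal alternative sketch, assuming POSIX paths, computes the path relative to the repository root with `posixpath.relpath`; the helper name and the example paths are hypothetical:

```python
import posixpath


def strip_repository_path(file_path: str, repository_path: str) -> str:
    """Return file_path relative to repository_path (POSIX paths assumed)."""
    return posixpath.relpath(file_path, repository_path)


# Hypothetical worktree location and analyzed file
repo = "/tmp/worktrees/graal"
path = "/tmp/worktrees/graal/graal/backends/core/cocom.py"

print(path.replace(repo, ""))             # /graal/backends/core/cocom.py
print(strip_repository_path(path, repo))  # graal/backends/core/cocom.py
```

Either form works as long as it is applied consistently, since the stored path may later be compared against other paths, such as the files listed in a commit.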

Contributor Author

Yes, it should be!
I'll update the PR with the change.

@valeriocos
Member

valeriocos commented Jun 25, 2019

We could add to this PR the code to identify the files modified in a commit when executing the analysis at the repository level. What do you think?

The change shouldn't be too big: it would just be a matter of passing the commit (or the list of files in the commit) and then marking those files in the result of the analysis. See an attempt below:

slimbook@slimbook-KATANA:~/Escritorio/sources/graal$ git diff
diff --git a/graal/backends/core/analyzers/lizard.py b/graal/backends/core/analyzers/lizard.py
index d7d210e..0ee828e 100644
--- a/graal/backends/core/analyzers/lizard.py
+++ b/graal/backends/core/analyzers/lizard.py
@@ -89,7 +89,7 @@ class Lizard(Analyzer):
         result['funs'] = funs_data
         return result
 
-    def __analyze_repository(self, repository_path, details):
+    def __analyze_repository(self, repository_path, commit, details):
         """Add code complexity information for a given repository
         using Lizard and CLOC.
 
@@ -112,12 +112,14 @@ class Lizard(Analyzer):
 
         for analysis in repository_analysis:
             cloc_analysis = cloc.analyze(file_path=analysis.filename)
+            file_path = analysis.filename.replace(repository_path, '')
             result = {
                 'loc': analysis.nloc,
                 'ccn': analysis.CCN,
                 'tokens': analysis.token_count,
                 'num_funs': len(analysis.function_list),
-                'file_path': analysis.filename,
+                'file_path': file_path,
+                'in_commit': True, # check file in commit.files.file
                 'blanks': cloc_analysis['blanks'],
                 'comments': cloc_analysis['comments']
             }
@@ -140,7 +142,8 @@ class Lizard(Analyzer):
         details = kwargs['details']
 
         if kwargs.get('repository_level', False):
-            result = self.__analyze_repository(kwargs["repository_path"], details)
+            commit = kwargs['commit']
+            result = self.__analyze_repository(kwargs["repository_path"], commit, details)
         else:
             result = self.__analyze_file(kwargs['file_path'], details)
 
diff --git a/graal/backends/core/cocom.py b/graal/backends/core/cocom.py
index 39e1681..8a1aa68 100644
--- a/graal/backends/core/cocom.py
+++ b/graal/backends/core/cocom.py
@@ -190,7 +190,7 @@ class CoCom(Graal):
                 file_info.update({'file_path': file_path})
                 analysis.append(file_info)
         else:
-            analysis = self.analyzer.analyze(self.worktreepath)
+            analysis = self.analyzer.analyze(self.worktreepath, commit)
         return analysis
 
     def _post(self, commit):
@@ -261,7 +261,7 @@ class RepositoryAnalyzer:
         self.details = details
         self.lizard = Lizard()
 
-    def analyze(self, repository_path):
+    def analyze(self, repository_path, commit):
         """Analyze the content of a repository using CLOC and Lizard.
 
         :param repository_path: repository path
@@ -282,6 +282,7 @@ class RepositoryAnalyzer:
         kwargs = {
             'repository_path': repository_path,
             'repository_level': True,
+            'commit': commit,
             'details': self.details
         }
         lizard_analysis = self.lizard.analyze(**kwargs)
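The `'in_commit': True` placeholder in the diff could be filled by comparing each analyzed file against the paths touched by the commit. A minimal sketch, assuming the commit is a Perceval-style Git item whose `files` entries carry the path under a `file` key (as the `commit.files.file` comment suggests) and that `file_path` was built with `str.replace`, so it may start with `/`; the function name and sample data are hypothetical:

```python
def mark_files_in_commit(analysis: list, commit: dict) -> list:
    """Set 'in_commit' on each analyzed file that the given commit touched."""
    # Assumption: paths in commit['files'][i]['file'] are relative to the repo root
    touched = {entry["file"] for entry in commit.get("files", [])}
    for file_info in analysis:
        # file_path may start with '/' after str.replace(repository_path, ''),
        # so strip it before comparing with git's relative paths
        relative = file_info["file_path"].lstrip("/")
        file_info["in_commit"] = relative in touched
    return analysis


# Hypothetical sample data
commit = {"files": [{"file": "graal/backends/core/cocom.py"}]}
analysis = [
    {"file_path": "/graal/backends/core/cocom.py", "loc": 120},
    {"file_path": "/graal/graal.py", "loc": 300},
]

for info in mark_files_in_commit(analysis, commit):
    print(info["file_path"], info["in_commit"])
```

Building a set of touched paths first keeps each lookup constant-time, which matters for repositories with many files. Renamed files may need extra handling, since git reports more than one path for them.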

@inishchith
Contributor Author

@valeriocos The changes look good to me, and hopefully we can use this information to provide some file-level insights, as mentioned in inishchith/gsoc#11 (comment).

I shall update the PR now. Thanks!

@inishchith inishchith force-pushed the repository_level_cocom_lizard branch from 6a59a9b to b38400e June 25, 2019 17:01
Member

@valeriocos valeriocos left a comment

Thank you @inishchith for the PR. I left a minor comment about a docstring; once it's fixed, the PR can be merged.

Add repository level analysis by adding a new category of analyzer(RepositoryAnalyzer) which executes lizard at repository level for cocom backend.
Add and alter tests for corresponding changes.

Signed-off-by: inishchith <inishchith@gmail.com>
@inishchith inishchith force-pushed the repository_level_cocom_lizard branch from b38400e to 26921fe June 27, 2019 07:19
@valeriocos valeriocos merged commit 26921fe into chaoss:master Jun 27, 2019
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[cocom] Evaluating results with repository level analysis
2 participants