Git Density (git-density
) is a tool to analyze git
-repositories with the goal of detecting the source code density.
It was developed during the research phase of the short technical paper and poster "A changeset-based approach to assess source code density and developer efficacy" [1] and has since been extended to support thorough analyses and insights.
To build the application, restore all nuget packages and simply rebuild all projects.
Run GitDensity.exe
, which has an exhaustive command line interface for analyzing repositories. This implementation also includes a reimplementation of git-hours
[2], runnable using GitHours.exe
(with a similar command line interface).
There are also separate command line tools for extracting metrics (GitMetrics.exe
) and smaller utility that unites a few stand-alone commands (GitTools.exe
, see below).
This application relies on an external executable to run clone detection. Currently, it uses a local version of Softwerk's clone detection service [3]. To obtain a copy free for academic use of this tool, please contact sebastian.honel@lnu.se (primarily) or welf.lowe@lnu.se.
You are not required to use the clone detection in order to obtain a notion fo source code density. In order to obtain a rough notion of it, you may use git-tools
which will extract a ratio of net-lines to gross-lines as density.
The clone detection used in git-density
, however, also computes a string similarity which will yield a most-precise approximation of the source code density.
As for git-metrics
, the application relies on another tool that supports currently obtaining software metrics from Java applications.
Metrics are obtained by building the application (for each commit).
Please contact me if you intend to use Git Metrics and require the tool. The tool is free for academic use.
Git Density is a solution that currently features these three applications:
git-density
: A new metric to detect the density of software projects.- When running
git-density
on a repository, it will compute the density metric as well asgit-hours
and also attempt to obtain the project's metrics at each commit usinggit-metrics
. - Since the data produced by
git-density
is exhaustive and not plain, it must use a relational database as backend and does not support (yet) the output to file/stdout. All of its results are stored in the database for each repository. - It is possible to remove all previous analysis results for one repository (please refer to the command-line help).
- When running
git-hours
: A C# reimplementation of git-hours with some more features (like timespans between commits or time spent by each developer)- It comes also with its own command-line interface and supports
JSON
-formatted output. This useful for just analyzing the time spent on a repository. git-hours
is also part of the full analysis as run bygit-density
.
- It comes also with its own command-line interface and supports
git-metrics
: A C# wrapper around another tool that can build Java-based projects and extract common software metrics at each commit for the entire project and for files affected by the commit.- It comes also with its own command-line interface and supports
JSON
-formatted output (likegit-hours
). - It is part of the full analysis of
git-density
as well. - Please note that the standalone CLI interface is not yet fully implemented, although just minor things are missing (planned is a
JSON
-formatted output).
- It comes also with its own command-line interface and supports
git-tools
: A stand-alone application that uses some of the tools from the other projects to extract information from git repositories and stores them asCSV
-files.- Has its own command-line interface and supports online/offline repos and parallelization.
- Supports two methods currently: Simple and Extended (default) extraction.
- Does not require tools for clone-detection or metrics, as these are not extracted.
- Extracts 58 features (13 features + counts for 20 keywords (see [5]) in Simple-mode):
"SHA1", "RepoPathOrUrl", "AuthorName", "CommitterName", "AuthorTime", "CommitterTime", "Message", "AuthorEmail", "CommitterEmail", "IsInitialCommit", "IsMergeCommit", "NumberOfParentCommits", "ParentCommitSHA1s"
plus 25 in extended:"MinutesSincePreviousCommit", "AuthorNominalLabel", "CommitterNominalLabel", "NumberOfFilesAdded", "NumberOfFilesAddedNet", "NumberOfLinesAddedByAddedFiles", "NumberOfLinesAddedByAddedFilesNet", "NumberOfFilesDeleted", "NumberOfFilesDeletedNet", "NumberOfLinesDeletedByDeletedFiles", "NumberOfLinesDeletedByDeletedFilesNet", "NumberOfFilesModified", "NumberOfFilesModifiedNet", "NumberOfFilesRenamed", "NumberOfFilesRenamedNet", "NumberOfLinesAddedByModifiedFiles", "NumberOfLinesAddedByModifiedFilesNet", "NumberOfLinesDeletedByModifiedFiles", "NumberOfLinesDeletedByModifiedFilesNet", "NumberOfLinesAddedByRenamedFiles", "NumberOfLinesAddedByRenamedFilesNet", "NumberOfLinesDeletedByRenamedFiles", "NumberOfLinesDeletedByRenamedFilesNet", "Density", "AffectedFilesRatioNet"
All applications can be run standalone, but may also be included as references, as they all feature a public API.
You may also use other types of databases, as Git Density supports these: MsSQL2000
, MsSQL2005
, MsSQL2008
, MsSQL2012
, MySQL
, Oracle10
, Oracle9
, PgSQL81
, PgSQL82
, SQLite
, SQLiteTemp
(temporary database that is discarded after the analysis, mainly for testing).
Please use the following BibTeX to cite GitDensity
:
@article{honel2020gitdensity, title={Git Density (2022.10): Analyze git repositories to extract the Source Code Density and other Commit Properties}, DOI={10.5281/zenodo.2565238}, url={https://doi.org/10.5281/zenodo.2565238}, publisher={Zenodo}, author={Sebastian Hönel}, year={2022}, month={Oct}, abstractNote={Git Density (git-density
) is a tool to analyzegit
-repositories with the goal of detecting the source code density. It was developed during the research phase of the short technical paper and poster "A changeset-based approach to assess source code density and developer efficacy" and has since been extended to support extended analyses.}, }
[1] Hönel, S., Ericsson, M., Löwe, W. and Wingkvist, A., 2018, May. A changeset-based approach to assess source code density and developer efficacy. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (pp. 220-221). ACM, https://www.icse2018.org/event/icse-2018-posters-poster-a-changeset-based-approach-to-assess-source-code-density-and-developer-efficacy
[2] Git hours. "Estimate time spent on a Git repository." https://github.com/kimmobrunfeldt/git-hours
[3] QTools Clone Detection. http://qtools.se/
[4] Hönel, S., Ericsson, M., Löwe, W. and Wingkvist, A., 2019. Importance and Aptitude of Source code Density for Commit Classification into Maintenance Activities. In The 19th IEEE International Conference on Software Quality, Reliability, and Security.
[5] Levin, S. and Yehudai, A., 2017, November. Boosting automatic commit classification into maintenance activities by utilizing source code changes. In Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering (pp. 97-106).