This Java "Autograding Model" library evaluates projects based on a configurable set of metrics:
- Test statistics (e.g., number of failed tests)
- Code coverage (e.g., percentage of covered lines)
- Mutation coverage (e.g., percentage of survived mutations)
- Static analysis warnings (e.g., number of SpotBugs warnings)
- Software metrics (e.g., cyclomatic complexity)
The full list of supported metrics is defined by the parsers in the analysis-model and coverage-model libraries, which this library uses to read tool reports.
The autograding library reads reports produced by other tools (e.g., JUnit, JaCoCo, SpotBugs), aggregates the results, and evaluates them. Depending on the configuration, it computes a score to assess overall quality or generates a metrics report without scoring. It is designed for CI environments such as Jenkins, GitHub Actions, and GitLab CI, and can annotate pull or merge requests. Quality gates can be enforced to fail a build if defined criteria are not met.
This autograding library is the foundation for the following tools:
- GitHub quality monitor: Monitors and enforces the quality of pull requests (or single commits) in GitHub.
- GitHub autograding action: Computes an autograding score for student classroom projects in GitHub pull requests.
- GitLab autograding action: Computes an autograding score for student projects in GitLab merge requests.
- Jenkins autograding plugin: Shows the autograding results in Jenkins' UI.
When you use this library to compute an autograding score, you must define the metrics, their tools, and the scoring criteria in a JSON configuration. Details about each metric are explained in the Metrics Documentation.
Example Autograding Configuration
{
"tests": {
"name": "JUnit Tests",
"id": "tests",
"tools": [
{
"id": "junit",
"name": "Unit Tests",
"pattern": "**/target/*-reports/TEST*.xml"
}
],
"failureRateImpact": -1,
"maxScore": 100
},
"analysis": [
{
"name": "Style",
"id": "style",
"tools": [
{
"id": "checkstyle",
"pattern": "**/target/**checkstyle-result.xml"
},
{
"id": "pmd",
"pattern": "**/target/pmd-*/pmd.xml"
}
],
"errorImpact": -1,
"highImpact": -1,
"normalImpact": -1,
"lowImpact": -1,
"maxScore": 100
},
{
"name": "Bugs",
"id": "bugs",
"icon": "bug",
"tools": [
{
"id": "spotbugs",
"sourcePath": "src/main/java",
"pattern": "**/target/spotbugsXml.xml"
}
],
"errorImpact": -3,
"highImpact": -3,
"normalImpact": -3,
"lowImpact": -3,
"maxScore": 100
}
],
"coverage": [
{
"name": "Code Coverage",
"tools": [
{
"id": "jacoco",
"name": "Line Coverage",
"metric": "line",
"sourcePath": "src/main/java",
"pattern": "**/target/site/jacoco/jacoco.xml"
},
{
"id": "jacoco",
"name": "Branch Coverage",
"metric": "branch",
"sourcePath": "src/main/java",
"pattern": "**/target/site/jacoco/jacoco.xml"
}
],
"maxScore": 100,
"missedPercentageImpact": -1
},
{
"name": "Mutation Coverage",
"tools": [
{
"id": "pit",
"name": "Mutation Coverage",
"metric": "mutation",
"sourcePath": "src/main/java",
"pattern": "**/target/pit-reports/mutations.xml"
},
{
"id": "pit",
"name": "Test Strength",
"metric": "test-strength",
"sourcePath": "src/main/java",
"pattern": "**/target/pit-reports/mutations.xml"
}
],
"maxScore": 100,
"missedPercentageImpact": -1
}
]
}
When you use this library to generate a metrics report without scores, you must define the individual metrics and their configuration in a JSON configuration, as shown below. This configuration is a subset of the autograding score configuration, without scoring criteria. Details about each metric are provided in the documentation at the end of this document.
Example Metric Configuration
{
"tests": {
"name": "Tests",
"tools": [
{
"id": "junit",
"name": "Unit Tests",
"pattern": "**/target/*-reports/TEST*util*.xml"
},
{
"id": "junit",
"icon": "no_entry",
"name": "Architecture Tests",
"pattern": "**/target/surefire-reports/TEST*archunit*.xml"
}
]
},
"analysis": [
{
"name": "Style",
"id": "style",
"tools": [
{
"id": "checkstyle",
"pattern": "**/target/**checkstyle-result.xml"
},
{
"id": "pmd",
"pattern": "**/target/pmd-*/pmd.xml"
}
]
},
{
"name": "Bugs",
"id": "bugs",
"icon": "bug",
"tools": [
{
"id": "spotbugs",
"sourcePath": "src/main/java",
"pattern": "**/target/spotbugsXml.xml"
},
{
"id": "error-prone",
"pattern": "**/maven.log"
}
]
},
{
"name": "API Problems",
"id": "api",
"icon": "no_entry_sign",
"tools": [
{
"id": "revapi",
"sourcePath": "src/main/java",
"pattern": "**/target/revapi-result.json"
}
]
},
{
"name": "Vulnerabilities",
"id": "vulnerabilities",
"icon": "shield",
"tools": [
{
"id": "owasp-dependency-check",
"icon": "shield",
"pattern": "**/target/dependency-check-report.json"
}
]
}
],
"coverage": [
{
"name": "Code Coverage",
"tools": [
{
"id": "jacoco",
"metric": "line",
"sourcePath": "src/main/java",
"pattern": "**/target/site/jacoco/jacoco.xml"
},
{
"id": "jacoco",
"metric": "branch",
"sourcePath": "src/main/java",
"pattern": "**/target/site/jacoco/jacoco.xml"
}
]
},
{
"name": "Mutation Coverage",
"tools": [
{
"id": "pit",
"metric": "mutation",
"sourcePath": "src/main/java",
"pattern": "**/target/pit-reports/mutations.xml"
},
{
"id": "pit",
"metric": "test-strength",
"sourcePath": "src/main/java",
"pattern": "**/target/pit-reports/mutations.xml"
}
]
}
],
"metrics":
{
"name": "Software Metrics",
"tools": [
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "cyclomatic-complexity"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "cognitive-complexity"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "npath-complexity"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "loc"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "ncss"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "cohesion"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "weight-of-class"
}
]
}
}
Quality gates enforce a minimum quality level for a project. For example, you can require at least 80% line coverage and no critical bugs. The following example shows how to define such a quality gate:
{
"qualityGates": [
{
"metric": "line",
"threshold": 80.0,
"criticality": "FAILURE"
},
{
"metric": "spotbugs",
"threshold": 0.0,
"criticality": "UNSTABLE"
}
]
}
Tip: The quality gate configuration is not part of the autograding score JSON configuration. It is a separate configuration consumed by the corresponding tools. In GitLab, pass the configuration via an environment variable; in GitHub Actions, pass it via an action input.
The following sections describe each metric and its JSON configuration. Every metric can be enabled and configured individually. All configurations share the same structure: define a list of tools to collect the data, a name, an icon (Markdown emoji or OpenMoji), and optionally a maximum score (if a score should be computed). Each tool must provide a glob pattern that locates its result files in the workspace (e.g., JUnit XML reports) and a parser ID so the underlying model can select the correct parser. See analysis model and coverage model for the list of supported parsers.
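As an illustration, the shared structure of a single metric configuration might look like the following sketch (all names and values here are placeholders; id, icon, and maxScore are optional):
{
"name": "Human-readable metric name",
"id": "metric-id",
"icon": "emoji-name",
"maxScore": 100,
"tools": [
{
"id": "parser-id",
"name": "Human-readable tool name",
"pattern": "**/path/to/reports/*.xml"
}
]
}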
Optionally, you can define the impact of each result (e.g., a failed test or a missed line) on the final score. Impacts are positive or negative numbers that are multiplied by the measured values during evaluation. With negative impacts, the score starts at the maximum score and the penalties are subtracted from it. With positive impacts, the score starts at zero and the achieved values add up towards the maximum score (capped at the maximum).
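For example, with hypothetical numbers: a configuration with maxScore of 100 and failureImpact of -5 subtracts 5 points per failed test, so 10 failures reduce the score from 100 to 50. With a positive passedImpact of 10 instead, 7 passed tests earn 7 × 10 = 70 of the possible 100 points.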
Test statistics can be configured using the JSON object tests, as shown in the following example:
{
"tests": {
"tools": [
{
"id": "junit",
"name": "Unittests",
"pattern": "**/junit*.xml"
}
],
"name": "JUnit",
"passedImpact": 10,
"skippedImpact": -1,
"failureImpact": -5,
"maxScore": 100
}
}
You can either count passed tests as positive impact or failed tests as negative impact (or use a mix of both). Alternatively, you can use the success or failure rate of the tests to compute the impact. This alternative approach is shown in the next example:
{
"tests": {
"tools": [
{
"id": "junit",
"name": "Unittests",
"pattern": "**/junit*.xml"
}
],
"name": "JUnit",
"successRateImpact": 1,
"failureRateImpact": 0,
"maxScore": 100
}
}
Skipped tests are listed individually. For failed tests, the error message and stack trace are shown after the summary in the pull or merge request.
Code coverage and mutation coverage can be configured using the JSON object coverage, as shown in the following example:
{
"coverage": [
{
"tools": [
{
"id": "jacoco",
"name": "Line Coverage",
"metric": "line",
"sourcePath": "src/main/java",
"pattern": "**/jacoco.xml"
},
{
"id": "jacoco",
"name": "Branch Coverage",
"metric": "branch",
"sourcePath": "src/main/java",
"pattern": "**/jacoco.xml"
}
],
"name": "JaCoCo",
"maxScore": 100,
"coveredPercentageImpact": 1,
"missedPercentageImpact": -1
},
{
"tools": [
{
"id": "pit",
"name": "Mutation Coverage",
"metric": "mutation",
"sourcePath": "src/main/java",
"pattern": "**/mutations.xml"
}
],
"name": "PIT",
"maxScore": 100,
"coveredPercentageImpact": 1,
"missedPercentageImpact": 0
}
]
}
You can either use the covered percentage as positive impact or the missed percentage as negative impact (a mix of both makes little sense but would work as well).
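As a worked example with hypothetical numbers: with coveredPercentageImpact of 1 and maxScore of 100, a line coverage of 85% contributes 85 points; with missedPercentageImpact of -1 instead, the 15% of missed lines subtract 15 points from the maximum, which also yields 85.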
Make sure to define exactly one unique and supported metric for each tool. For example, JaCoCo provides line and branch coverage, so you need to define two tools for JaCoCo. PIT provides mutation coverage, so you need to define a tool for PIT that uses the metric mutation.
Missed lines or branches and survived mutations can be posted as comments in pull or merge requests if the corresponding tool supports this.
Static analysis results can be configured using the JSON object analysis, as shown in the following example:
{
"analysis": [
{
"name": "Style",
"id": "style",
"tools": [
{
"id": "checkstyle",
"name": "CheckStyle",
"pattern": "**/target/checkstyle-result.xml"
},
{
"id": "pmd",
"name": "PMD",
"pattern": "**/target/pmd.xml"
}
],
"errorImpact": 1,
"highImpact": 2,
"normalImpact": 3,
"lowImpact": 4,
"maxScore": 100
},
{
"name": "Bugs",
"id": "bugs",
"icon": "bug",
"tools": [
{
"id": "spotbugs",
"name": "SpotBugs",
"sourcePath": "src/main/java",
"pattern": "**/target/spotbugsXml.xml"
}
],
"errorImpact": -11,
"highImpact": -12,
"normalImpact": -13,
"lowImpact": -14,
"maxScore": 100
}
]
}
Typically, negative impacts are used here so that each warning reduces the final score according to its severity. All warnings can be posted as comments in pull or merge requests if supported.
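As a worked example with hypothetical counts: with errorImpact of -3, normalImpact of -1, and maxScore of 100, a report containing 2 errors and 10 normal-severity warnings subtracts 2 × 3 + 10 × 1 = 16 points, resulting in a score of 84.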
Software metrics can be configured using the JSON object metrics, as shown in the following example:
{
"metrics":
{
"name": "Software Metrics",
"tools": [
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "cyclomatic-complexity"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "cognitive-complexity"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "npath-complexity"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "loc"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "ncss"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "cohesion"
},
{
"id": "metrics",
"pattern": "**/metrics/pmd.xml",
"metric": "weight-of-class"
}
]
}
}
Currently, impacts cannot be defined for software metrics; the values are shown for reporting only. This may change in the future if there is a need for it.