-
Notifications
You must be signed in to change notification settings - Fork 28
feat: add reproducible central buildspec generation #1115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
return True | ||
|
||
|
||
def compare_rc_build_spec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the logic of this function is taken from https://github.com/oracle/macaron/blob/6a712af1ebdcb435bd5b7199dc1b4f5473663090/tests/vsa/compare_vsa.py
I think the comparing functions within compare_vsa.py
could be refactored out into a tests_util.py
module so that all "compare" scripts used in integration tests (here) could use if needed.
Please let me know if this should be done in a subsequent PR or this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either is fine for me.
2547333
to
3014540
Compare
3014540
to
da329fe
Compare
pformat(patches), | ||
) | ||
|
||
group = purl.namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Reproducible Central Buildspec file needs the GAV coordinate of the component. This will only make sense for an input maven
type PackageURL.
At the moment, we accept any type of PackageURL. Please let me know if it makes sense to enforce the input PackageURL to be of type maven when we are generating a Reproducible Central Buildspec ? (Ideally each type of Buildspec format might have different requirements on the input PURL)
src/macaron/build_spec_generator/reproducible_central/reproducible_central.py
Outdated
Show resolved
Hide resolved
Because we are providing the path to the database from CLI argument, we need to support mounting this database file into the container file system too. |
@tromai Instead of storing the build spec in |
@behnazh-w I only use I can definitely change it to a PURL based path format. The only downside of using a PURL based path here is that it would be a bit challenging for scripting in a CI/CD environment, if that is the targeted use case (you might need to compute the PURL based path from outside of Macaron). |
I think the VSA feature is a little different because it is generated by the policy engine that enforces a given policy, which is not applied to a PURL. The subject instead is specified in the policy itself.
Yes that's true, but I think overall it's less error-prone and we avoid over-writing previous results. |
src/macaron/build_spec_generator/cli_command_parser/gradle_cli_parser.py
Outdated
Show resolved
Hide resolved
c928fb1
to
4a4ac5d
Compare
4a4ac5d
to
0ff4d6c
Compare
Signed-off-by: Trong Nhan Mai <trong.nhan.mai@oracle.com>
Signed-off-by: Trong Nhan Mai <trong.nhan.mai@oracle.com>
Signed-off-by: Trong Nhan Mai <trong.nhan.mai@oracle.com>
Signed-off-by: Trong Nhan Mai <trong.nhan.mai@oracle.com>
Signed-off-by: Trong Nhan Mai <trong.nhan.mai@oracle.com>
|
||
|
||
@dataclass | ||
class MavenSystemPropeties(OptionDef[dict[str, str | None]]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class MavenSystemPropeties(OptionDef[dict[str, str | None]]): | |
class MavenSystemProperties(OptionDef[dict[str, str | None]]): |
class MavenGoalPhase(OptionDef[list[str]]): | ||
"""This option represents the positional goal/plugin-phase option in Maven CLI command. | ||
|
||
argparse.Namespace stores this as a list of string. This is stored internally as a list of string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of this option would still be useful.
# TODO: we need to confirm whether one can provide | ||
# -P or -pl multiple times and the values will be aggregate into a list of string | ||
# The current implementation only consider one instance of -P or -pl. | ||
# Where to begin: | ||
# https://github.com/apache/maven/blob/maven-3.9.x/maven-embedder/src/main/java/org/apache/maven/cli/CLIManager.java | ||
# https://github.com/apache/commons-cli/blob/master/src/main/java/org/apache/commons/cli/Parser.java |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this comment should be moved down a bit to where the related options are.
class MavenCLICommandParser: | ||
"""A Maven CLI Command Parser.""" | ||
|
||
ACCEPTABLE_EXECUTABLE = ["mvn", "mvnw"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACCEPTABLE_EXECUTABLE = ["mvn", "mvnw"] | |
ACCEPTABLE_EXECUTABLE = {"mvn", "mvnw"} |
def apply_patch( | ||
self, | ||
cli_command: MavenCLICommand, | ||
options_patch: Mapping[str, MavenOptionPatchValueType | None], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
options_patch: Mapping[str, MavenOptionPatchValueType | None], | |
patch_options: Mapping[str, MavenOptionPatchValueType | None], |
I think this is actually more appropriate than options_patch
. Also applies to the Gradle counterpart.
ext: JavaArtifactExt | ||
The extension of the main artifact file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ext: JavaArtifactExt | |
The extension of the main artifact file. |
return get_jdk_version_from_jar(local_artifact_path) | ||
|
||
|
||
def find_jdk_version_from_central_maven_repo( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things for this function and the things it relies on:
- The Java extension enum seems to only be used here, and contains only a single value. Maybe it should just be hardcoded? Or stored not as an enum?
- The caching strategy also seems to be somewhat static; however I see some value of having that option.
- The two
find_jdk...
functions are very similar. I would suggest having a single entry point that diverges based on the caching logic (assuming that's being kept), as it is mostly the download part that differs.
>>> normalize_jdk_version("25.0.1") | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a missing example result here.
>>> normalize_jdk_version("25.0.1") | ||
""" | ||
first, _, after = jdk_version_str.partition(".") | ||
jdk_major_ver = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jdk_major_ver = None |
|
||
"""This module contains the logic to nomarlize a JDK version string to a major version number.""" | ||
|
||
SUPPORTED_JAVA_VERSION = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a set.
return result | ||
|
||
|
||
def compile_sqlite_select_statement(select_statment: Select) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def compile_sqlite_select_statement(select_statment: Select) -> str: | |
def compile_sqlite_select_statement(select_statement: Select) -> str: |
purl_string : str | ||
The PackageURL string to find the BuildToolFacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be updated to match the function parameter.
|
||
Returns | ||
------- | ||
Select[tuple[BuildAsCodeFacts]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Select[tuple[BuildAsCodeFacts]] | |
Select[tuple[BuildToolFacts]] |
purl_string : str | ||
The PackageURL string to find the BuildToolFacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated.
purl_string : str | ||
The PackageURL string to find the BuildServiceFacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated.
purl_string : str | ||
The PackageURL string to find the BuildScriptFacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated.
purl_string : str | ||
The PackageURL string to look for the BuildToolFacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated.
purl_string : str | ||
The PackageURL string to look for the BuildAsCodeFacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated.
purl_string : str | ||
The PackageURL string to look for the BuildServiceFacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated.
purl_string : str | ||
The PackageURL string to look for the BuildScriptFacts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated.
SBT = "sbt" | ||
|
||
|
||
def format_build_command_infos(build_command_infos: list[GenericBuildCommandInfo]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def format_build_command_infos(build_command_infos: list[GenericBuildCommandInfo]) -> str: | |
def format_build_command_info(build_command_info: list[GenericBuildCommandInfo]) -> str: |
if fact.language in {"java"}: | ||
try: | ||
macaron_build_tool_name = _MacaronBuildToolName(fact.build_tool_name) | ||
except ValueError: | ||
continue | ||
|
||
# TODO: What happen if we report multiple build tools in the database. | ||
return macaron_build_tool_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if fact.language in {"java"}: | |
try: | |
macaron_build_tool_name = _MacaronBuildToolName(fact.build_tool_name) | |
except ValueError: | |
continue | |
# TODO: What happen if we report multiple build tools in the database. | |
return macaron_build_tool_name | |
if fact.language != "java": | |
continue | |
try: | |
macaron_build_tool_name = _MacaronBuildToolName(fact.build_tool_name) | |
except ValueError: | |
continue | |
# TODO: What happen if we report multiple build tools in the database. | |
return macaron_build_tool_name |
an error, or no build command is found from the database. | ||
""" | ||
try: | ||
lookup_build_command_infos = lookup_any_build_command(component_id, session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lookup_build_command_infos = lookup_any_build_command(component_id, session) | |
lookup_build_command_info = lookup_any_build_command(component_id, session) |
lookup_build_command_jdk = ( | ||
get_lookup_build_command_jdk( | ||
lookup_build_command_info, | ||
) | ||
if lookup_build_command_info | ||
else None | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should only be executed if the jdk_from_jar
is None
.
if not patched_build_commands: | ||
logger.error( | ||
"Failed to patch command sequences %s.", | ||
[selected_build_command], | ||
) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the JDK discovery can rely on the build information via lookup_build_command_info
, I'd suggest moving all of the build command related code before the JDK related code. Up to and including the highlighted block here.
Also might be a good idea to extract some of this out to standalone functions. E.g. JDK, Patched Build Commands, etc.
def test_comparing_gradle_cli_command_unequal_types( | ||
gradle_cli_parser: GradleCLICommandParser, | ||
command: str, | ||
that: Any, | ||
) -> None: | ||
"""Test comparing MavenCLICommand with another incompatible type oject.""" | ||
this_command = gradle_cli_parser.parse(command.split()) | ||
assert not this_command == that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this testing? It seems to be comparing a GradleCLICommand object against a bool, list, and set.
The result object. | ||
expected : dict[str, str] | ||
The expected object. | ||
compare_fn_map : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compare_fn_map : str | |
compare_fn_map : dict[str, CompareFn] |
|
||
def extract_data_from_build_spec(build_spec_path: str) -> dict[str, str] | None: | ||
"""Extract data from build spec.""" | ||
original_build_spec_content = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
original_build_spec_content = None |
break | ||
|
||
if effective_cli_parser: | ||
expected_cli_commands.append(cli_parser.parse(cmd)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expected_cli_commands.append(cli_parser.parse(cmd)) | |
expected_cli_commands.append(effective_cli_parser.parse(cmd)) |
assert not lookup_any_build_command(component_id=1, session=macaron_db_session) | ||
|
||
|
||
def test_invalid_input_databse(invalid_db_session: Session) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_invalid_input_databse(invalid_db_session: Session) -> None: | |
def test_invalid_input_database(invalid_db_session: Session) -> None: |
Summary
This Pull Request adds a new command called
gen-build-spec
.This command generates a buildspec, which contains the build related information for a PURL that Macaron has analyzed. The output file will be stored within
output/<purl_based_path>/macaron.buildspec
, where<purl_based_path>
being the directory structure according to the input PackageURL.An example
In this example, the final path to the output buildspec file is
output/maven/org_apache_hugegraph/computer-k8s/macaron.buildspec
The content of
output/maven/org_apache_hugegraph/computer-k8s/macaron.buildspec
, which uses the Reproducible Central buildspec format.
This Buildspec ideally can be used directly as part of the Reproducible Central rebuild infrastructure.
Description of changes
Macaron database extractor
The first step to generate a buildspec is to extract the build related information from an existing Macaron SQLite database. The module
macaron_db_extractor.py
added in this commit does just that.It uses sqlalchemy SELECT statement for ORM Mapped Classes to extract the data from the database into equivalent ORM Mapped instances that we defined in src/macaron/database/table_definitions.py for example.
Maven and Gradle CLI Command Parser
We use the build commands obtained in CI/CD configuration (e.g. from github action workflow yaml file) for the final buildspec. However, those build commands cannot be used as is and they requires some additional patching to work as a rebuild command.
A proper way to patch any maven and gradle CLI build command is to first parse is. The maven and gradle CLI command parsers added in this commit leverage Python's builtin
argparse
library.CLI Build Command Patcher
The modules added in this commit uses the Maven and GRadle CLI Command Parser to parse and patch the build commands obtained from the Macaron database.
Jdk version finding from java Maven Central artifacts
Macaron can obtain the JDK version for a given build command obtained from CI/CD configuration. In some cases, the CI/CD configuration doesn't have enough information for us to obtain the JDK version. Therefore, we also rely on the JDK version included in
META-INF/MANIFEST.MF
in java artifacts from Maven Centralhttps://repo1.maven.org/
.The module
jdk_finder.py
added in this commit help download the java artifacts from Maven Central given a maven type PURL, then returns the JDK version if it is available inMETA-INF/MANIFEST.MF
.In some cases, the JDK version string from
META-INF/MANIFEST.MF
don't only contain the JDK major version. For example:Because Reproducible Central Buildspec requires only the major version of JDK, we need to extract that major version only. The
jdk_version_normalizer.py
module contains the logic to do just that. It is added this in commit.Generating the Reproducible Central Buildspec
The two commits
use all components listed above to generate the final Reproducible Central Buildspec
Testing
This feature includes unit tests for all components used in RC Buildspec generation (e.g. CLI parsers, CLI patchers, JDK version finder, etc.)
For integration tests, a new script called compare_rc_build_spec.py is added to compare the result Buildspec in the integration tests.
Checklist
verified
label should appear next to all of your commits on GitHub.output/macaron.buildspec
(could also be in a different PR).