
Conversation

Contributor

@lucia-sb lucia-sb commented Sep 12, 2025

What does this PR do?

Adds logic that uploads dependency size data to a JSON artifact, so size measurements no longer depend on merged lockfiles.

Motivation

Size measurements are inaccurate because dependency lockfiles are merged in a follow-up PR. As a result, updated or new dependencies aren’t reflected in the size delta of the originating PR.
This change uploads dependency sizes as a JSON artifact during the build, allowing us to measure size changes directly, without relying on lockfile merges.
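
In rough terms, the build step collects the per-dependency sizes and writes them to a JSON file that the workflow then uploads as an artifact. A minimal sketch of that step (function and file names here are illustrative, not the actual implementation):

import json
from pathlib import Path

def write_dependency_sizes(sizes: dict[str, int], platform: str) -> Path:
    # sizes maps dependency name -> size in bytes for the given platform
    output = Path(f"status_{platform}.json")
    output.write_text(json.dumps(sizes, indent=2))
    return output

A later PR can then download this artifact from the base commit's run and compute the size delta directly, instead of waiting for the lockfile merge.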

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged


⚠️ Recommendation: Add qa/skip-qa Label

This PR does not modify any files shipped with the agent.

To help streamline the release process, please consider adding the qa/skip-qa label if these changes do not require QA testing.

@lucia-sb lucia-sb changed the title create dependency json [AI-5800] Measure size of dependencies before PRs are merged Sep 12, 2025

codecov bot commented Sep 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.96%. Comparing base (7b59ac0) to head (a481300).
⚠️ Report is 13 commits behind head on master.


@lucia-sb lucia-sb marked this pull request as ready for review September 17, 2025 10:44
@lucia-sb lucia-sb requested review from a team as code owners September 17, 2025 10:44
@lucia-sb lucia-sb added the qa/skip-qa Automatically skip this PR for the next QA label Sep 17, 2025
Contributor

@AAraKKe AAraKKe left a comment

Thanks for the PR! It looks very promising and pretty well organized.

Although there are a bunch of comments, I think we are pretty close to getting it ready! Let me know if you want to discuss any of them.

My Feedback Legend

Here's a quick guide to the prefixes I use in my comments:

question: I need clarification or I'm seeking to understand your approach.
suggestion: I'm proposing an improvement. This is optional but recommended.
nit: A minor, non-blocking issue (e.g., style, typo). Feel free to ignore.
request: A change I believe is necessary before this can be merged.

I would start by addressing the request and answering the question. Afterwards, you can move on to the suggestion and nit comments.

Comment on lines +42 to +43
platform: str | None,
version: str | None,
Contributor

Thanks!!! 😊

@@ -55,6 +60,8 @@ def status(
raise ValueError(f"Invalid platform: {platform}")
Contributor

nit: You can do

raise ValueError(f"Invalid platform: {platform!r}")

the !r modifier prints the repr of the variable, which for a string includes the quotes. It will result in a message like

Invalid platform: 'no-platform'

instead of

Invalid platform: no-platform

This is strictly personal, but I normally prefer to make it clear that the value being printed is a string supplied by the user.

Contributor Author

Changed

@cache
def get_previous_dep_sizes_json(base_commit, platform):
print(f"Getting previous dependency sizes json for base_commit={base_commit}, platform={platform}")
run_id = get_run_id(base_commit, '.github/workflows/measure-disk-usage.yml')
Contributor

suggestion: Can we extract this string to a module-level constant? In case we need to modify or access it later, that avoids having several replicas of the same string.
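
Something like this, as a rough sketch (the constant name is just an example):

MEASURE_DISK_USAGE_WORKFLOW = '.github/workflows/measure-disk-usage.yml'
...
run_id = get_run_id(base_commit, MEASURE_DISK_USAGE_WORKFLOW)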

Contributor Author

Changed

print(f"Previous run_id: {run_id}")
compressed_json = None
uncompressed_json = None
if run_id and check_artifact_exists(run_id, f'status_compressed_{platform}.json'):
Contributor

question: I am wondering about this pattern: we first check whether there are artifacts to download, and when there are none we set the corresponding value to None.

Why don't we just call get_artifacts directly (and do the same everywhere else we validate existence first) and return None on error? We would save several API calls.

For example, if I try to run gh run download for an artifact that does not exist, I get

gh run download 17802166735 --name naaaan
no artifact matches any of the names or patterns provided

which already seems to be telling me that it does not exist, right? We can just capture the output of the command we actually want to run and match it against the error case to know the artifact does not exist. There is no need to first ask whether it exists and then download it.
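
As a rough sketch of what I mean (the helper name is illustrative; adapt it to however the module currently shells out to gh):

import subprocess

def download_artifact(run_id: str, name: str) -> bool:
    # Let `gh run download` fail when the artifact does not exist and treat
    # that failure as "not found", instead of checking for existence first.
    result = subprocess.run(
        ['gh', 'run', 'download', run_id, '--name', name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

That keeps it to a single gh call per artifact, and the caller can fall back to None when it returns False.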

Contributor Author

Changed

assert convert_to_human_readable_size(1024) == "1.0 KB"
assert convert_to_human_readable_size(1048576) == "1.0 MB"
assert convert_to_human_readable_size(1073741824) == "1.0 GB"
assert convert_to_human_readable_size(1024) == "1.0 KiB"
Contributor

request: for this kind of test, let's parametrize them. Multiple asserts covering different cases can break without giving us the full picture.
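
Something along these lines (the expected labels are placeholders; use whichever strings convert_to_human_readable_size actually returns):

import pytest

@pytest.mark.parametrize(
    'size_bytes, expected',
    [
        (1024, '1.0 KiB'),
        (1048576, '1.0 MiB'),
        (1073741824, '1.0 GiB'),
    ],
)
def test_convert_to_human_readable_size(size_bytes, expected):
    assert convert_to_human_readable_size(size_bytes) == expected

If one case breaks, the parametrized version still reports the results of the other cases.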

Contributor Author

changed

@@ -126,3 +127,10 @@ def test_status_wrong_plat_and_version(ddev):
):
result = ddev("size", "status", "--platform", "linux", "--python", "2.10", "--compressed")
assert result.exit_code != 0


def test_status_dependency_sizes(ddev, mock_size_status):
Contributor

question: what is this test testing? The name is not very descriptive, and it seems we are really just testing that we cannot pass commit and dependency-sizes at the same time.

Contributor Author

I refactored the tests

}
}

mock_file_handler = mock_open()
Contributor

suggestion: this first mock does not need to be a mock_open; it can be a simple MagicMock. mock_open helps by returning whatever is in read_data when the return value of the context manager's __enter__ method is read, but since here we want two different behaviors, we just need something that provides two different side effects.
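
A rough sketch of the idea (the payloads are placeholders for whatever the test feeds in on each call):

from unittest.mock import MagicMock

compressed_payload = '{"dep": 123}'    # placeholder content
uncompressed_payload = '{"dep": 456}'  # placeholder content

mock_file_handler = MagicMock(side_effect=[compressed_payload, uncompressed_payload])

# Each call now returns the next payload in order:
assert mock_file_handler() == compressed_payload
assert mock_file_handler() == uncompressed_payload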
