
GCP: Add metrics tracking for GCSFileIO #4267

Merged
rdblue merged 3 commits into apache:master on Mar 23, 2022


rajarshisarkar
Contributor

This PR adds metrics tracking for GCSFileIO. This would help with reporting IO metrics via Hadoop's FileSystem.Statistics.
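The description states the goal but not the mechanism. Below is a minimal sketch of the common pattern, assuming the stream is handed a metrics context and bumps counters on each successful read. `SketchMetrics`, `MetricsTrackingInputStream`, and the counter names `read-bytes` / `read-operations` are hypothetical; Iceberg's real `MetricsContext` (whose `nullMetrics()` appears in the review) has its own counter API that this sketch does not reproduce.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical minimal metrics API, NOT Iceberg's MetricsContext.
interface SketchMetrics {
  AtomicLong counter(String name);

  // No-op context: every counter is a throwaway, so nothing is reported.
  static SketchMetrics nullMetrics() {
    return name -> new AtomicLong();
  }
}

// Wraps a stream and counts bytes read and read calls that returned data.
class MetricsTrackingInputStream extends InputStream {
  private final InputStream delegate;
  private final AtomicLong readBytes;
  private final AtomicLong readOperations;

  MetricsTrackingInputStream(InputStream delegate, SketchMetrics metrics) {
    this.delegate = delegate;
    this.readBytes = metrics.counter("read-bytes");
    this.readOperations = metrics.counter("read-operations");
  }

  @Override
  public int read() throws IOException {
    int b = delegate.read();
    if (b != -1) {
      readBytes.incrementAndGet();
      readOperations.incrementAndGet();
    }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = delegate.read(buf, off, len);
    if (n > 0) {
      readBytes.addAndGet(n);
      readOperations.incrementAndGet();
    }
    return n;
  }

  @Override
  public void close() throws IOException {
    delegate.close();
  }
}
```

Defaulting to a no-op context keeps callers that don't care about metrics unchanged; how the counts would then be surfaced through `FileSystem.Statistics` is outside this sketch.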

@github-actions github-actions bot added the GCP label Mar 4, 2022
@rajarshisarkar
Copy link
Contributor Author

@danielcweeks Can you please review when you get time? If the implementation looks okay, I can go ahead with the other FileIOs.

Comment on lines 61 to 62
GCSOutputStream stream = new GCSOutputStream(storage, randomBlobId(), properties, MetricsContext
.nullMetrics());
Contributor


Nit: Avoid breaking up a chained call as an argument across two lines. In this case I would move all arguments to the next line.

But having MetricsContext and then .nullMetrics() on the next line is confusing.
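Applied to the snippet above, the nit amounts to keeping the `nullMetrics()` call on one line and wrapping right after the opening parenthesis. To make the layout compile on its own, this sketch uses hypothetical stand-ins (`DemoOutputStream`, `DemoMetrics`, `FormattingExample`) for `GCSOutputStream`, `MetricsContext`, and the test class; only the line layout is the point.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for MetricsContext.
final class DemoMetrics {
  static DemoMetrics nullMetrics() {
    return new DemoMetrics();
  }
}

// Hypothetical stand-in for GCSOutputStream; only records the blob id.
final class DemoOutputStream {
  final String blobId;

  DemoOutputStream(Object storage, String blobId, Map<String, String> properties, DemoMetrics metrics) {
    this.blobId = blobId;
  }
}

final class FormattingExample {
  static String randomBlobId() {
    return "blob-" + UUID.randomUUID();
  }

  static DemoOutputStream create(Object storage, Map<String, String> properties) {
    // All arguments move to the next line; DemoMetrics.nullMetrics() is never split.
    DemoOutputStream stream = new DemoOutputStream(
        storage, randomBlobId(), properties, DemoMetrics.nullMetrics());
    return stream;
  }
}
```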

Contributor Author


Thanks, I have made the changes.

@danielcweeks
Contributor

Hey @rajarshisarkar, this is great. I'd point out that there is some parallel work in #4254 to make the initialization dynamic, and it looks like it's about to go in. You might want to take a quick look and see if we can align with that approach.

@rajarshisarkar
Contributor Author

Thanks for having a look, @danielcweeks. Sure, I shall rebase once #4254 is merged to master.

@rdblue rdblue requested a review from danielcweeks March 7, 2022 22:21
public static GCSInputFile fromLocation(String location, Storage storage, GCPProperties gcpProperties) {
return new GCSInputFile(storage, BlobId.fromGsUtilUri(location), gcpProperties);

public static GCSInputFile fromLocation(String location, Storage storage,
Contributor


Should this be public? Since it is a public constructor in a public class, we should deprecate the old constructor instead of directly removing it. If we want to avoid needing to deprecate in the future, then I think we should consider making this class and constructor package-private.
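If the factory method were kept public, the deprecation path the reviewer describes would look roughly like this: the old signature stays, is marked `@Deprecated`, and delegates to the new overload with a no-op metrics context. `StubInputFile`, `StubProperties`, and `StubMetricsContext` are hypothetical stand-ins for `GCSInputFile`, `GCPProperties`, and `MetricsContext`, and the no-op default is an assumption, not the merged code.

```java
// Hypothetical stand-ins for GCPProperties and MetricsContext.
final class StubProperties {}

final class StubMetricsContext {
  static StubMetricsContext nullMetrics() {
    return new StubMetricsContext();
  }
}

// Hypothetical stand-in for GCSInputFile: the "keep it public, deprecate the old overload" option.
class StubInputFile {
  private final String location;
  private final StubProperties properties;
  private final StubMetricsContext metrics;

  private StubInputFile(String location, StubProperties properties, StubMetricsContext metrics) {
    this.location = location;
    this.properties = properties;
    this.metrics = metrics;
  }

  /**
   * @deprecated use {@link #fromLocation(String, StubProperties, StubMetricsContext)} instead.
   */
  @Deprecated
  public static StubInputFile fromLocation(String location, StubProperties properties) {
    // Old signature kept for existing callers; defaults to a no-op metrics context.
    return fromLocation(location, properties, StubMetricsContext.nullMetrics());
  }

  public static StubInputFile fromLocation(
      String location, StubProperties properties, StubMetricsContext metrics) {
    return new StubInputFile(location, properties, metrics);
  }

  String location() {
    return location;
  }

  StubMetricsContext metrics() {
    return metrics;
  }
}
```

The package-private alternative is simpler: drop `public` from the class and the factory method so callers outside the package cannot depend on the signature, which leaves it free to change without a deprecation cycle.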

Contributor Author


Thanks for the feedback. I have made the class and method package-private.

@rdblue
Contributor

rdblue commented Mar 8, 2022

Most of this looks good. Thanks, @rajarshisarkar!

@rajarshisarkar
Contributor Author

Thanks for the review, @kbendick @danielcweeks @rdblue!

Should we merge this PR now, or wait for #4254 to be merged to master?

@rdblue
Contributor

rdblue commented Mar 23, 2022

I'm going to merge this since there's still ongoing discussion about #4254. Thanks @rajarshisarkar!

@rdblue rdblue merged commit 7de917d into apache:master Mar 23, 2022

4 participants