-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-17271. S3A statistics to support IOStatistics #2324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17271. S3A statistics to support IOStatistics #2324
Conversation
156 files being changed makes this a fun patch. I can review this or next week (if not committed yet). Don't get blocked by my review 😄 |
80bf3c6
to
fe897b4
Compare
@liuml07 it's split in hadoop-common API and the changes in S3A, where S3A reworks a lot of the statistics code in general (move to interfaces over single implementations in S3AInstrumentation, using the IOStatistics as the way stats are collected and reported). Yes, it touches a lot of files. Look at #2323 first, as that is what we expect applications to use and for other instrumented bits of code (ABFS) to pick up. FWIW I have a PoC parquet hadoop reader which collects this stuff, that's how I've been seeing how well the API works in practise |
fe897b4
to
890f7c4
Compare
I'm going to update with those fixes, and then actually move to a leaner version of this PR
This will massively reduce the s3a code diff, and meets the core goal of "easier for apps to pick up statistics of application use". |
made slightly less "clean" by retaining use of S3AInstrumentation and implementation classes in some places. Less of a switch to the interfaces, but let's be honest: except in tests, there is only one implementation, so let's stay with that. |
115f3a6
to
bab0a98
Compare
bab0a98
to
362792e
Compare
Contains HADOOP-16830. Add IOStatistics API Change-Id: Ic5c4b014f90f1ed61c37d3ce9b17290736475047
This is to allow hadoop-aws to compile without needing to move to the moved interfaces; it will also allow anything external which used those methods (unlikely) to keep working. Also: throw synchronized at methods to get findbugs to STFU. Tempting just to turn it off on the basis that it is overkill, but it could maybe be correct. Making all of MeanStatistics synchronized is irritating, as it will hurt performance on what should be a lightweight class. But it is needed to ensure samples and sum are consistently set. Change-Id: I4c3e2726e1e97d705c55dbb43a507ea4d0e81e28
Change-Id: I5f64704a82a196fd8ff66cbde10d4970722e1fd7
Change-Id: I3e9bb83bc32eddc5c7d84c1df7be77cea3e60f5d
-remove superfluous javadocs -names of inner classes are, where possible, same as the old one. -storeContext reverts to instrumentation as name of field of statistics context; reduces diff slightly Change-Id: I80bc65376f0d87bc9c38f265b38b1ee834c17d96
- failures are registered and reported differently - helper in statistics.impl to wrap functions and callable with handling for this Move BufferedIOStatistics streams from impl to fs.statistics, so its declared OK for others to use counter increment to ignore negative values; returns existing value. successful increment returns the latest value, as documented. Tests for the counter and gauge behaviour Change-Id: Ic851240ebe94033bf93e0a11abefb7adda534191
Change-Id: I5d7237d844de94a1620356a1a45bb08a7515d768
on hasNext/Next, close() is called, so as to cleanup any open state/handles etc. Avoids having to tell developers downstream to look for an iterator being closeable, and call it if so. + minor improvements in iterator test suite + fix typo found by Mehakmeet Change-Id: Ibd103941278b8c0bc6c93a09ea469be4c60a58b1
+test changes add verification that IOStatistics propagate all the way through the tombstone reconciling chain Change-Id: I57260cf0d1f512dff01f8a3aa54427ec9d321877
+ add class which can be used to store duration results, primarily for testing. Made public for others to play with too Change-Id: I0a7f274f5a2a180e1800102be177b308050725c0
* move the methods in fs.impl.FutureIOSupport which turn out to be needed when working with the async code to the public utils.functional package * duration tracking invocation in IOStatisticsBinding tuning based on the ABFS coding experience. Change-Id: Ide9467fa67a8a5d3d8daec94a7a39ff87f106a47
* checkstyles * operation cost test setup failing if Statistic wasn't mapped to a counter Change-Id: I4445c32e0dc3c84064ad2a4ee6a2a9fe955c567c
Fix checkstyle in RemoteIOException unwrapping of exception looks for Error and rethrows Change-Id: I673b1e487cae2ec3204ce6ba1103cdae14cfe48c
+ IOStatisticsAssertions include checks for null IOStatistics instance passed in Not sure that we need the overkill of a wrapper for that type casting, but as we couldn't get away without it in HADOOP-17281, I've gone with it. Change-Id: I3ad4457d63860ac55c6dc7a6d22100dd86e12bf1
Rebase to trunk, fix up listStatusIterable, using the new type casting wrapper in RemoteIterators Change-Id: I158d09863d0dac6866a7c2f1e9130da8645dd82b
* the trackDuration(DurationTrackerFactory,...) methods support null factory and switch to a stub tracker * there's an overloaded logIOStatisticsAtDebug() method to use the IOStatisticsLogging own logger Change-Id: Ie786dc7aa8920a3fc8b22b55916f4b811810dab3
* PairedDurationTrackerFactory allows listing code to update both FS and instance level stats. * test assertions to become verifyStatistic* assertStatistic* to make clear these are statistic assertions; also helps code completion * stream_read_unbuffered stream key Change-Id: I25272fffeb3e66b5cec90ae6c6af74808c139b26
* S3AInstrumentation to have IOStatisticsStore for aggregating values * as counters &c on the FS are incrememented, so is that instanceIOStatistics * when streams are merged back in close, they are updated too * S3AFileSystem.toString() to print the stats * All test suites to collect & log stats in @afterclass * New test run @ end of all suites to print the aggregate stats of that test process. Printing out the aggregate stats is interesting, even though test runs are unrepresentative of real work. Makes me think about what a command line tool to actually aggregate a list of stats files to produce a final one. Change-Id: I24ea0ddd3b64c8233f1bdc1d021b98503df6e914
Change-Id: I911c74d77809038d0849ae90835fa3b88ff08b55
0d26d5d
to
6bfbd94
Compare
the code to catch/wrap/unwrap IOEs in ansyc logic -> java's own UncheckedIOException. Broadly more usable. Change-Id: I61a6194f952448da01ab48badae0927a243a53b2
changes in hadoop-common Change-Id: Id32d456004eb3ea9a6e0bab154fa41ca2f4e2623
This comment has been minimized.
This comment has been minimized.
test run failure
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @steveloughran, this will be very useful in production. I haven't used the new API itself, so I just making my code review based on reading the code.
I added some comments based on that.
throws IOException { | ||
Preconditions.checkArgument(StringUtils.isNotEmpty(uploadId), | ||
"empty/null upload ID: "+ uploadId); | ||
Preconditions.checkArgument(parts != null, | ||
"No uploaded parts list"); | ||
Preconditions.checkArgument(!parts.isEmpty(), | ||
"No uploaded parts to save"); | ||
|
||
// put a 0-byte file with the name of the original under-magic path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it is needed to add this with the IOStat feature? is it related somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it's always done. I just added the docs for people to understand what is going on. We put the file so that code which expects a file to exist after close() is satisified. Some code in spark does. Luckily nothing ever tries to read the data until the job is committed.
* @param pathStyleAccess enable path style access? | ||
* @return new AmazonS3 client | ||
*/ | ||
private AmazonS3 buildAmazonS3Client( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems unrelated. Please explain why this change is needed for the stat API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to collect AWS stats too. I couldn't actually get it to work so the final bit of wiring is left for HADOOP-13551.
The stats collection classes for AWS are there. But we need to switch to the builder API For the S3 client. And when you do that, things stop working. I concluded that was a separate PR to follow. (See! I tried to keep things under control!)
* @param count count of exceptions | ||
*/ | ||
void exceptionInMultipartComplete(int count) { | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could keep the javadocs here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed everywhere where some more detail in javadocs can go, added
*/ | ||
void exceptionInMultipartAbort() { | ||
exceptionsInMultipartFinalize.incrementAndGet(); | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could keep the javadocs here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* Get the number of bytes pending upload. | ||
* @return the number of bytes in the pending upload state. | ||
*/ | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could keep the javadocs here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* to be called at the end of the write. | ||
* @param size size in bytes | ||
*/ | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could keep the javadocs here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/** | ||
* Output stream has closed. | ||
* Trigger merge in of all statistics not updated during operation. | ||
*/ | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could keep the javadocs here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add the implementation-specific detail -the stuff which isn't just a copy and paste of the interface text
# This causes all remote iterator stats | ||
# to be logged when the RemoteIterators.foreach() method is | ||
# invoked | ||
log4j.logger.org.apache.hadoop.util.functional.RemoteIterators=DEBUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe info level logging is better here, or this is needed to show all stats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
( i know this is for testing )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, its too much. Sorry, must have had it set up for me and missed I was committing it
@InterfaceAudience.Private | ||
@InterfaceStability.Unstable | ||
public class WrappedIOException extends RuntimeException { | ||
public class WrappedIOException extends UncheckedIOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice touch, we were talking about this exception is a more narrow kind of runtime exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only found out about this exception recently. We can throw it in our APIs without feeling guilty. Not our fault java insists on checked exceptions
*/ | ||
@InterfaceAudience.Public | ||
@InterfaceStability.Unstable | ||
public final class RemoteIterators { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a RemoteIterator class in hadoop. Can we extend that one, or is there a reason to create a whole new one here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a class to work with RemoteIterator instances
- A set of chainable iterators to filter, transform and map from java Iterable to remote Iterator. If you look at S3a's Listing class, we wrap iterators this way. RemoteIterators makes this more generic
- And the iterators pass down IOStatistics all the way to the bottom. This is needed so we can get the final IOStats through those FS listing APIs.
Change-Id: I0a5d24689bdfaea9d4976abc03f1680bea7fda74
Going to close this PR and re-open a new one rebased to trunk. Sorry |
Contains
HADOOP-16830. Add IOStatistics API
This is the aggregate branch which also contains #2323; it supercedes #2069