Skip to content

HADOOP-17271. S3A statistics to support IOStatistics #2324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

steveloughran
Copy link
Contributor

Contains
HADOOP-16830. Add IOStatistics API

This is the aggregate branch which also contains #2323; it supercedes #2069

@liuml07
Copy link
Member

liuml07 commented Sep 21, 2020

156 files being changed makes this a fun patch. I can review this or next week (if not committed yet). Don't get blocked by my review 😄

@liuml07 liuml07 self-requested a review September 21, 2020 22:00
@steveloughran steveloughran force-pushed the s3/HADOOP-16830-iostatistics-all branch from 80bf3c6 to fe897b4 Compare September 22, 2020 14:15
@steveloughran
Copy link
Contributor Author

@liuml07 it's split in hadoop-common API and the changes in S3A, where S3A reworks a lot of the statistics code in general (move to interfaces over single implementations in S3AInstrumentation, using the IOStatistics as the way stats are collected and reported).

Yes, it touches a lot of files. Look at #2323 first, as that is what we expect applications to use and for other instrumented bits of code (ABFS) to pick up. FWIW I have a PoC parquet hadoop reader which collects this stuff, that's how I've been seeing how well the API works in practise

@steveloughran steveloughran force-pushed the s3/HADOOP-16830-iostatistics-all branch from fe897b4 to 890f7c4 Compare September 23, 2020 11:12
@steveloughran
Copy link
Contributor Author

./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FunctionsRaisingIOE.java:41:   * @deprecated use {@link org.apache.hadoop.util.functional.FunctionRaisingIOE}: Line is longer than 80 characters (found 81). [LineLength]
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FunctionsRaisingIOE.java:54:   * @deprecated use {@link org.apache.hadoop.util.functional.BiFunctionRaisingIOE}: Line is longer than 80 characters (found 83). [LineLength]
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/FunctionsRaisingIOE.java:65:   * @deprecated use {@link org.apache.hadoop.util.functional.CallableRaisingIOE}: Line is longer than 80 characters (found 81). [LineLength]
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/statistics/package-info.java:28: * {@link org.apache.hadoop.fs.statistics.IOStatisticsSource#getIOStatistics()} .: Line is longer than 80 characters (found 81). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:1723:      });: 'block rcurly' has incorrect indentation level 6, expected level should be one of the following: 8, 10. [Indentation]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:1740:      });: 'block rcurly' has incorrect indentation level 6, expected level should be one of the following: 8, 10. [Indentation]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/PendingSet.java:222:  public void setIOStatistics(final IOStatisticsSnapshot ioStatistics) {:58: 'ioStatistics' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SinglePendingCommit.java:450:  public void setIOStatistics(final IOStatisticsSnapshot ioStatistics) {:58: 'ioStatistics' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/files/SuccessData.java:352:  public void setIOStatistics(final IOStatisticsSnapshot ioStatistics) {:58: 'ioStatistics' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/NetworkBinding.java:57:   * If {@code SSLConnectionSocketFactory} cannot be found on the classpath, the value: Line is longer than 80 characters (found 86). [LineLength]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/StoreContextBuilder.java:116:      final S3AStatisticsContext statisticsContext) {:34: 'statisticsContext' hides a field. [HiddenField]
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/statistics/CommitterStatistics.java:21:import org.apache.hadoop.fs.statistics.IOStatisticsSource;:8: Unused import - org.apache.hadoop.fs.statistics.IOStatisticsSource. [UnusedImports]

I'm going to update with those fixes, and then actually move to a leaner version of this PR

  • no rework of stats context/interfaces
  • just make S3AInstrumentation classes all serve up IOStatistics, with common base classes

This will massively reduce the s3a code diff, and meets the core goal of "easier for apps to pick up statistics of application use".

@apache apache deleted a comment from hadoop-yetus Sep 29, 2020
@apache apache deleted a comment from hadoop-yetus Sep 29, 2020
@apache apache deleted a comment from hadoop-yetus Sep 29, 2020
@apache apache deleted a comment from hadoop-yetus Sep 29, 2020
@steveloughran
Copy link
Contributor Author

made slightly less "clean" by retaining use of S3AInstrumentation and implementation classes in some places. Less of a switch to the interfaces, but let's be honest: except in tests, there is only one implementation, so let's stay with that.

@steveloughran steveloughran force-pushed the s3/HADOOP-16830-iostatistics-all branch 2 times, most recently from 115f3a6 to bab0a98 Compare October 8, 2020 14:12
@apache apache deleted a comment from hadoop-yetus Oct 12, 2020
@steveloughran steveloughran force-pushed the s3/HADOOP-16830-iostatistics-all branch from bab0a98 to 362792e Compare October 12, 2020 15:06
@apache apache deleted a comment from hadoop-yetus Oct 14, 2020
@apache apache deleted a comment from hadoop-yetus Oct 14, 2020
Contains
HADOOP-16830. Add IOStatistics API

Change-Id: Ic5c4b014f90f1ed61c37d3ce9b17290736475047
This is to allow hadoop-aws to compile without needing to move to the
moved interfaces; it will also allow anything external which used those
methods (unlikely) to keep working.

Also: throw synchronized at methods to get findbugs to STFU.

Tempting just to turn it off on the basis that it is overkill, but it could
maybe be correct.

Making all of MeanStatistics synchronized is irritating, as it will hurt
performance on what should be a lightweight class. But it is needed to
ensure samples and sum are consistently set.

Change-Id: I4c3e2726e1e97d705c55dbb43a507ea4d0e81e28
Change-Id: I5f64704a82a196fd8ff66cbde10d4970722e1fd7
Change-Id: I3e9bb83bc32eddc5c7d84c1df7be77cea3e60f5d
-remove superfluous javadocs
-names of inner classes are, where possible, same as
 the old one.
-storeContext reverts to instrumentation as name of field
 of statistics context; reduces diff slightly

Change-Id: I80bc65376f0d87bc9c38f265b38b1ee834c17d96
- failures are registered and reported differently
- helper in statistics.impl to wrap functions and callable with handling for this

Move BufferedIOStatistics streams from impl to fs.statistics, so its declared
OK for others to use

counter increment to ignore negative values; returns existing value.
successful increment returns the latest value, as documented.

Tests for the counter and gauge behaviour

Change-Id: Ic851240ebe94033bf93e0a11abefb7adda534191
Change-Id: I5d7237d844de94a1620356a1a45bb08a7515d768
on hasNext/Next, close() is called, so as to cleanup any open state/handles
etc. Avoids having to tell developers downstream to look for an iterator
being closeable, and call it if so.

+ minor improvements in iterator test suite
+ fix typo found by Mehakmeet

Change-Id: Ibd103941278b8c0bc6c93a09ea469be4c60a58b1
+test changes add verification that IOStatistics propagate all the way
through the tombstone reconciling chain

Change-Id: I57260cf0d1f512dff01f8a3aa54427ec9d321877
+ add class which can be used to store duration results, primarily for
testing. Made public for others to play with too

Change-Id: I0a7f274f5a2a180e1800102be177b308050725c0
* move the methods in fs.impl.FutureIOSupport which turn out to be needed
  when working with the async code to the public utils.functional package
* duration tracking invocation in IOStatisticsBinding tuning based on the
  ABFS coding experience.

Change-Id: Ide9467fa67a8a5d3d8daec94a7a39ff87f106a47
* checkstyles
* operation cost test setup failing if Statistic wasn't mapped to a counter

Change-Id: I4445c32e0dc3c84064ad2a4ee6a2a9fe955c567c
Fix checkstyle in RemoteIOException
unwrapping of exception looks for Error and rethrows

Change-Id: I673b1e487cae2ec3204ce6ba1103cdae14cfe48c
+ IOStatisticsAssertions include checks for null IOStatistics instance
passed in

Not sure that we need the overkill of a wrapper for that type casting, but
as we couldn't get away without it in HADOOP-17281, I've gone with it.

Change-Id: I3ad4457d63860ac55c6dc7a6d22100dd86e12bf1
Rebase to trunk, fix up listStatusIterable, using the new
type casting wrapper in RemoteIterators

Change-Id: I158d09863d0dac6866a7c2f1e9130da8645dd82b
* the trackDuration(DurationTrackerFactory,...) methods support null factory
  and switch to a stub tracker
* there's an overloaded logIOStatisticsAtDebug() method to use the
  IOStatisticsLogging own logger

Change-Id: Ie786dc7aa8920a3fc8b22b55916f4b811810dab3
* PairedDurationTrackerFactory allows listing code to update both FS and instance
  level stats.
* test assertions to become verifyStatistic* assertStatistic* to make clear
  these are statistic assertions; also helps code completion
* stream_read_unbuffered stream key

Change-Id: I25272fffeb3e66b5cec90ae6c6af74808c139b26
* S3AInstrumentation to have IOStatisticsStore for aggregating values
* as counters &c on the FS are incrememented, so is that instanceIOStatistics
* when streams are merged back in close, they are updated too
* S3AFileSystem.toString() to print the stats
* All test suites to collect & log stats in @afterclass
* New test run @ end of all suites to print the aggregate stats of that test
  process.

Printing out the aggregate stats is interesting, even though test runs
are unrepresentative of real work. Makes me think about what a command line
tool to actually aggregate a list of stats files to produce a final one.

Change-Id: I24ea0ddd3b64c8233f1bdc1d021b98503df6e914
Change-Id: I911c74d77809038d0849ae90835fa3b88ff08b55
@steveloughran steveloughran force-pushed the s3/HADOOP-16830-iostatistics-all branch from 0d26d5d to 6bfbd94 Compare October 28, 2020 10:28
the code to catch/wrap/unwrap IOEs in ansyc logic -> java's own
UncheckedIOException. Broadly more usable.

Change-Id: I61a6194f952448da01ab48badae0927a243a53b2
changes in hadoop-common

Change-Id: Id32d456004eb3ea9a6e0bab154fa41ca2f4e2623
@hadoop-yetus

This comment has been minimized.

@apache apache deleted a comment from hadoop-yetus Oct 29, 2020
@apache apache deleted a comment from hadoop-yetus Oct 29, 2020
@apache apache deleted a comment from hadoop-yetus Oct 29, 2020
@steveloughran
Copy link
Contributor Author

steveloughran commented Oct 29, 2020

test run failure

 mvit -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep -Ds3guard -Ddynamo  -Dfs.s3a.directory.marker.audit=true -Dscale
[INFO] Running org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB
[ERROR] Tests run: 20, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 49.926 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreAuthoritativeMode
[ERROR] testRenameDirMarksDestAsAuth(org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreAuthoritativeMode)  Time elapsed: 2.451 s  <<< ERROR!
`s3a://stevel-ireland/fork-0002/test/base/auth/testRenameDirMarksDestAsAuth/dest/subdir': Directory is not marked as authoritative in the S3Guard store
	at org.apache.hadoop.fs.s3a.s3guard.AuthoritativeAuditOperation.verifyAuthDir(AuthoritativeAuditOperation.java:111)
	at org.apache.hadoop.fs.s3a.s3guard.AuthoritativeAuditOperation.executeAudit(AuthoritativeAuditOperation.java:183)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreAuthoritativeMode.expectAuthRecursive(ITestDynamoDBMetadataStoreAuthoritativeMode.java:879)
	at org.apache.hadoop.fs.s3a.s3guard.ITestDynamoDBMetadataStoreAuthoritativeMode.testRenameDirMarksDestAsAuth(ITestDynamoDBMetadataStoreAuthoritativeMode.java:555)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

[INFO] Running org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardConcu

@apache apache deleted a comment from hadoop-yetus Nov 16, 2020
@bgaborg bgaborg self-requested a review November 19, 2020 10:33
Copy link

@bgaborg bgaborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @steveloughran, this will be very useful in production. I haven't used the new API itself, so I just making my code review based on reading the code.
I added some comments based on that.

throws IOException {
Preconditions.checkArgument(StringUtils.isNotEmpty(uploadId),
"empty/null upload ID: "+ uploadId);
Preconditions.checkArgument(parts != null,
"No uploaded parts list");
Preconditions.checkArgument(!parts.isEmpty(),
"No uploaded parts to save");

// put a 0-byte file with the name of the original under-magic path
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why it is needed to add this with the IOStat feature? is it related somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it's always done. I just added the docs for people to understand what is going on. We put the file so that code which expects a file to exist after close() is satisified. Some code in spark does. Luckily nothing ever tries to read the data until the job is committed.

* @param pathStyleAccess enable path style access?
* @return new AmazonS3 client
*/
private AmazonS3 buildAmazonS3Client(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unrelated. Please explain why this change is needed for the stat API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to collect AWS stats too. I couldn't actually get it to work so the final bit of wiring is left for HADOOP-13551.

The stats collection classes for AWS are there. But we need to switch to the builder API For the S3 client. And when you do that, things stop working. I concluded that was a separate PR to follow. (See! I tried to keep things under control!)

* @param count count of exceptions
*/
void exceptionInMultipartComplete(int count) {
@Override
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could keep the javadocs here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed everywhere where some more detail in javadocs can go, added

*/
void exceptionInMultipartAbort() {
exceptionsInMultipartFinalize.incrementAndGet();
@Override
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could keep the javadocs here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Get the number of bytes pending upload.
* @return the number of bytes in the pending upload state.
*/
@Override
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could keep the javadocs here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* to be called at the end of the write.
* @param size size in bytes
*/
@Override
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could keep the javadocs here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* Output stream has closed.
* Trigger merge in of all statistics not updated during operation.
*/
@Override
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could keep the javadocs here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the implementation-specific detail -the stuff which isn't just a copy and paste of the interface text

# This causes all remote iterator stats
# to be logged when the RemoteIterators.foreach() method is
# invoked
log4j.logger.org.apache.hadoop.util.functional.RemoteIterators=DEBUG
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe info level logging is better here, or this is needed to show all stats?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( i know this is for testing )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, its too much. Sorry, must have had it set up for me and missed I was committing it

@InterfaceAudience.Private
@InterfaceStability.Unstable
public class WrappedIOException extends RuntimeException {
public class WrappedIOException extends UncheckedIOException {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice touch, we were talking about this exception is a more narrow kind of runtime exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only found out about this exception recently. We can throw it in our APIs without feeling guilty. Not our fault java insists on checked exceptions

*/
@InterfaceAudience.Public
@InterfaceStability.Unstable
public final class RemoteIterators {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a RemoteIterator class in hadoop. Can we extend that one, or is there a reason to create a whole new one here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a class to work with RemoteIterator instances

  • A set of chainable iterators to filter, transform and map from java Iterable to remote Iterator. If you look at S3a's Listing class, we wrap iterators this way. RemoteIterators makes this more generic
  • And the iterators pass down IOStatistics all the way to the bottom. This is needed so we can get the final IOStats through those FS listing APIs.

Change-Id: I0a5d24689bdfaea9d4976abc03f1680bea7fda74
@steveloughran
Copy link
Contributor Author

Going to close this PR and re-open a new one rebased to trunk. Sorry

@steveloughran steveloughran deleted the s3/HADOOP-16830-iostatistics-all branch December 15, 2020 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants