-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException #3341
Conversation
TEST RESULTS HNS Account Location: East US 2
JIRAs for failures: ITestAzureBlobFileSystemLease, testLastModifiedTime |
...adoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsRestOperationException.java
Outdated
Show resolved
Hide resolved
.../hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/oauth2/RetryTestTokenProvider.java
Outdated
Show resolved
Hide resolved
.../hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/oauth2/RetryTestTokenProvider.java
Show resolved
Hide resolved
happy with all the code changes. Now, with the resetting, can the changes to the pom be removed? It isn't needed to fix this problem. |
Regarding the streamStats test pom fix: |
I think you are a bit confused about what parallel/sequential means. I stated this a few days ago; nothing has changed in Maven since then.
The issue is not parallel execution, the issue is state persisting from previous runs. The reason things work in sequential & not parallel is the Before moving into sequential phase, and so making test runs slower, can we work out why the Meaning it could be one of
I think we need to understand more before committing this -otherwise it's not so much a fix for the problem as a "we changed this and it went away" kind of PR. It probably is just some state left over from the previous test run -but there is a risk of it being something real. |
I agree with the fact that any given process executes tests in a sequential manner. Some findings: For one failing scenario: Therefore, one way this could have happened is that the two tests (possibly along with any other test class involving read) were running in different processes, but around the same time. This resulted in these tests modifying the same statistics variable, which could also explain the drop in read count despite the test having executed the expected number of read ops - the statistics reset was called in one test while the other test was in the middle of executing read. |
where is this statistics variable? It is some counter in the remote FS, or a field in the process running JUnit? |
OK, I have now discovered something while trying to track down why test running of the abfs integration tests in MAPREDUCE-7341 fail. the hadoop-azure pom does request parallel method execution
I think this means that different processes do run different methods in the parallel phase. This actually breaks anything which expects static state to be shared across different threads in the same test case, and also means that tests where Turning this feature off gets those tests to work. try commenting it out and see what difference it makes for you. I would like to change it to the same that other parallel runs have: all methods in a TestCase is run in the same process, and after that test case has finished, the process ends. I'll make it a separate change from my PR, but will include it my PR as well, for completeness. (thought it was some junit5 thing; not so) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, now I know that abfs test run really does parallelise methods in same suite and process (which is fast but brittle), you do need to isolate this.
all good apart from the . to keep javadocs quiet
public static void ResetStatusToFirstTokenFetch() { | ||
/** | ||
* Clear earlier retry details and reset RetryTestTokenProvider instance to | ||
* state of first access token fetch call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
When parallel is set to "classes" (instead of "both"), the StreamStats test passes even with the failure scenario induced by dummy test, as the two tests are run sequentially in the process. However, I guess there might be occasional failures if a different class reads/writes simultaneously in a different process; could not reproduce failure though.
Have made a minor correction to the pom file; the test will be excluded from the parallel run (currently classesandMethods/both) and executed separately along with a bunch of other integration tests that need to be run sequentially
d540d5a
to
dc2f22e
Compare
This is ready to go in, with one little change needed javadocs in RetryTestTokenProvider.java to add "." at the end of the first sentence, even if there's only one sentence in the docs. some versions of javadoc completely break here, so it's not just a little style thing, it's a "stop the build failing" change thanks |
Hi @steveloughran, I had added the "." post your earlier review and can see it in the PR. Could you please check again if it's been updated? |
…cs and ITestAbfsRestOperationException (#3341) Addresses transient failures in the following test classes: * ITestAbfsStreamStatistics: Uses a filesystem level static instance to record read/write statistics, which also tracks these operations in other tests running in parallel. Marked for sequential-only run to avoid transient failure * ITestAbfsRestOperationException: The use of a static member to track retry count causes transient failures when two tests of this class happen to run together. Switch to non-static variable for assertions on retry count closes #3341 Contributed by Sumangala Patki Change-Id: Ied4dec35c81e94efe5f999acae4bb8fde278202e
afraid i had to revert this change as the build now insists that the @VisibleForTesting annotation refers to the hadoop one. |
…cs and ITestAbfsRestOperationException (#3699) Successor for the reverted PR #3341, using the hadoop @VisibleForTesting attribute Contributed by Sumangala Patki
…cs and ITestAbfsRestOperationException (#3699) Successor for the reverted PR #3341, using the hadoop @VisibleForTesting attribute Contributed by Sumangala Patki
…cs and ITestAbfsRestOperationException (apache#3341) Addresses transient failures in the following test classes: * ITestAbfsStreamStatistics: Uses a filesystem level static instance to record read/write statistics, which also tracks these operations in other tests running in parallel. Marked for sequential-only run to avoid transient failure * ITestAbfsRestOperationException: The use of a static member to track retry count causes transient failures when two tests of this class happen to run together. Switch to non-static variable for assertions on retry count closes apache#3341 Contributed by Sumangala Patki
…Statistics and ITestAbfsRestOperationException (apache#3341)" This reverts commit 82658a2.
…cs and ITestAbfsRestOperationException (apache#3699) Successor for the reverted PR apache#3341, using the hadoop @VisibleForTesting attribute Contributed by Sumangala Patki
PR addresses transient failures in the following test classes: