Skip to content

HADOOP-18526. Leak of S3AInstrumentation instances via hadoop Metrics references #5144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

steveloughran
Copy link
Contributor

This has triggered an OOM in a process which was churning through s3a fs instances; the increased memory footprint of IOStatistics amplified what must have been a long standing issue.

  • Makes sure instrumentation is closed when fs is closed
  • Uses a weak reference from metrics to instrumentation, so even if the fs wasn't closed (see HADOOP-18478), this ref would not be holding things back.

How was this patch tested?

cloud tests. no new tests...if there are suggestions, that'd be good

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran
Copy link
Contributor Author

tested, s3 london, parallel @ 8. one failure -created HADOOP-18531 for that

@steveloughran
Copy link
Contributor Author

testing ideas

  • scale test to create many fs instances in parallel and verify all is good...need to make sure metrics is turned on first
  • add probe method for state of s3a statistics; check after fs is closed
  • make sure iostats from closed fs are valid
  • see what happens when you call any stats collecting operation after the fs is closed

@steveloughran
Copy link
Contributor Author

latest test run s3 london, params -Dparallel-tests -DtestsThreadCount=10 -Dscale; all good.

new test to verify that instrumentation.close() unregisters on the first call and is no-op on the second...both unit test for standalone and itest for lifecycle through s3afs.

that showed that instrumentation is being unregistered on close(), so if leakage is reported, it's from something creating fs instances but not closing them. the weak refs will ensure that the instrumentation isn't being held on -but thread pools are still there to consume resources.

@steveloughran
Copy link
Contributor Author

want to add one more feature here; s3a fs initialize() to log stack trace at TRACE; helps to track down what is creating fs instances and not cleaning up properly

@steveloughran
Copy link
Contributor Author

next pr logs stack at trace

2022-11-30 17:38:32,062 [setup] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:initialize(460)) - Initializing S3AFileSystem for stevel-london
2022-11-30 17:38:32,062 [setup] TRACE s3a.S3AFileSystem (S3AFileSystem.java:initialize(464)) - Filesystem for s3a://stevel-london/ creation stack
java.lang.RuntimeException: org.apache.hadoop.fs.s3a.S3AFileSystem@56b21fd5
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:465)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3595)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:171)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3696)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3647)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:555)
	at org.apache.hadoop.fs.contract.AbstractBondedFSContract.init(AbstractBondedFSContract.java:72)
	at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:189)
	at org.apache.hadoop.fs.s3a.AbstractS3ATestBase.setup(AbstractS3ATestBase.java:111)
	at org.apache.hadoop.fs.s3a.ITestS3AUnbuffer.setup(ITestS3AUnbuffer.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:750)

@steveloughran steveloughran force-pushed the s3/HADOOP-18526-s3a-metrics-leak branch from 9811c0b to 5f34529 Compare December 1, 2022 11:07
@apache apache deleted a comment from hadoop-yetus Dec 1, 2022
@apache apache deleted a comment from hadoop-yetus Dec 1, 2022
@steveloughran
Copy link
Contributor Author

tested aws london w/ -Dparallel-tests -DtestsThreadCount=10 -Dmarkers=delete

s3a fs was set to log at TRACE so print stack traces on all fs creation...just search for "creation stack" in the output to look. no problems when this tracing is enabled.

saw HADOOP-18351 again, so have just rebased for future runs.

think this is ready for review now

@steveloughran
Copy link
Contributor Author

build vms playing up "Resource temporarily unavailable"

… references

This has triggered an OOM in a process which was churning through s3a fs
instances; the increased memory footprint of IOStatistics amplified what
must have been a long standing issue.

* Makes sure instrumentation is closed when fs is closed
* Uses a weak reference from metrics to instrumentation, so even
  if the fs wasn't closed (see HADOOP-18478), this ref would not
  be holding things back.

Change-Id: I3324d4bcfd9fb1770691bfca2d28ab0183faa27d
… references

Add tests that instrumentation.close() unregisters *once*.

reverted the change to close() it in s3afs.close(), because that was
already done in stopAllServices().

if something is leaking S3AInstrumentation instances, it is because
it is creating fs instances outside the cache, and not closing them.
once dereferenced and a GC triggered, the fs instance and most
internal classes are GC'd, but not the instrumentation because of
the back ref.

This weakref link will do that, but will result in the metrics structure
being filled up with lots of weakrefs which don't get cleaned up.
It's a lot less expensive than strong refs, but still not ideal.
Fixing that would be tricky as the base metrics class or a subclass
would need to do cleanups (when?).

Change-Id: I07fe792301e23ed1d3676fbb8cc033c5a96e55a2
off by default...test/log4j.properties has it commented out

Change-Id: Ie83b74009ccae8869b2c5386206b913e0f76af2f
@steveloughran steveloughran force-pushed the s3/HADOOP-18526-s3a-metrics-leak branch from 5f34529 to f8cdd3d Compare December 1, 2022 17:13
* use the weak ref when registering
* log state of cache.disabled in trace-level constructor

The reason for the logging is that running out of memory
due to instrumentation refs is probably a sign of s3a fs instances
being created and not closed.
HADOOP-18476 should partially address this through FileContext;
the trace is there to identify other possible causes.

Change-Id: If78d16e8bfc63eea93ec328543ab110f82c93d90
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 21s Maven dependency ordering for branch
+1 💚 mvninstall 25m 41s trunk passed
+1 💚 compile 23m 12s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 20m 34s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 3m 42s trunk passed
+1 💚 mvnsite 2m 36s trunk passed
-1 ❌ javadoc 1m 14s /branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.
+1 💚 javadoc 1m 35s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 3m 57s trunk passed
+1 💚 shadedclient 20m 51s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 1m 33s the patch passed
+1 💚 compile 22m 20s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 22m 20s the patch passed
+1 💚 compile 20m 23s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 20m 23s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 32s the patch passed
+1 💚 mvnsite 2m 36s the patch passed
-1 ❌ javadoc 1m 5s /patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.
+1 💚 javadoc 1m 35s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 4m 6s the patch passed
+1 💚 shadedclient 21m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 18s hadoop-common in the patch passed.
+1 💚 unit 2m 53s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 59s The patch does not generate ASF License warnings.
225m 42s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5144/6/artifact/out/Dockerfile
GITHUB PR #5144
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 5c3da2b5ff42 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4897454
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5144/6/testReport/
Max. process+thread count 1320 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5144/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@mukund-thakur mukund-thakur self-requested a review December 7, 2022 19:40
@apache apache deleted a comment from hadoop-yetus Dec 7, 2022
@apache apache deleted a comment from hadoop-yetus Dec 7, 2022
@apache apache deleted a comment from hadoop-yetus Dec 7, 2022
Copy link
Contributor

@mukund-thakur mukund-thakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me with one comment.

Makes sure instrumentation is closed when fs is closed
I can't find where is instrumentation getting closed in fs.close()

// for tracking down memory leak issues.
LOG.trace("Filesystem for {} created; fs.s3a.impl.disable.cache = {}",
name, originalConf.getBoolean("fs.s3a.impl.disable.cache", false),
new RuntimeException(super.toString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why RTE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't throw it, just trace it. it can be anything. what is your suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just print it? I mean I don't understand the reason behind wrapping in RuntimeEx.
Also base FileStystem doesn't implement toString() so there won't be anything. Why not use this.tpString()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. generating that runtime exception creates a stack trace which then gets logged in the trace call, so we can see who is creating the fs instance
  2. calling super.toString() so we get the minimal instance id rather than the expanded tostring with all the stats and stuff. This is initialize, most of those values are unset so it would only confuse.

it is that stack trace which is the most critical thing as it lets us track down where FS instances are being created and then not closed.

@steveloughran
Copy link
Contributor Author

I can't find where is instrumentation getting closed in fs.close()

done in stopAllServices()

Copy link
Contributor

@mehakmeet mehakmeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good along with the tests.

// robust extract and convert to string
LOG.debug("Statistics for {}: {}", uri,
IOStatisticsLogging.ioStatisticsToPrettyString(getIOStatistics()));
// log IO statistics, including of any file deletion during
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means "including iostatistics of any file deletion..." so IMO it's valid

Extended S3AInstrumentation and WeakRefMetricsSource to allow for
the test to verify that the link from metrics to instrumentation
works and is through the weak reference

Change-Id: I4fc156dfa60d7c7a868d8b765777c0f445a83257
@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 20s Maven dependency ordering for branch
+1 💚 mvninstall 28m 39s trunk passed
+1 💚 compile 25m 9s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 compile 21m 40s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 1s trunk passed
+1 💚 mvnsite 2m 25s trunk passed
+1 💚 javadoc 1m 35s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 17s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 56s trunk passed
+1 💚 shadedclient 23m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for patch
+1 💚 mvninstall 1m 31s the patch passed
+1 💚 compile 24m 29s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javac 24m 29s the patch passed
+1 💚 compile 21m 43s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 javac 21m 43s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 3m 51s /results-checkstyle-root.txt root: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3)
+1 💚 mvnsite 2m 22s the patch passed
+1 💚 javadoc 1m 27s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 18s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 4m 4s the patch passed
+1 💚 shadedclient 23m 46s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 19s hadoop-common in the patch passed.
+1 💚 unit 2m 51s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 52s The patch does not generate ASF License warnings.
238m 31s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5144/7/artifact/out/Dockerfile
GITHUB PR #5144
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 594476bdd14f 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 627910b
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5144/7/testReport/
Max. process+thread count 1247 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5144/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

// for tracking down memory leak issues.
LOG.trace("Filesystem for {} created; fs.s3a.impl.disable.cache = {}",
name, originalConf.getBoolean("fs.s3a.impl.disable.cache", false),
new RuntimeException(super.toString()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't throw it, just trace it. it can be anything. what is your suggestion?

// robust extract and convert to string
LOG.debug("Statistics for {}: {}", uri,
IOStatisticsLogging.ioStatisticsToPrettyString(getIOStatistics()));
// log IO statistics, including of any file deletion during
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means "including iostatistics of any file deletion..." so IMO it's valid

@steveloughran
Copy link
Contributor Author

i am happy with this patch and want to get it in to 3.3.5. please can i have reviewers tell me if they are happy now. thx

@steveloughran
Copy link
Contributor Author

i am still happy with this patch and want to get it in to 3.3.5. please can i have reviewers tell me if they are happy now. thx

Copy link
Contributor

@ashutoshcipher ashutoshcipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final patch looks good to me +1

Copy link
Contributor

@mehakmeet mehakmeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@mukund-thakur
Copy link
Contributor

@steveloughran
Copy link
Contributor Author

@mukund-thakur tried to clarify: the exception is created to build the stack trace, which is what we really want.

@steveloughran steveloughran merged commit aaf92fe into apache:trunk Dec 14, 2022
Copy link
Contributor

@mukund-thakur mukund-thakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM,
Thanks @steveloughran for explaining. Now I understand, the whole intent was to find the caller. Lol, I was being dumb.

asfgit pushed a commit that referenced this pull request Dec 15, 2022
… references (#5144)

This has triggered an OOM in a process which was churning through s3a fs
instances; the increased memory footprint of IOStatistics amplified what
must have been a long-standing issue with FS instances being created
and not closed()

*  Makes sure instrumentation is closed when the FS is closed.
*  Uses a weak reference from metrics to instrumentation, so even
   if the FS wasn't closed (see HADOOP-18478), this back reference
   would not cause the S3AInstrumentation reference to be retained.
*  If S3AFileSystem is configured to log at TRACE it will log the
   calling stack of initialize(), so help identify where the
   instance is being created. This should help track down
   the cause of instance leakage.

Contributed by Steve Loughran.
asfgit pushed a commit that referenced this pull request Dec 15, 2022
… references (#5144)

This has triggered an OOM in a process which was churning through s3a fs
instances; the increased memory footprint of IOStatistics amplified what
must have been a long-standing issue with FS instances being created
and not closed()

*  Makes sure instrumentation is closed when the FS is closed.
*  Uses a weak reference from metrics to instrumentation, so even
   if the FS wasn't closed (see HADOOP-18478), this back reference
   would not cause the S3AInstrumentation reference to be retained.
*  If S3AFileSystem is configured to log at TRACE it will log the
   calling stack of initialize(), so help identify where the
   instance is being created. This should help track down
   the cause of instance leakage.

Contributed by Steve Loughran.
@steveloughran
Copy link
Contributor Author

Lol, I was being dumb.
no comment

slfan1989 pushed a commit to slfan1989/hadoop that referenced this pull request Dec 20, 2022
… references (apache#5144)


This has triggered an OOM in a process which was churning through s3a fs
instances; the increased memory footprint of IOStatistics amplified what
must have been a long-standing issue with FS instances being created
and not closed()

*  Makes sure instrumentation is closed when the FS is closed.
*  Uses a weak reference from metrics to instrumentation, so even
   if the FS wasn't closed (see HADOOP-18478), this back reference
   would not cause the S3AInstrumentation reference to be retained.
*  If S3AFileSystem is configured to log at TRACE it will log the
   calling stack of initialize(), so help identify where the
   instance is being created. This should help track down
   the cause of instance leakage.

Contributed by Steve Loughran.
kkolman pushed a commit to Datameer-Inc/hadoop that referenced this pull request Jul 12, 2023
… references (apache#5144)

This has triggered an OOM in a process which was churning through s3a fs
instances; the increased memory footprint of IOStatistics amplified what
must have been a long-standing issue with FS instances being created
and not closed()

*  Makes sure instrumentation is closed when the FS is closed.
*  Uses a weak reference from metrics to instrumentation, so even
   if the FS wasn't closed (see HADOOP-18478), this back reference
   would not cause the S3AInstrumentation reference to be retained.
*  If S3AFileSystem is configured to log at TRACE it will log the
   calling stack of initialize(), so help identify where the
   instance is being created. This should help track down
   the cause of instance leakage.

Contributed by Steve Loughran.

(cherry picked from commit aaf92fe)
kkolman pushed a commit to Datameer-Inc/hadoop that referenced this pull request Jul 12, 2023
… references (apache#5144)

This has triggered an OOM in a process which was churning through s3a fs
instances; the increased memory footprint of IOStatistics amplified what
must have been a long-standing issue with FS instances being created
and not closed()

*  Makes sure instrumentation is closed when the FS is closed.
*  Uses a weak reference from metrics to instrumentation, so even
   if the FS wasn't closed (see HADOOP-18478), this back reference
   would not cause the S3AInstrumentation reference to be retained.
*  If S3AFileSystem is configured to log at TRACE it will log the
   calling stack of initialize(), so help identify where the
   instance is being created. This should help track down
   the cause of instance leakage.

Contributed by Steve Loughran.

(cherry picked from commit aaf92fe)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants