[Fix](catalog) Remove the fs.disable.cache parameter to prevent excessive FS-associated objects and memory leaks (apache#46184)

### Background

In the current file system implementation, the fs.disable.cache parameter allows disabling FS caching. While this provides flexibility, it introduces several critical issues:

```
   1:      22537201      721190432  java.util.HashMap$Node
   2:      21559238      689895616  javax.management.MBeanAttributeInfo
   3:      21559098      517418352  javax.management.Attribute
   4:      19380247      465125928  org.apache.hadoop.metrics2.impl.MetricCounterLong
   5:        122603      461180096  [J
   6:        294309      255533536  [B
   7:        724598      252264048  [Ljava.lang.Object;
   8:       2012368      189047432  [C
   9:        159442      131064400  [Ljava.util.HashMap$Node;
  10:        114752       88075072  [Ljavax.management.MBeanAttributeInfo;
  11:       1899581       45589944  java.lang.String
  12:       1720140       41283360  org.apache.hadoop.metrics2.impl.MetricGaugeLong
```

#### Unbounded FS Instance Creation

When fs.disable.cache=true, a new FS instance is created for every access, preventing instance reuse.

```
String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
if (conf.getBoolean(disableCacheName, false)) {
    LOGGER.debug("Bypassing cache to create filesystem {}", uri);
    return createFileSystem(uri, conf);
}
```

#### Resource Leakage

Associated objects, such as thread metrics and connection pools, are not properly released due to excessive FS instance creation, leading to memory leaks.

#### Performance Degradation

Frequent creation and destruction of FS instances impose significant overhead, especially in high-concurrency scenarios.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [x] Manual test (add detailed scripts or steps below)

        ```
        CREATE CATALOG `iceberg_cos` PROPERTIES (
            "warehouse" = "cosn://ha/ha/ha/stress/multi_fs",
            "type" = "iceberg",
            "iceberg.catalog.type" = "hadoop",
            "cos.secret_key" = "*XXX",
            "cos.region" = "ap-beijing",
            "cos.endpoint" = "cos.ap-beijing.myqcloud.com",
            "cos.access_key" = "**************"
        );
        ```

        Create a catalog using object storage, then write a scheduled script to continuously refresh the catalog. Query the catalog periodically and monitor whether the thread memory behaves as expected.

<img width="1131" alt="image" src="https://github.com/user-attachments/assets/c7b04a5a-449f-432c-975b-524fdb81247a" />

At 22:30, I replaced it with the fixed version.
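
To illustrate the instance-count difference described under "Unbounded FS Instance Creation", here is a minimal, self-contained sketch. It is a toy model, not the Hadoop or Doris code: `FsCacheSketch`, `FakeFileSystem`, and `simulate` are hypothetical names, and the cache key is simplified to scheme plus authority (as far as I know, the real Hadoop cache key also includes the user).

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a file-system factory; all names here are hypothetical. */
public class FsCacheSketch {
    /** Stand-in for a heavyweight FS client (metrics, connection pool, ...). */
    static final class FakeFileSystem {
        final URI uri;
        FakeFileSystem(URI uri) { this.uri = uri; }
    }

    private final Map<String, FakeFileSystem> cache = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();
    private final boolean disableCache;

    FsCacheSketch(boolean disableCache) { this.disableCache = disableCache; }

    FakeFileSystem get(URI uri) {
        if (disableCache) {
            // Bypass path: every access builds a fresh instance that the
            // caller would have to close explicitly to release resources.
            return create(uri);
        }
        // Cached path: one instance per scheme + authority, reused afterwards.
        // Assumption: simplified key, chosen only for this illustration.
        String key = uri.getScheme() + "://" + uri.getAuthority();
        return cache.computeIfAbsent(key, k -> create(uri));
    }

    private FakeFileSystem create(URI uri) {
        created.incrementAndGet();
        return new FakeFileSystem(uri);
    }

    int createdCount() { return created.get(); }

    /** Runs `accesses` lookups against one URI and returns how many instances were built. */
    public static int simulate(boolean disableCache, int accesses) {
        FsCacheSketch factory = new FsCacheSketch(disableCache);
        URI uri = URI.create("cosn://bucket/warehouse/table");
        for (int i = 0; i < accesses; i++) {
            factory.get(uri);
        }
        return factory.createdCount();
    }

    public static void main(String[] args) {
        System.out.println("cache disabled: " + simulate(true, 1000) + " instances");
        System.out.println("cache enabled:  " + simulate(false, 1000) + " instances");
    }
}
```

With the cache disabled, the sketch builds 1000 instances for 1000 accesses. With it enabled, it builds one. In the real system each instance also registers metrics objects, which would be consistent with the `MetricCounterLong` and `MBeanAttributeInfo` counts in the histogram above.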