[SUPPORT] Properties file corruption caused by write failure #11835
Comments
The update to the properties file should be atomic, and we already do that for
+1 for this, we need to strengthen the handling of properties file exceptions for the invoker.
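For background, one standard way to make such an update atomic on a POSIX-style filesystem is write-temp-then-rename. The sketch below illustrates that general pattern only; it is not Hudi's actual code, and HDFS rename semantics differ:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicPropsWrite {

  /**
   * Write the new contents to a sibling temp file, then rename it over the
   * target, so readers never observe a partially written hoodie.properties.
   */
  static void atomicWrite(Path target, byte[] contents) throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    Files.write(tmp, contents);
    // On POSIX filesystems this rename atomically replaces the target;
    // on HDFS, rename-over-existing semantics differ and need extra care.
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
  }
}
```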
@danny0405
We have a checksum in the properties file.
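For context, here is a minimal sketch of how such a stored checksum can be used to reject a corrupt file before trusting it. The key name and the CRC32-over-table-name scheme are assumptions for illustration, not necessarily Hudi's exact algorithm:

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.zip.CRC32;

public class ChecksumValidation {

  // Assumed key name; Hudi stores its own checksum property in hoodie.properties.
  private static final String CHECKSUM_KEY = "hoodie.table.checksum";

  /** Returns true only if the file parses and its stored checksum matches. */
  static boolean isValid(Path propsFile) {
    try (InputStream in = Files.newInputStream(propsFile)) {
      Properties props = new Properties();
      props.load(in);
      String stored = props.getProperty(CHECKSUM_KEY);
      if (stored == null) {
        return false; // a zero-length or truncated file lands here
      }
      CRC32 crc = new CRC32();
      // Assumption: the checksum covers the table name; the real input may differ.
      crc.update(props.getProperty("hoodie.table.name", "")
          .getBytes(StandardCharsets.UTF_8));
      return Long.parseLong(stored) == crc.getValue();
    } catch (Exception e) {
      return false; // unreadable counts as corrupt
    }
  }
}
```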
@danny0405 Sounds good. Can I work on optimizing the decision logic here?
Sure, I would be glad to review your fix.
@Ytimetravel Did you get a chance to work on this? Do we have a JIRA for it?
Sorry, I am not sure I fully understand how exactly we get into the corrupt state. From what I can see, createMetaClient(true) fails. But if we chase the chain of calls, it ends up with
which actually accounts for reading from either the backup or the original properties file. Can you help me understand a bit more?
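For reference, the read path being described behaves roughly like this: try the original file first, then fall back to the backup. This is a simplified sketch of the flow under discussion, not the actual Hudi method. Under the failure sequence described below, the backup has already been deleted by the time this runs, so both candidates fail:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ReadWithFallback {

  /** Load table config from hoodie.properties, falling back to its backup. */
  static Properties load(Path props, Path backup) throws IOException {
    for (Path candidate : new Path[] {props, backup}) {
      try (InputStream in = Files.newInputStream(candidate)) {
        Properties p = new Properties();
        p.load(in);
        if (!p.isEmpty()) {
          return p; // a zero-length original is skipped, not returned
        }
      } catch (IOException ignored) {
        // fall through to the next candidate
      }
    }
    throw new IOException("Could not load table config from " + props + " or " + backup);
  }
}
```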
Describe the problem you faced
Dear community,
Recently I discovered a case where a write failure can leave the hoodie.properties file corrupted.
Problem site:
This causes other write tasks to fail.
The sequence in which this situation occurs is as follows (see the sketch after this list):
1. A write to hoodie.properties is interrupted after the backup has been taken. File status: properties corrupted (len=0), properties_backup error-free.
2. recoverIfNeeded sees that hoodie.properties exists and deletes the backup without validating the file. File status: properties corrupted (len=0), properties_backup removed.
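To make the failure window concrete, here is a minimal sketch of the backup-then-rewrite pattern together with an existence-only recovery check. The file names match the issue, but the method bodies are illustrative assumptions, not Hudi's actual implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PropsUpdateSketch {

  static final Path PROPS  = Path.of("/table/.hoodie/hoodie.properties");
  static final Path BACKUP = Path.of("/table/.hoodie/hoodie.properties.backup");

  static void update(byte[] newContents) throws IOException {
    // 1. Keep a backup of the last known-good file.
    Files.copy(PROPS, BACKUP, StandardCopyOption.REPLACE_EXISTING);
    // 2. Rewrite the original. If close() is interrupted here, as in the
    //    stack trace below, hoodie.properties can be left at length 0.
    Files.write(PROPS, newContents);
    // 3. Drop the backup once the rewrite is assumed to have succeeded.
    Files.delete(BACKUP);
  }

  static void recoverIfNeeded() throws IOException {
    // Existence-only check: a zero-length hoodie.properties still "exists",
    // so the intact backup is deleted and the corruption becomes permanent.
    if (Files.exists(PROPS)) {
      Files.deleteIfExists(BACKUP);
    } else if (Files.exists(BACKUP)) {
      Files.copy(BACKUP, PROPS);
    }
  }
}
```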
I think that when performing recoverIfNeeded we should not only check whether the hoodie.properties file exists; we need more information to ensure that the hoodie.properties file is actually correct, rather than directly skipping recovery and deleting the backup file. A sketch of that idea follows.
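Here is a minimal sketch of that validation-before-recovery idea, where isValid stands in for whatever integrity check is chosen (for example, the checksum verification sketched earlier). Everything here is illustrative, not a patch:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

public class ValidatedRecovery {

  static final Path PROPS  = Path.of("/table/.hoodie/hoodie.properties");
  static final Path BACKUP = Path.of("/table/.hoodie/hoodie.properties.backup");

  /** Minimal integrity check: the file parses and carries the mandatory table name. */
  static boolean isValid(Path file) {
    try (InputStream in = Files.newInputStream(file)) {
      Properties p = new Properties();
      p.load(in);
      return p.getProperty("hoodie.table.name") != null; // a 0-length file fails this
    } catch (IOException e) {
      return false;
    }
  }

  static void recoverIfNeeded() throws IOException {
    if (Files.exists(PROPS) && isValid(PROPS)) {
      Files.deleteIfExists(BACKUP); // original is trustworthy, drop the backup
    } else if (Files.exists(BACKUP) && isValid(BACKUP)) {
      Files.copy(BACKUP, PROPS, StandardCopyOption.REPLACE_EXISTING); // restore
    } else {
      throw new IOException("hoodie.properties and its backup are both unusable");
    }
  }
}
```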
Any suggestions?
Environment Description
Hudi version : 0.14.0
Spark version : 2.4
Hadoop version : 2.6
Storage (HDFS/S3/GCS..) : HDFS
Stacktrace
Caused by: org.apache.hudi.exception.HoodieException: Error updating table configs.
at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:91)
at org.apache.hudi.internal.HoodieDataSourceInternalWriter.commit(HoodieDataSourceInternalWriter.java:91)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76)
... 69 more
Suppressed: java.lang.IllegalArgumentException: hoodie.table.name property needs to be specified
at org.apache.hudi.common.table.HoodieTableConfig.generateChecksum(HoodieTableConfig.java:523)
at org.apache.hudi.common.table.HoodieTableConfig.getOrderedPropertiesWithTableChecksum(HoodieTableConfig.java:321)
at org.apache.hudi.common.table.HoodieTableConfig.storeProperties(HoodieTableConfig.java:339)
at org.apache.hudi.common.table.HoodieTableConfig.modify(HoodieTableConfig.java:438)
at org.apache.hudi.common.table.HoodieTableConfig.delete(HoodieTableConfig.java:481)
at org.apache.hudi.table.upgrade.UpgradeDowngrade.run(UpgradeDowngrade.java:151)
at org.apache.hudi.client.BaseHoodieWriteClient.tryUpgrade(BaseHoodieWriteClient.java:1399)
at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1255)
at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1296)
at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:769)
at org.apache.hudi.internal.DataSourceInternalWriterHelper.abort(DataSourceInternalWriterHelper.java:99)
at org.apache.hudi.internal.HoodieDataSourceInternalWriter.abort(HoodieDataSourceInternalWriter.java:96)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:82)
... 69 more
Caused by: org.apache.hudi.exception.HoodieIOException: Error updating table configs.
at org.apache.hudi.common.table.HoodieTableConfig.modify(HoodieTableConfig.java:466)
at org.apache.hudi.common.table.HoodieTableConfig.update(HoodieTableConfig.java:475)
at org.apache.hudi.common.table.HoodieTableConfig.setMetadataPartitionState(HoodieTableConfig.java:816)
at org.apache.hudi.common.table.HoodieTableConfig.clearMetadataPartitions(HoodieTableConfig.java:847)
at org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataTable(HoodieTableMetadataUtil.java:1396)
at org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataTable(HoodieTableMetadataUtil.java:275)
at org.apache.hudi.table.HoodieTable.maybeDeleteMetadataTable(HoodieTable.java:995)
at org.apache.hudi.table.HoodieSparkTable.getMetadataWriter(HoodieSparkTable.java:116)
at org.apache.hudi.table.HoodieTable.getMetadataWriter(HoodieTable.java:947)
at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:359)
at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:285)
at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236)
at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:211)
at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:88)
... 71 more
Caused by: java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:3520)
at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:3498)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:3690)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:3625)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:80)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:115)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:80)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:115)
at org.apache.hudi.common.fs.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:75)
at org.apache.hudi.common.table.HoodieTableConfig.modify(HoodieTableConfig.java:449)
... 84 more