[Bug] Flink native k8s application mode recovery failed from S3(s3p) savepoint #3013
Comments
Thank you for the detailed feedback. We will work quickly to identify and fix this bug. 💪🔧
The two screenshots above are from the official Flink 1.17.1 documentation. In actual testing, the savepoint path in Figure 2 only works when it uses format 2 from Figure 1; format 1 from Figure 1 fails, which is the cause of the current error. From reviewing the source code, I understand the process to be as follows: a savepoint is triggered, then StreamPark obtains the generated savepoint path from Flink and writes it to the database. So to fix this issue, should we convert the path to format 2 from Figure 1 at the point where we obtain it from Flink, so that the code elsewhere does not need to change?
Thank you for providing the information. We encourage you to fix this bug; how about it? 💪 We need to test and determine the savepoint path rules under different versions of Flink (1.12 ~ 1.17). You are warmly welcome to fix this bug, and we believe you can do it! 👍😊
I am willing to fix this bug and am currently reading the relevant code, but due to my limited abilities it may take some time.
By reviewing the relevant source code and the official Flink documentation, I believe that the correct savepoint format should be format 1 in the screenshot, so I think this is a problem with Flink, not StreamPark. To prove this, I ran the following test on Flink version 1.17.1.
In summary, Flink savepoints appear to behave unexpectedly when using S3 storage with the s3p protocol. If this is by design in Flink, then StreamPark may need to handle this scenario specifically.
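As a rough illustration of the scenario-specific handling discussed above, the sketch below rewrites a savepoint directory path so that it points at the `_metadata` file, which is the form reported in this issue to restore successfully. It is written in Python only to keep the example short; it is not StreamPark code, and the function names `to_metadata_path` and `normalize_for_restore`, the bucket name, and the choice to limit the rewrite to the `s3p` scheme are all assumptions made for illustration.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part of
# StreamPark or Flink. They show one way to turn a savepoint directory path
# into the "<dir>/_metadata" form that restored successfully in this issue.

METADATA_FILE = "_metadata"


def to_metadata_path(savepoint_path: str) -> str:
    """Return the path of the _metadata file for a savepoint path."""
    trimmed = savepoint_path.rstrip("/")
    # Leave the path alone if it already points at the _metadata file.
    if trimmed.rsplit("/", 1)[-1] == METADATA_FILE:
        return trimmed
    return f"{trimmed}/{METADATA_FILE}"


def normalize_for_restore(savepoint_path: str, schemes=("s3p",)) -> str:
    """Rewrite the path only for the schemes where the problem was observed.

    Assumption: only s3p is affected, as reported in this issue.
    """
    scheme = savepoint_path.split("://", 1)[0] if "://" in savepoint_path else ""
    if scheme in schemes:
        return to_metadata_path(savepoint_path)
    return savepoint_path


if __name__ == "__main__":
    # "my-bucket" is a placeholder, not a path from the issue.
    print(normalize_for_restore("s3p://my-bucket/flink/sp/savepoint-abc"))
```

Whether such a rewrite belongs in StreamPark at all depends on whether the behavior turns out to be a Flink bug, so this should be read as a possible workaround, not a confirmed fix.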
Sorry for taking so long to get back to this. Based on your description, our preliminary suspicion is that this might be a bug in Flink, but we need further confirmation. If it is confirmed, we can report it to the Flink community.
Search before asking
Java Version
1.8
Scala Version
2.12.x
StreamPark Version
2.1.1
Flink Version
1.17.1
deploy mode
kubernetes-application
What happened
Flink failed to recover from savepoints that were saved automatically by StreamPark. The logs show that the savepoint path StreamPark submitted when recovering the Flink job was:
s3p://lakehouse/flink/sp/Platform-Link-Test-Security Log/savepoint-2b3ed0-f0c7ba51791f
The logs of the Flink application show that Flink encountered an error when restoring from this savepoint:
s3p://lakehouse/flink/sp/Platform-Link-Test-Security-Log/savepoint-2b3ed0-f0c7ba51791f
Afterwards, manually submitting the job through the regular CLI with the same savepoint produced the same error:
s3p://lakehouse/flink/sp/Platform-Link-Test-Security Log/savepoint-2b3ed0-f0c7ba51791f
However, after changing the savepoint path to the following form, both the CLI and StreamPark submissions succeed:
s3p://lakehouse/flink/sp/Platform-Flink-Test-Security-Log/savepoint-2b3ed0-f0c7ba51791f/_metadata
Error Exception
Screenshots
Are you willing to submit PR?
Code of Conduct