Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flink 34569][e2e] fail fast if AWS cli container fails to start #24491

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 18 additions & 3 deletions flink-end-to-end-tests/test-scripts/common_s3_operations.sh
Original file line number Diff line number Diff line change
Expand Up @@ -29,12 +29,23 @@
# AWSCLI_CONTAINER_ID
###################################
function aws_cli_start() {
export AWSCLI_CONTAINER_ID=$(docker run -d \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local CONTAINER_ID
CONTAINER_ID=$(docker run -d \
--network host \
--mount type=bind,source="$TEST_INFRA_DIR",target=/hostdir \
-e AWS_REGION -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
--entrypoint python \
-it banst/awscli)
if [ $? -ne 0 ]; then
echo "running aws cli container failed"
if [ -n "$CONTAINER_ID" ]
then
docker kill "$CONTAINER_ID"
docker rm "$CONTAINER_ID"
fi
return 1
fi
export AWSCLI_CONTAINER_ID="$CONTAINER_ID"

while [[ "$(docker inspect -f {{.State.Running}} "$AWSCLI_CONTAINER_ID")" -ne "true" ]]; do
sleep 0.1
Expand All @@ -58,7 +69,11 @@ function aws_cli_stop() {
if [[ $AWSCLI_CONTAINER_ID ]]; then
aws_cli_stop
fi
aws_cli_start
aws_cli_start || aws_cli_start
Copy link
Contributor

@JingGe JingGe Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will simple retry like this let the script be executed multiple times without proper cleanup of previous container, given the aws_cli_start is not Idempotent, afaiu.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that docker run -d ... with detach did not create a container if the exit code is non-zero. So failed aws_cli_start shouldn't leave something to be cleaned. But I can't find docs stating that.

Happy to remove the retry, getting the test to fail faster with a more obvious cause is an improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added in a failsafe to kill/remove if CONTAINER_ID is non-empty after the docker run fails.

if [ $? -ne 0 ]; then
echo "running the aws cli container failed"
exit 1
fi

###################################
# Runs an aws command on the previously started container.
Expand Down Expand Up @@ -135,7 +150,7 @@ function s3_get_number_of_lines_by_prefix() {

# find all files that have the given prefix
parts=$(aws_cli s3api list-objects --bucket "$IT_CASE_S3_BUCKET" --prefix "$1" |
docker run -i stedolan/jq -r '[.Contents[].Key] | join(" ")')
docker run -i --rm stedolan/jq -r '[.Contents[].Key] | join(" ")')

# in parallel (N tasks), query the number of lines, store result in a file named lines
N=10
Expand Down