-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EcsRunTaskOperator log output in UI does not contain all events #40875
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. |
It seems then it is an issue on MWAA side? |
We worked with AWS support, who reached out to MWAA internally, and they believe that the issue is because of how the AWS provider is pushing logs to Cloudwatch for the ECS log group. Logs within the same logstream for our ECS task are being written to the same timestamp regardless of their execution time, possibly because they are triggered by the same call inside the task. This was not previously the case (e.g. with Airflow 2.5), and is the suspected cause of the error we're seeing. |
Thanks for the context, I'll look into it |
After investigation here is what I think. The logs after a certain point are not in the MWAA task logs because the task stopped before all the logs could have been pushed to the MWAA task logs. Basically the task created a lot of logs in the ECS task logs and stopped right after. In that case, the Airflow task stops as well and dont have time to push logs from ECS to MWAA |
@vincbeck What is the expected cadence for logs getting pushed to the ECS log group in Cloudwatch, and the frequency for logs getting pulled in MWAA? I can see ECS tasks that run for a few minutes according to the logs for the ECS task, but are missing logs in the mWAA log group after the first couple messages. I've attached an example to this comment of a new pair of logs for ECS vs MWAA-Task. The ECS task logs include timestamps from inside the container and show that the task ran for ~4 minutes, though all log timestamps are the same value. |
You are right, the issue might be logs with the same timestamp have trouble being sent to the MWAA task logs. Keep investigating |
Yep, that's the root cause ... Somehow when the timestamp is the same, it does not log |
Fix #41515 |
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow==2.8.1
apache-airflow-providers-airbyte==3.5.1
apache-airflow-providers-amazon==8.16.0
apache-airflow-providers-databricks==6.0.0
apache-airflow-providers-docker==3.9.1
apache-airflow-providers-slack==8.5.1
apache-airflow-providers-snowflake==5.2.1
Apache Airflow version
2.8.1
Operating System
Amazon Linux
Deployment
Amazon (AWS) MWAA
Deployment details
Our MWAA environments are managed through Terraform. We install 3 providers outside of those available in the public providers.
What happened
The logs tab in the UI for EcsRunTaskOperator tasks sometimes misses events. We noted this behavior once we upgraded from version 2.5 to version 2.7 (and it has remained in version 2.8). We did not upgrade to 2.6 at any point, and newer releases are not yet available through MWAA.
We use the
awslogs_group
argument for the EcsRunTaskOperator to specify the logs group that the UI should pull from. The log streams in that ECS log group contains all events, but not all events in the log stream appear in the MWAA Task log group.The following is an example from log streams that should share all logs (i.e. the MWAA Task logs should include all ECS logs). In the MWAA Task logs toward the bottom, events containing remaining DBT output logs are missing. The first several lines are fetched, and the rest of the logs that stream in later to the log stream are missed.
ECS logs:
mwaa-task-logstream.csv
MWAA Task logs:
ecs-task-logstream.csv
Our organization reached out to AWS MWAA for support relating to this issue and this is what we were told by their support regarding why this may be happening.
This was not reproduceable within a local Airflow setup.
This is the call made to the EcsRunTaskOperator:
What you think should happen instead
All log events in the ECS log group should be pulled forward by the MWAA Task log group using the
task_log_fetcher
.How to reproduce
We are using a large (~2GB) custom DBT image inside our taskdef. This has happened with every execution of the EcsRunTaskOperator against our custom DBT image.
After triggering a task using the EcsRunTaskOperator, the DBT output logs should stream directly from the ECS log stream to the MWAA Task log stream.
Anything else
This has been observed in our MWAA environment against both
apache-airflow==2.7.2
andapache-airflow==2.8.1
. When upgrading, we did not upgrade to a 2.6 version. This issue may be related to or caused by the same root cause as #39323.Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: