Skip to content

KubernetesPodOperator duplicating logs when they are being interrupted #33498

@fdemiane

Description

@fdemiane

Apache Airflow version

main (development)

What happened

This is a random issue. The KubernetesPodOperator duplicates logs when they are interrupted.

What you think should happen instead

When the logs are interrupted, the KubernetesPodOperator should continue logging since the last captured timestamp, and not re-print logs that were already printed.

How to reproduce

If you run the following dag mulltiple times, you might get duplicated logs. Note that the issue is random and not easily reproducible:

import datetime

from airflow import models
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

with models.DAG(
    dag_id="composer_sample_kubernetes_pod",
    schedule_interval=datetime.timedelta(days=1),
    start_date=YESTERDAY,
) as dag:
    timeout = 240
    iterations = 1000
    arguments = \
        'for i in {1..%(iterations)s}; do echo "$i of %(iterations)s"; done' % {'iterations': iterations}

    kubernetes_min_pod_0 = KubernetesPodOperator(
        task_id="pod-ex-minimum-0",
        name="pod-ex-minimum-0",
        cmds=["/bin/bash", "-c"],
        arguments=[arguments],
        namespace="default",
        image="gcr.io/gcp-runtimes/ubuntu_18_0_4",
        startup_timeout_seconds=timeout
    )

Here is a sample output of duplicated logs:

[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 994 of 1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 995 of 1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 996 of 1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 997 of 1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 998 of 1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 999 of 1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 1000 of 1000
[2023-08-18, 13:32:54 UTC] {pod_manager.py:450} ERROR - Error parsing timestamp (no timestamp in message ''). Will continue execution but won't update timestamp
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base]
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:116} WARNING - Pod pod-ex-minimum-0-b3bf545v log read interrupted but container base still running
[2023-08-18, 13:32:55 UTC] {before.py:35} INFO - Starting call to 'unusual_prefix_01add01418c7cdaf7afbb9c57d62a05595a85aa7_kublogduplication.CustomPodManager.fetch_container_logs..consume_logs', this is the 1st time calling it.
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 1 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 2 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 3 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 4 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 5 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 6 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 7 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 8 of 1000

Operating System

Ubuntu

Versions of Apache Airflow Providers

main

Deployment

Google Cloud Composer

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions