Document fix for broken elasticsearch logs with 2.3.0+ upgrade #23821

Merged 2 commits on May 20, 2022
7 changes: 7 additions & 0 deletions RELEASE_NOTES.rst
@@ -131,6 +131,13 @@ If you are happy with the new config values you should *remove* the setting in `

If you have customized the templates, you should ensure that they contain ``{{ ti.map_index }}`` if you want to use dynamically mapped tasks.

If, after upgrading, you find that your task logs are no longer accessible, try adding a row to the ``log_template`` table with ``id=0``
containing your previous ``log_id_template`` and ``log_filename_template``. For example, if you used the defaults in 2.2.5:

.. code-block:: sql

INSERT INTO log_template (id, filename, elasticsearch_id, created_at) VALUES (0, '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log', '{dag_id}_{task_id}_{run_id}_{try_number}', NOW());
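
To confirm the row is in place, a query like the following can help (a minimal sketch; the ``log_template`` table and its columns are as in the example above, but adjust the syntax for your metadata database):

.. code-block:: sql

   -- The fallback row (id=0) should appear alongside the row Airflow created on upgrade.
   SELECT id, filename, elasticsearch_id, created_at FROM log_template ORDER BY id;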

BaseOperatorLink's ``get_link`` method changed to take a ``ti_key`` keyword argument (#21798)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

34 changes: 15 additions & 19 deletions docs/apache-airflow-providers-elasticsearch/logging/index.rst
@@ -30,37 +30,22 @@ First, to use the handler, ``airflow.cfg`` must be configured as follows:
.. code-block:: ini

[logging]
# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Users must supply an Airflow connection id that provides access to the storage
# location. If remote_logging is set to true, see UPDATING.md for additional
# configuration requirements.
remote_logging = True

[elasticsearch]
host = <host>:<port>
log_id_template = {dag_id}-{task_id}-{run_id}-{try_number}
end_of_log_mark = end_of_log
write_stdout =
json_fields =

To output task logs to stdout in JSON format, the following config could be used:

.. code-block:: ini

[logging]
# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Users must supply an Airflow connection id that provides access to the storage
# location. If remote_logging is set to true, see UPDATING.md for additional
# configuration requirements.
remote_logging = True

[elasticsearch]
host = <host>:<port>
log_id_template = {dag_id}-{task_id}-{run_id}-{try_number}
end_of_log_mark = end_of_log
write_stdout = True
json_format = True
json_fields = asctime, filename, lineno, levelname, message

.. _write-logs-elasticsearch-tls:

@@ -73,10 +58,6 @@ cert, etc.) use the ``elasticsearch_configs`` setting in your ``airflow.cfg``
.. code-block:: ini

[logging]
# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Users must supply an Airflow connection id that provides access to the storage
# location. If remote_logging is set to true, see UPDATING.md for additional
# configuration requirements.
remote_logging = True

[elasticsearch_configs]
@@ -100,3 +81,18 @@ To enable it, ``airflow.cfg`` must be configured as in the example below. Note t
# Code will construct log_id using the log_id template from the argument above.
# NOTE: scheme will default to https if one is not provided
frontend = <host_port>/{log_id}

Changes to ``[elasticsearch] log_id_template``
''''''''''''''''''''''''''''''''''''''''''''''

If you ever need to change ``[elasticsearch] log_id_template``, Airflow 2.3.0+ is able to keep track of
old values, so logs from your existing task runs can still be fetched. Once you are on Airflow 2.3.0+, you
can generally change ``log_id_template`` at will and Airflow will keep track of the changes.
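
For illustration, each template change ends up as a new row in the ``log_template`` table, so you can inspect the history yourself. A hypothetical query (column names as in the example at the end of this section):

.. code-block:: sql

   -- Each row records the templates in effect from its created_at timestamp onward.
   SELECT id, elasticsearch_id, created_at
   FROM log_template
   ORDER BY created_at;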

However, when upgrading to 2.3.0+, Airflow may not be able to properly save your previous ``log_id_template``.
If, after upgrading, you find that your task logs are no longer accessible, try adding a row to the ``log_template`` table with ``id=0``
containing your previous ``log_id_template``. For example, if you used the defaults in 2.2.5:

.. code-block:: sql

INSERT INTO log_template (id, filename, elasticsearch_id, created_at) VALUES (0, '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log', '{dag_id}_{task_id}_{run_id}_{try_number}', NOW());