Skip to content

Logs to stdout or stderr stops after a few minutes #8955

@Tristan971

Description

@Tristan971

While trying to run ATS in a Docker container, I thought I'd make use of #7937 as a nice way to handle logs.

So I built branch 9.2.x, specifically 15bea4d and set the following:

In records.config:

CONFIG proxy.node.config.manager_log_filename STRING stderr
CONFIG proxy.config.diags.logfile.filename STRING stderr
CONFIG proxy.config.error.logfile.filename STRING stderr

In logging.yaml:

logging:
...
  logs:
    - mode: ascii
      format: common
      filename: stderr

Then I use traffic_manager without arguments as (non-root) PID 1 in the container.

At first things look alright, but after approximately 3 minutes, the logs just stop being output. Moving back error and diags to a file I can see:

[Jul 12 02:22:27.699] [LOG_FLUSH] ERROR: Failed to write log to stderr: [tried 101, wrote 0, Bad file descriptor]
[Jul 12 02:23:30.737] [LOG_FLUSH] ERROR: The following message was suppressed 12 times.
[Jul 12 02:23:30.737] [LOG_FLUSH] ERROR: Failed to write log to stderr: [tried 101, wrote 0, Bad file descriptor]

Looking into the container, I also notice the following:

$ echo "test" >> /proc/1/fd/2
$ echo "test" >> /proc/$(cat /run/trafficserver/server.lock)/fd/1
$ echo "test" >> /proc/$(cat /run/trafficserver/server.lock)/fd/2
bash: /proc/11/fd/2: No such file or directory

As one would expect, the first two commands cause "test" to be printed on the container logs, and the third does nothing.

It seems like stderr gets disconnected entirely at some point? A traffic_ctl server restart does bring logs back for another 3 minutes, but then the issue happens again. (note that the other ones work just fine)

At first I thought it would be some issue with ATS trying to "roll" the pipe or something akin to that, but adding CONFIG proxy.config.log.rolling_enabled INT 0 to records.config doesn't make a difference, so I guess it's not that.

If it is relevant, this was compiled and ran on a Debian Bullseye image, and the image in question is registry.gitlab.com/mangadex-pub/trafficserver:9.2.x-bullseye-dbcecd20.

Also, ATS otherwise keeps running just fine while that happens.

I doubt this will be particularly relevant, but here's the logs until they stop (when all set to stderr):

Logs
2022-07-12T03:07:42 [E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
2022-07-12T03:07:42 [Jul 12 02:07:42.435] traffic_manager STATUS: opened stderr
2022-07-12T03:07:42 [Jul 12 02:07:42.435] traffic_manager NOTE: updated diags config
2022-07-12T03:07:42 [Jul 12 02:07:42.437] traffic_manager NOTE: [LocalManager::listenForProxy] Listening on port: 8080 (ipv4)
2022-07-12T03:07:42 [Jul 12 02:07:42.437] traffic_manager NOTE: [TrafficManager] Setup complete
2022-07-12T03:07:43 [Jul 12 02:07:43.438] traffic_manager NOTE: [ProxyStateSet] Traffic Server Args: ' -M'
2022-07-12T03:07:43 [Jul 12 02:07:43.439] traffic_manager NOTE: [LocalManager::listenForProxy] Listening on port: 8080 (ipv4)
2022-07-12T03:07:43 [Jul 12 02:07:43.439] traffic_manager NOTE: [LocalManager::startProxy] Launching ts process
2022-07-12T03:07:43 [Jul 12 02:07:43.463] traffic_manager NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '9'
2022-07-12T03:07:43 [Jul 12 02:07:43.463] traffic_manager NOTE: [Alarms::signalAlarm] Server Process born
2022-07-12T03:07:45 [Jul 12 02:07:45.472] traffic_server STATUS: opened stderr
2022-07-12T03:07:45 [Jul 12 02:07:45.472] traffic_server NOTE: updated diags config
2022-07-12T03:07:45 [Jul 12 02:07:45.483] traffic_server NOTE: storage.config loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.483] traffic_server NOTE: storage.config finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.492] traffic_server NOTE: ip_allow.yaml loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.493] traffic_server NOTE: ip_allow.yaml finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.494] traffic_server NOTE: parent.config loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.494] traffic_server NOTE: parent.config finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.498] traffic_server NOTE: /etc/trafficserver/logging.yaml loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.500] traffic_server NOTE: /etc/trafficserver/logging.yaml finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.504] traffic_server NOTE: logging initialized[3], logging_mode = 3
2022-07-12T03:07:45 [Jul 12 02:07:45.504] traffic_server NOTE: Initialized plugin_dynamic_reload_mode: 1
2022-07-12T03:07:45 [Jul 12 02:07:45.504] traffic_server NOTE: plugin.config loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.505] traffic_server NOTE: loading plugin '/usr/lib/trafficserver/modules/header_rewrite.so'
2022-07-12T03:07:45 [Jul 12 02:07:45.507] traffic_server NOTE: loading plugin '/usr/lib/trafficserver/modules/healthchecks.so'
2022-07-12T03:07:45 [Jul 12 02:07:45.510] traffic_server NOTE: plugin.config finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.515] traffic_server NOTE: /etc/trafficserver/sni.yaml loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.516] traffic_server NOTE: /etc/trafficserver/sni.yaml finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.516] traffic_server NOTE: ssl_multicert.config loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.517] traffic_server NOTE: /etc/trafficserver/ssl_multicert.config finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.518] traffic_server NOTE: volume.config loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.518] traffic_server NOTE: volume.config finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.534] traffic_server NOTE: remap.config loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.534] traffic_server NOTE: strategies.yaml loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.534] traffic_server NOTE: No NextHop strategy configs were loaded.
2022-07-12T03:07:45 [Jul 12 02:07:45.535] traffic_server NOTE: strategies.yaml finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.537] traffic_server NOTE: remap.config finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.544] [TS_MAIN] NOTE: traffic server running
2022-07-12T03:07:45 [Jul 12 02:07:45.544] [TS_MAIN] NOTE: Traffic Server is running unprivileged, not switching to user 'nobody'
2022-07-12T03:07:45 [Jul 12 02:07:45.767] [ET_NET 3] NOTE: recovery clearing offsets of Vol /var/cache/trafficserver/cache.db 32768:13107196 : [153985024, 162373632] sync_serial 33 next 34
2022-07-12T03:07:45 [Jul 12 02:07:45.928] [ET_NET 2] NOTE: hosting.config loading ...
2022-07-12T03:07:45 [Jul 12 02:07:45.933] [ET_NET 2] NOTE: hosting.config finished loading
2022-07-12T03:07:45 [Jul 12 02:07:45.951] [ET_NET 2] NOTE: cache enabled
2022-07-12T03:07:45 [Jul 12 02:07:45.951] [ET_NET 2] NOTE: Traffic Server is fully initialized.
2022-07-12T03:07:52 - - redacted - TCP_MISS 1ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:07:57 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:02 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:07 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:12 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:17 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:22 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:27 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:32 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:37 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:42 - - redacted - TCP_MISS 1ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:47 - - redacted - TCP_MISS 1ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:51 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:08:56 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:01 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:06 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:11 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:16 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:21 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:27 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:30 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:35 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:40 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:45 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:50 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:09:55 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:10:00 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:10:05 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:10:10 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:10:15 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3
2022-07-12T03:10:20 - - redacted - TCP_MISS 0ms - "GET http://redacted/healthz HTTP/1.0" 200 3

[here it just abruptly stops]

Hopefully this helps track down the issue

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions