Upgrade verifiers do not retry if the download fails or times out. #5163
Closed
Description
opened on Jul 17, 2024
Hello, I deployed Elastic Agent with Fleet Server at version 8.14.2 and tried to upgrade to 8.14.3 a few days later.
While watching the logs in Observability -> Logs -> Stream, I noticed error messages from the elastic_agent dataset. The logs and a temporary fix are provided below.
Steps to reproduce:
- Upgrade Elastic Agent (Fleet Server) from the Kibana UI.
Log output:
12:28:50.843 elastic_agent [elastic_agent][info] download from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz completed in 48 seconds @ 7.16MBps
12:28:50.843 elastic_agent [elastic_agent][info] updated upgrade details
12:28:50.854 elastic_agent [elastic_agent][info] download from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.sha512 completed in Less than a second @ +InfYBps
12:28:50.854 elastic_agent [elastic_agent][info] updated upgrade details
12:28:50.854 elastic_agent [elastic_agent][info] updated upgrade details
12:28:50.854 elastic_agent [elastic_agent][info] updated upgrade details
12:28:50.854 elastic_agent [elastic_agent][info] updated upgrade details
12:28:51.464 elastic_agent [elastic_agent][info] Default PGP appended
12:29:21.465 elastic_agent [elastic_agent][warn] Skipped remote PGP located at "https://artifacts.elastic.co/GPG-KEY-elastic-agent" because it's unavailable: 2 errors occurred:
* Get "https://artifacts.elastic.co/GPG-KEY-elastic-agent": context deadline exceeded
* Remote PGP download failed
12:29:21.468 elastic_agent [elastic_agent][warn] Skipped remote PGP located at "https://localhost:8221/api/agents/upgrades/8.14.3/pgp-public-key" because it's unavailable: 2 errors occurred:
* Get "https://localhost:8221/api/agents/upgrades/8.14.3/pgp-public-key": x509: certificate is valid for myfleet.example.com, not localhost
* Remote PGP download failed
12:29:21.468 elastic_agent [elastic_agent][info] Using 1 PGP keys
12:29:52.081 elastic_agent [elastic_agent][info] Cleaning up non-matching downloaded versions
12:29:52.114 elastic_agent [elastic_agent][error] upgrade to version 8.14.3 failed: failed verification of agent binary: 2 errors occurred:
* could not get .asc file: fetching asc file from '/opt/Elastic/Agent/data/elastic-agent-8.14.2-173817/downloads/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc': open /opt/Elastic/Agent/data/elastic-agent-8.14.2-173817/downloads/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc: no such file or directory
* fetching asc file from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc: failed loading public key: Get "https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc": context deadline exceeded
12:29:52.114 elastic_agent [elastic_agent][info] updated upgrade details
Temporary fix (manual):
- While the upgrade is in progress, go to the folder /opt/Elastic/Agent/data/elastic-agent-8.14.2-173817/downloads on the Fleet Server.
- Download the .asc file manually with the command below:
root@myfleet:/opt/Elastic/Agent/data/elastic-agent-8.14.2-173817/downloads# curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz.asc
- If the upgrade fails the first time, start it again from the Kibana UI.
- After the .asc file has been placed manually in the downloads directory, the data stream shows "Using 2 PGP keys".
- Wait until the upgrade completes successfully.
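The manual steps above could be scripted roughly as follows. This is only a sketch of the workaround, not what Elastic Agent does internally: the function names `asc_url`, `retry`, and `fetch_asc`, the retry count, and the timeouts are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch of the manual workaround: fetch the detached signature (.asc) with retries.
# All function names and defaults here are illustrative, not part of Elastic Agent.

# Build the signature URL for a version ($1) and platform suffix ($2).
asc_url() {
  printf 'https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-%s-%s.tar.gz.asc\n' "$1" "$2"
}

# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Download the .asc file into the downloads directory ($1) for version ($2), platform ($3).
fetch_asc() {
  dir=$1; version=$2; platform=$3
  url=$(asc_url "$version" "$platform")
  # -f makes curl fail on HTTP errors so that retry can see the failure.
  retry 5 10 curl -fsSL --max-time 60 -o "$dir/${url##*/}" "$url"
}
```

With these functions loaded, `fetch_asc /opt/Elastic/Agent/data/elastic-agent-8.14.2-173817/downloads 8.14.3 linux-x86_64` would repeat the download up to five times before giving up. The data directory name contains a build hash, so it will differ between installations.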
Notes
My Fleet Server host listens on socket *:8220 under the domain name https://myfleet.example.com:8220. The host has another socket open, 127.0.0.1:8221, which is used for internal API operations. My firewall's OUTPUT chain accepts everything, and the INPUT chain has a rule that accepts all connections to the loopback adapter: iptables -A INPUT -i lo -j ACCEPT
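The x509 warning in the log (certificate valid for myfleet.example.com, not localhost) can be checked by listing the names the served certificate covers. The sketch below assumes OpenSSL 1.1.1 or newer is installed and that the certificate carries a Subject Alternative Name extension. The helper names `cert_sans` and `san_lists_host` are hypothetical.

```shell
#!/bin/sh
# Print the Subject Alternative Names of the certificate served at host:port ($1),
# sending $2 as the TLS server name. Requires openssl 1.1.1+ for `x509 -ext`.
cert_sans() {
  openssl s_client -connect "$1" -servername "$2" </dev/null 2>/dev/null |
    openssl x509 -noout -ext subjectAltName
}

# Return 0 if the SAN text in $2 lists exactly DNS:$1.
# Wildcard entries and IP entries are deliberately not handled in this sketch.
san_lists_host() {
  printf '%s\n' "$2" | tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    grep -Fxq "DNS:$1"
}
```

For example, `san_lists_host localhost "$(cert_sans 127.0.0.1:8221 localhost)" || echo "certificate does not cover localhost"` would be expected to print the message on a host configured as described in these notes.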