Zero downtime update #713

Merged
merged 29 commits into from
Dec 11, 2024
a4b0600
deb rpm: install missing local plugins during upgrade process
kenhys Oct 9, 2024
7012e0b
deb rpm: add auto/manual service restart during upgrading
kenhys Oct 15, 2024
117e57f
deb: remove debug print
kenhys Oct 18, 2024
e85dda8
deb: fix wrong hook order to install plugins
kenhys Oct 18, 2024
051a305
deb: fix inconsistent phase comparison
kenhys Oct 18, 2024
b99276b
ci: fix tests for no-downtime
Watson1978 Oct 23, 2024
0d71bec
rules: add --no-restart-after-upgrade option in dh_installsystemd for…
Watson1978 Oct 16, 2024
5b698f3
system-test: remove unnecessary condition
Watson1978 Oct 24, 2024
158d18f
system-test: add comments
Watson1978 Oct 24, 2024
c0ce192
ci: extend timeout for v6 workflow (#701)
kenhys Oct 30, 2024
273c180
ci: show debug message to investigate
kenhys Oct 30, 2024
dcd91ce
ci: add test to update without data lost (#699)
Watson1978 Oct 31, 2024
656436d
deb rpm: fix local dependency gem (#688)
kenhys Nov 1, 2024
ca9059e
ci: add test to update with auto / manual feature (#712)
Watson1978 Nov 5, 2024
4230840
ci deb: check whether needrestart was suppressed
kenhys Nov 7, 2024
c7f8c36
deb yum: use /usr/sbin/fluent-gem to migrate gems when upgrade
Watson1978 Nov 7, 2024
102cdbc
ci: update the tests for no data lost (#715)
Watson1978 Nov 11, 2024
20cd970
ci: rename test file
Watson1978 Nov 12, 2024
fa5c641
ci: add downgrade test for no data lost
Watson1978 Nov 12, 2024
28a0966
deb: use auto/manual feature when old package supports (#738)
Watson1978 Nov 21, 2024
e27dee9
remove unnecessary debug logs
daipom Dec 5, 2024
2e109bf
deb: include prerm script in debian package (#757)
Watson1978 Dec 9, 2024
abb317e
auto plugin install: disable unless auto (#756)
daipom Dec 9, 2024
4fed1fe
deb rpm: remove manual feature of zero-downtime-restart from uninstal…
Watson1978 Dec 9, 2024
c2821b0
rpm: revert suppressing systemd_post macro (#759)
daipom Dec 10, 2024
4a06888
system-test: add update test from v5 LTS and downgrade test to v5 LTS…
Watson1978 Dec 10, 2024
262d113
deb: improve process timing for safety and simplicity (#762)
daipom Dec 11, 2024
038b770
rpm: improve process timing for safety (#764)
daipom Dec 11, 2024
33c1c1c
FLUENT_PACKAGE_SERVICE_RESTART: make empty or other values as auto (#…
daipom Dec 11, 2024
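
Commit 33c1c1c describes treating an empty or unrecognized FLUENT_PACKAGE_SERVICE_RESTART value as auto. A minimal sketch of that fallback, assuming a hypothetical helper named normalize_restart_mode (the package's actual maintainer scripts are not shown in this excerpt and may be written differently):

```shell
#!/bin/bash
# Hypothetical helper, not taken from the PR: map a raw
# FLUENT_PACKAGE_SERVICE_RESTART value to either "auto" or "manual".
normalize_restart_mode() {
  case "${1:-}" in
    manual) echo "manual" ;;
    # Empty, "auto", or any other value (e.g. "etc") falls back to auto.
    *) echo "auto" ;;
  esac
}

normalize_restart_mode "${FLUENT_PACKAGE_SERVICE_RESTART:-}"
```

The "etc" value used in the update-to-next-major-version.sh test matrix would take the fallback branch in this sketch.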
53 changes: 52 additions & 1 deletion .github/workflows/apt.yml
@@ -11,7 +11,7 @@ concurrency:
jobs:
build:
name: Build
-timeout-minutes: 60
+timeout-minutes: 120
strategy:
fail-fast: false
matrix:
@@ -54,16 +54,34 @@ jobs:
fluent-package/apt/repositories
fluent-apt-source/apt/repositories
fluent-lts-apt-source/apt/repositories
v6-test/fluent-package/apt/repositories
key: ${{ runner.os }}-cache-${{ matrix.rake-job }}-${{ hashFiles('**/config.rb', '**/Rakefile', '**/Gemfile*', 'fluent-package/templates/**', 'fluent-package/debian/**', 'fluent-package/apt/**/Dockerfile') }}
- name: Build deb with Docker
if: ${{ ! steps.cache-deb.outputs.cache-hit }}
run: |
rake apt:build APT_TARGETS=${{ matrix.rake-job }}
- uses: actions/checkout@master
if: ${{ ! steps.cache-deb.outputs.cache-hit }}
with:
path: v6-test
- name: Build v6 deb with Docker
if: ${{ ! steps.cache-deb.outputs.cache-hit }}
run: |
cd v6-test
git config user.email "fluentd@googlegroups.com"
git config user.name "Fluentd developers"
git am fluent-package/bump-version-v6.patch
rake apt:build APT_TARGETS=${{ matrix.rake-job }}
- name: Upload fluent-package deb
uses: actions/upload-artifact@master
with:
name: packages-${{ matrix.rake-job }}
path: fluent-package/apt/repositories
- name: Upload v6 fluent-package deb
uses: actions/upload-artifact@master
with:
name: v6-packages-${{ matrix.rake-job }}
path: v6-test/fluent-package/apt/repositories
- name: Upload fluent-apt-source deb
uses: actions/upload-artifact@master
with:
@@ -147,7 +165,9 @@ jobs:
- "update-from-v4.sh local"
- "update-from-v4.sh v5"
- "update-from-v4.sh lts"
- "update-from-v5-lts.sh"
- "downgrade-to-v4.sh"
- "downgrade-to-v5-lts.sh"
- "install-newly.sh local"
- "install-newly.sh v5"
- "install-newly.sh lts"
@@ -157,6 +177,15 @@ jobs:
- "update-to-next-version-service-status.sh enabled inactive"
- "update-to-next-version-service-status.sh disabled active"
- "update-to-next-version-service-status.sh disabled inactive"
- "update-to-next-version-with-auto-and-manual.sh"
- "update-to-next-major-version.sh auto active"
- "update-to-next-major-version.sh auto inactive"
- "update-to-next-major-version.sh manual active"
- "update-to-next-major-version.sh manual inactive"
- "update-to-next-major-version.sh etc active"
- "update-to-next-major-version.sh etc inactive"
- "update-without-data-lost.sh v5 v6"
- "update-without-data-lost.sh v6 v5"
include:
- label: Debian bullseye amd64
rake-job: debian-bullseye
@@ -199,9 +228,31 @@ jobs:
- uses: actions/download-artifact@v4
with:
name: packages-${{ matrix.rake-job }}
- uses: actions/download-artifact@v4
with:
name: v6-packages-${{ matrix.rake-job }}
path: v6-test
- uses: actions/download-artifact@v4
with:
name: packages-apt-source-${{ matrix.rake-job }}
- uses: canonical/setup-lxd@v0.1.2
- name: Run diagnostic
run: |
uname -a
echo "::group::snap info lxd"
snap info lxd
echo "::endgroup::"
echo "::group::snap services lxd"
snap services lxd
echo "::endgroup::"
echo "::group::snap logs lxd"
sudo snap logs lxd
echo "::endgroup::"
echo "::group::lxc remote list"
lxc remote list
echo "::endgroup::"
echo "::group::lxc list images:"
lxc image list images:
echo "::endgroup::"
- name: Run Test ${{ matrix.test }} on ${{ matrix.lxc-image }}
run: fluent-package/apt/systemd-test/test.sh ${{ matrix.lxc-image }} ${{ matrix.test }}
53 changes: 51 additions & 2 deletions .github/workflows/yum.yml
@@ -11,7 +11,7 @@ concurrency:
jobs:
build:
name: Build
-timeout-minutes: 60
+timeout-minutes: 120
strategy:
fail-fast: false
matrix:
@@ -50,17 +50,36 @@ jobs:
uses: actions/cache@v4
id: cache-rpm
with:
-path: fluent-package/yum/repositories
+path: |
+fluent-package/yum/repositories
+v6-test/fluent-package/yum/repositories
key: ${{ runner.os }}-cache-${{ matrix.rake-job }}-${{ hashFiles('**/config.rb', '**/Rakefile', '**/Gemfile*', '**/*.spec.in', 'fluent-package/templates/**', 'fluent-package/yum/**/Dockerfile') }}
- name: Build rpm with Docker
if: ${{ ! steps.cache-rpm.outputs.cache-hit }}
run: |
rake yum:build YUM_TARGETS=${{ matrix.rake-job }}
- uses: actions/checkout@master
if: ${{ ! steps.cache-rpm.outputs.cache-hit }}
with:
path: v6-test
- name: Build v6 rpm with Docker
if: ${{ ! steps.cache-rpm.outputs.cache-hit }}
run: |
cd v6-test
git config user.email "fluentd@googlegroups.com"
git config user.name "Fluentd developers"
git am fluent-package/bump-version-v6.patch
rake yum:build YUM_TARGETS=${{ matrix.rake-job }}
- name: Upload fluent-package rpm
uses: actions/upload-artifact@v4
with:
name: packages-${{ matrix.rake-job }}
path: fluent-package/yum/repositories
- name: Upload v6 fluent-package rpm
uses: actions/upload-artifact@v4
with:
name: v6-packages-${{ matrix.rake-job }}
path: v6-test/fluent-package/yum/repositories
# TODO move the following steps to "Test" job
- name: Check Package Size
run: |
@@ -121,7 +140,9 @@ jobs:
- AmazonLinux 2023 x86_64
test:
- "update-from-v4.sh"
- "update-from-v5-lts.sh"
- "downgrade-to-v4.sh"
- "downgrade-to-v5-lts.sh"
- "install-newly.sh local"
- "install-newly.sh v5"
- "install-newly.sh lts"
@@ -131,6 +152,15 @@ jobs:
- "update-to-next-version-service-status.sh enabled inactive"
- "update-to-next-version-service-status.sh disabled active"
- "update-to-next-version-service-status.sh disabled inactive"
- "update-to-next-version-with-auto-and-manual.sh"
- "update-to-next-major-version.sh auto active"
- "update-to-next-major-version.sh auto inactive"
- "update-to-next-major-version.sh manual active"
- "update-to-next-major-version.sh manual inactive"
- "update-to-next-major-version.sh etc active"
- "update-to-next-major-version.sh etc inactive"
- "update-without-data-lost.sh v5 v6"
- "update-without-data-lost.sh v6 v5"
include:
- label: AmazonLinux 2 x86_64
rake-job: amazonlinux-2
@@ -150,6 +180,10 @@ jobs:
- uses: actions/download-artifact@v4
with:
name: packages-${{ matrix.rake-job }}
- uses: actions/download-artifact@v4
with:
name: v6-packages-${{ matrix.rake-job }}
path: v6-test
- uses: canonical/setup-lxd@v0.1.2
- name: Run diagnostic
run: |
@@ -185,7 +219,9 @@ jobs:
- AlmaLinux 9 x86_64
test:
- "update-from-v4.sh"
- "update-from-v5-lts.sh"
- "downgrade-to-v4.sh"
- "downgrade-to-v5-lts.sh"
- "install-newly.sh local"
- "install-newly.sh v5"
- "install-newly.sh lts"
@@ -195,6 +231,15 @@ jobs:
- "update-to-next-version-service-status.sh enabled inactive"
- "update-to-next-version-service-status.sh disabled active"
- "update-to-next-version-service-status.sh disabled inactive"
- "update-to-next-version-with-auto-and-manual.sh"
- "update-to-next-major-version.sh auto active"
- "update-to-next-major-version.sh auto inactive"
- "update-to-next-major-version.sh manual active"
- "update-to-next-major-version.sh manual inactive"
- "update-to-next-major-version.sh etc active"
- "update-to-next-major-version.sh etc inactive"
- "update-without-data-lost.sh v5 v6"
- "update-without-data-lost.sh v6 v5"
include:
- label: RockyLinux 8 x86_64
rake-job: rockylinux-8
@@ -207,6 +252,10 @@ jobs:
- uses: actions/download-artifact@v4
with:
name: packages-${{ matrix.rake-job }}
- uses: actions/download-artifact@v4
with:
name: v6-packages-${{ matrix.rake-job }}
path: v6-test
- uses: canonical/setup-lxd@v0.1.2
- name: Run Test ${{ matrix.test }} on ${{ matrix.lxc-image }}
run: fluent-package/yum/systemd-test/test.sh ${{ matrix.lxc-image }} ${{ matrix.test }}
2 changes: 1 addition & 1 deletion fluent-package/Rakefile
@@ -494,7 +494,7 @@ class BuildTask
remove_needless_files
end

-debian_pkg_scripts = ["preinst", "postinst", "postrm"]
+debian_pkg_scripts = ["preinst", "postinst", "prerm", "postrm"]
debian_pkg_scripts.each do |script|
CLEAN.include(File.join("..", "debian", script))
end
2 changes: 2 additions & 0 deletions fluent-package/apt/commonvar.sh
@@ -3,6 +3,8 @@ architecture=$(dpkg --print-architecture)
repositories_dir=/fluentd/fluent-package/apt/repositories
java_jdk=openjdk-11-jre
td_agent_version=4.5.2
fluent_package_lts_version=5.0.5

case ${code_name} in
xenial)
distribution=ubuntu
29 changes: 29 additions & 0 deletions fluent-package/apt/systemd-test/downgrade-to-v5-lts.sh
@@ -0,0 +1,29 @@
#!/bin/bash

set -exu

. $(dirname $0)/../commonvar.sh

# Install v5 LTS to register the repository
curl --fail --silent --show-error --location https://toolbelt.treasuredata.com/sh/install-${distribution}-${code_name}-fluent-package5-lts.sh | sh

sudo apt purge -y fluent-package

# Install the current
sudo apt install -V -y \
/host/${distribution}/pool/${code_name}/${channel}/*/*/fluent-package_*_${architecture}.deb

# Test: service status
systemctl status --no-pager fluentd
systemctl status --no-pager td-agent
main_pid=$(eval $(systemctl show td-agent --property=MainPID) && echo $MainPID)

# Downgrade to v5 LTS
apt install -V -y fluent-package=${fluent_package_lts_version}-1 --allow-downgrades

systemctl status --no-pager fluentd
systemctl status --no-pager td-agent

# Fluentd should be restarted.
# NOTE: Unlike RPM, the restart behavior depends on TO-side. So, it restarts.
test $main_pid -ne $(eval $(systemctl show fluentd --property=MainPID) && echo $MainPID)
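
The restart checks in these tests compare the service's MainPID before and after the package operation, using `eval` on the `MainPID=...` line printed by `systemctl show`. A possible alternative that avoids `eval` is sketched below; the helper names parse_main_pid and service_restarted are hypothetical and not part of this PR:

```shell
#!/bin/bash
# Hypothetical helpers, not part of the PR.

# Strip the "MainPID=" prefix from a `systemctl show --property=MainPID` line.
parse_main_pid() {
  echo "${1#MainPID=}"
}

# Succeed when the two PIDs differ, i.e. the service has a new main process.
service_restarted() {
  [ "$1" -ne "$2" ]
}

# Usage inside a test (needs systemd, so it is left commented out here):
# before=$(parse_main_pid "$(systemctl show fluentd --property=MainPID)")
# ... upgrade or downgrade the package ...
# after=$(parse_main_pid "$(systemctl show fluentd --property=MainPID)")
# service_restarted "$before" "$after"
```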
2 changes: 1 addition & 1 deletion fluent-package/apt/systemd-test/test.sh
@@ -15,7 +15,7 @@ dir="/host/fluent-package/apt/systemd-test"
set -eux

echo "::group::Run test: launch $image"
-lxc launch $image target
+lxc launch $image target --debug
sleep 5
echo "::endgroup::"
echo "::group::Run test: configure $image"
Expand Down
69 changes: 69 additions & 0 deletions fluent-package/apt/systemd-test/update-from-v5-lts.sh
@@ -0,0 +1,69 @@
#!/bin/bash

set -exu

. $(dirname $0)/../commonvar.sh


# When updating from v5 LTS without stopping Fluentd, Fluentd will not be restarted.
# Install v5 LTS
curl --fail --silent --show-error --location https://toolbelt.treasuredata.com/sh/install-${distribution}-${code_name}-fluent-package5-lts.sh | sh

systemctl status --no-pager fluentd
systemctl status --no-pager td-agent
main_pid=$(eval $(systemctl show td-agent --property=MainPID) && echo $MainPID)

# Install the current
sudo apt install -V -y \
/host/${distribution}/pool/${code_name}/${channel}/*/*/fluent-package_*_${architecture}.deb

# Test: service status
systemctl status --no-pager fluentd
systemctl status --no-pager td-agent

# Fluentd should NOT be restarted.
test $main_pid -eq $(eval $(systemctl show fluentd --property=MainPID) && echo $MainPID)

apt purge -y fluent-package

# When updating from v5 LTS after stopping Fluentd, Fluentd will be started if the service is enabled.
# Install v5 LTS
curl --fail --silent --show-error --location https://toolbelt.treasuredata.com/sh/install-${distribution}-${code_name}-fluent-package5-lts.sh | sh

systemctl status --no-pager fluentd
systemctl status --no-pager td-agent
main_pid=$(eval $(systemctl show td-agent --property=MainPID) && echo $MainPID)

systemctl stop fluentd

# Install the current
sudo apt install -V -y \
/host/${distribution}/pool/${code_name}/${channel}/*/*/fluent-package_*_${architecture}.deb

systemctl status --no-pager fluentd
systemctl status --no-pager td-agent

# Fluentd should be started if service was stopped before update.
test $main_pid -ne $(eval $(systemctl show fluentd --property=MainPID) && echo $MainPID)

# Test: environment variables
pid=$(systemctl show fluentd --property=MainPID --value)
env_vars=$(sudo sed -e 's/\x0/\n/g' /proc/$pid/environ)
test $(eval $env_vars && echo $HOME) = "/var/lib/fluent"
test $(eval $env_vars && echo $LOGNAME) = "_fluentd"
test $(eval $env_vars && echo $USER) = "_fluentd"
test $(eval $env_vars && echo $FLUENT_CONF) = "/etc/fluent/fluentd.conf"
test $(eval $env_vars && echo $FLUENT_PACKAGE_LOG_FILE) = "/var/log/fluent/fluentd.log"
test $(eval $env_vars && echo $FLUENT_PLUGIN) = "/etc/fluent/plugin"
test $(eval $env_vars && echo $FLUENT_SOCKET) = "/var/run/fluent/fluentd.sock"

# Test: No error logs
# (v5 default config outputs 'warn' log, so we should check only 'error' and 'fatal' logs)
sleep 3
test -e /var/log/fluent/fluentd.log
(! grep -e '\[error\]' -e '\[fatal\]' /var/log/fluent/fluentd.log)

# Test: Guard duplicated instance
(! sudo /usr/sbin/fluentd)
(! sudo /usr/sbin/fluentd -v)
sudo /usr/sbin/fluentd --dry-run
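
The environment-variable checks above turn the NUL-separated `/proc/$pid/environ` file into lines and then `eval` them. A rough sketch of reading a single variable without `eval` follows; env_value_from_file is a hypothetical helper name, and the sample file merely stands in for a real `/proc/<pid>/environ`:

```shell
#!/bin/bash
# Hypothetical helper, not part of the PR.
# $1: path to a NUL-separated environ file, $2: variable name.
# Prints the value of the first matching NAME=value entry.
env_value_from_file() {
  tr '\0' '\n' < "$1" | sed -n "s/^$2=//p" | head -n 1
}

# Build a sample file in the same NUL-separated layout as /proc/<pid>/environ.
sample=$(mktemp)
printf 'HOME=/var/lib/fluent\0USER=_fluentd\0FLUENT_CONF=/etc/fluent/fluentd.conf\0' > "$sample"

env_value_from_file "$sample" HOME
```

In the systemd tests, the first argument would presumably be `/proc/$pid/environ`, read with sufficient privileges.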