Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests are not reliable for a UBI 8 image #3520

Closed
patrick-stephens opened this issue May 20, 2021 · 1 comment · Fixed by #3529
Closed

Unit tests are not reliable for a UBI 8 image #3520

patrick-stephens opened this issue May 20, 2021 · 1 comment · Fixed by #3529

Comments

@patrick-stephens
Copy link
Contributor

Bug Report

Describe the bug
We are building fluent bit from source and running the various tests provided in a UBI 8 image. We have been seeing occasional failures in CI but not locally. Finally looked at it today and at least some of the test cases seem to be unreliable, they usually work ok but occasionally seem to hit what looks like timing issues so fail.

It looks like incidence of failure is higher when you run them all together in one go like we do in CI.

The failing tests we have seen so far:

  • nest
  • parser
  • grep

To Reproduce

docker build -t test .

FROM registry.access.redhat.com/ubi8/ubi:8.3
# hadolint ignore=DL3041
RUN dnf update -y && dnf install -y cmake diffutils gcc gcc-c++ libpq-devel m4 make openssl-devel systemd-devel tar unzip && dnf clean all

ARG FLUENT_BIT_VER=1.7.4
ADD https://github.com/fluent/fluent-bit/archive/refs/tags/v${FLUENT_BIT_VER}.zip /source.zip
RUN mkdir -p /fluent-bit/bin /fluent-bit/etc /fluent-bit/log /tmp/src \
      && unzip /source.zip -d /tmp/src && mv /tmp/src/fluent-bit-*/* /tmp/src/ \
      && rm -rf /tmp/src/build/*

# We require flex & bison which are not available for UBI to build record accessor and this is also used in some other output plugins
# We could build 1.6.10 as the 1.7 series will not build without RA: https://github.com/fluent/fluent-bit/issues/3097
# We must disable http input as well because this triggers another RA failure in 1.6.10: https://github.com/fluent/fluent-bit/issues/2930
#RUN cmake -DFLB_RECORD_ACCESSOR=Off -DFLB_STREAM_PROCESSOR=Off -DFLB_IN_HTTP=Off -DFLB_OUT_LOKI=Off -DFLB_TLS=On ../ && make && install bin/fluent-bit /fluent-bit/bin/
ARG BISON_VER=3.7
ARG BUSON_URL=http://ftp.gnu.org/gnu/bison
ARG FLEX_VER=2.6.4
ARG FLEX_URL=https://github.com/westes/flex/files/981163
ADD ${BUSON_URL}/bison-${BISON_VER}.tar.gz /bison/
ADD ${FLEX_URL}/flex-${FLEX_VER}.tar.gz /flex/
RUN tar -xzvf /bison/bison-${BISON_VER}.tar.gz -C /bison/ && tar -xzvf /flex/flex-${FLEX_VER}.tar.gz -C /flex/
# Flex needs Bison so do first
WORKDIR /bison/bison-${BISON_VER}/
RUN ./configure && make && make install
WORKDIR /flex/flex-${FLEX_VER}/
RUN ./configure && make && make install

WORKDIR /tmp/src/build/

# Build with tests for Fluent Bit and plugins - set to empty to disable
ARG MAKE_ADDITIONAL_TARGETS=test
# IPv6 tests dependent on docker configuration and support so disable
RUN sed -i '/{ "ipv6_client_server", test_ipv6_client_server},/d' ../tests/internal/network.c
# Disable the TD output plugin & Process input plugin as unit test fails
RUN cmake \
      -DFLB_TESTS_RUNTIME=On \
      -DFLB_TESTS_INTERNAL=On \
      -DFLB_RELEASE=On \
      -DFLB_TLS=On \
      -DFLB_OUT_TD=Off \
      -DFLB_IN_PROC=Off \
      ../ && \
      make && \
      install bin/fluent-bit /fluent-bit/bin/

CMD [ "/tmp/src/build/bin/flb-rt-filter_nest" ]

I include that command above to simplify an external docker loop.

  • Steps to reproduce the problem:
docker run --rm -it test bash

until ! /tmp/src/build/bin/flb-rt-filter_nest; do sleep 1; done

This command can run for a while but it will eventually fail.
I have seen it fail in the 2nd and 3rd (here) test case:

[ OK ]
SUCCESS: All unit tests have passed.
Test single...                                  [2021/05/20 11:39:08] [ info] [engine] started (pid=300)
[2021/05/20 11:39:08] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/05/20 11:39:08] [debug] [storage] [cio stream] new stream registered: lib.0
[2021/05/20 11:39:08] [ info] [storage] version=1.1.1, initializing...
[2021/05/20 11:39:08] [ info] [storage] in-memory
[2021/05/20 11:39:08] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/05/20 11:39:08] [debug] [lib:lib.0] created event channels: read=20 write=21
[2021/05/20 11:39:08] [debug] [router] match rule lib.0:lib.0
[2021/05/20 11:39:08] [ info] [sp] stream processor started
[2021/05/20 11:39:08] [debug] [filter:nest:nest.0] outer map size is 2, will be 2, nested map size will be 1
[2021/05/20 11:39:09] [debug] [task] created task=0x7f611401a410 id=0 OK
[2021/05/20 11:39:09] [debug] [test_filter_nest] received message: [1448403340.000000,{"extra":"Some more data","nested_key":{"to_nest":"This is the data to nest"}}]
[2021/05/20 11:39:09] [debug] [out coro] cb_destroy coro_id=0
[2021/05/20 11:39:09] [debug] [task] destroy task=0x7f611401a410 (task_id=0)
[2021/05/20 11:39:09] [ warn] [engine] service will stop in 1 seconds
[2021/05/20 11:39:10] [ info] [engine] service stopped
[ OK ]
Test multiple events are not dropped(nest)...   [2021/05/20 11:39:10] [ info] [engine] started (pid=303)
[2021/05/20 11:39:10] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/05/20 11:39:10] [debug] [storage] [cio stream] new stream registered: lib.0
[2021/05/20 11:39:10] [ info] [storage] version=1.1.1, initializing...
[2021/05/20 11:39:10] [ info] [storage] in-memory
[2021/05/20 11:39:10] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/05/20 11:39:10] [debug] [lib:lib.0] created event channels: read=20 write=21
[2021/05/20 11:39:10] [debug] [router] match rule lib.0:lib.0
[2021/05/20 11:39:10] [ info] [sp] stream processor started
[2021/05/20 11:39:10] [debug] [filter:nest:nest.0] outer map size is 2, will be 2, nested map size will be 1
[2021/05/20 11:39:10] [debug] [filter:nest:nest.0] no match found for (null)
[2021/05/20 11:39:11] [debug] [task] created task=0x7f611401a410 id=0 OK
[2021/05/20 11:39:11] [debug] [test_filter_nest] received message: [1448403340.000000,{"extra":"Some more data","nested_key":{"to_nest":"This is the data to nest"}}]
[2021/05/20 11:39:11] [debug] [test_filter_nest] received message: [1448403341.000000,{"not_nest":"dummy data","extra":"dummy more data"}]
[2021/05/20 11:39:11] [debug] [out coro] cb_destroy coro_id=0
[2021/05/20 11:39:11] [debug] [task] destroy task=0x7f611401a410 (task_id=0)
[2021/05/20 11:39:11] [ warn] [engine] service will stop in 1 seconds
[2021/05/20 11:39:12] [ info] [engine] service stopped
[ OK ]
Test multiple events are not dropped(lift)...   [2021/05/20 11:39:12] [ info] [engine] started (pid=306)
[2021/05/20 11:39:12] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/05/20 11:39:12] [debug] [storage] [cio stream] new stream registered: lib.0
[2021/05/20 11:39:12] [ info] [storage] version=1.1.1, initializing...
[2021/05/20 11:39:12] [ info] [storage] in-memory
[2021/05/20 11:39:12] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/05/20 11:39:12] [debug] [lib:lib.0] created event channels: read=20 write=21
[2021/05/20 11:39:12] [debug] [router] match rule lib.0:lib.0
[2021/05/20 11:39:12] [ info] [sp] stream processor started
[2021/05/20 11:39:12] [debug] [filter:nest:nest.0] Lift : Outer map size is 2, will be 2, lifting 1 record(s)
[2021/05/20 11:39:12] [debug] [filter:nest:nest.0] Lift : No match found for nested
[ FAILED ]
  filter_nest.c:226: Check Expected number of events 2, got 0... failed
[2021/05/20 11:39:13] [debug] [task] created task=0x7f611401a3d0 id=0 OK
[2021/05/20 11:39:13] [debug] [test_filter_nest] received message: [1448403340.000000,{"not_nestd":"not nested data","child":"nested data"}]
[2021/05/20 11:39:13] [debug] [test_filter_nest] received message: [1448403341.000000,{"not_nest":"dummy data","extra":"dummy more data"}]
[2021/05/20 11:39:13] [ warn] [engine] service will stop in 1 seconds
[2021/05/20 11:39:13] [debug] [out coro] cb_destroy coro_id=0
[2021/05/20 11:39:13] [debug] [task] destroy task=0x7f611401a3d0 (task_id=0)
[2021/05/20 11:39:14] [ info] [engine] service stopped
FAILED: 1 of 3 unit tests has failed.

If I run outside of the container in a loop that restarts it then it either takes too long or the timing issue goes away with the overhead of the container launch each time.
until ! docker run --rm -it test; do sleep 1; done

Expected behavior
Tests should always pass.

Screenshots

Your Environment

  • Version used: 1.7.4
  • Configuration: Container built from source
  • Environment name and version (e.g. Kubernetes? What version?): Local docker on MacOS but also in CI on Ubuntu 20.04
  • Server type and version:
  • Operating System and version: Various
  • Filters and plugins:

Additional context
Trying to auto-test a build from source to verify everything is correct.

@patrick-stephens
Copy link
Contributor Author

Tested now on 1.7.7 and seems to have been resolved (hard to confirm with timing issues), the change makes sense so I'm happy to consider it fixed.
I am getting other failures in the container build unit tests but I'll investigate and raise a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant