You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
We are building fluent bit from source and running the various tests provided in a UBI 8 image. We have been seeing occasional failures in CI but not locally. Finally looked at it today and at least some of the test cases seem to be unreliable, they usually work ok but occasionally seem to hit what looks like timing issues so fail.
It looks like incidence of failure is higher when you run them all together in one go like we do in CI.
The failing tests we have seen so far:
nest
parser
grep
To Reproduce
docker build -t test .
FROM registry.access.redhat.com/ubi8/ubi:8.3
# hadolint ignore=DL3041
RUN dnf update -y && dnf install -y cmake diffutils gcc gcc-c++ libpq-devel m4 make openssl-devel systemd-devel tar unzip && dnf clean all
ARG FLUENT_BIT_VER=1.7.4
ADD https://github.com/fluent/fluent-bit/archive/refs/tags/v${FLUENT_BIT_VER}.zip /source.zip
RUN mkdir -p /fluent-bit/bin /fluent-bit/etc /fluent-bit/log /tmp/src \
&& unzip /source.zip -d /tmp/src && mv /tmp/src/fluent-bit-*/* /tmp/src/ \
&& rm -rf /tmp/src/build/*
# We require flex & bison which are not available for UBI to build record accessor and this is also used in some other output plugins
# We could build 1.6.10 as the 1.7 series will not build without RA: https://github.com/fluent/fluent-bit/issues/3097
# We must disable http input as well because this triggers another RA failure in 1.6.10: https://github.com/fluent/fluent-bit/issues/2930
#RUN cmake -DFLB_RECORD_ACCESSOR=Off -DFLB_STREAM_PROCESSOR=Off -DFLB_IN_HTTP=Off -DFLB_OUT_LOKI=Off -DFLB_TLS=On ../ && make && install bin/fluent-bit /fluent-bit/bin/
ARG BISON_VER=3.7
ARG BUSON_URL=http://ftp.gnu.org/gnu/bison
ARG FLEX_VER=2.6.4
ARG FLEX_URL=https://github.com/westes/flex/files/981163
ADD ${BUSON_URL}/bison-${BISON_VER}.tar.gz /bison/
ADD ${FLEX_URL}/flex-${FLEX_VER}.tar.gz /flex/
RUN tar -xzvf /bison/bison-${BISON_VER}.tar.gz -C /bison/ && tar -xzvf /flex/flex-${FLEX_VER}.tar.gz -C /flex/
# Flex needs Bison so do first
WORKDIR /bison/bison-${BISON_VER}/
RUN ./configure && make && make install
WORKDIR /flex/flex-${FLEX_VER}/
RUN ./configure && make && make install
WORKDIR /tmp/src/build/
# Build with tests for Fluent Bit and plugins - set to empty to disable
ARG MAKE_ADDITIONAL_TARGETS=test
# IPv6 tests dependent on docker configuration and support so disable
RUN sed -i '/{ "ipv6_client_server", test_ipv6_client_server},/d' ../tests/internal/network.c
# Disable the TD output plugin & Process input plugin as unit test fails
RUN cmake \
-DFLB_TESTS_RUNTIME=On \
-DFLB_TESTS_INTERNAL=On \
-DFLB_RELEASE=On \
-DFLB_TLS=On \
-DFLB_OUT_TD=Off \
-DFLB_IN_PROC=Off \
../ && \
make && \
install bin/fluent-bit /fluent-bit/bin/
CMD [ "/tmp/src/build/bin/flb-rt-filter_nest" ]
I include that command above to simplify an external docker loop.
Steps to reproduce the problem:
docker run --rm -it test bash
until ! /tmp/src/build/bin/flb-rt-filter_nest; do sleep 1; done
This command can run for a while but it will eventually fail.
I have seen it fail in the 2nd and 3rd (here) test case:
[ OK ]
SUCCESS: All unit tests have passed.
Test single... [2021/05/20 11:39:08] [ info] [engine] started (pid=300)
[2021/05/20 11:39:08] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/05/20 11:39:08] [debug] [storage] [cio stream] new stream registered: lib.0
[2021/05/20 11:39:08] [ info] [storage] version=1.1.1, initializing...
[2021/05/20 11:39:08] [ info] [storage] in-memory
[2021/05/20 11:39:08] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/05/20 11:39:08] [debug] [lib:lib.0] created event channels: read=20 write=21
[2021/05/20 11:39:08] [debug] [router] match rule lib.0:lib.0
[2021/05/20 11:39:08] [ info] [sp] stream processor started
[2021/05/20 11:39:08] [debug] [filter:nest:nest.0] outer map size is 2, will be 2, nested map size will be 1
[2021/05/20 11:39:09] [debug] [task] created task=0x7f611401a410 id=0 OK
[2021/05/20 11:39:09] [debug] [test_filter_nest] received message: [1448403340.000000,{"extra":"Some more data","nested_key":{"to_nest":"This is the data to nest"}}]
[2021/05/20 11:39:09] [debug] [out coro] cb_destroy coro_id=0
[2021/05/20 11:39:09] [debug] [task] destroy task=0x7f611401a410 (task_id=0)
[2021/05/20 11:39:09] [ warn] [engine] service will stop in 1 seconds
[2021/05/20 11:39:10] [ info] [engine] service stopped
[ OK ]
Test multiple events are not dropped(nest)... [2021/05/20 11:39:10] [ info] [engine] started (pid=303)
[2021/05/20 11:39:10] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/05/20 11:39:10] [debug] [storage] [cio stream] new stream registered: lib.0
[2021/05/20 11:39:10] [ info] [storage] version=1.1.1, initializing...
[2021/05/20 11:39:10] [ info] [storage] in-memory
[2021/05/20 11:39:10] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/05/20 11:39:10] [debug] [lib:lib.0] created event channels: read=20 write=21
[2021/05/20 11:39:10] [debug] [router] match rule lib.0:lib.0
[2021/05/20 11:39:10] [ info] [sp] stream processor started
[2021/05/20 11:39:10] [debug] [filter:nest:nest.0] outer map size is 2, will be 2, nested map size will be 1
[2021/05/20 11:39:10] [debug] [filter:nest:nest.0] no match found for (null)
[2021/05/20 11:39:11] [debug] [task] created task=0x7f611401a410 id=0 OK
[2021/05/20 11:39:11] [debug] [test_filter_nest] received message: [1448403340.000000,{"extra":"Some more data","nested_key":{"to_nest":"This is the data to nest"}}]
[2021/05/20 11:39:11] [debug] [test_filter_nest] received message: [1448403341.000000,{"not_nest":"dummy data","extra":"dummy more data"}]
[2021/05/20 11:39:11] [debug] [out coro] cb_destroy coro_id=0
[2021/05/20 11:39:11] [debug] [task] destroy task=0x7f611401a410 (task_id=0)
[2021/05/20 11:39:11] [ warn] [engine] service will stop in 1 seconds
[2021/05/20 11:39:12] [ info] [engine] service stopped
[ OK ]
Test multiple events are not dropped(lift)... [2021/05/20 11:39:12] [ info] [engine] started (pid=306)
[2021/05/20 11:39:12] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/05/20 11:39:12] [debug] [storage] [cio stream] new stream registered: lib.0
[2021/05/20 11:39:12] [ info] [storage] version=1.1.1, initializing...
[2021/05/20 11:39:12] [ info] [storage] in-memory
[2021/05/20 11:39:12] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/05/20 11:39:12] [debug] [lib:lib.0] created event channels: read=20 write=21
[2021/05/20 11:39:12] [debug] [router] match rule lib.0:lib.0
[2021/05/20 11:39:12] [ info] [sp] stream processor started
[2021/05/20 11:39:12] [debug] [filter:nest:nest.0] Lift : Outer map size is 2, will be 2, lifting 1 record(s)
[2021/05/20 11:39:12] [debug] [filter:nest:nest.0] Lift : No match found for nested
[ FAILED ]
filter_nest.c:226: Check Expected number of events 2, got 0... failed
[2021/05/20 11:39:13] [debug] [task] created task=0x7f611401a3d0 id=0 OK
[2021/05/20 11:39:13] [debug] [test_filter_nest] received message: [1448403340.000000,{"not_nestd":"not nested data","child":"nested data"}]
[2021/05/20 11:39:13] [debug] [test_filter_nest] received message: [1448403341.000000,{"not_nest":"dummy data","extra":"dummy more data"}]
[2021/05/20 11:39:13] [ warn] [engine] service will stop in 1 seconds
[2021/05/20 11:39:13] [debug] [out coro] cb_destroy coro_id=0
[2021/05/20 11:39:13] [debug] [task] destroy task=0x7f611401a3d0 (task_id=0)
[2021/05/20 11:39:14] [ info] [engine] service stopped
FAILED: 1 of 3 unit tests has failed.
If I run outside of the container in a loop that restarts it then it either takes too long or the timing issue goes away with the overhead of the container launch each time. until ! docker run --rm -it test; do sleep 1; done
Expected behavior
Tests should always pass.
Screenshots
Your Environment
Version used: 1.7.4
Configuration: Container built from source
Environment name and version (e.g. Kubernetes? What version?): Local docker on MacOS but also in CI on Ubuntu 20.04
Server type and version:
Operating System and version: Various
Filters and plugins:
Additional context
Trying to auto-test a build from source to verify everything is correct.
The text was updated successfully, but these errors were encountered:
Tested now on 1.7.7 and seems to have been resolved (hard to confirm with timing issues), the change makes sense so I'm happy to consider it fixed.
I am getting other failures in the container build unit tests but I'll investigate and raise a separate issue.
Bug Report
Describe the bug
We are building fluent bit from source and running the various tests provided in a UBI 8 image. We have been seeing occasional failures in CI but not locally. Finally looked at it today and at least some of the test cases seem to be unreliable, they usually work ok but occasionally seem to hit what looks like timing issues so fail.
It looks like incidence of failure is higher when you run them all together in one go like we do in CI.
The failing tests we have seen so far:
To Reproduce
docker build -t test .
I include that command above to simplify an external docker loop.
This command can run for a while but it will eventually fail.
I have seen it fail in the 2nd and 3rd (here) test case:
If I run outside of the container in a loop that restarts it then it either takes too long or the timing issue goes away with the overhead of the container launch each time.
until ! docker run --rm -it test; do sleep 1; done
Expected behavior
Tests should always pass.
Screenshots
Your Environment
Additional context
Trying to auto-test a build from source to verify everything is correct.
The text was updated successfully, but these errors were encountered: