Tail input plugin reads whole file from beginning when ignore_older is set to specific value #9490
Description
Bug Report
Describe the bug
Tail plugin wrongly reads file from beginning when ignore_older configuration parameter is set. It depends what value is set for 'ignore_older', See below.
If ignore_older is NOT used, then Tail plugin behaves as expected, it means as 'tail -f' shell command.
To Reproduce
On Mac, run below commands to understand to issue quickly.
Pre-requisites:
- To have harvested file with timestamp older than current time on Operating system
- Start fluent-bit with Tail's property ignore_older set to lower value than the calculated difference between current system time and file's timestamp (For example: Current time is 1-Jan-2000 10:10, timestamp on harversted file is 1-Jan-2000 10:00, ignore_older=5m)
Once the timestamp of the harvested file is changed to current date/time, or when file is updated, then you will see that Tail plugin read whole file from beginning. At that time you hit the bug.
Same behavior is seen on Linux as well.
[flb-tester] # for i in {1..10}; do echo "[$(date -j -v-1d +%d-%b-%Y\ %H:%M:%S)] [FluentBit works $i]" >> app.log; done
[flb-tester] # touch -c $(date -j -v-10H +%m%d%H%M) app.log
[flb-tester] # ls -l; date
total 8
-rw-r--r--@ 1 mlukas wheel 431 Oct 14 16:39 app.log
Tue Oct 15 02:39:26 CEST 2024
[flb-tester] # fluent-bit -i tail -p path=/tmp/flb/app.log -p path_key=log_path -p Refresh_Interval=3 -p ignore_older=15m -o stdout -l fb.log -v &
[1] 58732
[flb-tester] # Fluent Bit v3.1.9
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ _____ __
| ___| | | | | ___ (_) | |____ |/ |
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __ / /`| |
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / \ \ | |
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /.___/ /_| |_
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ \____(_)___/
[2024/10/15 02:39:45] [ info] Configuration:
[2024/10/15 02:39:45] [ info] flush time | 1.000000 seconds
[2024/10/15 02:39:45] [ info] grace | 5 seconds
[2024/10/15 02:39:45] [ info] daemon | 0
[2024/10/15 02:39:45] [ info] ___________
[2024/10/15 02:39:45] [ info] inputs:
[2024/10/15 02:39:45] [ info] tail
[2024/10/15 02:39:45] [ info] ___________
[2024/10/15 02:39:45] [ info] filters:
[2024/10/15 02:39:45] [ info] ___________
[2024/10/15 02:39:45] [ info] outputs:
[2024/10/15 02:39:45] [ info] stdout.0
[2024/10/15 02:39:45] [ info] ___________
[2024/10/15 02:39:45] [ info] collectors:
[flb-tester] # touch app.log
[0] tail.0: [[1728952803.205855000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 1]"}]
[1] tail.0: [[1728952803.205859000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 2]"}]
[2] tail.0: [[1728952803.205860000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 3]"}]
[3] tail.0: [[1728952803.205860000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 4]"}]
[4] tail.0: [[1728952803.205860000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 5]"}]
[5] tail.0: [[1728952803.205861000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 6]"}]
[6] tail.0: [[1728952803.205861000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 7]"}]
[7] tail.0: [[1728952803.205861000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 8]"}]
[8] tail.0: [[1728952803.205862000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 9]"}]
[9] tail.0: [[1728952803.205862000, {}], {"log_path"=>"/tmp/flb/app.log", "log"=>"[14-Oct-2024 02:39:16] [FluentBit works 10]"}]
[flb-tester] # ps -ef| grep fluent-bit
503 58732 58149 0 2:39AM ttys132 0:00.07 fluent-bit -i tail -p path=/tmp/flb/app.log -p path_key=log_path -p Refresh_Interval=3 -p ignore_older=15m -o stdout -l fb.log -v
503 58748 58149 0 2:40AM ttys132 0:00.00 grep fluent-bit
[flb-tester] #
- Steps to reproduce the problem:
#On Mac
1)Create empty app.log file
cd /tmp; mkdir flb; cd flb
touch app.log
2) Add few dummy lines inside the app.log file
for i in {1..10}; do echo "[$(date -j -v-1d +%d-%b-%Y\ %H:%M:%S)] [FluentBit works $i]" >> app.log; done
3) Change app.log file's timestamp to 10 hours back from now
touch -c $(date -j -v-10H +%m%d%H%M) app.log
4) Check file's timestamp and current date
ls -l; date
5) Run fluent-bit on backround using below command
fluent-bit -i tail -p path=/tmp/flb/app.log -p path_key=log_path -p Refresh_Interval=3 -p ignore_older=15m -o stdout -l fb.log -v &
6) Change the timestamp of the harversted file
touch app.log
Expected behavior
Expected behavior is : never read file from beginning when Tail plugin is used and its configuration parameter Read_from_Head is set to False (default). Other properties MAY NOT change the Tail's behavior.
Screenshots
#With ignore_older - There is a bug in Fluent Bit - ignore_older=15m
#With ignore_older - There is a bug in Fluent Bit - ignore_older=5m
#Without ignore_older - correct, and expected behavior
Your Environment
- Version used: Fluent Bit v3.1.9
- Configuration: See command line above.
- Environment name and version (e.g. Kubernetes? What version?): n/a
- Server type and version: n/a
- Operating System and version: MacOS: 14.6.1, arm64, Darwin Kernel Version 23.6.0
Issue was also reproduced on Linux 7 and 8 (RedHat based) - Filters and plugins: No flilters; Plugin: Tail, Stdout
Additional context
There is huge impact on Storage cost, because big files (could be several GigaBytes size) are whole being reuploaded to endpoints. So far we have identified that issue during RPM reinstallation.
Review and update
Review and update current Tail's documentation (could be on different places): https://docs.fluentbit.io/manual/installation/upgrade-notes#tail-input-plugin
Mainly information when whole file is being read:
For every file found at start, its followed from it last position,
for new files discovered at runtime or rotated, they are read from the beginning.