Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ignoring external input configuration in take_over: true mode #36395

Merged
merged 3 commits into from
Sep 14, 2023

Conversation

rdner
Copy link
Member

@rdner rdner commented Aug 22, 2023

The take-over mode didn't take into account external configuration files. Now it's reading the merged configuration that contains the externally defined inputs too.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Create a directory with 2 YAML files, let's say it's /test/config/inputs.d:

filebeat-before.yml

- type: log
  paths:
    - "/test/logs/error2.log"
- type: log
  paths:
    - "/test/logs/error3.log"

filebeat-after.yml.disabled

- type: filestream
  id: my-filestream-id-2
  take_over: true
  paths:
    - "/test/logs/error2.log"
- type: filestream
  id: my-filestream-id-3
  take_over: true
  paths:
    - "/test/logs/error3.log"
  1. Create the main config file in /test/config/filebeat.yml
filebeat.config.inputs:
  enabled: true
  path: /test/config/inputs.d/*.yml
filebeat.inputs:
  - type: log
    paths:
      - "/test/logs/error1.log"
logging:
  level: debug
output.console:
  enabled: true
path.data: "/test/data"

Note, you need to adjust the root path /test to your own.

  1. Create the log files in /test/logs with any lines you'd like, the filenames are error1.log, error2.log and error3.log.

  2. Run filebeat with the following script (you can override the FILEBEAT_ROOT variable:

#!/bin/bash

set -e

FILEBEAT_ROOT=~/Projects/beats/filebeat

cd $FILEBEAT_ROOT
rm -f ./filebeat
go build
cd -

$FILEBEAT_ROOT/filebeat run -e -c /test/config/filebeat.yml 2> output.json | jq .

You should see events according to the lines written to the log files, in my case it's:

{
  "@timestamp": "2023-09-13T12:10:26.583Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.11.0"
  },
  "input": {
    "type": "log"
  },
  "host": {
    "name": "hostname"
  },
  "agent": {
    "type": "filebeat",
    "version": "8.11.0",
    "ephemeral_id": "e6d49bb0-8a89-4a8b-a6f8-82df9a3fb2a0",
    "id": "55657aaf-2d01-41d0-a3b1-2a548ad1a6b6",
    "name": "hostname"
  },
  "ecs": {
    "version": "8.0.0"
  },
  "log": {
    "offset": 0,
    "file": {
      "path": "/test/logs/error1.log"
    }
  },
  "message": "test"
}
{
  "@timestamp": "2023-09-13T12:10:26.584Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.11.0"
  },
  "host": {
    "name": "hostname"
  },
  "agent": {
    "ephemeral_id": "e6d49bb0-8a89-4a8b-a6f8-82df9a3fb2a0",
    "id": "55657aaf-2d01-41d0-a3b1-2a548ad1a6b6",
    "name": "hostname",
    "type": "filebeat",
    "version": "8.11.0"
  },
  "message": "test3",
  "log": {
    "offset": 0,
    "file": {
      "path": "/test/logs/error3.log"
    }
  },
  "input": {
    "type": "log"
  },
  "ecs": {
    "version": "8.0.0"
  }
}
{
  "@timestamp": "2023-09-13T12:10:26.584Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.11.0"
  },
  "agent": {
    "type": "filebeat",
    "version": "8.11.0",
    "ephemeral_id": "e6d49bb0-8a89-4a8b-a6f8-82df9a3fb2a0",
    "id": "55657aaf-2d01-41d0-a3b1-2a548ad1a6b6",
    "name": "hostname"
  },
  "ecs": {
    "version": "8.0.0"
  },
  "message": "test2",
  "log": {
    "file": {
      "path": "/test/logs/error2.log"
    },
    "offset": 0
  },
  "input": {
    "type": "log"
  },
  "host": {
    "name": "hostname"
  }
}
  1. Now switch everything to filestreams with take_over: true, for that:

Switch the main config to:

filebeat.config.inputs:
  enabled: true
  path: /test/config/inputs.d/*.yml
filebeat.inputs:
  - type: filestream
    id: my-filestream-1
    take_over: true
    paths:
      - "/test/logs/error1.log"
logging:
  level: debug
output.console:
  enabled: true
path.data: "/test/data"

Note, we switched the input to filestream.

Rename the files:

  • filebeat-after.yml.disabled -> filebeat-after.yml
  • filebeat-before.yml -> filebeat-before.yml.disabled
  1. Run Filebeat with the same script as before and you should not see any events this time (this means the offsets were successfully migrated). Check output.json, it should contain the following log lines:
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 194
  },
  "message": "found 1 patterns for filestream `my-filestream-1`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 194
  },
  "message": "found 1 patterns for filestream `my-filestream-id-2`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 194
  },
  "message": "found 1 patterns for filestream `my-filestream-id-3`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 167
  },
  "message": "found 3 filestream inputs in `take over` mode",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 128
  },
  "message": "found loginput state `filebeat::logs::native::123539340-16777230` to take over by `filestream::my-filestream-1::native::123539340-16777230`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 128
  },
  "message": "found loginput state `filebeat::logs::native::128995786-16777230` to take over by `filestream::my-filestream-id-2::native::128995786-16777230`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 128
  },
  "message": "found loginput state `filebeat::logs::native::128995795-16777230` to take over by `filestream::my-filestream-id-3::native::128995795-16777230`",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 62
  },
  "message": "Attempting to find the checkpoint...",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 70
  },
  "message": "Checkpoint not found",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 74
  },
  "message": "Checking if the registry log exists at /test/data/registry/filebeat/log.json...",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 81
  },
  "message": "Found the registry log at /test/data/registry/filebeat/log.json",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.457+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "backup/registry.go",
    "file.line": 84
  },
  "message": "Creating backups for [/test/data/registry/filebeat/log.json]...",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
{
  "log.level": "info",
  "@timestamp": "2023-09-13T14:13:42.458+0200",
  "log.logger": "filestream-takeover",
  "log.origin": {
    "file.name": "takeover/takeover.go",
    "file.line": 87
  },
  "message": "filestream inputs took over 3 file(s) from loginputs",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}
  1. Now write to the error1.log, error2.log and error3.log files and make sure you see the events from filestream on the console, like this one:
echo "another_test" >> error1.log
{
  "@timestamp": "2023-09-13T12:17:16.506Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.11.0"
  },
  "input": {
    "type": "filestream"
  },
  "ecs": {
    "version": "8.0.0"
  },
  "host": {
    "name": "hostname"
  },
  "agent": {
    "type": "filebeat",
    "version": "8.11.0",
    "ephemeral_id": "1d1bbbd1-6912-497f-8a3e-000617cdb163",
    "id": "55657aaf-2d01-41d0-a3b1-2a548ad1a6b6",
    "name": "hostname"
  },
  "log": {
    "offset": 5,
    "file": {
      "path": "/test/logs/error1.log",
      "device_id": 16777230,
      "inode": 123539340
    }
  },
  "message": "another_test"
}

Related issues

@rdner rdner added bug Filebeat Filebeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-v8.10.0 Automated backport with mergify labels Aug 22, 2023
@rdner rdner self-assigned this Aug 22, 2023
@elasticmachine
Copy link
Collaborator

elasticmachine commented Aug 22, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-09-13T15:54:32.194+0000

  • Duration: 70 min 12 sec

Test stats 🧪

Test Results
Failed 0
Passed 8257
Skipped 753
Total 9010

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@rdner rdner force-pushed the fix-take-over-config branch 5 times, most recently from 6937e9b to 76308e9 Compare September 13, 2023 10:16
The take-over mode didn't take into account external configuration
files. Now it's reading the merged configuration that contains the
externally defined inputs too.
@rdner rdner force-pushed the fix-take-over-config branch from 76308e9 to 94874d7 Compare September 13, 2023 10:17
@rdner rdner marked this pull request as ready for review September 13, 2023 12:19
@rdner rdner requested a review from a team as a code owner September 13, 2023 12:19
Copy link
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with the code change.

For tests, because processLogInputTakeOver happens before the InputLoader, it might be a good idea to have a test that shows what happens if the input config is wrong or not parsable somehow.

@mergify
Copy link
Contributor

mergify bot commented Sep 13, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-take-over-config upstream/fix-take-over-config
git merge upstream/main
git push upstream fix-take-over-config

@rdner
Copy link
Member Author

rdner commented Sep 14, 2023

@leehinman since all this new code does not really parse the input configuration and keeps it as raw config, nothing would fail on invalid configuration, unless YAML syntax error.

When the actual take over is happening the input configuration is parsed into a short verion of the input configuration

type scanner struct {
RecursiveGlob bool `config:"recursive_glob"`
}
type prospector struct {
Scanner scanner `config:"scanner"`
}
type inputConfig struct {
Type string `config:"type"`
ID string `config:"id"`
Paths []string `config:"paths"`
TakeOver bool `config:"take_over"`
Prospector prospector `config:"prospector"`
}
func defaultInputConfig() inputConfig {
return inputConfig{
Type: "",
ID: "",
Paths: []string{},
TakeOver: false,
Prospector: prospector{
Scanner: scanner{
RecursiveGlob: true,
},
},
}
}

And required data is validated there. As long as it has the right input type, paths and take_over: true it's considered valid. We don't duplicate validations for the rest of the parameters.

inputCfg := defaultInputConfig()
err := input.Unpack(&inputCfg)
if err != nil {
return nil, fmt.Errorf("failed to unpack input configuration: %w", err)
}
if inputCfg.Type != "filestream" || !inputCfg.TakeOver {
continue
}

And this is tested here

{
name: "does nothing when default config",
cfg: empty,
states: states,
},
{
name: "does nothing when there is no filestream with `take_over`",
cfg: noTakeOver,
states: states,
},
{
name: "returns error if filestreams don't have unique IDs",
cfg: noUniqueID,
states: states,
expErr: "failed to read input configuration: filestream with ID `filestream-id-2` in `take over` mode requires a unique ID. Add the `id:` key with a unique value",
},

If the input loader performs extra validation and fails later it's okay, the migration is done anyway and it's well communicated in the debug logs.

@rdner rdner merged commit 73774d5 into elastic:main Sep 14, 2023
@rdner rdner deleted the fix-take-over-config branch September 14, 2023 07:42
mergify bot pushed a commit that referenced this pull request Sep 14, 2023
…36395)

The take-over mode didn't take into account external configuration
files. Now it's reading the merged configuration that contains the
externally defined inputs too.

(cherry picked from commit 73774d5)
rdner added a commit that referenced this pull request Sep 14, 2023
…36395) (#36588)

The take-over mode didn't take into account external configuration
files. Now it's reading the merged configuration that contains the
externally defined inputs too.

(cherry picked from commit 73774d5)

Co-authored-by: Denis <denis.rechkunov@elastic.co>
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
…lastic#36395)

The take-over mode didn't take into account external configuration
files. Now it's reading the merged configuration that contains the
externally defined inputs too.
@fixed77
Copy link

fixed77 commented Aug 23, 2024

filebeat 8.15.0. problem returned.
filebeat ignoring external input configuration with take_over: true

@pierrehilbert
Copy link
Collaborator

Hello @fixed77
Can you please share more about the situation you are facing to help us reproduce it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.10.0 Automated backport with mergify bug Filebeat Filebeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

take_over: true is ignored when using input configuration from external files
5 participants