Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to save data for plugin storage under antivirus solution #57

Open
nmaludy opened this issue Jun 5, 2020 · 25 comments
Open

Failed to save data for plugin storage under antivirus solution #57

nmaludy opened this issue Jun 5, 2020 · 25 comments
Labels
in_windows_eventlog2(new) For new in_windows_eventlog2 plugin issue

Comments

@nmaludy
Copy link

nmaludy commented Jun 5, 2020

Hello, i'm running fluentd (1.10.2) using td-agent (3.7.1) on Windows 2016

My FluentD config looks like:

<match>
  flush_interval 1s
  host xxx
  port 12345
  protocol udp
  @type gelf
</match>

<filter>
  @type record_transformer
  <record>
    message ${record[description]}
  </record>
</filter>

<source>
  channels application,system,security
  @id windows_eventlog
  <storage>
    persistent true
    @type local
  </storage>
  tag windows_eventlog
  @type windows_eventlog
</source>

<system>
  root_dir C:/opt/td-agent
</system>

And i'm seeing the following errors coming through on the logging stream:

[windows_eventlog] failed to save data for plugin storage to file path="C:/opt/td-agent/worker0/windows_eventlog/storage.json" tmp="C:/opt/td-agent/worker0/windows_eventlog/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/worker0/windows_eventlog/storage.json.tmp, C:/opt/td-agent/worker0/windows_eventlog/storage.json)"

I checked and can see that this directory does exist and the file C:/opt/td-agent/worker0/windows_eventlog/storage.json is created and has the following contents:

{"application":[541177,41519],"system":[648,54300],"security":[8973443,1]}

Any ideas why i might be getting a permission denied error? It appears that the service is running as System:
image

Not sure what else to check, any thoughts?

@philipsabri
Copy link
Contributor

Are you running the td-agent as admin?

It's required if your are getting logs from channels like "Security"

@nmaludy
Copy link
Author

nmaludy commented Jun 5, 2020

@flurreN I haven't changed the Log On settings of the service. Does the default SYSTEM user not have access to the Security channel? Should i change the Log On setting to a local user with Administator rights?

@nmaludy
Copy link
Author

nmaludy commented Jun 5, 2020

@flurreN I have changed the service to login as local admin and no change. Still seeing the errors above and now that you mention the Security channel i'm not seeing logs from that either. Any thoughts on next-steps for debugging this?

@philipsabri
Copy link
Contributor

I manage to replicate your issue by denying write permission for SYSTEM on the opt folder. Which makes sense. The error is actually saying it dont have permission to write to C:\opt\td-agent\worker0\windows_eventlog\storage.json

Can you double check that you are not denying any SYSTEM permission on the storage.json file?

@nmaludy
Copy link
Author

nmaludy commented Jun 5, 2020

@flurreN I confirmed that all folders and files in the path including C:\opt\td-agent\worker0\windows_eventlog\storage.json are all writable by SYSTEM.

image

I also confirmed that the ruby processes are indeed running as SYSTEM in Task Manager:

image

@nmaludy
Copy link
Author

nmaludy commented Jun 5, 2020

What is interesting in this is that i see the timestamp of the storage.json file being updated and the storage.json.tmp files coming and going.

@philipsabri
Copy link
Contributor

I also managed to replicate the issue by having it run as a service and from the td-agent command promt at the same time. I guess they are both trying to access it by the same time, do you think you running anything in the background?

@nmaludy
Copy link
Author

nmaludy commented Jun 5, 2020

@flurreN I don't think i'm running anything else in the background. Here are the current Ruby / FluentD processes running:

image

Do any of those look like duplicates or different from your test system?

@philipsabri
Copy link
Contributor

philipsabri commented Jun 5, 2020

Its also running 3x Ruby interpeter for me so that should be fine

Try storing with a path

<storage>
    @type local
    persistent true
    path "c:/opt/td-agent/storage.json"
</storage>

@nmaludy
Copy link
Author

nmaludy commented Jun 5, 2020

@flurreN tried that config and am still seeing the error:

[windows_eventlog] failed to save data for plugin storage to file path="c:/opt/td-agent/storage.json" tmp="c:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (c:/opt/td-agent/storage.json.tmp, c:/opt/td-agent/storage.json)"

@philipsabri
Copy link
Contributor

@nmaludy
Hmm not sure, I think the devs would know more about this.

Also you should use in_windows_eventlog2 it has more option and will replace @in_windows_eventlog, it will probably be what you are looking for when it comes to eventlogs in the end anyway. I'm not sure if it will fix your issue but who knows, maybe.

@nmaludy
Copy link
Author

nmaludy commented Jun 8, 2020

@flurreN i tried your suggestion of in_windows_eventlog2 but am seeing the worker fail to start with the following error in the td-agent.log:

2020-06-08 08:31:59 -0400 [error]: #0 unexpected error error_class=TypeError error="no implicit conversion of Array into String"
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:108:in `memory'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:108:in `parse_memory'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:83:in `parse'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:163:in `bookmark_validator'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:138:in `subscribe_channel'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:131:in `block in start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:130:in `each'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:130:in `start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:200:in `block in start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:189:in `block (2 levels) in lifecycle'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:188:in `each'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:188:in `block in lifecycle'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:175:in `each'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:175:in `lifecycle'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:199:in `start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/engine.rb:248:in `start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/engine.rb:147:in `run'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:600:in `block in run_worker'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:837:in `main_process'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:594:in `run_worker'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/command/fluentd.rb:350:in `<top (required)>'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/bin/fluentd:8:in `<top (required)>'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/bin/fluentd:23:in `load'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/bin/fluentd:23:in `<main>'
2020-06-08 08:31:59 -0400 [error]: #0 unexpected error error_class=TypeError error="no implicit conversion of Array into String"
  2020-06-08 08:31:59 -0400 [error]: #0 suppressed same stacktrace
2020-06-08 08:31:59 -0400 [info]: Worker 0 finished unexpectedly with status 1

I copy/pasted the config snippet from the link you sent me.

@nmaludy
Copy link
Author

nmaludy commented Jun 8, 2020

Fixed that error by deleting the old storage.json file i had in the same place. Looks like the new plugin has a different storage format. The plugin is running now, but i'm still seeing errors related to renaming the storage file.

@philipsabri
Copy link
Contributor

Did you try to run the config with the td-agent command promt as admin (not as a service)?
Not sure if it would give another result but might be worth a try. If nothing works I would also try to reinstall the whole td-agent.
@nmaludy

@nmaludy
Copy link
Author

nmaludy commented Jun 8, 2020

@flurreN No change running it directly from the td-agent command prompt as Administrator. Also with this setup it still isn't able to read the Security channel.

Also with the windows_eventlog2 setup, i was never able to get it to read from any channel. No logs seemed to have ever been sent to the central logging server. Debugging this further now.

@nmaludy
Copy link
Author

nmaludy commented Jun 8, 2020

On the windows_eventlog2 setup, i figured out the problem. The Keywords field was being sent as a String and our old logger was sending it as an Integer, therefore our ElasticSearch index had type Integer and we were getting index failures when trying to ingest the logs from windows_eventlog2. To fix this i setup the following filter to remove the Keywords field (i don't care about it):

<filter winevt.**>
  @type record_transformer
  remove_keys Keywords
</filter>

@nmaludy
Copy link
Author

nmaludy commented Jun 8, 2020

@flurreN So, switching to windows_eventlog2 and applying the filter above to fix my indexing problem has solved the issue related to receiving the Security stream. Now, even the Service instance of td-agent / fluentd is reading the Security messages when running as the default SYSTEM user.

However, back to my original issue, still seeing errors in the logs related to renaming the temporary storage file:

2020-06-08 10:28:51 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/storage.json" tmp="C:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/st
orage.json.tmp, C:/opt/td-agent/storage.json)"
2020-06-08 10:29:11 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/storage.json" tmp="C:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/st
orage.json.tmp, C:/opt/td-agent/storage.json)"

@nmaludy
Copy link
Author

nmaludy commented Jun 8, 2020

For anyone else out there watching here is my working config:


<filter windows_eventlog**>
  @type record_transformer
  enable_ruby
  <record>
    full_message ${record["Description"]}
    short_message ${record["Description"].lines.first}
  </record>
  remove_keys Keywords,Description
</filter>

<match windows_eventlog**>
  flush_interval 1s
  host graylog.domain.tld
  port 12201
  protocol udp
  @type gelf
</match>

<source>
  @type windows_eventlog2
  @id windows_eventlog2
  channels ["application", "system", "security"]
  read_existing_events false
  read_interval 2
  tag windows_eventlog
  render_as_xml true
  <storage>
    @type local
    persistent true
    path C:/opt/td-agent/storage.json
  </storage>
  <parse>
    @type winevt_xml
    preserve_qualifiers true
  </parse>
</source>

<system>
  root_dir C:/opt/td-agent
</system>

@nmaludy
Copy link
Author

nmaludy commented Jun 8, 2020

Just an update on this ticket, it looks like it was our Anti-Virus agent that was causing the issue. I worked with our security team to disable the AV agent on this host temporarily and the problem went away.

Looks like the AV agent was "grabbing" the the storage.json and storage.json.tmp files when they were written and modified. This causes an issue in Windows because these file "locks" prevent things like renames from going through since another process has the file open.

I haven't figured out why the rename DOES eventually succeed, but turning off AV resolved this.

Maybe there is another way we can operate on Windows to prevent this scenario in the future?

@philipsabri
Copy link
Contributor

@nmaludy Interesting and it makes sense actually. Im curious, which AV are you using?

I agree with you. The storage code is in this repo https://github.com/fluent/fluentd/ and I saw something similar in this issue fluent/fluentd#2446

@cosmo0920
Copy link
Contributor

cosmo0920 commented Jun 11, 2020

Looks like the AV agent was "grabbing" the the storage.json and storage.json.tmp files when they were written and modified. This causes an issue in Windows because these file "locks" prevent things like renames from going through since another process has the file open.

When Windows grubs files, they cannot be renamed eventually an administrator. This is because Windows requests greater than SYSTEM permission (It is strange and impossible permission!) to handle other processes handling files.

Yeah, disabling AV agent should be reasonable but a bit of dangerous.
IMO, AV agent should be able to specify excluding scanning folder.
in_windows_eventlog2 needs to rename storage data written files for managing bookmarks.

@cosmo0920 cosmo0920 added the in_windows_eventlog2(new) For new in_windows_eventlog2 plugin issue label Jun 22, 2020
@cosmo0920 cosmo0920 changed the title Failed to save data for plugin storage Failed to save data for plugin storage under antivirus solution Jul 8, 2020
@gordash2013
Copy link

Hello. Is there are any updates on this?

@kashok7474
Copy link

@cosmo0920 @ashie is there any update on this issue?

@cosmo0920
Copy link
Contributor

cosmo0920 commented Sep 12, 2023

@kashok7474 Please share your configuration and circumstances. Is the same for the comment? If not, could you post your configuration? Thanks.

@sutyak
Copy link

sutyak commented Nov 10, 2023

I've run into this same issue and created a ticket under Fluentd (should have been here for the plugin which I now know). Details here:
#103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in_windows_eventlog2(new) For new in_windows_eventlog2 plugin issue
Projects
None yet
Development

No branches or pull requests

6 participants