Skip to content

Filebeat race condition start / stop harvester #3779

Closed
@ruflin

Description

CI on AppVeyor recently run into a race condition during the tests. The race condition seems to be cause by Stopping a harvester before it was fully started, means Wait() is called before all Add(1) are called on the waitgroup. See full log messages below.

Running test_shutdown.py:Test.test_shutdown_wait_ok can lead to the following log output / race condition:

2017/03/20 16:05:50.682951 beat.go:319: INFO Home path: [C:\gopath\src\github.com\elastic\beats\filebeat] Config path: [C:\gopath\src\github.com\elastic\beats\filebeat] Data path: [C:\gopath\src\github.com\elastic\beats\filebeat\data] Logs path: [C:\gopath\src\github.com\elastic\beats\filebeat\logs]
2017/03/20 16:05:50.682951 metrics.go:23: INFO Metrics logging every 30s
2017/03/20 16:05:50.682951 beat.go:344: INFO Beat metadata path: C:\gopath\src\github.com\elastic\beats\filebeat\data\meta.json
2017/03/20 16:05:50.683952 beat.go:326: INFO Beat UUID: a9a026d7-5da0-4904-aee5-dfb1bce68f32
2017/03/20 16:05:50.683952 beat.go:208: INFO Setup Beat: filebeat; Version: 6.0.0-alpha1
2017/03/20 16:05:50.683952 processor.go:44: DBG  Processors: 
2017/03/20 16:05:50.683952 beat.go:214: DBG  Initializing output plugins
2017/03/20 16:05:50.683952 file.go:54: INFO File output path set to: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/output
2017/03/20 16:05:50.683952 file.go:55: INFO File output base filename set to: filebeat
2017/03/20 16:05:50.683952 file.go:58: INFO Rotate every bytes set to: 1024000
2017/03/20 16:05:50.683952 file.go:62: INFO Number of files set to: 7
2017/03/20 16:05:50.683952 outputs.go:117: INFO Activated file as output plugin.
2017/03/20 16:05:50.683952 publish.go:236: DBG  Create output worker
2017/03/20 16:05:50.684952 publish.go:278: DBG  No output is defined to store the topology. The server fields might not be filled.
2017/03/20 16:05:50.684952 publish.go:293: INFO Publisher name: APPVYR-WIN
2017/03/20 16:05:50.686952 async.go:63: INFO Flush Interval set to: -1s
2017/03/20 16:05:50.686952 async.go:64: INFO Max Bulk Size set to: -1
2017/03/20 16:05:50.687952 beat.go:251: INFO filebeat start running.
2017/03/20 16:05:50.687952 service_windows.go:51: DBG  Windows is interactive: true
2017/03/20 16:05:50.687952 registrar.go:72: INFO No registry file found under: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry. Creating a new registry file.
2017/03/20 16:05:50.687952 registrar.go:346: DBG  Write registry file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry
2017/03/20 16:05:50.741954 file_windows.go:51: DBG  delete old: remove C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry.old: The system cannot find the file specified.
2017/03/20 16:05:50.742952 file_windows.go:57: DBG  rotate to old: rename C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry.old: The system cannot find the file specified.
2017/03/20 16:05:50.742952 registrar.go:373: DBG  Registry file updated. 0 states written.
2017/03/20 16:05:50.742952 registrar.go:120: INFO Loading registrar data from C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry
2017/03/20 16:05:50.742952 registrar.go:140: INFO States Loaded from registrar: 0
2017/03/20 16:05:50.742952 registrar.go:271: INFO Starting Registrar
2017/03/20 16:05:50.742952 sync.go:46: INFO Start sending events to output
2017/03/20 16:05:50.742952 spooler.go:68: INFO Starting spooler: spool_size: 2048; idle_timeout: 100ms
2017/03/20 16:05:50.742952 crawler.go:42: INFO Loading Prospectors: 1
2017/03/20 16:05:50.743951 prospector.go:90: DBG  File Configs: [C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/log/*]
2017/03/20 16:05:50.743951 prospector_log.go:46: DBG  exclude_files: []
2017/03/20 16:05:50.743951 prospector_log.go:71: INFO Prospector with previous states loaded: 0
2017/03/20 16:05:50.743951 prospector.go:142: INFO Starting prospector of type: log; id: 3861952635306202170 
2017/03/20 16:05:50.744952 crawler.go:67: INFO Loading and starting Prospectors completed. Enabled prospectors: 1
2017/03/20 16:05:50.744952 prospector_log.go:77: DBG  Start next scan
2017/03/20 16:05:50.744952 prospector_log.go:267: DBG  Check file for harvesting: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.843080 spooler.go:103: DBG  Flushing spooler because of timeout. Events flushed: 0
2017/03/20 16:05:50.943941 spooler.go:103: DBG  Flushing spooler because of timeout. Events flushed: 0
2017/03/20 16:05:50.945950 service.go:33: DBG  Received sigterm/sigint, stopping
2017/03/20 16:05:50.945950 service.go:39: DBG  Received svc stop/shutdown request
2017/03/20 16:05:50.945950 filebeat.go:251: INFO Stopping filebeat
2017/03/20 16:05:50.945950 prospector.go:157: INFO Prospector channel stopped
2017/03/20 16:05:50.945950 crawler.go:109: INFO Stopping Crawler
2017/03/20 16:05:50.945950 crawler.go:122: INFO Stopping 1 prospectors
2017/03/20 16:05:50.956946 prospector_log.go:287: DBG  Start harvester for new file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.957921 state.go:72: DBG  New state added for C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.957921 log.go:341: DBG  Setting offset for file based on seek: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.957921 log.go:325: DBG  Setting offset for file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log. Offset: 0 
2017/03/20 16:05:50.957921 prospector_log.go:99: DBG  Prospector states cleaned up. Before: 1, After: 1
2017/03/20 16:05:50.957921 prospector.go:208: INFO Prospector ticker stopped
2017/03/20 16:05:50.957921 prospector.go:276: INFO Stopping Prospector: 3861952635306202170
2017/03/20 16:05:50.957921 log.go:109: INFO Harvester started for file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
==================
WARNING: DATA RACE
Write at 0x00c0421b57cc by goroutine 40:
  internal/race.Write()
      C:/Users/appveyor/.gvm/versions/go1.7.4.windows.amd64/src/internal/race/race.go:41 +0x3f
  sync.(*WaitGroup).Wait()
      C:/Users/appveyor/.gvm/versions/go1.7.4.windows.amd64/src/sync/waitgroup.go:129 +0x158
  github.com/elastic/beats/filebeat/harvester.(*Harvester).Stop()
      github.com/elastic/beats/filebeat/harvester/_obj/log.go:200 +0x81
  github.com/elastic/beats/filebeat/prospector.(*harvesterRegistry).Stop.func1()
      github.com/elastic/beats/filebeat/prospector/_obj/registry.go:49 +0x6b

Previous read at 0x00c0421b57cc by goroutine 38:
  internal/race.Read()
      C:/Users/appveyor/.gvm/versions/go1.7.4.windows.amd64/src/internal/race/race.go:37 +0x3f
  sync.(*WaitGroup).Add()
      C:/Users/appveyor/.gvm/versions/go1.7.4.windows.amd64/src/sync/waitgroup.go:71 +0x299
  github.com/elastic/beats/filebeat/harvester.(*Harvester).Harvest()
      github.com/elastic/beats/filebeat/harvester/_obj/log.go:68 +0x10a
  github.com/elastic/beats/filebeat/prospector.(*harvesterRegistry).start.func1()
      github.com/elastic/beats/filebeat/prospector/_obj/registry.go:76 +0xb7

Goroutine 40 (running) created at:
  github.com/elastic/beats/filebeat/prospector.(*harvesterRegistry).Stop()
      github.com/elastic/beats/filebeat/prospector/_obj/registry.go:50 +0x17c
  github.com/elastic/beats/filebeat/prospector.(*Prospector).stop()
      github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:286 +0x1e3
  github.com/elastic/beats/filebeat/prospector.(*Prospector).Start.func2.1()
      github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:182 +0x81
  github.com/elastic/beats/filebeat/prospector.(*Prospector).Start.func2()
      github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:187 +0x9a

Goroutine 38 (running) created at:
2017/03/20 16:05:50.958921 log.go:366: DBG  Stopping harvester for file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
  github.com/elastic/beats/filebeat/prospector.(*harvesterRegistry).start()
      github.com/elastic/beats/filebeat/prospector/_obj/registry.go:77 +0xc2
  github.com/elastic/beats/filebeat/prospector.(*Prospector).startHarvester()
      github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:376 +0x5d4
  github.com/elastic/beats/filebeat/prospector.(*ProspectorLog).scan()
      github.com/elastic/beats/filebeat/prospector/_obj/prospector_log.go:288 +0x8cd
  github.com/elastic/beats/filebeat/prospector.(*ProspectorLog).Run()
      github.com/elastic/beats/filebeat/prospector/_obj/prospector_log.go:93 +0x124
  github.com/elastic/beats/filebeat/prospector.(*Prospector).Run()
      github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:195 +0x9e
  github.com/elastic/beats/filebeat/prospector.(*Prospector).Start.func2()
      github.com/elastic/beats/filebeat/prospector/_obj/prospector.go:186 +0x99
==================
2017/03/20 16:05:50.958921 log.go:373: DBG  Closing file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:50.958921 log.go:217: DBG  Update state: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log, offset: 88
2017/03/20 16:05:50.958921 crawler.go:138: INFO Crawler stopped
2017/03/20 16:05:50.958921 filebeat.go:235: INFO Shutdown output timer started. Waiting for max 10s.
2017/03/20 16:05:51.044903 spooler.go:103: DBG  Flushing spooler because of timeout. Events flushed: 1
2017/03/20 16:05:51.044903 client.go:158: DBG  No events to publish
2017/03/20 16:05:51.044903 sync.go:82: DBG  Events sent: 1
2017/03/20 16:05:51.044903 registrar.go:320: DBG  Processing 1 events
2017/03/20 16:05:51.044903 state.go:72: DBG  New state added for C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok\log\nasa-50k.log
2017/03/20 16:05:51.044903 registrar.go:302: DBG  Registrar states cleaned up. Before: 1, After: 1
2017/03/20 16:05:51.044903 registrar.go:346: DBG  Write registry file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry
2017/03/20 16:05:51.066776 file_windows.go:51: DBG  delete old: remove C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry.old: The system cannot find the file specified.
2017/03/20 16:05:51.067798 registrar.go:373: DBG  Registry file updated. 1 states written.
2017/03/20 16:05:51.067798 signalwait.go:94: INFO Continue shutdown: All enqueued events being published.
2017/03/20 16:05:51.067798 spooler.go:116: INFO Stopping spooler
2017/03/20 16:05:51.067798 spooler.go:121: DBG  Spooler has stopped
2017/03/20 16:05:51.067798 sync.go:54: DBG  Shutting down sync publisher
2017/03/20 16:05:51.067798 registrar.go:338: INFO Stopping Registrar
2017/03/20 16:05:51.067798 registrar.go:287: INFO Ending Registrar
2017/03/20 16:05:51.067798 registrar.go:346: DBG  Write registry file: C:\gopath\src\github.com\elastic\beats\filebeat\build\system-tests\run\test_shutdown.Test.test_shutdown_wait_ok/registry
2017/03/20 16:05:51.068774 registrar.go:373: DBG  Registry file updated. 1 states written.
2017/03/20 16:05:51.068774 metrics.go:51: INFO Total non-zero values:  beat.memstats.gc_next=4194304 beat.memstats.memory_alloc=1326664 beat.memstats.memory_total=1326664 filebeat.harvester.closed=1 filebeat.harvester.started=1 publish.events=1 registrar.states.current=1 registrar.states.update=1 registrar.writes=3
2017/03/20 16:05:51.068774 metrics.go:52: INFO Uptime: 397.8223ms
2017/03/20 16:05:51.068774 beat.go:255: INFO filebeat stopped.
PASS
coverage: 42.2% of statements in ./...
Found 1 data race(s)

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions