
[ML] Autodetect fatal error in gallery 302 #94

Closed
@sophiec20

Description

Found in the build with "native_code_info": { "version": "7.0.0-alpha1-SNAPSHOT", "build_hash": "d219beab8c7d34" }

Linux version 3.10.0-693.11.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Mon Dec 4 23:52:40 UTC 2017

  • Using gallery2018 dataset
  • count detector with bucket_span=15m
  • datafeed query {"bool":{"must":[{"term":{"status":{"value":"302"}}}]}}
  • Datafeed started (from: 1970-01-02T10:00:00.000Z to: 2018-12-31T00:00:00.000Z)
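The setup in the bullets above can be sketched as the request sequence below. This is a minimal sketch, assuming the 6.x/7.0-alpha era `_xpack/ml` REST endpoints. The index name `gallery2018` and the time field `@timestamp` are assumptions because the issue gives neither; the job and datafeed IDs are taken from the log. The script only builds and prints the requests and does not contact a cluster.

```python
import json

# IDs copied from the log output in this issue.
JOB_ID = "ga3-count-302-15m"
DATAFEED_ID = "datafeed-ga3-count-302-15m"
# Assumptions: the issue names the dataset but not the index or time field.
INDEX = "gallery2018"
TIME_FIELD = "@timestamp"


def job_body():
    """Job config: a single count detector with a 15m bucket span."""
    return {
        "analysis_config": {
            "bucket_span": "15m",
            "detectors": [{"function": "count"}],
        },
        "data_description": {"time_field": TIME_FIELD},
    }


def datafeed_body():
    """Datafeed config: only documents whose status is 302."""
    return {
        "job_id": JOB_ID,
        "indices": [INDEX],
        "query": {"bool": {"must": [{"term": {"status": {"value": "302"}}}]}},
    }


def start_body():
    """Start/end times as reported in the 'Datafeed started' log line."""
    return {"start": "1970-01-02T10:00:00.000Z", "end": "2018-12-31T00:00:00.000Z"}


# (method, path, body) in the order they would be sent.
REQUESTS = [
    ("PUT", f"_xpack/ml/anomaly_detectors/{JOB_ID}", job_body()),
    ("PUT", f"_xpack/ml/datafeeds/{DATAFEED_ID}", datafeed_body()),
    ("POST", f"_xpack/ml/anomaly_detectors/{JOB_ID}/_open", None),
    ("POST", f"_xpack/ml/datafeeds/{DATAFEED_ID}/_start", start_body()),
]

if __name__ == "__main__":
    for method, path, body in REQUESTS:
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

Keeping the bodies as plain dicts makes it easy to paste them into the Kibana console or send them with any HTTP client against the version under test.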

The job fails with an unexpected death of the autodetect process.
This happens on repeated runs, always at the same latest timestamp, 2018-10-09 15:29:11 (UTC).
That timestamp corresponds to the end of the time series for status 302; however, the end date was specified as 2018-12-31 00:00:00.

[2018-05-14T10:14:04,309][INFO ][o.e.x.m.a.TransportPutDatafeedAction] [node1] Created datafeed [datafeed-ga3-count-302-15m]
[2018-05-14T10:14:04,499][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] Opening job [ga3-count-302-15m]
[2018-05-14T10:14:04,555][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] [ga3-count-302-15m] Loading model snapshot [N/A], job latest_record_timestamp [N/A]
[2018-05-14T10:14:04,705][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [ga3-count-302-15m] [autodetect/6086] [CResourceMonitor.cc@67] Setting model memory limit to 20 MB
[2018-05-14T10:14:04,740][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] Successfully set job state to [opened] for job [ga3-count-302-15m]
[2018-05-14T10:14:05,227][INFO ][o.e.x.m.d.DatafeedJob    ] [ga3-count-302-15m] Datafeed started (from: 1970-01-02T10:00:00.000Z to: 2018-12-31T00:00:00.000Z) with frequency [450000ms]
[2018-05-14T10:14:06,887][INFO ][o.e.x.m.j.p.DataCountsReporter] [node1] [ga3-count-200-15m] 700000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-05-14T10:14:26,360][INFO ][o.e.x.m.a.TransportPutDatafeedAction] [node1] Created datafeed [datafeed-ga3-count-400-15m]
[2018-05-14T10:14:48,150][INFO ][o.e.x.m.a.TransportPutDatafeedAction] [node1] Created datafeed [datafeed-ga3-count-303-15m]
[2018-05-14T10:14:51,946][INFO ][o.e.x.m.j.p.DataCountsReporter] [node1] [ga3-count-200-15m] 800000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-05-14T10:14:52,463][ERROR][o.e.x.m.j.p.a.NativeAutodetectProcess] [ga3-count-302-15m] autodetect process stopped unexpectedly: Fatal error: 'si_signo 11, si_code: 1, si_errno: 0, address: 0x7f39a97c3eaa, library: /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, base: 0x7f39a948e000, normalized address: 0x335eaa'

[2018-05-14T10:14:52,465][INFO ][o.e.x.m.j.p.a.NativeAutodetectProcess] [ga3-count-302-15m] State output finished
[2018-05-14T10:14:52,482][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/24993] [CDetachedProcessSpawner.cc@184] Child process with PID 6086 was terminated by signal 11
[2018-05-14T10:14:52,529][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] Successfully set job state to [failed] for job [ga3-count-302-15m]
[2018-05-14T10:14:52,763][ERROR][o.e.x.m.j.p.a.AutodetectCommunicator] [ga3-count-302-15m] Unexpected exception writing to process
org.elasticsearch.ElasticsearchException: [ga3-count-302-15m] Unexpected death of autodetect: Fatal error: 'si_signo 11, si_code: 1, si_errno: 0, address: 0x7f39a97c3eaa, library: /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, base: 0x7f39a948e000, normalized address: 0x335eaa'

        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator.checkProcessIsAlive(AutodetectCommunicator.java:307) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator.waitFlushToCompletion(AutodetectCommunicator.java:282) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator.lambda$flushJob$4(AutodetectCommunicator.java:241) ~[?:?]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectCommunicator$1.doRun(AutodetectCommunicator.java:363) ~[?:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectProcessManager$AutodetectWorkerExecutorService.start(AutodetectProcessManager.java:678) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-05-14T10:14:52,767][ERROR][o.e.x.m.j.p.a.AutodetectProcessManager] [node1] [ga3-count-302-15m] exception while flushing job
[2018-05-14T10:14:52,772][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] attempt to stop datafeed [datafeed-ga3-count-302-15m] for job [ga3-count-302-15m]
[2018-05-14T10:14:52,780][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] try lock [20s] to stop datafeed [datafeed-ga3-count-302-15m] for job [ga3-count-302-15m]...
[2018-05-14T10:14:52,780][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] stopping datafeed [datafeed-ga3-count-302-15m] for job [ga3-count-302-15m], acquired [true]...
[2018-05-14T10:14:52,781][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] datafeed [datafeed-ga3-count-302-15m] for job [ga3-count-302-15m] has been stopped
[2018-05-14T10:14:54,614][WARN ][o.e.x.m.j.p.a.o.AutoDetectResultProcessor] [ga3-count-302-15m] some results not processed due to the termination of autodetect
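The fatal error line reports the faulting address and the load base of libMlMaths.so. The "normalized address" is their difference (0x7f39a97c3eaa - 0x7f39a948e000 = 0x335eaa), i.e. the offset of the crash inside the library. On Linux, signal 11 is SIGSEGV and si_code 1 for SIGSEGV is SEGV_MAPERR (address not mapped to object). The helper below is a hypothetical sketch, not part of Elasticsearch, that parses the message and checks that arithmetic.

```python
import re
import signal

# Fatal error text copied from the autodetect log line above.
FATAL = (
    "si_signo 11, si_code: 1, si_errno: 0, address: 0x7f39a97c3eaa, "
    "library: /opt/elastic/7.0.0-20180511/elasticsearch-7.0.0-alpha1-SNAPSHOT/"
    "modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, "
    "base: 0x7f39a948e000, normalized address: 0x335eaa"
)

# Non-greedy .*? stops at the first "address:", which is the raw fault address.
PATTERN = re.compile(
    r"si_signo (?P<signo>\d+), si_code: (?P<code>\d+).*?"
    r"address: (?P<addr>0x[0-9a-f]+), library: (?P<lib>[^,]+), "
    r"base: (?P<base>0x[0-9a-f]+), normalized address: (?P<norm>0x[0-9a-f]+)"
)

# Linux si_code values for SIGSEGV.
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped to object)",
    2: "SEGV_ACCERR (invalid permissions for mapped object)",
}


def decode(text):
    """Parse the fatal error text and recompute the in-library offset."""
    m = PATTERN.search(text)
    if m is None:
        raise ValueError("unrecognised fatal error text")
    signo = int(m["signo"])
    code = int(m["code"])
    offset = int(m["addr"], 16) - int(m["base"], 16)
    return {
        "signal": signal.Signals(signo).name,
        "si_code": SEGV_CODES.get(code, str(code)) if signo == signal.SIGSEGV else str(code),
        "library": m["lib"],
        "offset": offset,
        "matches_reported": offset == int(m["norm"], 16),
    }


if __name__ == "__main__":
    info = decode(FATAL)
    for key, value in info.items():
        print(key, hex(value) if key == "offset" else value)
```

With a build of libMlMaths.so that still carries symbols, the offset can then be passed to binutils, e.g. `addr2line -f -C -e libMlMaths.so 0x335eaa`; the shipped library may be stripped, in which case this only helps with a matching debug-symbol build.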
