[ML] Autodetect process crashes when trying to forecast on job upgraded from 6.2.4 to 6.3.1 #135

Closed
@dolaru

Description

Found in 6.3.1. Can be reproduced in 6.3.0.

The issue can be reproduced by running a job using the high_count function on the dns_tunneling dataset with a 30m bucket span.

On a 6.3.1 ES instance that was upgraded from 6.2.4, requesting a forecast on the dns_tunneling job that was created in 6.2.4 crashes the autodetect process as soon as the forecast request is sent.

ES logs show:

[2018-06-29T16:08:06,742][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [ml-2] Opening job [dns_tunneling1_20180629-1542_624_ga]
[2018-06-29T16:08:06,764][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [ml-2] [dns_tunneling1_20180629-1542_624_ga] Loading model snapshot [1530283391] with latest_record_timestamp [2016-02-11T23:52:14.000Z], job latest_record_timestamp [2016-02-11T23:52:14.000Z]
[2018-06-29T16:08:06,773][INFO ][o.e.x.m.j.p.a.NativeAutodetectProcessFactory] Restoring quantiles for job 'dns_tunneling1_20180629-1542_624_ga'
[2018-06-29T16:08:07,032][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [dns_tunneling1_20180629-1542_624_ga] [autodetect/28425] [CResourceMonitor.cc@67] Setting model memory limit to 1024 MB
[2018-06-29T16:08:07,072][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [ml-2] Successfully set job state to [opened] for job [dns_tunneling1_20180629-1542_624_ga]
[2018-06-29T16:08:07,114][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [dns_tunneling1_20180629-1542_624_ga] [autodetect/28425] [CAnomalyJob.cc@849] Processing is already complete to time 1455233400
[2018-06-29T16:08:07,126][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [dns_tunneling1_20180629-1542_624_ga] [autodetect/28425] [CForecastRunner.cc@113] Start forecasting from 2016-02-11T23:30:00+0000 to 2016-02-25T23:30:00+0000
[2018-06-29T16:08:07,127][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [dns_tunneling1_20180629-1542_624_ga] [autodetect/28425] [CTrendComponent.cc@347] Failed calculating confidence interval: Error in function boost::math::normal_distribution<double>::normal_distribution: Scale parameter is 0, but must be > 0 !, variance = 0, confidence = 95
[2018-06-29T16:08:07,190][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [dns_tunneling1_20180629-1542_624_ga] [autodetect/28425] [CTrendComponent.cc@347] Failed calculating confidence interval: Error in function boost::math::normal_distribution<double>::normal_distribution: Scale parameter is 0, but must be > 0 !, variance = 0, confidence = 95 | repeated [671]
[2018-06-29T16:08:07,190][FATAL][o.e.x.m.j.p.l.CppLogMessageHandler] [dns_tunneling1_20180629-1542_624_ga] [autodetect/28425] [CStateMachine.cc@211] Invalid index '1'
[2018-06-29T16:08:08,044][INFO ][o.e.x.m.j.p.a.NativeAutodetectProcess] [dns_tunneling1_20180629-1542_624_ga] State output finished
[2018-06-29T16:08:08,052][ERROR][o.e.x.m.j.p.a.NativeAutodetectProcess] [dns_tunneling1_20180629-1542_624_ga] autodetect process stopped unexpectedly: Failed calculating confidence interval: Error in function boost::math::normal_distribution<double>::normal_distribution: Scale parameter is 0, but must be > 0 !, variance = 0, confidence = 95
Invalid index '1'
Fatal error: 'terminate called after throwing an instance of 'std::runtime_error''
Fatal error: '  what():  Ml Fatal Exception'
Fatal error: 'si_signo 6, si_code: -6, si_errno: 0, address: 0x7ff1c2ab0428, library: /lib/x86_64-linux-gnu/libc.so.6, base: 0x7ff1c2a7b000, normalized address: 0x35428'

[2018-06-29T16:08:08,052][WARN ][o.e.x.m.j.p.a.o.AutoDetectResultProcessor] [dns_tunneling1_20180629-1542_624_ga] some results not processed due to the termination of autodetect
[2018-06-29T16:08:08,087][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/28259] [CDetachedProcessSpawner.cc@184] Child process with PID 28425 was terminated by signal 6
[2018-06-29T16:08:08,094][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [ml-2] Successfully set job state to [failed] for job [dns_tunneling1_20180629-1542_624_ga]
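
For illustration only: the errors above show the confidence interval calculation being handed `variance = 0`, which a normal distribution rejects as a scale parameter. The Python sketch below is not the ml-cpp code and not the actual fix; the function name `confidence_interval` and the choice to return a degenerate interval for zero variance are assumptions made to show one way such a case could be guarded.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, variance, confidence=95.0):
    """Two-sided interval for a normal prediction (hypothetical helper).

    `confidence` is a percentage, matching the 'confidence = 95' in the logs.
    """
    if not 0.0 < confidence < 100.0:
        raise ValueError("confidence must be strictly between 0 and 100")
    if math.isnan(variance) or variance < 0.0:
        raise ValueError("variance must be a non-negative number")

    # Assumption: with zero variance there is no spread, so return a
    # degenerate interval instead of building a distribution with scale 0.
    if variance == 0.0:
        return (mean, mean)

    # Quantile of the standard normal for the upper tail of the interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 200.0)
    half_width = z * math.sqrt(variance)
    return (mean - half_width, mean + half_width)
```

With this guard, `confidence_interval(10.0, 0.0, 95)` returns the point interval `(10.0, 10.0)` rather than raising.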

Steps to reproduce

  1. Start a version 6.2.4 ES instance and run a job using the high_count function on the dns_tunneling dataset with a 30m bucket span.
  2. Upgrade the instance to version 6.3.1.
  3. Try to run a forecast on the dns_tunneling job that was created in version 6.2.4.
  4. Notice that the autodetect process has crashed and the job state is now failed.
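
Step 3 can also be driven from a script instead of the UI. The sketch below only builds the HTTP request for the 6.x forecast endpoint (`_xpack/ml/anomaly_detectors/<job_id>/_forecast`); the base URL `http://localhost:9200`, the helper name `build_forecast_request`, and the `14d` duration (chosen to match the two-week window in the log above) are assumptions, and the job must already be open.

```python
import json
import urllib.parse
import urllib.request


def build_forecast_request(base_url, job_id, duration="14d"):
    """Build (but do not send) a forecast request for an ML job.

    Hypothetical helper; base_url and duration are assumptions.
    """
    path = "/_xpack/ml/anomaly_detectors/{}/_forecast".format(
        urllib.parse.quote(job_id, safe="")
    )
    body = json.dumps({"duration": duration}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage against a running node (not executed here):
# req = build_forecast_request("http://localhost:9200",
#                              "dns_tunneling1_20180629-1542_624_ga")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Sending the request against the upgraded job should trigger the crash described above, after which the job state is reported as failed.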
