
[DOCS] Add documentation for new Analysis tab in logs app #49165

Closed · wants to merge 5 commits
104 changes: 104 additions & 0 deletions docs/logs/analysis-tab.asciidoc
@@ -0,0 +1,104 @@
[role="xpack"]
[[xpack-logs-analysis-page]]
== Detecting and inspecting log anomalies

beta::[]

If the {ml} {anomaly-detect} features are enabled, you can use the *Analysis* page in the Logs app to automatically detect some kinds of log anomalies.
The analysis automatically highlights periods of time where the log rate is outside the expected bounds and therefore may be anomalous.
Contributor: periods of what? Time?

What makes them expected limits? Are they specified somewhere?

@Kerry350 (Contributor, Nov 15, 2019): Yep, this is periods of time.

They are expected limits based on the model defined by the machine learning module and the "learning" it has done on the datasets to date. These values therefore always differ between datasets: a rate of 10 might be anomalous in one dataset but not in another. The ML model adapts over time as it learns from more data.

It may be better to use the word "bounds" here rather than "limits", as that's the ML terminology.
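The bounds idea described above can be sketched in a few lines. This is a deliberately simplified illustration with hypothetical numbers: the real model learns per-dataset bounds that evolve over time, and it can flag a value inside the bounds for other reasons.

```python
# Simplified sketch of flagging a log rate against learned bounds.
# Hypothetical values for illustration only; the real ML model is
# probabilistic and per-dataset, not a fixed threshold check.

def outside_bounds(rate, lower, upper):
    """Return True if a log rate falls outside the learned bounds."""
    return rate < lower or rate > upper

# Bounds "learned" for one hypothetical dataset: 10..50 events/minute.
lower, upper = 10, 50

rates = [30, 5, 120]
flags = [outside_bounds(r, lower, upper) for r in rates]
# 30 sits within the bounds; 5 and 120 fall outside and would be
# candidates for anomaly scoring.
```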

This helps you to spot suspicious behavior without significant human intervention.
Contributor: I really like this bit. I would move it to the top.

Contributor: "significant human intervention" is interesting. Maybe unpack that some more?

Contributor: Ideally, this would stop users having to manually sample their log data, calculate the rates, and decide whether those rates are "normal".

You can use this information as a basis for further investigations.
Contributor: further investigations into what?

@Kerry350 (Contributor, Nov 15, 2019): This could be various things:

  • A spike in the log rate could denote a DDoS attack. This may lead to investigating things like the IP addresses of incoming requests.
  • A significant drop in the log rate could suggest that some piece of infrastructure has stopped responding, and thus we're serving fewer requests.

These are just examples; mileage will vary between datasets and anomalies.

Also want to clarify that while the trained backing model has a lower and an upper bound for what it considers "normal" and non-anomalous, anomalous values won't always fall outside these bounds. The model could have an upper bound of 50 and a lower bound of 10, and a rate of 30 could still, in the right circumstances, be flagged as anomalous if something else about the rate is considered anomalous.


On the *Analysis* page, you can inspect the anomalies and the log partitions in which they occurred.
Contributor: This seems like the overall idea of this UI. I would move this to the top.

You can also view the anomalies directly in the Machine Learning app to get a greater understanding of the issues.
Contributor: issues or anomalies?

How do they get to the Machine Learning app?

Is there a link you could include to this page in the Machine Learning docs?


[role="screenshot"]
image::logs/images/analysis-tab.png[Analysis tab in Logs app in Kibana]

[float]
[[logs-analysis-page-create-ml-job]]
=== Create a machine learning job for logs analysis
Logs anomaly detection is carried out within a {kibana-ref}/xpack-spaces.html[space].
Within a space, the first time you select *Analysis* from the Logs app, you are prompted to create a machine learning job to carry out the logs analysis.
Contributor: At this point, they already know what a space is. How about:

"To enable log analysis and anomaly detection, you must create your own {kibana-ref}/xpack-spaces.html[space]."


First, you need to choose the time range for the analysis.
Contributor: These sound like tasks. How about creating a task for this content. For example:

  1. Select the time range for the analysis. (How do they do this? Do they click something? Enter something?) By default, logs from four weeks prior to the current date are analyzed. As logs are ingested, they are analyzed. You cannot change the analysis time range after the Machine Learning job is created.
  2. Click Create ML job.
  3. Use the generated logs to detect anomalies.

From the screenshot, I don't see Create ML job. Is this on a different UI?

By default, the analysis uses logs from between four weeks ago and the current date, then continues to add new logs to the analysis as they are ingested. You cannot change the time range for the analysis after the machine learning job has been created.
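The documented four-week default can be expressed as a small calculation. This is a sketch of the stated default only; the app computes its own start time internally.

```python
from datetime import datetime, timedelta, timezone

# Sketch of the documented default: the analysis starts from logs
# four weeks before the current date and follows new logs from there.
now = datetime.now(timezone.utc)
default_start = now - timedelta(weeks=4)
```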

Once you have selected the time range, click *Create ML job* to create the machine learning job.
Now you can start detecting anomalies in your logs.

[float]
[[logs-analysis-page-view-log-entries]]
=== View log entries

Once the machine learning job has been created, the *Analysis* page shows:

* the log entries chart
* an overall anomalies chart
* the anomalies in each partition.

The time range over which the logs are analyzed is fixed at the time range you selected when you created the machine learning job.
Contributor: Since they have just created the machine learning job, is this piece necessary? Shouldn't they already know this?

But you can use the time filter at the top of the *Analysis* page to restrict the time range for which the results are shown.

[float]
[[logs-analysis-page-change-time]]
=== Changing the time range

Use the time filter to select the time range for the results shown in the anomaly charts.

To quickly select some popular time range options, click the clock dropdown image:logs/images/time-filter-clock.png[]. In this popup you can choose from:

* *Quick select* to choose a recent time range, and use the back and forward arrows to move through the time ranges
* *Commonly used* to choose a time range from some commonly used options such as *Last 15 minutes*, *Today*, or *Week to date*
* *Refresh every* to specify an auto-refresh rate
* *Stop* to stop auto-refresh (enabled by default for logs anomaly charts)

NOTE: When you stop auto-refresh from within this dialog, the clock dropdown changes to a calendar image:logs/images/time-filter-calendar.png[].

For complete control over the start and end times, click the start time or end time shown in the bar beside the calendar or clock dropdown. In this popup, you can choose from the *Absolute*, *Relative* or *Now* tabs, then specify the required options.

[float]
[[logs-analysis-page-log-entries-chart]]
=== Log entries chart
Contributor: Since View log entries and Log entries chart include the same content, combine them and remove lines 40 and 41.


[role="screenshot"]
image::logs/images/analysis-tab-log-entries.png[Analysis tab log entries]

The log entries chart shows an overall visualization of the log entry rate, partitioned and color-coded according to the value of the {ecs-ref}/ecs-event.html[ECS `event.dataset`] field.
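The partitioned rate behind a chart like this corresponds, roughly, to bucketing log entries over time per `event.dataset` value. As a hedged illustration, the shape of such an Elasticsearch aggregation might look like the following; the interval and structure are assumptions for the sketch, not the app's actual query.

```python
# Rough sketch of an Elasticsearch query body that yields a log rate
# partitioned by `event.dataset`: a terms aggregation on the dataset
# field, with a date histogram nested inside each partition.
# The 15m interval is an assumed value for illustration.
log_rate_query = {
    "size": 0,  # we only want aggregation buckets, not documents
    "aggs": {
        "by_dataset": {
            "terms": {"field": "event.dataset"},
            "aggs": {
                "over_time": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "fixed_interval": "15m",
                    }
                }
            },
        }
    },
}
```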

You can hover over a time period to see the log rate for each of the partitions for that period.
Contributor: partition or entry?


You can click a partition name on the right hand side to show or hide the values for that partition, or hover over a partition name to highlight just the values for that partition in the chart.
Contributor: The content in Log entries chart and Overall anomalies chart is the basis for this page. It should go before Create a machine learning job for log analysis and View log entries. It would be beneficial if it went directly beneath the top screenshot.


[float]
[[logs-analysis-page-anomalies-chart]]
=== Anomalies chart

[role="screenshot"]
image::logs/images/analysis-tab-anomalies.png[Analysis tab anomalies]

The Anomalies chart shows the areas where anomalies were detected in the overall log entry rate across all log partitions. The underlying rate values are shown in grey, and the anomalous regions are color-coded and superimposed on top.

Where a time period is flagged as anomalous, it means that the machine learning algorithms detected something unusual about the log rate in that time period. This may be because the log rate was significantly higher than usual, or significantly lower than usual, or some other anomalous behavior was detected.

The level of anomaly detected in a time period is color-coded from red through orange to yellow and blue, where red indicates a critical anomaly level, and blue is a warning level.

You can hover over an underlying log rate value to see the average log rate for that time period, or hover over an anomalous region to see the partitions that had anomalies in that time period, and their anomaly scores. Anomaly scores range from 0 (no anomalies) to 100 (critical).
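The score-to-color mapping described above can be sketched as a simple banding function. The 25/50/75 cut-offs follow the usual Elastic ML severity convention and are an assumption here, not something stated on this page.

```python
# Sketch of mapping an anomaly score (0-100) to the severity bands
# described above. The threshold values are assumed from the common
# ML severity convention, not taken from the Logs app itself.
def severity(score):
    if score >= 75:
        return "critical"  # shown in red
    if score >= 50:
        return "major"     # shown in orange
    if score >= 25:
        return "minor"     # shown in yellow
    return "warning"       # shown in blue
```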

You can also click *Analyze in ML* to open the Anomaly Explorer in Machine Learning and {kibana-ref}/xpack-ml.html[analyze the anomalies in more detail].

[float]
[[logs-analysis-tab-partition-anomaly-chart]]
=== Partition anomaly charts

[role="screenshot"]
image::logs/images/analysis-tab-partition-anomalies.png[Analysis tab partition anomalies]

You can also view the anomaly chart for an individual partition.
Below the main anomalies chart, click the dropdown beside a partition name to see the anomaly distribution for only that partition.
In this example, we are viewing the anomaly chart for the `elasticsearch.server` partition.

You can hover over an underlying log rate value to see the average log rate for that partition in that time period, or hover over an anomalous region to see the anomaly score for that partition in that time period.

You can also click *Analyze in ML* to open the Anomaly Explorer in Machine Learning and {kibana-ref}/xpack-ml.html[analyze the anomalies in this partition in more detail].


Binary file added docs/logs/images/analysis-tab-anomalies.png
Binary file added docs/logs/images/analysis-tab-log-entries.png
Binary file added docs/logs/images/analysis-tab.png
Binary file added docs/logs/images/time-filter-calendar.png
Binary file added docs/logs/images/time-filter-clock.png
2 changes: 2 additions & 0 deletions docs/logs/index.asciidoc
@@ -27,3 +27,5 @@ include::getting-started.asciidoc[]
include::using.asciidoc[]

include::configuring.asciidoc[]

include::analysis-tab.asciidoc[]
10 changes: 10 additions & 0 deletions docs/logs/using.asciidoc
@@ -78,6 +78,16 @@ This opens the *Log event document details* fly-out that shows the fields associ
To quickly filter the logs stream by one of the field values, in the log event details, click the *View event with filter* icon image:logs/images/logs-view-event-with-filter.png[View event icon] beside the field.
This automatically adds a search filter to the logs stream to filter the entries by this field and value.

[float]
[[view-log-anomalies]]
=== View log anomalies

If the {ml} {anomaly-detect} features are enabled, you can click *Analysis* to <<xpack-logs-analysis-page, use machine learning to detect and inspect anomalies>> in your log data.

[float]
[[using-logs-other-actions]]
=== Other actions
Contributor: Is there something more descriptive we can use here besides Other actions?


To see other actions related to the event, in the log event details, click *Actions*.
Depending on the event and the features you have installed and configured, you may also be able to:
