layout | title | description | parent | redirect_from |
---|---|---|---|---|
default | Anomaly score checks | Anomaly score checks use a machine learning algorithm to automatically detect anomalies in your time-series data. | Soda CL | /soda-cloud/anomaly-detection.html |
Last modified on {% last_modified_at %}
Use an anomaly score check to automatically discover anomalies in your time-series data.
Requires Soda Cloud and Soda Core Scientific.
checks for dim_customer:
- anomaly score for row_count < default
About anomaly score checks
Prerequisites
Install Soda Core Scientific
Define an anomaly score check
Anomaly score check results
Optional check configurations
List of comparison symbols and phrases
Troubleshoot Soda Core Scientific installation
Go further
The anomaly score check is powered by a machine learning algorithm that works with measured values for a metric that occur over time. The algorithm learns the patterns of your data – its trends and seasonality – to identify and flag anomalies in time-series data.
If you have connected Soda Core to a Soda Cloud account, Soda Core pushes check results to your cloud account, where Soda Cloud stores all the previously measured, historic values for your checks in the Cloud Metric Store. SodaCL can then use these stored values to establish a baseline of normal metric values against which to evaluate future metric values and identify anomalies. Therefore, you must have created and [connected a Soda Cloud account]({% link soda-core/connect-core-to-cloud.md %}) to use anomaly score checks.
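To make the idea of a baseline concrete, here is a minimal sketch in Python. It is not the algorithm that Soda Core Scientific uses, which accounts for trends and seasonality; it swaps in a plain mean and standard deviation band. The function name `is_anomalous`, the tolerance of three standard deviations, and the sample row counts are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, new_value, tolerance=3.0):
    """Return True if new_value lies outside mean +/- tolerance * stdev of history.

    Illustrative only: a simple band, not Soda's trend- and seasonality-aware model.
    """
    if len(history) < 4:
        # Too little history to form a baseline (four is the minimum this page mentions).
        raise ValueError("need at least 4 historic measurements")
    center = mean(history)
    spread = stdev(history)
    return abs(new_value - center) > tolerance * spread


# Hypothetical row counts previously stored for a dataset, one per day.
row_counts = [1000, 1010, 990, 1005, 995]

print(is_anomalous(row_counts, 1002))  # close to the historic values
print(is_anomalous(row_counts, 5000))  # far from the historic values
```

The point of the sketch is only that a new measurement is judged against stored history; without that history there is nothing to compare against.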
- You have a Soda Cloud account and have [connected Soda Core to Soda Cloud]({% link soda-core/connect-core-to-cloud.md %}).
- You have installed Soda Core Scientific in the same directory or virtual environment in which you [installed Soda Core]({% link soda-core/installation.md %}).
To use an anomaly score check, you must install Soda Core Scientific in the same directory or virtual environment in which you installed Soda Core. As a best practice, install Soda Core and Soda Core Scientific in a virtual environment to avoid library conflicts, but you can install Soda Core Scientific locally if you prefer.
{% include install-soda-core-scientific.md %}
Refer to Troubleshoot Soda Core Scientific installation for help with issues during installation.
The following example demonstrates how to use the anomaly score for the `row_count` metric in a check. You can use any [numeric]({% link soda-cl/numeric-metrics.md %}), [missing]({% link soda-cl/missing-metrics.md %}), or [validity]({% link soda-cl/validity-metrics.md %}) metric in lieu of `row_count`.
checks for dim_customer:
- anomaly score for row_count < default
- Currently, you can only use `< default` to define the threshold in an anomaly score check.
- By default, anomaly score checks yield warn check results, not fails.
You can use any [numeric]({% link soda-cl/numeric-metrics.md %}), [missing]({% link soda-cl/missing-metrics.md %}), or [validity]({% link soda-cl/validity-metrics.md %}) metric in anomaly score checks. The following example detects anomalies for the average of `order_price` in an `orders` dataset.
checks for orders:
- anomaly score for avg(order_price) < default
The following example detects anomalies for the count of missing values in the `id` column.
checks for orders:
- anomaly score for missing_count(id) < default:
    missing_values: [None, No Value]
Because the anomaly score check requires at least four data points before it can start detecting what counts as an anomalous measurement, your first few scans will yield a check result that indicates that Soda does not have enough data.
Soda Core 3.0.0xx
Anomaly Detection Frequency Warning: Coerced into daily dataset with last daily time point kept
Data frame must have at least 4 measurements
Skipping anomaly metric check eval because there is not enough historic data yet
Scan summary:
1/1 check NOT EVALUATED:
dim_customer in adventureworks
anomaly score for missing_count(last_name) < default [NOT EVALUATED]
check_value: None
1 checks not evaluated.
Apart from the checks that have not been evaluated, no failures, no warnings and no errors.
Sending results to Soda Cloud
Though your first instinct may be to run several scans in a row to produce the four measurements that the anomaly score needs, the measurements do not "count" if they occur at irregular intervals, because they do not represent a stable enough frequency.
If, for example, you attempt to run eight back-to-back scans in five minutes, the anomaly score does not register the measurements resulting from those scans as a reliable pattern against which to evaluate an anomaly.
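The log message "Coerced into daily dataset with last daily time point kept" suggests why back-to-back scans do not help. The following is a minimal sketch of one plausible reading of that message, not Soda's actual implementation: it keeps only the last measurement per calendar day and then applies the four-measurement minimum. The function names and the sample timestamps are hypothetical.

```python
from datetime import datetime

# Taken from the log line "Data frame must have at least 4 measurements".
MIN_MEASUREMENTS = 4


def coerce_to_daily(measurements):
    """Keep only the last measurement for each calendar day.

    measurements: iterable of (datetime, value) pairs, in any order.
    Returns a list of (date, value) pairs sorted by date.
    """
    last_per_day = {}
    for timestamp, value in sorted(measurements, key=lambda pair: pair[0]):
        # Later measurements on the same day overwrite earlier ones.
        last_per_day[timestamp.date()] = value
    return sorted(last_per_day.items())


def has_enough_history(measurements):
    """Return True if at least MIN_MEASUREMENTS daily points remain."""
    return len(coerce_to_daily(measurements)) >= MIN_MEASUREMENTS


# Eight scans within a few minutes on the same day.
burst = [(datetime(2023, 1, 1, 9, minute), 100 + minute) for minute in range(8)]
# One scan per day for four days.
daily = [(datetime(2023, 1, day, 9, 0), 100 + day) for day in range(1, 5)]

print(len(coerce_to_daily(burst)), has_enough_history(burst))
print(len(coerce_to_daily(daily)), has_enough_history(daily))
```

Under this reading, the burst of eight scans collapses into a single daily point, while four scans spread over four days supply the minimum history.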
Consider using the Soda Core Python library to set up a [programmatic scan]({% link soda-core/programmatic.md %}) that produces a check result for an anomaly score check on a regular schedule.
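As a rough sketch of such a programmatic scan, the following Python builds the SodaCL for one anomaly score check and runs it through the `Scan` class of the Soda Core Python library. The helper names `build_anomaly_checks` and `run_anomaly_scan`, the file name `configuration.yml`, and the scan definition name `daily_anomaly_scan` are assumptions for illustration; verify the `Scan` method calls against the version of Soda Core you have installed. Scheduling is left to an external tool, such as cron, that calls the script once per day.

```python
def build_anomaly_checks(dataset, metric="row_count"):
    """Return SodaCL text for a single anomaly score check on a dataset."""
    return f"checks for {dataset}:\n  - anomaly score for {metric} < default\n"


def run_anomaly_scan(data_source, dataset, config_path="configuration.yml"):
    """Run one scan and return its exit code; intended to be called on a schedule."""
    # Imported lazily so the helper above works without Soda Core installed.
    from soda.scan import Scan

    scan = Scan()
    scan.set_data_source_name(data_source)
    # Assumed file that holds the data source and Soda Cloud connection details.
    scan.add_configuration_yaml_file(file_path=config_path)
    scan.add_sodacl_yaml_str(build_anomaly_checks(dataset))
    # Hypothetical name; keeping it the same across runs groups results together.
    scan.set_scan_definition_name("daily_anomaly_scan")
    exit_code = scan.execute()
    print(scan.get_logs_text())
    return exit_code


# Show the SodaCL that a scheduled run would submit.
print(build_anomaly_checks("dim_customer"))
```

A scheduled job might then call `run_anomaly_scan("adventureworks", "dim_customer")` once a day, so that measurements accumulate at a stable frequency.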
| Supported | Configuration | Documentation |
|---|---|---|
| ✓ | Define a name for an anomaly score check. | - |
| ✓ | Add an identity to a check. | [Add a check identity]({% link soda-cl/optional-config.md %}#add-a-check-identity) |
|   | Define alert configurations to specify warn and fail thresholds. | - |
|   | Apply an in-check filter to return results for a specific portion of the data in your dataset. | - |
| ✓ | Use quotes when identifying dataset names; see example. Note that the type of quotes you use must match that which your data source uses. For example, BigQuery uses a backtick ({% raw %}`{% endraw %}) as a quotation mark. | [Use quotes in a check]({% link soda-cl/optional-config.md %}#use-quotes-in-a-check) |
|   | Use wildcard characters ({% raw %} % {% endraw %} or {% raw %} * {% endraw %}) in values in the check. | - |
| ✓ | Use for each to apply anomaly score checks to multiple datasets in one scan; see example. | [Apply checks to multiple datasets]({% link soda-cl/optional-config.md %}#apply-checks-to-multiple-datasets) |
|   | Apply a dataset filter to partition data during a scan; see example. | [Scan a portion of your dataset]({% link soda-cl/optional-config.md %}#scan-a-portion-of-your-dataset) |
checks for "dim_customer":
- anomaly score for row_count < default
for each dataset T:
  datasets:
  - dim_customer
  checks:
  - anomaly score for row_count < default
<
While installing Soda Core Scientific works on Linux, you may encounter issues if you install Soda Core Scientific on Mac OS (particularly, machines with the M1 ARM-based processor) or any other operating system. If that is the case, consider using one of the following alternative installation procedures.
- Install Soda Core locally
- Troubleshoot Soda Core Scientific installation in a virtual env
- Use Docker to run Soda Core
Need help? Ask the team in the Soda community on Slack.
{% include install-soda-core-scientific.md %}
{% include docker-soda-core.md %}
{% include troubleshoot-anomaly-check-tbb.md %}
- Need help? Join the Soda community on Slack.
- Reference [tips and best practices for SodaCL]({% link soda/quick-start-sodacl.md %}#tips-and-best-practices-for-sodacl).
{% include docs-footer.md %}