# Known-issues.md
List of known issues in the release:

| Issue | Description | Affected versions | Workaround |
|-------|-------------|-------------------|------------|
|Spark History Server UI - StdOut and StdErr logs | The drilldown links to StdOut and StdErr logs from the Stages or Executors tabs are broken. |AE 1.2 <br/> Created before end November 2019| SSH to the cluster and run `yarn logs --applicationId <appId>`|
| Broken rolling restart of worker daemons | The rolling restart of the worker components for the HDFS, Yarn, HBase, Ambari Metrics and Spark2 services is broken, resulting in an HTTP 403 error. | All versions | For now, a workaround is to restart the respective service as a whole from the service actions menu by selecting `Restart All`.|
| Livy REST API | If you use the Livy REST API `GET /batches/{batchId}/log`, the logs are available only for a few hours after submitting the job. If you try to retrieve the logs many hours after submitting the job, the following error is displayed: `HTTP ERROR 500 : Problem accessing /gateway/default/livy/v1/batches. Reason: Server Error`| AE 1.1 ||
| Adding additional libraries | Livy or IBM Cloud CLI commands fail after installing additional packages, for example PyMySQL, manually on all nodes of a cluster because the package can't be found. | AE 1.1 clusters created before June 2019 | Navigate to **Ambari UI > Spark2 > Configs > Custom spark2-defaults > Add Property** and enter the following two lines: <br/> <br/> `spark.yarn.appMasterEnv.PYSPARK3_PYTHON=/home/common/conda/anaconda3/bin/python` <br/> <br/> `spark.yarn.appMasterEnv.PYSPARK_PYTHON=/home/common/conda/anaconda2/bin/python` <br/> <br/> Restart the Spark service when prompted. These properties force the use of Anaconda Python instead of the system Python. |
|| If you install a newer version of a Python package than is already on the cluster and try to use it, the following error is displayed: `ImportError: No module named.`| AE 1.1 clusters created before June 2019 |Force the path to take the latest version by entering the following in your Python script: <br/> For Anaconda3: `import sys; sys.path.insert(0, '/home/wce/clsadmin/pipAnaconda3Packages')` <br/> <br/> For Anaconda2: `import sys; sys.path.insert(0, '/home/wce/clsadmin/pipAnaconda2Packages')`|
| Disk space issues | HDFS audit logs are not rotated and so fill up the disk space, which disrupts the normal functioning of a cluster. | AE 1.2 clusters created before 10 September 2020 | Navigate to **Ambari UI > HDFS > Config > Advanced** and search in the filter for `hdfs-log4j`. At the bottom of the text box, add the following two lines, which cap the audit log file size and the number of rotated backup files that are kept: <br/><br/>`log4j.appender.DRFAAUDIT.MaxFileSize={{hadoop_log_max_backup_size}}MB`<br/><br/>`log4j.appender.DRFAAUDIT.MaxBackupIndex={{hadoop_log_number_of_backup_files}}`|
# Time series library: lazy evaluation

Lazy evaluation is an evaluation strategy that delays the evaluation of an expression until its value is needed. When combined with memoization, lazy evaluation avoids repeated evaluations and can reduce the running time of certain functions by a significant factor.
The time series library uses lazy evaluation to process data. Notionally, an execution graph is constructed on the time series data, and its evaluation is triggered only when its output is materialized. Suppose an object is moving in a one-dimensional space and its location is captured by `x(t)`. You can determine the harsh acceleration/braking `h(t)` of this object by using its velocity (`v(t)`) and acceleration (`a(t)`) time series as follows:
```
# 1d location timeseries
x(t) = input location timeseries

# velocity - first derivative of x(t)
v(t) = x(t) - x(t-1)

# acceleration - second derivative of x(t)
a(t) = v(t) - v(t-1)

# harsh acceleration/braking using thresholds on acceleration
h(t) = +1 if a(t) > threshold_acceleration
     = -1 if a(t) < threshold_deceleration
     =  0 otherwise
```
This results in a simple execution graph of the form:
```
x(t) --> v(t) --> a(t) --> h(t)
```
Evaluations are triggered only when an action is performed, such as `compute h(5...10)`, i.e. `compute h(5), ..., h(10)`. The library captures narrow temporal dependencies between time series. In this example, `h(5...10)` requires `a(5...10)`, which in turn requires `v(4...10)`, which then requires `x(3...10)`. Only the relevant portions of `a(t)`, `v(t)` and `x(t)` are evaluated.
Furthermore, evaluations are memoized and can thus be reused in subsequent actions on `h`. For example, when a request for `h(7...12)` follows a request for `h(5...10)`, the memoized values `h(7...10)` would be leveraged; further, `h(11...12)` would be evaluated using `a(11...12), v(10...12)` and `x(9...12)`, which would in turn leverage `v(10)` and `x(9...10)` memoized from the prior computation.
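One way to see this reuse in plain Python is to memoize each series with `functools.lru_cache` and count how often the input series is evaluated. This is a minimal sketch, not the library's API; the sample data and thresholds are illustrative assumptions:

```python
from functools import lru_cache

x_evaluations = {"count": 0}

@lru_cache(maxsize=None)
def x(t):
    # Stand-in for the input location timeseries (assumed sample data).
    x_evaluations["count"] += 1
    return t * t

@lru_cache(maxsize=None)
def v(t):
    # velocity - first derivative of x(t)
    return x(t) - x(t - 1)

@lru_cache(maxsize=None)
def a(t):
    # acceleration - second derivative of x(t)
    return v(t) - v(t - 1)

@lru_cache(maxsize=None)
def h(t):
    # harsh acceleration/braking; thresholds of +/-2 are arbitrary
    return 1 if a(t) > 2 else (-1 if a(t) < -2 else 0)

for t in range(5, 11):         # h(5...10) pulls in x(3...10)
    h(t)
print(x_evaluations["count"])  # 8

for t in range(7, 13):         # h(7...12) adds only x(11...12)
    h(t)
print(x_evaluations["count"])  # 10
```

Because every level is memoized, the second request recomputes nothing from the first range; only the two new input points are evaluated.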
In a more general example, you could define a smoothed velocity timeseries, for instance as a moving average of `v(t)` over a trailing window:
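The exact form here is an assumption for illustration; one plausible definition with a window of 3 is:

```
# smoothed velocity - moving average of v(t) over a window of 3
vs(t) = ( v(t-2) + v(t-1) + v(t) ) / 3

# acceleration recomputed on the smoothed velocity
a(t) = vs(t) - vs(t-1)
```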
In this example, `h(l...u)` has the following temporal dependency, and evaluation of `h(l...u)` strictly adheres to it, with memoization.
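Assuming the window of 3 sketched above, the chain of narrow dependencies would be:

```
h(l...u) --> a(l...u) --> vs(l-1...u) --> v(l-3...u) --> x(l-4...u)
```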
The following example shows a Python code snippet that implements harsh acceleration on a simple in-memory time series. The library includes several built-in transforms. In this example, the difference transform is applied twice to the location time series to compute the acceleration time series. A map operation is applied to the acceleration time series using a harsh lambda function, which is defined after the code sample, and maps acceleration to `+1` (harsh acceleration), `-1` (harsh braking), or `0` (otherwise). The filter operation selects only instances wherein either harsh acceleration or harsh braking is observed. Prior to calling `get_values`, an execution graph is created, but no computations are performed. On calling `get_values(5, 10)`, the evaluation is performed with memoization on the narrowest possible temporal dependency in the execution graph.
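As a self-contained stand-in for that snippet, the toy sketch below mimics the described flow with an in-memory lazy, memoized series. The class and method names (`LazySeries`, `difference`, `map`, `filter`, `get_values`) and the sample data are illustrative assumptions, not the library's actual API:

```python
from functools import lru_cache


class LazySeries:
    """A toy lazily evaluated, memoized time series."""

    def __init__(self, value_fn, pred=None):
        # Memoize per-timestamp evaluations so later requests reuse them.
        self._value = lru_cache(maxsize=None)(value_fn)
        self._pred = pred

    def difference(self):
        # First-order difference transform: d(t) = s(t) - s(t-1).
        return LazySeries(lambda t: self._value(t) - self._value(t - 1))

    def map(self, fn):
        # Pointwise transform; still lazy, nothing is computed yet.
        return LazySeries(lambda t: fn(self._value(t)))

    def filter(self, pred):
        # Keep only timestamps whose value satisfies pred.
        return LazySeries(self._value, pred)

    def get_values(self, start, end):
        # The only eager operation: materialize the requested range.
        values = [(t, self._value(t)) for t in range(start, end + 1)]
        if self._pred is not None:
            values = [(t, val) for t, val in values if self._pred(val)]
        return values


# Sample 1d locations x(0)...x(10); only x(3)...x(10) will be evaluated.
positions = [0, 1, 3, 6, 10, 13, 15, 16, 20, 27, 35]
location = LazySeries(lambda t: positions[t])

# Acceleration: the difference transform applied twice.
acceleration = location.difference().difference()

# The "harsh" lambda: +1 (harsh acceleration), -1 (harsh braking), 0 otherwise.
harsh = acceleration.map(lambda acc: 1 if acc > 2 else (-1 if acc < -2 else 0))

# Keep only harsh events; evaluation happens only on get_values.
print(harsh.filter(lambda val: val != 0).get_values(5, 10))
# [(8, 1), (9, 1)]
```

Note how nothing is computed while `difference`, `map`, and `filter` build up the graph; the `lru_cache` wrappers provide the memoization that later, overlapping `get_values` requests would reuse.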
# parquet-encryption.md
To enable Parquet encryption in {{site.data.keyword.iae_full_notm}}, set the following Spark classpath properties to point to the Parquet jar files that implement Parquet modular encryption, and to the key management jar file:
1. Add the following two parameters to point explicitly to the location of the JAR files, as sketched below.
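A minimal sketch of these entries, assuming the standard Spark classpath properties; the JAR paths and file names are placeholders, so adjust them to the actual files and versions on your cluster:

```
spark.driver.extraClassPath=/path/to/parquet-hadoop-<version>.jar:/path/to/key-management-<version>.jar
spark.executor.extraClassPath=/path/to/parquet-hadoop-<version>.jar:/path/to/key-management-<version>.jar
```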
Alternatively, you can get the JAR files applied as part of the cluster creation process. See [Advanced Provisioning](/docs/AnalyticsEngine?topic=AnalyticsEngine-advanced-provisioning-options){: external}.
# release-notes.md
Use these notes to learn about the latest features, additions and changes to {{site.data.keyword.iae_full_notm}}.
{: shortdesc}
## {{site.data.keyword.iae_full_notm}} information
### 10 September 2020
- **[AE-1.2.v28.4]** The following security patch was applied to the underlying VMs for the IBM Java CVE: [CVE-2019-17639](https://www.ibm.com/blogs/psirt/security-bulletin-multiple-vulnerabilities-may-affect-ibm-sdk-java-technology-edition-3/)
- You can now configure log aggregation for the HDFS component. See [Configuring log aggregation](/docs/AnalyticsEngine?topic=AnalyticsEngine-log-aggregation).
- A fix was added that prevents HDFS audit logs from filling up disk space; the logs previously grew unbounded because of a misconfigured log4j rotation property, which disrupted normal cluster operation.
- You can now use the time series library in your Spark applications, which provides a rich time series data model, imputation functions, and functions for transforming, reducing, segmenting, joining, and forecasting time series. SQL extensions to time series are also provided. See [Time series library](/docs/AnalyticsEngine?topic=AnalyticsEngine-time-series).
### 20 August 2020
- As of this deployment, the Analytics Engine cluster build version is available in `/home/common/aeversion.txt` on the nodes of the cluster. You can check this file after you SSH to the cluster. This helps in tracking the fixes that were made available for a particular version of Analytics Engine. For example, this deployment's version is AE-1.2.v28.3.
- **[AE-1.2.v28.3]** Fixes were applied for the following security vulnerabilities in the OS-level packages of cluster nodes. Apart from these fixes, a few configuration vulnerabilities at the OS level were also addressed.