# Known-issues.md
List of known issues in the release:

| Issue | Description | Affected versions | Workaround |
|-------|-------------|-------------------|------------|
|Spark History Server UI - StdOut and StdErr logs | The drilldown links to StdOut and StdErr logs from the Stages or Executors tabs are broken. |AE 1.2 <br/> Created before end November 2019| SSH to the cluster and run `yarn logs --applicationId <appId>`|
| Broken rolling restart of worker daemons | The rolling restart of the worker components for the HDFS, Yarn, HBase, Ambari Metrics and Spark2 services is broken, resulting in an HTTP 403 error. | All versions | For now, a workaround is to restart the respective service as a whole from the service actions menu by selecting `Restart All`.|
| Livy REST API | If you use the Livy REST API `GET /batches/{batchId}/log`, the logs are available only for a few hours after submitting the job. If you try to retrieve the logs many hours after submitting the job, the following error is displayed: `HTTP ERROR 500 : Problem accessing /gateway/default/livy/v1/batches. Reason: Server Error`| AE 1.1 ||
| Adding additional libraries | Livy or IBM Cloud CLI commands fail after installing additional packages, for example PyMySQL, manually on all nodes of a cluster because the package can't be found. | AE 1.1 clusters created before June 2019 | Navigate to **Ambari UI > Spark2 > Configs > Custom spark2-defaults > Add Property** and enter the following two lines: <br/> <br/> `spark.yarn.appMasterEnv.PYSPARK3_PYTHON=/home/common/conda/anaconda3/bin/python` <br/> <br/> `spark.yarn.appMasterEnv.PYSPARK_PYTHON=/home/common/conda/anaconda2/bin/python` <br/> <br/> Restart the Spark service when prompted. These properties force the use of Anaconda Python instead of the system Python. |
|| If you install a newer version of a Python package than is already on the cluster and try to use it, the following error is displayed: `ImportError: No module named.`| AE 1.1 clusters created before June 2019 |Force the path to take the latest version by entering the following in your Python script: <br/> For Anaconda3: `import sys; sys.path.insert(0, '/home/wce/clsadmin/pipAnaconda3Packages')` <br/> <br/> For Anaconda2: `import sys; sys.path.insert(0, '/home/wce/clsadmin/pipAnaconda2Packages')`|
| Disk space issues | HDFS audit logs are not rotated and so fill up the disk space, which disrupts the normal functioning of a cluster. | AE 1.2 clusters created before 10 September 2020 | Navigate to **Ambari UI > HDFS > Config > Advanced** and search in the filter for `hdfs-log4j`. At the bottom of the text box, add the following two lines, which cap the audit log file size and the number of rotated backup files that are kept: <br/><br/>`log4j.appender.DRFAAUDIT.MaxFileSize={{hadoop_log_max_backup_size}}MB`<br/><br/>`log4j.appender.DRFAAUDIT.MaxBackupIndex={{hadoop_log_number_of_backup_files}}`|
# Time series library: lazy evaluation

Lazy evaluation is an evaluation strategy that delays the evaluation of an expression until its value is needed. When combined with memoization, lazy evaluation avoids repeated evaluations and can reduce the running time of certain functions by a significant factor.
The time series library uses lazy evaluation to process data. Notionally, an execution graph is constructed on the time series data, and its evaluation is triggered only when its output is materialized. Suppose an object is moving in a one-dimensional space and its location is captured by `x(t)`. You can determine the harsh acceleration/braking `h(t)` of this object by using its velocity (`v(t)`) and acceleration (`a(t)`) time series as follows:
```
# 1d location timeseries
x(t) = input location timeseries

# velocity - first derivative of x(t)
v(t) = x(t) - x(t-1)

# acceleration - second derivative of x(t)
a(t) = v(t) - v(t-1)

# harsh acceleration/braking using thresholds on acceleration
h(t) = +1 if a(t) > threshold_acceleration
     = -1 if a(t) < threshold_deceleration
     =  0 otherwise
```
This results in a simple execution graph of the form:
```
x(t) --> v(t) --> a(t) --> h(t)
```
Evaluations are triggered only when an action is performed, such as `compute h(5...10)`, i.e. `compute h(5), ..., h(10)`. The library captures narrow temporal dependencies between time series. In this example, `h(5...10)` requires `a(5...10)`, which in turn requires `v(4...10)`, which then requires `x(3...10)`. Only the relevant portions of `a(t)`, `v(t)` and `x(t)` are evaluated.
Furthermore, evaluations are memoized and can thus be reused in subsequent actions on `h`. For example, when a request for `h(7...12)` follows a request for `h(5...10)`, the memoized values `h(7...10)` would be leveraged; further, `h(11...12)` would be evaluated using `a(11...12), v(10...12)` and `x(9...12)`, which would in turn leverage `v(10)` and `x(9...10)` memoized from the prior computation.
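One way to see this reuse in plain Python is to memoize each series with `functools.lru_cache` and count how often the input series is evaluated. This is a minimal sketch, not the library's API; the sample data and thresholds are illustrative assumptions:

```python
from functools import lru_cache

x_evaluations = {"count": 0}

@lru_cache(maxsize=None)
def x(t):
    # Stand-in for the input location timeseries (assumed sample data).
    x_evaluations["count"] += 1
    return t * t

@lru_cache(maxsize=None)
def v(t):
    # velocity - first derivative of x(t)
    return x(t) - x(t - 1)

@lru_cache(maxsize=None)
def a(t):
    # acceleration - second derivative of x(t)
    return v(t) - v(t - 1)

@lru_cache(maxsize=None)
def h(t):
    # harsh acceleration/braking; thresholds of +/-2 are arbitrary
    return 1 if a(t) > 2 else (-1 if a(t) < -2 else 0)

for t in range(5, 11):         # h(5...10) pulls in x(3...10)
    h(t)
print(x_evaluations["count"])  # 8

for t in range(7, 13):         # h(7...12) adds only x(11...12)
    h(t)
print(x_evaluations["count"])  # 10
```

Because every level is memoized, the second request recomputes nothing from the first range; only the two new input points are evaluated.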
In a more general example, you could define a smoothed velocity timeseries, for instance as a moving average of `v(t)` over a trailing window:
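The exact form here is an assumption for illustration; one plausible definition with a window of 3 is:

```
# smoothed velocity - moving average of v(t) over a window of 3
vs(t) = ( v(t-2) + v(t-1) + v(t) ) / 3

# acceleration recomputed on the smoothed velocity
a(t) = vs(t) - vs(t-1)
```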
In this example, `h(l...u)` has the following temporal dependency, and evaluation of `h(l...u)` strictly adheres to it, with memoization.
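Assuming the window of 3 sketched above, the chain of narrow dependencies would be:

```
h(l...u) --> a(l...u) --> vs(l-1...u) --> v(l-3...u) --> x(l-4...u)
```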
The following example shows a Python code snippet that implements harsh acceleration on a simple in-memory time series. The library includes several built-in transforms. In this example, the difference transform is applied twice to the location time series to compute the acceleration time series. A map operation is applied to the acceleration time series using a harsh lambda function, which is defined after the code sample, and maps acceleration to `+1` (harsh acceleration), `-1` (harsh braking), or `0` (otherwise). The filter operation selects only instances wherein either harsh acceleration or harsh braking is observed. Prior to calling `get_values`, an execution graph is created, but no computations are performed. On calling `get_values(5, 10)`, the evaluation is performed with memoization on the narrowest possible temporal dependency in the execution graph.
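As a self-contained stand-in for that snippet, the toy sketch below mimics the described flow with an in-memory lazy, memoized series. The class and method names (`LazySeries`, `difference`, `map`, `filter`, `get_values`) and the sample data are illustrative assumptions, not the library's actual API:

```python
from functools import lru_cache


class LazySeries:
    """A toy lazily evaluated, memoized time series."""

    def __init__(self, value_fn, pred=None):
        # Memoize per-timestamp evaluations so later requests reuse them.
        self._value = lru_cache(maxsize=None)(value_fn)
        self._pred = pred

    def difference(self):
        # First-order difference transform: d(t) = s(t) - s(t-1).
        return LazySeries(lambda t: self._value(t) - self._value(t - 1))

    def map(self, fn):
        # Pointwise transform; still lazy, nothing is computed yet.
        return LazySeries(lambda t: fn(self._value(t)))

    def filter(self, pred):
        # Keep only timestamps whose value satisfies pred.
        return LazySeries(self._value, pred)

    def get_values(self, start, end):
        # The only eager operation: materialize the requested range.
        values = [(t, self._value(t)) for t in range(start, end + 1)]
        if self._pred is not None:
            values = [(t, val) for t, val in values if self._pred(val)]
        return values


# Sample 1d locations x(0)...x(10); only x(3)...x(10) will be evaluated.
positions = [0, 1, 3, 6, 10, 13, 15, 16, 20, 27, 35]
location = LazySeries(lambda t: positions[t])

# Acceleration: the difference transform applied twice.
acceleration = location.difference().difference()

# The "harsh" lambda: +1 (harsh acceleration), -1 (harsh braking), 0 otherwise.
harsh = acceleration.map(lambda acc: 1 if acc > 2 else (-1 if acc < -2 else 0))

# Keep only harsh events; evaluation happens only on get_values.
print(harsh.filter(lambda val: val != 0).get_values(5, 10))
# [(8, 1), (9, 1)]
```

Note how nothing is computed while `difference`, `map`, and `filter` build up the graph; the `lru_cache` wrappers provide the memoization that later, overlapping `get_values` requests would reuse.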
# parquet-encryption.md
To enable Parquet encryption in {{site.data.keyword.iae_full_notm}}, set the following Spark classpath properties to point to the Parquet jar files that implement Parquet modular encryption, and to the key management jar file:
1. Add the following two parameters to point explicitly to the location of the JAR files, as sketched below.
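A minimal sketch of these entries, assuming the standard Spark classpath properties; the JAR paths and file names are placeholders, so adjust them to the actual files and versions on your cluster:

```
spark.driver.extraClassPath=/path/to/parquet-hadoop-<version>.jar:/path/to/key-management-<version>.jar
spark.executor.extraClassPath=/path/to/parquet-hadoop-<version>.jar:/path/to/key-management-<version>.jar
```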
Alternatively, you can get the JAR files applied as part of the cluster creation process. See [Advanced Provisioning](/docs/AnalyticsEngine?topic=AnalyticsEngine-advanced-provisioning-options){: external}.
# release-notes.md
Use these notes to learn about the latest features, additions and changes to {{site.data.keyword.iae_full_notm}}.
{: shortdesc}
## {{site.data.keyword.iae_full_notm}} information
### 10 September 2020
- **[AE-1.2.v28.4]** The following security patch was applied to the underlying VMs for the IBM Java CVE: [CVE-2019-17639](https://www.ibm.com/blogs/psirt/security-bulletin-multiple-vulnerabilities-may-affect-ibm-sdk-java-technology-edition-3/)
- You can now configure log aggregation for the HDFS component. See [Configuring log aggregation](/docs/AnalyticsEngine?topic=AnalyticsEngine-log-aggregation).
- A fix was added that prevents HDFS audit logs from filling up disk space; the logs previously grew unbounded because of a misconfigured log4j rotation property, which disrupted normal cluster operation.
- You can now use the time series library in your Spark applications, which provides a rich time series data model, imputation functions, and functions for transforming, reducing, segmenting, joining, and forecasting time series. SQL extensions to time series are also provided. See [Time series library](/docs/AnalyticsEngine?topic=AnalyticsEngine-time-series).
### 20 August 2020
- As of this deployment, the Analytics Engine cluster build version is available in `/home/common/aeversion.txt` on the nodes of the cluster. You can check this file after you SSH to the cluster. This helps in tracking the fixes that were made available for a particular version of Analytics Engine. For example, this deployment's version is AE-1.2.v28.3.
- **[AE-1.2.v28.3]** Fixes were applied for the following security vulnerabilities in the OS-level packages of cluster nodes. Apart from these fixes, a few configuration vulnerabilities at the OS level were also addressed.