sphinx/source/io_schema/provdb_schema.rst
Lines changed: 36 additions & 48 deletions
@@ -186,50 +186,48 @@ Global database

 Below we describe the JSON schema for the **func_stats**, **counter_stats** and **ad_model** collections of the **global database** component of the provenance database.

+A common data structure, **RunStats**, is used extensively to represent statistics (mean, minimum/maximum, standard deviation, etc.) of some quantity. It has the following schema:
+
+|{
+|**'accumulate'**: *The sum of all values (same as mean \* count). In some cases this entry is not populated*,
+|**'count'**: *The number of values*,
+|**'kurtosis'**: *kurtosis of the distribution of values*,
+|**'maximum'**: *maximum value*,
+|**'mean'**: *average value*,
+|**'minimum'**: *minimum value*,
+|**'skewness'**: *skewness of the distribution of values*,
+|**'stddev'**: *standard deviation of the distribution of values*
+|}
+
+
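The relationships between the **RunStats** fields (for example, **accumulate** = **mean** × **count**) can be made concrete with a minimal single-pass accumulator that emits a RunStats-shaped dictionary. This is a hypothetical sketch, not the code that populates the provenance database. The class name `RunStatsSketch` is invented. The sketch also assumes a population (not sample) standard deviation and *excess* kurtosis (0 for a normal distribution); the schema does not specify either convention.

```python
import math
from typing import Dict

# Field names from the RunStats schema.
RUNSTATS_KEYS = ("accumulate", "count", "kurtosis", "maximum",
                 "mean", "minimum", "skewness", "stddev")


class RunStatsSketch:
    """Hypothetical single-pass accumulator producing a RunStats-shaped dict.

    Maintains the first four central moments with Terriberry's extension of
    Welford's online algorithm, so individual values need not be stored.
    """

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean
        self.m3 = 0.0  # sum of cubed deviations
        self.m4 = 0.0  # sum of fourth-power deviations
        self.minimum = math.inf
        self.maximum = -math.inf
        self.accumulate = 0.0  # running sum; equals mean * count

    def push(self, x: float) -> None:
        """Fold one value into the running statistics."""
        n1 = self.count
        self.count += 1
        n = self.count
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Update order matters: m4 uses the old m2/m3, m3 uses the old m2.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2 - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        self.accumulate += x

    def to_dict(self) -> Dict[str, float]:
        """Return the statistics under the RunStats field names."""
        n = self.count
        if n == 0:
            empty: Dict[str, float] = dict.fromkeys(RUNSTATS_KEYS, 0.0)
            empty["count"] = 0
            return empty
        # With zero spread (all values identical) skewness and kurtosis are
        # undefined; this sketch reports 0.0 for both in that case.
        has_spread = self.m2 > 0
        return {
            "accumulate": self.accumulate,
            "count": n,
            # Assumption: excess kurtosis (0 for a normal distribution).
            "kurtosis": n * self.m4 / (self.m2 * self.m2) - 3.0 if has_spread else 0.0,
            "maximum": self.maximum,
            "mean": self.mean,
            "minimum": self.minimum,
            "skewness": math.sqrt(n) * self.m3 / self.m2 ** 1.5 if has_spread else 0.0,
            # Assumption: population standard deviation (divide by n).
            "stddev": math.sqrt(self.m2 / n),
        }
```

Because the moments are updated incrementally, such an accumulator can be fed values as they stream in and serialized at any point. Merging statistics from several processes would require a separate combine step, which this sketch omits.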
 Function profile statistics schema
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-**func_stats** contains aggregated profile information for all functions. The JSON schema is as follows:
+**func_stats** contains aggregated profile and anomaly information for all functions. The JSON schema is as follows:

 |{
-|**'app'**: *program index*,
-|**'fid'**: *global function index*,
-|**'name'**: *function name*,
-|**'exclusive'**: *Statistics of runtime exclusive of children*
-|{
-|**'accumulate'**: *unused*,
-|**'count'**: *total function executions*,
-|**'kurtosis'**: *kurtosis of function exclusive time distribution*,
-|**'maximum'**: *maximum function exclusive time*,
-|**'mean'**: *average function exclusive time*,
-|**'minimum'**: *minimum function exclusive time*,
-|**'skewness'**: *skewness of function exclusive time distribution*,
-|**'stddev'**: *standard deviation of function exclusive time distribution*,
-|},
-|**'inclusive'**: *Statistics of runtime inclusive of children*
-|{
-|**'accumulate'**: *unused*,
-|**'count'**: *total function executions*,
-|**'kurtosis'**: *kurtosis of function inclusive time distribution*,
-|**'maximum'**: *maximum function inclusive time*,
-|**'mean'**: *average function inclusive time*,
-|**'minimum'**: *minimum function inclusive time*,
-|**'skewness'**: *skewness of function inclusive time distribution*,
-|**'stddev'**: *standard deviation of function inclusive time distribution*,
-|},
-|**'stats'**: *Statistics on function anomalies per timestep observed in run to-date*
-|{
-|**'accumulate'**: *total number of anomalies observed for this function*,
-|**'count'**: *number of timesteps data colected for*,
-|**'kurtosis'**: *kurtosis of distribution of anomalies/step*,
-|**'maximum'**: *maximum anomalies/step*,
-|**'mean'**: *average anomalies/step*,
-|**'minimum'**: *minimum anomalies/step*,
-|**'skewness'**: *skewness of distribution of anomalies/step*,
-|**'stddev'**: *standard deviation distribution of anomalies/step*,
-|}
+|**"__id"**: *record index*,
+|**"app"**: *application/program index*,
+|**"fid"**: *function index*,
+|**"fname"**: *function name*,
+|**"anomaly_metrics"**: *statistics on anomalies for this function (object). Note that this entry is null if no anomalies were detected*
+|{
+|**"anomaly_count"**: *statistics on the anomaly count for time steps in which anomalies were detected, as well as the total number of anomalies (RunStats)*,
+|**"first_io_step"**: *the first IO step in which an anomaly was detected*,
+|**"last_io_step"**: *the last IO step in which an anomaly was detected*,
+|**"max_timestamp"**: *the last anomaly's timestamp*,
+|**"min_timestamp"**: *the first anomaly's timestamp*,
+|**"score"**: *statistics on the scores for the anomalies (RunStats)*,
+|**"severity"**: *statistics on the severity of the anomalies (RunStats)*
+|},
+|**"runtime_profile"**: *statistics on function runtime, i.e., the function profile (object)*
+|{
+|**"exclusive_runtime"**: *statistics on the runtime excluding child function calls (RunStats)*,
+|**"inclusive_runtime"**: *statistics on the runtime including child function calls (RunStats)*
+|}
 |}

+
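The following sketch shows how a client might use the new **func_stats** layout. It ranks functions by their total anomaly count and treats a null **anomaly_metrics** entry as zero anomalies. The sketch assumes that the total is held in the **accumulate** field of **anomaly_count**. This is an inference from the field descriptions, not something the schema states. When **accumulate** is missing or null, the sketch falls back to **mean** × **count**. The function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# A func_stats record as decoded from JSON, following the schema above.
FuncStatsRecord = Dict[str, Any]


def total_anomalies(record: FuncStatsRecord) -> float:
    """Total number of anomalies recorded for one function.

    Returns 0 when 'anomaly_metrics' is null, i.e. no anomalies were detected.
    """
    metrics: Optional[Dict[str, Any]] = record.get("anomaly_metrics")
    if metrics is None:
        return 0.0
    counts = metrics["anomaly_count"]  # a RunStats object
    # Assumption: the total is the RunStats 'accumulate' field (sum of the
    # per-step counts). If it is missing or null, use mean * count instead.
    total = counts.get("accumulate")
    if total is None:
        total = counts["mean"] * counts["count"]
    return float(total)


def rank_by_anomalies(records: List[FuncStatsRecord]) -> List[Tuple[int, str, float]]:
    """Return (app, fname, total anomalies) tuples, most anomalous first."""
    ranked = [(r["app"], r["fname"], total_anomalies(r)) for r in records]
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked
```

The fallback relies only on the RunStats identity **accumulate** = **mean** × **count**. It therefore remains usable when the sum is not populated.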
 Counter statistics schema
 ^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -238,17 +236,7 @@ The **counter_stats** collection has the following schema:
 |{
 |**'app'**: *Program index*,
 |**'counter'**: *Counter description*,
-|**'stats'**: *Global aggregated statistics on counter values since start of run*,
-|{
-|**'accumulate'**: *Unused*,
-|**'count'**: *Number of times counter appeared*,
-|**'kurtosis'**: *kurtosis of distribution of value*,
-|**'maximum'**: *maximum value*,
-|**'mean'**: *average value*,
-|**'minimum'**: *minimum value*,
-|**'skewness'**: *skewness of distribution of values*,
-|**'stddev'**: *standard deviation of distribution of values*
-|}
+|**'stats'**: *Global aggregated statistics on counter values since the start of the run (RunStats)*
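Because **stats** is now described by reference to **RunStats**, a client can check **counter_stats** records against that single definition. The minimal validator below is a hypothetical sketch; the function name and error strings are invented. It treats **accumulate** as optional, since the RunStats schema notes that it is not always populated. It also applies simple consistency checks that follow from the field definitions.

```python
from typing import Any, Dict, List

# RunStats field names from the schema; 'accumulate' is treated as optional
# because the RunStats description notes it is not always populated.
REQUIRED_RUNSTATS_KEYS = {"count", "kurtosis", "maximum", "mean",
                          "minimum", "skewness", "stddev"}


def check_counter_stats(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means none were found."""
    problems: List[str] = []
    for key in ("app", "counter", "stats"):
        if key not in record:
            problems.append(f"missing field '{key}'")

    stats = record.get("stats")
    if not isinstance(stats, dict):
        if "stats" in record:
            problems.append("'stats' is not an object")
        return problems

    for key in sorted(REQUIRED_RUNSTATS_KEYS - set(stats)):
        problems.append(f"'stats' is missing '{key}'")

    # Consistency checks implied by the field definitions; only meaningful
    # once all required fields are present and at least one value was seen.
    if not problems and stats["count"] > 0:
        if not stats["minimum"] <= stats["mean"] <= stats["maximum"]:
            problems.append("'stats' mean lies outside [minimum, maximum]")
        if stats["stddev"] < 0:
            problems.append("'stats' stddev is negative")
    return problems
```

Returning a list of messages instead of raising on the first error lets a caller report every problem in a record at once.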