A software monad for transformation of event records with different variables and entities into (manageable) time series and sparse matrices.
- The primary goal of this Event Records Transformations Monad (ERTMon) is to convert heterogeneous event data into sparse matrices suitable for use in machine learning and statistical algorithms.
- The produced sparse matrices have named rows and columns; see "AntonAntonov/SSparseMatrix", [AAp4].
- The monad takes a computational specification dataset.
- The event data records are grouped by entity identifier and variable.
- Groups corresponding to the same variable are used to make a sparse matrix corresponding to that variable.
- In the obtained variable sparse matrices each row corresponds to an entity identifier.
- The matrices are normalized according to the computational specification.
- A normalization can be "global", computed across all entities for a given variable, or "local", computed separately for each matrix row.
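The grouping and normalization steps listed above can be sketched with built-in Wolfram Language functions only. This is a minimal illustration and not ERTMon's implementation: the toy records, the use of integer observation times directly as matrix columns, and the choice to normalize by dividing by a mean are all assumptions made for the example.

```mathematica
(* Toy event records (hypothetical data); the keys mirror the column names used in this document. *)
records = {
   <|"EntityID" -> "A", "Variable" -> "Temperature", "ObservationTime" -> 0, "Value" -> 10.|>,
   <|"EntityID" -> "A", "Variable" -> "Temperature", "ObservationTime" -> 1, "Value" -> 30.|>,
   <|"EntityID" -> "B", "Variable" -> "Temperature", "ObservationTime" -> 0, "Value" -> 20.|>,
   <|"EntityID" -> "B", "Variable" -> "Temperature", "ObservationTime" -> 1, "Value" -> 60.|>,
   <|"EntityID" -> "A", "Variable" -> "Humidity", "ObservationTime" -> 0, "Value" -> 0.5|>};

(* Keep the records of one variable; each variable gets its own matrix. *)
tempRecs = Select[records, #Variable == "Temperature" &];

(* Rows are entity identifiers, columns are observation times (an assumption of this sketch). *)
rowNames = Union[tempRecs[[All, "EntityID"]]];
colNames = Union[tempRecs[[All, "ObservationTime"]]];
rowIndex = AssociationThread[rowNames -> Range[Length[rowNames]]];
colIndex = AssociationThread[colNames -> Range[Length[colNames]]];

(* Build the sparse matrix from {row, column} -> value rules. *)
mat = SparseArray[
   ({rowIndex[#EntityID], colIndex[#ObservationTime]} -> #Value) & /@ tempRecs,
   {Length[rowNames], Length[colNames]}];

dense = Normal[mat];

(* "Local" normalization: each row is divided by its own mean. *)
localNorm = dense/(Mean /@ dense);

(* "Global" normalization: every entry is divided by the mean over all entities. *)
globalNorm = dense/Mean[Flatten[dense]];
```

In this toy case local normalization makes the two rows identical, {0.5, 1.5} each, while global normalization preserves the difference in magnitude between the two entities.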
Here is a flowchart encompassing most of the ERTMon workflows:
First, we prepare event data using weather data from meteorological stations close to certain major cities. The weather data is retrieved with the paclet's function WeatherEventRecords:
citiesSpec = {{"Miami", "USA"}, {"Jacksonville", "USA"}, {"Chicago", "USA"}, {"London", "UK"}, {"Melbourne", "Australia"}, {"Sydney", "Australia"}};
dateRange = {{2024, 10, 1}, {2025, 10, 1}};
wProps = {"Temperature", "MaxTemperature", "Pressure", "Humidity", "WindSpeed"};
res = WeatherEventRecords[citiesSpec, dateRange, wProps, 0];
Here we assign the obtained datasets to variables we use below:
eventRecords = res["eventRecords"];
entityAttributes = res["entityAttributes"];
Here are the summaries of the datasets eventRecords and entityAttributes:
ResourceFunction["RecordsSummary"][eventRecords]
ResourceFunction["RecordsSummary"][entityAttributes]
Here we take all temperature event records for those weather stations:
srecs = eventRecords[Select[#Variable == "Temperature"&]];
And here we plot the corresponding time series, obtained by grouping the records by station (entity ID) and taking the columns "ObservationTime" and "Value":
grecs = Normal@GroupBy[srecs, #EntityID&][All, All, {"ObservationTime", "Value"}];
DateListPlot[grecs, ImageSize -> Large, PlotTheme -> "Detailed", AspectRatio -> 1/3, FrameLabel -> {"Time", "Temperature, °C"}]
Here is a computational specification:
compSpec = Dataset[Association["Humidity.Mean" -> Association["Variable" -> "Humidity", "Explanation" -> "",
"MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800,
"AggregationFunction" -> "Mean", "NormalizationScope" -> "Entity",
"NormalizationFunction" -> "Mean"], "Humidity.OutliersCount" ->
Association["Variable" -> "Humidity", "Explanation" -> "", "MaxHistoryLength" -> 5184000,
"AggregationIntervalLength" -> 172800, "AggregationFunction" -> "OutliersCount",
"NormalizationScope" -> "Variable", "NormalizationFunction" -> "None"],
"Humidity.Range" -> Association["Variable" -> "Humidity", "Explanation" -> "",
"MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800,
"AggregationFunction" -> "Range", "NormalizationScope" -> "Country",
"NormalizationFunction" -> "Mean"], "MaxTemperature.Mean" ->
Association["Variable" -> "MaxTemperature", "Explanation" -> "", "MaxHistoryLength" -> 5184000,
"AggregationIntervalLength" -> 172800, "AggregationFunction" -> "Mean",
"NormalizationScope" -> "Entity", "NormalizationFunction" -> "Mean"],
"MaxTemperature.OutliersCount" -> Association["Variable" -> "MaxTemperature",
"Explanation" -> "", "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800,
"AggregationFunction" -> "OutliersCount", "NormalizationScope" -> "Variable",
"NormalizationFunction" -> "None"], "MaxTemperature.Range" ->
Association["Variable" -> "MaxTemperature", "Explanation" -> "", "MaxHistoryLength" -> 5184000,
"AggregationIntervalLength" -> 172800, "AggregationFunction" -> "Range",
"NormalizationScope" -> "Country", "NormalizationFunction" -> "Mean"],
"Pressure.Mean" -> Association["Variable" -> "Pressure", "Explanation" -> "",
"MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800,
"AggregationFunction" -> "Mean", "NormalizationScope" -> "Entity",
"NormalizationFunction" -> "Mean"], "Pressure.OutliersCount" ->
Association["Variable" -> "Pressure", "Explanation" -> "", "MaxHistoryLength" -> 5184000,
"AggregationIntervalLength" -> 172800, "AggregationFunction" -> "OutliersCount",
"NormalizationScope" -> "Variable", "NormalizationFunction" -> "None"],
"Pressure.Range" -> Association["Variable" -> "Pressure", "Explanation" -> "",
"MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800,
"AggregationFunction" -> "Range", "NormalizationScope" -> "Country",
"NormalizationFunction" -> "Mean"], "Temperature.Mean" ->
Association["Variable" -> "Temperature", "Explanation" -> "", "MaxHistoryLength" -> 5184000,
"AggregationIntervalLength" -> 172800, "AggregationFunction" -> "Mean",
"NormalizationScope" -> "Entity", "NormalizationFunction" -> "Mean"],
"Temperature.OutliersCount" -> Association["Variable" -> "Temperature", "Explanation" -> "",
"MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800,
"AggregationFunction" -> "OutliersCount", "NormalizationScope" -> "Variable",
"NormalizationFunction" -> "None"], "Temperature.Range" ->
Association["Variable" -> "Temperature", "Explanation" -> "", "MaxHistoryLength" -> 5184000,
"AggregationIntervalLength" -> 172800, "AggregationFunction" -> "Range",
"NormalizationScope" -> "Country", "NormalizationFunction" -> "Mean"],
"WindSpeed.Mean" -> Association["Variable" -> "WindSpeed", "Explanation" -> "",
"MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800,
"AggregationFunction" -> "Mean", "NormalizationScope" -> "Entity",
"NormalizationFunction" -> "Mean"], "WindSpeed.OutliersCount" ->
Association["Variable" -> "WindSpeed", "Explanation" -> "", "MaxHistoryLength" -> 5184000,
"AggregationIntervalLength" -> 172800, "AggregationFunction" -> "OutliersCount",
"NormalizationScope" -> "Variable", "NormalizationFunction" -> "None"],
"WindSpeed.Range" -> Association["Variable" -> "WindSpeed", "Explanation" -> "",
"MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800,
"AggregationFunction" -> "Range", "NormalizationScope" -> "Country",
"NormalizationFunction" -> "Mean"]]];
Here is a monad pipeline that processes the event records into sparse matrices:
p =
ERTMonUnit[]⟹
ERTMonSetEventRecords[eventRecords]⟹
ERTMonSetEntityAttributes[entityAttributes]⟹
ERTMonEchoDataSummary⟹
ERTMonSetComputationSpecification[compSpec]⟹
ERTMonGroupEntityVariableRecords⟹
ERTMonComputeVariableStatistic[Histogram]⟹
ERTMonEchoFunctionValue["Variable distributions:"]⟹
ERTMonFindVariableOutlierBoundaries⟹
ERTMonEchoFunctionValue["Outlier boundaries:"]⟹
ERTMonEntityVariableGroupsToTimeSeries["MaxTime"]⟹
ERTMonAggregateTimeSeries⟹
ERTMonMakeContingencyMatrices⟹
ERTMonEchoFunctionValue["Contingency matrices:", MatrixPlot /@ #&]
Here is the expanded version of the summary box:
p

[AAp1] Anton Antonov, DataReshapers, (2023), Wolfram Language Paclet Repository.
[AAp2] Anton Antonov, MonadMakers, (2023), Wolfram Language Paclet Repository.
[AAp3] Anton Antonov, OutlierIdentifiers, (2023), Wolfram Language Paclet Repository.
[AAp4] Anton Antonov, SSparseMatrix, (2023), Wolfram Language Paclet Repository.
[AA1] Anton Antonov, "Monad code generation and extension", (2017), MathematicaForPrediction at WordPress.
[AA2] Anton Antonov, "A monad for classification workflows", (2018), MathematicaForPrediction at WordPress.





