antononcube/WL-MonadicEventRecordsTransformations-paclet

In brief

A software monad for transforming event records with different variables and entities into (manageable) time series and sparse matrices.

Details

  • The primary goal of this Event Records Transformations Monad (ERTMon) is to convert heterogeneous event data into sparse matrices suitable for use in machine learning and statistical algorithms.

  • The produced sparse matrices have named rows and columns; see the paclet "AntonAntonov/SSparseMatrix" [AAp4].

  • The monad takes a computational specification dataset that prescribes how the records of each variable are aggregated and normalized.

  • The event data records are grouped by entity identifier and variable.

  • Groups corresponding to the same variable are used to make a sparse matrix corresponding to that variable.

  • In the obtained variable sparse matrices each row corresponds to an entity identifier.

  • The matrices are normalized according to the computational specification.

  • A normalization can be "global" across all entities for a given variable, or "local", separately computed for each matrix row.
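The difference between "global" and "local" normalization can be seen on a toy matrix. The sketch below is not the paclet's implementation: `mat`, `localNormalized`, and `globalNormalized` are hypothetical names, and it assumes that normalizing with "Mean" means dividing by the mean.

```mathematica
(* Toy variable matrix: rows are entities, columns are aggregation intervals *)
mat = {{1., 2., 3.}, {10., 20., 30.}};

(* "local" normalization (assumed): each row is divided by its own mean *)
localNormalized = Map[#/Mean[#] &, mat];

(* "global" normalization (assumed): the whole matrix is divided by the mean of all entries *)
globalNormalized = mat/Mean[Flatten[mat]];
```

After local normalization both rows become {0.5, 1., 1.5}, so the tenfold difference in scale between the two entities disappears; global normalization keeps that difference.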

Here is a flowchart encompassing most of the ERTMon workflows:

[Flowchart: ERTMon workflows]

Basic Examples

Prepare event data using weather data from meteorological stations close to certain major cities. The weather data is retrieved with the paclet's function WeatherEventRecords:

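(* load the paclet; assumes it is installed from the Wolfram Language Paclet Repository under this name *)
Needs["AntonAntonov`MonadicEventRecordsTransformations`"]
citiesSpec = {{"Miami", "USA"}, {"Jacksonville", "USA"}, {"Chicago", "USA"}, {"London", "UK"}, {"Melbourne", "Australia"}, {"Sydney", "Australia"}}; (* {city, country} pairs *)
dateRange = {{2024, 10, 1}, {2025, 10, 1}}; (* {start date, end date} *)
wProps = {"Temperature", "MaxTemperature", "Pressure", "Humidity", "WindSpeed"}; (* weather properties; each becomes an event-record variable *)
res = WeatherEventRecords[citiesSpec, dateRange, wProps, 0];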

Here we assign the obtained datasets to variables we use below:

eventRecords = res["eventRecords"];
entityAttributes = res["entityAttributes"];

Here are the summaries of the datasets eventRecords and entityAttributes:

ResourceFunction["RecordsSummary"][eventRecords]
ResourceFunction["RecordsSummary"][entityAttributes]

Here we take all temperature event records for those weather stations:

srecs = eventRecords[Select[#Variable == "Temperature"&]];

And here we plot the corresponding time series, obtained by grouping the records by station (entity IDs) and taking the columns "ObservationTime" and "Value":

grecs = Normal@GroupBy[srecs, #EntityID&][All, All, {"ObservationTime", "Value"}];
DateListPlot[grecs, ImageSize -> Large, PlotTheme -> "Detailed", AspectRatio -> 1/3, FrameLabel -> {"Time", "Temperature, °C"}]
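The same select-and-group step can be tried on a few hand-made records. This is a sketch with made-up data: `toyRecords`, `temps`, and `toyTS` are hypothetical names, only the columns used above are included (the real eventRecords may have more), and it uses plain Select, GroupBy, and TimeSeries on the normalized records instead of Dataset queries.

```mathematica
(* Toy event records with the columns used above; values are made up *)
toyRecords = Dataset[{
   <|"EntityID" -> "st1", "Variable" -> "Temperature", "ObservationTime" -> DateObject[{2024, 10, 1}], "Value" -> 25.|>,
   <|"EntityID" -> "st1", "Variable" -> "Temperature", "ObservationTime" -> DateObject[{2024, 10, 2}], "Value" -> 27.|>,
   <|"EntityID" -> "st2", "Variable" -> "Temperature", "ObservationTime" -> DateObject[{2024, 10, 1}], "Value" -> 12.|>,
   <|"EntityID" -> "st2", "Variable" -> "Temperature", "ObservationTime" -> DateObject[{2024, 10, 2}], "Value" -> 14.|>,
   <|"EntityID" -> "st2", "Variable" -> "Humidity", "ObservationTime" -> DateObject[{2024, 10, 1}], "Value" -> 80.|>}];

(* keep the temperature records, as a list of associations *)
temps = Select[Normal[toyRecords], #Variable == "Temperature" &];

(* group by entity ID, keep {time, value} pairs, and make one TimeSeries per entity *)
toyTS = TimeSeries /@ GroupBy[temps, #EntityID &, Map[{#ObservationTime, #Value} &]];

DateListPlot[Values[toyTS]]
```

The humidity record is dropped by the Select, so each entity ends up with a two-point temperature series.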

Here is a computational specification:

compSpec = Dataset[Association["Humidity.Mean" -> Association["Variable" -> "Humidity", "Explanation" -> "", 
    "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800, 
    "AggregationFunction" -> "Mean", "NormalizationScope" -> "Entity", 
    "NormalizationFunction" -> "Mean"], "Humidity.OutliersCount" -> 
   Association["Variable" -> "Humidity", "Explanation" -> "", "MaxHistoryLength" -> 5184000, 
    "AggregationIntervalLength" -> 172800, "AggregationFunction" -> "OutliersCount", 
    "NormalizationScope" -> "Variable", "NormalizationFunction" -> "None"], 
  "Humidity.Range" -> Association["Variable" -> "Humidity", "Explanation" -> "", 
    "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800, 
    "AggregationFunction" -> "Range", "NormalizationScope" -> "Country", 
    "NormalizationFunction" -> "Mean"], "MaxTemperature.Mean" -> 
   Association["Variable" -> "MaxTemperature", "Explanation" -> "", "MaxHistoryLength" -> 5184000, 
    "AggregationIntervalLength" -> 172800, "AggregationFunction" -> "Mean", 
    "NormalizationScope" -> "Entity", "NormalizationFunction" -> "Mean"], 
  "MaxTemperature.OutliersCount" -> Association["Variable" -> "MaxTemperature", 
    "Explanation" -> "", "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800, 
    "AggregationFunction" -> "OutliersCount", "NormalizationScope" -> "Variable", 
    "NormalizationFunction" -> "None"], "MaxTemperature.Range" -> 
   Association["Variable" -> "MaxTemperature", "Explanation" -> "", "MaxHistoryLength" -> 5184000, 
    "AggregationIntervalLength" -> 172800, "AggregationFunction" -> "Range", 
    "NormalizationScope" -> "Country", "NormalizationFunction" -> "Mean"], 
  "Pressure.Mean" -> Association["Variable" -> "Pressure", "Explanation" -> "", 
    "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800, 
    "AggregationFunction" -> "Mean", "NormalizationScope" -> "Entity", 
    "NormalizationFunction" -> "Mean"], "Pressure.OutliersCount" -> 
   Association["Variable" -> "Pressure", "Explanation" -> "", "MaxHistoryLength" -> 5184000, 
    "AggregationIntervalLength" -> 172800, "AggregationFunction" -> "OutliersCount", 
    "NormalizationScope" -> "Variable", "NormalizationFunction" -> "None"], 
  "Pressure.Range" -> Association["Variable" -> "Pressure", "Explanation" -> "", 
    "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800, 
    "AggregationFunction" -> "Range", "NormalizationScope" -> "Country", 
    "NormalizationFunction" -> "Mean"], "Temperature.Mean" -> 
   Association["Variable" -> "Temperature", "Explanation" -> "", "MaxHistoryLength" -> 5184000, 
    "AggregationIntervalLength" -> 172800, "AggregationFunction" -> "Mean", 
    "NormalizationScope" -> "Entity", "NormalizationFunction" -> "Mean"], 
  "Temperature.OutliersCount" -> Association["Variable" -> "Temperature", "Explanation" -> "", 
    "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800, 
    "AggregationFunction" -> "OutliersCount", "NormalizationScope" -> "Variable", 
    "NormalizationFunction" -> "None"], "Temperature.Range" -> 
   Association["Variable" -> "Temperature", "Explanation" -> "", "MaxHistoryLength" -> 5184000, 
    "AggregationIntervalLength" -> 172800, "AggregationFunction" -> "Range", 
    "NormalizationScope" -> "Country", "NormalizationFunction" -> "Mean"], 
  "WindSpeed.Mean" -> Association["Variable" -> "WindSpeed", "Explanation" -> "", 
    "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800, 
    "AggregationFunction" -> "Mean", "NormalizationScope" -> "Entity", 
    "NormalizationFunction" -> "Mean"], "WindSpeed.OutliersCount" -> 
   Association["Variable" -> "WindSpeed", "Explanation" -> "", "MaxHistoryLength" -> 5184000, 
    "AggregationIntervalLength" -> 172800, "AggregationFunction" -> "OutliersCount", 
    "NormalizationScope" -> "Variable", "NormalizationFunction" -> "None"], 
  "WindSpeed.Range" -> Association["Variable" -> "WindSpeed", "Explanation" -> "", 
    "MaxHistoryLength" -> 5184000, "AggregationIntervalLength" -> 172800, 
    "AggregationFunction" -> "Range", "NormalizationScope" -> "Country", 
    "NormalizationFunction" -> "Mean"]]];
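The specification above is regular: every variable gets the same three aggregations, 5184000 seconds is 60 days of history, and 172800 seconds is a 2-day aggregation interval. A sketch that generates an equivalent specification programmatically follows; `aggRules`, `vars`, and `compSpec2` are hypothetical names, and the normalization settings per aggregation function are read off the dataset above.

```mathematica
(* aggregation function -> {normalization scope, normalization function}, as in compSpec *)
aggRules = <|
   "Mean" -> {"Entity", "Mean"},
   "OutliersCount" -> {"Variable", "None"},
   "Range" -> {"Country", "Mean"}|>;

vars = {"Humidity", "MaxTemperature", "Pressure", "Temperature", "WindSpeed"};

(* one "Variable.AggregationFunction" entry per variable and aggregation function *)
compSpec2 = Dataset@Association@Flatten@Table[
     v <> "." <> agg -> <|
       "Variable" -> v, "Explanation" -> "",
       "MaxHistoryLength" -> 60*24*3600,         (* 60 days in seconds, i.e. 5184000 *)
       "AggregationIntervalLength" -> 2*24*3600, (* 2 days in seconds, i.e. 172800 *)
       "AggregationFunction" -> agg,
       "NormalizationScope" -> aggRules[agg][[1]],
       "NormalizationFunction" -> aggRules[agg][[2]]|>,
     {v, vars}, {agg, Keys[aggRules]}];
```

Comparing Normal[compSpec2] with Normal[compSpec] via SameQ is a quick way to check that the two specifications agree.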

Here is a monad pipeline that processes the event records into sparse matrices:

p =
	ERTMonUnit[]⟹
	(* load the data and the computation specification *)
	ERTMonSetEventRecords[eventRecords]⟹
	ERTMonSetEntityAttributes[entityAttributes]⟹
	ERTMonEchoDataSummary⟹
	ERTMonSetComputationSpecification[compSpec]⟹
	(* group the records by entity and variable; show distributions and outlier boundaries *)
	ERTMonGroupEntityVariableRecords⟹
	ERTMonComputeVariableStatistic[Histogram]⟹
	ERTMonEchoFunctionValue["Variable distributions:"]⟹
	ERTMonFindVariableOutlierBoundaries⟹
	ERTMonEchoFunctionValue["Outlier boundaries:"]⟹
	(* convert the groups to time series, aggregate them, and make the matrices *)
	ERTMonEntityVariableGroupsToTimeSeries["MaxTime"]⟹
	ERTMonAggregateTimeSeries⟹
	ERTMonMakeContingencyMatrices⟹
	ERTMonEchoFunctionValue["Contingency matrices:", MatrixPlot /@ #&]
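The aggregation step applies an "AggregationFunction" over windows of "AggregationIntervalLength" seconds. As a rough analogue (not the paclet's code, which may align windows differently), the built-in TimeSeriesAggregate can do the same on a toy series. Assumptions: "Range" is taken to mean maximum minus minimum, the outlier boundaries are fixed by hand instead of being found by ERTMonFindVariableOutlierBoundaries, and `outliersCount` is a hypothetical helper.

```mathematica
(* Toy daily temperature series (6 days) for a single entity *)
ts = TimeSeries[Transpose[{DateRange[{2024, 10, 1}, {2024, 10, 6}], {20., 21., 35., 22., 19., 20.}}]];

(* hypothetical outlier boundaries for the variable, set by hand *)
{lower, upper} = {10., 30.};
outliersCount[vals_List] := Count[vals, v_ /; v < lower || v > upper];

(* aggregate over 2-day windows (172800 seconds) with three aggregation functions *)
aggMean = TimeSeriesAggregate[ts, {2, "Day"}, Mean];
aggRange = TimeSeriesAggregate[ts, {2, "Day"}, Max[#] - Min[#] &];
aggOutliers = TimeSeriesAggregate[ts, {2, "Day"}, outliersCount];
```

Each aggregated series corresponds to one row (one entity) of a variable matrix, with one column per aggregation interval.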

Here is the expanded version of the summary box:

p


References

Paclets

[AAp1] Anton Antonov, DataReshapers, (2023), Wolfram Language Paclet Repository.

[AAp2] Anton Antonov, MonadMakers, (2023), Wolfram Language Paclet Repository.

[AAp3] Anton Antonov, OutlierIdentifiers, (2023), Wolfram Language Paclet Repository.

[AAp4] Anton Antonov, SSparseMatrix, (2023), Wolfram Language Paclet Repository.

Documents

[AA1] Anton Antonov, "Monad code generation and extension", (2017), MathematicaForPrediction at WordPress.

[AA2] Anton Antonov, "A monad for classification workflows", (2018), MathematicaForPrediction at WordPress.
