
SolarNet aggregation

Matt Magoffin edited this page Aug 13, 2025 · 11 revisions

SolarNet provides several API methods that return aggregated results derived from the raw datum posted by SolarNodes over fixed time windows, such as days and months. This guide describes how SolarNet performs that aggregation.

Instantaneous vs. accumulating vs. status properties

SolarNode datum are collected as general datum sample objects, which have their properties grouped into these classifications:

Classification | Description | Example
instantaneous | An absolute measurement of some number. | A reading from a speedometer in a car, for example 50 km/h.
accumulating | A relative measurement of some number that accumulates value over time. | A reading from an odometer in a car, for example 28,189 km.
status | Some value, not necessarily a number. | A warning message from a car, for example Check engine: E290.

SolarNet treats these classifications in the following ways when aggregating datum into time windows:

Classification | Aggregation
instantaneous | average of samples within time window
accumulating | sum of differences over time, projected across time window boundaries
status | most frequently seen value

Time windows are equal in length to the aggregation level being queried, and include everything on and after the start date up to, but not including, the end date. You can think of that logic as StartDate <= window < EndDate, or equivalently (window >= StartDate) AND (window < EndDate).
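As a rough illustration of the window rules above, here is a minimal Python sketch, not SolarNet's implementation. It truncates a timestamp to the start of its hour or day window, applies the half-open StartDate <= t < EndDate test, and aggregates instantaneous values by mean and status values by most frequent value. The function names are hypothetical, and breaking ties between equally frequent status values in favor of the first one seen is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean


def window_start(ts: datetime, size: timedelta) -> datetime:
    """Truncate ts to the start of its fixed-size window (hour or day sizes only)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Whole number of windows elapsed since midnight, times the window size
    return midnight + ((ts - midnight) // size) * size


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """Half-open membership test: StartDate <= ts < EndDate."""
    return start <= ts < end


def aggregate_instantaneous(values):
    """Instantaneous properties: average of the samples in the window."""
    return mean(values)


def aggregate_status(values):
    """Status properties: most frequently seen value (assumed: ties go to the first seen)."""
    return Counter(values).most_common(1)[0][0]


# A sample at exactly 11:00 belongs to the 11:00 window, not the 10:00 one
print(in_window(datetime(2000, 1, 1, 11), datetime(2000, 1, 1, 10), datetime(2000, 1, 1, 11)))  # False
```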

List aggregation

The /datum/list endpoint allocates accumulating data using time projection across time window boundaries, so that only the amount within the window is allocated to the final aggregate value. For example, imagine calculating the accumulated aggregate for a time window starting at 10:00 given two readings like this:

Date | Energy (Wh)
2000-01-01 09:59 | 200
2000-01-01 10:02 | 500

SolarNet would report 200 Wh for the 10:00 window. That is because there are 3 minutes between the two readings, and a difference of 300 Wh. If we project that 300 Wh evenly over those 3 minutes, only 2 of those minutes fall within our 10:00 time window, so SolarNet would allocate 2/3 of the 300 Wh to the window, which is 200 Wh.
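The 2/3 allocation above is a linear projection of the difference between two readings onto the part of their interval that overlaps the window. A minimal Python sketch of that arithmetic, assuming the accumulation happens evenly between samples (the function name projected_share is hypothetical):

```python
from datetime import datetime


def projected_share(t0: datetime, v0: float, t1: datetime, v1: float,
                    win_start: datetime, win_end: datetime) -> float:
    """Portion of (v1 - v0) allocated to the half-open window [win_start, win_end),
    assuming the accumulation happened evenly between t0 and t1."""
    span = (t1 - t0).total_seconds()
    overlap = (min(t1, win_end) - max(t0, win_start)).total_seconds()
    if span <= 0 or overlap <= 0:
        return 0.0
    return (v1 - v0) * overlap / span


share = projected_share(
    datetime(2000, 1, 1, 9, 59), 200,
    datetime(2000, 1, 1, 10, 2), 500,
    datetime(2000, 1, 1, 10, 0), datetime(2000, 1, 1, 11, 0),
)
print(share)  # 200.0
```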

Example list aggregation

Imagine we have a SolarNode collecting data from a car that goes on a trip. The collected data looks like this:

Date | Speed (km/h) | Odometer (km) | Message
2000-01-01 10:00 | 0 | 53456 |
2000-01-01 10:15 | 50 | 53460 |
2000-01-01 10:30 | 70 | 53466 |
2000-01-01 10:45 | 30 | 53470 |
2000-01-01 11:00 | 20 | 53472 |
2000-01-01 11:15 | 90 | 53478 | Check oil
2000-01-01 11:30 | 10 | 53485 | Check oil
2000-01-01 11:45 | 5 | 53486 | Check oil
2000-01-01 12:00 | 0 | 53486 | Check oil
2000-01-01 12:15 | 0 | 53487 |
2000-01-01 12:30 | 60 | 53490 |
2000-01-01 12:45 | 95 | 53498 |
2000-01-01 13:00 | 2 | 53504 |

SolarNet would aggregate this into hourly time windows that look like this:

Date | Speed (km/h) | Odometer (km) | Message
2000-01-01 10:00 | 37.5 | 16 |
2000-01-01 11:00 | 31.25 | 14 | Check oil
2000-01-01 12:00 | 38.75 | 18 |

Here the Speed column represents the average speed and Odometer represents the distance travelled, over each hour.

In a similar fashion, SolarNet would aggregate this into a single daily time window that would look like this:

Date | Speed (km/h) | Odometer (km) | Message
2000-01-01 00:00 | 35.83 | 48 | Check oil
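The hourly figures in the first aggregated table can be reproduced with a short Python sketch that averages the speed samples inside each half-open hour and projects the odometer difference between each pair of consecutive samples onto that hour. It is an illustration under those assumptions, not SolarNet's code; the Message (status) column is left out, and the names SAMPLES, projected_delta, and hourly are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

# (timestamp, speed km/h, odometer km) from the car example, every 15 minutes from 10:00
BASE = datetime(2000, 1, 1, 10, 0)
SPEEDS = [0, 50, 70, 30, 20, 90, 10, 5, 0, 0, 60, 95, 2]
ODOMETER = [53456, 53460, 53466, 53470, 53472, 53478, 53485,
            53486, 53486, 53487, 53490, 53498, 53504]
SAMPLES = [(BASE + timedelta(minutes=15 * i), s, o)
           for i, (s, o) in enumerate(zip(SPEEDS, ODOMETER))]


def projected_delta(samples, win_start, win_end):
    """Accumulating property: sum of consecutive differences projected onto the window."""
    total = 0.0
    for (t0, _, o0), (t1, _, o1) in zip(samples, samples[1:]):
        span = (t1 - t0).total_seconds()
        overlap = (min(t1, win_end) - max(t0, win_start)).total_seconds()
        if span > 0 and overlap > 0:
            total += (o1 - o0) * overlap / span
    return total


def hourly(samples, win_start):
    """Return (average speed, distance travelled) for the hour starting at win_start."""
    win_end = win_start + timedelta(hours=1)
    speeds = [s for t, s, _ in samples if win_start <= t < win_end]
    return mean(speeds), projected_delta(samples, win_start, win_end)


for hour in (10, 11, 12):
    print(hour, hourly(SAMPLES, datetime(2000, 1, 1, hour)))
# 10 (37.5, 16.0)
# 11 (31.25, 14.0)
# 12 (38.75, 18.0)
```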

List partial aggregation

The /datum/list, /datum/reading, /datum/stream/datum, and /datum/stream/reading endpoints support including partial aggregated results for time ranges that do not align with the requested aggregation level. This feature works only in conjunction with aggregate queries, so the aggregation and partialAggregation query parameters must both be provided.

⚠️ NOTE: the start/end date ranges will be implicitly treated as node-local dates, that is, without any time zone component. We recommend you provide the date range with the localStartDate and localEndDate parameters.

For example, you might want monthly results between two mid-month dates, such as 15 Jan 2020 to 15 Mar 2020. You can specify the query parameters like this:

localStartDate=2020-01-15&localEndDate=2020-03-15&aggregation=Month&partialAggregation=Day

The results will include 3 result time ranges:

Result # | Start Date | End Date
1 | 2020-01-15 | 2020-01-31
2 | 2020-02-01 | 2020-02-29
3 | 2020-03-01 | 2020-03-14

If the partialAggregation parameter had not been provided, the results would include only full month results like this:

Result # | Start Date | End Date
1 | 2020-02-01 | 2020-02-29
2 | 2020-03-01 | 2020-03-31
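To make the two result sets above concrete, this Python sketch models which monthly time ranges such a query covers, with and without partial aggregation. It works with half-open [from, to) dates, so subtracting one day from each "to" date gives the End Date shown in the tables. It is an illustrative model of the documented results under that assumption, not SolarNet's query logic; month_ranges and next_month are hypothetical names.

```python
from datetime import date, timedelta
from typing import List, Tuple


def next_month(d: date) -> date:
    """First day of the month after d."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def month_ranges(start: date, end: date, partial: bool) -> List[Tuple[date, date]]:
    """Half-open [from, to) ranges for Month aggregation over [start, end).

    With partial=True, leading and trailing fragments of a month are kept;
    otherwise only months that start on or after start and before end are
    returned, each as a whole month.
    """
    ranges = []
    first_full = start if start.day == 1 else next_month(start)
    if partial and start < first_full:
        ranges.append((start, min(first_full, end)))  # leading partial month
    month = first_full
    while month < end:
        following = next_month(month)
        if partial and following > end:
            ranges.append((month, end))  # trailing partial month
        else:
            ranges.append((month, following))
        month = following
    return ranges


for a, b in month_ranges(date(2020, 1, 15), date(2020, 3, 15), partial=True):
    print(a, b - timedelta(days=1))  # inclusive end date, as in the table
# 2020-01-15 2020-01-31
# 2020-02-01 2020-02-29
# 2020-03-01 2020-03-14
```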

Supported partial aggregation types

Both the aggregation and partialAggregation parameters accept aggregation type values, but partialAggregation must be a smaller aggregate (shorter time range) than the aggregation value, and only Month, Day, and Hour values are supported, as outlined here:

Aggregation | Supported Partial Aggregations
Year | Month, Day
Month | Day, Hour
Day | Hour
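A client can check a parameter combination against the table above before sending a request. This Python sketch encodes the supported combinations and builds the query string from the earlier mid-month example with the standard library's urllib.parse.urlencode. The names SUPPORTED_PARTIALS and partial_query are hypothetical, and the server remains the authority on what it accepts.

```python
from urllib.parse import urlencode

# Supported partialAggregation values for each aggregation, per the table above
SUPPORTED_PARTIALS = {
    "Year": {"Month", "Day"},
    "Month": {"Day", "Hour"},
    "Day": {"Hour"},
}


def partial_query(local_start: str, local_end: str, aggregation: str, partial: str) -> str:
    """Build a query string, rejecting combinations the table does not list."""
    if partial not in SUPPORTED_PARTIALS.get(aggregation, set()):
        raise ValueError(
            f"partialAggregation={partial} is not supported with aggregation={aggregation}")
    return urlencode({
        "localStartDate": local_start,
        "localEndDate": local_end,
        "aggregation": aggregation,
        "partialAggregation": partial,
    })


print(partial_query("2020-01-15", "2020-03-15", "Month", "Day"))
# localStartDate=2020-01-15&localEndDate=2020-03-15&aggregation=Month&partialAggregation=Day
```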

Reading aggregation

The /datum/reading method can be used to query the difference of accumulating property values collected by SolarNodes without any time projection applied. SolarNet will apply the Difference reading type logic to each aggregate time window.

In the previous example aggregation, the end result would be the same either way, because the datum are evenly spaced and include a sample exactly on every hourly window boundary. Real data collected by SolarNodes will not have such conveniently aligned dates. The normal aggregation uses time-based projection to allocate portions of accumulation fairly across time window boundaries, and the CalculatedAtDifference reading type uses this same approach. Reading aggregation uses the Difference approach instead, which does not use time-based projection: it looks for the reading closest to, but before, the start and end boundaries of the time window.

Here's an example to illustrate the difference. Let's calculate the 10:00 hourly aggregate value for this data:

Date | Energy (Wh)
2000-01-01 09:59 | 200
2000-01-01 10:02 | 500
2000-01-01 10:11 | 900
2000-01-01 10:19 | 1600
2000-01-01 10:55 | 2900
2000-01-01 11:01 | 3500

Here's what SolarNet would return for both /datum/list and /datum/reading style aggregations:

Style | Result (Wh) | Calculation
/list | 3100 | ((500-200) * (2/3)) + (900-500) + (1600-900) + (2900-1600) + ((3500-2900) * (5/6))
/reading | 2700 | (2900 - 200)

Note the two cases of time projection involved in the /list result: (500-200) * (2/3) and (3500 - 2900) * (5/6).
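Both results can be checked with a small Python sketch. The list_style function projects each pair of consecutive readings onto the window, as in the /list calculation; reading_style subtracts the latest reading at the start boundary from the latest reading at the end boundary, as in the /reading calculation. Treating a reading exactly on a boundary as belonging to it is an assumption, and neither hypothetical function is SolarNet's actual code.

```python
from datetime import datetime

# (timestamp, Energy in Wh) from the table above, sorted by time
READINGS = [
    (datetime(2000, 1, 1, 9, 59), 200),
    (datetime(2000, 1, 1, 10, 2), 500),
    (datetime(2000, 1, 1, 10, 11), 900),
    (datetime(2000, 1, 1, 10, 19), 1600),
    (datetime(2000, 1, 1, 10, 55), 2900),
    (datetime(2000, 1, 1, 11, 1), 3500),
]


def list_style(readings, start, end):
    """Sum of consecutive differences, each projected onto [start, end)."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        span = (t1 - t0).total_seconds()
        overlap = (min(t1, end) - max(t0, start)).total_seconds()
        if span > 0 and overlap > 0:
            total += (v1 - v0) * overlap / span
    return total


def reading_style(readings, start, end):
    """End reading minus start reading, with no projection."""
    def latest_at_or_before(boundary):
        # Assumption: a reading exactly on the boundary counts for that boundary
        return [v for t, v in readings if t <= boundary][-1]
    return latest_at_or_before(end) - latest_at_or_before(start)


start, end = datetime(2000, 1, 1, 10), datetime(2000, 1, 1, 11)
print(list_style(READINGS, start, end))     # 3100.0
print(reading_style(READINGS, start, end))  # 2700
```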

Time window tolerance

When SolarNet aggregates for a given time window, using either the /list or /reading style, it must search for the previous and/or next datum samples in order to either perform time projection or simply find the starting/ending readings. In order to find those values efficiently, SolarNet imposes a maximum tolerance limitation on how far in time it will look for the necessary datum. The tolerance amount differs based on the aggregation style:

Style | Tolerance
/list | 1 hour
/reading | 1 year (see below for more details)

Essentially, if there is a gap in the data larger than this tolerance, then SolarNet will allocate no accumulation for that period. Here's an example:

Date | Energy (Wh)
2000-01-01 10:55 | 2900
2000-01-01 12:01 | 6500
2000-01-01 12:03 | 6600

Notice the gap of 66 minutes between the 10:55 and 12:01 samples. Here's what SolarNet would return when aggregating the 12:00 hourly aggregate:

Style | Result (Wh) | Calculation
/list | 100 | (6600 - 6500)
/reading | 3700 | (6600 - 2900)

Notice how the /list result includes no time projection like (6500 - 2900) * (1/66), because the 66 minute gap between those samples is larger than the maximum tolerance of 1 hour. The /reading style has a much larger tolerance, so it does use the 10:55 sample as the starting value for this window.
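Adding the tolerance limits to the same style of sketch reproduces both results. The constants mirror the tolerance table above, with "1 year" approximated as 365 days; the function names are hypothetical, reading_style returns None when no reading falls within tolerance, and this is not SolarNet's implementation.

```python
from datetime import datetime, timedelta

LIST_TOLERANCE = timedelta(hours=1)
READING_TOLERANCE = timedelta(days=365)  # "1 year", approximated as 365 days

# (timestamp, Energy in Wh) from the table above, sorted by time
READINGS = [
    (datetime(2000, 1, 1, 10, 55), 2900),
    (datetime(2000, 1, 1, 12, 1), 6500),
    (datetime(2000, 1, 1, 12, 3), 6600),
]


def list_style(readings, start, end, tolerance=LIST_TOLERANCE):
    """Projected differences, skipping pairs of samples further apart than tolerance."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        if t1 - t0 > tolerance:
            continue  # gap too large: allocate no accumulation for it
        span = (t1 - t0).total_seconds()
        overlap = (min(t1, end) - max(t0, start)).total_seconds()
        if span > 0 and overlap > 0:
            total += (v1 - v0) * overlap / span
    return total


def reading_style(readings, start, end, tolerance=READING_TOLERANCE):
    """Difference between the latest readings at or before each boundary, within tolerance."""
    def latest(boundary):
        prior = [v for t, v in readings if t <= boundary and boundary - t <= tolerance]
        return prior[-1] if prior else None
    first, last = latest(start), latest(end)
    return None if first is None or last is None else last - first


start, end = datetime(2000, 1, 1, 12), datetime(2000, 1, 1, 13)
print(list_style(READINGS, start, end))     # 100.0
print(reading_style(READINGS, start, end))  # 3700
```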

Reading time window tolerance

The Difference reading style used when computing cached aggregate values has a large time tolerance of 1 year. This is large enough to hopefully accommodate the majority of device outages that might occur within a datum stream. There are some caveats to this tolerance, however, to manage the performance characteristics of the aggregation process, which must work with streams that may not contain accumulating property values (accumulating property values are required by the reading aggregation process):

  1. SolarNetwork will try to find the "nearby" datum with at least 1 accumulating property value within a 14 day tolerance. This is to help work with some devices like solar inverters that stop reporting accumulating values (like energy) at night, but continue reporting instantaneous or status values (like operating state).

    For example, when aggregating the first hour with accumulating values after the sun comes up, the "nearby" datum should be the last datum on the previous day that reported accumulating values, not any of the datum reported overnight that lack the accumulating values.

  2. If nothing is found within the 14 day tolerance, SolarNetwork will locate the closest datum within the full 1 year tolerance, regardless of whether that datum has accumulating values. If the datum found has accumulating values, it will be used for aggregation; otherwise it will be ignored.
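The two-step lookup above can be sketched in Python as follows. The sketch assumes datum are sorted by time, represents a datum without accumulating values as a value of None, approximates 1 year as 365 days, and reads "closest datum" in step 2 as the latest datum at or before the boundary. find_start_datum is a hypothetical name, not a SolarNet API.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Datum = Tuple[datetime, Optional[float]]  # (timestamp, accumulating value or None)

NEARBY_TOLERANCE = timedelta(days=14)
MAX_TOLERANCE = timedelta(days=365)  # "1 year", approximated


def find_start_datum(data: List[Datum], boundary: datetime) -> Optional[Datum]:
    """Find the datum to start a reading difference from, per the two steps above."""
    before = [d for d in data if d[0] <= boundary]
    # Step 1: newest datum with an accumulating value, searching back at most 14 days
    for ts, value in reversed(before):
        if boundary - ts > NEARBY_TOLERANCE:
            break
        if value is not None:
            return ts, value
    # Step 2: closest datum of any kind within 1 year; keep it only if it has a value
    if before:
        ts, value = before[-1]
        if boundary - ts <= MAX_TOLERANCE and value is not None:
            return ts, value
    return None


inverter = [
    (datetime(2000, 1, 1, 18), 900.0),
    (datetime(2000, 1, 1, 21), None),  # overnight: status only
    (datetime(2000, 1, 2, 3), None),
]
# Skips the overnight datum and returns the 18:00 datum with value 900.0
print(find_start_datum(inverter, datetime(2000, 1, 2, 6)))
```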

These constraints together give datum streams enormous flexibility in terms of SolarNetwork coping with gaps in the data. If you have a datum stream with larger time gaps that fall outside these constraints, you can use Auxiliary Datum Records to keep the aggregate data consistent across the time gaps.

Example: gaps in accumulating properties

Date | Status | Energy (Wh)
2000-01-01 00:00 | Off |
2000-01-01 03:00 | Off |
2000-01-01 06:00 | On | 200
2000-01-01 12:00 | On | 500
2000-01-01 18:00 | On | 900
2000-01-01 21:00 | Off |
2000-01-02 00:00 | Off |
2000-01-02 03:00 | Off |
2000-01-02 06:00 | On | 1000
2000-01-02 12:00 | On | 1200

Notice the gaps in the Energy property that occur overnight. SolarNetwork will correctly deal with those gaps, reporting the daily accumulation as 700 on the first day and 300 on the second day.
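The 700 and 300 figures can be reproduced with a simplified Python sketch of the Difference approach: each day's start and end values are the latest Energy readings at or before the day's boundaries, and datum without an Energy value are skipped. Two assumptions are labeled in the comments: a reading exactly on a boundary counts for that boundary, and when no earlier reading exists (the first day) the first reading inside the day is used as the start. The 14 day and 1 year tolerances are omitted for brevity, and daily_difference is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (timestamp, Energy in Wh or None) from the example table above
DATA: List[Tuple[datetime, Optional[int]]] = [
    (datetime(2000, 1, 1, 0), None),
    (datetime(2000, 1, 1, 3), None),
    (datetime(2000, 1, 1, 6), 200),
    (datetime(2000, 1, 1, 12), 500),
    (datetime(2000, 1, 1, 18), 900),
    (datetime(2000, 1, 1, 21), None),
    (datetime(2000, 1, 2, 0), None),
    (datetime(2000, 1, 2, 3), None),
    (datetime(2000, 1, 2, 6), 1000),
    (datetime(2000, 1, 2, 12), 1200),
]


def daily_difference(data, day_start: datetime) -> int:
    day_end = day_start + timedelta(days=1)
    readings = [(ts, e) for ts, e in data if e is not None]  # skip the overnight gaps

    def latest_at_or_before(boundary):
        # Assumption: a reading exactly on the boundary counts for that boundary
        prior = [e for ts, e in readings if ts <= boundary]
        return prior[-1] if prior else None

    start = latest_at_or_before(day_start)
    if start is None:
        # Assumption: with no earlier reading, start from the first reading in the day
        start = next(e for ts, e in readings if day_start <= ts < day_end)
    return latest_at_or_before(day_end) - start


print(daily_difference(DATA, datetime(2000, 1, 1)))  # 700
print(daily_difference(DATA, datetime(2000, 1, 2)))  # 300
```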

Reading raw vs. aggregate query

It is important to understand how a "raw" reading difference query executes differently from an "aggregate" reading query. A raw reading difference query looks like this (omitting node and source IDs for brevity):

/reading?readingType=Difference&startDate=2025-01-01T12:00&endDate=2025-01-01T13:00

☝️ That example reads as: calculate the accumulating difference between noon and 1pm on 1 Jan 2025.

An aggregate reading query makes use of the pre-computed hourly, daily, or monthly reading aggregate data, by adding an aggregation= parameter. You can essentially ask for the same result, like this:

/reading?readingType=Difference&startDate=2025-01-01T12:00&endDate=2025-01-01T13:00&aggregation=Hour

Normally these will return the same result, but discrepancies can sometimes occur, because the two styles actually perform different queries.

Reading raw query

A raw reading query essentially looks at the start and end of the given time range, and calculates a difference between the datum on those two dates. It assumes that all the data within the time range is consistent, so there is no need to inspect each and every datum in the middle.

💡 This is simplifying the actual query a great deal, but this is the essence of the query.

Reading aggregate query

An aggregate reading query essentially relies on the pre-computed hourly aggregate values: because each hour has already been computed, the query can simply sum every aggregate datum between the start and end dates.

💡 You can think of daily and monthly aggregate queries in the same way as hourly aggregates.

The hourly aggregate value itself, however, is computed as the sum of the differences between every datum and its previous datum, between the start and end dates.

Raw versus aggregate example

Here is an example set of data to illustrate how a query for an hour of data using the raw style executes differently than a query for the same hour using the aggregate style:

Date | Energy
2025-01-01 12:00 | 100
2025-01-01 12:10 | 0
2025-01-01 12:15 | 300
2025-01-01 12:30 | 400
2025-01-01 12:45 | 500
2025-01-01 13:00 | 600

That 12:10 value is suspicious: the previous reading was 100, at 12:10 it drops to 0, and at 12:15 it is back "up" at 300. Suspicion aside, the two query styles will return the same result, but by different means:

Style | Result | Why?
raw | 500 | The result is simply (end - start), so 600 - 100 = 500
agg | 500 | The result is (0 - 100) + (300 - 0) + (400 - 300) + (500 - 400) + (600 - 500) = 500
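The two calculations in the table can be written side by side in Python. With plain differences, the per-datum sum telescopes to the last value minus the first, which is why both styles agree here when they start and end on the same datum. The function names are hypothetical.

```python
ENERGY = [100, 0, 300, 400, 500, 600]  # 12:00 through 13:00 readings from the table


def raw_difference(values):
    """Raw style: end reading minus start reading."""
    return values[-1] - values[0]


def aggregate_difference(values):
    """Aggregate style: sum of each reading's difference from the previous one."""
    return sum(curr - prev for prev, curr in zip(values, values[1:]))


print(raw_difference(ENERGY), aggregate_difference(ENERGY))  # 500 500
```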

What style should you use?

If you can query on hour-aligned date ranges, then querying for aggregate reading values will generally execute much faster than the equivalent raw query. You can make use of partial aggregation and rollupType=All to efficiently query for very large time ranges down to arbitrary hour or day values. For example this raw query:

/reading?readingType=Difference&startDate=2010-01-15&endDate=2025-05-22

Can be turned into an aggregate query style like this:

/reading?readingType=Difference&startDate=2010-01-15&endDate=2025-05-22&aggregation=Month&partialAggregation=Day&rollupType=All

If you need to query for a very specific (not hour-aligned) time range, then your only option is to use the raw query style.
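When a range may or may not be hour-aligned, a client could pick the style automatically. This Python sketch builds the aggregate query shown above for hour-aligned ranges and falls back to the raw query otherwise. Choosing aggregation=Month with partialAggregation=Day for midnight-aligned ranges (or Hour otherwise) is an assumption based on the supported partial aggregations table, and reading_query is a hypothetical helper.

```python
from datetime import datetime
from urllib.parse import urlencode


def is_hour_aligned(dt: datetime) -> bool:
    return dt.minute == 0 and dt.second == 0 and dt.microsecond == 0


def fmt(dt: datetime) -> str:
    # Date only at midnight, otherwise date and time, matching the examples above
    if dt.hour == 0 and is_hour_aligned(dt):
        return dt.strftime("%Y-%m-%d")
    return dt.strftime("%Y-%m-%dT%H:%M")


def reading_query(start: datetime, end: datetime) -> str:
    params = {"readingType": "Difference", "startDate": fmt(start), "endDate": fmt(end)}
    if is_hour_aligned(start) and is_hour_aligned(end):
        day_aligned = start.hour == 0 and end.hour == 0
        params.update(
            aggregation="Month",
            partialAggregation="Day" if day_aligned else "Hour",
            rollupType="All",
        )
    # Keep ':' unescaped in times, as in the query examples above
    return "/reading?" + urlencode(params, safe=":")


print(reading_query(datetime(2010, 1, 15), datetime(2025, 5, 22)))
# /reading?readingType=Difference&startDate=2010-01-15&endDate=2025-05-22&aggregation=Month&partialAggregation=Day&rollupType=All
```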
