an attempt at fixing the 15m vs 1h resolutions (for now) #196

wonko · 2024-10-07T17:24:16Z

made the timedelta dependent on the returned resolution value
added some basic tests

This should fix the missing data for now. This needs further tuning if there's ever a non-4-step-15m-interval. I might add that later this week, no time today ...
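As an illustration only (this is not the code from the commits), the idea of letting the step between points follow the period's resolution could look like the sketch below. The names `RESOLUTION_STEP` and `point_times` are hypothetical, and only the two resolutions discussed in this thread are covered.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table: only the two resolutions discussed in this PR.
RESOLUTION_STEP = {
    "PT15M": timedelta(minutes=15),
    "PT60M": timedelta(minutes=60),
}


def point_times(period_start, resolution, positions):
    """Map 1-based <position> values to the start time of each point."""
    step = RESOLUTION_STEP[resolution]
    return [period_start + (pos - 1) * step for pos in positions]


start = datetime(2024, 10, 5, 22, 0, tzinfo=timezone.utc)
quarter_hours = point_times(start, "PT15M", [1, 2, 5])
hours = point_times(start, "PT60M", [1, 2])
```

With a 15-minute step, position 5 lands on the same timestamp as position 2 of the hourly series, which is why a fixed one-hour timedelta misplaces the 15m points.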

wonko · 2024-10-07T17:25:51Z

also

➜ ./bin/python -m unittest -v
test_be_60m (test.test_api_client.TestDocumentParsing.test_be_60m) ... ok
test_be_60m_15m_mix (test.test_api_client.TestDocumentParsing.test_be_60m_15m_mix) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK

Roeland54 · 2024-10-07T18:06:25Z

The German special case is not handled here. They have 2 sets of data: one 15min and one 60min. The 60min set is the one that needs to be parsed. That is why we ignored everything besides 60min before.

entsoe
or try this api call: https://web-api.tp.entsoe.eu/api?documentType=A44&securityToken=<SECRET>&periodStart=202410060000&periodEnd=202410071300&in_domain=10Y1001A1001A82H&out_domain=10Y1001A1001A82H

wonko · 2024-10-07T18:19:33Z

what's the preferred outcome in this case?

The code currently only deals with 1h intervals, so ...

anything besides the 1h interval could be silently discarded?
should the full-hour-value of the 60M take precedence over the 15M value? Or the other way around?

It seems there's no relation between the 60M value for a certain hour and the points in the 15m resolution (it is neither an average of all points in that hour, nor of some sliding window around the hour mark), so a choice needs to be made.

Given the DE example above, there's

         <Period>
            <timeInterval>
              <start>2024-10-05T22:00Z</start>
              <end>2024-10-06T22:00Z</end>
            </timeInterval>
            <resolution>PT60M</resolution>
              <Point>
                <position>1</position>
                 <price.amount>67.04</price.amount>
              </Point>
              <Point>
                <position>2</position>
                 <price.amount>63.97</price.amount>
              </Point>
...

and

          <Period>
            <timeInterval>
              <start>2024-10-05T22:00Z</start>
              <end>2024-10-06T22:00Z</end>
            </timeInterval>
            <resolution>PT15M</resolution>
              <Point>
                <position>1</position>
                 <price.amount>98.1</price.amount>
              </Point>
              <Point>
                <position>2</position>
                 <price.amount>89.5</price.amount>
              </Point>
              <Point>
                <position>3</position>
                 <price.amount>77.21</price.amount>
              </Point>
              <Point>
                <position>4</position>
                 <price.amount>40.09</price.amount>
              </Point>
              <Point>
                <position>5</position>
                 <price.amount>87.2</price.amount>
              </Point>
              <Point>
                <position>6</position>
                 <price.amount>80.1</price.amount>
              </Point>
              <Point>
                <position>7</position>
                 <price.amount>75.3</price.amount>
              </Point>
              <Point>
                <position>8</position>
                 <price.amount>51.34</price.amount>
              </Point>
              <Point>
                <position>9</position>
                 <price.amount>76.8</price.amount>
              </Point>

There's no way to get the 63.97 value of 23:00 by using the datapoints in the 15m set (or I made some serious mistake).
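The check can be reproduced with the values from the two excerpts above. This is a quick sketch, not part of the PR; it groups the first eight PT15M points into two hours and compares each hour's mean with the PT60M price.

```python
# Values copied from the PT60M and PT15M excerpts above.
pt60m = [67.04, 63.97]
pt15m = [98.1, 89.5, 77.21, 40.09, 87.2, 80.1, 75.3, 51.34]

# Mean of each block of four quarter-hour points.
hourly_means = [
    sum(pt15m[i:i + 4]) / 4
    for i in range(0, len(pt15m), 4)
]

for hourly_price, mean in zip(pt60m, hourly_means):
    print(f"PT60M={hourly_price} mean(PT15M)={mean:.3f}")
```

The means come out at roughly 76.2 and 73.5, both more than 9 away from the hourly prices, and none of the individual quarter-hour points equals 67.04 or 63.97 either.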

Roeland54 · 2024-10-07T18:30:06Z

Anything besides the 1h interval could be silently discarded in the German case. The 15min data is some price from another electricity product. No idea what it really means, except that we do not want it: "day-ahead prices of the separate 10:15 auction of EXAA are also published under the filter 'resolution=PT15M'".

wonko · 2024-10-07T18:55:31Z

Addressed that: the code now prefers the 60M data, but takes the 15M data if no 60M data is available (which keeps the oddball BE "bug" situation working).
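A rough sketch of that precedence rule, under the assumption that both data sets have already been reduced to one price per hour (here the 15M side simply uses the first point of each hour, which is an arbitrary choice for the example). `merge_hourly` is a hypothetical helper, not the function in this PR, and the BE-like values are reused from the DE excerpt purely for illustration.

```python
from datetime import datetime, timezone


def merge_hourly(from_60m, from_15m):
    """Prefer PT60M prices; keep PT15M-derived prices only where 60M is absent."""
    merged = dict(from_15m)
    merged.update(from_60m)  # PT60M wins wherever both cover the same hour
    return dict(sorted(merged.items()))


h22 = datetime(2024, 10, 5, 22, 0, tzinfo=timezone.utc)
h23 = datetime(2024, 10, 5, 23, 0, tzinfo=timezone.utc)

# DE-like case: both resolutions cover the same hours, so PT60M is kept.
de = merge_hourly({h22: 67.04, h23: 63.97}, {h22: 98.1, h23: 87.2})

# BE-like case: an hour that only exists in the PT15M data survives the merge.
be = merge_hourly({h23: 63.97}, {h22: 98.1})
```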

Added the DE example as a test for this situation

Pluimvee · 2024-10-08T20:04:23Z

Not sure if your test files are the same as the DE files, but your mixed file contains
PT15M for 5 till 6 October
PT15M for 6 till 7 October
PT60M for 7 till 8 October

Looking at the date you committed the file (October 7), I can imagine that the data retrieved gave PT15M for the past and PT60M for future prices.

Are you sure there are overlapping periods in the DE case?

Pluimvee

I suggest making a separate method for the PT15M logic, as we also need to cope with missing positions (price remains equal).

The PT15M logic can iterate through at most 4 positions and return the hour together with the average price. That way the code works for any number of positions (max 4) per hour in a PT15M resolution.

See some (untested) code in #202
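To make the suggestion concrete, here is a small sketch of the two steps: repeat the previous price for omitted positions, then average each block of four. It is not the code from #202; the helper names and the sample prices are made up, and position 1 is assumed to always be present.

```python
def fill_positions(points, count):
    """Expand sparse {position: price} data into a list of `count` prices.

    A missing position repeats the previous price (assumes position 1 exists).
    """
    filled = []
    last = None
    for pos in range(1, count + 1):
        last = points.get(pos, last)
        filled.append(last)
    return filled


def hourly_average(prices, per_hour=4):
    """Average consecutive groups of `per_hour` prices."""
    groups = [prices[i:i + per_hour] for i in range(0, len(prices), per_hour)]
    return [sum(group) / len(group) for group in groups]


# Made-up data: positions 3, 6, 7 and 8 are omitted from the period.
sparse = {1: 10.0, 2: 20.0, 4: 30.0, 5: 40.0}
filled = fill_positions(sparse, 8)
averages = hourly_average(filled)
```

Passing a different `per_hour` (for example 2 for PT30M) is one way such a helper could be reused for other resolutions.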

wonko · 2024-10-09T05:21:49Z

> Not sure if your test files are the same as the DE files, but your mixed file contains PT15M for 5 till 6 October, PT15M for 6 till 7 October, PT60M for 7 till 8 October
>
> Looking at the date you committed the file (October 7), I can imagine that the data retrieved gave PT15M for the past and PT60M for future prices.
>
> Are you sure there are overlapping periods in the DE case?

Seems like I forgot to add the data file for the DE test case... should be good now.

wonko · 2024-10-09T05:47:58Z

> I suggest to make a separate method for PT15M logic as we need to also cope with missing positions (price remains equal)
>
> The PT15M logic can return the hour and price found after iterating through max 4 positions and return the average. As such the code works for any number of positions (max4) in a PT15M resolution
>
> See some (untested) code in #202

The solution would have to be generic, apparently the XML specs allow for PT30M resolutions as well...

Pluimvee · 2024-10-09T08:25:22Z

> > I suggest to make a separate method for PT15M logic as we need to also cope with missing positions (price remains equal)
> > The PT15M logic can return the hour and price found after iterating through max 4 positions and return the average. As such the code works for any number of positions (max4) in a PT15M resolution
> > See some (untested) code in #202
>
> The solution would have to be generic, apparently the XML specs allow for PT30M resolutions as well...

The setup in #202 allows for extending towards PT30M and other resolutions

Pluimvee · 2024-10-09T10:27:06Z

In the BE mixed file there is

1 timeseries
with 3 periods
2 periods with a resolution of PT15M (yesterday & today) and 1 period with PT60M (tomorrow)
the timeseries has mRID set to 1 (first response to the requested data?)

In the DE example there are

2 timeseries, first having resolution PT60, the second PT15M
the first having mRID set to 1, the second to 2
the first having classificationSequence_AttributeInstanceComponent.position set to 1, the second to 2
both timeseries contain 2 periods with sequential time intervals (so no overlap in the periods)

Meaning: in the DE case the server gave 2 responses to the request. Two 'TimeSeries' with different mRID, different resolutions and different classificationSequence_AttributeInstanceComponent.position

It seems the classificationSequence_AttributeInstanceComponent.position is an optional parameter also valid in the request. Not sure what it does.

I think when there are two (or more) Timeseries elements in the response these can be interpreted as alternatives
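The structure described above can be inspected with a few lines of standard-library code. The sketch below uses a trimmed, namespace-free stand-in for the DE response (real responses typically declare an XML namespace, so the tag lookups would need it); `summarize` is a hypothetical helper and the sample document is not real API output.

```python
import xml.etree.ElementTree as ET

SEQ = "classificationSequence_AttributeInstanceComponent.position"

# Simplified stand-in for the DE response: two TimeSeries, two Periods each.
SAMPLE = f"""
<Publication_MarketDocument>
  <TimeSeries>
    <mRID>1</mRID>
    <{SEQ}>1</{SEQ}>
    <Period><resolution>PT60M</resolution></Period>
    <Period><resolution>PT60M</resolution></Period>
  </TimeSeries>
  <TimeSeries>
    <mRID>2</mRID>
    <{SEQ}>2</{SEQ}>
    <Period><resolution>PT15M</resolution></Period>
    <Period><resolution>PT15M</resolution></Period>
  </TimeSeries>
</Publication_MarketDocument>
"""


def summarize(xml_text):
    """Return (mRID, sequence position, resolutions) for every TimeSeries."""
    root = ET.fromstring(xml_text)
    summary = []
    for series in root.iter("TimeSeries"):
        resolutions = sorted(
            {period.findtext("resolution") for period in series.iter("Period")}
        )
        summary.append((series.findtext("mRID"), series.findtext(SEQ), resolutions))
    return summary


series_summary = summarize(SAMPLE)
```

A summary like this makes it easy to see whether a response holds one TimeSeries with mixed resolutions (the BE case) or several TimeSeries that each stick to one resolution (the DE case).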

Roeland54 · 2024-10-09T11:40:52Z

> The solution would have to be generic, apparently the XML specs allow for PT30M resolutions as well...

Yes, I agree on this. Separate methods with duplicate code are not really clean or maintainable. I even think we should go further and just use the data in the resolution we get from the API, making the complete integration generic across multiple resolutions, instead of forcing it into the 60m resolution, which can be done in multiple ways and would create confusion in the future. I also think we can assume that real mixed responses, like the Belgian bug of this week, are not something we will see again. It would be nice to be able to handle them, but it is not really a dealbreaker.
As a benefit, the integration becomes fully compatible with the 15min resolution, which will become the norm sometime next year.

> In the BE mixed file there is
>
> 1 timeseries
> with 3 periods
> 2 periods with a resolution of PT15M (yesterday & today) and 1 period with PT60M (tomorrow)
> the timeseries has mRID set to 1 (first response to the requested data?)
>
> In the DE example there are
>
> 2 timeseries, first having resolution PT60, the second PT15M
> the first having mRID set to 1, the second to 2
> the first having classificationSequence_AttributeInstanceComponent.position set to 1, the second to 2
> both timeseries contain 2 periods with sequential time intervals (so no overlap in the periods)
>
> Meaning: in the DE case the server gave 2 responses to the request. Two 'TimeSeries' with different mRID, different resolutions and different classificationSequence_AttributeInstanceComponent.position
>
> It seems the classificationSequence_AttributeInstanceComponent.position is an optional parameter also valid in the request. Not sure what it does.
>
> I think when there are two (or more) Timeseries elements in the response these can be interpreted as alternatives

That the German response has multiple data sets is already explained and mentioned above. More info in this issue where entsoe-py handles this case. Good spot on the classificationSequence_AttributeInstanceComponent.position, I didn't notice that one before. If this field makes it possible to separate SDAC and EXAA data, it could be used to easily parse only the SDAC data. We will need to see if we can find any documentation on this field.

wonko · 2024-10-09T12:02:42Z

> > The solution would have to be generic, apparently the XML specs allow for PT30M resolutions as well...
>
> Yes, I agree on this. Separate methods with duplicate code are not really clean or maintainable. I even think we should go further and just use the data in the resolution we get from the API, making the complete integration generic across multiple resolutions, instead of forcing it into the 60m resolution, which can be done in multiple ways and would create confusion in the future. I also think we can assume that real mixed responses, like the Belgian bug of this week, are not something we will see again. It would be nice to be able to handle them, but it is not really a dealbreaker. As a benefit, the integration becomes fully compatible with the 15min resolution, which will become the norm sometime next year.

My two cents below.

The resolution is actually an ISO8601 defined expression of duration (Pxxxx -> period, followed by time indicators). Imho, best way forward would be to parse the data with the given resolution, making zero assumptions, towards a pandas DF and then aggregate over it towards the wanted resolution (60 minutes in this case, might change in the future).
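For reference, a dependency-free sketch of reading such a duration with only the standard library. It is an assumption-laden example, not code from this PR: `parse_resolution` is a hypothetical name, and only the day and time components (e.g. PT15M, PT30M, PT60M, P1D) are handled, since years and months have no fixed length.

```python
import re
from datetime import timedelta

# Day/time subset of ISO 8601 durations; years and months are rejected.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_resolution(text):
    """Convert a resolution such as 'PT15M' into a timedelta."""
    match = _DURATION.match(text)
    # Reject non-matches, an empty duration ("P") and a dangling "T".
    if not match or not any(match.groupdict().values()) or text.endswith("T"):
        raise ValueError(f"unsupported resolution: {text!r}")
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


quarter = parse_resolution("PT15M")
hour = parse_resolution("PT60M")
```

Because the result is a plain `timedelta`, PT60M and PT1H compare equal, and the number of points per hour follows from `timedelta(hours=1) // quarter`.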

Pandas can handle the resolution as an ISO 8601 duration through pandas.Timedelta (see https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.isoformat.html for the format), and the code could use the resample or aggregate functionality for the calculation towards the wanted resolution. Detecting a one-on-one resolution-to-resample might be a shortcut to consider, and it might cover 95% of the cases.

And, as a bonus, if I remember correctly, pandas also has a forward-fill option for fillna to fill in the missing values.

Using pandas might give a more robust parsing and handling of the data. The code would only have to translate between the XML and the dataframes, and apply the correct math to the dataframes as needed.

(I kind of got pulled into this by making one small fix (to actually just fix my battery charging), but the problem is interesting from a developer's view, so I keep coming back ... ;-) )

Roeland54 · 2024-10-09T12:43:57Z

Using pandas would pin us to version 2.1.4, as pandas is pinned by the Home Assistant core project. This already caused a mess before, so I am quite averse to taking on that dependency. And if I hadn't removed the dependency on entsoe-py a few weeks ago, the XML change would have broken the integration completely and the only solution would have been writing our own XML parser from zero. Taking dependencies is always a trade-off and an interesting topic from a developer's view. In general I love libraries, they make so much possible. But there is always a cost.

Besides that I think aggregating or resampling the data is not the responsibility of this integration. The aim is to bring the data of entso-e inside home assistant and make it usable. The data should be kept as pure and untouched as possible.

wonko added 2 commits October 7, 2024 19:21

bf10b72 an attempt at fixing the 15m vs 1h resolutions (for now)
fcf4a86 silly cache

wonko mentioned this pull request Oct 7, 2024

Belgian prices are not fetched because of different resolutions in api response #195 (Open)

2b32daf fix 15m overlapping data for DE case

Pluimvee reviewed Oct 8, 2024

706e060 add missing DE testdata

This was referenced Oct 9, 2024

Support for 15, 60 & mixed resolutions #202 (Closed)
Full support for PT15M resolution and handle overlapping timeseries #206 (Open)
