an attempt at fixing the 15m vs 1h resolutions (for now) #196

Open: wants to merge 4 commits into main

@wonko wonko commented Oct 7, 2024

  • made the timedelta dependent on the returned value
  • added some basic tests

This should fix the missing data for now. It needs further tuning if there's ever a 15m period that doesn't come in 4 steps per hour. I might add that later this week; no time today...
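A minimal sketch of what "making the timedelta dependent on the returned value" can look like: the step between points is taken from the Period's `<resolution>` instead of being fixed at one hour. The `RESOLUTIONS` lookup and `point_timestamp` are hypothetical names, not the PR's code, and only the resolutions mentioned in this thread are listed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup: only the resolutions discussed in this thread.
RESOLUTIONS = {
    "PT15M": timedelta(minutes=15),
    "PT30M": timedelta(minutes=30),
    "PT60M": timedelta(minutes=60),
}


def point_timestamp(period_start: datetime, resolution: str, position: int) -> datetime:
    """Return the start time of a Point using the Period's own resolution.

    Positions are 1-based, so position 1 starts exactly at timeInterval/start.
    """
    try:
        step = RESOLUTIONS[resolution]
    except KeyError:
        raise ValueError(f"unsupported resolution: {resolution}") from None
    return period_start + (position - 1) * step


start = datetime(2024, 10, 5, 22, 0, tzinfo=timezone.utc)  # <start>2024-10-05T22:00Z</start>
print(point_timestamp(start, "PT60M", 2))  # 2024-10-05 23:00:00+00:00
print(point_timestamp(start, "PT15M", 5))  # 2024-10-05 23:00:00+00:00
```

With a fixed one-hour step, position 5 of a PT15M period would be placed four hours too late, which is one way data ends up missing or misaligned.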

@wonko
Author

wonko commented Oct 7, 2024

Also:

➜ ./bin/python -m unittest -v
test_be_60m (test.test_api_client.TestDocumentParsing.test_be_60m) ... ok
test_be_60m_15m_mix (test.test_api_client.TestDocumentParsing.test_be_60m_15m_mix) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK

@Roeland54
Collaborator

Here the German special case is not handled. Germany has 2 sets of data: one at 15min and one at 60min. The 60min set is the one that needs to be parsed. That is why we ignored everything besides 60min before.

See entsoe, or try this API call: https://web-api.tp.entsoe.eu/api?documentType=A44&securityToken=<SECRET>&periodStart=202410060000&periodEnd=202410071300&in_domain=10Y1001A1001A82H&out_domain=10Y1001A1001A82H

@wonko
Author

wonko commented Oct 7, 2024

What's the preferred outcome in this case?

The code currently only deals with 1h intervals, so:

  • anything besides the 1h interval could be silently discarded?
  • should the full-hour-value of the 60M take precedence over the 15M value? Or the other way around?

It seems there's no relation between the value for a certain hour and the points in the 15m resolution (neither an average of all four quarter-hour points nor some sliding window around the hour mark), so a choice needs to be made.

Given the DE example above, there's

         <Period>
            <timeInterval>
              <start>2024-10-05T22:00Z</start>
              <end>2024-10-06T22:00Z</end>
            </timeInterval>
            <resolution>PT60M</resolution>
              <Point>
                <position>1</position>
                 <price.amount>67.04</price.amount>
              </Point>
              <Point>
                <position>2</position>
                 <price.amount>63.97</price.amount>
              </Point>
...

and

          <Period>
            <timeInterval>
              <start>2024-10-05T22:00Z</start>
              <end>2024-10-06T22:00Z</end>
            </timeInterval>
            <resolution>PT15M</resolution>
              <Point>
                <position>1</position>
                 <price.amount>98.1</price.amount>
              </Point>
              <Point>
                <position>2</position>
                 <price.amount>89.5</price.amount>
              </Point>
              <Point>
                <position>3</position>
                 <price.amount>77.21</price.amount>
              </Point>
              <Point>
                <position>4</position>
                 <price.amount>40.09</price.amount>
              </Point>
              <Point>
                <position>5</position>
                 <price.amount>87.2</price.amount>
              </Point>
              <Point>
                <position>6</position>
                 <price.amount>80.1</price.amount>
              </Point>
              <Point>
                <position>7</position>
                 <price.amount>75.3</price.amount>
              </Point>
              <Point>
                <position>8</position>
                 <price.amount>51.34</price.amount>
              </Point>
              <Point>
                <position>9</position>
                 <price.amount>76.8</price.amount>
              </Point>

There's no way to derive the 63.97 value for 23:00 from the data points in the 15m set (unless I made some serious mistake).
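A quick arithmetic check of that claim, using only the prices quoted above: averaging each hour's four PT15M points gives 76.225 for 22:00 and 73.485 for 23:00, while the PT60M series has 67.04 and 63.97. The simple mean is just one candidate aggregation; the sketch does not rule out every possible relation.

```python
# Prices copied from the DE response above; both periods start at 2024-10-05T22:00Z.
pt60m = [67.04, 63.97]  # positions 1-2 -> 22:00 and 23:00 UTC
pt15m = [98.1, 89.5, 77.21, 40.09, 87.2, 80.1, 75.3, 51.34]  # positions 1-8


def hourly_means(quarter_prices):
    """Average each run of four consecutive 15-minute prices."""
    return [sum(quarter_prices[i:i + 4]) / 4 for i in range(0, len(quarter_prices), 4)]


for hour, (hourly, mean) in enumerate(zip(pt60m, hourly_means(pt15m))):
    print(f"hour {hour}: PT60M = {hourly:.2f}, mean of PT15M = {mean:.3f}")
```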

@Roeland54
Collaborator

Anything besides the 1h interval can be silently discarded in the German case. The 15min data is a price from another electricity product. No idea what it really means, except that we do not want it: "day-ahead prices of the separate 10:15 auction of EXAA are also published under the filter 'resolution=PT15M'".

@wonko
Author

wonko commented Oct 7, 2024

Addressed that: the code now prefers the 60M data, but takes 15M data if no 60M data is available (which keeps the oddball BE "bug" situation working).

Added the DE example as a test for this situation.
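A minimal sketch of that preference rule. It is applied per period rather than to the whole response, so the BE case (PT15M for yesterday and today, PT60M for tomorrow) keeps all three periods, while the DE case drops the PT15M series. `Period`, `select_periods` and the assumption that competing periods share the same `timeInterval` start (as in the DE sample) are hypothetical, not taken from the PR.

```python
from typing import Dict, List, NamedTuple


class Period(NamedTuple):
    start: str          # timeInterval/start, e.g. "2024-10-05T22:00Z"
    resolution: str     # e.g. "PT60M" or "PT15M"
    prices: List[float]


PREFERENCE = ("PT60M", "PT15M")  # earlier entries win; anything else is ignored


def select_periods(periods: List[Period]) -> List[Period]:
    """Keep one period per start time, preferring PT60M over PT15M."""
    best: Dict[str, Period] = {}
    for period in periods:
        if period.resolution not in PREFERENCE:
            continue
        current = best.get(period.start)
        if current is None or (
            PREFERENCE.index(period.resolution) < PREFERENCE.index(current.resolution)
        ):
            best[period.start] = period
    # Same-format ISO strings sort chronologically.
    return [best[start] for start in sorted(best)]


de = [
    Period("2024-10-05T22:00Z", "PT60M", [67.04, 63.97]),
    Period("2024-10-05T22:00Z", "PT15M", [98.1, 89.5, 77.21, 40.09]),
]
print([p.resolution for p in select_periods(de)])  # ['PT60M']
```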

@Pluimvee
Contributor

Pluimvee commented Oct 8, 2024

Not sure if your test files are the same as the DE files, but your mixed file contains:

  • PT15M for 5 till 6 October
  • PT15M for 6 till 7 October
  • PT60M for 7 till 8 October

Looking at the date you committed the file (October 7), I can imagine that the data retrieved gave PT15M for the past and PT60M for future prices.

Are you sure there are overlapping periods in the DE case?

Contributor

@Pluimvee Pluimvee left a comment


I suggest making a separate method for the PT15M logic, as we also need to cope with missing positions (where the price remains the same as the previous position).

The PT15M logic can iterate through at most 4 positions and then return the hour and the average price found. That way the code works for any number of positions (max 4) in a PT15M resolution.

See some (untested) code in #202
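A minimal sketch of the approach described in this review, not the code in #202: a missing position repeats the previous price, and each group of `points_per_hour` positions is averaged into one hourly value. `fill_positions`, `hourly_averages` and `points_per_hour` are hypothetical names. The sketch assumes position 1 is always present and that the period length (`total_positions`) is known, e.g. from `timeInterval` and `resolution`; `points_per_hour=2` or `1` would cover PT30M or PT60M with the same code.

```python
from typing import Dict, List


def fill_positions(points: Dict[int, float], total_positions: int) -> List[float]:
    """Expand {position: price} into a dense list; a missing position repeats the last price."""
    if 1 not in points:
        raise ValueError("position 1 must be present to forward-fill")
    prices: List[float] = []
    last = points[1]
    for position in range(1, total_positions + 1):
        last = points.get(position, last)
        prices.append(last)
    return prices


def hourly_averages(points: Dict[int, float], total_positions: int,
                    points_per_hour: int = 4) -> List[float]:
    """Average each consecutive group of `points_per_hour` positions (4 for PT15M)."""
    if total_positions % points_per_hour:
        raise ValueError("period length is not a whole number of hours")
    prices = fill_positions(points, total_positions)
    return [
        sum(prices[i:i + points_per_hour]) / points_per_hour
        for i in range(0, total_positions, points_per_hour)
    ]


# Positions 2 and 3 are missing, so they repeat the price of position 1 (40.0).
sparse = {1: 40.0, 4: 80.0, 5: 10.0, 6: 20.0, 7: 30.0, 8: 40.0}
print(hourly_averages(sparse, total_positions=8))  # [50.0, 25.0]
```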

@wonko
Author

wonko commented Oct 9, 2024

> Not sure if your test files are the same as the DE files, but your mixed file contains PT15M for 5 till 6 October, PT15M for 6 till 7 October, PT60M for 7 till 8 October
>
> Looking at the date you committed the file (October 7), I can imagine that the data retrieved gave PT15M for the past and PT60M for future prices.
>
> Are you sure there are overlapping periods in the DE case?

Seems like I forgot to add the data file for the DE test case... should be good now.

@wonko
Author

wonko commented Oct 9, 2024

> I suggest making a separate method for the PT15M logic, as we also need to cope with missing positions (where the price remains the same as the previous position).
>
> The PT15M logic can iterate through at most 4 positions and then return the hour and the average price found. That way the code works for any number of positions (max 4) in a PT15M resolution.
>
> See some (untested) code in #202

The solution would have to be generic, apparently the XML specs allow for PT30M resolutions as well...

@Pluimvee
Contributor

Pluimvee commented Oct 9, 2024

>> I suggest making a separate method for the PT15M logic, as we also need to cope with missing positions (where the price remains the same as the previous position).
>> The PT15M logic can iterate through at most 4 positions and then return the hour and the average price found. That way the code works for any number of positions (max 4) in a PT15M resolution.
>> See some (untested) code in #202
>
> The solution would have to be generic, apparently the XML specs allow for PT30M resolutions as well...

The setup in #202 allows extending it to PT30M and other resolutions.

@Pluimvee
Contributor

Pluimvee commented Oct 9, 2024

In the BE mixed file there is

  • 1 timeseries
  • with 3 periods
  • 2 periods with a resolution of PT15M (yesterday & today) and 1 period with PT60M (tomorrow)
  • the timeseries has mRID set to 1 (first response to the requested data?)

In the DE example there are

  • 2 timeseries, the first having resolution PT60M, the second PT15M
  • the first having mRID set to 1, the second to 2
  • the first having classificationSequence_AttributeInstanceComponent.position set to 1, the second to 2
  • both timeseries contain 2 periods with sequential time intervals (so no overlap in the periods)

Meaning: in the DE case the server gave 2 responses to the request. Two 'TimeSeries' with different mRID, different resolutions and different classificationSequence_AttributeInstanceComponent.position

It seems the classificationSequence_AttributeInstanceComponent.position is an optional parameter also valid in the request. Not sure what it does.

I think when there are two (or more) TimeSeries elements in the response, these can be interpreted as alternatives.
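A sketch of how the fields compared above could be read per TimeSeries with Python's standard `xml.etree.ElementTree` (Python 3.8+ for the `{*}` namespace wildcard). `SAMPLE` is a trimmed, hypothetical stand-in for a real response with a placeholder namespace, and `describe_time_series` is a hypothetical helper. The code only reports the fields; it makes no assumption about what `classificationSequence_AttributeInstanceComponent.position` means.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical response; real documents carry a different namespace URI.
SAMPLE = """<Publication_MarketDocument xmlns="urn:example">
  <TimeSeries>
    <mRID>1</mRID>
    <classificationSequence_AttributeInstanceComponent.position>1</classificationSequence_AttributeInstanceComponent.position>
    <Period><resolution>PT60M</resolution></Period>
  </TimeSeries>
  <TimeSeries>
    <mRID>2</mRID>
    <classificationSequence_AttributeInstanceComponent.position>2</classificationSequence_AttributeInstanceComponent.position>
    <Period><resolution>PT15M</resolution></Period>
  </TimeSeries>
</Publication_MarketDocument>"""


def describe_time_series(xml_text: str):
    """Summarize mRID, classification position and resolutions for each TimeSeries."""
    root = ET.fromstring(xml_text)
    summary = []
    for series in root.findall("{*}TimeSeries"):  # {*} matches any namespace
        summary.append({
            "mRID": series.findtext("{*}mRID"),
            "classification_position": series.findtext(
                "{*}classificationSequence_AttributeInstanceComponent.position"
            ),
            "resolutions": [p.findtext("{*}resolution") for p in series.findall("{*}Period")],
        })
    return summary


for item in describe_time_series(SAMPLE):
    print(item)
```

If that field turns out to identify one specific auction, selecting a single series would be a simple filter over this summary; that mapping is not confirmed here.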

@Roeland54
Collaborator

> The solution would have to be generic, apparently the XML specs allow for PT30M resolutions as well...

Yes, I agree on this. Separate methods with duplicate code are not really clean or maintainable. I even think we should go further than this: just use the data in the resolution we get from the API and make the complete integration generic across multiple resolutions, instead of forcing everything into the 60m resolution, which can be done in multiple ways and would create confusion in the future. I also think we can assume that truly mixed responses like this week's Belgian bug are not something we will see again. It would be nice to handle them, but it is not really a dealbreaker.
As a benefit, the integration becomes fully compatible with the 15min resolution, which will become the norm sometime next year.

> In the BE mixed file there is
>
>   • 1 timeseries
>   • with 3 periods
>   • 2 periods with a resolution of PT15M (yesterday & today) and 1 period with PT60M (tomorrow)
>   • the timeseries has mRID set to 1 (first response to the requested data?)
>
> In the DE example there are
>
>   • 2 timeseries, the first having resolution PT60M, the second PT15M
>   • the first having mRID set to 1, the second to 2
>   • the first having classificationSequence_AttributeInstanceComponent.position set to 1, the second to 2
>   • both timeseries contain 2 periods with sequential time intervals (so no overlap in the periods)
>
> Meaning: in the DE case the server gave 2 responses to the request. Two 'TimeSeries' with different mRID, different resolutions and different classificationSequence_AttributeInstanceComponent.position
>
> It seems the classificationSequence_AttributeInstanceComponent.position is an optional parameter also valid in the request. Not sure what it does.
>
> I think when there are two (or more) TimeSeries elements in the response, these can be interpreted as alternatives.

That the German response has multiple data sets was already explained and mentioned above. More info in this issue, where entsoe-py handles this case. Good spot on the classificationSequence_AttributeInstanceComponent.position; I didn't notice that one before. If this field makes it possible to separate SDAC and EXAA data, it could be used to easily parse only the SDAC data. We will need to see if we can find any documentation on this field.

@wonko
Author

wonko commented Oct 9, 2024

>> The solution would have to be generic, apparently the XML specs allow for PT30M resolutions as well...
>
> Yes, I agree on this. Separate methods with duplicate code are not really clean or maintainable. I even think we should go further than this: just use the data in the resolution we get from the API and make the complete integration generic across multiple resolutions, instead of forcing everything into the 60m resolution, which can be done in multiple ways and would create confusion in the future. I also think we can assume that truly mixed responses like this week's Belgian bug are not something we will see again. It would be nice to handle them, but it is not really a dealbreaker. As a benefit, the integration becomes fully compatible with the 15min resolution, which will become the norm sometime next year.

My two cents below.

The resolution is actually an ISO 8601 duration expression (P for period, followed by the duration components, e.g. PT15M). IMHO, the best way forward would be to parse the data in the given resolution, making zero assumptions, into a pandas DataFrame and then aggregate it to the wanted resolution (60 minutes in this case; this might change in the future).

Pandas can handle the resolution string: the pd.Timedelta constructor accepts ISO 8601 duration strings, and https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.isoformat.html converts back the other way. The code could then use the resample or aggregate functionality to calculate the wanted resolution. Detecting a one-to-one match between the data's resolution and the target resolution (so no resampling is needed) might be a shortcut worth considering, and might cover 95% of the cases.

And, as a bonus, if I remember correctly, pandas also has forward filling (ffill(), or the older fillna(method="ffill")) to fill the missing values.

Using pandas might give a more robust parsing and handling of the data. The code would only have to translate between the XML and the dataframes, and apply the correct math to the dataframes as needed.
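For comparison, the first step ("parse the resolution, making zero assumptions") can also be done without pandas. This is a hypothetical helper, not part of the integration: it only handles the time part of ISO 8601 durations (PTnHnMnS) with integer values and deliberately rejects date parts such as P1D or fractional seconds rather than guessing. `steps_per_hour` shows how a parsed resolution could drive hourly aggregation instead of a hard-coded 4.

```python
import re
from datetime import timedelta

# Time-only ISO 8601 durations with integer components, e.g. PT15M, PT1H, PT1H30M.
_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")


def parse_resolution(value: str) -> timedelta:
    """Parse a resolution such as 'PT15M', 'PT30M', 'PT60M' or 'PT1H'."""
    match = _DURATION.fullmatch(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    result = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    if not result:
        raise ValueError(f"zero-length duration: {value!r}")
    return result


def steps_per_hour(value: str) -> int:
    """Number of points of this resolution in one hour (PT15M -> 4)."""
    step = parse_resolution(value)
    if timedelta(hours=1) % step:
        raise ValueError(f"{value!r} does not divide an hour evenly")
    return timedelta(hours=1) // step


print(parse_resolution("PT15M"))  # 0:15:00
print(steps_per_hour("PT15M"), steps_per_hour("PT30M"), steps_per_hour("PT60M"))  # 4 2 1
```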

(I kind of got pulled into this by making this one small fix (actually just to fix my battery charging), but the problem is interesting from a developer's view, so I keep coming back... ;-) )

@Roeland54
Collaborator

Using pandas would pin us to version 2.1.4, as pandas is pinned by the Home Assistant core project. This already caused a mess before, so I am quite averse to taking on that dependency. And if I hadn't removed the dependency on entsoe-py a few weeks ago, the XML change would have broken the integration completely, and the only solution would have been writing our own XML parser from scratch. Taking dependencies is always a trade-off and an interesting topic from a developer's view. In general I love libraries; they make so much possible. But there is always a cost.

Besides that, I think aggregating or resampling the data is not the responsibility of this integration. The aim is to bring the ENTSO-E data into Home Assistant and make it usable. The data should be kept as pure and untouched as possible.

3 participants