Speed up time interval rounding around DST #56371

Merged

@nik9000 (Member) commented May 7, 2020

When an index spans a daylight saving time transition we can't use our
optimization that rewrites the requested time zone to a fixed time zone,
and instead we used to fall back to a java.time based rounding
implementation. In #55559 we optimized "time unit" rounding. This
optimizes "time interval" rounding.

The java.time based implementation is about 1650% slower than the
rounding implementation for a fixed time zone. This replaces it with a
similar optimization that is only about 30% slower than the fixed time
zone. The java.time implementation allocates a ton of short-lived
objects but the optimized implementation doesn't. So it might end up
being faster than the microbenchmarks imply.
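
To make the fixed time zone fast path concrete, here is a minimal sketch of interval rounding when the zone's UTC offset is constant over the data. It is not the Elasticsearch code: the class and method names (`FixedOffsetRoundingSketch`, `roundFixed`) are made up for illustration, and it assumes a positive interval in milliseconds.

```java
final class FixedOffsetRoundingSketch {
    /**
     * Rounds utcMillis down to the start of its interval, where intervals are
     * aligned to local time in a zone whose offset from UTC never changes.
     * Hypothetical helper for illustration; assumes intervalMillis > 0.
     */
    static long roundFixed(long utcMillis, long intervalMillis, long offsetMillis) {
        // Shift into local time, round there, then shift back to UTC.
        long localMillis = utcMillis + offsetMillis;
        // floorDiv rounds toward negative infinity, so pre-1970 values also round down.
        long roundedLocal = Math.floorDiv(localMillis, intervalMillis) * intervalMillis;
        return roundedLocal - offsetMillis;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long plus0530 = 19_800_000L; // a fixed +05:30 offset
        // 1970-01-01T00:00Z is 05:30 local; its hour starts at 05:00 local,
        // which is 1969-12-31T23:30Z, so this prints -1800000.
        System.out.println(roundFixed(0L, hour, plus0530));
    }
}
```

The whole computation is primitive `long` arithmetic with no object allocation, which is the property the description contrasts with the java.time based fallback.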

@elasticmachine (Collaborator) commented

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 7, 2020
@@ -91,7 +91,7 @@ public static Lookup lookup(ZoneId zone, long minUtcMillis, long maxUtcMillis) {
*
* @return a lookup function of {@code null} if none could be built
*/
- public static LocalTimeOffset lookupFixedOffset(ZoneId zone) {
+ public static LocalTimeOffset fixedOffset(ZoneId zone) {
Member Author

I renamed this method because the name seemed wrong to me when I read it this time around.

- @Param({ "MONTH_OF_YEAR", "HOUR_OF_DAY" })
- public String timeUnit;
+ @Param({ "calendar year", "calendar hour", "10d", "5d", "1h" })
+ public String interval;
Member Author

These add support for time interval rounding.

@@ -555,7 +555,7 @@ public long inGap(long localMillis, Gap gap) {
@Override
public long beforeGap(long localMillis, Gap gap) {
return gap.previous().localToUtc(localMillis, this);
- };
+ }
Member Author

Removed silly ; because it made my eyes hurt.


private final long interval;
private final ZoneId timeZone;
- /** For fixed offset timezones, this is the offset in milliseconds, otherwise TZ_OFFSET_NON_FIXED */
- private final long fixedOffsetMillis;
Member Author

We don't need this any more because we can just ask the lookup if it is fixed across the bounds we need.
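
As a rough illustration of what "fixed across the bounds we need" can mean, the sketch below uses only the JDK's `java.time.zone.ZoneRules` to ask whether a zone has any offset transition inside a UTC millisecond range. This is an assumption-laden stand-in, not the PR's `LocalTimeOffset` lookup; `FixedAcrossBoundsSketch` and `isFixedBetween` are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

final class FixedAcrossBoundsSketch {
    /**
     * Returns true if the zone's UTC offset is the same for every instant in
     * [minUtcMillis, maxUtcMillis]. Hypothetical helper for illustration.
     */
    static boolean isFixedBetween(ZoneId zone, long minUtcMillis, long maxUtcMillis) {
        ZoneRules rules = zone.getRules();
        if (rules.isFixedOffset()) {
            return true; // zones like UTC or +05:30 never change offset
        }
        // nextTransition returns the first transition strictly after the given instant,
        // or null if the rules define no later transition.
        ZoneOffsetTransition next = rules.nextTransition(Instant.ofEpochMilli(minUtcMillis));
        return next == null || next.getInstant().toEpochMilli() > maxUtcMillis;
    }

    public static void main(String[] args) {
        ZoneId newYork = ZoneId.of("America/New_York");
        long march = Instant.parse("2020-03-01T00:00:00Z").toEpochMilli();
        long june = Instant.parse("2020-06-01T00:00:00Z").toEpochMilli();
        long july = Instant.parse("2020-07-01T00:00:00Z").toEpochMilli();
        System.out.println(isFixedBetween(newYork, june, july));  // no transition in June 2020
        System.out.println(isFixedBetween(newYork, march, june)); // spans the March 2020 DST change
    }
}
```

With a check like this available per range, a separately stored "fixed offset or not" field for the whole zone becomes redundant, which is the point the comment makes.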

@@ -773,88 +767,32 @@ public byte id() {

@Override
public Prepared prepare(long minUtcMillis, long maxUtcMillis) {
- return prepareForUnknown();
+ long minLookup = minUtcMillis - interval;
Member Author

And here we go with the actual change!


@Override
public long nextRoundingValue(long utcMillis) {
// TODO this is used in date range's collect so we should optimize it too
Member Author

This is pretty much what we used to do. It ain't fast, but we can make it faster later.


private class JavaTimeRounding implements Prepared {
@Override
public long round(long utcMillis) {
Member Author

This is just shuffled from where it used to be.

@@ -778,6 +803,18 @@ public void testPrepareLongRangeRoundsNotToMidnight() {
assertThat(prepared.round(time("9000-03-31T15:25:15.148Z")), isDate(time("9000-03-31T15:00:00Z"), tz));
}

public void testIntervalBeforeGap() {
Member Author

These were fun examples that came up in the random tests that failed. It makes it a ton easier to rerun them if I pull them out like this.

Member

Can you add a comment/javadoc to this, please? It's unclear to me why this particular data would be problematic

@not-napoleon (Member) left a comment

Testing against java time lends a lot of confidence to this. I don't see any obvious issues with the implementation. Given both of those, I think this looks good.

}
}

private class VariableRounding implements Prepared, LocalTimeOffset.Strategy {
Member

I know this is private, but I still think a bit of javadoc to explain when you want Variable and when you want Fixed would help future generations.

// Round a whole bunch of dates and make sure they line up with the known good java time implementation
Rounding.Prepared prepared = rounding.prepare(min, max);
Rounding.Prepared javaTimeRounding = rounding.prepareJavaTime();
for (int d = 0; d < 1000; d++) {
Member

This is a good test. I like this test.

@nik9000 (Member Author) commented May 7, 2020

I've pushed some more Javadoc!

We really are relying on the randomized testing against java.time to catch stuff. Partially that is because the rules that we use for rounding, while they fit on a page, aren't always the clearest thing to fit in your head.
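
The testing approach described here, checking a cheap precomputed implementation against java.time on random inputs, can be sketched in miniature. The example below is not the PR's test and not `LocalTimeOffset`: it is a hypothetical `OffsetTableCheckSketch` that precomputes the offsets for a range and compares its lookups with `ZoneRules.getOffset` at random timestamps.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class OffsetTableCheckSketch {
    private final long[] startUtcMillis; // first UTC instant at which each offset applies
    private final long[] offsetMillis;   // the offset in force from that instant on

    /** Precomputes every offset that applies within [minUtcMillis, maxUtcMillis]. */
    OffsetTableCheckSketch(ZoneId zone, long minUtcMillis, long maxUtcMillis) {
        ZoneRules rules = zone.getRules();
        List<long[]> entries = new ArrayList<>();
        Instant min = Instant.ofEpochMilli(minUtcMillis);
        entries.add(new long[] { Long.MIN_VALUE, rules.getOffset(min).getTotalSeconds() * 1000L });
        ZoneOffsetTransition t = rules.nextTransition(min);
        while (t != null && t.getInstant().toEpochMilli() <= maxUtcMillis) {
            entries.add(new long[] { t.getInstant().toEpochMilli(), t.getOffsetAfter().getTotalSeconds() * 1000L });
            t = rules.nextTransition(t.getInstant());
        }
        startUtcMillis = new long[entries.size()];
        offsetMillis = new long[entries.size()];
        for (int i = 0; i < entries.size(); i++) {
            startUtcMillis[i] = entries.get(i)[0];
            offsetMillis[i] = entries.get(i)[1];
        }
    }

    /** Allocation-free lookup; only valid for instants inside the prepared range. */
    long offsetAt(long utcMillis) {
        int i = startUtcMillis.length - 1;
        while (utcMillis < startUtcMillis[i]) {
            i--; // terminates because the first entry starts at Long.MIN_VALUE
        }
        return offsetMillis[i];
    }

    /** Compares the table with java.time at random instants and counts disagreements. */
    static int countMismatches(ZoneId zone, long min, long max, int samples, long seed) {
        OffsetTableCheckSketch table = new OffsetTableCheckSketch(zone, min, max);
        ZoneRules rules = zone.getRules();
        Random random = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < samples; i++) {
            long utcMillis = min + (long) (random.nextDouble() * (max - min));
            long expected = rules.getOffset(Instant.ofEpochMilli(utcMillis)).getTotalSeconds() * 1000L;
            if (table.offsetAt(utcMillis) != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        long min = Instant.parse("2020-01-01T00:00:00Z").toEpochMilli();
        long max = Instant.parse("2021-01-01T00:00:00Z").toEpochMilli();
        System.out.println(countMismatches(ZoneId.of("America/New_York"), min, max, 1000, 42L));
    }
}
```

Using java.time as the oracle means the fast path never needs its own hand-written table of expected values; any random failure can then be pulled out into a named regression test, as was done with `testIntervalBeforeGap`.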

@nik9000 nik9000 merged commit 8478ee6 into elastic:master May 7, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 7, 2020
@nik9000 (Member Author) commented May 8, 2020

Here is a completed benchmark run. The results are in line with what I mentioned in the issue description.

nik9000 added a commit that referenced this pull request May 8, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 8, 2020
The `date_histogram` aggregation had an optimization where it'd rewrite
`time_zone`s whose offset from UTC is fixed across the entire index.
This rewrite is no longer needed after elastic#56371 because we can tell that a
time zone is fixed lower down in the aggregation. So this removes it.
nik9000 added a commit that referenced this pull request Jul 1, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jul 1, 2020
nik9000 added a commit that referenced this pull request Jul 1, 2020
Labels
:Analytics/Aggregations Aggregations >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.9.0 v8.0.0-alpha1
4 participants