Skip to content

[WIP] Interval utilities and ergonomics #457

Open
@JLockerman

Description

This is a meta-issue tracking various ergonomic improvements for Intervals we may want to add. It comes from some work I was doing analyzing interval quantiles. Something like

SELECT
  justify_interval((approx_percentile(0.10, uddsketch)::TEXT || ' seconds')::interval) "0.10",
  justify_interval((approx_percentile(0.30, uddsketch)::TEXT || ' seconds')::interval) "0.30",
  justify_interval((approx_percentile(0.50, uddsketch)::TEXT || ' seconds')::interval) "0.50",
  justify_interval((approx_percentile(0.70, uddsketch)::TEXT || ' seconds')::interval) "0.70",
  justify_interval((approx_percentile(0.90, uddsketch)::TEXT || ' seconds')::interval) "0.90"
FROM(
  SELECT
    percentile_agg(EXTRACT(EPOCH FROM interval))
  FROM intervals
);

If all the changes proposed here would be implemented, that code would look more like one of

SELECT
  approx_percentile(0.10, uddsketch)::seconds -> justify() "0.10",
  approx_percentile(0.30, uddsketch)::seconds -> justify() "0.30",
  approx_percentile(0.50, uddsketch)::seconds -> justify() "0.50",
  approx_percentile(0.70, uddsketch)::seconds -> justify() "0.70",
  approx_percentile(0.90, uddsketch)::seconds -> justify() "0.90"
FROM(
  SELECT
    percentile_agg(EXTRACT(EPOCH FROM interval))
  FROM intervals
);
Other possibilities

Mega-Function

SELECT
  justified_interval(seconds => approx_percentile(0.10, uddsketch)) "0.10",
  justified_interval(seconds => approx_percentile(0.30, uddsketch)) "0.30",
  justified_interval(seconds => approx_percentile(0.50, uddsketch)) "0.50",
  justified_interval(seconds => approx_percentile(0.70, uddsketch)) "0.70",
  justified_interval(seconds => approx_percentile(0.90, uddsketch)) "0.90"
FROM(
  SELECT
    percentile_agg(EXTRACT(EPOCH FROM interval))
  FROM intervals
);

Multiple Functions

SELECT
  seconds(approx_percentile(0.10, uddsketch)) -> justify() "0.10",
  seconds(approx_percentile(0.30, uddsketch)) -> justify() "0.30",
  seconds(approx_percentile(0.50, uddsketch)) -> justify() "0.50",
  seconds(approx_percentile(0.70, uddsketch)) -> justify() "0.70",
  seconds(approx_percentile(0.90, uddsketch)) -> justify() "0.90"
FROM(
  SELECT
    percentile_agg(EXTRACT(EPOCH FROM interval))
  FROM intervals
);
Full Postfix

It's an interesting thought experiment to see what this would look like if we used accessor notation for all of the operations.

SELECT
  approx_percentile(0.10, uddsketch) -> as_seconds() -> justify() "0.10",
  approx_percentile(0.30, uddsketch) -> as_seconds() -> justify() "0.30",
  approx_percentile(0.50, uddsketch) -> as_seconds() -> justify() "0.50",
  approx_percentile(0.70, uddsketch) -> as_seconds() -> justify() "0.70",
  approx_percentile(0.90, uddsketch) -> as_seconds() -> justify() "0.90"
FROM(
  SELECT
    percentile_agg(EXTRACT(EPOCH FROM interval))
  FROM intervals
);

Since I'm not comfortable implementing -> on the builtin types–due to the risk of collisions and the complexities of doing type conversion in pipelines–I don't consider consider this a serious proposal yet. Though if those two issues could be solved it would be promising.

Conversion from numbers

Problem Statement

Intervals are unusual types in that they're almost numbers, but due the fact that they can be specified in variable-length units, such as months or years, they cannot quite be treated like numbers. For certain operations(such as getting interval quantiles in approximate_percentile() the best way to handle this is to convert to some well-defined unit, such as seconds, perform the math on that, then convert back to an Interval for display purposes.

Currently the canonical way to convert a number of seconds (stored in $number) to an interval is:

($number::TEXT || ' seconds')::interval)

which is both confusing to read and unintuitive to write. A direct conversion function such as would be much easier to understand and search for. There are a number of possible ways to implement such a conversion depending on which use case we want to target. The next section expands upon my preferred one in detail, and the rest are briefly summarize other in Alternatives

Type-Based Conversion

This strategy works by creating a pseudo-type for each unit we want to convert from. These types aren't meant to be useful directly, they are actually Intervals internally and can be automatically casted to Interval. They exist to provide explicit casts from the numeric types and as something for accessors to be defined on. Our example earlier of converting a number of seconds to an Interval would look like

$number::seconds

the main advantages to this over other methods is that

  1. The unit is postfix in the usual place we expect to see units of measure.
  2. It provides us with a type we can hang accessors off of such as $number::seconds -> justify() described below.

Creating an Interval from multiple units looks something like

$num1::days + $num2::hours + $num3::minutes

Which could be seen as nice, or unpleasant depending on taste.

The main disadvantages is that it's potentially abusing casts in a way they're not designed for. Also if Postgres does finally deprecate the :: syntax this would being as nice.

Though these don't seem like deal breakers

More Details

For this method we'd create a slew of types, one for each of the units we want to convert to. This would most likely include years, months, hours, minutes, seconds, milliseconds, microseconds. Each would have an (assignment?) cast from the integer types, and possibly the float and numeric types (though how fractions are handled would need to be determined). These casts can be implemented using the current TEXT conversion path for a PoC. Each type would also have an implicit cast to Interval

Alternatives

There are two major alternatives to type-based conversion a mega-function would define a single function such as

interval_of(years=>$y, months=>$m, hours=>$h, minutes=>$m, seconds=>$s, milliseconds=>$ms, microseconds=>$us)

which can convert from any number of numbers to an Interval. In practice, few users would use all the arguments, and usage would likely look something like

interval_of(seconds=>$num)

The alternative would be to use multiple functions, one for each unit, such as

years($num)
months($num)
seconds($num)

Multiple functions would also allow us to create types per-until if we want them.

Justification and other transformations

TODO

Metadata

Assignees

No one assigned

    Labels

    feature-requestAnd area of analysis that could be made easier

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions