Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate watermarks and set event time using expressions #511

Merged
merged 2 commits into from
Jan 26, 2024

Conversation

jbeisen
Copy link
Collaborator

@jbeisen jbeisen commented Jan 25, 2024

Modify the periodic watermark generator to accept a physical expressions that it uses to calculate the watermark. The expression can be set via a virtual field in the create table statement, or it defaults to a max lateness expression. The operator is inserted via a user-defined logical node.

Add an expression to the projection in the SourceRewriter to set the event time if present.

Putting it all together, this query

CREATE TABLE input (
  price bigint,
  double_price bigint generated always as (price * 2) stored,
  my_time timestamp generated always as (cast('2020-01-01' as timestamp)) stored
) WITH (
  connector = 'filesystem',
  type = 'source',
  path = '/Users/jonaheisen/dev/data',
  format = 'parquet',
  'source.regex-pattern' = '.*\.parquet$',
  event_time_field = 'my_time',
  watermark_field = 'my_time'
);

select * from input;

generates this job graph

image

@jbeisen jbeisen changed the title Add ExpressionWatermarkGenerator operator Generate watermarks using expressions Jan 25, 2024
@jbeisen jbeisen marked this pull request as ready for review January 25, 2024 23:25
@jbeisen jbeisen force-pushed the watermark branch 2 times, most recently from 52d46ef to e5ede11 Compare January 26, 2024 00:50
@jbeisen jbeisen changed the title Generate watermarks using expressions Generate watermarks and set event time using expressions Jan 26, 2024
arroyo-df/src/lib.rs Show resolved Hide resolved
arroyo-df/src/plan_graph.rs Outdated Show resolved Hide resolved
arroyo-df/src/source_rewriter.rs Show resolved Hide resolved
Modify the periodic watermark generator to accept a physical expressions
that it uses to calculate the watermark. The expression can be set via a
virtual field in the create table statement, or it defaults to a max
lateness expression. The operator is inserted via a user-defined logical
node.

Add an expression to the projection in the SourceRewriter to set the
event time if present.
@jacksonrnewhouse jacksonrnewhouse merged commit 236d014 into dev Jan 26, 2024
8 checks passed
@jbeisen jbeisen deleted the watermark branch January 26, 2024 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants