let BigQueryGetData operator take a list of fields for the "order by" clause #39127

@lopezvit

Description

Sometimes you just need the latest value of a field (e.g. updatedAt) so that downstream operators can use that value in their own queries.
This can be done with SELECT MAX(updatedAt) [...], but that would require a lot of rewriting, whereas adding a new param, ordering_fields, would solve the same problem by allowing a query similar to:
SELECT updatedAt FROM [...] ORDER BY updatedAt DESC LIMIT 1

Example implementation (not tested):

def generate_query(self, hook: BigQueryHook) -> str:
    """Generate a SELECT query for the given dataset and table ID."""
    query = "select "
    if self.selected_fields:
        query += self.selected_fields
    else:
        query += "*"
    query += (
        f" from `{self.table_project_id or hook.project_id}.{self.dataset_id}"
        f".{self.table_id}`"
    )
    # ORDER BY must come before LIMIT in BigQuery SQL.
    if self.ordering_fields:
        query += f" order by {self.ordering_fields}"
    query += f" limit {self.max_results}"
    return query
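
For anyone who wants to check the clause order without an Airflow install, here is a minimal standalone sketch of the same query-building logic. The function name build_select_query and its parameters are hypothetical and not part of the provider; it assumes ordering_fields is a plain comma-separated string (like selected_fields) and places ORDER BY before LIMIT, which is the order BigQuery requires.

```python
from typing import Optional


def build_select_query(
    project_id: str,
    dataset_id: str,
    table_id: str,
    max_results: int,
    selected_fields: Optional[str] = None,
    ordering_fields: Optional[str] = None,
) -> str:
    """Build a SELECT statement with an optional ORDER BY clause.

    Hypothetical helper for illustration only; it mirrors the proposed
    generate_query but takes plain arguments instead of operator attributes.
    """
    query = (
        f"select {selected_fields or '*'} "
        f"from `{project_id}.{dataset_id}.{table_id}`"
    )
    # ORDER BY has to be emitted before LIMIT.
    if ordering_fields:
        query += f" order by {ordering_fields}"
    query += f" limit {max_results}"
    return query


if __name__ == "__main__":
    print(
        build_select_query(
            "my-project", "my_dataset", "my_table", 1,
            selected_fields="updatedAt",
            ordering_fields="updatedAt DESC",
        )
    )
```

Both field strings are interpolated into the query as-is, so in this sketch they are assumed to come from trusted DAG code rather than user input.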

Use case/motivation

The operator BigQueryGetData should accept one more param, ordering_fields, so that the generated query also includes an ORDER BY clause.

Related issues

#24460

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct
