Skip to content

SELECT * EXCEPT/EXCLUDE #27

@johnedquinn

Description

@johnedquinn

Description

I have personally seen several users of PartiQL with obscenely large queries using SELECT VALUE for the specific use-case of filtering out attributes from the input binding tuple. As an illustrative example, consider a table containing 26 attributes, where each attribute's name is a letter in the English alphabet.

In order to exclude an attribute (say, d) from the table in a PartiQL query, we would need to write something like:

SELECT
  a, b, c, e, ...
  q, r, s, t, u, v, w, x, y, z
FROM
  alphabet_table

Or, another common approach from PartiQL users is:

SELECT VALUE {
  'a': a,
  'b': b,
  ...
  'y': y,
  'z': z
}
FROM
  alphabet_table

I've even conversed with some individuals who needed to generate queries to exclude certain attributes from tables that have multiple thousands of attributes.

Possible Solutions

SELECT * EXCEPT (BigQuery)

A possible solution is syntax such as:

SELECT * EXCEPT (d)
FROM alphabet_table

The above is syntax supported by Google's BigQuery. Similarly, they have support for syntax such as:

SELECT * REPLACE (quantity/2 AS quantity)
FROM some_table

We could also accomplish the same as SELECT * EXCEPT by writing:

SELECT * REPLACE (MISSING AS d)
FROM alphabet_table

As MISSING is stripped from the output result in PartiQL, the above is likely valid as a replacement for EXCEPT!

Allowing LET to Modify Binding Tuples

A somewhat similar approach of replacement could be a syntax such as:

SELECT *
FROM some_table AS t
LET MISSING AS t.d

By expanding support for introducing variables in the LET to include replacing attributes of a binding tuple, this approach could be possible. However, there are some immediate cons such as ambiguity in naming for JOINs, projection pushdown, and variables introduced AFTER the LET.

Nested Data (Using both SELECT REPLACE and a function)

While SELECT * EXCEPT could be valuable for removing columns from the final projection, how could we remove attributes from any nested data (struct) in the result projection? As an example, consider the binding tuple:

<
  'a': 0,
  'b': {
    'c': 2,
    'd': 3,
    ...
  },
  'x': 23,
  'y': 25,
  'z': 26
>

How could we remove d from b while still outputting the rest of the result?

SELECT * REPLACE (
  (SELECT * EXCEPT d FROM src.b) AS b
)
FROM alphabet_nested_table AS src

This could theoretically work if we coerce src.b into a bag, but now we have a bag in the projection list. Perhaps, then, for nested attributes, implementing a function to redact a struct's columns could be easier & worthwhile. I am personally unknowledgeable on the topic of projection pushdown for nested structures, but this custom function could possibly negatively affect our ability to perform the pushdown. However, let's keep the train of thought:

Let's try to remove both y and b.d from the result:

SELECT * REPLACE (
  MISSING AS y,
  redact_columns(b, ['d']) AS b
)
FROM alphabet_nested_table

Or, with some naming:

SELECT * REPLACE (
  MISSING AS y,
  redact_columns(b, columns_to_redact_from_b) AS b
)
LET ['d'] AS columns_to_redact_from_b
FROM alphabet_nested_table

Well, that's certainly interesting and readable.

Revisiting Nested Data and Projection Pushdown

Ideally, PartiQL could support filtering of both columns and nested attributes of structs. I'll take this opportunity to explore some possible syntax:

SELECT * REPLACE (
  MISSING AS y,
  y.* EXCEPT y.b,  --- Or maybe just b. Or, even allowing just `y EXCEPT b` (not `y.*`)
  z.* REPLACE (
    some_nested_attr + 1 AS some_nested_attr,
    MISSING AS some_attr_to_be_removed
  )
)
FROM alphabet_nested_table

Then, we'd have to define rules for type mismatches, missing, null, etc. But, I'd argue that this is also interesting and readable! Even more interesting is possibly allowing replacing attributes of a struct as a first class expression! So, the following could be a valid query:

a REPLACE (
  b + 1 AS b
)
===
{
  'b': 2
}

And with highly nested data such as:

{
  'a': {
    'b': {
      'c': {
        'd': 0,
        'e': 1,
        'f': 2
      }
    }
  }
}
SELECT * REPLACE (

  -- Option # 1 (Un-nesting to replace)
  a REPLACE (
    b REPLACE (
      c REPLACE (
        d + 1 AS d
 )))

  -- Option # 2 (Unsure about this)
  a.b.c REPLACE (
    d + 1 AS d
  ) AS a.b.c

  -- Option # 3 (Same as option # 2, but we don't have the nested renaming)
  a.b.c REPLACE (
    d + 1 AS d
  )

) FROM some_highly_nested_table

Other Links

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions