Commit

docs: add details to the read operator stating that the filter applies to the pre-masked schema and must be fully satisfied
westonpace committed Jul 29, 2022
1 parent b6c3772 commit 3d0a60b
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion site/docs/relations/logical_relations.md
@@ -19,11 +19,15 @@ The read operator is an operator that produces one output. A simple example woul
| ----------------- | ------------------------------------------------------------ | ------------------------------------ |
| Definition | The contents of the read property definition. | Required |
| Direct Schema | Defines the schema of the output of the read (before any projection or emit remapping/hiding). | Required |
| Filter | A boolean Substrait expression that describes the filter of an iceberg dataset. TBD: define how field referencing works. | Optional, defaults to none. |
| Filter | A boolean Substrait expression that describes a filter that must be applied to the data. The filter should be interpreted against the direct schema and not the masked, projected schema. As a result, fields that are used for filtering do not have to be included in the output. | Optional, defaults to none. |
| Projection | A masked complex expression describing the portions of the content that should be read. | Optional, defaults to all of the schema. |
| Output properties | Declaration of orderedness and/or distribution properties this read produces. | Optional, defaults to no properties. |
| Properties | A list of name/value pairs associated with the read. | Optional, defaults to empty |
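
The relationship between the filter and the projection can be illustrated with a minimal Python sketch. It is not part of Substrait or any consumer; the `read` function, the column names, and the sample rows are all hypothetical. It only shows that the filter is evaluated against the direct schema, so a column used for filtering (`region` here) does not need to appear in the projected output.

```python
# Hypothetical in-memory model of a read; none of these names come from Substrait.
DIRECT_SCHEMA = ["id", "region", "amount"]

ROWS = [
    {"id": 1, "region": "eu", "amount": 10},
    {"id": 2, "region": "us", "amount": 25},
    {"id": 3, "region": "eu", "amount": 40},
]


def read(rows, schema, row_filter=None, projection=None):
    """Filter against the direct schema first, then mask to the projection."""
    # The filter sees every column of the direct schema.
    kept = [r for r in rows if row_filter is None or row_filter(r)]
    # The projection defaults to all of the schema.
    columns = projection if projection is not None else list(schema)
    return [{c: r[c] for c in columns} for r in kept]


# Filter on "region", which is not part of the projected output.
result = read(
    ROWS,
    DIRECT_SCHEMA,
    row_filter=lambda r: r["region"] == "eu",
    projection=["id", "amount"],
)
```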

### Read Filtering

Consumers can often take advantage of a ReadRel's filter property, combined with file metadata, statistics, and indices, to reduce the amount of data that needs to be read. This technique is often referred to as "pushdown filtering". In many cases the pushdown is inexact and can only partially satisfy the filter. In these cases the consumer must apply an in-memory filtering operation to the data it reads so that the filter is fully satisfied.
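
As a rough illustration of inexact pushdown, the following Python sketch assumes a hypothetical file layout in which each row group carries min/max statistics for an `amount` column, and a filter of the form `amount > threshold`. The `RowGroup` class and `scan` function are invented for this example and do not correspond to any real consumer API. The statistics let the scan skip whole row groups, but a row group that overlaps the threshold still contains non-matching rows, so an in-memory filter is applied to whatever is read.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical row group with min/max statistics for the "amount" column."""
    min_amount: int
    max_amount: int
    rows: list


def scan(row_groups, threshold):
    """Return rows with amount > threshold and the number of row groups read."""
    groups_read = 0
    matches = []
    for group in row_groups:
        # Pushdown step: statistics prove that no row in this group can match.
        if group.max_amount <= threshold:
            continue
        groups_read += 1
        # Residual step: the statistics are inexact, so filter the rows in memory.
        matches.extend(r for r in group.rows if r["amount"] > threshold)
    return matches, groups_read


groups = [
    RowGroup(1, 15, [{"amount": 1}, {"amount": 15}]),
    RowGroup(10, 30, [{"amount": 10}, {"amount": 30}]),
    RowGroup(40, 50, [{"amount": 40}, {"amount": 50}]),
]
matches, groups_read = scan(groups, threshold=20)
```

In this sketch the first row group is skipped entirely, while the second is read even though only one of its rows satisfies the filter; the in-memory step is what removes the other row.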

### Read Definition Types

Read definition types are built by the community and added to the specification. This is a portion of the specification that is expected to grow rapidly.