Merged
1 change: 1 addition & 0 deletions sdks/python/apache_beam/yaml/examples/README.md
@@ -28,6 +28,7 @@
* [Blueprints](#blueprints)
* [Element-wise](#element-wise)
* [IO](#io)
* [Jinja](#jinja)
* [ML](#ml)

<!-- TOC -->
76 changes: 75 additions & 1 deletion website/www/site/content/en/documentation/sdks/yaml.md
@@ -708,7 +708,7 @@ the yaml file can be parameterized with externally provided variables using
the [jinja variable syntax](https://jinja.palletsprojects.com/en/stable/templates/#variables).
The values are then passed via a `--jinja_variables` command line flag.

For example, one could start a pipeline with
For example, one could start a pipeline with:

```
pipeline:
@@ -742,6 +742,80 @@ or writing dated sources and sinks, e.g.

would write to files like `gs://path/to/2016/08/04/dated-output*.json`.

The Jinja `{% include %}` directive can also be used to pull in other common templates:

<PATH_TO_YOUR_REPO>/pipeline.yaml
```yaml
pipeline:
  transforms:
    - name: Read from GCS
      type: ReadFromText
      config:
        # NOTE: For include, the indentation has to line up correctly for it to be
        # parsed correctly. So in this example the included readFromText.yaml has
        # already indented yaml lines to line up correctly when including into this
        # pipeline here.
        {% include '<PATH_TO_YOUR_REPO>/submodules/readFromText.yaml' %}
    - name: Write to GCS
      type: WriteToText
      input: Read from GCS
      config:
        path: "gs://MY-BUCKET/wordCounts/"
```

<PATH_TO_YOUR_REPO>/submodules/readFromText.yaml
```yaml
path: {{readFromText.path}}
```

This pipeline can be run like this:

```sh
python -m apache_beam.yaml.main \
    --yaml_pipeline_file=pipeline.yaml \
    --jinja_variables='{"readFromText": {"path": "gs://dataflow-samples/shakespeare/kinglear.txt"}}'
```
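
When the variables come from a script rather than being typed by hand, writing JSON inside shell quotes is easy to get wrong. The following is a minimal Python sketch, assuming only what the command above shows: that `--jinja_variables` takes a JSON object. The helper name `build_beam_yaml_command` is hypothetical and not part of Beam.

```python
import json
import shlex


def build_beam_yaml_command(pipeline_file, jinja_variables):
    """Return an argv list equivalent to the shell command above.

    Hypothetical helper for illustration; it is not part of Apache Beam.
    """
    return [
        "python", "-m", "apache_beam.yaml.main",
        f"--yaml_pipeline_file={pipeline_file}",
        # json.dumps serializes the nested dict into the JSON object
        # that is passed as the flag's value.
        f"--jinja_variables={json.dumps(jinja_variables)}",
    ]


variables = {
    "readFromText": {
        "path": "gs://dataflow-samples/shakespeare/kinglear.txt"
    }
}
command = build_beam_yaml_command("pipeline.yaml", variables)

# shlex.join quotes each argument so the printed line can be pasted
# into a POSIX shell.
print(shlex.join(command))

# To launch the pipeline directly (requires apache_beam to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting altogether, which is why the sketch builds an argument list rather than a single string.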

The Jinja `{% import %}` directive can also be used to pull in macros:

<PATH_TO_YOUR_REPO>/pipeline.yaml
```yaml
{% import '<PATH_TO_YOUR_REPO>/macros.yaml' as macros %}

pipeline:
  type: chain
  transforms:

    # Read in text file
{{ macros.readFromText(readFromText) | indent(4, true) }}

    # Write to text file on GCS, locally, etc
    - name: Write to GCS
      type: WriteToText
      input: Read from GCS
      config:
        path: "gs://MY-BUCKET/wordCounts/"
```

<PATH_TO_YOUR_REPO>/macros.yaml
```yaml
{%- macro readFromText(params) -%}
- name: Read from GCS
  type: ReadFromText
  config:
    path: "{{ params.path }}"
{%- endmacro -%}
```

This pipeline can be run with the same command as in the `{% include %}`
example above.
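
The `indent(4, true)` filter in the import example is what places the macro output at the right nesting level: the second argument tells Jinja to indent the first line as well as the rest. The sketch below is a rough standard-library analogue, not Jinja itself. It assumes the macro renders a list item with conventional two-space YAML nesting, and uses `textwrap.indent` to show the difference between indenting every line and leaving the first line alone.

```python
import textwrap

# What the readFromText macro is assumed to render for the example
# variables (an assumption for illustration, not captured Jinja output).
macro_output = (
    '- name: Read from GCS\n'
    '  type: ReadFromText\n'
    '  config:\n'
    '    path: "gs://dataflow-samples/shakespeare/kinglear.txt"'
)

# Analogue of `| indent(4, true)`: every non-blank line, including the
# first, is prefixed with four spaces.
indented = textwrap.indent(macro_output, " " * 4)

# Analogue of `| indent(4)`: the first line is left untouched and only
# the following lines are prefixed.
first, _, rest = macro_output.partition("\n")
indented_rest_only = first + "\n" + textwrap.indent(rest, " " * 4)

print(indented)
print("---")
print(indented_rest_only)
```

With the first line indented by the filter, the `{{ ... }}` expression can sit at column zero in `pipeline.yaml` and the rendered transform still lines up under `transforms:`.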

Jinja supports many other ways to import templates, as well as template
inheritance; see the Jinja documentation on [import](https://jinja.palletsprojects.com/en/stable/templates/#import)
and [template inheritance](https://jinja.palletsprojects.com/en/stable/templates/#inheritance).

Full Jinja pipeline examples can be found in the [Beam YAML examples directory](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples/transforms/jinja).

## Other Resources

* [Example pipeline](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/yaml/examples)