Commit 662ed72

docs: batching
1 parent ffdbfd3 commit 662ed72

3 files changed

+23
-28
lines changed

README.md

Lines changed: 22 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -1,9 +1,9 @@
1-
### BeamSnowflakeExamples
1+
## BeamSnowflakeExamples
22

33
This repository contains examples of using [Snowflake](https://www.snowflake.com/) with [Apache Beam](https://github.com/apache/beam).
44
Specifically, it contains batching, streaming, and cross-language usage examples.
55

6-
#### Setup of third parties
6+
### Setup required by all examples:
77
1. [Create Snowflake Account](https://trial.snowflake.com/?utm_cta=website-homepage-hero-free-trial&_ga=2.199198959.1328097007.1590138521-373661872.1583847959)
88
with Google Cloud Platform as a cloud provider.
99
2. Make sure that your default role for your username is set to ACCOUNTADMIN
@@ -26,13 +26,23 @@ with Google Cloud Platform as a cloud provider.
2626
STORAGE_ALLOWED_LOCATIONS = ('gcs://<BUCKET NAME>');
2727
```
2828
Please note that the `gcs` prefix is used here, not `gs`.
29-
8. Authorize Snowflake to operate on your bucket by following [Grant the Service Account Permissions to Access Bucket Objects](https://docs.snowflake.com/en/user-guide/data-load-gcs-config.html#step-3-grant-the-service-account-permissions-to-access-bucket-objects)
29+
8. Authorize Snowflake to operate on your bucket by following [Step 3. Grant the Service Account Permissions to Access Bucket Objects](https://docs.snowflake.com/en/user-guide/data-load-gcs-config.html#step-3-grant-the-service-account-permissions-to-access-bucket-objects)
3030
9. Set up gcloud on your computer by following [Using the Google Cloud SDK installer](https://cloud.google.com/sdk/docs/downloads-interactive)
3131
10. Run one of the provided examples.
3232
33-
#### Batching example
34-
An example that contains batch writing and reading from Snowflake. Inspired by [Apache Beam/WordCount-example](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java)
33+
### Batching example
34+
This example contains batch writing into Snowflake and batch reading from Snowflake. Inspired by [Apache Beam/WordCount-example](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java).
3535
36+
The example consists of two pipelines:
37+
* Writing into Snowflake
38+
1. Reading files from the location provided by the `inputFile` argument.
39+
2. Counting words
40+
3. Writing counts into Snowflake table provided by `tableName` argument.
41+
* Reading from Snowflake
42+
1. Reading counts from Snowflake table provided by `tableName` argument.
43+
2. Writing counts into the location provided by the `output` argument.
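
The "Counting words" step can be illustrated without Beam or Snowflake. The following is a minimal plain-Java sketch, not the repository's pipeline code: the class name `WordCountSketch` and the method `countWords` are hypothetical, and the letter-based tokenization is an assumption modeled on the Apache Beam WordCount example that this example is inspired by.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the "Counting words" step; hypothetical, not the repository's Beam code.
final class WordCountSketch {

    // Assumption: a word is a run of letters, so everything else acts as a separator.
    private static final String TOKENIZER_PATTERN = "[^\\p{L}]+";

    // Counts every word across the given lines; a TreeMap keeps the result sorted by word.
    static Map<String, Long> countWords(Iterable<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(TOKENIZER_PATTERN)) {
                // split() can yield an empty first token when the line starts with a separator.
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                countWords(Arrays.asList("to be or not to be", "be quick"));
        // Each entry corresponds to one (word, count) row that a pipeline would write out.
        counts.forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

In the actual example, each resulting (word, count) pair would presumably become one row in the table given by `tableName`; the sketch only prints them.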
44+
45+
#### Executing:
3646
1. Run the batching example by executing the following command:
3747
```
3848
./gradlew run --args=" /
@@ -43,6 +53,7 @@ An example that contains batch writing and reading from Snowflake. Inspired by [
4353
--password=<SNOWFLAKE PASSWORD> /
4454
--database=<SNOWFLAKE DATABASE> /
4555
--schema=<SNOWFLAKE SCHEMA> /
56+
--tableName=<SNOWFLAKE TABLE NAME> /
4657
--storageIntegration=<SNOWFLAKE STORAGE INTEGRATION NAME> /
4758
--stagingBucketName=<GCS BUCKET NAME> /
4859
--runner=<DirectRunner/DataflowRunner> /
@@ -51,17 +62,17 @@ An example that contains batch writing and reading from Snowflake. Inspired by [
5162
--region=<FOR DATAFLOW RUNNER: GCP REGION> /
5263
--appName=<OPTIONAL: DATAFLOW JOB NAME PREFIX>"
5364
```
54-
2. Go to Snowflake console to check saved counts
65+
2. Go to Snowflake console to check saved counts:
5566
```
56-
select * from <DATABASE NAME>.<SCHEMA NAME>.WORD_COUNT;
67+
select * from <DATABASE NAME>.<SCHEMA NAME>.WORD_COUNT;
5768
```
5869
![Batching snowflake result](./images/batching_snowflake_result.png)
59-
3. Go to GCS bucket to check saved files
70+
3. Go to GCS bucket to check saved files:
6071
![Batching gcs result](./images/batching_gcs_result.png)
61-
4. Go to DataFlow to check submitted jobs
72+
4. Go to Dataflow to check submitted jobs:
6273
![Batching DataFlow result](./images/batching_dataflow_result.png)
6374
6475
65-
#### Streaming example
76+
### Streaming example
6677
67-
#### Cross-language example
78+
### Cross-language example

src/main/java/batching/SnowflakeWordCount.java

Lines changed: 1 addition & 14 deletions
@@ -31,20 +31,7 @@
3131
/**
3232
* An example that contains batch writing and reading from Snowflake. Inspired by Apache Beam/WordCount-example(https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java)
3333
*
34-
* An example consists of two piplines:
35-
* a) Writing into Snowflake
36-
* b) Reading from Snowflake
37-
*
38-
* The flow of writing into Snowflake is following:
39-
* 1. Reading provided files
40-
* 2. Counting words
41-
* 3. Writing counts into Snowflake
42-
*
43-
* The flow of reading from Snowflake is following:
44-
* 1. Reading counts from Snowflake
45-
* 2. Writing counts into output
46-
*
47-
* Check main README for executing
34+
* Check main README for more information.
4835
*/
4936
public class SnowflakeWordCount {
5037

src/main/java/batching/WordCountRow.java

Lines changed: 0 additions & 3 deletions
@@ -2,9 +2,6 @@
22

33
import java.io.Serializable;
44

5-
/**
6-
* TODO
7-
*/
85
public class WordCountRow implements Serializable {
96
private String word;
107
private Long count;
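
The diff shows only the two fields of `WordCountRow` and the fact that it implements `Serializable`. As a hedged illustration of what that interface provides, the sketch below uses a hypothetical stand-in class, `WordCountRowSketch`; its constructor, getters, and `roundTrip` helper are assumptions and do not appear in the commit. It shows that such a row survives standard Java serialization, which is presumably what lets the pipeline pass rows between workers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for WordCountRow; only the two fields come from the diff.
final class WordCountRowSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String word;
    private final Long count;

    WordCountRowSketch(String word, Long count) {
        this.word = word;
        this.count = count;
    }

    String getWord() {
        return word;
    }

    Long getCount() {
        return count;
    }

    // Serializes the row to bytes and reads it back, returning the reconstructed copy.
    static WordCountRowSketch roundTrip(WordCountRowSketch row) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(row);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (WordCountRowSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Row could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        WordCountRowSketch copy = roundTrip(new WordCountRowSketch("beam", 3L));
        System.out.println(copy.getWord() + ": " + copy.getCount());
    }
}
```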
