Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[New example] XML to BigQuery via Dataflow #716

Merged
merged 8 commits into from
Nov 15, 2021

Conversation

aymanfarhat
Copy link
Member

Dataflow XML to BigQuery via XMLIO

Adding an example related to loading XML data into a BigQuery nested table via Dataflow and XMLIO. I didn't add unit tests as requested since I found the example to be simple enough not to need one. Feel free to suggest thought if you have any test ideas in mind and I can update accordingly.

@pull-request-size pull-request-size bot added the size/XL Denotes a PR that changes 500-999 lines. label Nov 11, 2021
@google-cla google-cla bot added the cla: yes All committers have signed a CLA label Nov 11, 2021
@iht iht self-requested a review November 12, 2021 08:15
@iht
Copy link
Member

iht commented Nov 12, 2021

/gcbrun

@iht
Copy link
Member

iht commented Nov 12, 2021

Before we continue with the review, please could you fix the build? There are style issues with the following files:

Some files need to be formatted in ./examples/dataflow-xmlio-to-bq - FAIL
./examples/dataflow-xmlio-to-bq/src/main/java/xml2bq/ShipInfo.java
./examples/dataflow-xmlio-to-bq/src/main/java/xml2bq/XmlIoDemo.java
./examples/dataflow-xmlio-to-bq/src/main/java/xml2bq/Order.java

You can check the style and fix it automatically using the google-java-format tool (https://github.com/google/google-java-format). For IntelliJ and other IDEs, you have plugins that do it automatically (just in case you are using an IDE with a supported plugin).

@pull-request-size pull-request-size bot added size/XXL Denotes a PR that changes 1000+ lines. and removed size/XL Denotes a PR that changes 500-999 lines. labels Nov 12, 2021
@iht
Copy link
Member

iht commented Nov 12, 2021

/gcbrun

@aymanfarhat
Copy link
Member Author

/gcbrun

@aymanfarhat
Copy link
Member Author

aymanfarhat commented Nov 12, 2021

@iht Thanks for the tip, I guess the checks are now passing. Had an issue between the IDE formatter and the jar based formatter for some reason. Mainly related to spaces and import ordering it seems.

@iht
Copy link
Member

iht commented Nov 15, 2021

Reviewing now. Thanks for fixing the build.

Copy link
Member

@iht iht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment about exporting credentials to run Dataflow.

## Running the pipeline

```
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/application-credentials.json" # If running locally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exporting a key file is not necessary if you use gcloud auth application-default login to use your personal account, or gcloud config set auth/impersonate_service_account SERVICE_ACCOUNT (assuming you have permission to use that service account). By using any of the above options, you don't have to export credentials, which are always a risk, and it is against the Google Cloud recommendations (https://services.google.com/fh/files/misc/google-cloud-security-foundations-guide.pdf).

Please could you change that line and include a paragraph explaining that the user should have gcloud installed, and use any of the above commands? (and possibly that it would work "out of the box" if they are using the Cloud Shell).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. Ah yes good point, sorry I missed that. README updated.

* See the License for the specific language governing permissions and
* limitations under the License.
*/
package xml2bq;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe com.google.cloud.pso.xml2bq

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@aymanfarhat
Copy link
Member Author

/gcbrun

Copy link
Member

@iht iht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your contribution!!

@iht iht merged commit b2eda70 into GoogleCloudPlatform:main Nov 15, 2021
@aymanfarhat
Copy link
Member Author

@iht Thanks for the review!

@aymanfarhat aymanfarhat deleted the feature/xml2bq-demo branch November 15, 2021 15:29
rosmo pushed a commit to rosmo/professional-services that referenced this pull request Mar 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes All committers have signed a CLA size/XXL Denotes a PR that changes 1000+ lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants