-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[New example] XML to BigQuery via Dataflow #716
[New example] XML to BigQuery via Dataflow #716
Conversation
/gcbrun |
Before we continue with the review, please could you fix the build? There are style issues with the following files:
You can check the style and fix it automatically using the |
/gcbrun |
/gcbrun |
@iht Thanks for the tip, I guess the checks are now passing. Had an issue between the IDE formatter and the jar based formatter for some reason. Mainly related to spaces and import ordering it seems. |
Reviewing now. Thanks for fixing the build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment about exporting credentials to run Dataflow.
## Running the pipeline | ||
|
||
``` | ||
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/application-credentials.json" # If running locally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exporting a key file is not necessary if you use gcloud auth application-default login
to use your personal account, or gcloud config set auth/impersonate_service_account SERVICE_ACCOUNT
(assuming you have permission to use that service account). By using any of the above options, you don't have to export credentials, which are always a risk, and it is against the Google Cloud recommendations (https://services.google.com/fh/files/misc/google-cloud-security-foundations-guide.pdf).
Please could you change that line and include a paragraph explaining that the user should have gcloud
installed, and use any of the above commands? (and possibly that it would work "out of the box" if they are using the Cloud Shell).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. Ah yes good point, sorry I missed that. README updated.
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package xml2bq; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe com.google.cloud.pso.xml2bq
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/gcbrun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your contribution!!
@iht Thanks for the review! |
Dataflow XML to BigQuery via XMLIO
Adding an example related to loading XML data into a BigQuery nested table via Dataflow and XMLIO. I didn't add unit tests as requested since I found the example to be simple enough not to need one. Feel free to suggest thought if you have any test ideas in mind and I can update accordingly.