DLP samples #752
Merged
9 commits
- 8f1feb0 Initial commit : DLP samples (jabubake)
- 89278e3 Merge branch 'master' into dlp-samples (lesv)
- 511a22b fixes from code review (jabubake)
- 8d72ade Merge branch 'dlp-samples' of https://github.com/GoogleCloudPlatform/… (jabubake)
- 2d5850f updating to release version (jabubake)
- bf00d59 Merge branch 'master' of https://github.com/GoogleCloudPlatform/java-… (jabubake)
- 99bdc33 simplifying env vars used in integration tests. (jabubake)
- cb85e68 cleanup (jabubake)
- 6ce5d8c Merge branch 'master' into dlp-samples (jabubake)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
# Cloud Data Loss Prevention (DLP) API Samples | ||
The [Data Loss Prevention API](https://cloud.google.com/dlp/docs/) provides programmatic access to | ||
a powerful detection engine for personally identifiable information and other privacy-sensitive data | ||
in unstructured data streams. | ||
|
||
## Setup | ||
- A Google Cloud project with billing enabled | ||
- [Enable](https://console.cloud.google.com/launcher/details/google/dlp.googleapis.com) the DLP API. | ||
- (Local testing) [Create a service account](https://cloud.google.com/docs/authentication/getting-started) | ||
and set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of the downloaded credentials file. | ||
|
||
## Build | ||
This project uses the [Assembly Plugin](https://maven.apache.org/plugins/maven-assembly-plugin/usage.html) to build an uber jar. | ||
Run: | ||
``` | ||
mvn clean package | ||
``` | ||
|
||
## Retrieve InfoTypes | ||
An [InfoType identifier](https://cloud.google.com/dlp/docs/infotypes-categories) represents an element of sensitive data. | ||
|
||
[Info types](https://cloud.google.com/dlp/docs/infotypes-reference#global) are updated periodically. Use the API to retrieve the most current | ||
info types for a given category, e.g. HEALTH or GOVERNMENT. | ||
``` | ||
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata -category GOVERNMENT | ||
``` | ||
|
||
## Retrieve Categories | ||
[Categories](https://cloud.google.com/dlp/docs/infotypes-categories) provide a way to easily access a group of related InfoTypes. | ||
``` | ||
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata | ||
``` | ||
|
||
## Inspect data for sensitive elements | ||
Inspect strings, local files, files on Google Cloud Storage, and Cloud Datastore kinds with the DLP API. | ||
|
||
Note: image scanning is not currently supported on Google Cloud Storage. | ||
For more information, refer to the [API documentation](https://cloud.google.com/dlp/docs). | ||
Optional flags are explained in [this resource](https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/inspect#InspectConfig). | ||
``` | ||
Commands: | ||
-s <string> Inspect a string using the Data Loss Prevention API. | ||
-f <filepath> Inspects a local text, PNG, or JPEG file using the Data Loss Prevention API. | ||
-gcs -bucketName <bucketName> -fileName <fileName> Inspects a text file stored on Google Cloud Storage using the Data Loss | ||
Prevention API. | ||
-ds -projectId [projectId] -namespace [namespace] -kind <kind> Inspect a Datastore instance using the Data Loss Prevention API. | ||
|
||
Options: | ||
--help Show help | ||
-minLikelihood [string] [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"] | ||
[default: "LIKELIHOOD_UNSPECIFIED"] | ||
specifies the minimum reporting likelihood threshold. | ||
-f, --maxFindings [number] [default: 0] | ||
maximum number of results to retrieve | ||
-q, --includeQuote [boolean] [default: true] include matching string in results | ||
-t, --infoTypes restrict to limited set of infoTypes [ default: []] | ||
[ eg. PHONE_NUMBER US_PASSPORT] | ||
``` | ||
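The `-minLikelihood` option above describes a threshold: findings below the chosen likelihood are not reported. A minimal sketch of that idea follows. The `Likelihood` enum mirrors the choices listed in the help text, but the `Finding` class and the `atLeast` helper are hypothetical stand-ins written for this illustration and are not part of the DLP client library. It also assumes that `LIKELIHOOD_UNSPECIFIED` filters nothing, which may differ from how the service treats the default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: these types are not the DLP client library's.
class LikelihoodFilterSketch {

  // Declared from least to most likely so ordinal() can be compared directly.
  enum Likelihood {
    LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY
  }

  // Minimal stand-in for a single inspection finding.
  static final class Finding {
    final String infoType;
    final Likelihood likelihood;

    Finding(String infoType, Likelihood likelihood) {
      this.infoType = infoType;
      this.likelihood = likelihood;
    }
  }

  // Keeps only the findings whose likelihood is at or above the threshold.
  // Assumption: LIKELIHOOD_UNSPECIFIED as the threshold keeps everything.
  static List<Finding> atLeast(List<Finding> findings, Likelihood min) {
    List<Finding> kept = new ArrayList<>();
    for (Finding f : findings) {
      if (f.likelihood.ordinal() >= min.ordinal()) {
        kept.add(f);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Finding> findings = Arrays.asList(
        new Finding("PHONE_NUMBER", Likelihood.VERY_LIKELY),
        new Finding("US_PASSPORT", Likelihood.UNLIKELY));
    for (Finding f : atLeast(findings, Likelihood.LIKELY)) {
      System.out.println(f.infoType + " " + f.likelihood);
    }
  }
}
```

In the real samples the threshold is sent to the service with the request, so filtering happens server-side; the sketch only shows the ordering the option relies on.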
### Examples | ||
- Inspect a string: | ||
``` | ||
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is me@somedomain.com" | ||
``` | ||
- Inspect a local file (text / image): | ||
``` | ||
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f resources/test.txt | ||
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f resources/test.png | ||
``` | ||
- Inspect a file on Google Cloud Storage: | ||
``` | ||
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -gcs -bucketName my-bucket -fileName my-file.txt | ||
``` | ||
- Inspect a Google Cloud Datastore kind: | ||
``` | ||
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -ds -kind my-kind | ||
``` | ||
|
||
## Automatic redaction of sensitive data | ||
[Automatic redaction](https://cloud.google.com/dlp/docs/classification-redaction) produces output in which sensitive data matches are replaced with a string you supply. | ||
|
||
``` | ||
Commands: | ||
-s <string> Source input string | ||
-r <replacement string> String to replace detected info types | ||
Options: | ||
--help Show help | ||
-minLikelihood [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"] | ||
[default: "LIKELIHOOD_UNSPECIFIED"] | ||
specifies the minimum reporting likelihood threshold. | ||
|
||
-infoTypes restrict operation to limited set of info types [ default: []] | ||
[ eg. PHONE_NUMBER US_PASSPORT] | ||
``` | ||
|
||
### Example | ||
- Replace sensitive data in text with `_REDACTED_`: | ||
``` | ||
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Redact -s "My phone number is (123) 456-7890 and my email address is me@somedomain.com" -r "_REDACTED_" | ||
``` | ||
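To make the shape of redacted output concrete, here is a rough local sketch that replaces matches with a caller-supplied string. It is not how the `Redact` sample works: the two regular expressions are deliberately naive stand-ins for the service's detectors, and `RedactSketch` and `redact` are names invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical local stand-in; real detection is performed by the DLP service.
class RedactSketch {

  // Naive patterns chosen only to match the example input in this README.
  private static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");
  private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

  // Replaces every phone-like and email-like match with the replacement string.
  static String redact(String input, String replacement) {
    // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
    String quoted = Matcher.quoteReplacement(replacement);
    String withoutPhones = PHONE.matcher(input).replaceAll(quoted);
    return EMAIL.matcher(withoutPhones).replaceAll(quoted);
  }

  public static void main(String[] args) {
    String input =
        "My phone number is (123) 456-7890 and my email address is me@somedomain.com";
    System.out.println(redact(input, "_REDACTED_"));
  }
}
```

The value of the API over a sketch like this is that detection covers many info types and formats that simple patterns miss.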
|
||
## Integration tests | ||
### Setup | ||
- [Create a Google Cloud Storage bucket](https://console.cloud.google.com/storage) and upload [test.txt](src/test/resources/test.txt). | ||
- [Create a Google Cloud Datastore](https://console.cloud.google.com/datastore) kind and add an entity with properties: | ||
- `property1` : john@doe.com | ||
- `property2` : 343-343-3435 | ||
- Ensure the following environment variables are set: | ||
- `GOOGLE_APPLICATION_CREDENTIALS` points to an authorized service account credentials file. | ||
- `DLP_BUCKET_ID` points to the Google Cloud Storage bucket that contains the sample text document. | ||
- `DLP_DATASTORE_KIND` points to a Datastore kind under the default project. | ||
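The environment variables listed above can be checked before running the integration tests. The following is a small sketch of such a pre-flight check; `EnvCheckSketch` and `missing` are hypothetical names for this illustration and do not exist in the samples, which may handle unset variables differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check; not part of the DLP samples.
class EnvCheckSketch {

  // Variable names taken from the integration test setup list.
  static final List<String> REQUIRED = Arrays.asList(
      "GOOGLE_APPLICATION_CREDENTIALS", "DLP_BUCKET_ID", "DLP_DATASTORE_KIND");

  // Returns the required variables that are unset or blank in the given environment.
  static List<String> missing(Map<String, String> env) {
    List<String> result = new ArrayList<>();
    for (String name : REQUIRED) {
      String value = env.get(name);
      if (value == null || value.trim().isEmpty()) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> absent = missing(System.getenv());
    if (absent.isEmpty()) {
      System.out.println("All required environment variables are set.");
    } else {
      System.out.println("Missing environment variables: " + absent);
    }
  }
}
```

Passing the environment in as a map keeps the check easy to exercise without changing the real process environment.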
|
||
### Run | ||
Run all tests: | ||
``` | ||
mvn clean verify | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<!-- | ||
Copyright 2017 Google Inc. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
--> | ||
<!-- [START pom] --> | ||
<project> | ||
<modelVersion>4.0.0</modelVersion> | ||
<packaging>jar</packaging> | ||
<groupId>com.example</groupId> | ||
<artifactId>dlp-samples</artifactId> | ||
<version>1.0</version> | ||
|
||
<!-- Parent defines config for testing & linting. --> | ||
<parent> | ||
<artifactId>doc-samples</artifactId> | ||
<groupId>com.google.cloud</groupId> | ||
<version>1.0.0</version> | ||
<relativePath>..</relativePath> | ||
</parent> | ||
|
||
<properties> | ||
<maven.compiler.source>1.8</maven.compiler.source> | ||
<maven.compiler.target>1.8</maven.compiler.target> | ||
<google.auth.version>0.7.0</google.auth.version> | ||
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> | ||
</properties> | ||
|
||
<!-- Temporary workaround for known issue : https://github.com/GoogleCloudPlatform/google-cloud-java/issues/2192 --> | ||
<dependencyManagement> | ||
<dependencies> | ||
<dependency> | ||
<groupId>com.google.auth</groupId> | ||
<artifactId>google-auth-library-credentials</artifactId> | ||
<version>${google.auth.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>com.google.auth</groupId> | ||
<artifactId>google-auth-library-oauth2-http</artifactId> | ||
<version>${google.auth.version}</version> | ||
</dependency> | ||
</dependencies> | ||
</dependencyManagement> | ||
<!-- End of workaround --> | ||
|
||
<dependencies> | ||
<!-- [START dlp_maven] --> | ||
<dependency> | ||
<groupId>com.google.cloud</groupId> | ||
<artifactId>google-cloud-dlp</artifactId> | ||
<!-- TODO update with release version --> | ||
Note: this needs to be updated to a release version prior to merge. |
||
<version>0.20.2-alpha-SNAPSHOT</version> | ||
</dependency> | ||
<!-- [END dlp_maven] --> | ||
<dependency> | ||
<groupId>commons-cli</groupId> | ||
<artifactId>commons-cli</artifactId> | ||
<version>1.4</version> | ||
</dependency> | ||
<!-- Test dependencies --> | ||
<dependency> | ||
<groupId>junit</groupId> | ||
<artifactId>junit</artifactId> | ||
<version>4.12</version> | ||
</dependency> | ||
</dependencies> | ||
<!-- Build jar with dependencies for testing --> | ||
<build> | ||
<plugins> | ||
<plugin> | ||
<artifactId>maven-assembly-plugin</artifactId> | ||
<version>3.0.0</version> | ||
<configuration> | ||
<descriptorRefs> | ||
<descriptorRef>jar-with-dependencies</descriptorRef> | ||
</descriptorRefs> | ||
</configuration> | ||
<executions> | ||
<execution> | ||
<id>make-assembly</id> <!-- this is used for inheritance merges --> | ||
<phase>package</phase> <!-- bind to the packaging phase --> | ||
<goals> | ||
<goal>single</goal> | ||
</goals> | ||
</execution> | ||
</executions> | ||
</plugin> | ||
</plugins> | ||
</build> | ||
</project> | ||
<!-- [END pom] --> |
Both of these are still required?