Data Labeling Beta samples (#2096)
* add files

* update create_annotation_spec_set and test

* add requirements.txt

* update create_instruction and test

* update import data and test

* add label image and test

* add label_text test

* add label_video_test

* add manage dataset and tests

* flake

* fix

* add README
beccasaurus authored and dizcology committed Apr 5, 2019
1 parent c4a0e6d commit d487509
Showing 18 changed files with 1,270 additions and 0 deletions.
78 changes: 78 additions & 0 deletions datalabeling/README.rst
@@ -0,0 +1,78 @@
.. This file is automatically generated. Do not edit this file directly.

Google Cloud Data Labeling Service Python Samples
===============================================================================

.. image:: https://gstatic.com/cloudssh/images/open-btn.png
   :target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=datalabeling/README.rst


This directory contains samples for Google Cloud Data Labeling Service. `Google Cloud Data Labeling Service`_ lets developers request that human labelers label a collection of data to be used for training a custom machine learning model.




.. _Google Cloud Data Labeling Service: https://cloud.google.com/data-labeling/docs/

Setup
-------------------------------------------------------------------------------


Authentication
++++++++++++++

This sample requires you to have authentication set up. Refer to the
`Authentication Getting Started Guide`_ for instructions on setting up
credentials for applications.

.. _Authentication Getting Started Guide:
    https://cloud.google.com/docs/authentication/getting-started

Install Dependencies
++++++++++++++++++++

#. Clone python-docs-samples and change directory to the sample directory you want to use.

   .. code-block:: bash

       $ git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

#. Install `pip`_ and `virtualenv`_ if you do not already have them. You may want to refer to the `Python Development Environment Setup Guide`_ for Google Cloud Platform for instructions.

   .. _Python Development Environment Setup Guide:
       https://cloud.google.com/python/setup

#. Create a virtualenv. Samples are compatible with Python 2.7 and 3.4+.

   .. code-block:: bash

       $ virtualenv env
       $ source env/bin/activate

#. Install the dependencies needed to run the samples.

   .. code-block:: bash

       $ pip install -r requirements.txt

.. _pip: https://pip.pypa.io/
.. _virtualenv: https://virtualenv.pypa.io/



The client library
-------------------------------------------------------------------------------

This sample uses the `Google Cloud Client Library for Python`_.
You can read the documentation for more details on API usage and use GitHub
to `browse the source`_ and `report issues`_.

.. _Google Cloud Client Library for Python:
    https://googlecloudplatform.github.io/google-cloud-python/
.. _browse the source:
    https://github.com/GoogleCloudPlatform/google-cloud-python
.. _report issues:
    https://github.com/GoogleCloudPlatform/google-cloud-python/issues


.. _Google Cloud SDK: https://cloud.google.com/sdk/
18 changes: 18 additions & 0 deletions datalabeling/README.rst.in
@@ -0,0 +1,18 @@
# This file is used to generate README.rst

product:
  name: Google Cloud Data Labeling Service
  short_name: Cloud Data Labeling
  url: https://cloud.google.com/data-labeling/docs/
  description: >
    `Google Cloud Data Labeling Service`_ lets developers request that
    human labelers label a collection of data to be used for training a
    custom machine learning model.

setup:
- auth
- install_deps

cloud_client_library: true

folder: datalabeling
77 changes: 77 additions & 0 deletions datalabeling/create_annotation_spec_set.py
@@ -0,0 +1,77 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse


# [START datalabeling_create_annotation_spec_set_beta]
def create_annotation_spec_set(project_id):
    """Creates a data labeling annotation spec set for the given
    Google Cloud project.
    """
    from google.cloud import datalabeling_v1beta1 as datalabeling
    client = datalabeling.DataLabelingServiceClient()

    project_path = client.project_path(project_id)

    annotation_spec_1 = datalabeling.types.AnnotationSpec(
        display_name='label_1',
        description='label_description_1'
    )

    annotation_spec_2 = datalabeling.types.AnnotationSpec(
        display_name='label_2',
        description='label_description_2'
    )

    annotation_spec_set = datalabeling.types.AnnotationSpecSet(
        display_name='YOUR_ANNOTATION_SPEC_SET_DISPLAY_NAME',
        description='YOUR_DESCRIPTION',
        annotation_specs=[annotation_spec_1, annotation_spec_2]
    )

    response = client.create_annotation_spec_set(
        project_path, annotation_spec_set)

    # The format of the resource name:
    # projects/{project_id}/annotationSpecSets/{annotation_spec_set_id}
    print('The annotation_spec_set resource name: {}'.format(response.name))
    print('Display name: {}'.format(response.display_name))
    print('Description: {}'.format(response.description))
    print('Annotation specs:')
    for annotation_spec in response.annotation_specs:
        print('\tDisplay name: {}'.format(annotation_spec.display_name))
        print('\tDescription: {}\n'.format(annotation_spec.description))

    return response
# [END datalabeling_create_annotation_spec_set_beta]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument(
        '--project-id',
        help='Project ID. Required.',
        required=True
    )

    args = parser.parse_args()

    create_annotation_spec_set(args.project_id)
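
The sample prints the resource name returned by the service. If a caller needs the individual identifiers, the name can be split on `/`. The following is a minimal sketch, assuming the name follows the `projects/{project_id}/annotationSpecSets/{annotation_spec_set_id}` pattern; `parse_resource_name` is a hypothetical helper and is not part of the client library.

```python
def parse_resource_name(name):
    """Split a 'collection/id/collection/id' resource name into a dict.

    Hypothetical helper for illustration; not part of the client library.
    """
    parts = name.split('/')
    # A well-formed name alternates collection and identifier segments,
    # so it must have an even number of non-empty parts.
    if len(parts) % 2 != 0 or not all(parts):
        raise ValueError('Unexpected resource name: {!r}'.format(name))
    # Pair each collection segment with the identifier that follows it.
    return dict(zip(parts[0::2], parts[1::2]))


# Placeholder values, not real resources.
parsed = parse_resource_name('projects/my-project/annotationSpecSets/12345')
print(parsed['annotationSpecSets'])
```

Keeping the parsing in one small function means a change in the name format only has to be handled in one place.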
36 changes: 36 additions & 0 deletions datalabeling/create_annotation_spec_set_test.py
@@ -0,0 +1,36 @@
#!/usr/bin/env python

# Copyright 2019 Google, Inc
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import create_annotation_spec_set
from google.cloud import datalabeling_v1beta1 as datalabeling
import pytest

PROJECT_ID = os.getenv('GCLOUD_PROJECT')


@pytest.mark.slow
def test_create_annotation_spec_set(capsys):
    response = create_annotation_spec_set.create_annotation_spec_set(
        PROJECT_ID)
    out, _ = capsys.readouterr()
    assert 'The annotation_spec_set resource name:' in out

    # Delete the created annotation spec set.
    annotation_spec_set_name = response.name
    client = datalabeling.DataLabelingServiceClient()
    client.delete_annotation_spec_set(annotation_spec_set_name)
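
In this test the cleanup call runs only if the assertion passes, so a failed assertion would leave the annotation spec set behind in the project. One possible way to guard against that is `try`/`finally`. The sketch below illustrates the pattern offline; `FakeClient` and `check_and_clean_up` are hypothetical stand-ins, not part of the samples or the client library.

```python
class FakeClient(object):
    """Stand-in for DataLabelingServiceClient so the sketch runs offline."""

    def __init__(self):
        self.deleted = []

    def delete_annotation_spec_set(self, name):
        # Record the call instead of contacting the service.
        self.deleted.append(name)


def check_and_clean_up(client, resource_name, output):
    """Assert on captured output, deleting the resource even on failure."""
    try:
        assert 'The annotation_spec_set resource name:' in output
    finally:
        # Runs whether or not the assertion above passed.
        client.delete_annotation_spec_set(resource_name)


client = FakeClient()
try:
    # Simulate a failing check: the expected text is missing from the output.
    check_and_clean_up(client, 'projects/p/annotationSpecSets/1', 'unexpected')
except AssertionError:
    pass
print(client.deleted)
```

A pytest fixture with a teardown step would achieve the same effect and keep the test body shorter.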
93 changes: 93 additions & 0 deletions datalabeling/create_instruction.py
@@ -0,0 +1,93 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse


# [START datalabeling_create_instruction_beta]
def create_instruction(project_id, data_type, instruction_gcs_uri):
    """Creates a data labeling PDF instruction for the given Google Cloud
    project. The PDF file should be uploaded to the project in
    Google Cloud Storage.
    """
    from google.cloud import datalabeling_v1beta1 as datalabeling
    client = datalabeling.DataLabelingServiceClient()

    project_path = client.project_path(project_id)

    pdf_instruction = datalabeling.types.PdfInstruction(
        gcs_file_uri=instruction_gcs_uri)

    instruction = datalabeling.types.Instruction(
        display_name='YOUR_INSTRUCTION_DISPLAY_NAME',
        description='YOUR_DESCRIPTION',
        data_type=data_type,
        pdf_instruction=pdf_instruction
    )

    # create_instruction returns a long-running operation.
    operation = client.create_instruction(project_path, instruction)

    # Block until the operation completes.
    result = operation.result()

    # The format of the resource name:
    # projects/{project_id}/instructions/{instruction_id}
    print('The instruction resource name: {}\n'.format(result.name))
    print('Display name: {}'.format(result.display_name))
    print('Description: {}'.format(result.description))
    print('Create time:')
    print('\tseconds: {}'.format(result.create_time.seconds))
    print('\tnanos: {}'.format(result.create_time.nanos))
    print('Data type: {}'.format(
        datalabeling.enums.DataType(result.data_type).name))
    print('Pdf instruction:')
    print('\tGcs file uri: {}'.format(
        result.pdf_instruction.gcs_file_uri))

    return result
# [END datalabeling_create_instruction_beta]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument(
        '--project-id',
        help='Project ID. Required.',
        required=True
    )

    parser.add_argument(
        '--data-type',
        help=('Data type. Supported values: IMAGE, VIDEO, TEXT, and AUDIO. '
              'Required.'),
        required=True
    )

    parser.add_argument(
        '--instruction-gcs-uri',
        help='The Google Cloud Storage URI of the instruction PDF. Required.',
        required=True
    )

    args = parser.parse_args()

    create_instruction(
        args.project_id,
        args.data_type,
        args.instruction_gcs_uri
    )
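
The `--data-type` flag accepts any string, so a typo is only caught once the request is built or sent. A possible refinement is to let `argparse` reject unsupported values up front with `choices`. This is a sketch, not the sample's actual behavior; the `DATA_TYPES` tuple is an assumption taken from the help text, and `build_parser` is a hypothetical name.

```python
import argparse

# Values named in the sample's help text; treat this list as an assumption.
DATA_TYPES = ('IMAGE', 'VIDEO', 'TEXT', 'AUDIO')


def build_parser():
    """Build a parser that validates --data-type before any API call."""
    parser = argparse.ArgumentParser(description='Create an instruction.')
    parser.add_argument('--project-id', required=True)
    parser.add_argument(
        '--data-type',
        type=str.upper,      # Normalize case before the choices check.
        choices=DATA_TYPES,  # argparse exits with an error on other values.
        required=True,
        help='One of: {}.'.format(', '.join(DATA_TYPES))
    )
    parser.add_argument('--instruction-gcs-uri', required=True)
    return parser


# Placeholder arguments, not real resources.
args = build_parser().parse_args([
    '--project-id', 'my-project',
    '--data-type', 'image',
    '--instruction-gcs-uri', 'gs://my-bucket/instruction.pdf',
])
print(args.data_type)
```

Because `type` is applied before `choices` is checked, lowercase input such as `image` is accepted and normalized.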
41 changes: 41 additions & 0 deletions datalabeling/create_instruction_test.py
@@ -0,0 +1,41 @@
#!/usr/bin/env python

# Copyright 2019 Google, Inc
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import create_instruction
from google.cloud import datalabeling_v1beta1 as datalabeling
import pytest

PROJECT_ID = os.getenv('GCLOUD_PROJECT')
INSTRUCTION_GCS_URI = ('gs://cloud-samples-data/datalabeling'
                       '/instruction/test.pdf')


@pytest.mark.slow
def test_create_instruction(capsys):
    result = create_instruction.create_instruction(
        PROJECT_ID,
        'IMAGE',
        INSTRUCTION_GCS_URI
    )
    out, _ = capsys.readouterr()
    assert 'The instruction resource name: ' in out

    # Delete the created instruction.
    instruction_name = result.name
    client = datalabeling.DataLabelingServiceClient()
    client.delete_instruction(instruction_name)
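
Both test modules read the project from `GCLOUD_PROJECT` with `os.getenv`, which returns `None` when the variable is unset, so the failure then surfaces later and less clearly. A minimal sketch of an explicit check follows; `require_env` is a hypothetical helper, and its `environ` parameter exists only so the example can run without touching the real environment.

```python
import os


def require_env(name, environ=None):
    """Return an environment variable's value or fail with a clear message."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            'Set the {} environment variable before running the tests.'
            .format(name))
    return value


# Demonstration with an explicit mapping instead of the real environment.
print(require_env('GCLOUD_PROJECT', {'GCLOUD_PROJECT': 'my-project'}))
```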