Adding usage doc for Natural Language API. #2062

Merged
merged 1 commit into from
Aug 23, 2016

7 changes: 7 additions & 0 deletions docs/index.rst
@@ -148,6 +148,13 @@

vision-usage

.. toctree::
:maxdepth: 0
:hidden:
:caption: Natural Language

language-usage

.. toctree::
:maxdepth: 0
:hidden:
279 changes: 279 additions & 0 deletions docs/language-usage.rst
@@ -0,0 +1,279 @@
Using the API
=============

The `Google Natural Language`_ API can be used to reveal the
structure and meaning of text via powerful machine
learning models. You can use it to extract information about
people, places, events and much more, mentioned in text documents,
news articles or blog posts. You can use it to understand
sentiment about your product on social media or parse intent from
customer conversations happening in a call center or a messaging
app. You can analyze text uploaded in your request or integrate
with your document storage on Google Cloud Storage.

.. warning::

This is a Beta release of Google Cloud Natural Language API. This
API is not intended for real-time usage in critical applications.

.. _Google Natural Language: https://cloud.google.com/natural-language/docs/getting-started

Client
------

:class:`~gcloud.language.client.Client` objects provide a
means to configure your application. Each instance holds
both a ``project`` and an authenticated connection to the
Natural Language service.

For an overview of authentication in ``gcloud-python``, see
:doc:`gcloud-auth`.

Assuming your environment is set up as described in that document,
create an instance of :class:`~gcloud.language.client.Client`.

.. code-block:: python

>>> from gcloud import language
>>> client = language.Client()

By default the ``language`` is ``'en'`` and the ``encoding`` is
UTF-8. To override these values:

.. code-block:: python

>>> client = language.Client(language='es',
...                          encoding=language.Encoding.UTF16)

The encoding can be one of
:attr:`Encoding.UTF8 <gcloud.language.document.Encoding.UTF8>`,
:attr:`Encoding.UTF16 <gcloud.language.document.Encoding.UTF16>`, or
:attr:`Encoding.UTF32 <gcloud.language.document.Encoding.UTF32>`.
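
The choice of encoding mostly matters for offsets: as far as the REST
reference describes it, positions reported within the text are counted
in code units of the requested encoding. The sketch below uses only the
Python standard library and does not call the service. It shows how one
character position maps to a different offset under each encoding. The
helper ``offset_in_code_units`` and the sample string are assumptions
made for illustration; neither is part of ``gcloud-python``.

.. code-block:: python

   # Bytes per code unit for each codec. The little-endian variants are
   # used so that no byte-order mark is added to the encoded prefix.
   CODE_UNIT_SIZE = {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}


   def offset_in_code_units(text, char_index, codec):
       """Count the code units that come before ``text[char_index]``."""
       prefix = text[:char_index].encode(codec)
       return len(prefix) // CODE_UNIT_SIZE[codec]


   # A cow emoji (outside the Basic Multilingual Plane) followed by
   # accented Spanish text, so that all three encodings disagree.
   text = '\U0001F404 salt\xf3 sobre la luna.'
   index = text.index('sobre')
   for codec in ('utf-8', 'utf-16-le', 'utf-32-le'):
       print(codec, offset_in_code_units(text, index, codec))

On Python 3 the word ``sobre`` starts at offset 12 in UTF-8, 9 in UTF-16
and 8 in UTF-32, so the encoding passed to the client should match the
one your own code uses to slice the text.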

Methods
-------

The Google Natural Language API has three supported methods

- `analyzeEntities`_
- `analyzeSentiment`_
- `annotateText`_

and each method uses a `Document`_ for representing text. To
create a :class:`~gcloud.language.document.Document`,

.. code-block:: python

>>> text_content = (
... 'Google, headquartered in Mountain View, unveiled the '
... 'new Android phone at the Consumer Electronic Show. '
... 'Sundar Pichai said in his keynote that users love '
... 'their new Android phones.')
>>> document = client.document_from_text(text_content)

By using :meth:`~gcloud.language.client.Client.document_from_text`,
the document's type is plain text:

.. code-block:: python

>>> document.doc_type == language.Document.PLAIN_TEXT
True

In addition, the document's language defaults to the language set on
the client:

.. code-block:: python

>>> document.language
'en'
>>> document.language == client.language
True

In addition, the
:meth:`~gcloud.language.client.Client.document_from_html`
factory can be used to create an HTML document. In both this
method and :meth:`~gcloud.language.client.Client.document_from_text`,
the language can be overridden:

.. code-block:: python

>>> html_content = """\
... <html>
... <head>
... <title>El Tiempo de las Historias</title>
... </head>
... <body>
... <p>La vaca salt&oacute; sobre la luna.</p>
... </body>
... </html>
... """
>>> document = client.document_from_html(html_content,
... language='es')

The ``language`` argument can be either an ISO-639-1 or a BCP-47 language
code; at the time of writing, only English, Spanish, and Japanese
`are supported`_. However, the ``analyzeSentiment`` method
`only supports`_ English text.

.. _are supported: https://cloud.google.com/natural-language/docs/
.. _only supports: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment#body.request_body.FIELDS.document
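
Because the supported languages differ by method, it can be useful to
check a language code before sending a request. The following is a
minimal sketch of such a check, assuming the language lists stated
above; they reflect the Beta release and may go stale. The names
``SUPPORTED_LANGUAGES``, ``SENTIMENT_LANGUAGES``, ``primary_subtag``
and ``can_analyze`` are hypothetical and not part of ``gcloud-python``.

.. code-block:: python

   # Languages the text above lists as supported (ISO-639-1 codes).
   SUPPORTED_LANGUAGES = frozenset(['en', 'es', 'ja'])
   # The text above says sentiment analysis only handles English.
   SENTIMENT_LANGUAGES = frozenset(['en'])


   def primary_subtag(language):
       """Return the lower-cased primary subtag of a language code.

       Both ``'en-US'`` and ``'EN'`` give ``'en'``.
       """
       return language.split('-')[0].lower()


   def can_analyze(language, sentiment=False):
       """Check a language code against the lists assumed above."""
       if sentiment:
           allowed = SENTIMENT_LANGUAGES
       else:
           allowed = SUPPORTED_LANGUAGES
       return primary_subtag(language) in allowed


   print(can_analyze('es'))
   print(can_analyze('es', sentiment=True))

A check like this only avoids an obviously unsupported request; the
service remains the authority on which languages it accepts.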

The document type (``doc_type``) value can be one of
:attr:`Document.PLAIN_TEXT <gcloud.language.document.Document.PLAIN_TEXT>` or
:attr:`Document.HTML <gcloud.language.document.Document.HTML>`.

In addition to supplying the text / HTML content, a document can refer
to content stored in `Google Cloud Storage`_. We can use the
:meth:`~gcloud.language.client.Client.document_from_blob` method:

.. code-block:: python

>>> document = client.document_from_blob(bucket='my-text-bucket',
... blob='sentiment-me.txt')
>>> document.gcs_url
'gs://my-text-bucket/sentiment-me.txt'
>>> document.doc_type == language.Document.PLAIN_TEXT
True

Alternatively, use the
:meth:`~gcloud.language.client.Client.document_from_uri` method.
In either case, the document type can be specified with
the ``doc_type`` argument:

.. code-block:: python

>>> gcs_url = 'gs://my-text-bucket/sentiment-me.txt'
>>> document = client.document_from_uri(
... gcs_url, doc_type=language.Document.HTML)
>>> document.gcs_url == gcs_url
True
>>> document.doc_type == language.Document.HTML
True
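
Judging from the ``gcs_url`` values in the examples above, a bucket and
blob name appear to combine into a ``gs://`` URL in the obvious way. The
sketch below shows that mapping in both directions. The helpers
``make_gcs_url`` and ``split_gcs_url`` are hypothetical names used for
illustration, not functions provided by ``gcloud-python``.

.. code-block:: python

   GCS_PREFIX = 'gs://'


   def make_gcs_url(bucket, blob):
       """Build a ``gs://`` URL from a bucket name and a blob name."""
       return '%s%s/%s' % (GCS_PREFIX, bucket, blob)


   def split_gcs_url(gcs_url):
       """Split a ``gs://`` URL back into ``(bucket, blob)``."""
       if not gcs_url.startswith(GCS_PREFIX):
           raise ValueError('Not a Cloud Storage URL: %r' % (gcs_url,))
       # Everything up to the first slash is the bucket; the rest,
       # which may itself contain slashes, is the blob name.
       bucket, _, blob = gcs_url[len(GCS_PREFIX):].partition('/')
       return bucket, blob


   url = make_gcs_url('my-text-bucket', 'sentiment-me.txt')
   print(url)
   print(split_gcs_url(url))

This is why the two factories can be treated as interchangeable: one
takes the pieces, the other takes the assembled URL.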

.. _analyzeEntities: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeEntities
.. _analyzeSentiment: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment
.. _annotateText: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText
.. _Document: https://cloud.google.com/natural-language/reference/rest/v1beta1/Document
.. _Google Cloud Storage: https://cloud.google.com/storage/

Analyze Entities
----------------

The :meth:`~gcloud.language.document.Document.analyze_entities` method
finds named entities (i.e. proper names) in the text and returns them
as a :class:`list` of :class:`~gcloud.language.entity.Entity` objects.
Each entity has a corresponding type, salience (prominence), associated
metadata and other properties.

.. code-block:: python

>>> text_content = ("Michelangelo Caravaggio, Italian painter, is "
... "known for 'The Calling of Saint Matthew'.")
>>> document = client.document_from_text(text_content)
>>> entities = document.analyze_entities()
>>> for entity in entities:
...     print('=' * 20)
...     print('    name: %s' % (entity.name,))
...     print('    type: %s' % (entity.entity_type,))
...     print('metadata: %s' % (entity.metadata,))
...     print('salience: %s' % (entity.salience,))
====================
    name: Michelangelo Caravaggio
    type: PERSON
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Caravaggio'}
salience: 0.75942981
====================
    name: Italian
    type: LOCATION
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Italy'}
salience: 0.20193423
====================
    name: The Calling of Saint Matthew
    type: WORK_OF_ART
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/index.html?curid=2838808'}
salience: 0.03863598
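
A common next step is to rank or filter the returned entities. The
sketch below does so without calling the service: ``FakeEntity`` is a
stand-in named tuple that only mimics the attribute names used above
(``name``, ``entity_type``, ``metadata``, ``salience``), and
``most_salient`` is a hypothetical helper, not part of
``gcloud-python``. The salience values are copied from the sample
output above.

.. code-block:: python

   import collections

   # Stand-in for gcloud.language.entity.Entity (illustration only).
   FakeEntity = collections.namedtuple(
       'FakeEntity', ['name', 'entity_type', 'metadata', 'salience'])

   # Metadata is left empty here to keep the example short.
   entities = [
       FakeEntity('Italian', 'LOCATION', {}, 0.20193423),
       FakeEntity('Michelangelo Caravaggio', 'PERSON', {}, 0.75942981),
       FakeEntity('The Calling of Saint Matthew', 'WORK_OF_ART', {},
                  0.03863598),
   ]


   def most_salient(entities, entity_type=None, min_salience=0.0):
       """Return entities by decreasing salience, optionally filtered."""
       selected = [
           entity for entity in entities
           if (entity_type is None or entity.entity_type == entity_type)
           and entity.salience >= min_salience
       ]
       return sorted(selected, key=lambda entity: entity.salience,
                     reverse=True)


   top = most_salient(entities, min_salience=0.1)
   print([entity.name for entity in top])

With the sample values this keeps ``Michelangelo Caravaggio`` and
``Italian`` in that order and drops the painting, whose salience is
below the chosen cut-off.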

Analyze Sentiment
-----------------

The :meth:`~gcloud.language.document.Document.analyze_sentiment` method
analyzes the sentiment of the provided text and returns a
:class:`~gcloud.language.sentiment.Sentiment`. Currently, this method
only supports English text.

.. code-block:: python

>>> text_content = "Jogging isn't very fun."
>>> document = client.document_from_text(text_content)
>>> sentiment = document.analyze_sentiment()
>>> print(sentiment.polarity)
-1
>>> print(sentiment.magnitude)
0.8
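
The two numbers are easier to act on when read together: polarity gives
the direction of the sentiment and magnitude gives its overall strength.
The sketch below turns a pair into a coarse label. It assumes, based on
the REST reference, that polarity lies between -1 and 1 and that
magnitude is non-negative. The helper ``describe_sentiment``, its
labels and the ``threshold`` value are arbitrary choices made for this
example, not part of ``gcloud-python``.

.. code-block:: python

   def describe_sentiment(polarity, magnitude, threshold=0.5):
       """Summarize a polarity / magnitude pair as a short label.

       ``threshold`` is an arbitrary cut-off picked for illustration.
       """
       if magnitude < threshold:
           # Too little emotional content to call either way.
           return 'neutral or weak'
       if polarity > 0:
           return 'positive'
       if polarity < 0:
           return 'negative'
       # Strong magnitude with no net direction.
       return 'mixed'


   print(describe_sentiment(-1, 0.8))
   print(describe_sentiment(1, 0.1))

The first call uses the values from the example above and is labelled
``negative``; the second has too small a magnitude to be labelled
either way under this threshold.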

Annotate Text
-------------

The :meth:`~gcloud.language.document.Document.annotate_text` method
analyzes a document and is intended for users who are familiar with
machine learning and need in-depth text features to build upon.

The method returns a named tuple with four entries:

* ``sentences``: A :class:`list` of sentences in the text
* ``tokens``: A :class:`list` of :class:`~gcloud.language.token.Token`
  objects (e.g. words, punctuation)
* ``sentiment``: The :class:`~gcloud.language.sentiment.Sentiment` of
  the text (as returned by
  :meth:`~gcloud.language.document.Document.analyze_sentiment`)
* ``entities``: A :class:`list` of :class:`~gcloud.language.entity.Entity`
  objects extracted from the text (as returned by
  :meth:`~gcloud.language.document.Document.analyze_entities`)

The :meth:`~gcloud.language.document.Document.annotate_text` method takes
three arguments, ``include_syntax``, ``include_entities`` and
``include_sentiment``, which all default to :data:`True`. Each of these
`Features`_ can be selectively turned off by setting the corresponding
argument to :data:`False`.

When ``include_syntax=False``, ``sentences`` and ``tokens`` in the
response are :data:`None`. When ``include_sentiment=False``, ``sentiment``
in the response is :data:`None`. When ``include_entities=False``,
``entities`` in the response is :data:`None`.

.. code-block:: python

>>> text_content = 'The cow jumped over the Moon.'
>>> document = client.document_from_text(text_content)
>>> annotations = document.annotate_text()
>>> # Sentences present if include_syntax=True
>>> print(annotations.sentences)
['The cow jumped over the Moon.']
>>> # Tokens present if include_syntax=True
>>> for token in annotations.tokens:
...     msg = '%11s: %s' % (token.part_of_speech, token.text_content)
...     print(msg)
 DETERMINER: The
       NOUN: cow
       VERB: jumped
 ADPOSITION: over
 DETERMINER: the
       NOUN: Moon
PUNCTUATION: .
>>> # Sentiment present if include_sentiment=True
>>> print(annotations.sentiment.polarity)
1
>>> print(annotations.sentiment.magnitude)
0.1
>>> # Entities present if include_entities=True
>>> for entity in annotations.entities:
...     print('=' * 20)
...     print('    name: %s' % (entity.name,))
...     print('    type: %s' % (entity.entity_type,))
...     print('metadata: %s' % (entity.metadata,))
...     print('salience: %s' % (entity.salience,))
====================
    name: Moon
    type: LOCATION
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Natural_satellite'}
salience: 0.11793101
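
Since any field of the result can be :data:`None` when its feature is
turned off, code that consumes the result should check before using a
field. The sketch below illustrates that check with a stand-in named
tuple; ``Annotations`` only mirrors the four field names listed above,
and ``available_features`` is a hypothetical helper, not part of
``gcloud-python``.

.. code-block:: python

   import collections

   # Stand-in for the named tuple returned by annotate_text()
   # (illustration only; the field names follow the list above).
   Annotations = collections.namedtuple(
       'Annotations', ['sentences', 'tokens', 'sentiment', 'entities'])


   def available_features(annotations):
       """Return the names of the fields that were populated."""
       return [name for name in annotations._fields
               if getattr(annotations, name) is not None]


   # Roughly what a result might look like when sentiment and entity
   # extraction were both turned off. Tokens are left empty for brevity.
   syntax_only = Annotations(
       sentences=['The cow jumped over the Moon.'],
       tokens=[],
       sentiment=None,
       entities=None,
   )
   print(available_features(syntax_only))

Note that an empty list still counts as populated here: only
:data:`None` signals that a feature was not requested.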

.. _Features: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText#Features