Adding usage doc for Natural Language API. #2062
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,279 @@ | ||
Using the API | ||
============= | ||
|
||
The `Google Natural Language`_ API can be used to reveal the | ||
structure and meaning of text via powerful machine | ||
learning models. You can use it to extract information about | ||
people, places, events and much more, mentioned in text documents, | ||
news articles or blog posts. You can use it to understand | ||
sentiment about your product on social media or parse intent from | ||
customer conversations happening in a call center or a messaging | ||
app. You can analyze text uploaded in your request or integrate | ||
with your document storage on Google Cloud Storage. | ||
|
||
.. warning:: | ||
|
||
This is a Beta release of Google Cloud Natural Language API. This | ||
API is not intended for real-time usage in critical applications. | ||
|
||
.. _Google Natural Language: https://cloud.google.com/natural-language/docs/getting-started | ||
|
||
Client | ||
------ | ||
|
||
:class:`~gcloud.language.client.Client` objects provide a | ||
means to configure your application. Each instance holds | ||
both a ``project`` and an authenticated connection to the | ||
Natural Language service. | ||
|
||
For an overview of authentication in ``gcloud-python``, see | ||
:doc:`gcloud-auth`. | ||
|
||
Assuming your environment is set up as described in that document, | ||
create an instance of :class:`~gcloud.language.client.Client`. | ||
|
||
.. code-block:: python | ||
|
||
>>> from gcloud import language | ||
>>> client = language.Client() | ||
|
||
By default the ``language`` is ``'en'`` and the ``encoding`` is | ||
UTF-8. To override these values: | ||
|
||
.. code-block:: python | ||
|
||
>>> client = language.Client(language='es', | ||
... encoding=language.Encoding.UTF16) | ||
|
||
The encoding can be one of | ||
:attr:`Encoding.UTF8 <gcloud.language.document.Encoding.UTF8>`, | ||
:attr:`Encoding.UTF16 <gcloud.language.document.Encoding.UTF16>`, or | ||
:attr:`Encoding.UTF32 <gcloud.language.document.Encoding.UTF32>`. | ||
|
||
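As far as we understand the API, the encoding determines the code units in which text offsets are measured, so the same position in a string can have a different offset under each encoding. The sketch below uses only the standard library; ``offset_in_code_units`` is a hypothetical helper, not part of ``gcloud.language``.

```python
def offset_in_code_units(text, char_index, encoding):
    """Offset of ``text[char_index]``, counted in code units of ``encoding``."""
    # (codec, bytes per code unit); the "-le" codecs avoid a byte-order mark.
    codecs = {
        'UTF8': ('utf-8', 1),
        'UTF16': ('utf-16-le', 2),
        'UTF32': ('utf-32-le', 4),
    }
    codec, unit_size = codecs[encoding]
    return len(text[:char_index].encode(codec)) // unit_size


text = u'La vaca salt\u00f3 sobre la luna.'
index = text.index(u'sobre')
for name in ('UTF8', 'UTF16', 'UTF32'):
    # The accented character takes two code units in UTF-8 but one in
    # UTF-16 and UTF-32, so the UTF-8 offset is one larger.
    print(name, offset_in_code_units(text, index, name))
```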
Methods | ||
------- | ||
|
||
The Google Natural Language API has three supported methods | ||
|
||
- `analyzeEntities`_ | ||
- `analyzeSentiment`_ | ||
- `annotateText`_ | ||
|
||
and each method uses a `Document`_ for representing text. To | ||
create a :class:`~gcloud.language.document.Document`, | ||
|
||
.. code-block:: python | ||
|
||
>>> text_content = ( | ||
... 'Google, headquartered in Mountain View, unveiled the ' | ||
... 'new Android phone at the Consumer Electronic Show. ' | ||
... 'Sundar Pichai said in his keynote that users love ' | ||
... 'their new Android phones.') | ||
>>> document = client.document_from_text(text_content) | ||
|
||
By using :meth:`~gcloud.language.client.Client.document_from_text`, | ||
the document's type is plain text: | ||
|
||
.. code-block:: python | ||
|
||
>>> document.doc_type == language.Document.PLAIN_TEXT | ||
True | ||
|
||
In addition, the document's language defaults to the language on | ||
the client: | ||
|
||
.. code-block:: python | ||
|
||
>>> document.language | ||
'en' | ||
>>> document.language == client.language | ||
True | ||
|
||
The | ||
:meth:`~gcloud.language.client.Client.document_from_html` | ||
factory can be used to create an HTML document. In both this | ||
method and :meth:`~gcloud.language.client.Client.document_from_text`, | ||
the language can be overridden: | ||
|
||
.. code-block:: python | ||
|
||
>>> html_content = """\ | ||
... <html> | ||
... <head> | ||
... <title>El Tiempo de las Historias</title> | ||
... </head> | ||
... <body> | ||
... <p>La vaca saltó sobre la luna.</p> | ||
... </body> | ||
... </html> | ||
... """ | ||
>>> document = client.document_from_html(html_content, | ||
... language='es') | ||
|
||
The ``language`` argument can be either an ISO-639-1 or a BCP-47 language | ||
code; at the time of writing, only English, Spanish, and Japanese `are supported`_. | ||
However, the ``analyzeSentiment`` method `only supports`_ English text. | ||
|
||
.. _are supported: https://cloud.google.com/natural-language/docs/ | ||
.. _only supports: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment#body.request_body.FIELDS.document | ||
|
||
The document type (``doc_type``) value can be one of | ||
:attr:`Document.PLAIN_TEXT <gcloud.language.document.Document.PLAIN_TEXT>` or | ||
:attr:`Document.HTML <gcloud.language.document.Document.HTML>`. | ||
|
||
In addition to supplying the text / HTML content, a document can refer | ||
to content stored in `Google Cloud Storage`_. We can use the | ||
:meth:`~gcloud.language.client.Client.document_from_blob` method: | ||
|
||
.. code-block:: python | ||
|
||
>>> document = client.document_from_blob(bucket='my-text-bucket', | ||
... blob='sentiment-me.txt') | ||
>>> document.gcs_url | ||
'gs://my-text-bucket/sentiment-me.txt' | ||
>>> document.doc_type == language.Document.PLAIN_TEXT | ||
True | ||
|
||
Alternatively, use the :meth:`~gcloud.language.client.Client.document_from_uri` | ||
method. In either case, the document type can be specified with | ||
the ``doc_type`` argument: | ||
|
||
.. code-block:: python | ||
|
||
>>> gcs_url = 'gs://my-text-bucket/sentiment-me.txt' | ||
>>> document = client.document_from_uri( | ||
... gcs_url, doc_type=language.Document.HTML) | ||
>>> document.gcs_url == gcs_url | ||
True | ||
>>> document.doc_type == language.Document.HTML | ||
True | ||
|
||
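To make the relationship between the factories concrete, here is a rough sketch of the request payload a document might map to. The field names (``type``, ``language``, ``content``, ``gcsContentUri``) follow the REST `Document`_ resource as we understand it; ``gcs_url_for`` and ``document_body`` are hypothetical helpers, not part of ``gcloud.language``.

```python
def gcs_url_for(bucket, blob):
    """Build a ``gs://`` URL from a bucket name and a blob name."""
    return 'gs://%s/%s' % (bucket, blob)


def document_body(doc_type, language, content=None, gcs_url=None):
    """Sketch of a Document payload; assumes exactly one content source."""
    if (content is None) == (gcs_url is None):
        raise ValueError('Pass exactly one of content or gcs_url.')
    body = {'type': doc_type, 'language': language}
    if content is not None:
        body['content'] = content  # text or HTML sent inline
    else:
        body['gcsContentUri'] = gcs_url  # content read from Cloud Storage
    return body


inline = document_body('PLAIN_TEXT', 'en', content='Hello, world.')
stored = document_body(
    'HTML', 'en', gcs_url=gcs_url_for('my-text-bucket', 'sentiment-me.txt'))
print(inline)
print(stored)
```

A document carries either inline content or a Cloud Storage reference, never both, which is why the sketch rejects calls that supply both or neither.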
.. _analyzeEntities: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeEntities | ||
.. _analyzeSentiment: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/analyzeSentiment | ||
.. _annotateText: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText | ||
.. _Document: https://cloud.google.com/natural-language/reference/rest/v1beta1/Document | ||
.. _Google Cloud Storage: https://cloud.google.com/storage/ | ||
|
||
Analyze Entities | ||
---------------- | ||
|
||
The :meth:`~gcloud.language.document.Document.analyze_entities` method | ||
finds named entities (i.e. proper names) in the text and returns them | ||
as a :class:`list` of :class:`~gcloud.language.entity.Entity` objects. | ||
Each entity has a corresponding type, salience (prominence), associated | ||
metadata and other properties. | ||
|
||
.. code-block:: python | ||
|
||
>>> text_content = ("Michelangelo Caravaggio, Italian painter, is " | ||
... "known for 'The Calling of Saint Matthew'.") | ||
>>> document = client.document_from_text(text_content) | ||
>>> entities = document.analyze_entities() | ||
>>> for entity in entities: | ||
... print('=' * 20) | ||
... print(' name: %s' % (entity.name,)) | ||
... print(' type: %s' % (entity.entity_type,)) | ||
... print('metadata: %s' % (entity.metadata,)) | ||
... print('salience: %s' % (entity.salience,)) | ||
==================== | ||
name: Michelangelo Caravaggio | ||
type: PERSON | ||
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Caravaggio'} | ||
salience: 0.75942981 | ||
==================== | ||
name: Italian | ||
type: LOCATION | ||
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Italy'} | ||
salience: 0.20193423 | ||
==================== | ||
name: The Calling of Saint Matthew | ||
type: WORK_OF_ART | ||
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/index.html?curid=2838808'} | ||
salience: 0.03863598 | ||
|
||
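Since every entity carries a salience, a common follow-up is to keep only the most prominent ones. The sketch below works on a hypothetical ``Entity`` named tuple that stands in for :class:`~gcloud.language.entity.Entity` and reuses the values printed above; the ``min_salience`` cutoff of ``0.1`` is an arbitrary choice for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for gcloud.language.entity.Entity, limited to
# attributes used in the example output above.
Entity = namedtuple('Entity', ['name', 'entity_type', 'salience'])


def prominent(entities, min_salience=0.1):
    """Return entities at or above ``min_salience``, most salient first."""
    kept = [entity for entity in entities if entity.salience >= min_salience]
    return sorted(kept, key=lambda entity: entity.salience, reverse=True)


entities = [
    Entity('The Calling of Saint Matthew', 'WORK_OF_ART', 0.03863598),
    Entity('Italian', 'LOCATION', 0.20193423),
    Entity('Michelangelo Caravaggio', 'PERSON', 0.75942981),
]
for entity in prominent(entities):
    print('%s (%s)' % (entity.name, entity.entity_type))
```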
Analyze Sentiment | ||
----------------- | ||
|
||
The :meth:`~gcloud.language.document.Document.analyze_sentiment` method | ||
analyzes the sentiment of the provided text and returns a | ||
:class:`~gcloud.language.sentiment.Sentiment`. Currently, this method | ||
only supports English text. | ||
|
||
.. code-block:: python | ||
|
||
>>> text_content = "Jogging isn't very fun." | ||
>>> document = client.document_from_text(text_content) | ||
>>> sentiment = document.analyze_sentiment() | ||
>>> print(sentiment.polarity) | ||
-1 | ||
>>> print(sentiment.magnitude) | ||
0.8 | ||
|
||
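The two values are easiest to read together: polarity gives the direction of the sentiment and magnitude gives its strength. Below is a minimal sketch of one way to turn them into a label. ``describe_sentiment`` is a hypothetical helper and the ``0.5`` threshold is an assumption chosen for illustration, not a value recommended by the API.

```python
def describe_sentiment(polarity, magnitude, threshold=0.5):
    """Label a (polarity, magnitude) pair; the threshold is illustrative."""
    if magnitude < threshold:
        # Little emotional content overall, whatever the direction.
        return 'weak or mixed'
    if polarity > 0:
        return 'positive'
    if polarity < 0:
        return 'negative'
    return 'neutral'


# Values taken from the example above.
print(describe_sentiment(-1, 0.8))
```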
Annotate Text | ||
------------- | ||
|
||
The :meth:`~gcloud.language.document.Document.annotate_text` method | ||
analyzes a document and is intended for users who are familiar with | ||
machine learning and need in-depth text features to build upon. | ||
|
||
The method returns a named tuple with four entries: | ||
|
||
* ``sentences``: A :class:`list` of sentences in the text | ||
* ``tokens``: A :class:`list` of :class:`~gcloud.language.token.Token` | ||
objects (e.g. words, punctuation) | ||
* ``sentiment``: The :class:`~gcloud.language.sentiment.Sentiment` of | ||
the text (as returned by | ||
:meth:`~gcloud.language.document.Document.analyze_sentiment`) | ||
* ``entities``: A :class:`list` of :class:`~gcloud.language.entity.Entity` | ||
objects extracted from the text (as returned by | ||
:meth:`~gcloud.language.document.Document.analyze_entities`) | ||
|
||
:meth:`~gcloud.language.document.Document.annotate_text` takes | ||
three arguments, ``include_syntax``, ``include_entities`` and | ||
``include_sentiment``, all of which default to :data:`True`. Each of these | ||
`Features`_ can be selectively turned off by setting the corresponding | ||
argument to :data:`False`. | ||
|
||
When ``include_syntax=False``, ``sentences`` and ``tokens`` in the | ||
response are :data:`None`. When ``include_sentiment=False``, ``sentiment`` in | ||
the response is :data:`None`. When ``include_entities=False``, ``entities`` in | ||
the response is :data:`None`. | ||
|
||
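The rules above can be summarized in a short sketch. The ``extract*`` field names follow the REST `Features`_ resource as we understand it, and both ``features_for`` and ``empty_fields`` are hypothetical helpers written for illustration, not part of ``gcloud.language``.

```python
def features_for(include_syntax=True, include_entities=True,
                 include_sentiment=True):
    """Sketch of the Features payload implied by the three flags."""
    return {
        'extractSyntax': include_syntax,
        'extractEntities': include_entities,
        'extractDocumentSentiment': include_sentiment,
    }


def empty_fields(include_syntax=True, include_entities=True,
                 include_sentiment=True):
    """Names of the result entries expected to be ``None``."""
    names = []
    if not include_syntax:
        names.extend(['sentences', 'tokens'])
    if not include_sentiment:
        names.append('sentiment')
    if not include_entities:
        names.append('entities')
    return names


print(features_for(include_syntax=False))
print(empty_fields(include_syntax=False))
```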
.. code-block:: python | ||
|
||
>>> text_content = 'The cow jumped over the Moon.' | ||
>>> document = client.document_from_text(text_content) | ||
>>> annotations = document.annotate_text() | ||
>>> # Sentences present if include_syntax=True | ||
>>> print(annotations.sentences) | ||
['The cow jumped over the Moon.'] | ||
>>> # Tokens present if include_syntax=True | ||
>>> for token in annotations.tokens: | ||
... msg = '%11s: %s' % (token.part_of_speech, token.text_content) | ||
... print(msg) | ||
DETERMINER: The | ||
NOUN: cow | ||
VERB: jumped | ||
ADPOSITION: over | ||
DETERMINER: the | ||
NOUN: Moon | ||
PUNCTUATION: . | ||
>>> # Sentiment present if include_sentiment=True | ||
>>> print(annotations.sentiment.polarity) | ||
1 | ||
>>> print(annotations.sentiment.magnitude) | ||
0.1 | ||
>>> # Entities present if include_entities=True | ||
>>> for entity in annotations.entities: | ||
... print('=' * 20) | ||
... print(' name: %s' % (entity.name,)) | ||
... print(' type: %s' % (entity.entity_type,)) | ||
... print('metadata: %s' % (entity.metadata,)) | ||
... print('salience: %s' % (entity.salience,)) | ||
==================== | ||
name: Moon | ||
type: LOCATION | ||
metadata: {'wikipedia_url': 'http://en.wikipedia.org/wiki/Natural_satellite'} | ||
salience: 0.11793101 | ||
|
||
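Token annotations are typically a starting point for further processing. As a small illustration, the sketch below counts parts of speech using plain ``(part_of_speech, text_content)`` pairs copied from the output above; the pairs stand in for :class:`~gcloud.language.token.Token` objects, which are not constructed here.

```python
from collections import Counter

# (part_of_speech, text_content) pairs copied from the example output.
tagged = [
    ('DETERMINER', 'The'),
    ('NOUN', 'cow'),
    ('VERB', 'jumped'),
    ('ADPOSITION', 'over'),
    ('DETERMINER', 'the'),
    ('NOUN', 'Moon'),
    ('PUNCTUATION', '.'),
]

# Tally how often each part of speech occurs in the sentence.
counts = Counter(part_of_speech for part_of_speech, _ in tagged)
for part_of_speech, count in sorted(counts.items()):
    print('%11s: %d' % (part_of_speech, count))
```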
.. _Features: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText#Features |