@@ -4652,293 +4652,12 @@ And then issue the following queries:
 Google BigQuery
 ---------------
 
-.. versionadded:: 0.13.0
-
-The :mod:`pandas.io.gbq` module provides a wrapper for Google's BigQuery
-analytics web service to simplify retrieving results from BigQuery tables
-using SQL-like queries. Result sets are parsed into a pandas
-DataFrame with a shape and data types derived from the source table.
-Additionally, DataFrames can be inserted into new BigQuery tables or appended
-to existing tables.
-
-.. warning::
-
-   To use this module, you will need a valid BigQuery account. Refer to the
-   `BigQuery Documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__
-   for details on the service itself.
-
-The key functions are:
-
-.. currentmodule:: pandas.io.gbq
-
-.. autosummary::
-   :toctree: generated/
-
-   read_gbq
-   to_gbq
-
-.. currentmodule:: pandas
-
-
-Supported Data Types
-''''''''''''''''''''
-
-Pandas supports all these `BigQuery data types <https://cloud.google.com/bigquery/data-types>`__:
-``STRING``, ``INTEGER`` (64 bit), ``FLOAT`` (64 bit), ``BOOLEAN`` and
-``TIMESTAMP`` (microsecond precision). Data types ``BYTES`` and ``RECORD``
-are not supported.
-
-Integer and boolean ``NA`` handling
-'''''''''''''''''''''''''''''''''''
-
-.. versionadded:: 0.20
-
-Since all columns in BigQuery queries are nullable, and NumPy lacks ``NA``
-support for integer and boolean types, this module will store ``INTEGER`` or
-``BOOLEAN`` columns with at least one ``NULL`` value as ``dtype=object``.
-Otherwise those columns will be stored as ``dtype=int64`` or ``dtype=bool``
-respectively.
-
-This is the opposite of the default pandas behaviour, which promotes integer
-types to float in order to store NAs. See the :ref:`gotchas <gotchas.intna>`
-for a detailed explanation.
-
-While this trade-off works well for most cases, it breaks down for storing
-values greater than 2**53. Such values in BigQuery can represent identifiers,
-and unnoticed precision loss for identifiers is what we want to avoid.
-
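-A minimal local sketch of the trade-off (this runs entirely in pandas; no
-BigQuery call is involved, and the variable names are illustrative):
-
-.. code-block:: python
-
-   import numpy as np
-   import pandas as pd
-
-   big_id = 2 ** 53 + 1  # e.g. a BigQuery INTEGER used as an identifier
-
-   # Default pandas behaviour: an integer column containing NA is promoted
-   # to float64, which silently rounds values above 2**53.
-   promoted = pd.Series([big_id, np.nan])
-   promoted.dtype               # dtype('float64')
-   int(promoted[0]) == big_id   # False -- precision was lost
-
-   # dtype=object keeps the original Python ints intact; this is what the
-   # gbq module does for INTEGER/BOOLEAN columns containing NULLs.
-   preserved = pd.Series([big_id, None], dtype=object)
-   preserved[0] == big_id       # True
-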
-.. _io.bigquery_deps:
-
-Dependencies
-''''''''''''
-
-This module requires the following additional dependencies:
-
-- `httplib2 <https://github.com/httplib2/httplib2>`__: HTTP client
-- `google-api-python-client <http://github.com/google/google-api-python-client>`__: Google's API client
-- `oauth2client <https://github.com/google/oauth2client>`__: authentication and authorization for Google's API
-
-.. _io.bigquery_authentication:
-
-Authentication
-''''''''''''''
-
-.. versionadded:: 0.18.0
-
-Authentication to the Google ``BigQuery`` service is via ``OAuth 2.0``.
-It is possible to authenticate with either user account credentials or service account credentials.
-
-Authenticating with user account credentials is as simple as following the prompts in a browser window,
-which will be opened for you automatically. You will be authenticated to the specified
-``BigQuery`` account using the product name ``pandas GBQ``. This is only possible on localhost;
-remote authentication using user account credentials is not currently supported in pandas.
-Additional information on the authentication mechanism can be found
-`here <https://developers.google.com/identity/protocols/OAuth2#clientside/>`__.
-
-Authentication with service account credentials is possible via the ``private_key`` parameter. This method
-is particularly useful when working on remote servers (e.g. a Jupyter notebook on a remote host).
-Additional information on service accounts can be found
-`here <https://developers.google.com/identity/protocols/OAuth2#serviceaccount>`__.
-
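-For example, a sketch of service-account authentication (the project id and
-key path below are placeholders):
-
-.. code-block:: python
-
-   df = pd.read_gbq('SELECT * FROM test_dataset.test_table',
-                    project_id='my-project-id',
-                    private_key='path/to/service_account_key.json')
-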
-Authentication via ``application default credentials`` is also possible. This is only valid
-if the ``private_key`` parameter is not provided. This method also requires that
-the credentials can be fetched from the environment the code is running in.
-Otherwise, the OAuth2 client-side authentication is used.
-Additional information on application default credentials can be found
-`here <https://developers.google.com/identity/protocols/application-default-credentials>`__.
-
-.. versionadded:: 0.19.0
-
-.. note::
-
-   The ``private_key`` parameter can be set to either the file path of the service account key
-   in JSON format, or the contents of the service account key in JSON format.
-
-.. note::
-
-   A private key can be obtained from the Google developers console by clicking
-   `here <https://console.developers.google.com/permissions/serviceaccounts>`__. Use the JSON key type.
-
-.. _io.bigquery_reader:
-
-Querying
-''''''''
-
-Suppose you want to load all data from an existing BigQuery table ``test_dataset.test_table``
-into a DataFrame using the :func:`~pandas.io.gbq.read_gbq` function.
-
-.. code-block:: python
-
-   # Insert your BigQuery Project ID here
-   # Can be found in the Google web console
-   projectid = "xxxxxxxx"
-
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table', projectid)
-
-You can define which column from BigQuery to use as an index in the
-destination DataFrame as well as a preferred column order as follows:
-
-.. code-block:: python
-
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table',
-                            projectid,
-                            index_col='index_column_name',
-                            col_order=['col1', 'col2', 'col3'])
-
-Starting with 0.20.0, you can specify the query config as a parameter to use additional options for your job.
-For more information about query configuration parameters see
-`here <https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query>`__.
-
-.. code-block:: python
-
-   configuration = {
-      'query': {
-        "useQueryCache": False
-      }
-   }
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table',
-                            projectid,
-                            configuration=configuration)
-
-.. note::
-
-   You can find your project id in the `Google developers console <https://console.developers.google.com>`__.
-
-.. note::
-
-   You can toggle the verbose output via the ``verbose`` flag, which defaults to ``True``.
-
-.. note::
-
-   The ``dialect`` argument can be used to indicate whether to use BigQuery's ``'legacy'`` SQL
-   or BigQuery's ``'standard'`` SQL (beta). The default value is ``'legacy'``. For more information
-   on BigQuery's standard SQL, see the `BigQuery SQL Reference
-   <https://cloud.google.com/bigquery/sql-reference/>`__.
-
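-For example, a sketch of a query run with standard SQL (the project id is a
-placeholder; note that standard SQL quotes table names with backticks):
-
-.. code-block:: python
-
-   df = pd.read_gbq('SELECT * FROM `test_dataset.test_table`',
-                    project_id='my-project-id',
-                    dialect='standard')
-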
-.. _io.bigquery_writer:
-
-Writing DataFrames
-''''''''''''''''''
-
-Assume we want to write a DataFrame ``df`` into a BigQuery table using :func:`~pandas.DataFrame.to_gbq`.
-
-.. ipython:: python
-
-   df = pd.DataFrame({'my_string': list('abc'),
-                      'my_int64': list(range(1, 4)),
-                      'my_float64': np.arange(4.0, 7.0),
-                      'my_bool1': [True, False, True],
-                      'my_bool2': [False, True, False],
-                      'my_dates': pd.date_range('now', periods=3)})
-
-   df
-   df.dtypes
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid)
-
-.. note::
-
-   The destination table and destination dataset will automatically be created if they do not already exist.
-
-The ``if_exists`` argument can be used to dictate whether to ``'fail'``, ``'replace'``
-or ``'append'`` if the destination table already exists. The default value is ``'fail'``.
-
-For example, assume that ``if_exists`` is set to ``'fail'``. The following snippet will raise
-a ``TableCreationError`` if the destination table already exists.
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid, if_exists='fail')
-
-.. note::
-
-   If the ``if_exists`` argument is set to ``'append'``, the destination DataFrame will
-   be written to the table using the defined table schema and column types. The
-   DataFrame must match the destination table in structure and data types.
-   If the ``if_exists`` argument is set to ``'replace'``, and the existing table has a
-   different schema, a delay of 2 minutes will be forced to ensure that the new schema
-   has propagated in the Google environment. See
-   `Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
-
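-For example, a sketch of appending to an existing table (the DataFrame must
-match the destination table's schema, as noted above):
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid, if_exists='append')
-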
-Writing large DataFrames can result in errors due to size limitations being exceeded.
-This can be avoided by setting the ``chunksize`` argument when calling :func:`~pandas.DataFrame.to_gbq`.
-For example, the following writes ``df`` to a BigQuery table in batches of 10000 rows at a time:
-
-.. code-block:: python
-
-   df.to_gbq('my_dataset.my_table', projectid, chunksize=10000)
-
-You can also see the progress of your upload via the ``verbose`` flag, which defaults to ``True``.
-For example:
-
-.. code-block:: python
-
-   In [8]: df.to_gbq('my_dataset.my_table', projectid, chunksize=10000, verbose=True)
-
-           Streaming Insert is 10% Complete
-           Streaming Insert is 20% Complete
-           Streaming Insert is 30% Complete
-           Streaming Insert is 40% Complete
-           Streaming Insert is 50% Complete
-           Streaming Insert is 60% Complete
-           Streaming Insert is 70% Complete
-           Streaming Insert is 80% Complete
-           Streaming Insert is 90% Complete
-           Streaming Insert is 100% Complete
-
-.. note::
-
-   If an error occurs while streaming data to BigQuery, see
-   `Troubleshooting BigQuery Errors <https://cloud.google.com/bigquery/troubleshooting-errors>`__.
-
-.. note::
-
-   The BigQuery SQL query language has some oddities; see the
-   `BigQuery Query Reference Documentation <https://cloud.google.com/bigquery/query-reference>`__.
-
-.. note::
-
-   While BigQuery uses SQL-like syntax, it has some important differences from traditional
-   databases in functionality, in API limitations (on the size and quantity of queries or uploads),
-   and in how Google charges for use of the service. You should refer to the `Google BigQuery documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__
-   often, as the service is changing and evolving. BigQuery is best for analyzing large
-   sets of data quickly, but it is not a direct replacement for a transactional database.
-
-.. _io.bigquery_create_tables:
-
-Creating BigQuery Tables
-''''''''''''''''''''''''
-
 .. warning::
 
-   As of 0.17, the function :func:`~pandas.io.gbq.generate_bq_schema` has been deprecated and will be
-   removed in a future version.
-
-As of 0.15.2, the gbq module has a function :func:`~pandas.io.gbq.generate_bq_schema` which
-produces a dictionary representation of the schema of the specified pandas DataFrame.
-
-.. code-block:: ipython
-
-   In [10]: gbq.generate_bq_schema(df, default_type='STRING')
-
-   Out[10]: {'fields': [{'name': 'my_bool1', 'type': 'BOOLEAN'},
-            {'name': 'my_bool2', 'type': 'BOOLEAN'},
-            {'name': 'my_dates', 'type': 'TIMESTAMP'},
-            {'name': 'my_float64', 'type': 'FLOAT'},
-            {'name': 'my_int64', 'type': 'INTEGER'},
-            {'name': 'my_string', 'type': 'STRING'}]}
-
-.. note::
-
-   If you delete and re-create a BigQuery table with the same name, but a different table schema,
-   you must wait 2 minutes before streaming data into the table. As a workaround, consider creating
-   the new table with a different name. Refer to
-   `Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
+   Starting in 0.20.0, pandas has split off Google BigQuery support into the
+   separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.
 
+Documentation is now hosted `here <https://pandas-gbq.readthedocs.io/>`__.
 
 .. _io.stata:
 