import pandas.io.ga as ga, Credentials, python-gflags-2.0, run_flow() or run()? #11307

Closed
jaradc opened this issue Oct 13, 2015 · 4 comments

jaradc commented Oct 13, 2015

I am trying to get Google Analytics pulling in data but am experiencing many problems. I'll outline them here.

My Setup: I am using Pandas '0.17.0' and Python 3.4.2 32-bit.

  1. The Google Analytics console credentials screen no longer has "Installed application" application type as an option. At this point in time, even Google's documentation is not up-to-date. Options are now:
    • API key
      • Server key
      • Browser key
      • Android key
      • iOS key
    • OAuth 2.0 client ID
      • Web application
      • Android
      • Chrome App
      • iOS
      • PlayStation 4
      • Other
    • Service account
      • JSON
      • P12

The feedback here: the pandas documentation says to choose "Installed Application", but now that Google has changed things, what should a user set up given that "Installed Application" is no longer an option? My best guess is OAuth 2.0 client ID > Other. This produces a JSON file for download.
2. As the instructions say, I downloaded the JSON file, renamed it to 'client_secrets.json', and moved it to C:\Python34\Lib\site-packages\pandas\io on my machine.
3. I then tried to import with import pandas.io.ga as ga. Error given: ImportError: No module named 'apiclient'. No problem. I understand this to mean I don't have the Google Analytics modules. So, pip install --upgrade google-api-python-client, then try the import again.
4. Again, import pandas.io.ga as ga. Error given: ImportError: No module named 'gflags'. OK, so apparently I need a module named gflags. After seeing this post on StackOverflow, it appears gflags might not support Python 3.X.
5. I searched for gflags and learned it's a command-line flags module that hasn't been updated in 4 years (at the time of this writing). OK... let's give it a shot I guess. pip install python-gflags. That ended up installing python-gflags-2.0.
6. Once again, import pandas.io.ga as ga. Error:
File "C:\Python34\lib\site-packages\gflags.py", line 1091 except gflags_validators.Error, e:. So I did some more research and found this post describing the except gflags_validators.Error, e error. This more or less confirms that gflags is for Python 2.X (not that I needed the confirmation, since the PyPI page states it clearly). I guess I could run 2to3, but I'll try to fix the errors manually.

  • Line 1091: except gflags_validators.Error, e: ---to--- except gflags_validators.Error as e:
  • Line 1270: except getopt.GetoptError, e: ---to--- except getopt.GetoptError as e:
  • Line 1573: except IOError, e_msg: ---to--- except IOError as e_msg:
  • Line 1886: except ValueError, e: ---to--- except ValueError as e:
  • Lines 2399, 2401, 2402, 2429, 2431, 2432 ---- > Wrap print statements in parentheses.
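The manual edits above come down to two Python 2 to Python 3 syntax changes: `except X, e` becomes `except X as e`, and `print` becomes a function. Here is a minimal sketch of both. The `parse_port` helper is hypothetical and is not part of gflags.

```python
def parse_port(text):
    """Hypothetical helper illustrating Python 3 exception and print syntax."""
    try:
        return int(text)
    # Python 2 spelling was `except ValueError, e:`; Python 3 requires `as`.
    except ValueError as e:
        # Python 2 spelling was `print "bad port:", e`; print is now a function.
        print("bad port:", e)
        return None


print(parse_port("8080"))
print(parse_port("eighty"))
```

The `2to3` tool mentioned above applies these same rewrites automatically (`2to3 -w` writes the changes back to the file), which may be less error-prone than editing line by line.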

7. Trying again, import pandas.io.ga as ga works with no errors.
8. Great. Now I can finally try GA:

>>> ga.read_ga(
    account_id  = "23659189",
    property_id = "UA-23659189-1",
    metrics     = ['users', 'pageviews'],
    dimensions  = ['dayOfWeek'],
    start_date  = "2014-01-01",
    end_date    = "2014-08-01",
)
Traceback (most recent call last):
  File "<pyshell#10>", line 7, in <module>
    end_date    = "2014-08-01",
  File "C:\Python34\lib\site-packages\pandas\io\ga.py", line 104, in read_ga
    reader = GAnalytics(**reader_kwds)
  File "C:\Python34\lib\site-packages\pandas\io\ga.py", line 173, in __init__
    self._service = self._init_service(secrets)
  File "C:\Python34\lib\site-packages\pandas\io\ga.py", line 185, in _init_service
    http = self.authenticate(secrets)
  File "C:\Python34\lib\site-packages\pandas\io\ga.py", line 145, in authenticate
    return auth.authenticate(flow, self.token_store)
  File "C:\Python34\lib\site-packages\pandas\io\auth.py", line 108, in authenticate
    credentials = tools.run(flow, storage)
AttributeError: 'module' object has no attribute 'run'

9. I look in "C:\Python34\lib\site-packages\pandas\io\auth.py" and notice that line 108 is credentials = tools.run(flow, storage). tools is imported with import oauth2client.tools as tools, so I open oauth2client.tools and go to line 115, which defines a function named run_flow. I don't see a function named run. The docstrings still reference run(), so perhaps the function was renamed. So, on line 108 of pandas\io\auth.py I change credentials = tools.run(flow, storage) to credentials = tools.run_flow(flow, storage). The problem now is that run_flow has a required parameter, flags, and I have no idea what to pass for it. I'm pretty sure flags is meant to come from the command line, but I'm working in Python IDLE and calling ga.read_ga() directly.

def run_flow(flow, storage, flags, http=None):
    """Core code for a command-line application.

    The ``run()`` function is called from your application and runs
    through all the steps to obtain credentials. It takes a ``Flow``
    argument and attempts to open an authorization server page in the
    user's default web browser. The server asks the user to grant your
    application access to the user's data. If the user grants access,
    the ``run()`` function returns new credentials. The new credentials
    are also stored in the ``storage`` argument, which updates the file
    associated with the ``Storage`` object.

    It presumes it is run from a command-line application and supports the
    following flags:

        ``--auth_host_name`` (string, default: ``localhost``)
           Host name to use when running a local web server to handle
           redirects during OAuth authorization.

        ``--auth_host_port`` (integer, default: ``[8080, 8090]``)
           Port to use when running a local web server to handle redirects
           during OAuth authorization. Repeat this option to specify a list
           of values.

        ``--[no]auth_local_webserver`` (boolean, default: ``True``)
           Run a local web server to handle redirects during OAuth
           authorization.

    The tools module defines an ``ArgumentParser`` the already contains the
    flag definitions that ``run()`` requires. You can pass that
    ``ArgumentParser`` to your ``ArgumentParser`` constructor::

        parser = argparse.ArgumentParser(
            description=__doc__,
            formatter_class=argparse.RawDescriptionHelpFormatter,
            parents=[tools.argparser])
        flags = parser.parse_args(argv)

    Args:
        flow: Flow, an OAuth 2.0 Flow to step through.
        storage: Storage, a ``Storage`` to store the credential in.
        flags: ``argparse.Namespace``, The command-line flags. This is the
               object returned from calling ``parse_args()`` on
               ``argparse.ArgumentParser`` as described above.
        http: An instance of ``httplib2.Http.request`` or something that
              acts like it.

    Returns:
        Credentials, the obtained credential.
    """

Am I doing something wrong? Should I be changing run() to run_flow()? If it's supposed to be run_flow(), what do I change the flags argument to in pandas\io\auth.py?


jaradc commented Oct 13, 2015

And of course, as I publicly 'think through' the problem, I found a solution that worked for me.

One. Open site-packages\pandas\io\auth.py and add the import statement import argparse to the top with the other imports.
Two. On line 22 of auth.py is a variable named FLAGS. I then modified line 109 to say this: credentials = tools.run_flow(flow, storage, FLAGS)

Alternatively, if that doesn't work for you, you could replace the whole block of code with this:

    parser = argparse.ArgumentParser(parents=[tools.argparser])
    flags = parser.parse_args()
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(flow, storage, flags)
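One caveat with the snippet above: `parse_args()` with no arguments reads `sys.argv[1:]`, so it can fail if the calling program has command-line arguments of its own. Passing an empty list returns only the defaults. The sketch below is self-contained and makes an assumption: `stand_in_parent` is a hypothetical replacement for `tools.argparser`, rebuilt from the three flags listed in the `run_flow` docstring. It is not the real oauth2client object.

```python
import argparse

# Stand-in for oauth2client's `tools.argparser`, rebuilt from the flags named
# in the run_flow docstring. This is an assumption for illustration only; with
# oauth2client installed you would pass `parents=[tools.argparser]` instead.
stand_in_parent = argparse.ArgumentParser(add_help=False)
stand_in_parent.add_argument("--auth_host_name", default="localhost")
stand_in_parent.add_argument(
    "--auth_host_port", default=[8080, 8090], type=int, nargs="*"
)
stand_in_parent.add_argument(
    "--noauth_local_webserver", action="store_true", default=False
)

parser = argparse.ArgumentParser(parents=[stand_in_parent])

# An explicit empty list makes argparse ignore sys.argv, so the defaults are
# returned even inside IDLE, a notebook, or a larger script.
flags = parser.parse_args([])

print(flags.auth_host_name)
print(flags.auth_host_port)
print(flags.noauth_local_webserver)
```

The resulting `argparse.Namespace` has the kind of shape the docstring describes for the `flags` argument, so the same `parse_args([])` call on a parser built from the real `tools.argparser` should be usable with `run_flow`.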

Three. You're not done yet unfortunately! If you run ga.read_ga(), you will get a TypeError: list indices must be integers, not Index. jreback's suggestion from this thread is to pass index_col=0 when you call ga.read_ga() like this:

ga.read_ga(
    account_id  = "23659189",
    property_id = "UA-23659189-1",
    metrics     = ['users', 'pageviews'],
    dimensions  = ['dayOfWeek'],
    start_date  = "2014-01-01",
    end_date    = "2014-08-01",
    index_col   = 0,
)

If it worked for you like it did for me, you should get a dataframe like this:

           users  pageviews
dayOfWeek                  
0           1063       1958
1           2277       3578
2           2452       4052
3           2576       3908
4           2562       4083
5           2148       3420
6            965       1842


jreback commented Oct 13, 2015

ga has not been touched in quite some time and, to be honest, should just be removed from pandas. All that said, until/unless that happens, fixes could be contributed.


jreback commented Oct 14, 2015

closing as deprecating ga in #11308


debjan commented May 27, 2016

In case someone desperately needs this...

Try this gflags: python3-gflags_1.5.1-2_all.deb
(Windows user: just copy the two files from dist folder to site-packages)

Apply patch to gflags.py:

--- a
+++ b
@@ -2407,13 +2407,13 @@
         return int(argument, base)
       # ValueError is thrown when argument is a string, and overflows an int.
       except ValueError:
-        return long(argument, base)
+        return int(argument, base)
     else:
       try:
         return int(argument)
       # OverflowError is thrown when argument is numeric, and overflows an int.
       except OverflowError:
-        return long(argument)
+        return int(argument)
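The patch works because Python 3 has no separate `long` type: `int` is arbitrary precision, so the old fallback to `long` is not needed. A minimal sketch follows. The `parse_int` helper is hypothetical and is not gflags code.

```python
import sys


def parse_int(argument, base=10):
    """Hypothetical helper, not gflags code: parse an integer of any size."""
    # In Python 3, int() handles values beyond the machine word size, which
    # is the case Python 2 code delegated to long().
    return int(argument, base)


big = parse_int("123456789012345678901234567890")
print(big > sys.maxsize)
print(parse_int("ff", 16))
```

A side effect of the patch is that each `except` branch now repeats the call that just failed, so a malformed string will still raise `ValueError`. It just raises on the second attempt.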
