
Begin a QueryJob without waiting for it to finish #5435

Closed
bencaine1 opened this issue Jun 4, 2018 · 4 comments
Labels: api: bigquery, status: invalid, type: feature request

Comments

@bencaine1

OS: Linux dc32b7e8763a 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 x86_64 x86_64 GNU/Linux
Python version: Python 2.7.6
google-cloud-bigquery: 1.1.0

In previous versions of the BigQuery Python client, there was a method in QueryJob (or maybe _AsyncJob) called begin() that began the job without waiting for it to finish. There doesn't appear to be an equivalent of that anymore, despite the presence of other methods like done(), exists(), running(), etc. that depend on a job having been run asynchronously. Would it be possible to add back the begin() method?

@tseaver added the type: feature request and api: bigquery labels Jun 4, 2018
@tseaver
Contributor

tseaver commented Jun 4, 2018

PR #4242 renamed the begin method to _begin, making it private: the expected usage pattern is that the caller use job.result() to begin the job, and then use the future it returns.

@tswast I'll defer to your opinion here.

@bencaine1
Author

I can explain my use case a little more: I'm trying to start a query and print out the query if there's a syntax error. The following code does not work:

    from google.cloud.exceptions import GoogleCloudError


    # run_query is a hypothetical wrapper name; the body is the original snippet.
    def run_query(client, query):
        query_job = client.query(query)
        try:
            return query_job.result()
        except GoogleCloudError as e:
            msg = str(e)
            # This craziness puts line numbers next to the SQL.
            lines = query.split('\n')
            longest = max(len(line) for line in lines)
            # Print out a 'ruler' above and below the SQL so we can judge columns.
            ruler = ' ' * 4 + '|'  # Left pad for the line numbers (4 digits plus ':')
            # Use // so the arithmetic stays integral on both Python 2 and 3.
            for _ in range(longest // 10):
                ruler += ' ' * 4 + '.' + ' ' * 4 + '|'
            header = '-----Offending Sql Follows-----'
            padding = ' ' * ((longest - len(header)) // 2)
            msg += '\n\n{}{}\n\n{}\n{}\n{}'.format(padding, header, ruler, '\n'.join(
                '{:4}:{}'.format(n + 1, line) for n, line in enumerate(lines)), ruler)
            raise RuntimeError(msg)

It doesn't work because the Retry object has its own try/except block, so wrapping this method in a second try/except appears to bypass the Retry functionality, due to the way nested try/excepts behave in Python.

What I want to do is begin the job, check it periodically for errors, print the query and the error if there are any, and continue if the job executed successfully.
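The begin-then-poll workflow described above might look roughly like this minimal sketch. It assumes `job` behaves like a started BigQuery QueryJob: `done()` refreshes the job state and returns True once the job has finished, and `error_result` is None on success or a dict with a `'message'` key on failure. The function names `wait_and_report` and `format_sql_with_line_numbers` are hypothetical helpers, not part of the library.

```python
import time


def format_sql_with_line_numbers(query):
    """Return `query` with 1-based line numbers, e.g. '   1:SELECT 1'."""
    return '\n'.join('{:4}:{}'.format(n, line)
                     for n, line in enumerate(query.split('\n'), start=1))


def wait_and_report(job, query, poll_interval=1.0, sleep=time.sleep):
    """Poll an already-started job; on failure, raise with the SQL attached.

    Assumption: job.done() refreshes the job and reports completion, and
    job.error_result is None on success or a dict describing the error.
    """
    # Check the job periodically instead of blocking inside result().
    while not job.done():
        sleep(poll_interval)
    if job.error_result:
        msg = job.error_result.get('message', str(job.error_result))
        raise RuntimeError('{}\n\n{}'.format(msg, format_sql_with_line_numbers(query)))
    # No error: hand the finished job back so the caller can continue.
    return job
```

The `sleep` parameter is only there so the polling delay can be swapped out, for example in tests; in normal use the default `time.sleep` is fine.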

@bencaine1
Author

It looks like _begin is actually called when the query_job is created via Client.query(), so you can close this ticket. Thanks!

@tswast
Contributor

tswast commented Jun 4, 2018

@bencaine1 is correct. Calling Client.query() starts the job. There's no need to call result() if you don't need to wait for it to finish.
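Because Client.query() starts the job and returns without waiting, several queries can be submitted up front and only waited on afterwards. This is a minimal sketch, assuming `client` behaves like `google.cloud.bigquery.Client`; `run_queries_concurrently` is a hypothetical helper name.

```python
def run_queries_concurrently(client, queries):
    """Start every query first, then wait for each one in turn."""
    # client.query() submits the job and returns a QueryJob right away,
    # so all of these jobs can be running on the server at the same time.
    jobs = [client.query(sql) for sql in queries]
    # result() is what blocks: it waits for that particular job to finish.
    return [list(job.result()) for job in jobs]
```

With the real library this would be called as `run_queries_concurrently(bigquery.Client(), [...])`, which assumes default credentials are already configured.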


3 participants