
Add retry options to BigQuery #431

Merged (2 commits) on Jan 20, 2020


@Yanson (Contributor) commented Jan 14, 2020

What this PR does / why we need it:
Adds two fields to the job properties and uses them to set retry options on every BigQuery job waitFor call, following Google's example.

This fixes an issue where job metadata was sometimes read from a Job object that did not have it set, because the variable was not reassigned to the completed job that waitFor returns.

It also handles errors according to best practice and lets users configure some of the retry options.
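The reassignment fix can be sketched as follows. This is a hypothetical, dependency-free model: SimulatedJob and its statistics field are invented for illustration and are not the google-cloud-bigquery Job class. The model only mimics the behaviour that matters here: waitFor returns a new, completed object (in the real client it may also return null if the job no longer exists) rather than updating the reference it was called on.

```java
/**
 * Hypothetical model of the reassignment bug. SimulatedJob is NOT the
 * google-cloud-bigquery Job class; it only mimics the fact that waitFor()
 * returns a new, completed object and leaves the original reference unchanged.
 */
public class ReassignAfterWaitFor {

  static final class SimulatedJob {
    final String id;
    final String statistics; // stands in for job metadata; null until completed

    SimulatedJob(String id, String statistics) {
      this.id = id;
      this.statistics = statistics;
    }

    // Returns a completed copy; "this" is never mutated.
    SimulatedJob waitFor() {
      return new SimulatedJob(id, "completed");
    }
  }

  public static void main(String[] args) {
    SimulatedJob job = new SimulatedJob("job_1", null);

    // Bug pattern: the completed job is discarded, so metadata is still missing.
    job.waitFor();
    System.out.println(job.statistics); // null

    // Fix pattern: keep the reference returned by waitFor, then check it before use.
    job = job.waitFor();
    if (job == null) {
      throw new IllegalStateException("Job no longer exists");
    }
    System.out.println(job.statistics); // completed
  }
}
```

With the real client, the same idea would be written as `job = job.waitFor(...)` with the retry options passed as arguments, followed by checks for a null result and for `job.getStatus().getError()`.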

Does this PR introduce a user-facing change?:

For batch serving, users should add the following settings under feast.jobs in their application YAML:

    feast:
      jobs:
        bigquery-initial-retry-delay-secs: 1
        bigquery-total-timeout-secs: 21600
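A minimal sketch of how the two settings might be turned into retry durations, assuming they arrive as whole seconds. BigQueryRetryConfig and its methods are hypothetical names for illustration, not Feast's actual job properties class. It rejects zero values because the RetrySettings log later in this thread shows totalTimeout=PT0S when the settings are missing.

```java
import java.time.Duration;

/**
 * Hypothetical holder for the two feast.jobs settings. Names are illustrative
 * only; this is not Feast's real job properties class.
 */
public class BigQueryRetryConfig {
  private final Duration initialRetryDelay;
  private final Duration totalTimeout;

  BigQueryRetryConfig(long initialRetryDelaySecs, long totalTimeoutSecs) {
    // Reject unset (zero) or negative values up front instead of letting
    // a zero timeout reach the BigQuery client.
    if (initialRetryDelaySecs <= 0 || totalTimeoutSecs <= 0) {
      throw new IllegalArgumentException(
          "bigquery-initial-retry-delay-secs and bigquery-total-timeout-secs must be > 0");
    }
    this.initialRetryDelay = Duration.ofSeconds(initialRetryDelaySecs);
    this.totalTimeout = Duration.ofSeconds(totalTimeoutSecs);
  }

  Duration initialRetryDelay() {
    return initialRetryDelay;
  }

  Duration totalTimeout() {
    return totalTimeout;
  }

  public static void main(String[] args) {
    // Values from the YAML above: 1 s initial delay, 21600 s total timeout.
    BigQueryRetryConfig config = new BigQueryRetryConfig(1, 21600);
    System.out.println(config.initialRetryDelay()); // PT1S
    System.out.println(config.totalTimeout().toHours()); // 6
  }
}
```

The suggested 21600-second total timeout therefore allows up to six hours of polling. In the real client these values would be passed to waitFor through `RetryOption.initialRetryDelay(...)` and `RetryOption.totalTimeout(...)`, converted to whatever Duration type the client version expects.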

@feast-ci-bot (Collaborator) commented:

Hi @Yanson. Thanks for your PR.

I'm waiting for a gojek member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Yanson (Contributor, Author) commented Jan 14, 2020

/assign @woop

@zhilingc (Collaborator) commented:

/ok-to-test

@zhilingc (Collaborator) commented:

/test test-end-to-end-batch

@khorshuheng (Collaborator) commented:

/test test-end-to-end-batch

@Yanson (Contributor, Author) commented Jan 17, 2020

/test test-end-to-end-batch

@Yanson (Contributor, Author) commented Jan 17, 2020

Hopefully this retest will pass. I will update the user-facing notes on Monday, since the new settings are worth mentioning.

woop assigned zhilingc and unassigned woop on Jan 18, 2020

@woop (Member) commented Jan 18, 2020

Thanks for this @Yanson!

Assigning @zhilingc to this one.

@zhilingc (Collaborator) commented:

/lgtm

I think the whole serving jobs configuration needs a bit of a refactor, but that's a PR for another day.

@davidheryanto (Collaborator) commented:

/approve

@feast-ci-bot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidheryanto, Yanson

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

feast-ci-bot merged commit fe520a9 into feast-dev:master on Jan 20, 2020
Yanson deleted the bigquery_serving_retry branch on January 20, 2020 at 09:47

@Yanson (Contributor, Author) commented Jan 22, 2020

Failing to specify the settings described above may result in the following errors in the logs:

Client:

    _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Unable to load entity dataset to Bigquery"

Server:

    total timeout or maximum number of attempts exceeded; current settings: RetrySettings{totalTimeout=PT0S, initialRetryDelay=PT0S, retryDelayMultiplier=1.0, maxRetryDelay=PT3S, maxAttempts=0, jittered=true, initialRpcTimeout=PT0S, rpcTimeoutMultiplier=1.0, maxRpcTimeout=PT0S}
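One plausible reading of the server message, stated as an assumption rather than documented library behaviour: the underlying retry logic treats totalTimeout=PT0S together with maxAttempts=0 as "no limit configured, so do not retry", and gives up on the first poll. The simplified, hypothetical model below (RetryDecisionModel and shouldRetry are invented names, not the gax library's source) shows why setting a non-zero total timeout changes the outcome.

```java
import java.time.Duration;

/**
 * Simplified, hypothetical model of a retry decision. ASSUMPTION: the real
 * retry algorithm does not retry at all when both totalTimeout and maxAttempts
 * are zero. This illustrates the log above; it is not the library's code.
 */
public class RetryDecisionModel {

  static boolean shouldRetry(
      Duration totalTimeout, int maxAttempts, Duration elapsed, int attemptCount) {
    // Neither limit configured: give up immediately (assumed behaviour).
    if (totalTimeout.isZero() && maxAttempts == 0) {
      return false;
    }
    // A configured total timeout that has already been exceeded stops retries.
    if (!totalTimeout.isZero() && elapsed.compareTo(totalTimeout) > 0) {
      return false;
    }
    // A configured attempt limit that has been reached stops retries.
    if (maxAttempts > 0 && attemptCount >= maxAttempts) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Settings from the server log when the Feast settings are missing.
    System.out.println(shouldRetry(Duration.ZERO, 0, Duration.ofSeconds(1), 1)); // false
    // Same situation with bigquery-total-timeout-secs: 21600.
    System.out.println(
        shouldRetry(Duration.ofSeconds(21600), 0, Duration.ofSeconds(1), 1)); // true
  }
}
```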

khorshuheng pushed a commit that referenced this pull request on Feb 13, 2020:
* Add retry options to BigQuery

Add two fields to job properties and use them to set retry options in all BigQuery job waitFor calls, as per Google example.

* Change timeout defaults and add config to e2e test to fix failure.