-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional options for APIDataSet (e.g. proxies) #711
Comments
Thanks for opening the issue, and for your enthusiasm in trying to solve it! This sounds like an eminently reasonable idea to me, but would definitely appreciate the input from people who know Just to be clear, here's where we currently stand. Below are all the options available in Available in kedro:
Not available in kedro:
Personally I don't know which of these options are or aren't important, but I don't see why we shouldn't support all of them. However, I do wonder whether our current implementation would be the correct way to do this (having many different options in the |
…rg#711) First iteration, to be revisited next week.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hi @lucasjamar, is this still a feature you're interested in? If so would you be able to make a contribution PR? |
Hi, Definitely useful. As @AntonyMilneQB mentionned, should we just wrap up the other options inside load_args or add them like the options? |
I'd be in favour of wrapping them in |
Hi, I can pick this one up. The API specification of the def request(method, url, **kwargs): ... Hence, my proposal is to keep the interface (almost) same in the data catalog: Since authentication while using the if auth is not None:
if 'type' not in auth:
raise DataSetError("'type' is missing from Authenticator configuration")
auth_type = auth.pop('type')
self._auth = BaseAuthFactory.create(auth_type, **auth, **credentials) There is a PR open #1445 for this implementation, but since my implementation is quite deviant from the current implementation, I'd like to open a new one. Also fyi @afuetterer as you are working on the subclass of the general API class. |
@daBlesr Sounds fine at a high level. Not entirely sure how you're planning to handle auth? If it's not an object, the default handling seems to be user/password as in https://github.com/psf/requests/blob/2a6f290bc09324406708a4d404a88a45d848ddf9/requests/models.py#L589-L609, and the current |
Hi @deepyaman, I think since most commonly credentials are provided in dictionary type, it would be better to stay consistent with that across the project. Hence, in my approach I would not support tuples. For instance, if you want to support access tokens, you can currently write something like this: url: example.com
method: GET
auth_type: tests.extras.datasets.api.test_api_auth.AccessTokenAuth
credentials: access_token_cred
load_args:
headers:
some_header: some_value
-- credentials.yaml
access_token_cred:
token: abc Custom Access Token class: class AccessTokenAuth(AuthBase):
"""Attaches Access Token Authentication to the given Request object."""
def __init__(self, token):
self.token = token
def __call__(self, r):
r.headers['Authorization'] = f'access_token {self.token}'
return r In my tests I have verified that indeed the following is the case: requests_mocker.last_request.headers['Authorization'] == 'access_token abc' The default for |
Thank you for continuing this PR, @daBlesr. I am looking forward for your update on the "base" A question: The |
In the general case, I think it is not very beneficial to create a session for the user (and expose it), as sessions should be closed at some point. However, it would not hurt to create a (non-exposed) session in this class and autoclose it afterwards (which you would be able to use), since sessions are created internally in the |
Thanks for your input. I tried exactly that already in the paginated api dataset. Is that what you meant? |
Auth@deepyaman @daBlesr Just for context, the current behaviour of converting to The case of an API token in headers, which is not currently supported by So for now let's leave Session@afuetterer @daBlesr Agree with all that's said above that there's no need to use
... but equally I don't see any harm in using a session for this in Also agree that using a session for |
Completed in #1633 |
Description
Add proxies parameter to APIDataSet. Perhaps other options could also interesting?
Context
When comparing the list of options available from requests with those available in the APIDataSet, there are few missing that might be interesting to add.
Personally, the most interesting to have would be the proxies option as discussed here. Before I used to set my proxies in my global env variables but I stumble upon cases where my pipeline has two APIs requiring two different proxies. Having the proxies options in the data catalog would have the added benefit that the proxies could be loaded using the TemplatedConfigLoader, thereby keeping the proxies confidential in the credentials.yml.
Other than proxies, I was wondering if there are other options that users might find to be of interest? Should we consider adding all options available in requests that are not yet available in the APIDataSet so that users have full flexibility?
Possible Implementation
I would like to try solving this one if thats ok. Hopefully it requires just adding these options in extras/datasets/api/APIDataSet.py. I am not certain if these options would require extra tests. Any inputs?
The text was updated successfully, but these errors were encountered: