Description
Because the `requests` package reads the whole response body into memory by default, large downloads can exhaust system RAM. I think the relevant API calls should pass `stream=True`, which allows chunked downloads. The current implementation hardcodes `stream=None` (equivalent to `False`), and this can make the user's system unstable when downloading large datasets.
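To illustrate the kind of chunked download I mean, here is a minimal sketch using plain `requests`, not the kaggle-api code itself. The helper names `save_streamed_response` and `download_streaming`, the 1 MiB chunk size, and the 30 second timeout are my own assumptions for the example.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk; an arbitrary choice for this sketch


def save_streamed_response(response, outfile, chunk_size=CHUNK_SIZE):
    """Write a response body to disk chunk by chunk and return the bytes written.

    `response` only needs an `iter_content(chunk_size=...)` method, which
    `requests.Response` provides.
    """
    written = 0
    with open(outfile, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download_streaming(url, outfile, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: download `url` to `outfile` without buffering it all in RAM."""
    import requests  # imported lazily so the helper above has no hard dependency

    # stream=True defers reading the body until iter_content() is consumed.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        return save_streamed_response(response, outfile, chunk_size)
```

With `stream=True`, only about one chunk should be held in memory at a time, whereas with the default the body is fully read before the call returns.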
The `download_file` method in the `KaggleApi` class tries to support chunked downloads, but I am not sure this code works as expected, because without streaming the download would already be complete by the time this point is reached.
kaggle-api/src/kaggle/api/kaggle_api_extended.py
Line 2181 in b97668b
I also think the current usage of `kaggle.http_client()` outside of the `with self.build_kaggle_client() as kaggle:` statement is not recommended, because resources managed by the `kaggle` object might already be closed once the `with` statement exits.
with self.build_kaggle_client() as kaggle:
    ...
download_file(..., kaggle.http_client(), ...)
For example:
kaggle-api/src/kaggle/api/kaggle_api_extended.py
Lines 1187 to 1196 in b97668b
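The lifetime concern can be sketched with stand-in classes. `FakeKaggleClient` and `FakeHttpClient` are hypothetical and only model the assumption that leaving the `with` block closes the underlying HTTP client. I have not confirmed that the real client behaves this way.

```python
class FakeHttpClient:
    """Hypothetical stand-in for the object returned by kaggle.http_client()."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        if self.closed:
            raise RuntimeError("HTTP client used after it was closed")
        return f"response for {url}"

    def close(self):
        self.closed = True


class FakeKaggleClient:
    """Hypothetical stand-in for the object yielded by build_kaggle_client()."""

    def __init__(self):
        self._http = FakeHttpClient()

    def http_client(self):
        return self._http

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumption: resources owned by the client are released on exit.
        self._http.close()
        return False


def download_inside(url):
    # The HTTP client is used while the context manager is still open.
    with FakeKaggleClient() as kaggle:
        return kaggle.http_client().get(url)


def download_outside(url):
    # The name `kaggle` is still bound after the block, but its resources are closed.
    with FakeKaggleClient() as kaggle:
        pass
    return kaggle.http_client().get(url)
```

Under that assumption, `download_inside` succeeds while `download_outside` raises `RuntimeError`, so moving the `download_file(...)` call inside the `with` block would be the safer pattern.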