fread does not have an option to solve unzipping problems with zip ill-formatted name files in Windows #5237

Open
fabiocs8 opened this issue Oct 25, 2021 · 2 comments

Comments

@fabiocs8

As per my post in SO, fread cannot import and unzip the following URL:

dt <- fread("https://www.portaltransparencia.gov.br/download-de-dados/despesas-execucao/202001")

The workaround was to download the URL with mode = "wb" and then unzip the file:
download.file("https://www.portaltransparencia.gov.br/download-de-dados/despesas-execucao/202001", destfile = "test_file.zip", mode = "wb")

unzip("test_file.zip", exdir = ".")
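The workaround above can be wrapped into a single step. The following is a minimal sketch only: `read_zipped_url` is a hypothetical helper name, not part of data.table, and it assumes the first file in the archive is the table of interest.

```r
# Hypothetical helper (not part of data.table): download a zip archive in
# binary mode, extract it to a temporary directory, and read one file.
read_zipped_url <- function(url, ...) {
  zip_path <- tempfile(fileext = ".zip")
  out_dir <- tempfile("unzipped_")
  # Clean up the temporary archive and extracted files when done
  on.exit(unlink(c(zip_path, out_dir), recursive = TRUE), add = TRUE)

  # mode = "wb" writes the archive byte for byte, which matters on Windows
  utils::download.file(url, destfile = zip_path, mode = "wb", quiet = TRUE)

  # unzip() returns the paths of the files it extracted
  extracted <- utils::unzip(zip_path, exdir = out_dir)
  if (length(extracted) == 0L) stop("archive contained no files")

  # Assumption: the first extracted file is the one to read;
  # extra arguments (sep, encoding, ...) are passed on to fread()
  data.table::fread(extracted[1L], ...)
}
```

It would be called as read_zipped_url(url), with any fread arguments such as sep or encoding passed through the dots. Archives holding several files would need a way to pick one, which this sketch does not attempt.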

It would be nice if fread provided an option to deal with cases like this.

@ben-schwen
Member

ben-schwen commented Oct 26, 2021

There are several issues going on here.

  1. curl is not able to download "https://www.portaltransparencia.gov.br/download-de-dados/despesas-execucao/202001". Switching from HTTPS to HTTP solves this, and I would rather see it as an issue with curl.

Once the download works over HTTP with fread("http://www.portaltransparencia.gov.br/download-de-dados/despesas-execucao/202001"), two more issues appear:

  1. You expect fread to detect the file type automatically when the URL has no file extension, but that is not something fread does.

  2. Your file is a .zip, which fread does not support yet; see also "fread could attempt auto-unzip" (#3834).
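For illustration, a file's type can be guessed from its first bytes rather than from its name. The sketch below is an assumption about how such a check could look, not something fread does: `is_zip_file` is a hypothetical helper that tests for the ZIP local file header signature "PK\003\004".

```r
# Hypothetical helper: TRUE if the file starts with the ZIP local file
# header signature 0x50 0x4b 0x03 0x04 ("PK\003\004").
# Note: an empty archive starts with a different signature and is not matched.
is_zip_file <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  # Read up to 4 bytes; shorter files simply return fewer bytes
  magic <- readBin(con, what = "raw", n = 4L)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}
```

A downloaded temporary file could be checked this way before deciding whether to unzip it or read it directly.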

@fabiocs8
Author

Thank you Ben.

When I run fread with verbose = TRUE (output in the SO post linked above), I understand that fread does download the file without a problem. The problem occurs when decompressing it: because Windows does not treat the download as a binary file, it changes '\n' line endings to '\r\n' (CRLF); see the excellent answer provided by r2evans on SO. Using download.file(..., mode = "wb") is enough to solve this, and unzip then works properly.
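To make the line-ending problem concrete, here is a small simulation of what a text-mode write does to a byte stream. It is only an illustration with made-up bytes and a hypothetical function name, `simulate_text_mode`; it does not call download.file or curl.

```r
# Simulate a Windows text-mode write: every LF (0x0a) becomes CR LF (0x0d 0x0a).
# Assumes a non-empty raw vector as input.
simulate_text_mode <- function(bytes) {
  lf <- as.raw(0x0a)
  crlf <- as.raw(c(0x0d, 0x0a))
  unlist(lapply(bytes, function(b) if (b == lf) crlf else b))
}

# Made-up bytes: a ZIP signature followed by data that happens to contain 0x0a
original <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x0a, 0x00, 0x0a))
corrupted <- simulate_text_mode(original)

length(original)   # 7
length(corrupted)  # 9: two extra bytes were inserted
```

Because a zip archive stores offsets and lengths of its entries, inserting extra bytes like this is enough to make unzip fail, which is consistent with the behaviour described above.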

Surprisingly, the fread code at line 87 already calls curl with mode = "wb":
curl::curl_download(input, tmpFile, mode = "wb", quiet = !showProgress)

So it seems that the mode option has no effect here.

Regards, Fabio.
