This repository has been archived by the owner on May 14, 2024. It is now read-only.

get_historical_weather does not respond #130

Closed
adamhsparks opened this issue Mar 11, 2021 · 17 comments · Fixed by #132

@adamhsparks
Collaborator

For somewhere between a few weeks and a couple of months now, the CI tests have been failing on every OS tested.

So far I've updated the internal databases to reflect the latest metadata on stations and locations BOM has.

Here is an example of a URL, generated by the second example, that is not responding.

get_historical_weather(latlon = c(-35.2809, 149.1300), type = "min") ## 3,500+ daily records

http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=070351&p_c=-989854804&p_nccObsCode=123

When I try to fetch the zip file, the URL does not respond in R or in the browser, although I can browse to the data and display it as a table in my browser window. Going through the BOM website's own station search, http://www.bom.gov.au/climate/data/stations/, requesting the same station and trying to download any data also fails. Presumably, there's an issue with the BOM server.

@adamhsparks
Collaborator Author

To further add to the confusion, the first example does appear to fetch the data.

get_historical_weather(stationid = "023000", type = "max")

@adamhsparks
Collaborator Author

adamhsparks commented Mar 11, 2021

Checking the first URL for a response indicates that it's OK, so we can't just check the status first and stop if it's bad.

> curl -Is http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av\?p_display_type\=dailyZippedDataFile\&p_stn_num\=070351\&p_c\=-989854804\&p_nccObsCode\=123 | head -1
> HTTP/1.1 200 OK

@adamhsparks
Collaborator Author

Now the first example is failing too. BOM servers seem to have issues right now.

@paulr-bv

Further access issues today. Call is:

get_historical(stationid = "023000", type = "min")

Returns:

Error in file(con, "r") : 
  cannot open the connection to 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt'

In addition: Warning message:
In file(con, "r") :
  cannot open URL 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt': HTTP status was '403 Forbidden'

Accessing via curl works:

> curl -Is http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt
> HTTP/1.1 200 OK

Accessing from web browser loads the file.

@jonocarroll
Collaborator

Sorry for the silence, I have been lurking.

I get no issues from

readLines("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt")

and expected data returned from get_historical()

> bomrang::get_historical(stationid = "023000", type = "min")
Data saved as /var/folders/7s/d2nrjvg10x76v0hyg1jx31_40000gn/T//Rtmpdp6EUx/IDCJAC0011_023000_1800_Data.csv
  --- Australian Bureau of Meteorology (BOM) Data Resource ---
  (Original Request Parameters)
  Station:		ADELAIDE (WEST TERRACE / NGAYIRDAPIRA) [023000]
  Location:		lat: -34.9257, lon: 138.5832
  Measurement / Origin:	Min / Historical
  Timespan:		1887-01-01 -- 2021-03-01 [96 years]
  ---------------------------------------------------------------
       product_code station_number year month day min_temperature
    1:   IDCJAC0011          23000 1887     1   1              NA
    2:   IDCJAC0011          23000 1887     1   2              NA
    3:   IDCJAC0011          23000 1887     1   3              NA
    4:   IDCJAC0011          23000 1887     1   4              NA
    5:   IDCJAC0011          23000 1887     1   5              NA
   ---
49015:   IDCJAC0011          23000 2021     3  13            17.9
49016:   IDCJAC0011          23000 2021     3  14             9.7
49017:   IDCJAC0011          23000 2021     3  15             9.4
49018:   IDCJAC0011          23000 2021     3  16            14.4
49019:   IDCJAC0011          23000 2021     3  17            13.7
       accum_days_min quality
    1:             NA
    2:             NA
    3:             NA
    4:             NA
    5:             NA
   ---
49015:              1       N
49016:              1       N
49017:              1       Y
49018:              1       N
49019:              1       N

so maybe this is an intermittent issue (fun).

The only thing I can think of is that while alphaAUS_136.txt returns 200 (OK), and alphaAUS_000.txt returns 404 (Not Found), the parent directory returns 403 (Forbidden)... could it perhaps be that the permissions are variable or dependent on the user agent or protocol or something?

@paulr-bv

Hmm, it might be a client system thing.
I'm on a Mac and still running R 3.6.1 (sessionInfo at the bottom) as I haven't had the time to upgrade amidst some project work. Don't know if that is part of it (but I thought that my original script was working a few days ago).

I tried some readLines() calls with mixed results. I had restarted R, so no libraries were explicitly loaded.

The call to the BOM server failed again for me:

> readLines("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt")
Error in file(con, "r") : 
  cannot open the connection to 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt'
In addition: Warning message:
In file(con, "r") :
  cannot open URL 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt': HTTP status was '403 Forbidden'

I then called w3.org (just another text file out there) and it worked:

> readLines("https://www.w3.org/TR/PNG/iso_8859-1.txt")
  [1] "The following are the graphical (non-control) characters defined by"            "ISO 8859-1 (1987).  Descriptions in words aren't all that helpful,"            
  [3] "but they're the best we can do in text.  A graphics file illustrating"          "the character set should be available from the same archive as this"           
  [5] "file."                                                                          ""                                                                              
  [7] "Hex Description                 Hex Description"                                ""                                                                              
  [9] "20  SPACE"                                                                      "21  EXCLAMATION MARK            A1  INVERTED EXCLAMATION MARK"                 
 [11] "22  QUOTATION MARK              A2  CENT SIGN"                                  "23  NUMBER SIGN                 A3  POUND SIGN" 

My sessionInfo():

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1   

Note that I had upgraded bomrang to the latest version:

> packageVersion("BOMRang")
[1] ‘0.7.3’

I found a workaround for my script, but as with most of these things, it would be nice to work out what is going on at some stage!

@mattecologist

I'm also having the same issues, R 4.0.2 in RStudio 1.4.1103:

> readLines("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt")
Error in file(con, "r") : 
  cannot open the connection to 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt'
In addition: Warning message:
In file(con, "r") :
  cannot open URL 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt': HTTP status was '403 Forbidden'

But it's working fine directly through the R console...

Both are without the bomrang package loaded, just checking access to BoM.

RStudio session info:

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 assertthat_0.2.1 cli_2.3.1 tools_4.0.2 withr_2.4.1 glue_1.4.2 sessioninfo_1.1.1

@jonocarroll
Collaborator

I should have mentioned that I tested in a terminal. I can reproduce the error from RStudio 1.4.1623 on Mac. I don't see any prominent discussions about it, so maybe reach out on Twitter or to RStudio directly?

@jonocarroll
Collaborator

For reference, this can be traced slightly upstream to file(con, "r") failing.

@jonocarroll
Collaborator

Okay, it looks like the RStudio HTTPUserAgent is being rejected... based on https://r.789695.n4.nabble.com/File-Downloading-Problem-td3022137.html I tried

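# Save the current options so they can be restored afterwards
op <- options()
getOption("HTTPUserAgent")
#> [1] "RStudio Desktop (1.4.1623); R (3.6.2 x86_64-apple-darwin15.6.0 x86_64 darwin15.6.0)"
# Blank the user agent, then retry the request that was returning 403
options(HTTPUserAgent = "")
readLines("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt")
# Restore the original options
options(op)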

with success.

My suggestion would be to add the following to bomrang fetching functions:

op <- options()
on.exit(options(op))
options(HTTPUserAgent = "{bomrang} R package (0.7.4) https://github.com/ropensci/bomrang")
readLines(<file>)
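
A minimal sketch of how that pattern could be wrapped into a reusable helper. The name `with_user_agent` and its arguments are made up for illustration and are not part of bomrang; it relies only on base R's `options()` (which returns the previous values of whatever it sets) and `on.exit()`.

```r
# Hypothetical helper (not part of bomrang): run `fetch()` with a temporary
# HTTPUserAgent and restore the previous value afterwards, even if
# `fetch()` throws an error.
with_user_agent <- function(agent, fetch) {
  # options() returns the old values of the options it changes
  old <- options(HTTPUserAgent = agent)
  on.exit(options(old), add = TRUE)
  fetch()
}

# Usage (needs network access, so left commented out here):
# lines <- with_user_agent(
#   "{bomrang} R package (0.7.4) https://github.com/ropensci/bomrang",
#   function() readLines(
#     "http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt"
#   )
# )
```

Saving only the one option that changes, rather than the full `options()` list, keeps the restore step from undoing unrelated option changes made while the fetch was running.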

Maybe someone from BOM will file an issue... maybe they're deliberately blocking RStudio as a way to limit what bomrang does.

@mattecologist

@jonocarroll can confirm this is working for me too now. Thanks!

@paulr-bv

Agree with @mattecologist, @jonocarroll, blanking the UserAgent works for me.
Restoring the options then has it fail again.

In the interim, I just updated my script to put a wrapper around calls to get_historical() and the script worked.

The wrapper is:

cfg_bom_http_fix <- TRUE 

if (cfg_bom_http_fix) {
  #Backup the options, then set the options to blank the agent for BOM call
  op_bak <- options()
  options(HTTPUserAgent = "")
}

data_min_raw <- get_historical(station_number, type = "min")

if (cfg_bom_http_fix) {
  # Restore
  options(op_bak)
}

Thanks again!

@adamhsparks
Collaborator Author

adamhsparks commented Mar 18, 2021

Thanks for teasing this out, @jonocarroll. We just moved into new offices and our network firewalls are causing some issues here so I wasn't going to be a good test case for this.

That said, it does work.

BOM's file serving seems to have been unstable lately. Last week I was having issues in both RStudio and the base console; as noted above, the URL didn't respond in R or in the browser. We were still in the old offices then, so I didn't have the firewall issues I've had here since Monday.

@jonocarroll
Collaborator

I'll try to make a PR to add this new user agent. It may end up being blocked specifically, in which case I'd say opening a line of communication to BOM may be prudent.

jonocarroll added a commit that referenced this issue Mar 19, 2021
update tests to use 0-padding on stationID
closes #130
@adamhsparks
Collaborator Author

adamhsparks commented Mar 19, 2021 via email

@jonocarroll
Collaborator

Oh, snap, neat timing. Check out my PR; it looks like this specific issue was just the tip of the iceberg.

@jonocarroll
Collaborator

jonocarroll commented Mar 19, 2021

I can confirm that on develop in a terminal, the failure point is the "embedded nul" error, which is fread() failing to read the .zip. This is also resolved in my PR.
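
As an illustration of one way to guard against handing a zip archive to a text reader (not necessarily what the PR does), the sketch below checks a file's first four bytes for the ZIP local-file-header signature. The helper name `is_zip_file` is hypothetical; only base R's `readBin()` is used.

```r
# Hypothetical helper: TRUE if `path` starts with the ZIP local-file-header
# signature "PK\003\004" (bytes 50 4b 03 04). An empty archive starts with
# a different signature and is not detected by this check.
is_zip_file <- function(path) {
  magic <- readBin(path, what = "raw", n = 4L)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Usage sketch (assumes `f` is a downloaded file whose archive holds a
# single CSV; left commented out because it needs a real download):
# if (is_zip_file(f)) {
#   csv <- utils::unzip(f, exdir = tempdir())
#   dat <- data.table::fread(csv[1])
# }
```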
