OECD test_get_tourism test fails (apparent data change). #164

Closed
@jtkiley

For the past couple of weeks, the OECD test_get_tourism test has been failing. The last test run for #157 (on January 11) passed, but every run since then appears to have failed.

Running the same data pull that the test uses still returns real data:

>>> import pandas_datareader as pdr
>>> from datetime import datetime
>>> df = pdr.DataReader('TOURISM_INBOUND', 'oecd', start=datetime(2005, 1, 1),
...                     end=datetime(2012, 1, 1))
>>> df
Country                       Australia                                 \
Variable   Total international arrivals   China United Kingdom   Japan   
Year                                                                     
2008-01-01                      5512313  350155         686715  450133   
2009-01-01                      5490229  356239         676822  348465   
2010-01-01                      5790260  445855         660335  390555   
2011-01-01                      5770902  533370         622254  325740   
2012-01-01                      6032317  618818         608319  348054   

Country                                                                     \
Variable   New Zealand United States Switzerland and Liechtenstein Germany   
Year                                                                         
2008-01-01     1099442        448836                           NaN     NaN   
2009-01-01     1094418        472233                           NaN     NaN   
2010-01-01     1145959        462941                           NaN     NaN   
2011-01-01     1156307        447148                           NaN     NaN   
2012-01-01     1184656        470858                           NaN     NaN   

Country                         ...    Philippines                     \
Variable   Italy Netherlands    ...        Denmark Thailand Singapore   
Year                            ...                                     
2008-01-01   NaN         NaN    ...            NaN      NaN       NaN   
2009-01-01   NaN         NaN    ...            NaN      NaN       NaN   
2010-01-01   NaN         NaN    ...            NaN      NaN       NaN   
2011-01-01   NaN         NaN    ...            NaN      NaN       NaN   
2012-01-01   NaN         NaN    ...            NaN      NaN       NaN   

Country                                                               \
Variable   Malaysia Private accommodation Specialised establishments   
Year                                                                   
2008-01-01      NaN                   NaN                        NaN   
2009-01-01      NaN                   NaN                        NaN   
2010-01-01      NaN                   NaN                        NaN   
2011-01-01      NaN                   NaN                        NaN   
2012-01-01      NaN                   NaN                        NaN   

Country                                            
Variable   Venezuela Ecuador Bangladesh Sri Lanka  
Year                                               
2008-01-01       NaN     NaN        NaN       NaN  
2009-01-01       NaN     NaN        NaN       NaN  
2010-01-01       NaN     NaN        NaN       NaN  
2011-01-01       NaN     NaN        NaN       NaN  
2012-01-01       NaN     NaN        NaN       NaN  

[5 rows x 4368 columns]
>>> df['United States']['Total international arrivals']
Year
2008-01-01    175702309
2009-01-01    160507417
2010-01-01    164079732
2011-01-01    167600277
2012-01-01    171320408
Name: Total international arrivals, dtype: float64

However, comparing this output to what the test expects, three main issues stand out.

  1. The units of the stats appear to have changed from thousands (in the test) to individual arrivals.
  2. The stats also appear to have been revised (not just rescaled), as the comparable U.S. number for 2008 is 175632.0 in the test and 175702309 in the data.
  3. The time coverage now appears to start at 2008, though the test is written assuming data from 2005.
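To make the second point concrete, here is a minimal arithmetic sketch using only the two 2008 U.S. figures quoted above. The variable and function names are made up for illustration and are not part of pandas-datareader.

```python
# Both figures are the 2008 U.S. values quoted in this issue.
test_value_thousands = 175632.0      # value hard-coded in the test (thousands)
data_value_individuals = 175702309   # value returned by the data pull (individuals)


def rescale_to_thousands(count):
    """Convert a count of individuals to thousands (hypothetical helper)."""
    return count / 1000.0


rescaled = rescale_to_thousands(data_value_individuals)
difference = rescaled - test_value_thousands

# If the change were only a change of units, rounding the rescaled value
# to the nearest thousand would reproduce the number in the test.
is_pure_rescale = round(rescaled) == test_value_thousands

print(rescaled)         # new figure expressed in thousands
print(difference)       # gap between new data and test value, in thousands
print(is_pure_rescale)  # False: the figure was revised, not just rescaled
```

The gap is about 70 thousand arrivals, so dividing the new data by 1000 inside the test would not be enough to make it pass.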

These observations and the timing of the test failure seem to fit with the recent update (January 2016) noted by OECD and with the data OECD currently makes available.

If we're fine with the testing approach, I should be able to update the test so that it passes against the new data.
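As a rough sketch of what an updated check could look like, assuming the test keeps comparing against hard-coded values: the helper name `check_tourism_frame` and the stand-in frame are hypothetical, and the expected numbers are simply the U.S. totals from the output above. The stand-in frame replaces the live `DataReader` call so the sketch runs offline.

```python
from datetime import datetime

import pandas as pd

# U.S. "Total international arrivals" as printed in the output above.
US_TOTALS = {
    2008: 175702309,
    2009: 160507417,
    2010: 164079732,
    2011: 167600277,
    2012: 171320408,
}


def check_tourism_frame(df):
    """Hypothetical check: coverage starts in 2008, values are in individuals."""
    series = df['United States']['Total international arrivals']
    # Coverage now starts at 2008 rather than 2005.
    assert series.index[0] == pd.Timestamp('2008-01-01')
    # Values are compared as floats, matching the float64 dtype shown above.
    assert series.tolist() == [float(v) for v in US_TOTALS.values()]
    return True


# Stand-in for the frame returned by the data pull (assumption: same
# two-level column layout, Country then Variable, and a yearly 'Year' index).
index = pd.DatetimeIndex([datetime(year, 1, 1) for year in US_TOTALS], name='Year')
columns = pd.MultiIndex.from_tuples(
    [('United States', 'Total international arrivals')],
    names=['Country', 'Variable'],
)
stand_in = pd.DataFrame(
    [[float(v)] for v in US_TOTALS.values()], index=index, columns=columns
)

print(check_tourism_frame(stand_in))
```

Hard-coding values keeps the test simple, but it will break again whenever OECD revises the series.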
