
API functionality revamp, text fixes, README revamp #15

Merged (44 commits) on May 24, 2021

Conversation

@prakaa (Contributor) commented May 8, 2021

API functionality revamp (type inference for some API functions), test fixes, and major README changes

Initial PR made 8/5/2021. Leaving the PR open while further changes are made; as these are incorporated into the PR, I will tick tasks off.

API (Type inference & other changes)

Initial fixes

  • Tests and the GUI require data to be interpreted as strings. This interferes with the API functionality introduced in #11 (Beefing up command line dynamic data handling functionality), as parquet and feather files save a schema that includes column types
    • The initial solution bypassed this issue without changing lots of tests: an additional parameter, parse_data_types, was added, which defaults to True for the API and is set to False in a GUI wrapper function (following the structure of other functions wrapped for the GUI). This parses data types on reading the AEMO csv (see the sketch below).
    • However, this could lead to user error: cached data may be stored as string datatypes, but parse_data_types will not parse the data types when reading existing files.
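A minimal sketch of how the parse_data_types flag is intended to behave, assuming the argument style of the %timeit call in the Testing section below (the table name and cache path are illustrative):

```python
from nemosis import data_fetch_methods

# API usage: parse_data_types defaults to True, so numeric and datetime
# columns are returned with inferred dtypes rather than as strings.
dispatch_data = data_fetch_methods.dynamic_data_compiler(
    "2018/01/01 00:00:00", "2018/01/01 23:55:00",
    "DISPATCHLOAD", "./nemosis_cache",
    parse_data_types=True,
)

# GUI-style usage: leave everything as strings, since the GUI
# relies on string joins.
gui_data = data_fetch_methods.dynamic_data_compiler(
    "2018/01/01 00:00:00", "2018/01/01 23:55:00",
    "DISPATCHLOAD", "./nemosis_cache",
    parse_data_types=False,
)
```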

Further functionality

  • Should ensure new changes do not 'break' cache use - will add separate cache functionality that caches type-inferred data
  • Add a new cache_compiler option that has the typical cache args from dynamic_data_compiler built in (e.g. keep_csv=False, fformat="parquet" or fformat="feather", and data_merge=False). It will infer data types when CSVs from AEMO are downloaded and read in.
  • Add tests for cache_compiler
  • parse_data_types will remain but will parse the data types of the DataFrame regardless of file type (i.e. parsing occurs when a cache or a new file is read, not just when a new file is read). Data from csv will always be read in as strings
    • parsing is implemented after dynamic_data_compiler has concatenated the list of DataFrames that _dynamic_data_fetch_loop returns. Parsing before concatenation can lead to typed columns being reverted to object once concatenation occurs (e.g. INTERVENTION went from Int to object).
    • parsing is also done before filtering data with filter_cols and filter_values. If a user provides a numeric filter value (e.g. RAISE5MIN=5), the unparsed DataFrame will have all columns as objects and therefore return an empty DataFrame (unless the user provides RAISE5MIN="5"). This is not expected behaviour, so parsing occurs before filtering (see the sketch after this list). Datetimes can be filtered using user-provided datetime strings or datetime objects
    • GUI wrapper should have parse_data_types=False since GUI uses string joins
    • API users will have parse_data_types=True since operations on columns may require them to be numeric
  • Make modules, inner functions and variables in data_fetch_methods private and push key functions from data_fetch_methods into package namespace (i.e. so that from nemosis import dynamic_data_compiler is possible)
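A minimal sketch of the filtering behaviour described above; the filter_cols/filter_values convention (a list of columns paired with a tuple of value lists) follows NEMOSIS's existing README examples, and DISPATCHLOAD/RAISE5MIN are illustrative:

```python
from nemosis import dynamic_data_compiler

# Parsing happens before filtering, so a numeric filter value matches
# a numeric column. Without parsing, every column would be dtype object
# and the integer 5 would match nothing, silently returning an
# empty DataFrame.
raise_data = dynamic_data_compiler(
    "2018/01/01 00:00:00", "2018/01/01 23:55:00",
    "DISPATCHLOAD", "./nemosis_cache",
    filter_cols=["RAISE5MIN"],
    filter_values=([5],),  # numeric value, not "5"
)
```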

Code readability

  • Internals of dynamic_data_compiler and cache_compiler will be broken out into private functions.

Readme

  • Workflow section for API user
  • Rewrite dynamic_data_compiler section with more advanced filtering examples.
  • Include cache_compiler, with a note that it will delete csvs in a cache. However, if it detects pre-cached feather or parquet files, it will not do anything (e.g. if cache_compiler is run in the GUI cache, it will print that the cache has already been compiled). See the sketch after this list.
  • Remove submodule import - users can now directly import main functions from nemosis
  • Python syntax highlighting
  • Table of contents
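A sketch of the cache_compiler workflow the README will document, using the direct package-level imports introduced in this PR (the exact keyword arguments, e.g. fformat, are assumptions based on the description above):

```python
from nemosis import cache_compiler, dynamic_data_compiler

cache = "./nemosis_cache"

# Compile the cache once: downloads the AEMO CSVs, writes type-inferred
# parquet files and deletes the CSVs. If pre-cached feather or parquet
# files are detected, it only reports that the cache is already compiled.
cache_compiler(
    "2018/01/01 00:00:00", "2018/02/01 00:00:00",
    "DISPATCHLOAD", cache, fformat="parquet",
)

# Subsequent API calls read straight from the compiled cache and
# return a typed DataFrame.
dispatch_data = dynamic_data_compiler(
    "2018/01/01 00:00:00", "2018/01/01 23:55:00",
    "DISPATCHLOAD", cache, fformat="parquet",
)
```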

Changes to tests

  1. FCAS Causer Pays (4s data) tests were failing as the dates of the data to be downloaded were more than 60 days old (only 2 months of data are available). Modified tests to pull data based on the current date. Tests are skipped for year boundaries if the year boundary is more than 60 days ago. Tests also now check that the length of the Causer Pays file is appropriate +/- 1 entry (if the data starts at 00:00:03, for example, there is one less entry than would be calculated for data starting at 00:00:00).
  2. Test suite data dates updated to 2018, and dates across test suites overlapped - this reduces the amount of data that needs to be downloaded and hence improves testing speed. However, the size of downloaded data is still significant, so all dynamic_data_compiler calls in tests are set to release feather files and delete original CSVs.
  3. Changing test suite data is a problem where expected length is doubled due to intervention rows. Refactored test code to handle cases where interventions are an issue.
  4. Change the pandas testing import based on a deprecation warning (see the sketch below).
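The deprecation in item 4 most likely refers to pandas.util.testing, which pandas deprecated in favour of pandas.testing; a minimal sketch of the updated import:

```python
import pandas as pd

# Deprecated path: from pandas.util.testing import assert_frame_equal
from pandas.testing import assert_frame_equal

expected = pd.DataFrame({"INTERVENTION": [0, 1]})
assert_frame_equal(expected, expected.copy())
```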

Other

  • FCAS variables file URL and name changed to reflect AEMO website
  • data_fetch_methods.py, filters.py and test_data_fetch_methods.py styled (flake8)

Testing

  • Test suite run to ensure newer changes to data_fetch_methods work. Report for tests:
    Test Report.pdf

    • tests were modified (FCAS changes, updated and overlapped data dates, intervention handling) and commit f963eb0 passed
    • since f963eb0, caching and new parsing functionality have been incorporated; this new functionality should pass the tests as they stood at f963eb0
  • New changes tested for API (spot checks) with fresh install of Python on Ubuntu 20.04

    • with basic settings, dynamic_data_compiler downloads the DISPATCHLOAD csv and releases a feather file. The returned DataFrame is typed (which should happen for API users), but the saved feather file had columns as objects/strings.
    • "legacy" code will work (i.e. data_fetch_methods.dynamic_data_compiler vs just dynamic_data_compiler)
    • cache_compiler releases parquet/feather for DISPATCHLOAD and deletes the csv in the cache. The remaining file is typed. Different compression engines were passed to the write function and this worked. The file was then reloaded using dynamic_data_compiler, and this worked, with a typed DataFrame loaded.
  • Quick performance test:

    • %timeit data_fetch_methods.dynamic_data_compiler("2018/01/01 00:00:00", "2018/01/01 23:55:00", "DISPATCHLOAD", './alt_data') with a precompiled feather cache
    • f963eb0 (following initial fixes): 833 +/- 15 ms, 7 runs
    • final commit in this PR (5026915): 902 +/- 21.7 ms, 7 runs; the slowdown is likely due to additional if and try-except blocks, and is relatively negligible.
  • New changes tested for GUI (spot checks)

@prakaa prakaa changed the title CLI data type inferal, FCAS 4s test fixes, minor test refactoring + updates API type inferral, API caching function, test fixes May 22, 2021
@prakaa prakaa marked this pull request as draft May 22, 2021 01:55
@prakaa prakaa changed the title API type inferral, API caching function, test fixes API type inference, API caching function, test fixes May 22, 2021
@prakaa prakaa changed the title API type inference, API caching function, test fixes API functionality revamp, test fixes May 22, 2021
@prakaa prakaa changed the title API functionality revamp, test fixes API functionality revamp, text fixes, README revamp May 23, 2021
@prakaa prakaa marked this pull request as ready for review May 23, 2021 04:56
@prakaa (Contributor, Author) commented May 23, 2021

@nick-gorman see outline of all changes above. GUI still needs to be tested (checkbox unticked)

@nick-gorman (Member) commented:
Looks good Abi, I'll merge, compile the GUI, draft a release and publish to pypi

@nick-gorman nick-gorman merged commit c1ea130 into UNSW-CEEM:master May 24, 2021