Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make float parsing from CSV round-trip #444

Merged
merged 2 commits into from
Jun 9, 2020
Merged

Conversation

lcontento
Copy link
Contributor

pandas uses by default a C parser for CSV files for performance reasons. The default behaviour of such parser when parsing floating point values from strings is different from Python's behaviour. While Python parsing can be round-tripped (e.g., float(0.999) == 0.999) this is not true for the C parser.
I believe most users nowadays expect floats to be parsed correctly, so I added a keyword argument to all instances of pandas.read_csv asking for the Python-like behaviour. The only downside is that this behaviour is slower (the C code actually calls Python to perform the parsing). This may be more relevant for the parameter table and less important for measurement tables, so it may also be acceptable to enable it only for some tables and not others.

Change the float parsing behaviour of `pandas.read_csv` to be the same
as in Python (e.g., in Python `float(0.999) == 0.999`).
@codecov
Copy link

codecov bot commented Jun 9, 2020

Codecov Report

Merging #444 into develop will not change coverage.
The diff coverage is 78.57%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #444   +/-   ##
========================================
  Coverage    77.88%   77.88%           
========================================
  Files           22       22           
  Lines         2211     2211           
  Branches       529      529           
========================================
  Hits          1722     1722           
  Misses         363      363           
  Partials       126      126           
Impacted Files Coverage Δ
petab/sbml.py 51.66% <0.00%> (ø)
petab/visualize/plotting_config.py 56.98% <66.66%> (ø)
petab/conditions.py 90.47% <100.00%> (ø)
petab/core.py 82.91% <100.00%> (ø)
petab/measurements.py 80.24% <100.00%> (ø)
petab/observables.py 96.61% <100.00%> (ø)
petab/parameters.py 82.43% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c34e7a9...a4f8e4b. Read the comment docs.

@lcontento
Copy link
Contributor Author

I resolved some flake8 errors to make CI succeed, but Codacy says that its checks are not enabled for the PR branch.

@yannikschaelte
Copy link
Member

codacy is a bit shy sometimes.

@yannikschaelte yannikschaelte merged commit dc253fe into develop Jun 9, 2020
@lcontento lcontento deleted the fix_csv_floats branch June 9, 2020 16:00
@yannikschaelte yannikschaelte mentioned this pull request Jul 23, 2020
dweindl added a commit that referenced this pull request Sep 21, 2020
Release 0.1.9

Library:

* Allow URL as filenames for YAML files and SBML models (Closes #187) (#459)
* Allow model time in observable formulas (#445)
* Make float parsing from CSV round-trip (#444)
* Validator: Error message for missing IDs, with line numbers. (#467)
* Validator: Detect duplicated observable IDs (#446)
* Some documentation and CI fixes / updates
* Visualization: Add option to save visualization specification (#457)
* Visualization: Column XValue not mandatory anymore (#429)
* Visualization: Add sorting of indices of dataframes for the correct sorting
  of x-values (#430)
* Visualization: Default value for the column x_label in vis_spec (#431)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants