2.0.0 Merge #14
Merged
Added checks based on #11 to cross-validate that metabolites in the Data and Metabolites sections match. Also edited tests so they work on Windows.
Closes #7.
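The cross-validation described above could look roughly like the following. This is a minimal sketch, not the package's actual code: the function name `cross_validate_metabolites`, the `"Metabolite"` key, and the list-of-dicts table layout are all assumptions made for illustration.

```python
def cross_validate_metabolites(data_rows, metabolite_rows, key="Metabolite"):
    """Return metabolite names that appear in only one of the two sections.

    Hypothetical helper: both arguments are assumed to be lists of dicts,
    one dict per table row, with the metabolite name stored under `key`.
    """
    data_names = {row[key] for row in data_rows if key in row}
    metabolite_names = {row[key] for row in metabolite_rows if key in row}
    return {
        # Present in the Data section but absent from Metabolites.
        "missing_from_metabolites": sorted(data_names - metabolite_names),
        # Present in the Metabolites section but absent from Data.
        "missing_from_data": sorted(metabolite_names - data_names),
    }


# Made-up example rows, only to exercise the helper.
data = [
    {"Metabolite": "glucose", "S1": "1.2"},
    {"Metabolite": "lactate", "S1": "0.4"},
]
metabolites = [
    {"Metabolite": "glucose", "kegg_id": "C00031"},
    {"Metabolite": "citrate", "kegg_id": "C00158"},
]
report = cross_validate_metabolites(data, metabolites)
```

Using set differences keeps the check independent of row order, so a mismatch is reported by name rather than by position.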
Issue #4 is mostly done with this. Still need to work on the list of things I added in my reply.
The --validate option wasn't implemented for download, and implementing it would be nontrivial. Since there is already a dedicated validate command, the easiest thing is to just remove the option.
Closes #4.
Added code to handle and validate having duplicate keys in the "Additional sample data". Also changed documentation where appropriate and added to changelog. Closes #5.
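For context on why duplicate keys need special handling: a plain `json.loads` silently keeps only the last value for a repeated key. One way to detect them, shown here as a sketch using only the standard library, is the `object_pairs_hook` parameter. The helper name `load_with_duplicate_report` is hypothetical and this is not necessarily how the package does it.

```python
import json
from collections import Counter


def load_with_duplicate_report(text):
    """Parse JSON text and also report keys repeated within one object.

    Hypothetical helper: the parsed result keeps the last value for a
    repeated key (the default json behavior), while the second return
    value lists which keys were duplicated.
    """
    duplicates = []

    def hook(pairs):
        # `pairs` is the ordered list of (key, value) tuples for one object.
        counts = Counter(key for key, _ in pairs)
        duplicates.extend(sorted(k for k, n in counts.items() if n > 1))
        return dict(pairs)

    parsed = json.loads(text, object_pairs_hook=hook)
    return parsed, duplicates


text = '{"Additional sample data": {"Age": "5", "Age": "6", "Sex": "F"}}'
parsed, duplicates = load_with_duplicate_report(text)
```

A validator could turn the reported names into warnings, while a reader that needs to preserve every value would return a custom mapping from the hook instead of `dict(pairs)`.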
Fixed some issues that came up when trying to validate all of the Tab formatted files in the Workbench. Mostly just key errors from certain keywords being missing, but also an error when trying to print certain Unicode characters.
Closes #6. Added a validate method, a from_dict method, and changed some data members to properties.
Started the repair command code, plus various updates to other code prompted by issues found while creating and testing repair.
Added the repair command, still testing. Just about to change over from using jdks.JSON_DUPLICATE_KEYS directly, to using a class made from it.
Forgot to commit for a while. Lots of various small changes. The big additions are the code to repair shifted rows and the regexes for metabolites columns.
Some reorganization. Removed consolidate_values_to_column and fix_missing_tab_in_table because they were largely replaced by row shift. fix_missing_tab_in_table was actually doing some things that shouldn't be done. It's hard to remember why I started creating that function, but after testing what it actually did, it was mostly just removing things it should not.
Added try except blocks to the tokenizer to make a couple of the common parsing errors easier to understand. Had to change some things in repair_metabolites_matching because the pyarrow dtype pandas regex matching methods don't behave the same as the python regex methods. Lots of work in repair on getting the metabolite name matching to be more robust, and removal of duplicates. Added methods to the mwtab class to set some internal data members from a dataframe, so that repair isn't doing it directly.
Decided to change removing duplicates into augmenting them, so moved that code from repair to a new augment module to make it easier to work with.
Big addition to augment. Lots of code added for augmenting duplicates. Needs a lot of cleaning.
Just before changing the repair_shifted_rows file to use the ColumnMatcher class.
Added ColumnFinder class to replace the functions and dictionary that handled this before.
Logic for augmenting duplicates is finished. Want to capture some stuff before I delete it.
Lots of small changes. Refactored the fuzz ratio function. Added a lot of things to improve the duplicates augmentation.
rcha_metab is going to handle all of the repairing and augmenting, so I have moved those files to the new package.
Removed the repair command since it is going to rcha_metab now.
Added a step to fill NA values with the empty string when setting table values from a pandas dataframe.
Added methods to get tabular data as pandas dataframe to MWTabFile class.
Eric tried to import and got an error.
Eric found an issue in get_metabolites_as_pandas where if the table_name is not there you get a key error. I changed it so it looks for the table_name and returns an empty dataframe if not there.
Eric found an issue where if a file raised an error during read it wasn't being closed, so fixed that. Also added some documentation to NameMatcher.
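The general pattern for this kind of fix is to close the handle in a `finally` clause (or use a `with` block) so it is released even when parsing raises. The sketch below illustrates that pattern only. `read_and_parse` and `failing_parser` are hypothetical names, not functions from the package.

```python
import os
import tempfile


def read_and_parse(path, parse):
    """Open `path`, hand the handle to `parse`, and always close it."""
    handle = open(path, "r", encoding="utf-8")
    try:
        return parse(handle)
    finally:
        # Runs whether parse() returned normally or raised.
        handle.close()


captured = []


def failing_parser(handle):
    """Stand-in parser that records the handle and then fails."""
    captured.append(handle)
    raise ValueError("simulated parsing error")


# Write a small throwaway file to read back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("#METABOLOMICS WORKBENCH\n")
    tmp_path = tmp.name

try:
    read_and_parse(tmp_path, failing_parser)
except ValueError:
    pass

closed_after_error = captured[0].closed
os.remove(tmp_path)  # succeeds on Windows only because the handle is closed
```

Leaving a handle open matters most on Windows, where an open file cannot be deleted or replaced by a later step.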
Renamed metabolites_regexes to metadata_column_matching. Also added some documentation to the classes.
Added significantly to metadata column matching and worked on conf to change the theme.
Added some of the todos to the validator. Needed to investigate one of the todos before committing to it, and ended up finding an issue where some JSON outputs were not valid JSON. Tracked that down, fixed it in mwtab.py, and tested it. All mwtab files parsed and saved out to JSON were valid JSON. Also went through the standard column names and made them all lowercase with underscores. We had decided to do that, but I hadn't gotten around to it.
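As an illustration of the lowercase-with-underscores convention, a normalizer might look like the sketch below. The function name `standardize_column_name` is hypothetical, and collapsing every run of non-alphanumeric characters into one underscore is an assumption. The actual standard names in the package were edited by hand.

```python
import re


def standardize_column_name(name):
    """Lowercase a column name and join its words with underscores.

    Assumption: any run of characters other than a-z and 0-9 is treated
    as a word separator and collapsed into a single underscore.
    """
    lowered = name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")


# Made-up column names, only to exercise the helper.
examples = ["Retention Time", "KEGG ID", "PubChem-CID ", "m/z"]
standardized = [standardize_column_name(name) for name in examples]
```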
Adding in some additional subsection validations. Committing to capture one implementation of validation classes before trying something a little different.
Gave up on making Schema work and switched everything over to jsonschema. I am about to make another significant change and want to capture what is there currently for easy reversion if needed.
Everything got changed over to jsonschema, but I am about to radically change one design direction I went in, so this commit is to capture the current state before this significant change.
Forgot to commit as I made major changes. The biggest changes were changing DuplicatesDict and updating the MWTabFile class so that the text and JSON representations are the same when read in. There were many changes, even in the tokenizer, to support the internal representation changes. The DuplicatesDict changes were a part of this as well, and also make it much faster than it was previously. Changed the validator to remove some code that I switched over to testing table-wise instead of looping over the list of dicts. Some of the updates to the mwschema, and propagating them through the package, are in this commit as well.
Lots of small improvements and changes to validation.
Various small updates to support validation changes and fix bugs.
Mostly a lot of cleaning up before getting started on testing with pytest. Filled in some docstrings, removed some comments, fixed the last few issues found with schema errors.
Added and fixed a lot of tests to get the coverage up. Also refactored cli.py for DRY purposes.
Added tests to get full coverage. Some minor changes/fixes to the code that I found as I was testing.
Fixed a lot of warnings that Sphinx printed. Lots of little updates like names, dependencies/requirements. Made changes to the html theme that I like.
A few cleanup changes preparing for the next release, but also adding back in read_lines to fileio. Turns out rcha_metab uses it and it is easier to leave it in the mwtab package.
AN003335 had a new unique problem with some of the Additional data in the SSF. I changed the code so it can handle that situation.
Fixed many different issues and made improvements based on things found while getting results for the paper. Added a CLI option to save out validation JSON. Added a silent CLI option to validate. Added a force CLI option to validate and convert. Added some validations and removed some from the JSON Schema to reduce spurious messages.
Removed importing requests because it was unnecessary and caused a failure on GitHub testing.
Somehow the subprocess call was missing the lines that turn the command line into a list of strings.
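For reference, `subprocess.run` without `shell=True` expects the command as a list of strings, and `shlex.split` is one standard way to build that list from a single command line. The sketch below is an illustration with a made-up command, not the test code from this pull request.

```python
import shlex
import subprocess
import sys

# Made-up command line; a quoted argument stays together as one item.
command_line = 'python -c "print(6 * 7)"'

# shlex.split applies POSIX shell quoting rules to produce the list.
args = shlex.split(command_line)

# Swap in the running interpreter so the example does not depend on PATH.
args[0] = sys.executable

result = subprocess.run(args, capture_output=True, text=True, check=True)
output = result.stdout.strip()
```

Passing a list avoids relying on the shell to split arguments, which behaves differently on Windows and POSIX systems.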