2.0.0 Merge #14

ptth222 · 2025-12-13T20:47:06Z

Added checks based on #11 to cross-validate that metabolites in the Data and Metabolites sections match. Also edited tests so they work on Windows.

Closes #10.

Closes #8.

Closes #7.

Issue #4 is mostly done with this. Still need to work on the list of things I added in my reply.

The --validate option was never implemented for download, and implementing it would be nontrivial. Since there is already a dedicated validate command, the easiest fix is to remove the option.

Closes #4.

Added code to handle and validate having duplicate keys in the "Additional sample data". Also changed documentation where appropriate and added to changelog. Closes #5.
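
For readers unfamiliar with the duplicate-key problem, here is a minimal sketch of how repeated keys in a JSON object can be detected with only the standard library. It is not the code from this PR, which relies on jdks.JSON_DUPLICATE_KEYS and later a DuplicatesDict class; the function name `load_reporting_duplicates` and the sample keys `key_1` and `key_2` are hypothetical.

```python
import json


def load_reporting_duplicates(text):
    """Parse JSON text and report any object keys that appear more than once.

    Hypothetical helper for illustration; not part of the mwtab API.
    """
    duplicates = []

    def hook(pairs):
        # json calls this once per JSON object with its (key, value) pairs,
        # in document order, before they are collapsed into a dict.
        seen = {}
        for key, value in pairs:
            if key in seen:
                duplicates.append(key)
            seen[key] = value  # last value wins, as in plain json.loads
        return seen

    return json.loads(text, object_pairs_hook=hook), duplicates


sample = '{"Additional sample data": {"key_1": "a", "key_1": "b", "key_2": "c"}}'
data, duplicate_keys = load_reporting_duplicates(sample)
```

Plain `json.loads` would silently keep only the last value for `key_1`; the hook makes the loss visible so a validator can report it.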

Fixed some issues that came up when trying to validate all of the Tab formatted files in the Workbench. Mostly just key errors from certain keywords being missing, but also an error when trying to print certain characters in unicode.

Closes #6. Added a validate method, a from_dict method, and changed some data members to properties.

Started repair command code and some various updates to other code that were found during repair creation and testing.

Added the repair command, still testing. Just about to change over from using jdks.JSON_DUPLICATE_KEYS directly, to using a class made from it.

Forgot to commit for a while. Lots of small various changes. Big additions are the code to repair shifted rows and regexes for metabolites columns.

Some reorganization. Removed consolidate_values_to_column and fix_missing_tab_in_table because they were largely replaced by row shift. fix_missing_tab_in_table was actually doing some things that shouldn't be done. It's hard to remember why I started creating that function, but after testing what it actually did, it was mostly just removing things it should not.

Added try except blocks to the tokenizer to make a couple of the common parsing errors easier to understand. Had to change some things in repair_metabolites_matching because the pyarrow dtype pandas regex matching methods don't behave the same as the python regex methods. Lots of work in repair on getting the metabolite name matching to be more robust, and removal of duplicates. Added methods to the mwtab class to set some internal data members from a dataframe, so that repair isn't doing it directly.

Decided to change removing duplicates into augmenting them, so moved that code from repair to a new augment module to make it easier to work with.

Big addition to augment. Lots of code added for augmenting duplicates. Needs a lot of cleaning.

Just before changing the repair_shifted_rows file to use the ColumnMatcher class.

Added ColumnFinder class to replace the functions and dictionary that handled this before.

Logic for augmenting duplicates is finished. Want to capture some stuff before I delete it.

Lots of small changes. Refactored the fuzz ratio function. Added a lot of things to improve the duplicates augmentation.

rcha_metab is going to handle all of the repairing and augmenting, so I have moved those files to the new package.

Removed the repair command since it is going to rcha_metab now.

Added a step to fill NA values with the empty string when setting table values from a pandas dataframe.

Eric found a bug where if you don't specify the --output-format it causes an error. I fixed this. There is also a small change to validate where I added 1 validation that needs to be tested.

Added methods to get tabular data as pandas dataframe to MWTabFile class.

Eric tried to import and got an error.

Eric found an issue in get_metabolites_as_pandas where if the table_name is not there you get a key error. I changed it so it looks for the table_name and returns an empty dataframe if not there.

Eric found an issue where if a file raised an error during read it wasn't being closed, so fixed that. Also added some documentation to NameMatcher.
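
The general pattern behind this kind of fix can be sketched as follows, assuming the reader holds an open file handle while parsing. The names `read_and_close` and `failing_parse` are hypothetical and do not come from mwtab's fileio module.

```python
import io


def read_and_close(filehandle, parse):
    """Run parse(filehandle) and close the handle whether or not parsing fails."""
    try:
        return parse(filehandle)
    finally:
        # Runs on both success and failure, so the handle never leaks.
        filehandle.close()


def failing_parse(filehandle):
    # Stand-in for a parser that rejects malformed input.
    raise ValueError("malformed file: " + filehandle.readline().strip())


handle = io.StringIO("#METABOLOMICS WORKBENCH\n")
try:
    read_and_close(handle, failing_parse)
except ValueError as error:
    message = str(error)
```

The error still propagates to the caller, but `handle.closed` is true afterwards. A `with` statement gives the same guarantee when the handle is opened and consumed in one place.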

Renamed metabolites_regexes to metadata_column_matching. Also added some documentation to the classes.

Added significantly to metadata column matching and worked on conf to change the theme.

Added some of the todos to the validator. Needed to investigate one of the todos before committing to it, and ended up finding an issue with some JSON outputs not being valid JSON. Tracked that down and fixed it in mwtab.py and tested it. All mwtab files parsed and saved out to JSON were valid JSON. Also went through the standard column names and made them all lower case and using underscores. We had decided to do that, but I hadn't gotten around to it.

Adding in some additional subsection validations. Committing to capture one implementation of validation classes before trying something a little different.

Gave up on making Schema work and switched everything over to jsonschema. I am about to make another significant change and want to capture what is there currently for easy reversion if needed.

Everything got changed over to jsonschema, but I am about to radically change one design direction I went in, so this commit is to capture the current state before this significant change.

Forgot to commit as I made major changes. The biggest changes were changing DuplicatesDict and updating the MWTabFile class so that the text and JSON representations are the same when read in. There were many changes even into the tokenizer to support the internal representation changes. The DuplicatesDict changes were a part of this as well, just making it much faster than it was previously. Changes in validator to remove some code that I changed over to test table wise instead of looping over the list of dicts. Also some of the updates to the mwschema and propagating them through the package are in this commit.

Cleaned up a lot of commented out code that has been hanging around and I feel is safe to remove now.

Lots of small improvements and changes to validation.

Various small updates to support validation changes and fix bugs.

Mostly a lot of cleaning up before getting started on testing with pytest. Filled in some docstrings, removed some comments, fixed the last few issues found with schema errors.

Added and fixed a lot of tests to get the coverage up. Also refactored cli.py for DRY purposes.

Added tests to get full coverage. Some minor changes/fixes to the code that I found as I was testing.

Fixed a lot of warnings that Sphinx printed. Lots of little updates like names, dependencies/requirements. Made changes to the html theme that I like.

A few cleanup changes preparing for the next release, but also adding back in read_lines to fileio. Turns out rcha_metab uses it and it is easier to leave it in the mwtab package.

AN003335 had a new unique problem with some of the Additional data in the SSF. I changed the code so it can handle that situation.

Many different issues and improvements done coming from things found while getting results for the paper. Added CLI option to save out validation JSON. Added CLI option for silent to validate. Added CLI option for force to validate and convert. Added some validations and removed some from JSON Schema to reduce spurious messages.

Some small bug fixes to make some previous major changes work.

Removed importing requests because it was unnecessary and caused a failure on GitHub testing.

Somehow subprocess was missing lines to turn the command line into a list of strings.
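
As a rough illustration of why the list form matters: on POSIX systems, `subprocess.run` treats a plain string as the name of the program unless `shell=True` is given, so a full command line has to be split into arguments first. The sketch below uses `shlex.split` for that step, which is an assumption; the tests in this PR may split the command differently.

```python
import shlex
import subprocess
import sys

# Build a command line as a single string, the way a test might write it.
command = f'{shlex.quote(sys.executable)} -c "print(1 + 1)"'

# Split it into a list of strings, honoring the quotes around the -c argument.
args = shlex.split(command)

# Run without a shell; check=True raises if the exit status is nonzero.
result = subprocess.run(args, capture_output=True, text=True, check=True)
output = result.stdout.strip()
```

Using `sys.executable` keeps the example independent of whatever `python` happens to be on the PATH.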

ptth222 added 30 commits April 8, 2024 19:24

Added validations

dd36103

Added checks based on #11 to cross-validate that metabolites in the Data and Metabolites sections match. Also edited tests so they work on Windows.

Switched to src layout and pyproject.toml

0b34c82

Updated workflows

b639fa3

Update readme

1244622

Reorder sections before printing.

c98107b

Closes #10.

Added validations on top level keywords

03e4b95

Closes #8.

Dropped in changes from issue #7

8116f8e

Closes #7.

Added fault tolerance

cd68333

Issue #4 is mostly done with this. Still need to work on the list of things I added in my reply.

Removed --validate option from download

a94bff7

The --validate option was never implemented for download, and implementing it would be nontrivial. Since there is already a dedicated validate command, the easiest fix is to remove the option.

Updated documentation and CLI

47540d6

Closes #4.

Handle duplicate keys in JSON

049b758

Added code to handle and validate having duplicate keys in the "Additional sample data". Also changed documentation where appropriate and added to changelog. Closes #5.

Update validator.py

6c9b242

Fixed some issues that came up when trying to validate all of the Tab formatted files in the Workbench. Mostly just key errors from certain keywords being missing, but also an error when trying to print certain characters in unicode.

Modified MWTabFile

66afdd1

Closes #6. Added a validate method, a from_dict method, and changed some data members to properties.

Added repair

8a6325f

Started repair command code and some various updates to other code that were found during repair creation and testing.

Added repair beta

f294e0e

Added the repair command, still testing. Just about to change over from using jdks.JSON_DUPLICATE_KEYS directly, to using a class made from it.

Added DuplicatesDict

72d017b

Fixes and repair adds

88d7bd4

Forgot to commit for a while. Lots of small various changes. Big additions are the code to repair shifted rows and regexes for metabolites columns.

Add augmentation module

2cd56b1

Decided to change removing duplicates into augmenting them, so moved that code from repair to a new augment module to make it easier to work with.

Augment changes

44ddc29

Big addition to augment. Lots of code added for augmenting duplicates. Needs a lot of cleaning.

Before major class change

0395bba

Just before changing the repair_shifted_rows file to use the ColumnMatcher class.

Column matching changed to class

aaa3ec5

Added ColumnFinder class to replace the functions and dictionary that handled this before.

Duplicate augmentation finished

441b77b

Logic for augmenting duplicates is finished. Want to capture some stuff before I delete it.

Major rework to find_metabolite_families

6531312

Improved augment

81324bf

Lots of small changes. Refactored the fuzz ratio function. Added a lot of things to improve the duplicates augmentation.

Moved files to rcha_metab

08fe853

rcha_metab is going to handle all of the repairing and augmenting, so I have moved those files to the new package.

Update cli.py

105ff4a

Removed the repair command since it is going to rcha_metab now.

Update mwtab.py

0374ada

Added a step to fill NA values with the empty string when setting table values from a pandas dataframe.

CLI fix

e4c97bf

Eric found a bug where if you don't specify the --output-format it causes an error. I fixed this. There is also a small change to validate where I added 1 validation that needs to be tested.

ptth222 added 29 commits July 3, 2025 16:32

Added methods to mwtab

c32819c

Added methods to get tabular data as pandas dataframe to MWTabFile class.

import fix

2316995

Eric tried to import and got an error.

Bug fix mwtab.py

10fe6a6

Eric found an issue in get_metabolites_as_pandas where if the table_name is not there you get a key error. I changed it so it looks for the table_name and returns an empty dataframe if not there.

Fix fileio

6de43b7

Eric found an issue where if a file raised an error during read it wasn't being closed, so fixed that. Also added some documentation to NameMatcher.

File rename

864a124

Renamed metabolites_regexes to metadata_column_matching. Also added some documentation to the classes.

Testing documentation building.

f326e24

Test documentation creation again.

5fff282

Documentation update

e44e061

Added significantly to metadata column matching and worked on conf to change the theme.

subsection validations

da57d85

Adding in some additional subsection validations. Committing to capture one implementation of validation classes before trying something a little different.

Changed to jsonschema

6272c66

Gave up on making Schema work and switched everything over to jsonschema. I am about to make another significant change and want to capture what is there currently for easy reversion if needed.

mwschema change

e1a5a12

Everything got changed over to jsonschema, but I am about to radically change one design direction I went in, so this commit is to capture the current state before this significant change.

Cleaning

38f43e3

Cleaned up a lot of commented out code that has been hanging around and I feel is safe to remove now.

Validation improvements

5f0c5d8

Lots of small improvements and changes to validation.

Validation updates

a7a034d

Various small updates to support validation changes and fix bugs.

Various changes.

0e4559b

Mostly a lot of cleaning up before getting started on testing with pytest. Filled in some docstrings, removed some comments, fixed the last few issues found with schema errors.

Added tests

bf8f6ff

Added and fixed a lot of tests to get the coverage up. Also refactored cli.py for DRY purposes.

Full coverage

13851d1

Added tests to get full coverage. Some minor changes/fixes to the code that I found as I was testing.

Lots of docs changes

240e5c0

Fixed a lot of warnings that Sphinx printed. Lots of little updates like names, dependencies/requirements. Made changes to the html theme that I like.

Mostly cleanup

a45ed17

A few cleanup changes preparing for the next release, but also adding back in read_lines to fileio. Turns out rcha_metab uses it and it is easier to leave it in the mwtab package.

Slight improvement to tokenizer

dd864fd

AN003335 had a new unique problem with some of the Additional data in the SSF. I changed the code so it can handle that situation.

Bug fixes

28c6f7c

Some small bug fixes to make some previous major changes work.

Updated tests and workflows

36b0b4c

Update test_cli.py

8a62f00

Removed importing requests because it was unnecessary and caused a failure on GitHub testing.

Removed support for Python 3.8 and 3.9

0fd0061

Added pyarrow requirement

b3ea7e8

Update test_cli.py

7341d2a

Somehow subprocess was missing lines to turn the command line into a list of strings.

ptth222 merged commit a8542fe into main Dec 13, 2025
15 checks passed
