Datatype comparison bug 2021-12-01 #90


andrew-weisman
Contributor

Without "dtype=self.slide_data['slide_id'].dtype", read_csv() will convert all-number columns to a numerical type. Even if we convert numerical columns back to objects later, we may lose zero-padding in the process; the columns must be correctly read in from the get-go. When we compare the individual train/val/test columns to self.slide_data['slide_id'] in the get_split_from_df() method, we cannot compare objects (strings) to numbers or even to incorrectly zero-padded objects/strings. An example of this breaking is shown in https://github.com/andrew-weisman/clam_analysis/tree/main/datatype_comparison_bug-2021-12-01 (look at the Jupyter notebook in GitHub).
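A minimal sketch of the failure described above, not taken from the CLAM code base. The two tables are hypothetical stand-ins for `self.slide_data` and a splits CSV, and the `isin` membership test is an assumption about how the comparison might be performed.

```python
import io

import pandas as pd

# Hypothetical slide table whose IDs are zero-padded strings.
slide_data = pd.DataFrame({"slide_id": ["001", "002", "010"]})

# Hypothetical splits file whose IDs happen to contain only digits.
splits_csv = "train,val,test\n001,002,010\n"

# Default parsing infers an integer type, so "001" becomes 1 and the padding is lost.
inferred = pd.read_csv(io.StringIO(splits_csv))

# Passing the dtype of the reference column keeps the IDs as strings.
preserved = pd.read_csv(
    io.StringIO(splits_csv),
    dtype=slide_data["slide_id"].dtype,
)

# Membership test of the slide IDs against the "train" column of each frame.
mask_inferred = slide_data["slide_id"].isin(inferred["train"].tolist())
mask_preserved = slide_data["slide_id"].isin(preserved["train"].tolist())

print(inferred["train"].tolist())
print(preserved["train"].tolist())
print(mask_inferred.tolist())
print(mask_preserved.tolist())
```

With the inferred integer column no slide ID matches, while the column read with the explicit dtype matches the intended slide. Converting the integer column back to strings afterwards would yield "1" rather than "001", which is why the dtype has to be set at read time.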

@fedshyvana fedshyvana merged commit 5efe3ea into mahmoodlab:master Dec 2, 2021
@fedshyvana
Collaborator

Thanks Andrew, I did not anticipate slide IDs consisting of only numerical characters, but I suppose that is indeed possible.

@andrew-weisman
Contributor Author

andrew-weisman commented Dec 3, 2021 via email

doori pushed a commit to msk-mind/CLAM that referenced this pull request Jan 26, 2022
…son_bug-2021-12-01

Datatype comparison bug 2021-12-01