Datatype comparison bug 2021-12-01 #90

andrew-weisman · 2021-12-02T08:40:19Z

Without `dtype=self.slide_data['slide_id'].dtype`, read_csv() converts columns that contain only digits to a numerical type. Converting those columns back to objects later does not help, because the zero-padding may already be lost; the columns must be read in correctly from the start. When get_split_from_df() compares the individual train/val/test columns to self.slide_data['slide_id'], objects (strings) cannot be matched against numbers, or even against strings whose zero-padding has been dropped. An example of this breaking is shown in https://github.com/andrew-weisman/clam_analysis/tree/main/datatype_comparison_bug-2021-12-01 (see the Jupyter notebook on GitHub).
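For readers who cannot open the notebook, the failure can be sketched in a few lines of pandas. This is an illustration only, not the CLAM code itself: the `slide_data` frame, the in-memory `splits_csv` text, and the use of `isin` for the comparison are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for self.slide_data: slide IDs are zero-padded strings.
slide_data = pd.DataFrame({"slide_id": ["001", "002", "010"]})

# Hypothetical splits file whose train/val/test columns contain only digits.
splits_csv = "train,val,test\n001,002,010\n"

# Default parsing infers a numerical type, so "001" becomes the number 1.
inferred = pd.read_csv(io.StringIO(splits_csv))

# Casting back to strings afterwards does not restore the zero-padding.
roundtrip = inferred["train"].astype(str).tolist()

# Passing the dtype of the slide_id column keeps the values as written.
typed = pd.read_csv(io.StringIO(splits_csv), dtype=slide_data["slide_id"].dtype)

# Compare the train column against the slide IDs in both cases.
matches_inferred = int(slide_data["slide_id"].isin(inferred["train"]).sum())
matches_typed = int(slide_data["slide_id"].isin(typed["train"]).sum())

print(roundtrip, matches_inferred, matches_typed)
```

With the inferred dtype no slide ID matches the train column, while the typed read finds the one expected match.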

fedshyvana · 2021-12-02T17:06:35Z

Thanks Andrew, I did not anticipate slide IDs consisting of only numerical characters, but I suppose that is indeed possible.

andrew-weisman · 2021-12-03T01:21:32Z

Thanks very much Max! I apologize I didn’t see the “Allow edits by maintainers” checkbox that I am now seeing, but of course feel free to modify as you see fit. Thanks so much for making this software available, I can’t wait to get it working on our data! Best, Andrew

andrew-weisman added 3 commits November 29, 2021 20:35

Start making changes to main.py per CLAM GitHub README

a7b2d22

Get training working and fix a datatype comparison bug

5705cff

Isolate fix for datatype comparison bug

33c3853

fedshyvana merged commit 5efe3ea into mahmoodlab:master Dec 2, 2021

doori pushed a commit to msk-mind/CLAM that referenced this pull request Jan 26, 2022

Merge pull request mahmoodlab#90 from andrew-weisman/datatype_compari…

88da7ca

…son_bug-2021-12-01 Datatype comparison bug 2021-12-01

ff98li mentioned this pull request Feb 26, 2024

Fix: Slide ids turned into floats in split csv when names consist of only number #228

Open
