-
Notifications
You must be signed in to change notification settings - Fork 117
Conversation
2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data
mart-r
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the lazy logging formatting.
Also, perhaps that should be a logger.debug instead?
PS:
At initial look it seems like you're changing the order of the returned arguments in the method as well as when unpacking. And that wouldn't change the behaviour. But for the lines changed, that is the intention.
What the change actually does is force the use of encode_category_values in line 354 to use the full data rather than the undersampled data.
So there's a little bit of a confusing path of fixing an issue. In order to fix an issue of unpacking data in line 354, the order of the returned arguments is changed everywhere else.
But this order probably does make sense overall.
medcat/utils/meta_cat/data_utils.py
Outdated
| if data[i][2] in category_value2id.values(): | ||
| label_data_[data[i][2]] = label_data_[data[i][2]] + 1 | ||
|
|
||
| logger.info(f"Original label_data: {label_data_}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to use lazy formatting for logging. So instead of an f-string, use %s formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes I'll fix that!
Regarding the ordering of unpacking, I was going to only change the order for link 354, however Tom pointed out that it would make the most sense to have it full_data and then undersampled_data
That's why the long route
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I make the change for all logging statements to use lazy logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally don't want to change the purpose of a PR.
But I don't think there's too many. So why not.
As for unpacking - yes, I agree. This order makes more sense in general. Just thought I'd mention it.
mart-r
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lazy logging means allowing the logger to parse in the % formats if (and only if) the logged message has a handler (i.e if it needs to be shown/written somewhere).
So a messages gets logged as follows:
logger.info("Messages about %s and nr %d", "topic 1", 4)
|
Fixed the lazy logging formatting |
mart-r
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
* Pushing bug fix for metacat 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data * Pushing change for lazy logging * Pushing update for lazy logging * Pushing lint fix
* Pushing bug fix for metacat 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data * Pushing change for lazy logging * Pushing update for lazy logging * Pushing lint fix
* Pushing bug fix for metacat 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data * Pushing change for lazy logging * Pushing update for lazy logging * Pushing lint fix
* Pushing bug fix for metacat 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data * Pushing change for lazy logging * Pushing update for lazy logging * Pushing lint fix
* Pushing bug fix for metacat 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data * Pushing change for lazy logging * Pushing update for lazy logging * Pushing lint fix
2-phase learning for MetaCAT utilises
data_undersampled.Fixed a bug in the
evalfunction, which was incorrectly using thedata_undersampledinstead of thefull_data