Move nested CV into a separate file #4


Merged 2 commits into main on Jul 13, 2023

Conversation


@uabua uabua commented Jul 13, 2023

I've exported the nested CV to a separate file, but I'm not entirely satisfied with the current structure. The process involves first manipulating the data in the notebook, saving it as a pickle, then running a separate script, and finally bringing the results back to this notebook. Do you think it would be better to have the data processing as a separate script?
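The round trip described above (preprocess in the notebook, pickle, run a separate script, load results back) can be sketched roughly like this; the file names, the data, and the placeholder results are illustrative assumptions, not the repo's actual layout:

```python
import pickle

# In the notebook: save the preprocessed data for the nested-CV script.
data = {"X": [[1.0, 2.0], [3.0, 4.0]], "y": [0, 1]}  # toy stand-in data
with open("preprocessed.pkl", "wb") as f:
    pickle.dump(data, f)

# In the separate script: load the data, run nested CV, save the results.
with open("preprocessed.pkl", "rb") as f:
    data = pickle.load(f)
results = {"rf_phylum": {"score": 0.9}}  # placeholder for real CV output
with open("results.pkl", "wb") as f:
    pickle.dump(results, f)

# Back in the notebook: load the results for analysis and plotting.
with open("results.pkl", "rb") as f:
    results = pickle.load(f)
```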

Additionally, I've been struggling to calculate memory usage, so I could not do it. About the timing, right now, I'm tracking the time for each model within the outer loop. So, if we have, let's say, 100 iterations, we end up with separate entries in the results dictionary for each model and dataset (including taxonomic levels). However, I'm wondering if it would be more appropriate to have a cumulative time for each model type instead. For example, all the Random Forests together. What do you think?
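The per-iteration timing scheme described above might look roughly like this minimal sketch; the model names, fold count, and stand-in workloads are assumptions for illustration:

```python
import time

# Stand-ins for model fitting (inner CV + training would go here).
models = {"random_forest": lambda: sum(range(10_000)),
          "logistic_regression": lambda: sum(range(5_000))}

results = {}  # one timing entry per (model, outer fold), as described

for fold in range(3):  # outer CV loop
    for name, fit in models.items():
        start = time.perf_counter()
        fit()  # placeholder for the actual inner-loop work
        results[(name, fold)] = time.perf_counter() - start

print(len(results))  # 6 entries: 2 models x 3 outer folds
```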

@uabua uabua requested a review from natasha-dudek July 13, 2023 12:59
@uabua uabua self-assigned this Jul 13, 2023

@natasha-dudek natasha-dudek left a comment


Question 1: I've exported the nested CV to a separate file, but I'm not entirely satisfied with the current structure. The process involves first manipulating the data in the notebook, saving it as a pickle, then running a separate script, and finally bringing the results back to this notebook. Do you think it would be better to have the data processing as a separate script?

Since this is a tutorial, I think it is important to walk through the pre-processing steps in the notebook itself. It is kind of inconvenient for us as the makers of the tutorial, but I don't think there is a good workaround in this situation.

Question 2: Additionally, I've been struggling to calculate memory usage, so I could not do it.

No worries, thanks for looking into it.

Question 3: About the timing, right now, I'm tracking the time for each model within the outer loop. So, if we have, let's say, 100 iterations, we end up with separate entries in the results dictionary for each model and dataset (including taxonomic levels). However, I'm wondering if it would be more appropriate to have a cumulative time for each model type instead. For example, all the Random Forests together. What do you think?

Let's keep them as is. We can always add them up to find the cumulative time for each model type, should we want to calculate that.
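Should cumulative times ever be wanted, the per-iteration entries can simply be summed by model type; the keys and timings below are hypothetical:

```python
from collections import defaultdict

# Per-(model, outer fold) timings in seconds, as stored in the results dict.
per_iteration = {("random_forest", 0): 1.2, ("random_forest", 1): 1.4,
                 ("svm", 0): 0.8, ("svm", 1): 0.7}

# Sum the fold-level timings for each model type.
cumulative = defaultdict(float)
for (model, _fold), seconds in per_iteration.items():
    cumulative[model] += seconds

print({m: round(s, 1) for m, s in cumulative.items()})
```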

Minor change request: I just noticed an oversight of mine from earlier on - please remove BMI from the feature set we use to train the model (since BMI is calculated from weight and height, which are both in there).
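Dropping the redundant column could look like this minimal sketch; the column names and rows are toy stand-ins, not the tutorial's actual dataset:

```python
# BMI is derived from weight and height, so it is removed before training.
header = ["weight", "height", "BMI", "age"]
rows = [[70, 1.70, 24.2, 30],
        [80, 1.80, 24.7, 40]]

# Keep every column except BMI, in both the header and the data rows.
keep = [i for i, name in enumerate(header) if name != "BMI"]
header = [header[i] for i in keep]
rows = [[row[i] for i in keep] for row in rows]

print(header)  # ['weight', 'height', 'age']
```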

Aside from that, looks good.

@uabua uabua requested a review from natasha-dudek July 13, 2023 19:47

uabua commented Jul 13, 2023

I removed "BMI" from the features, as suggested. 🚀


@natasha-dudek natasha-dudek left a comment


Excellent, thanks!

@uabua uabua merged commit 33ed135 into main Jul 13, 2023
@uabua uabua deleted the f/separate_nested_cv branch July 13, 2023 19:53