Fixed inconsistent results of oracle.get_links across runs
#196
Hi @KenjiKamimoto-wustl122
I have observed that the method `oracle.get_links` returns different results across runs (celloracle == 0.18.0). While these differences are not huge (mean Jaccard index of 0.9 between different runs), it is important to have a fixed seed to make results reproducible.

Even though you correctly use `BaggingRegressor` with a fixed seed, the problem comes from upstream, since you use sets to store TF gene symbols in `oracle.TFdict`. The problem with sets is that their iteration order depends on the hashes of their elements, and Python randomizes string hashes in each new process, so the order can differ slightly from run to run. This makes `BaggingRegressor` sample differently even though it uses the same seed every time. However, the solution is very easy: fix the order of the selected TFs by sorting them alphabetically. With this simple change, results are always the same.
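
To illustrate the mechanism without depending on celloracle, here is a minimal sketch using only the standard library. The helper `sample_tfs` and the gene list are hypothetical stand-ins, not the actual code in `oracle.get_links`; the assumption is simply that a fixed-seed sampler draws positions, so the TFs it selects depend on the order of the list it is given.

```python
import random

def sample_tfs(tf_names, k, seed=0):
    """Draw k positions with a fixed seed and map them back to TF names.

    The seed fixes the *positions* that are drawn, so the selected names
    are only reproducible if tf_names is always in the same order.
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(tf_names)), k))
    return [tf_names[i] for i in positions]

# Hypothetical set of TF gene symbols, standing in for an entry of oracle.TFdict.
tfs = {"GATA1", "SPI1", "CEBPA", "TAL1", "KLF1", "RUNX1"}

# Order follows the set's hash-dependent iteration order: may change between processes.
unstable = sample_tfs(list(tfs), k=3)

# Order is alphabetical: identical in every process, so the sample is too.
stable = sample_tfs(sorted(tfs), k=3)
```

Within a single process both calls are repeatable; the difference only shows up between separate runs, where `list(tfs)` may come back in a different order while `sorted(tfs)` does not.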
Note that to get different results with the previous version you need to restart the kernel or run the script again, so that Python starts a new process with a new hash seed. Running the same code repeatedly within one JupyterLab session yields the same results, but restarting the notebook kernel does not.
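
The restart behaviour can be checked outside a notebook with a small sketch like the following. It launches fresh Python processes with different `PYTHONHASHSEED` values (a standard CPython environment variable) and compares the order in which a set of strings is listed. The gene symbols and the helper `run_with_hash_seed` are illustrative assumptions, not part of celloracle.

```python
import os
import subprocess
import sys

GENES = "{'GATA1', 'SPI1', 'CEBPA', 'TAL1', 'KLF1', 'RUNX1'}"
RAW_SNIPPET = f"print(list({GENES}))"       # hash-dependent order
SORTED_SNIPPET = f"print(sorted({GENES}))"  # alphabetical order

def run_with_hash_seed(code, hash_seed):
    """Run `code` in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Each seed mimics a separate kernel restart or script run.
raw_orders = {run_with_hash_seed(RAW_SNIPPET, s) for s in range(1, 6)}
sorted_orders = {run_with_hash_seed(SORTED_SNIPPET, s) for s in range(1, 6)}

print("distinct orders without sorting:", len(raw_orders))
print("distinct orders with sorting:   ", len(sorted_orders))
```

The unsorted listing typically varies between seeds, whereas the sorted listing is the same in every process. Setting `PYTHONHASHSEED` to a fixed value would also stabilize the order, but sorting keeps the fix inside the code rather than in the environment.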
Hope this is helpful!