Skip to content

Comments

new_filename_and_folder_for_outputs#218

Merged
Rick-Methot-NOAA merged 12 commits intomainfrom
output_to_separate_files
Jan 6, 2022
Merged

new_filename_and_folder_for_outputs#218
Rick-Methot-NOAA merged 12 commits intomainfrom
output_to_separate_files

Conversation

@Rick-Methot-NOAA
Copy link
Collaborator

will write to /ssnew and /sso folders if they exist and should write separate report and compreport files in mceval (not tested).

@k-doering-NOAA
Copy link
Contributor

@Rick-Methot-NOAA , this is to address #70. right?

@k-doering-NOAA
Copy link
Contributor

Also, I should have said that the failing run was due to an issue with archiving the exe, which looks like is a gh actions issue, not related to the stock synthesis code.

@k-doering-NOAA
Copy link
Contributor

k-doering-NOAA commented Oct 22, 2021

I tested out this change locally, and it worked a little differently than I expected.

  1. When I made sso and ssnew folders, only a file called data_echo.ss_new was piped to the ssnew folder, and the sso folder only had the comp report and report files (note that the report file was called report.sso instead of Report.sso). I had expected all .sso files to go to the sso folder, and all ss_new files to go to the ssnew folder.
  2. When I didn't create these folders, no data.ss_new file or Report.sso file were created. Instead, these files were data_echo.ss_new and report.sso (note the lowercase).
  3. I haven't tested with mceval yet, either, but this should definitely be done before merging in.

Some potential changes to consider, depending what the scope of this feature is:

  1. Piping all output files to a separate folder or folders. This would be useful for people who then want to delete output all at once, because they could just delete the folder.
  2. Instead of users having to create ssnew and sso folders before, I wonder if they could select this option through a command line option? Perhaps they could pass the option and the names of the subfolders in the call to run SS3 and SS3 could create the folders (if it is possible to do this) for them.
  3. Otherwise, if this functionality is really just intended for use in MCMC, perhaps the code could be written in a way to restrict the scope of this feature (stick with the original idea of Re-running each MCMC saved sample to get a report file in Hake assessment #70 and not try to add additional functionality for users).
  4. Keep the original names for Report.sso and data.ss_new files. If multiple files are piped to the same folder, append something simple to the name, like a number (e.g., data_1.ss_new , data_2.ss_new)

@Rick-Methot-NOAA
Copy link
Collaborator Author

Thanks Kathryn for the feedback. This is definitely a WIP so I am not surprised that it looks incomplete and it is a long way from merging. I did pull request just to get it into the system and should have noted then that it is not expected to run completely yet.

  1. It would be more work to get all .sso files into the one folder, but easy to get all ssnew files there.
    2a. At this time, there is no more data.ss_new. Instead there is data_echo.ssnew, data_expval.ssnew, then 1 to many data_bootNNN.ssnew files. Seeking comment on this change as it was not part of the MCEVAL request but was the part of the code where I experimented with this file renaming capability.
    2b. You expect me to notice what case I used? Thanks for flagging.

  2. yes, needs MCEVAL test soon. Perhaps after next round of commits. I suspect that I may need to change where report.sso is created such that it is within the PROCEDURE loop and not in FINAL section.

  3. yes. related to no.1 above. Just need to make the path naming happen once at top of the model run and then use it globally.

  4. Nice idea. We could put this as a starter file option. Yes, the folders could be created by SS3.

  5. Functionality is more general than the original request. The only big change is the breakup of data.ss_new into separate files. I think it is much better this way. Much more transparent to the user; I do find people grabbing the wrong section within an appended data.ss_new. Need team comment on this, especially because it will require work by all the R programs. @kellijohnson-NOAA @chantelwetzel-noaa

  6. definitely will correct the case to Report.sso

@k-doering-NOAA
Copy link
Contributor

Thanks for all the commentary, Rick, it was helpful on getting me up to speed.

As for the data.ss_new split, I'm not sure about it, but I think it is just because I am used to the old way. It would require people to change their workflow, but then again it wouldn't be that hard to deal with. It is easy to accidentally look at the wrong section of data.ss_new (I have done this many times when just scrolling through the text files), so I see the advantage. I'd be curious to hear @kellijohnson-NOAA and @chantelwetzel-noaa 's thoughts as well.

@Rick-Methot-NOAA
Copy link
Collaborator Author

The protocol is now working for all ssnew and all sso files.
SS3 only writes to these folders if the folders exist.

I imagine a significant augmentation would allow user to input the name of the output folder in starter.sso, then SS3 would create that folder at runtime if it did not exist.
With that capability, the user could modify the input conditions, run with storage to a new, custom-named folder each time, thus getting in the ssnew files a record of the changed run conditions.
I'll stop tweaking this now, until we can test with mceval.

@Rick-Methot-NOAA
Copy link
Collaborator Author

Kelli, Chantel and Ian - could I get you opinion on what to do with the data.ss_new while I work to get report for each mceval working.
options are:
a. status quo (Rick hates it)
b. status quo but to a sub-folder if subfolder exists
c. separate files for data_echo.ss_new, data_expval.ss_new, the all bootstraps appended in 3rd file
d. all in separate files

Right now the branch does (b) and (d)

@kellijohnson-NOAA @chantelwetzel-noaa @iantaylor-NOAA

@kellijohnson-NOAA
Copy link
Contributor

kellijohnson-NOAA commented Nov 10, 2021

(a) - lots of workflows depend on it
I know, that's why I am asking
(b) - I think the overhead of changing everyone's simulation code is not worth it for this option. What is the difference if the file is in the main directory versus a subdirectory, the structure is still the same? Are you trying to get at the point that the output file is not the same as the input file where the other ss_new files are?
intention is simply to provide clarity with all .sso files to one subfolder and all ssnew to another, but only if those subdirectories already exist, all of this is a precursor to formatting of the requested report files for each mceval run which I do not think should be appended into one file
(c) - I would change the extension of anything that is not the echoed data to something different than ss_new. So my proposal here would be data.ss_new, data_expected.ss, data_boot.ss where the .ss could be changed to something different for the latter two files.
so your preference would be to keep all the boot files in one appended file? and others in separate files. I like that.
(d) - has anyone checked the overhead of writing 100 bootstrap files versus 1 file with 100 data sets?
I think it would be trivial difference. Most of the time is in the bootstrap multinomial draws

@iantaylor-NOAA
Copy link
Contributor

I think (d) would be my ideal, but in the simulations I've worked on (long ago), I wanted to the expected recdev time series for each bootstrap so never wrote more than 1 per model run in which case (c) and (d) are essentially the same.
For (c) and (d), I like the filenames proposed by @kellijohnson-NOAA in the comment above #218 (comment).

I know that past workflows have all used the status-quo approach but don't have a good sense of how many of those are likely to ever be repeated. Adapting to any of these options also shouldn't be very hard, so I'm not sure that backward compatibility is important if it adds complexity to the code.

As for run time, I would guess that the bootstrap and file-writing process is much faster than the model estimation step that is surely part of almost all bootstrap analyses so it's not worth trying to hard to optimize the run time.

@Rick-Methot-NOAA
Copy link
Collaborator Author

Rick-Methot-NOAA commented Nov 10, 2021 via email

@chantelwetzel-noaa
Copy link

chantelwetzel-noaa commented Nov 10, 2021 via email

@k-doering-NOAA
Copy link
Contributor

We will let this one sit for a bit, as we don't want to change the subfolder and file structure in the main branch yet. However, we plan to merge in before the 3.30.19 release.

@Rick-Methot-NOAA
Copy link
Collaborator Author

Kelli, @kellijohnson-NOAA
this branch now produces a separate report and compreport file for each mceval. Files have names like:
Report_mce_25.sso for the 25th mceval iteration. These reports will be found in the runfile directory unless the user creates a ./sso subdirectory. Please take a look and let me know if this meets the needs of the hake assessment. The branch also implements the previously discussed changes to data.ss_new. These go to subfolder ./ssnew if it exists and have names of:
data_echo.ss_new, data_expval.ss, and data_boot_1.ss through data_boot_N.ss.
Rick

@kellijohnson-NOAA
Copy link
Contributor

I am testing this out today.

@kellijohnson-NOAA
Copy link
Contributor

@Rick-Methot-NOAA is the functionality intentional to have posteriors.sso be a blank file? Also, as written the system works when I run it with standard MCMC but @cgrandin was having trouble when using @Cole-Monnahan-NOAA's adnuts. Chris is looking into this more as will I.

@Rick-Methot-NOAA
Copy link
Collaborator Author

Rick-Methot-NOAA commented Dec 14, 2021 via email

@Rick-Methot-NOAA
Copy link
Collaborator Author

@kellijohnson-NOAA code is now writing correctly to posteriors.sso. I do not know how to investigate the adnuts issue.

@kellijohnson-NOAA
Copy link
Contributor

Great. I will distribute the new executable to the team.

I checked with Chris and he said that it ended up being fine with adnuts. I think he panicked because the files are not written until the end. Sorry for the false alarm there.

@Rick-Methot-NOAA
Copy link
Collaborator Author

@k-doering-NOAA Please merge this into main after you are able to adjust the workflow to deal with the new filenames.

@k-doering-NOAA
Copy link
Contributor

Ok, sounds good, Rick. I'm working on this now; hope to finish by end of the day monday at the latest!

@cgrandin
Copy link

Hi all,

Thanks for getting this done and so quickly!
One little thing though is that the posteriors.sso, deriver_posteriors.sso, etc files are located in the sso folder. This may break some other folks code if they try to run with an sso folder and have current code that reads those files from the parent mcmc folder. Would it be possible to move those files back to the parent directory of sso?

Thanks

@Rick-Methot-NOAA
Copy link
Collaborator Author

Rick-Methot-NOAA commented Dec 19, 2021 via email

@k-doering-NOAA k-doering-NOAA linked an issue Dec 20, 2021 that may be closed by this pull request
3 tasks
@k-doering-NOAA
Copy link
Contributor

k-doering-NOAA commented Dec 20, 2021

Some observations on this PR:

  • Tests are fixed in the main branch, so when this gets merged in, the checks should work again. I tested by making a new branch from this one, then pulling in the main branch and running the tests. They were passing, as indicated by the blue arrow of the last commit (https://github.com/nmfs-stock-synthesis/stock-synthesis/tree/output_to_separate_files_check)
  • I tested the code locally and manually again to see what happens under various folder scenarios. I learned that if a subfolder called sso exists, then sso files are written there; if a folder called ssnew exists, then the .ss_new filew are written there.
  • The new data files are called data_echo.ss_new, data_expval.ss, and dat_boot_x.ss . I know that Kelli liked the expval and boot not to have ss_new extensions, but I wonder if it is then confusing that the filse do not have the .ss_newextension but live in the folder labeled ssnew anyway? I personally think it is less confusing to have the files with the ss_new extension since they are (sometimes) written to the ssnew folder.
  • A picky thing about bootstrap file names: the first is called dat_boot_1.ss ; the 10th would be dat_boot_10.ss; we could potentially add leading zeros instead to help order them (e.g., dat_boot_001.ss; dat_boot_010.ss). This would assume most people don't have more than 999 bootstrap files. I'm ok with either, just wanted you to think about which names you prefer, Rick.
  • @kellijohnson-NOAA , when you have the chance, could you list the steps on how to test this out with mcmc (for people who don't really use mcmc in admb, like me? :) ) I tried running, but wasn't successful at getting the files to show up as they should. I'm guessing this is user error on my part
  • (from Kelli):
    • In 2020 we used
        ss3 -mcmc 24000000 -mcsave 10000 -mcseed 36519
        ss3 -mceval 
    
    • In >= 2021 we use {adnuts} to run the model.
    • EDIT by KD later: I realized I needed to use 2 # MCMC output detail in the starter file in order to get CompReport_mce_x.sso and Report_mce_x.sso files

@cgrandin
Copy link

I think it's totally fine, we have everything working the way it is.
The way the file structure is laid out is fine for our team.

Thanks!

@k-doering-NOAA
Copy link
Contributor

@Rick-Methot-NOAA I just noticed that the option 2 for MCMC detail is not yet described in starter.ss_new comments:

2 # MCMC output detail: integer part (0=default; 1=adds obj func components); and decimal part (added to SR_LN(R0) on first call to mcmc)

@Rick-Methot-NOAA
Copy link
Collaborator Author

I will fix that.

@Rick-Methot-NOAA
Copy link
Collaborator Author

In my final push on this issue I revised the numbered file names to have leading digits.
example:
data_boot_005.ss
Report_mce_0025.sso

SS2out<<" end selex output "<<endl;
} // end do report detail
wrote_bigreport++;
SS2out.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rick-Methot-NOAA For some reason this line leads to Dynamic Bzero NOT being written in Report.sso. At the top of the report file it says Y Dynamic_Bzero report:59 but the section is not written.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be in report.sso.
I intentionally turned off the profiling outputs for the MCEVAL outputs but did not intentionally change dynamic Bzero. I will chenck on this also

@kellijohnson-NOAA
Copy link
Contributor

Thanks Rick, I checked and was able to successfully get 1002 samples ✔️.

One last mystery that is only happening in MCMC mode with this executable is in AGE_SELEX report:32 the final year for sel*wt is wrong, it ends up being 2020 instead of 2021. If I run the executable in MLE mode everything is fine.

sel*wt 1 2020 1 1 1 2020_1_sel*wt 0 0.000112571 0.0109621 0.0703895 0.5076 0.280687 0.530073 0.549706 0.558731 0.59539 0.601531 0.651961 0.589622 0.780642 0.813207 0.869964 0.869964 0.869964 0.869964 0.869964 0.869964

@Rick-Methot-NOAA
Copy link
Collaborator Author

are you seeing that discrepancy in report.sso, or in one of the mce reports?
The order of operations is getting quite complex, hence the problem with dynamic Bzero.

@kellijohnson-NOAA
Copy link
Contributor

@Rick-Methot-NOAA the problem with sel*wt... happens in both Report.sso and Report_mce_xxxx.sso if running in with -mcmc. If I just run .\ss.exe then those rows have the correct year.

@Rick-Methot-NOAA
Copy link
Collaborator Author

Kelli,
Alas, I'm not going to solve the missing dynamic Bzero (and other things) today. Some change I made in order to produce report.sso in the procedure section for mceval has messed with the code logic in a way that has disabled things (like dynamic Bzero) that get appended to the report. So, your holiday gift will be late.

@Rick-Methot-NOAA
Copy link
Collaborator Author

Found the solution. Now I can relax

@kellijohnson-NOAA
Copy link
Contributor

Thanks Rick. I will look at this tomorrow. Merry Christmas 🎄 🎁 🎄

@k-doering-NOAA
Copy link
Contributor

Glad to see all these new fixes! Congrats. I am rerunning the checks on this branch : https://github.com/nmfs-stock-synthesis/stock-synthesis/tree/out_to_sep_check, which merged into the main branch so the newer gh action files could be used (which is the reason for the failing tests on this branch).

@kellijohnson-NOAA
Copy link
Contributor

Unfortunately, I am not able to get the latest fix to work with the hake model that I am testing it on. If I run in MLE I get dynamic bzero but there are still no values even though I have it turned on when running in MCMC. As well as the final year for the sel* output is final year - 1 not final year. Rick, is there something special that I should be doing in starter.ss? I tried turning on all output and turning on minimal output but calling dynamic bzero specifically and neither worked.

@k-doering-NOAA
Copy link
Contributor

The simple lorenzen model has a change in results when run with estimation, I think because the se for the forecast catch value decreased see these model results
).

Rick, I think you remember you fixing something to do with forecast SE not too long ago, so I wonder if that is the reason? Perhaps the change is expected?

@Rick-Methot-NOAA
Copy link
Collaborator Author

Kelli - you need only make this change in starter:

2 # MCMC output detail: integer part (0=default; 1=adds obj func components; 2= +write_report_for_each_mceval); and decimal part (added to SR_LN(R0) on first call to mcmc)

@Rick-Methot-NOAA Rick-Methot-NOAA merged commit 6bc19ee into main Jan 6, 2022
@Rick-Methot-NOAA Rick-Methot-NOAA self-assigned this Jan 7, 2022
@k-doering-NOAA k-doering-NOAA deleted the output_to_separate_files branch January 10, 2022 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

6 participants