Skip to content

Comments

MV tweedie and D-M for sizefreq#357

Merged
Rick-Methot-NOAA merged 18 commits intomainfrom
MV-Tweedie-ver2
Aug 24, 2022
Merged

MV tweedie and D-M for sizefreq#357
Rick-Methot-NOAA merged 18 commits intomainfrom
MV-Tweedie-ver2

Conversation

@Rick-Methot-NOAA
Copy link
Collaborator

@Rick-Methot-NOAA Rick-Methot-NOAA commented Jul 21, 2022

Concisely (20 words or less) describe the issue

Reorganizes index logic for comp-weighting parameters in anticipation of the 2-parm MV-Tweedie;
adds information to the Fit_Len_Comp, etc. tables in report.sso, so needs r4ss attention
preps for extending D-M and MV-Tweedie to generalized size comp

Please Link issue(s)

still open #266
resolves #365 resolves #185 resolves #66 resolves #360 resolves #362 resolves #219

What tests have been done? Upload any model input files created for testing in a zip file, if possible.

using Andre_tag as testbed. Also all_gen_size_comp modified to have a combo of length, age and sizecomp, and a combo of multinomial and Dirichlet-Multinomial (D-M).

What tests/review still need to be done? Who can do it, and by when is it needed (ideally)?

r4ss needs updating to read the revised Fit_Len, etc. tables

Has any new code been documented?

Yes.
If not, please add documentation before submitting the Pull Request.

  • I have documented any new code added (or no new code was added)

is there an input change for users to Stock Synthesis?

  • Yes, there was an input change

If so, please provide an example of the new inputs needed.

-1 # N sizefreq methods to read (or -1 for expanded options)
2 # N sizefreq methods to read (this is a conditional line invoked by -1 above)
HASH each row below has entry for each sizefreq method 
 25 25 #Sizefreq N bins
 2 2 #Sizetfreq units(1=bio/2=num)
 3 3 #Sizefreq scale(1=kg/2=lbs/3=cm/4=inches)
 1e-07 1e-07 #Sizefreq:  small constant to add to comps
 31 49 #Sizefreq N obs per method
 1 0 #_Comp_Error:  0=multinomial, 1=dirichlet using Theta*n, 2=dirichlet using beta, 3=MV_Tweedie  (new conditional line)
 1 0 #_Comp_Error: index for dirichlet or MV_Tweedie  (new conditional line )

Check which is true. This PR requires:

  • no further changes to r4ss
  • no further changes to the manual
  • no further changes to SSI (the SS3 GUI)
  • no further changes to the stock synthesis change log (new features, bug reports)

Describe any changes in r4ss/SS3 manual/SSI that are needed (if not checked):

r4ss needs to update processing of the "fit_comp" tables

If changes are needed in the change log, please fill in the table here:

Action Topics Type
revise #185 parameter names for D-M output
new #66 D-M logL added for sizefreq data ALL
fix #360 D-M reported sample size size output
fix #362 correct the D-M logL calculation when there are two sexes output
fix #219 compreport.sso now contains bin-specific logL that is consistent with total logL output

Additional information (optional):

Add functions Comp_logL_multinomial and Comp_logL_Dirichlet to SS_miscfxn.tpl.
These are called by SS_objfun to compute lengthcomp, agecomp and sizefreq logL.

Indexing of CompErr is now by number of invoked methods rather than number of parameters because Tweedie takes 2 parameters

Indexing of CompErr for Sizefreq is by SizeFreq method, not by fleet,
and this indexing is appended to the fleet-based indexing for lencomp and agecomp.

Add code for computing comp logL by Tweedie. This is CompErr == 3. Not operational yet. Need to move into a function like Comp_Err_Dirichlet().

Add Dirichlet capability to sizefreq

In preliminary calcs. correct the offset_l and offset_a calculations which were adding a constant twice (for females and again for males).

In readcontrol, update parameter indexing for D-M in sizefreq; revise the CompErr parameter names;
drop "fleet" from CompErr parameter names because more than 1 fleet can use a parameter.

In write report: major updating to the Fit_Len, Fit_Age, and Fit_Size tables. r4ss needs to check for new column names.

Compreport.sso updated such that the reported logL for each bin now sum to the total logL for that observation as reported in the Fit_Len, etc. tables. This is for multinomial and for D-M.

In write_ssnew: fix a problem with the sample size used for Dirichlet method 2 (Beta).

@Rick-Methot-NOAA
Copy link
Collaborator Author

Output format looks like this:
Length_Comp_Fit_Summary
Factor Fleet Recommend_var_adj # N Npos min_Nsamp max_Nsamp mean_Nsamp_in mean_Nsamp_adj mean_Nsamp_DM err_method err_index DM_theta MV_power par1 par2 mean_effN HarMean_effN Curr_Var_Adj Fleet_name
4 2 1 # 62 62 25 25 50 25 24.8394 1 1 148.411 NA ln(DM_theta_1)_Len_P1(2) NA 159.429 82.8041 0.5 cpue2
4 4 1 # 10 10 25 25 50 25 24.8394 1 1 148.411 NA ln(DM_theta_1)_Len_P1(2) NA 209.852 68.6758 0.5 cpue4

Note that both fleets share the same comp_err method and parameters.
The parameter name is created for the first fleet that uses it, so that fleet gets included in the label (2).
This makes display of that same parameter label misleading when shown for the second fleet that uses that comp_err method.
I propose leaving "fleet" off of the label for the comperr parameters.

@k-doering-NOAA
Copy link
Contributor

k-doering-NOAA commented Jul 21, 2022

I checked out why the call-build-ss3-warnings job is failing, and it is just this one additional warning:

      |                         ^~~~~~~~~
ss.cpp: In member function ‘void model_parameters::evaluate_the_objective_function()’:
ss.cpp:27450:36: warning: implicitly-declared ‘dvariable& dvariable::operator=(const dvariable&)’ is deprecated [-Wdeprecated-copy]
27450 |     tweedie_Phi = mfexp(selparm(k1));
      |                                    ^

I think this warning can be safely ignored, so we would just increase the number of warnings expected by 1 (from 83 to 84): https://github.com/nmfs-stock-synthesis/workflows/blob/a62a6c65cd146ceebd3cac4b612cf3d86c0d58d9/.github/workflows/build-ss3-warnings.yml#L64

@Rick-Methot-NOAA Rick-Methot-NOAA changed the title Mv tweedie ver2 MV tweedie and D-M for sizefreq Aug 3, 2022
@Rick-Methot-NOAA
Copy link
Collaborator Author

The one run that failed the test was because there is one fewer warning with new code and that warning was related to a poor search convergence. So new code is fine.

@iantaylor-NOAA
Copy link
Contributor

I'll get to work on reviewing this today.
I can push a change to the warnings test to this branch to make the warning go away.

@Rick-Methot-NOAA
Copy link
Collaborator Author

still need to add a base offset for use of D-M with sizefreq method; then do a comparison test between length-based DM and sizefreq-based DM. Also, last push still had some incorrect effN calcs for method 2 D-M

@iantaylor-NOAA
Copy link
Contributor

@Rick-Methot-NOAA, I'm finally done with other meetings. Should I hold off on review until you've corrected Method 2 effN and added offset or just review what's currently in place?

@Rick-Methot-NOAA
Copy link
Collaborator Author

The attached example replicates all length comp data as sizefreq data, assigned both to use the same D-M parameter. Result is the same logL for both length comp and sizefreq comp.
D-M_test.zip
Still to be done is work on output to the Fit_SizeFreq table. This may require creation of an offset value for each sizefreq observation, as is done for length comp

iantaylor-NOAA
iantaylor-NOAA previously approved these changes Aug 5, 2022
Copy link
Contributor

@iantaylor-NOAA iantaylor-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rick-Methot-NOAA, I looked over all the changes and as far as I can tell things look good although I haven't tried to check the math or compared directly to the equations.

Clearly adding a new likelihood isn't the hard part of this, it's all the variable management and the changes to input/output.

The next steps that I can contribute to would be

  • play around with the the D-M_test.zip model that you attached to the Pull Request so I understand things better
  • implement the necessary changes in a branch of r4ss that could be used in the testing
  • compare parameter estimates for existing models that use DM likelihood like the two hake models in https://github.com/nmfs-stock-synthesis/test-models/tree/main/models. Having the r4ss working again may allow those comparisons to happen automatically via github actions. If models change because of errors in the previous implementation, we should try to document that, but maybe it's just the likelihood values that were wrong, not the MLE estimates.
  • update the documentation to describe these changes
  • [lower priority than the others] add a model to the test set that uses the Tweedie distribution for age or length and/or DM with generalized size comp

Having said all that, I'm on leave next week, so please proceed as you wish and I'll figure out how to contribute when I get back.

@k-doering-NOAA
Copy link
Contributor

@Rick-Methot-NOAA look like the for the run with no est, the two_morph_seas_areas model could not run a second time when using the .ss_new files. maybe the .ss_new file is not being written correctly?

The r4ss job is failing because r4ss is not able to read the output correctly. Maybe @kellijohnson-NOAA can help with this?

The two added warnings are:

ss.cpp: In member function ‘dvariable model_parameters::Comp_logL_multinomial(const double&, const dvector&, const dvar_vector&)’:
ss.cpp:18242:45: warning: implicitly-declared ‘dvariable& dvariable::operator=(const dvariable&)’ is deprecated [-Wdeprecated-copy]
18242 |     logL = - Nsamp * obs_comp * log(exp_comp);
      |                                             ^

ss.cpp:27551:36: warning: implicitly-declared ‘dvariable& dvariable::operator=(const dvariable&)’ is deprecated [-Wdeprecated-copy]
27551 |     tweedie_Phi = mfexp(selparm(k1));
      |                                    ^

These can be safely ignored, I think, as they are coming from admb 13.

Also, doesn't this PR require admb 13? We still aren't using admb 13 on all the github actions yet. I think this is something Ian is planning to update when he gets back.

@Rick-Methot-NOAA
Copy link
Collaborator Author

This PR does not need ADMB 13 yet. All references to Tweedie are placeholders.

I will look again at the two morph run, but I previously determined that the base run was not fully converged and the new run was slightly better.

@Rick-Methot-NOAA
Copy link
Collaborator Author

Rick-Methot-NOAA commented Aug 12, 2022

compare_logL.xlsx
finish refactoring code for compreport.sso to complete the remaining issue #219
lencomp, agecomp and szcomp now use streamlined and nearly identical code sequence.
previously, the logL values produced for D-M samples were incorrectly based on multinomial, but probably were not being used.
with this code conversion, a test case confirms that the sum of the logL for each bin in compreport.sso matches the total logL for that observation as reported in the report.sso tables Fit_Len, Fit_Age, and Fit_Size. Alas, I did not get this finished before Grant retired.

@Rick-Methot-NOAA
Copy link
Collaborator Author

for sure. glad to have your sharp eyes looking this over.

@iantaylor-NOAA iantaylor-NOAA added the Epic: Data weighting improvement Group of issues/user stories related to improving data weighting label Aug 23, 2022
iantaylor-NOAA added a commit to nmfs-ost/ss3-workflows that referenced this pull request Aug 23, 2022
@iantaylor-NOAA
Copy link
Contributor

Changes to r4ss now allow reading the SS3 output associated with this branch (confirmed by the "test-r4ss-with-ss3" workflow with help from @k-doering-NOAA).
The github actions that run Stock Synthesis with and without estimation needed a little tuning up to compare values from models before and after the changes in this Pull Request but @k-doering-NOAA have now sorted that out and the tests are passing when triggered manually for this branch. I'm going to make a trivial commit to this branch which will trigger a re-run of all the tests within this Pull Request to confirm that result. I think the warnings test will still fail but the others should now pass.

@iantaylor-NOAA
Copy link
Contributor

Following up on the fixed tests: now only the warnings test is failing (as expected as discussed in #369), so next steps can be to focus on documentation (maybe in combination with nmfs-ost/ss3-doc#123) and any needed changes to the SSI and change log.

@Rick-Methot-NOAA
Copy link
Collaborator Author

I think if you approve and merge, the change log will automatically get updated

Copy link
Contributor

@iantaylor-NOAA iantaylor-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rick-Methot-NOAA, it seems that the additional commits that occurred after my previous review made the approval no longer apply. The additional changes look good, so this approval should allow you to merge whenever you're ready and take away the "Merging is blocked" message.

@Rick-Methot-NOAA Rick-Methot-NOAA merged commit d9dd61d into main Aug 24, 2022
@Rick-Methot-NOAA Rick-Methot-NOAA added this to the 3.30.20 milestone Sep 15, 2022
@Rick-Methot-NOAA Rick-Methot-NOAA added the change log use for issues that should appear in change log label Sep 16, 2022
e-perl-NOAA pushed a commit to nmfs-ost/ss3-doc that referenced this pull request May 25, 2023
@Rick-Methot-NOAA
Copy link
Collaborator Author

This code revision introduced errors to *.ssnew for sizecomp data. These are now being corrected with issue #541

@e-perl-NOAA e-perl-NOAA deleted the MV-Tweedie-ver2 branch June 6, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change log use for issues that should appear in change log Epic: Data weighting improvement Group of issues/user stories related to improving data weighting

Projects

None yet

4 participants