Fix wrong output size in SplineC2ROMPTarget::mw_evaluateVGLandDetRatioGrads #4408

Merged: 7 commits, Jan 23, 2023

ye-luo commented Jan 23, 2023

Proposed changes

There is a bug when using the offload real build.
It caused writes to unintended memory locations and wrong numbers (extremely high variance/energy, >>1).

What type(s) of changes does this code introduce?

  • Bugfix
  • Testing changes (e.g. new unit/integration/performance tests)

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

epyc-server

Checklist

  • Yes. This PR is up to date with the current state of 'develop'

Fix SplineC2R. When the requested size is smaller than what can be supplied, only output the requested size.
The right way to select a subset of orbitals in an already built SplineC2R remains an open question.
The truncation is subject to the sorting of orbitals in C2R, which puts all the "complex" orbitals first.
Each "complex" orbital produces two real orbitals.
const size_t first_cplx = first / 2;
const size_t last_cplx = omptarget::min(last / 2, orb_size);

The min() here is wrong. orb_size refers to output real orbitals.

ValueType* restrict out_dphi_x = out_phi + phi_vgl_stride;
ValueType* restrict out_dphi_y = out_dphi_x + phi_vgl_stride;
ValueType* restrict out_dphi_z = out_dphi_y + phi_vgl_stride;
ValueType* restrict out_d2phi = out_dphi_z + phi_vgl_stride;

const size_t first_real = first_cplx + omptarget::min(nComplexBands_local, first_cplx);
const size_t last_real = last_cplx + omptarget::min(nComplexBands_local, last_cplx);

last_real can be larger than requested_orb_size and cause an overflow.
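
For readers following the two review notes above, here is a minimal C++ sketch of the index arithmetic. It is not the merged fix. It assumes the ordering described in the commit message: the first `n_complex_bands` splines each produce two real orbitals and the remaining splines produce one. The names `first_real_index`, `clamped_real_range`, `RealRange` and `n_complex_bands` are hypothetical, and `std::min` stands in for `omptarget::min`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from QMCPACK.
// Maps a spline index to the index of the first real orbital it produces,
// assuming the first n_complex_bands splines each yield two real orbitals
// and every later spline yields one.
inline std::size_t first_real_index(std::size_t spline_index, std::size_t n_complex_bands)
{
  return spline_index + std::min(n_complex_bands, spline_index);
}

// Half-open range [first, last) of real-orbital output slots.
struct RealRange
{
  std::size_t first;
  std::size_t last;
};

// Hypothetical helper: real-orbital range covered by splines
// [first_spline, last_spline), clamped so that no index reaches past
// requested_orb_size, which counts real output orbitals (not splines).
inline RealRange clamped_real_range(std::size_t first_spline,
                                    std::size_t last_spline,
                                    std::size_t n_complex_bands,
                                    std::size_t requested_orb_size)
{
  assert(first_spline <= last_spline);
  const std::size_t first_real =
      std::min(first_real_index(first_spline, n_complex_bands), requested_orb_size);
  const std::size_t last_real =
      std::min(first_real_index(last_spline, n_complex_bands), requested_orb_size);
  return {first_real, last_real};
}
```

With 3 complex bands among 5 splines, up to 8 real orbitals can be supplied; requesting 6 yields the range [0, 6) instead of [0, 8). A clamp like this can cut the pair of real orbitals from one complex orbital in half, so a real kernel would also need to guard the second write of each pair.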

ye-luo commented Jan 23, 2023

Test this please

prckent self-requested a review January 23, 2023 14:42

prckent left a comment

Thanks Ye. We discussed this a bit last week, but so we have a record here: why didn't asan catch this?

ye-luo commented Jan 23, 2023

> Thanks Ye. We discussed this a bit last week, but so we have a record here: why didn't asan catch this?

Because we didn't have a test case exposing this bug. The added test triggers an asan failure when I backport it to develop.

ye-luo commented Jan 23, 2023

Test this please

ye-luo commented Jan 23, 2023

I also noticed that asan runs on GitHub Actions without QMC_DATA. We probably need to update the image to include the NiO a4(S1) and a16(S4) h5 files.

prckent left a comment

LGTM. Coverage would be higher but for #4409.

We (I) should mention in the release notes that QMC_DATA and these NiO files are encouraged.

prckent merged commit 2006024 into QMCPACK:develop Jan 23, 2023
ye-luo deleted the fix-wrong-size branch February 13, 2023 23:54