-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update SUNDIALS NVector constructors #3165
Merged
WeiqunZhang
merged 3 commits into
AMReX-Codes:development
from
sundials-codes:update-nvector-constructors
Mar 14, 2023
Merged
Update SUNDIALS NVector constructors #3165
WeiqunZhang
merged 3 commits into
AMReX-Codes:development
from
sundials-codes:update-nvector-constructors
Mar 14, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dwillcox
approved these changes
Mar 10, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I have confirmed this works with SUNDIALS 6.5.0 when compiling and running the AMReX tutorial with the SUNDIALS integration wrapper and the ERK integrator.
jmsexton03
approved these changes
Mar 10, 2023
WeiqunZhang
added a commit
to WeiqunZhang/amrex
that referenced
this pull request
Mar 28, 2023
* SENSIE CI: run on ubuntu-20.04. 18.04 is deprecated. (AMReX-Codes#3131) * PODVector: Add some stream synchronization (AMReX-Codes#3130) The correctness of some functions depends on the asynchronous behavior of memcpy (especially when pageable host memory is involved). It seems that that is not well defined in the SYCL standard. So for safety, stream synchronization is added to a few places. * Fix AMReX-Codes#3105: missing include (AMReX-Codes#3134) * Initialize TinyProfiler earlier only for memory (AMReX-Codes#3137) Initialize TinyProfiler earlier only for memory and finalize it later only for memory as well. * Functional: Less -> Minimum, Greater -> Maximum (AMReX-Codes#3138) The functional classes for returning the lesser and greater of two arguments have been misnamed, because std::less and std::greater are for comparison and they return bool. To avoid confusion, they have been renamed. * Test: Reinit AMReX (AMReX-Codes#3141) ## Summary Run a test that inits and finalizes AMReX multiple times. ```console cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA -DAMReX_ENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo cmake --build build -j 8 ctest --test-dir build --verbose -R Reinit ``` ```console build/Tests/Reinit cuda-gdb -ex r -ex bt --args ./Test_Reinit inputs ``` ## Additional background This is often done in pytest tests, where the same process is reused for multiple tests. Needed for pyAMReX, ImpactX, WarpX et al. This also demonstrates a bug for GPU testing. * Add a compute face-centered velocity gradient in MLEBTensor (AMReX-Codes#3133) * Fix bug in Device::Finalize: Forgot to clear streams (AMReX-Codes#3143) ## Summary We forgot to clear the non-owning gpu_stream vector. They still held the old streams after we destroyed the streams in Device::Finalize. This bug affects codes that call amrex::Initialize and amrex::Finalize repeatedly. Note that we have the non-owning gpu stream vector for OMP CPU thread safety. The bug was introduced in AMReX-Codes#1064. ## Additional background AMReX-Codes#3141 ## Checklist The proposed changes: - [x] fix a bug or incorrect behavior in AMReX - [ ] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate * Fix nl test (AMReX-Codes#3142) This fixes a bug introduced in PR AMReX-Codes#3060 related to building neighbor lists between particles of different types. I also made it so that CI tests would have caught this bug. Closes AMReX-Codes#3127 * Fix FabArray::shift (AMReX-Codes#3147) ## Summary The cached metadata used by fused GPU kernel launches needs to be updated in FabArray::shift because the BoxArray has changed. ## Additional background Close AMReX-Codes#3145 * AmrLevel::FillPatch: Abort if grids are not properly nested for GPU or EB (AMReX-Codes#3098) * FillPatcher: Update interpolation in time (AMReX-Codes#3106) The previous implementation was susceptible to roundoff errors. It was observed in 7 levels PeleC runs that the code could fail because the times in StateData were out of sync. The updated logic is closer to that in the FillPatch function and the StateData class. * Runtime parameters for Signal handling controls (AMReX-Codes#3125) Add amrex.handle_sigsegv, handle_sigint, handle_sigabrt, and handle_sigfpe for more fine controls. Disable sigsegv handling for certain Intel GPUs because it's incompatible with the managed memory. * Adjust subgroup size for Intel GPUs (AMReX-Codes#3139) We can now use 32 as the subgroup size for Intel GPUs. This improves performance in most situations. * SYCL: Fix the order of destruction (AMReX-Codes#3146) Although we have not observed any issues, we should destroy the SYCL context, device, and queues in the reverse order of their construction. * GNU Make: Add support for Intel LLVM compiler in CPU builds (AMReX-Codes#3126) icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. * Add MPI support to GpuComplex (AMReX-Codes#3148) This creates a specialization of Mpi_typemap for GpuComplex<T>. Needed by ECP-WarpX/WarpX#3618 ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [x] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate * DPCPP -> SYCL (AMReX-Codes#3140) The name dpcpp has been deprecated by Intel. This PR replaces DPCPP with SYCL. For backward compatibility, we have kept the AMREX_USE_DPCPP macro. However, build options like AMReX_DPCPP_AOT for cmake and DPCPP_AOT for GNU Make have been replaced by AMReX_SYCL_AOT and SYCL_AOT. * Fix bug in async IO for particles (AMReX-Codes#3150) Without this fix the following will fail: > amrex/Tests/Particles/NeighborParticles(development)$ make -j8 TINY_PROFILE=TRUE BL_NO_FORT=TRUE USE_MPI=FALSE DEBUG=TRUE > $ ./main3d.gnu.DEBUG.TPROF.CUDA.ex inputs amrex.async_out=1 amrex.the_arena_is_managed=0 > Initializing CUDA... > CUDA initialized with 1 device. > AMReX (23.02-21-g858e72dcc651) initialized > 0::Assertion `index >= 0 && index < NReal + static_cast<int>(m_runtime_rdata.size())' failed, file "../../..//Src/Particle/AMReX_StructOfArrays.H", line 43 !!! > SIGABRT The proposed changes: - [x] fix a bug or incorrect behavior in AMReX - [ ] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate * Fix case where a norm is 1e-15 but not technically zero (AMReX-Codes#3149) * SUNDIALS: Replace SUNDIALS math macros with std math functions (AMReX-Codes#3151) This fixes xsdk-project/xsdk-issues#211. * Add static_assert to Mpi_typemap<GpuComplex> (AMReX-Codes#3154) * CI: Add sundials (AMReX-Codes#3153) Also fixes warnings. * SYCL: Fix AOT (AMReX-Codes#3155) This reverts part of AMReX-Codes#3123 because the AOT argument is a compile option. Also we must provide AMReX_INTEL_ARCH when SYCL AOT is on. * GNU Make: make it an error if intel gpu arch for aot is unknown (AMReX-Codes#3157) This is a follow-up on AMReX-Codes#3155 to make it consistent with CMake, and also because `-device *` does not actually work on the testbed. * SYCL AOT: Fix warning (AMReX-Codes#3159) Finally set the proper flags without generating warnings about unused compiler arguments. * SYCL Device Info (AMReX-Codes#3161) Print out more SYCL device information if available. Currently, free memory and max memory band width are not available yet. * gitignore: Editors & IDEs (AMReX-Codes#3163) ## Summary Add a few typical editor and IDE project directories to `.gitignore`. ## Additional background ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [ ] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate * amrex::Math: No longer need to use sycl math functions (AMReX-Codes#3164) We introduced `amrex::Math::floor`, etc. because their std versions were not supported in device code by the Intel compiler at that time. Otherwise, the user would have to call either sycl::floor or std::floor with the help of macros. This is no longer the case. Now we can switch to use the std versions. The amrex::Math versions are kept for backward compatibility. The particle code is not updated because it's undergoing some large changes. * SYCL: Optimization of small reduction (AMReX-Codes#3167) For small reductions, it's faster to do the second pass on CPU. The change is for SYCL only. * New functions for getting data in cell or line. (AMReX-Codes#3166) Add get_cell_data and get_line_data to get the data in a cell or a line, respectively. Note that we already had get_slice_data. ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [x] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate * Add Make.local-pre to gitignore (AMReX-Codes#3171) Given that `Make.local` files are in the `.gitignore`, this should probably be there too. * SYCL: Group and subgroup size (AMReX-Codes#3172) * Use [[sycl::reqd_work_group_size]] when we know the group size at compile time. I have not seen perfromance difference, but it should not hurt. * Replace [[intel::reqd_sub_group_size]] with [[sycl::reqd_sub_group_size]] since the Intel extension is in the standard now. * TableData: Relax the static assertion (AMReX-Codes#3174) Relax the requirement of T to trivially copyable (so that htod memcpy will work) and trivially destructible. After this change, TableData<GpuComplex<Real>> should work. * CMake: Parallel HDF5 Detection (AMReX-Codes#3170) ## Summary It turns out the words in the settings summary of HDF5 came from the words used when HDF5 was configured with CMake. The issue is `h5pcc -showconfig` may show `Parallel HDF5: ON`, but the current versions of CMake uses `Parallel HDF5: yes` to detect Parallel HDF5. This commit adds a more reliable way of detecting Parallel HDF5. ## Additional background - spack/spack#35546 - HDFGroup/hdf5#2488 - https://gitlab.kitware.com/cmake/cmake/-/merge_requests/8234 ## Checklist The proposed changes: - [x] fix a bug or incorrect behavior in AMReX - [ ] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate * Hypre setup and solve profile (AMReX-Codes#3168) I've wrapped the Hypre setup and solve functions so they can be more easily profiled. Co-authored-by: Paul Mullowney <Paul.Mullowney@nrel.gov> * GNU Make: Allow choice of Intel compiler variant (AMReX-Codes#3175) ## Summary This allows the user to specify `COMP = intel-classic` or `COMP = intel-llvm` if they wish to use a specific one. ## Additional background Note that the existing behaviour is maintained if `COMP = intel` (i.e. intel-llvm is chosen if the `icpx` command exists and intel-classic is chosen otherwise following AMReX-Codes#3126). However, the `AMREX_FCOMP`, `AMREX_CCOMP` and `lowercase_comp` makefile variables are changed to one of `intel-llvm` or `intel-classic` depending on which is used. This further allows the user to use information about the used variant in their `Make.local`. For example they may wish to set something like ``` CXX = mpiicpc -cxx=icpx ``` in the case `AMREX_CCOMP = intel-llvm`, `USE_MPI = TRUE` and they are using Intel MPI. ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [x] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [x] include documentation in the code and/or rst files, if appropriate * GNU Make: Allow customizing tmp_build_dir (AMReX-Codes#3177) ## Summary This allows the user to change the default temporary build directory to something other than `tmp_build_dir`. ## Additional background This would allow for users to build multiple AMReX applications using the same build directory and thus reduce compile times without having to use `ccache`. Of course, there might be a small risk in terms of users potentially accidentally mixing build configurations hence I would suggest not advertising this feature or at least advising caution. * fix recently introduced tmp_build_dir build bug (AMReX-Codes#3179) the make variable was renamed from TmpBuildDir to TMP_BUILD_DIR breaking application codes. This adds an alias to the old make variable name so code compile again. addresses bug in AMReX-Codes#3177 closes AMReX-Codes#3178 * SYCL: Fix AMReX-Codes#3172 (AMReX-Codes#3181) This may not be our bug. But using sycl::reqd_work_group_size does not work for the 2d range cases. So we have to remove that. * Update CHANGES for 23.03 (AMReX-Codes#3183) * Fix race condition with TinyProfiler for memory (AMReX-Codes#3169) Fix AMReX-Codes#3156 by giving every Arena its own map for profiling. * SYCL: Tweak AMReX-Codes#3164 and switch to multi_ptr for sincos (AMReX-Codes#3185) * Use std::isnan and std::isinf instead of ::isnan and ::isinf for HIP and CUDA. * Add sycl::isinf and sycl::isfinite back to amrex::Math because Intel compiler's aggressive fp mode makes std::isinf and std::isfinite always return false in release build. * Use sycl::isinf and sycl::isnan in amrex::isnan and amrex::isinf when SYCL is on for the same reason as above. * Fix call to sycl::sincos. According to the SYCL standard the pointer should be a multi_ptr instead of a raw pointer. * ParticleContainer: Resize runtime components (AMReX-Codes#3186) Add two new functions ResizeRuntimeRealComp and ResizeRuntimeIntComp to ParticelContainer. * Clang-Tidy (AMReX-Codes#3176) Add a configuration file .clang-tidy for clang-tidy. Add support to it in CMake. It can be enabled by `-DAMReX_CLANG_TIDY=ON`. Fix many warnings raised by clang-tidy. Most of them are fixed by `clang-tidy --fix-errors`. To bypass clang-tidy, some unfixable codes have been moved to `*.nolint.H`. A number of issues have been found and fixed. * There was use-after-move in testing OpenMP thread safety for communication. * The MultiFab tags were not handled correctly during vector resize. * Parentheses were missing in the definition of AMREX_REAL_LOWEST. * ReduceToPlane (AMReX-Codes#3187) * Add a function template for reducing MultiFab data to a plane. * Add KeyValuePair as an alias to ValLocPair. * Add more versions of MPI reduce functions for KeyValuePair/ValLocPair. * Fix an issue in AMReX-Codes#3176 (AMReX-Codes#3188) We need to add another version of SetTag now, otherwise it breaks WarpX. * Clang-Tidy II (AMReX-Codes#3190) This is the second batch of changes fixing clang-tidy warnings. Add CMake option AMReX_CLANG_TIDY_WERROR to update warnings to errors. Use clang-tidy in CI: GNU@8.4 C++17 Release [lib]. * Update SUNDIALS NVector constructors (AMReX-Codes#3165) ## Summary Update MultiFab NVector wrappers to accept a sundials context object. ## Additional background Profiling (and in the future error handling) in sundials requires objects to be create with a non-NULL context. This PR updates the MultiFab NVector wrappers with an optional input for the sundials context that defaults to the context created at initialization for the corresponding OpenMP thread. * Sorting for faster current deposition (AMReX-Codes#3198) ## Summary For current deposition in codes such as HiPACE++ and WarpX, it is mush better to sort by `x -> y -> z -> ppc` instead of `ppc -> x -> y -> z`. ## Additional background To use this, pass in {0,0,0} (sort by cell), {-1,-1,-1} (sort by node) or {-2, -2, -2} (sort by cell and node) as a bin_size into SortParticlesByBin. ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [x] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate * Add assert that we are not inside threaded region when InitRandom is called. (AMReX-Codes#3204) * Clang-Tidy III (AMReX-Codes#3197) This is the third batch of changes fixing clang-tidy warnings. Most of the warnings are in EB code. Use Clang 14 in one of the Clang CI tests that used to use Clang 7. * Clang-Tidy IV (AMReX-Codes#3200) This is the fourth batch of changes fixing clang-tidy warnings. Most of them are in particle code. * Fix AMReX-Codes#3200 (AMReX-Codes#3206) It was a mistake to delete the move assignment operator of ParticleIDWrapper and ParticleCPUWrapper. * Fix issues in ParticleContainerBase (AMReX-Codes#3208) * The move constructor and assignment operator were not safe, because the m_gdb member could be the address of m_gdb_object. After the move that can no longer be true. This is fixed by making m_gdb_object a unique_ptr. Moving a unique_ptr does not change its pointer inside. * Fix SetParticleBoxArray, SetParticleDistributionMapping and SetParticleGeometry. The issue was these functions took a reference, which could be aliasing what was inside the m_gdb_object. ParGDB has a move assignment operator that could render the reference pointing to an invalid object. So now these functions take by value. This was not an issue before AMReX-Codes#3200, because the move assignment of ParGDB did copy instead of move. * Add Jean's name to the development team (AMReX-Codes#3209) * fix typo in debugging docs (AMReX-Codes#3210) ## Summary ## Additional background ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [ ] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate * Clang-Tidy V (AMReX-Codes#3201) This is the fifth batch of changes fixing clang-tidy warnings. The changes are most related to single precision, 1D, 2D, and OMP. Enable clang-tidy in GCC and Clang CIs. Switch to Ubuntu 22.04 and GCC 12 in one test. * Clang-Tidy VI (AMReX-Codes#3202) * Update GNU Make. One can use `USE_CLANG_TIDY=TRUE` to enable clang-tidy check. For example, to enable clang-tidy check, specify the clang-tidy command (if it's the default clang-tidy), and treat warnings as errors, one can run make USE_CLANG_TIDY=TRUE CLANG_TIDY=clang-tidy-12 CLANG_TIDY_WARN_ERROR=TRUE By default amrex/.clang-tidy is used as the config file. One can change it with CLANG_TIDY_CONFIG_FILE=my_clang-tidy-conf. If you want to let clang-tidy fix errors for you, you can run make USE_CLANG_TIDY=TRUE CLANG_TIDY="clang-tidy --fix-errors" To avoid race conditions, the make job should not use `-j` (unless it's `-j1`) when fixing errors. * Add clang-tidy to GNU Make CIs. * Fix warnings in Fortran interface, particle and plotfile tools. * Clang-Tidy VII (AMReX-Codes#3203) Fix warnings in the interfaces for Hypre, PETSc and SUNDIALS. * Warning about CPU core oversubscription (AMReX-Codes#3211) If the user forgets to set OMP_NUM_THREADS, OMP might be oversubscribing CPU cores. If that's the case (to the best of our knowledge) and verbosity is on, we print out a warning. We use `std::thread::hardware_concurrency` to detect the number of cores available. Note that this is only a hint according to the C++ standard. To detect the number of MPI ranks per node, we use OMP_COMM_TYPE_NODE for Open MPI and MPI_COMM_TYPE_SHARED for others. Since MPI_COMM_TYPE_SHARED might group processes across nodes, the return value of `ParallelDescriptor::NProcsPerNode` might also be wrong. Nevertheless, both `std::thread::hardware_concurrency` and `ParallelDescriptor::NProcsPerNode` seem to work well. * SYCL sub-group size (AMReX-Codes#3180) Add option to specify SYCL sub-group size. The default is 32. It can be changed to 16 with `-DAMReX_SYCL_SUB_GROUP_SIZE=16` in CMake and `SYCL_SUB_GROUP_SIZE=16` in GNU Make. * Add build option for Intel SYCL AOT GRF mode (AMReX-Codes#3173) By default, the Intel SYCL compiler uses up to 128 registers. In the large register file mode, it's up to 256 registers. There is also the auto-large mode. For CMake, one can use `-DAMReX_SYCL_AOT_GRF_MODE=[Large|AutoLarge]` to choose the large register file or auto-large mode. AOT must also be enabled with `-DAMReX_SYCL_AOT=TRUE`. For GNU Make, one can use `SYCL_AOT_GRF_MODE=[Large|AutoLarge]`, and it depends on AOT being enabled by `SYCL_AOT=TRUE`. * Add comments to FluxRegister::OverwriteFlux (AMReX-Codes#3216) * Use ccache in Github Actions --------- Co-authored-by: AlexanderSinn <64009254+AlexanderSinn@users.noreply.github.com> Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> Co-authored-by: Lucas Esclapez <13371051+esclapez@users.noreply.github.com> Co-authored-by: Andrew Myers <atmyers2@gmail.com> Co-authored-by: asalmgren <asalmgren@lbl.gov> Co-authored-by: Miren Radia <32646026+mirenradia@users.noreply.github.com> Co-authored-by: PaulMullowney <60452402+PaulMullowney@users.noreply.github.com> Co-authored-by: Paul Mullowney <Paul.Mullowney@nrel.gov> Co-authored-by: Michael Zingale <michael.zingale@stonybrook.edu> Co-authored-by: Nuno Miguel Nobre <nuno.nobre@stfc.ac.uk> Co-authored-by: David Gardner <gardner48@llnl.gov>
guj
pushed a commit
to guj/amrex
that referenced
this pull request
Jul 13, 2023
## Summary Update MultiFab NVector wrappers to accept a sundials context object. ## Additional background Profiling (and in the future error handling) in sundials requires objects to be create with a non-NULL context. This PR updates the MultiFab NVector wrappers with an optional input for the sundials context that defaults to the context created at initialization for the corresponding OpenMP thread.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Update MultiFab NVector wrappers to accept a sundials context object.
Additional background
Profiling (and in the future error handling) in sundials requires objects to be create with a non-NULL context. This PR updates the MultiFab NVector wrappers with an optional input for the sundials context that defaults to the context created at initialization for the corresponding OpenMP thread.
Checklist
The proposed changes: