Runtime error when turning on threading in a CAM simulation #941

@sjsprecious

Description

What happened?

I tried to turn on the threading option in a CAM simulation (F2000climo compset, ne30pg3 resolution). I used one compute node on Derecho with 64 MPI tasks and 2 threads per MPI task. The case built successfully, but I encountered many runtime errors (some of which are listed below):

dec2481.hsn.de.hpc.ucar.edu 16: munmap_chunk(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 43: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 51: munmap_chunk(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 44: munmap_chunk(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 62: munmap_chunk(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 39: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 40: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 45: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 15: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 24: munmap_chunk(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 17: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 59: munmap_chunk(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 42: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 57: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 18: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 27: munmap_chunk(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 32: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 33: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 35: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 38: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 55: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 50: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 54: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 2: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 8: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 10: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 13: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 19: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 23: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 41: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 36: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 12: free(): invalid pointer
dec2481.hsn.de.hpc.ucar.edu 6: forrtl: error (76): Abort trap signal
dec2481.hsn.de.hpc.ucar.edu 6: Image              PC                Routine            Line        Source
dec2481.hsn.de.hpc.ucar.edu 6: libpthread-2.31.s  000014A3E75D48C0  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libc-2.31.so       000014A3E2BEBCBB  gsignal               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libc-2.31.so       000014A3E2BED355  abort                 Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libc-2.31.so       000014A3E2C31AE7  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libc-2.31.so       000014A3E2C39B6A  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libc-2.31.so       000014A3E2C3B614  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: cesm.exe           000000000112BFFB  fvm_consistent_se         163  fvm_consistent_se_cslam.F90
dec2481.hsn.de.hpc.ucar.edu 6: libiomp5.so        000014A3E30F6053  __kmp_invoke_micr     Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libiomp5.so        000014A3E30642F3  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libiomp5.so        000014A3E3063232  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libiomp5.so        000014A3E30F6DC1  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libpthread-2.31.s  000014A3E75C86EA  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 6: libc-2.31.so       000014A3E2CB8A6F  clone                 Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: forrtl: error (76): Abort trap signal
dec2481.hsn.de.hpc.ucar.edu 29: Image              PC                Routine            Line        Source
dec2481.hsn.de.hpc.ucar.edu 29: libpthread-2.31.s  000014C84D0B88C0  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libc-2.31.so       000014C8486CFCBB  gsignal               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libc-2.31.so       000014C8486D1355  abort                 Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libc-2.31.so       000014C848715AE7  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libc-2.31.so       000014C84871DB6A  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libc-2.31.so       000014C84871F614  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: cesm.exe           000000000112BFFB  fvm_consistent_se         163  fvm_consistent_se_cslam.F90
dec2481.hsn.de.hpc.ucar.edu 29: libiomp5.so        000014C848BDA053  __kmp_invoke_micr     Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libiomp5.so        000014C848B482F3  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libiomp5.so        000014C848B47232  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libiomp5.so        000014C848BDADC1  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libpthread-2.31.s  000014C84D0AC6EA  Unknown               Unknown  Unknown
dec2481.hsn.de.hpc.ucar.edu 29: libc-2.31.so       000014C84879CA6F  clone                 Unknown  Unknown

The complete list of errors can be found on Derecho at /glade/derecho/scratch/sunjian/cam6_run/F2000climo.ne30pg3_ne30pg3_mg17.derecho.intel.gpu00_pcols00016_mpi0064_thread002_rrtmgp/run/cesm.log.2648024.desched1.231212-143239.
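Since the full log is long, a small filter can help show how the failures split between the two glibc heap errors seen in the excerpt. This is only a minimal sketch: the function name `summarize_heap_errors` is hypothetical, and it assumes the log keeps the message text exactly as quoted above (`free(): invalid pointer` and `munmap_chunk(): invalid pointer`).

```shell
# Hypothetical helper: tally the glibc heap-error messages found in a CESM log.
# Reads the log on stdin and prints "<count> <message>", most frequent first.
summarize_heap_errors() {
  awk '
    # Pick out only the two message forms that appear in the excerpt above.
    match($0, /(free|munmap_chunk)\(\): invalid pointer/) {
      count[substr($0, RSTART, RLENGTH)]++
    }
    END { for (msg in count) printf "%d %s\n", count[msg], msg }
  ' | sort -rn
}
```

It could be used from the run directory as `summarize_heap_errors < cesm.log.2648024.desched1.231212-143239`.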

What are the steps to reproduce the bug?

To reproduce the error on Derecho, run the following commands:

  • ./create_newcase --case /glade/derecho/scratch/sunjian/cam6/F2000climo.ne30pg3_ne30pg3_mg17.derecho.intel --mach derecho --res ne30pg3_ne30pg3_mg17 --compset F2000climo --compiler intel
  • cd /glade/derecho/scratch/sunjian/cam6/F2000climo.ne30pg3_ne30pg3_mg17.derecho.intel
  • ./xmlchange --file env_mach_pes.xml --id NTASKS --val 64
  • ./xmlchange --file env_mach_pes.xml --id NTHRDS --val 2
  • ./case.setup
  • ./case.build
  • ./case.submit
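The steps above can also be collected into one script. This is a sketch rather than a tested workflow: the helper names `step` and `reproduce_case` and the `DRY_RUN` switch are hypothetical, and it assumes the script is started from the directory that contains `create_newcase`.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps above; helper names are hypothetical.
# Assumption: run from the directory that contains create_newcase.

CASE_DIR="/glade/derecho/scratch/sunjian/cam6/F2000climo.ne30pg3_ne30pg3_mg17.derecho.intel"

# Print each command; execute it only when DRY_RUN is not set to 1.
step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

reproduce_case() {
  step ./create_newcase --case "$CASE_DIR" --mach derecho \
       --res ne30pg3_ne30pg3_mg17 --compset F2000climo --compiler intel || return 1
  step cd "$CASE_DIR" || return 1
  # 64 MPI tasks with 2 threads each, as in the failing run.
  step ./xmlchange --file env_mach_pes.xml --id NTASKS --val 64 || return 1
  step ./xmlchange --file env_mach_pes.xml --id NTHRDS --val 2 || return 1
  step ./case.setup || return 1
  step ./case.build || return 1
  step ./case.submit
}
```

Calling `DRY_RUN=1 reproduce_case` only prints the commands, which is a cheap way to check the sequence before calling `reproduce_case` for real.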

What CAM tag were you using?

cam6_3_139

What machine were you running CAM on?

CISL machine (e.g. cheyenne)

What compiler were you using?

Intel

Path to a case directory, if applicable

/glade/derecho/scratch/sunjian/cam6/F2000climo.ne30pg3_ne30pg3_mg17.derecho.intel.gpu00_pcols00016_mpi0064_thread002_rrtmgp

Will you be addressing this bug yourself?

No

Extra info

No response

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working correctly

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
