Latch issue and unable to cancel users address space #2963

Open
savaresejt opened this issue Jun 18, 2024 · 3 comments
Assignees
Labels
bug Something isn't working Research Needed

Comments

@savaresejt

savaresejt commented Jun 18, 2024

User reported having issues opening and saving datasets. We noticed that there was an issue with latch contention with xxx and that the user had address spaces that we could not cancel even with force cancel commands in SDSF. Ultimately we had to IPL the system to deal with the latching issues.

This is the latch contention on OMVS, from /D OMVS,A=ALL

To Reproduce

We were not able to reproduce the error.

Expected behavior

Even if there is an issue saving the datasets, we would expect that the address spaces being spun up could be canceled by system admins.

Screenshots

Desktop (please complete the following information):

  • OS:
  • Zowe Explorer Version: v2.13.1
  • (Optional) Zowe CLI Version:
  • (Optional) Are you using Secure Credential Store? yes

Additional context

@savaresejt savaresejt added the bug Something isn't working label Jun 18, 2024

Thank you for creating a bug report.
We will investigate the bug and evaluate its impact on the product.
If you haven't already, please ensure you have provided steps to reproduce the bug and as much context as possible.

@savaresejt
Author

savaresejt commented Sep 26, 2024

We ran into this again.

Describing Dataset-is-Catalogued function
  [-] should return true if a dataset exists 32.46s (32.46s|4ms)
   RuntimeException: Command Error:
   z/OSMF REST API Error:
   Rest API failure with HTTP(S) status 500
   rc:       4
   reason:   1
   category: 2
   message:  login: timeout: TsoServerConnection(USER=XXXXX, ASID=0x00ca, QID=0x25840028)
   Error Details:
   HTTP(S) error status "500" received.
   Review request details (resource, base path, credentials, payload) and ensure correctness.
   Protocol:  https
   Host:      XXX
   Port:      XXX
   Base Path:
   Resource:  /zosmf/restfiles/ds?dslevel=SYS3.P.DBA.JCL.OLD
   Request:   GET
   Headers:   [{"Accept-Encoding":"gzip"},{"X-IBM-Max-Items":"0"},{"X-CSRF-ZOSMF-HEADER":true}]
   Payload:   undefined
   at Dataset-is-Catalogued, C:\whatever
   at <ScriptBlock>, C:\whatever
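
The timeout message above carries the ASID of the TSO server address space, which is the value to look for in the D GRS,C,L output. A minimal sketch of pulling it out follows, assuming the message always has the shape seen in this single sample; `parse_tso_timeout` is a hypothetical helper name, not part of Zowe or z/OSMF.

```python
import re

# Pattern for the z/OSMF timeout text seen in the error above. The exact
# message format is an assumption based on this one sample.
_TSO_CONN = re.compile(
    r"TsoServerConnection\(USER=(?P<user>[^,]+),\s*"
    r"ASID=0x(?P<asid>[0-9a-fA-F]+),\s*"
    r"QID=0x(?P<qid>[0-9a-fA-F]+)\)"
)


def parse_tso_timeout(message):
    """Return user, ASID and QID from a timeout message, or None if absent."""
    match = _TSO_CONN.search(message)
    if match is None:
        return None
    return {
        "user": match.group("user"),
        # D GRS prints ASIDs as four uppercase hex digits without the 0x prefix.
        "asid": match.group("asid").upper().zfill(4),
        "qid": match.group("qid").upper(),
    }


sample = "login: timeout: TsoServerConnection(USER=XXXXX, ASID=0x00ca, QID=0x25840028)"
info = parse_tso_timeout(sample)
print(info)
```

The normalized ASID can then be compared against the ASID column of the latch display.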

Recovery Instructions
D GRS,C,L

  1. Find the latch number holding up OMVS/BPXOINIT [if you're not sure which one, go to TSO OMVS or SSH and see which you're stuck behind when you attempt to connect].
  2. In SDSF, put a JT next to the address space that is locking up OMVS/BPXOINIT.
  3. FORCE U=,A=,TCB= 

OUTPUT FROM D GRS,C,L

RESPONSE=SYSELMD                                                       
 ISG343I 08.08.35 GRS STATUS 871                                       
 LATCH SET NAME:  SYS.BPX.AP00.PRTB1.PPRA.LSN                          
 CREATOR JOBNAME: OMVS      CREATOR ASID: 0010                         
   LATCH NUMBER:  1                                                    
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME 
     USER1      0027  EXCLUSIVE  OWN       00AE41B0   Y   07:04:13.452 
     BPXOINIT   0041  EXCLUSIVE  WAIT      00AFAAE8   Y   07:04:13.450 
     USER1      0027  EXCLUSIVE  WAIT      00AE4800   Y   07:04:13.363 
     USER1      00E9  EXCLUSIVE  WAIT      00AE4800   Y   06:57:38.356 
     USER1      00AB  EXCLUSIVE  WAIT      00AE4800   Y   06:57:36.971 
     SSHD3      00F0  EXCLUSIVE  WAIT      00AD9DC8   Y   01:42:27.413 
     USER3      00F5  EXCLUSIVE  WAIT      00AFB2F8   Y   00:30:39.263 
     PORTMAP    0098  EXCLUSIVE  WAIT      00AF9040   Y   00:03:46.903 
   LATCH NUMBER:  47                                                   
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME 
     USER1      00AA  EXCLUSIVE  OWN       00AE41B0   Y   16:42:25.036 
     USER1      00AA  EXCLUSIVE  WAIT      00AE4800   Y   16:27:14.359 
   LATCH NUMBER:  125                                                  
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME 
     USER1      0099  EXCLUSIVE  OWN       00AE41B0   Y   15:40:55.436 
     USER1      0099  EXCLUSIVE  WAIT      00AE4800   Y   15:25:55.379 
   LATCH NUMBER:  260
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     USER3      00F5  EXCLUSIVE  OWN       00AFB2F8   Y   00:30:39.263
     RSED3      0051  EXCLUSIVE  WAIT      00AC80A0   Y   00:30:36.665
     RSED3      0051  EXCLUSIVE  WAIT      00AC1E88   Y   00:27:39.311
   LATCH NUMBER:  1505
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     USER2      00D5  EXCLUSIVE  OWN       00AE41B0   Y   -over 24 hrs
     USER2      00D5  EXCLUSIVE  WAIT      00AE4800   Y   -over 24 hrs
   LATCH NUMBER:  1654
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     SSHD3      00F0  EXCLUSIVE  OWN       00AD9DC8   Y   01:42:27.413
     SSHD4      0066  EXCLUSIVE  WAIT      00AFB2F8   Y   01:40:26.298
     SSHD5      0121  EXCLUSIVE  WAIT      00AFB2F8   Y   00:17:57.800
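
Step 1 of the recovery instructions (finding which latch is holding up BPXOINIT) can be sketched as a small parser over the D GRS,C,L output. This is a minimal sketch, assuming the ISG343I column layout shown above and assuming shared requests are printed as SHARED; `parse_latches` and `blockers_of` are hypothetical helper names, not part of any z/OS or Zowe tooling.

```python
import re
from collections import defaultdict

# One requestor row: jobname, ASID, EXC/SHR, OWN/WAIT, work unit, TCB flag, elapsed.
_ROW = re.compile(
    r"^\s*(?P<job>\S+)\s+(?P<asid>[0-9A-F]{4})\s+(?P<mode>EXCLUSIVE|SHARED)\s+"
    r"(?P<state>OWN|WAIT)\s+(?P<workunit>[0-9A-F]{8})\s+(?P<tcb>[YN])\s+"
    r"(?P<elapsed>.+?)\s*$"
)
_LATCH = re.compile(r"LATCH NUMBER:\s+(\d+)")


def parse_latches(text):
    """Group requestor rows by latch number."""
    latches = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = _LATCH.search(line)
        if header:
            current = int(header.group(1))
            continue
        row = _ROW.match(line)
        if row and current is not None:
            latches[current].append(row.groupdict())
    return dict(latches)


def blockers_of(latches, jobname):
    """Return the owners of every latch that `jobname` is waiting on."""
    result = []
    for number, rows in latches.items():
        if any(r["job"] == jobname and r["state"] == "WAIT" for r in rows):
            result.extend(
                (number, r["job"], r["asid"], r["workunit"])
                for r in rows
                if r["state"] == "OWN"
            )
    return result


# Excerpt copied from the output above.
sample = """\
   LATCH NUMBER:  1
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     USER1      0027  EXCLUSIVE  OWN       00AE41B0   Y   07:04:13.452
     BPXOINIT   0041  EXCLUSIVE  WAIT      00AFAAE8   Y   07:04:13.450
   LATCH NUMBER:  47
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     USER1      00AA  EXCLUSIVE  OWN       00AE41B0   Y   16:42:25.036
     USER1      00AA  EXCLUSIVE  WAIT      00AE4800   Y   16:27:14.359
"""
latches = parse_latches(sample)
print(blockers_of(latches, "BPXOINIT"))
```

For this excerpt the only latch BPXOINIT waits on is latch 1, owned by USER1 in ASID 0027.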

These commands recovered us

FORCE U=USER1,A=00AA,TCB=AE4800 
FORCE U=USER1,A=0099,TCB=AE4800
FORCE U=USER1,A=0099,TCB=AE4800
FORCE U=USER1,A=0099,TCB=AE4800
FORCE U=USER2,A=00d5,TCB=AE4800
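
The commands above follow the FORCE U=,A=,TCB= template from step 3, with the TCB taken from the last six hex digits of the waiting work unit in the D GRS,C,L output. A minimal sketch of that formatting step is below. It only builds the command text and issues nothing; `force_command` is a hypothetical helper, and choosing the waiting row of each self-blocked address space is an assumption read off the commands in this thread, not a documented procedure.

```python
def force_command(user, asid, workunit):
    """Format the FORCE U=,A=,TCB= template from the recovery steps."""
    # 00AE4800 -> AE4800, matching the TCB values used in the commands above.
    tcb = workunit.upper()[-6:]
    return f"FORCE U={user},A={asid.upper()},TCB={tcb}"


# (requestor, ASID, work unit) of WAIT rows copied from the D GRS,C,L output.
waiters = [
    ("USER1", "00AA", "00AE4800"),
    ("USER1", "0099", "00AE4800"),
    ("USER2", "00D5", "00AE4800"),
]
commands = [force_command(*row) for row in waiters]
for command in commands:
    print(command)
```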

We would still like to find the root cause. System latch contention like this could cause serious problems if we move Zowe up to production.

I tried the following test afterwards and was unable to recreate the issue

  • Ran two PowerShell scripts at the same time writing to the same file with the Zowe CLI
  • Opened that file in Zowe and tried to save it a few hundred times simultaneously with the scripts
  • Opened that file in ISPF in edit and tried to save it a few hundred times simultaneously with the scripts
  • Opened the file in explorer and tried to save it a few hundred times simultaneously with the scripts
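
The concurrent-write part of this test could be scripted along the following lines. This is a sketch under assumptions, not the scripts that were actually run: `run_concurrently` is a hypothetical helper, and the command shown is a harmless stand-in so the example runs anywhere. A real run would substitute a Zowe CLI upload such as `zowe zos-files upload file-to-data-set` with a local file and a target data set of your choosing.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(command, workers, repeats, timeout=60):
    """Run `command` `repeats` times across `workers` threads; return exit codes."""

    def once(_):
        try:
            done = subprocess.run(command, capture_output=True, timeout=timeout)
            return done.returncode
        except subprocess.TimeoutExpired:
            # A hang is the outcome of interest here, so record it as None.
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(once, range(repeats)))


# Stand-in command (assumption); replace with the real upload invocation.
command = [sys.executable, "-c", "print('save')"]
codes = run_concurrently(command, workers=2, repeats=6)
print(codes)
```

Any `None` in the result marks a call that did not return within the timeout, which is the symptom to correlate with the latch display.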

@savaresejt
Author

This might be a bug in z/OSMF. I just spoke with @phaumer in the Z Open Editor issue here.

Projects
Status: New Issues
Development

No branches or pull requests

3 participants