Opened on Mar 7, 2022
Describe the bug
When configuration commands of the JFrog CLI are executed concurrently, the config file ~/.jfrog/jfrog-cli.conf.v5 may become corrupted. Every JFrog CLI command executed afterwards then fails with the error message "[Error] invalid character 't' after top-level value" (see the corrupted JSON below under 'Screenshots').
To Reproduce
The relevant operations are:
- jfrog-cli-core/utils/lock/lock.go, line 41 (commit a1d9695)
- jfrog-cli-core/utils/lock/lock.go, line 55 (commit a1d9695)
- jfrog-cli-core/utils/lock/lock.go, line 151 (commit a1d9695)
The files in the lockfile directory are sorted according to the timestamp embedded in the filename. Among the files whose PID belongs to a still-running process, the oldest one "wins" the lock, and its owner may go ahead and change the config file.
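The selection rule described above can be sketched as follows. This is a hypothetical illustration, not the code in lock.go: the "<timestamp>-<pid>" file-name format, the pickOwner function, and the isRunning callback (standing in for a platform-specific process check) are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// pickOwner returns the PID of the lock owner: among lock files whose PID is
// still running, the one with the oldest embedded timestamp wins.
// Hypothetical name format: "<timestamp>-<pid>".
func pickOwner(names []string, isRunning func(pid int) bool) (pid int, ok bool) {
	type lockEntry struct {
		ts  int64
		pid int
	}
	var live []lockEntry
	for _, name := range names {
		parts := strings.SplitN(name, "-", 2)
		if len(parts) != 2 {
			continue // not a lock file
		}
		ts, errTS := strconv.ParseInt(parts[0], 10, 64)
		p, errPID := strconv.Atoi(parts[1])
		if errTS != nil || errPID != nil || !isRunning(p) {
			continue // malformed name, or a stale file left by a dead process
		}
		live = append(live, lockEntry{ts, p})
	}
	if len(live) == 0 {
		return 0, false
	}
	// Oldest embedded timestamp first.
	sort.Slice(live, func(i, j int) bool { return live[i].ts < live[j].ts })
	return live[0].pid, true
}

func main() {
	names := []string{"300-7", "100-5", "200-6"}
	dead := map[int]bool{5: true} // PID 5 has exited, so its file is ignored
	owner, _ := pickOwner(names, func(pid int) bool { return !dead[pid] })
	fmt.Println(owner) // 6
}
```

Note that nothing in this rule looks at when a file actually appeared in the directory; it only trusts the timestamp written into the name.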
A possible cause for the observed behavior is the following sequence of events where two processes P1 and P2 want to access the config file concurrently:
- P1: time.Now()
- P2: time.Now()
- P2: os.OpenFile()
- P2: fileutils.ListFiles() -> yay I'm the oldest one -> goes ahead to change the config file
- P1: os.OpenFile()
- P1: fileutils.ListFiles() -> yay I'm the oldes... ☠️
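The interleaving above can be replayed deterministically with a toy model. Nothing here is jfrog-cli-core code: the lockFile and lockDir types, the integer timestamps 100 and 101, and the winner method are assumptions that mimic sorting by the timestamp embedded in the filename.

```go
package main

import (
	"fmt"
	"sort"
)

// lockFile is a toy lock file: the timestamp its owner read with time.Now()
// before creating it, plus the owner's PID.
type lockFile struct {
	ts  int64
	pid int
}

// lockDir stands in for the lock directory.
type lockDir struct{ files []lockFile }

func (d *lockDir) create(f lockFile) { d.files = append(d.files, f) }

// winner sorts by the embedded timestamp; the oldest file wins.
func (d *lockDir) winner() int {
	files := append([]lockFile(nil), d.files...)
	sort.Slice(files, func(i, j int) bool { return files[i].ts < files[j].ts })
	return files[0].pid
}

// simulateRace replays the interleaving step by step and returns every PID
// that concluded it holds the lock.
func simulateRace() []int {
	d := &lockDir{}
	var holders []int
	t1 := int64(100)          // P1: time.Now()
	t2 := int64(101)          // P2: time.Now()
	d.create(lockFile{t2, 2}) // P2: os.OpenFile()
	if d.winner() == 2 {      // P2: ListFiles() -> oldest, proceeds
		holders = append(holders, 2)
	}
	d.create(lockFile{t1, 1}) // P1: os.OpenFile()
	if d.winner() == 1 {      // P1: ListFiles() -> also "oldest"
		holders = append(holders, 1)
	}
	return holders
}

func main() {
	fmt.Println(simulateRace()) // [2 1]: both processes think they hold the lock
}
```

P1 read the clock first but created its file last, so by the time its file shows up, P2 has already decided it is alone.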
Expected behavior
The lockfile mechanism works reliably: only one process at a time holds the lock, so concurrent commands cannot corrupt the config file.
Screenshots
$ cat ~/.jfrog/jfrog-cli.conf.v5
{
"servers": [],
"version": "5"
}toryUrl": "*redacted*",
"user": "*redacted*",
"password": "*redacted*",
"serverId": "*redacted*",
"isDefault": true
}
],
"version": "5"
}
Versions
- JFrog CLI core version: master
- JFrog CLI version (if applicable): 2.12.1
- Artifactory version: 7.33.12
Additional context
We have multiple Azure Pipelines agents running on the same self-hosted Ubuntu 20.04.3 machine, executing tasks from the JFrog Azure DevOps plugin in parallel. The plugin uses jfrog-cli internally.
Possible Solution
Use the file-creation or file-modification time reported by the filesystem as the sorting key. Unlike a timestamp taken before the file is created, the filesystem timestamp follows the actual creation order, so the interleaving above can no longer make a later file look older. Since filesystem timestamps may have a coarser resolution, identical timestamps become more likely. When a process detects that its file shares a timestamp with another file, it can remove its own file, wait a random amount of time (this is important to prevent the two processes from retrying in lockstep forever), and retry the operation.