Known issues

Potential race between Kubernetes scheduler and pool state.

If a cmk isolate process terminates abnormally in a way that prevents it from releasing its assigned CPU list (e.g. because it was sent the KILL signal), there is an interval between process termination and the point at which cmk reconcile is able to remove the invalid process ID from the CMK configuration directory. During this interval, although the opaque integer resource becomes available, the next invocation of cmk isolate may not be able to safely make an allocation. In this case, cmk isolate should be expected to crash with a nonzero exit status. This appears to the operator as a failed pod launch, and the scheduler will try to reschedule the pod. The condition persists on the affected node until cmk reconcile has a chance to run, at which point it detects that the saved process ID from the reaped container is no longer valid and frees the cores for reuse.
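The text above does not say how cmk reconcile decides that a saved process ID is no longer valid. As a rough illustration only, one common POSIX technique is to send signal 0 to the PID, which performs the existence check without delivering a signal. The function names `pid_is_alive` and `stale_pids` are hypothetical and are not taken from the CMK source.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if some process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; the kernel only checks existence and permissions.
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no such process, so the recorded PID is stale.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to another user.
        return True
    return True


def stale_pids(recorded_pids):
    """Hypothetical helper: return the recorded PIDs that no longer exist."""
    return [pid for pid in recorded_pids if not pid_is_alive(pid)]
```

Under this sketch, CPU lists recorded against the PIDs returned by `stale_pids` would be the ones eligible to be freed on the next reconcile pass.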

Potential conflict with process ID reuse by the OS kernel.

If a cmk isolate process terminates abnormally in a way that prevents releasing the assigned CPU list (e.g. because it was sent the KILL signal), then there is an interval of time between process termination and when cmk reconcile is able to remove the invalid process ID from the CMK configuration directory. During this interval, if another process is started and the kernel happens to recycle the old PID, then cmk reconcile will not be able to detect the leaked CPU list. This scenario should be very rare in practice.
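The following toy model shows why a recycled PID hides the leak. It assumes, purely for illustration, that allocations are recorded as a mapping from PID to CPU list and that reclamation is based only on whether the PID is currently alive. The data layout and the name `find_reclaimable` are hypothetical and do not describe the actual CMK configuration directory format.

```python
def find_reclaimable(allocations, live_pids):
    """Return recorded PIDs that are not alive, i.e. whose CPUs could be freed."""
    return sorted(pid for pid in allocations if pid not in live_pids)


# PID 4242 was killed without releasing CPUs 2 and 3.
allocations = {4242: [2, 3]}

# Case 1: no process reuses the PID, so the leak is detected.
detected = find_reclaimable(allocations, live_pids={1, 100})

# Case 2: the kernel hands PID 4242 to an unrelated process before the check
# runs. The PID looks alive, so the leaked CPU list goes unnoticed.
missed = find_reclaimable(allocations, live_pids={1, 100, 4242})

print(detected)  # [4242]
print(missed)    # []
```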

cmk init flag values for --num-shared-cores and --num-exclusive-cores must be positive integers.

This places constraints on the construction of the user container command value. Zero is also unsupported.

The flag value for --interval (used in cmk reconcile and cmk node-report) must be an integer.

Fractional seconds are unsupported.
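When generating the container command programmatically, it may help to check these flag values before they reach cmk. The sketch below is a hypothetical pre-flight check, not part of CMK: `parse_positive_int` covers the constraint on --num-shared-cores and --num-exclusive-cores, and `parse_interval_seconds` covers the integer-only constraint on --interval.

```python
def parse_positive_int(text: str) -> int:
    """Parse a core-count flag value; reject zero, negatives and non-integers."""
    value = int(text)  # raises ValueError for inputs such as "1.5" or "two"
    if value <= 0:
        raise ValueError(f"expected a positive integer, got {text!r}")
    return value


def parse_interval_seconds(text: str) -> int:
    """Parse an --interval value; fractional seconds such as "0.5" are rejected."""
    return int(text)


def is_valid(parser, text: str) -> bool:
    """Convenience wrapper: True if the parser accepts the text."""
    try:
        parser(text)
    except ValueError:
        return False
    return True
```

For example, `is_valid(parse_positive_int, "0")` and `is_valid(parse_interval_seconds, "0.5")` both return False, matching the two restrictions described above.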