Description
Upstream issue: elastic/kibana#91570
Affected versions: 7.7.0 - 7.11.2
After a change to the cluster (for example a rolling upgrade or fine-tuning of settings), a formerly running transform reports that it is stopped, e.g.:
{
"count" : 1,
"transforms" : [
{
"id" : "t2",
"state" : "stopped",
However, an attempt to delete it is rejected with the claim that it is running:
{
"error" : {
"root_cause" : [
{
"type" : "status_exception",
"reason" : "Cannot delete transform [t2] as the task is running. Stop the task first"
}
],
"type" : "status_exception",
"reason" : "Cannot delete transform [t2] as the task is running. Stop the task first"
},
"status" : 409
}
Trying to delete it with force times out, and trying to start it fails with a similar claim about the task.
As a result, it is not possible to delete the transform or to use it.
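The inconsistency described above can be detected mechanically by comparing the two responses. The following is a minimal sketch, not part of any Elasticsearch client: the helper name `looks_stuck` is hypothetical, and the sample dictionaries contain only the fields quoted in this report (the stats response above is shown truncated, so the real response has more fields).

```python
from typing import Any, Dict


def looks_stuck(stats: Dict[str, Any],
                delete_response: Dict[str, Any],
                transform_id: str) -> bool:
    """Return True if stats report 'stopped' while delete is rejected with a 409."""
    # The stats response lists the transform as stopped.
    reported_stopped = any(
        t.get("id") == transform_id and t.get("state") == "stopped"
        for t in stats.get("transforms", [])
    )
    # The delete response nevertheless claims that the task is running.
    error = delete_response.get("error", {})
    delete_rejected = (
        delete_response.get("status") == 409
        and error.get("type") == "status_exception"
        and "task is running" in error.get("reason", "")
    )
    return reported_stopped and delete_rejected


# Sample data limited to the fields quoted in this report.
stats = {"count": 1, "transforms": [{"id": "t2", "state": "stopped"}]}
delete_response = {
    "error": {
        "type": "status_exception",
        "reason": "Cannot delete transform [t2] as the task is running. "
                  "Stop the task first",
    },
    "status": 409,
}

print(looks_stuck(stats, delete_response, "t2"))  # True
```

A check like this only flags the symptom; it does not tell you why the task is stuck.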
Mitigation:
Transforms require a transform node to run on. To verify whether you have a node that can run transforms, check the output of GET _cat/nodes:
v.x.y.z 2 99 2 0.68 0.66 1.43 dm * elasticsearch01
...
The above output does not satisfy this requirement, because only the data (d) and master (m) node roles are available. You must have at least 1 node that has a t in its role column, e.g.:
...
v.x.y.z 2 99 2 0.68 0.66 1.43 dt * elasticsearch03
...
The above shows a data and transform node. Note that you only need 1 node with a t, i.e. a transform node.
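The check above can be automated. The sketch below parses the text output of GET _cat/nodes and returns the names of nodes whose role column contains a t. It assumes the default column order shown in the examples above (role in the 8th column, node name last); the function name `transform_nodes` and the constant `ROLE_COLUMN` are hypothetical. If you request custom columns, the index has to be adjusted.

```python
from typing import List

# Assumption: default `GET _cat/nodes` column order, where the 8th
# whitespace-separated field is the node role string (e.g. "dm" or "dt").
ROLE_COLUMN = 7


def transform_nodes(cat_nodes_output: str) -> List[str]:
    """Return the names of nodes whose role column contains 't'."""
    names = []
    for line in cat_nodes_output.splitlines():
        fields = line.split()
        if len(fields) <= ROLE_COLUMN:
            continue  # skip blank, elided ("...") or truncated lines
        if "t" in fields[ROLE_COLUMN]:
            names.append(fields[-1])  # node name is the last column
    return names


# Sample lines modelled on the output shown in this report.
sample = """\
v.x.y.z 2 99 2 0.68 0.66 1.43 dm * elasticsearch01
v.x.y.z 2 99 2 0.68 0.66 1.43 dt - elasticsearch03
"""

print(transform_nodes(sample))  # ['elasticsearch03']
```

An empty list means that no node in the cluster can run transforms, which is the situation described in this report.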
Solution:
Add a transform node to your cluster, see https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
At least 1 node should specify the transform role:
node.roles: [ ..., transform, ... ]
Note: if you specify no roles, you automatically use all roles, see the docs for details.
Fix:
The fix for this problem has 2 aspects:
- operational:
- it must be possible to delete a (running) transform
- stats should report the correct state
- user-experience:
- APIs must better handle the case of a missing transform node
- stats warn about no transform nodes / show number of transform nodes
- preview should warn about no transform nodes
- the UI should show the number of transform nodes (and visually indicate if there is none)