inputs.processes fails with "no such process" #2815

Closed
noidi opened this issue May 16, 2017 · 0 comments

noidi commented May 16, 2017

Bug report

Our Telegraf installations occasionally produce error messages like the following:

E! ERROR in input [inputs.processes]: read /proc/16314/stat: no such process

The error is triggered when a process terminates while readProcFile (plugins/inputs/system/processes.go) is reading the process's /proc/<pid>/stat file: the file is opened successfully, but the subsequent read fails because the process is gone.

System info:

All versions including the current master.

Steps to reproduce:

The bug is caused by a race condition, so it is hard to reproduce in Telegraf (we see 10-20 errors per day across 500+ hosts). Here is a small test program that uses a copy of readProcFile to demonstrate the issue.

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "os/exec"
)

type reproductionType int

// Copied from plugins/inputs/system/processes.go
func readProcFile(filename string) ([]byte, error) {
    data, err := ioutil.ReadFile(filename)
    if err != nil {
        if os.IsNotExist(err) {
            return nil, nil
        }

        return nil, err
    }

    return data, nil
}

func main() {
    // Start cat with an open stdin pipe to keep the process running.
    cat := exec.Command("cat")
    catStdin, err := cat.StdinPipe()
    if err != nil {
        log.Fatal(err)
    }

    err = cat.Start()
    if err != nil {
        log.Fatal(err)
    }

    // In the background, close cat's stdin to make it terminate and wait for
    // the process to exit. Closing done tells the main loop to stop; using a
    // channel avoids a data race on a shared flag.
    done := make(chan struct{})
    go func() {
        catStdin.Close()
        cat.Wait()
        close(done)
    }()

    // As long as cat hasn't been terminated, keep rereading its stat file
    // from /proc.
    for {
        select {
        case <-done:
            return
        default:
        }

        _, err = readProcFile(fmt.Sprintf("/proc/%d/stat", cat.Process.Pid))
        if err != nil {
            log.Fatal(err)
        }
    }
}

Running the test program multiple times tends to cause at least one failure:

$ for x in $(seq 10); do echo $x; ./proctest; done
1
2
3
4
5
2017/05/16 14:00:45 read /proc/22710/stat: no such process
6
2017/05/16 14:00:45 read /proc/22716/stat: no such process
7
8
9
10
2017/05/16 14:00:45 read /proc/22740/stat: no such process

Expected behavior:

Telegraf already checks for process termination between listing the /proc/*/stat files and reading them. Process termination between open() and read() should be handled the same way (i.e. without errors).

Actual behavior:

E! ERROR in input [inputs.processes]: read /proc/16314/stat: no such process
noidi pushed a commit to noidi/telegraf that referenced this issue May 16, 2017
danielnelson added a commit that referenced this issue May 17, 2017
danielnelson added a commit that referenced this issue May 19, 2017
vlamug pushed a commit to vlamug/telegraf that referenced this issue May 30, 2017
vlamug pushed a commit to vlamug/telegraf that referenced this issue May 30, 2017
jeichorn pushed a commit to jeichorn/telegraf that referenced this issue Jul 24, 2017
jeichorn pushed a commit to jeichorn/telegraf that referenced this issue Jul 24, 2017
maxunt pushed a commit that referenced this issue Jun 26, 2018