Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process data inconsistency between refresh_processes_specifics and refresh_process_specifics #881

Closed
daladim opened this issue Nov 29, 2022 · 13 comments · Fixed by #1017
Closed

Comments

@daladim
Copy link
Contributor

daladim commented Nov 29, 2022

Thanks for all your work on sysinfo!

I have found a way to get inconsistent data (or at least inconsistent process names) regarding a given process:

  • system.refresh_processes_specifics() (without any PID given) uses NtQuerySystemInformation and uses SYSTEM_PROCESS_INFORMATION.ImageName as their process names
  • system.refresh_process_specifics(pid) uses OpenProcess/NtQueryInformationProcess/EnumProcessModulesEx/GetModuleBaseNameW

These two code paths may end up getting different results.
See this minimal reproducer:

use sysinfo::{Pid, PidExt, System, SystemExt, ProcessExt};

fn main() {
    let mut system = System::default();
    system.refresh_processes_specifics(Default::default());
    for (pid, process) in system.processes() {
        let proc_name = process.name();
        if proc_name.contains("msiexec") {
            println!("  * {}\t{}\t{}\t{:?}", pid,
                proc_name,
                get_specific_proc_name(process.pid().as_u32()),
                process.cmd()
            );
        }
    }
}

fn get_specific_proc_name(pid: u32) -> String {
    let s_pid = Pid::from_u32(pid);
    let mut system = System::default();
    system.refresh_process_specifics(s_pid, Default::default());
    let proc = system.process(s_pid).unwrap();
    proc.name().to_string()
}

This can display

  * 1512        msiexec.exe     msiexec.exe     ["C:\\Windows\\System32\\msiexec.exe", "/i", "P:\\webex\\meetings.msi"]
  * 4576        msiexec.exe     MsiExec.exe     ["C:\\Windows\\syswow64\\MsiExec.exe", "-Embedding", "CF3EB87457D3F802DE535DC782B0A689", "C"]
  * 4876        msiexec.exe     msiexec.exe     ["C:\\Windows\\system32\\msiexec.exe", "/V"]

Mind the lower/uppercase inconsistency for process 4576 (which may be due to the erroneous command-line for a file that is actually named a lowercase C:\\Windows\\syswow64\\msiexec.exe)

This is not a critical issue, because:

  • Windows is not supposed to be case-sensitive for process names (I suppose?)
  • sysinfo documentation discourages against re-creating several System instances. Re-using the same instance would probably detect it knows this PID already and avoid the whole OpenProcess/NtQueryInformationProcess/EnumProcessModulesEx/GetModuleBaseNameW
  • it does not look this happens often (I had a hard time re-producing it, as the easiest reproducer I made involves running a 32-bit MSI on a 64-bit machine.

But still, having two different code paths to get the same data may look like something to avoid (if ever that would be possible).

What's your opinion on this?
I may try to fix it myself in case you'd like me to (although I'm not familiar (yet?) with sysinfo).

@GuillaumeGomez
Copy link
Owner

First, thanks for the issue and the very complete explanations. Both refresh_process and refresh_processes should gather coherent data. So if you're motivated to write a fix, it's very welcome!

@daladim
Copy link
Contributor Author

daladim commented Nov 29, 2022

OK, I'll try to have a look at it.

Which path do you think would make most sense?

  • using NtQuerySystemInformation to only get the list of PIDs, then using the OpenProcess/NtQueryInformationProcess/EnumProcessModulesEx/GetModuleBaseNameW route for each PID,
  • or doing it the other way round, and use an NT API to retrieve a SYSTEM_PROCESS_INFORMATION for a single PID?

I haven't read at Microsoft's documentation yet, maybe the second solution is not possible at all.
And maybe the first one would be too CPU-intensive?
What do you think?

@GuillaumeGomez
Copy link
Owner

I'd favour the solution which is the less CPU intensive as it is already quite big (a lot more than I'd liked it to be).

@daladim
Copy link
Contributor Author

daladim commented Dec 5, 2022

Hm, it turns out the OpenProcess/NtQueryInformationProcess/EnumProcessModulesEx/GetModuleBaseNameW way is not as reliable, as it sometimes even fails at retrieving a process name.

Here is the list of all the processes currently running on my machine where both process names mismatch:

  PID   NT API name     Win32 API name  cmdline
* 4     System                          []
* 10204 csrss.exe                       []
* 220   Registry                        []
* 2384  Memory Compression              []
* 2536  svchost.exe                     []
* 3388  MsMpEng.exe                     []
* 4900  dllhost.exe     DllHost.exe     ["C:\\Windows\\system32\\DllHost.exe", "/Processid:{973D20D7-562D-44B9-B70B-5A0F49CCDF3F}"]
* 5020  SecurityHealthService.exe       []
* 5148  NisSrv.exe                      []
* 568   smss.exe                        []
* 676   csrss.exe                       []
* 7376  svchost.exe                     []
* 748   wininit.exe                     []
* 756   csrss.exe                       []
* 7680  SgrmBroker.exe                  []
* 892   services.exe                    []
* 9572  explorer.exe    Explorer.EXE    ["C:\\Windows\\Explorer.EXE"]
* 9584  vctip.exe       VCTIP.EXE       ["C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.33.31629\\bin\\HostX64\\x64\\VCTIP.EXE"]

So, I'll probably use NT APIs as much as I can (well, in case I can find a suitable one that gives information for a single process)

@GuillaumeGomez
Copy link
Owner

Something important to be noted: windows files are case insensitive. So it should be the same whether or not you have capital letters in it. At this point it's just display.

@daladim
Copy link
Contributor Author

daladim commented Dec 5, 2022

That's right (that's one of the notes I wrote in my initial post as well).

However, having variable process names could cause trouble when trying to match names (that is how I discovered this bug). I am now working around by comparing case-insentively, but having a consistent output from sysinfo would be better anyway.

@GuillaumeGomez
Copy link
Owner

Absolutely! It's a shame that windows API provide non-coherent names...

@daladim
Copy link
Contributor Author

daladim commented Jan 3, 2023

Summary of what we currently have

  • a way to batch-list processes (refresh_processes_specifics). It uses NtQuerySystemInformation, which, in one call, returns a list of complete SYSTEM_PROCESS_INFORMATION objects. That's nice, complete and exact.
  • a way to fetch a single process (refresh_process_specifics). It uses OpenProcess, then various APIs for various data that we want to collect: GetModuleBaseNameW, GetProcessTimes, NtQueryInformationProcess with various request types, etc. This returns:
    • incorrect data (case can be modified, that was the reason I opened this issue)
    • incomplete data (which I discovered later on, which may be worse than the original issue) :
      • even run as admin, some process names are not retrieved: processes such as services.exe or some (but not all) svchost.exe have an empty string as proc.name()
      • processes have 0 as proc.memory and proc.virtual_memory.

So, we're using inconsistent APIs.

What other APIs are there?

I looked up at possible Microsoft APIs to list and get infos about processes

For single process retrieval:

  • I could not find any other way that the one you are using, so I have no idea how to improve it.

There are other options for batch retrieval, but the current NtQuerySystemInformation works well, so we have no reason to change:

What are possible solutions?

Since we cannot improve data obtained for a single process, we'll have to do it the other way round (list all processes and filter out the only one that interests us).

  • Using CreateToolhelp32Snapshot gives handles, so this won't give any benefit over the current situation (that already uses handles for single processes)
  • Using WTSEnumerateProcessesExW. But the WTS_PROCESS_INFO_EXA it gives lack info about virtual memory or parent ID. So I'm not sure that's really interesting to use it.
  • EnumProcesses is basically a trimmed-down (but slower) version of NtQuerySystemInformation
  • use NtQuerySystemInformation this way:
fn refresh_process_specifics(pid) -> Process {
    let procs: Vec<SYSTEM_PROCESS_INFORMATION> = Vec::new();
    NtQuerySystemInformation(SystemProcessInformation, &mut proc);
    procs.filter(|p| p.UniqueProcessId == pid)
}

Advantages:

  • consistency between refresh_processes_specifics and refresh_process_specifics
  • more data is available (we always populate process names, memory, virtual_memory)

Disadvantages: we ask Windows to collect data for all processes and only keep one :-/
Obviously, this is slower. Here is a bench to call refresh_process_specifics(a_given_pid)

test bench_current_implementation           ... bench:     361,260 ns/iter (+/- 75,058)
test bench_using_NtQuerySystemInformation   ... bench:   1,336,271 ns/iter (+/- 742,860)

With all this said (sorry for this long message, I wrote it as I researched, but now you have all the info): how much are you willing to trade performance for correctness?
I have a PR ready to be published on GitHub, just tell me whether or not it is interesting to you.

@GuillaumeGomez
Copy link
Owner

I think performance impact is too important to make this solution viable unfortunately. What I suggest is to simply mention that the process name comparison on windows should be done case insensitively.

@daladim
Copy link
Contributor Author

daladim commented Jan 4, 2023

That could be an easy addition.

The issue is not limited to case unfortunately, as I wrote there is also the "missing data" problem (missing names on some processes, missing memory/virtual_memory on every process).
I'll do an MR to document this

@GuillaumeGomez
Copy link
Owner

There is also an issue of right access. Some system calls require more so I didn't use them. But if you can find a way to get the same information without having a huge performance regression or needing more rights, then I'm all for it.

@LunNova
Copy link
Contributor

LunNova commented Jul 18, 2023

NtQuerySystemInformation can be used to retrieve a process name by PID with class SystemProcessIdInformation which is described here https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/sysinfo/query.htm

@GuillaumeGomez
Copy link
Owner

Thanks for the information! Of course if you're interested into adding it... ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants