ROX-30714: track cgroup ID to container ID mapping #83
Conversation
Do we have some numbers on the performance benefits of doing this? E.g. how much memory is saved per event, how many fewer events we're dropping, etc.? Have we encountered throughput issues already? I'm a little skeptical of the filesystem approach because of the race condition between an event happening and our ability to look up the cgroup. Presumably it means that if we're unable to resolve the cgroup for the event, the activity might not be correlated with a deployment and, as a result, might not be flagged as a violation?
@erthalion is looking into performance testing; we may have some numbers in the near future.
Yeah, I'm not super keen on this approach for the same reason. In theory, creating and destroying containers takes some time, and because cgroupfs is a virtual filesystem (i.e. everything is computed from memory when read, like /proc), we should be relatively fast at finding new container ID mappings, but I don't know if we can guarantee a 100% hit rate. On the other hand, the current approach only supports 16 levels of cgroup nesting; anything beyond that has a 100% miss ratio, and it is heavier on the processes that attempt to write or create files, so 🤷🏻♂️. Maybe we can leave both implementations in and make which one we use toggleable via configuration?
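If we do go the toggle route, a hypothetical configuration knob could look something like this (the enum and environment variable are made up for illustration, not part of this PR):

```rust
/// Hypothetical toggle between the two lookup strategies; the names and the
/// environment variable are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ContainerIdLookup {
    /// Resolve the container ID inside the BPF hook by walking cgroup
    /// ancestors (bounded to 16 nesting levels).
    InKernel,
    /// Send only the cgroup ID and resolve it in userspace from cgroupfs.
    Userspace,
}

impl ContainerIdLookup {
    fn from_env() -> Self {
        match std::env::var("CONTAINER_ID_LOOKUP").as_deref() {
            Ok("userspace") => Self::Userspace,
            _ => Self::InKernel,
        }
    }
}

fn main() {
    println!("using {:?} container ID lookup", ContainerIdLookup::from_env());
}
```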
This change greatly reduces the amount of data sent through the ringbuffer and the effort the BPF hooks need to retrieve the cgroup of the current process. In exchange for these benefits, we now need to look up and keep track of the cgroups and container IDs that exist on the system ourselves by iterating over the cgroupfs. TODO: Add integration tests with containers.
Because capturing information for a process running in a container is not fully possible from `/proc`, some information is passed to the Process constructor directly to overwrite it. This should be fine, because we are using a specific container that should not change how it executes the command we are expecting.
    #[derive(Debug)]
    struct ContainerIdEntry {
        container_id: Option<Arc<String>>,
        pub last_seen: SystemTime,
Why make this public?
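For reference, keeping the field private and exposing it through a read-only accessor might look roughly like this (a sketch, not the PR's actual code):

```rust
use std::sync::Arc;
use std::time::SystemTime;

#[derive(Debug)]
struct ContainerIdEntry {
    container_id: Option<Arc<String>>,
    last_seen: SystemTime, // private again
}

impl ContainerIdEntry {
    /// Read-only access; only the cache itself refreshes the timestamp.
    fn last_seen(&self) -> SystemTime {
        self.last_seen
    }

    /// Called by the cache when the cgroup is observed during a scan.
    fn touch(&mut self) {
        self.last_seen = SystemTime::now();
    }
}
```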
        })
    }

    fn walk_cgroupfs(path: &PathBuf, map: &mut ContainerIdMap, parent_id: Option<Arc<String>>) {
Maybe rename to something like walk_cgroupfs_and_add_to_map. The current name doesn't explain what is done while we walk through the cgroupfs.
        ContainerIdCache(Arc::new(Mutex::new(map)))
    }

    fn update_unlocked(map: &mut ContainerIdMap) {
Maybe rename to update_no_lock or update_without_lock. The name implies that this updates something that is unlocked, rather than doing an update without locking.
    async fn prune(&mut self) {
        let now = SystemTime::now();
        self.0.lock().await.retain(|_, value| {
            now.duration_since(value.last_seen).unwrap() < Duration::from_secs(30)
Maybe set a variable to 30 seconds and pass that variable to this method to make it more configurable. Also maybe change the name to something like prune_old.
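As a sketch of that suggestion (types simplified to a plain `HashMap`; the real cache wraps the map in an async mutex), the threshold could be passed in instead of hard-coded:

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

struct Entry {
    last_seen: SystemTime,
}

/// Drop entries not seen within `max_age`, instead of hard-coding 30 seconds.
fn prune_old(map: &mut HashMap<u64, Entry>, max_age: Duration) {
    let now = SystemTime::now();
    map.retain(|_, entry| {
        now.duration_since(entry.last_seen)
            .map(|age| age < max_age)
            .unwrap_or(true) // clock went backwards: keep the entry
    });
}
```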
    pub fn start_worker(mut self, mut running: Receiver<bool>) -> JoinHandle<()> {
        tokio::spawn(async move {
            let mut update_interval = time::interval(time::Duration::from_secs(30));
Maybe don't hard-code 30 here.
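Similarly, the worker could receive its interval from configuration; a rough sketch assuming tokio, with the refresh body elided:

```rust
use tokio::sync::watch::Receiver;
use tokio::task::JoinHandle;
use tokio::time::{self, Duration};

/// Spawn the refresh loop with the interval coming from configuration
/// rather than a literal 30 seconds.
fn start_worker(update_every: Duration, mut running: Receiver<bool>) -> JoinHandle<()> {
    tokio::spawn(async move {
        let mut update_interval = time::interval(update_every);
        loop {
            tokio::select! {
                _ = update_interval.tick() => {
                    // refresh the cgroup ID -> container ID map here
                }
                res = running.changed() => {
                    // stop when the flag flips to false or the sender is gone
                    if res.is_err() || !*running.borrow() {
                        break;
                    }
                }
            }
        }
    })
}
```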
Description
This change greatly reduces the amount of data sent through the ringbuffer and the effort the BPF hooks need to retrieve the cgroup of the current process. In exchange for these benefits, we now need to look up and keep track of the cgroups and container IDs that exist on the system ourselves by iterating over the cgroupfs.
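To make the mechanism concrete, here is a minimal sketch of the userspace side, assuming container IDs appear as 64-character hex strings inside cgroup directory names (e.g. `docker-<id>.scope`). The names, the path under `/sys/fs/cgroup`, and the path-keyed map are illustrative simplifications; the actual change keys the mapping by kernel cgroup ID.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// Extract a container ID from a cgroup directory name, assuming runtimes
/// embed it as a 64-character hex string (e.g. "docker-<id>.scope").
fn container_id_from_name(name: &str) -> Option<String> {
    name.split(|c: char| c == '-' || c == '.')
        .find(|part| part.len() == 64 && part.chars().all(|c| c.is_ascii_hexdigit()))
        .map(|part| part.to_string())
}

/// Recursively walk the cgroupfs, recording which cgroups belong to a
/// container. Children inherit the parent's container ID unless their
/// own directory name carries one.
fn walk(dir: &Path, map: &mut HashMap<String, String>, parent: Option<String>) {
    let Ok(entries) = fs::read_dir(dir) else { return };
    for entry in entries.flatten() {
        let path = entry.path();
        if !path.is_dir() {
            continue;
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        let id = container_id_from_name(&name).or_else(|| parent.clone());
        if let Some(id) = &id {
            map.insert(path.to_string_lossy().into_owned(), id.clone());
        }
        walk(&path, map, id);
    }
}

fn main() {
    let mut map = HashMap::new();
    walk(Path::new("/sys/fs/cgroup"), &mut map, None);
    println!("tracking {} container cgroups", map.len());
}
```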
Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
Manually tested that the generated events have the expected container ID, and added an integration test.