Skip to content

Conversation

@jlashner
Copy link
Contributor

@jlashner jlashner commented May 1, 2024

This PR changes pysmurf-controller operations that load TODs such that instead of running in the main process, they call out to a subprocess.

Description

This introduces the smurf_subprocess_util module, and a protocol for running functions inside a twisted subprocess in such a way that a general RunCfg can be passed to it, and a general RunResult can be returned. Any operation that involves TOD data loading is modified so that the loading and analysis occurs in a subprocess rather than in the main one.

Motivation and Context

This should hopefully solve the memory leak, which I believe is coming from instances where TOD data is not being released to the system after it leaves scope. Moving TOD loading and analysis into subprocesses instead of the main one should enforce that this memory is returned to the OS when the subprocess exits. This discussion has some links that motivate this solution.

How Has This Been Tested?

I have tested the subprocess protocol locally through the use of the test process (and the new run_test_func operation).
I will need to test all other operations with hardware on SATp1 or SATp3.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@jlashner jlashner changed the title Moves functions that load lots of data to subprocess Moves pysmurf-controller functions that load lots of data to subprocess May 1, 2024
return threads.blockingCallFromThread(
reactor, _run_func_in_subprocess_reactor, cfg)
else:
return _run_func_in_subprocess_reactor(cfg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool... I believe that if you are in reactor context, then you can't block -- i.e you need to return a Deferred here, which would only be possible if _run_func_in_subprocess_reactor(cfg) returned prot.deferred or some wrapped thing that applies the Runresult.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point... I changed to run_func_in_subprocess_from_thread and had it only handle the from-thread case

@jlashner
Copy link
Contributor Author

jlashner commented May 2, 2024

This has been tested on satp1 and it works! It seems to have fixed the large memory leak. Notes from my testing is here: https://simonsobs.atlassian.net/wiki/spaces/~55705801070cc775254537ab15f2dcf6702b78/pages/412778522/2024-05-02+pysmurf-controller+subprocessing+tests

Notably, memory returns to base-level after IVs and bias steps, which it did not previously:
image

I want to note that there does still seem to be a memory leak caused by the functions that have not been moved to subprocesses. I did not move the stream data function to a subprocess because it isn't loading TODs, however just the creation of the pysmurf object seems to be leaking memory, at a rate of ~15 MB per call:
image

I think we should probably move all logic involving pysmurf to a subprocess, however I this this PR solves the most pressing issues and largest memory leaks, so I think we should merge this first.

Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad the subprocess route is working! Can you address the unused imports caught by pre-commit: https://results.pre-commit.ci/run/github/186511668/1714677193.715HK9dWRnKVTIpCyOTUcg

@jlashner jlashner requested a review from BrianJKoopman May 2, 2024 20:26
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, a few small comments/questions below.

@jlashner jlashner requested a review from BrianJKoopman May 10, 2024 19:14
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed comments look good, merge conflicts still need resolving though.

@BrianJKoopman BrianJKoopman self-requested a review May 13, 2024 20:01
@BrianJKoopman BrianJKoopman merged commit b95aed3 into main May 13, 2024
@BrianJKoopman BrianJKoopman deleted the controller-subprocessing branch May 13, 2024 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants