Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU level UDP packet counter #4

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

CPU level UDP packet counter #4

wants to merge 10 commits into from

Conversation

ykyohei
Copy link
Collaborator

@ykyohei ykyohei commented Feb 7, 2024

Add CPU level UDP packet counter to the HWP encoder.

To identify whether we are dropping the hwp encoder packet at PRU->CPU level or at network level (UDP packet loss), we add the CPU level packet counter. The counter_index is indexed at the PRU level.

This has not been tested yet. We need to use an updated Encoder Agent together.

Copy link
Collaborator Author

@ykyohei ykyohei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, I have one comment.

@ykyohei
Copy link
Collaborator Author

ykyohei commented Mar 21, 2024

Another comment/question.
It might be fine to have more information, but I'm bit confused by the difference between buffer_reset_counter and packet_counter. They seem almost redundant to me.
Your buffer_reset_counter seems more fundamental, so we can remove the packet_counter?

Then we should have buffer_reset_counter also for the IRIG packet.

@jlashner
Copy link
Collaborator

Ya so the main difference between the counters is that the buffer_reset_counter counts when the PRU has finished writing a packet and starts on the next one, while packet_counter increments when the CPU detects a new packet and sends it over the network.

These are related but we still want to track both of them. The failure mode that I am concerned about is when the CPU does not register the packet from the PRU in time to write it to the network before the PRU overwrites it. This will show up as a dropped index in the buffer_reset_counter, while there is no corresponding dropped index in the packet_counter (meaning the packet was never written to the network). Like in this plot below:
image

I don't think I know enough about how IRIG is handled to know if we need the same kind of book-keeping. Do you know how often the IRIG buffer is swapped / if we're concerned about that dropping?

@ykyohei
Copy link
Collaborator Author

ykyohei commented Mar 21, 2024

Ah okay thanks.
If I understand correctly, reset_buffer_counter is redundant as the 120 jump in counter, but I agree that it is helpful to have reset_buffer_counter.

I was also less careful about IRIG data. Unfortunately, we have jump in IRIG time as well, and I think the rate is higher than in lab. This might be because of the site timing system. Also, this is less concerning because the interpolation of the dropped IRIG time is much simpler, but I think it is definitely helpful to monitor.
Figure 6 (1)

The rate is not negligible. WARNING of deleted wrong irig_time corresponds to this issue.

16774.603: Calculating Angles for obs_1710841702_satp1_1111111 (INFO)
WARNING: sotodlib.core.context: Key "metadata" not present in context.
WARNING: sotodlib.hwp.g3thwp: deleted wrong irig_time, indices: [1199 1842 2410]
WARNING: sotodlib.hwp.g3thwp: Packet drop exists
WARNING: sotodlib.hwp.g3thwp: 6 dropped packets are found, performing fill process
WARNING: sotodlib.hwp.g3thwp: deleted wrong irig_time, indices: [1199]
WARNING: sotodlib.hwp.g3thwp: Packet drop exists
WARNING: sotodlib.hwp.g3thwp: 2 dropped packets are found, performing fill process
17046.032: Calculating Angles for obs_1710848051_satp1_1111111 (INFO)
WARNING: sotodlib.core.context: Key "metadata" not present in context.
WARNING: sotodlib.hwp.g3thwp: Packet drop exists
WARNING: sotodlib.hwp.g3thwp: 1 dropped packets are found, performing fill process
WARNING: sotodlib.hwp.g3thwp: Packet drop exists
WARNING: sotodlib.hwp.g3thwp: 1 dropped packets are found, performing fill process
17159.491: Calculating Angles for obs_1710865890_satp1_1101111 (INFO)
WARNING: sotodlib.core.context: Key "metadata" not present in context.
WARNING: sotodlib.hwp.g3thwp: No HWP data in the specified timestamps.
WARNING: sotodlib.hwp.g3thwp: No HWP data in the specified timestamps.
WARNING: sotodlib.hwp.g3thwp: Offcentering calculation is only available when two encoders are operating. Skipped.
17635.181: Calculating Angles for obs_1710882494_satp1_1101111 (INFO)
WARNING: sotodlib.core.context: Key "metadata" not present in context.
WARNING: sotodlib.hwp.g3thwp: deleted wrong irig_time, indices: [ 643 1239 1826 1831 4022 4027]
WARNING: sotodlib.hwp.g3thwp: cannot find reference points, # of data is less than # of slit
WARNING: sotodlib.hwp.g3thwp: analyzed encoder data is None
WARNING: sotodlib.hwp.g3thwp: deleted wrong irig_time, indices: [ 643 1239 1833 3399]
WARNING: sotodlib.hwp.g3thwp: cannot find reference points, # of data is less than # of slit
WARNING: sotodlib.hwp.g3thwp: analyzed encoder data is None
WARNING: sotodlib.hwp.g3thwp: Offcentering calculation is only available when two encoders are operating. Skipped.
17884.236: Calculating Angles for obs_1710897250_satp1_1111111 (INFO)
WARNING: sotodlib.core.context: Key "metadata" not present in context.
WARNING: sotodlib.hwp.g3thwp: deleted wrong irig_time, indices: [  28 1226]
WARNING: sotodlib.hwp.g3thwp: Packet drop exists
WARNING: sotodlib.hwp.g3thwp: 1 dropped packets are found, performing fill process
WARNING: sotodlib.hwp.g3thwp: deleted wrong irig_time, indices: [624]
WARNING: sotodlib.hwp.g3thwp: Packet drop exists
WARNING: sotodlib.hwp.g3thwp: 1 dropped packets are found, performing fill process

@jlashner
Copy link
Collaborator

Yes... kind of. It's redundant with 120-edge frame drops as long as long as samples can only be dropped in complete packets. I think there are also some race conditions in the code which would mean these counters aren't completely redundant, but I believe those are much more unlikely to be triggered.

@bbixler500
Copy link
Collaborator

I'm fine merging this code on the beaglebone side. However, I believe the agent code branch needs to be updated as well before we do that. Currently, when I look at the hwp_encoder_packet_count branch on socs the code is not set up currently to handle the version, num_samples, and buffer_reset_counter fields.

@jlashner
Copy link
Collaborator

jlashner commented Oct 4, 2024

There is currently a PR here that adds that support: simonsobs/socs#647

I think there are possibly some issues remaining, but I was having some trouble emulating irig data packets locally which is why I ended up putting it on hold. I think it would still be good to get this working though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants