Skip to content

Conversation

@jlashner
Copy link
Contributor

I realize it may not have been clear what I was saying about restructuring the PCU agent to avoid timeout locks since we still don't have any examples, so thought I'd throw it into a branch here for people to see and use if they'd like.

Description

Restructure HWP PCU to use new lockless agent structure. I think this is a easier way to structure the agent, and should solve all of the issues that have to do with lock-acquisition, and generally make the agent more threadsafe.

Motivation and Context

Solves locking issues, and serves as an example for timeout lockless agents

How Has This Been Tested?

Not tested

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@jlashner jlashner mentioned this pull request Dec 19, 2023
6 tasks
Added 'hold' option to process_action function.
@17-sugiyama
Copy link
Contributor

17-sugiyama commented Dec 20, 2023

I added the 'hold' option to the process_action.
The combination of on/off has been changed -- Kyohei found more efficient operation to stop the HWP rotation.

elif action.command == 'hold':
on_channel = [1, 2, 5]
off_channel = [0, 6, 7]

@17-sugiyama
Copy link
Contributor

Many thanks for updating the pcu agent!
I tested the send_command function with the pcu instrument and it worked as expected.
However, _get_and_publish_data does not seem to work. Nothing is published, but there is no error message.

The pcu uses a relay module and I have defined get_status according to the sample code:
https://github.com/numato/samplecode/blob/master/RelayAndGPIOModules/USBRelayAndGPIOModules/python/usbrelay1_2_4_8/relayread.py
This command does not seem to be very stable and often takes a long time.
The problem with _get_and_publish_data may be due to get_status being unstable.
So I recommend to publish the status as the last sent command. I would like to have your opinion.

@jlashner
Copy link
Contributor Author

Hi Junna, no problem!

I do think reading the status from the relays instead of using the last sent command is very important. It will make the data from the agent a lot more trustworthy, especially in cases where:

  • relays are not set correctly after issuing a command
  • the agent is restarted and doesn't know the last sent command.

I suggest we invest some time to understand why the get_status function is not behaving properly.

@17-sugiyama
Copy link
Contributor

I got the importance of reading the status.
I made _get_and_publish_data work by increasing the sampling rate to every 30 sec.
https://github.com/simonsobs/socs/blob/61d971fda05564bd84a90787c813efd79ebe9fe2/socs/agents/hwp_pcu/agent.py#L130C1-L130C35
This is the published status when I set the operation mode as hold --> off --> on_1 --> hold.

  | 2023-12-21 14:25:00 | hold
  | 2023-12-21 14:24:00 | hold
  | 2023-12-21 14:23:00 | hold
  | 2023-12-21 14:22:00 | failed
  | 2023-12-21 14:21:00 | failed
  | 2023-12-21 14:20:00 | failed
  | 2023-12-21 14:19:00 | on_1
  | 2023-12-21 14:18:00 | on_1
  | 2023-12-21 14:17:00 | failed
  | 2023-12-21 14:16:00 | failed
  | 2023-12-21 14:15:00 | off
  | 2023-12-21 14:14:00 | hold

Although it takes a few minutes to obtain the correct status immediately after changing the operating mode, the response becomes stable thereafter.
If you are comfortable with this sample rate, this may be the solution.

Strictly speaking, the new 'hold' operation mode doesn't hold the hwp. It just stops the motor from spinning the hwp.
Status 'failed' is obtained when relay_read apparently failed.
@jlashner
Copy link
Contributor Author

jlashner commented Dec 21, 2023

Thanks Junna, this is much better! Do you think we can log the "response" in the cases where it's failing so we can get a better idea why? Other than that I think this is fine to merge. It may be worth coming back later and figure out how to make get_status more reliable though.

@ykyohei
Copy link
Contributor

ykyohei commented Dec 21, 2023

Actually, I was testting reliable get_status, and I could do it by the following script.
It also takes more time to send command but acceptable, I think.
For some reason, I needed to switch all 8 relays to get_status correctly.

I'm happy to implement more reliable get_status later.
But I would like to use this agent for satp3 observations ASAP.

import serial
import argparse
import time
import numpy as np

patterns = {
    'off':  [0, 0, 0, 0, 0, 0, 0, 0],
    'on_1': [1, 1, 1, 0, 0, 1, 1, 1],
    'on_2': [1, 1, 1, 0, 0, 1, 0, 0],
    'stop': [0, 1, 1, 0, 0, 1, 0, 0],
    'hold': [1, 1, 1, 0, 0, 1, 0, 0],
}

class pcu:
    def __init__(self, port="/dev/ttyACM0"):
        self.PCU = serial.Serial(
            port=port,
            baudrate=19200,
            timeout=1,
            bytesize=serial.EIGHTBITS,
            stopbits=serial.STOPBITS_ONE
        )

    def close(self):
        self.PCU.close()

    def wait(self):
        time.sleep(.1)

    def read(self):
        msg = self.PCU.read_until(b'\n\r').strip().decode()
        out = self.PCU.read_until(b'\n\r').strip().decode()
        return msg, out

    def get_status(self):
        self.PCU.write(str.encode('relay readall\n\r'))
        self.wait()
        out = int(self.read()[1], 16)
        print('pattern', [(out >> i & 1) for i in range(8)])
        for k, v in patterns.items():
            v = np.sum([vv*(2**i) for i, vv in enumerate(v)])
            if out == v:
                print('current state is '+k)
                return
        print('current state is undefined')

    def test(self, command):
        pattern = patterns[command]
        for j, i in enumerate(pattern):
            self.PCU.write(str.encode("relay "+{0:'off', 1:'on'}[i]+' '+str(j)+"\n\r"))
            self.wait()
            print(self.read())
        self.wait()
        print(command + ' is done')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('command', type=str,)
    args = parser.parse_args()
    pcu = pcu()
    pcu.test(args.command)
    pcu.get_status()
    pcu.close()

@ykyohei
Copy link
Contributor

ykyohei commented Dec 21, 2023

I guess the reason why the current get_status sometimes fails because there is exception for the

response = self.port.read(25)
response = response.decode('utf-8')

I'm not sure if this is the best way, but following method in my script worked as long as I tested.

def read(self)
     msg = self.PCU.read_until(b'\n\r').strip().decode()
     out = self.PCU.read_until(b'\n\r').strip().decode()

self.PCU.write(str.encode('relay readall\n\r'))
out = int(self.read()[1], 16)

@ykyohei ykyohei marked this pull request as ready for review December 22, 2023 03:04
@ykyohei
Copy link
Contributor

ykyohei commented Dec 23, 2023

@jlashner
Can we merge this branch now?
I do understand the importacne of get_status, but I would like to start using this agent in the satp3 observation over the holidays.
Start using this agent is really important step for the observation using HWP now.
I will be carefully watching the satp3 observation until we implement the get_status.

@jlashner
Copy link
Contributor Author

Hi kyohei, yep! Totally fine with merging this now and then figuring out remaining issues after vacation. Good luck!

@jlashner jlashner merged commit 06ddfd4 into main Dec 23, 2023
@jlashner jlashner deleted the pcu_restructure branch December 23, 2023 16:05
@ykyohei ykyohei mentioned this pull request Dec 27, 2023
6 tasks
@ykyohei ykyohei mentioned this pull request Jan 6, 2024
6 tasks
hnakata-JP pushed a commit that referenced this pull request Apr 12, 2024
* PCU agent restructure

* Adds back agg_param frame-length

* Adds inline callback dec

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Debugged agent.py

Added 'hold' option to process_action function.

* Updated 'stop' phase definition

* 'hold' is replaced with 'stop'

Strictly speaking, the new 'hold' operation mode doesn't hold the hwp. It just stops the motor from spinning the hwp.

* Update hwp_pcu.py -- added 'failed' to get_status responses

Status 'failed' is obtained when relay_read apparently failed.

* fix typo and docs

* fix typo

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jsugiyama <75608221+17-sugiyama@users.noreply.github.com>
Co-authored-by: ykyohei <38639108+ykyohei@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants