What's the proper way to code movement command with RPDO? #585

Closed
WillyTuring opened this issue Jan 13, 2022 · 25 comments

Comments

@WillyTuring

Hi all,
Hope you guys are all safe and doing well.
I have several questions about using the SOEM library and would be really grateful for any help. Thanks in advance.
1. Background info
After creating the output and input buffers with ec_config_map(), I verified that ① the RPDO and TPDO are the ones we chose; ② the objects mapped to the PDOs are correct; ③ the buffer sizes are correct (checked both Obytes and Ibytes) and the buffer address offsets are in line with Obytes and Ibytes. We then cast the output and input buffers to our structs so that we can write data to the buffer.
2. Problems
1. Position command kept repeating
The problem is that when we write a series of different command positions to the output buffer, the same command position is sometimes repeated multiple times before it moves on to the next. This is verified by the data we captured in Wireshark: I sent position commands to slave 6 and then read the third to sixth bytes of the last 16 bytes in Wireshark.
This is a little counter-intuitive to me. I thought the pure writing part would be much faster, since ec_send_processdata() needs to construct the datagram and frame headers, so it would make more sense if the data we see in the buffer had been overwritten by the program.
If we use a while loop it works perfectly, but I want to get rid of the while loop.
2. Are there any rules about writing data to the output buffer?
Another weird situation: if I send a command like output_TargetPosition = input_positionValue + movement; it works. But if I create a vector/array beforehand, like movement[ ] = input_positionValue, and then send the command as output_TargetPosition = movement[ i ], it doesn't work.
This is a little confusing to me because, as far as I understand, receive_processdata() is just one part of send_processdata(): LRD, FRMW and LRW are all started from send_processdata(). I think that after we create the buffer, send_processdata() will constantly check the input and output buffers whether there is real data or not.
So in my opinion those two ways of sending commands should make no difference, because the program is constantly checking the buffer; you don't need to explicitly use input_positionValue to "tell" it that.
3. Does the order of config_dc() and config_map() matter?
From what I read, config_map() should be executed before config_dc(), because as both the EtherCAT poster and the technology overview indicate, for static DC drift compensation we might want to transfer 15000 frames with the FRMW command, and when we use send_processdata() to do that we need to have the PDO mapping finished.
But in my case, if we move config_dc() after config_map(), it just doesn't work well; I don't understand that either.
My code and the data captured from Wireshark are in the two attachments below.
dataCaptured.zip
main.zip

@ArthurKetels
Contributor

Hi Willy,
You've put some densely written text there, and it is sometimes hard to follow your line of thought. Anyway, I'll give it a try.

  1. The program (and SOEM) works as written. Perhaps what you've written in code is not what you intended. Your real-time loop that does the PDO transfer is set at 4ms. From Wireshark you can see this is pretty stable. In your non-real-time task you loop every 20ms and throw in some printf's that are really slow. From the Wireshark trace you can see one PDO update every 6 cycles, that is 24ms (so in good agreement with the 20+ms of the loop).

My first remark is that you need to tune your network driver. It takes more than 300us for a packet to return to the master. Probably your driver is doing packet coalescing. Use ethtool to figure out and optimize the settings.

My second remark is that in almost all cases it is wise to update the PDO data in the real-time task and not in the standard task as you are currently doing. Control loops and their code are real-time, so they belong in a real-time task.

When you transfer data between two tasks you need to synchronize or protect your data. If you write data in your main task it could get interrupted by the real-time task just in the middle. Then it will send out PDO data that is half new and half old. That is probably not what you want. SOEM does not do inter-task communication, but you can run your own code or use one of the many libs specially written for this. Of course all of these problems disappear when you run the control code in the same real-time task as the PDO transfer.
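
A minimal sketch of the data-protection point above, assuming a C++ application with one non-real-time producer and one real-time consumer. `Setpoint`, `SetpointMailbox`, `publish` and `snapshot` are hypothetical names and not part of SOEM. The real-time side only ever copies a complete record, and it uses a try-lock so it never blocks.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical setpoint record; the fields stand in for whatever the RPDO maps.
struct Setpoint {
    std::uint16_t controlword = 0;
    std::int32_t target_position = 0;
};

// Hypothetical mailbox between a non-real-time producer and the real-time task.
class SetpointMailbox {
public:
    // Called from the non-real-time task: replaces the whole record under the lock.
    void publish(const Setpoint& sp) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = sp;
    }

    // Called from the real-time task once per cycle. It never blocks: if the
    // producer holds the lock right now, the previous complete record is reused.
    Setpoint snapshot() {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (lock.owns_lock()) {
            cached_ = latest_;
        }
        return cached_;
    }

private:
    std::mutex mutex_;
    Setpoint latest_;  // written by the producer under the lock
    Setpoint cached_;  // touched only by the real-time task
};
```

In the real-time loop the result of `snapshot()` would be copied into the output struct just before the process data is sent, so a frame never carries a record that is half old and half new.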

  2. Yes, there are rules (as written above). But that is not what is causing the effect you observe. In your code you make a buffer and fill it with data.
                int move_[200];
                for(int i = 0; i < 200; i++)
                {
                    move_[i] =  in_ptr[5]->PositionActualValue + 1 + i * 200;
                    cerr << move_[i] << endl;
                }

First remark is that you use different data types for move_ and PositionActualValue. Try to avoid that, although in this case it does not matter. But in general be accurate with types, because automatic conversions by the compiler will byte you sometimes.

Second remark is that the code actually does what you have written (and again not what you intended). You read PositionActualValue from the PDO buffer. But this value is read 200 times in rapid succession (takes less than 1us). In that time the value does not change, as the PDO cycle time is 4000us. When you copy the move_ buffer to the TargetPosition, time has passed and PositionActualValue has changed. move_ has not changed. So you will have a very different result when doing output_TargetPosition = input_positionValue + movement.
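
To make the difference concrete, here is a small sketch of the two variants. It assumes PositionActualValue and TargetPosition are 32-bit signed values; `build_absolute_table` and `relative_target` are hypothetical helper names. The table is computed from one position sample and never changes afterwards, while the relative variant is re-evaluated against the newest input sample in every cycle.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSteps = 200;

// Precomputed absolute targets, all relative to ONE latched position sample.
// Same formula as the move_[] loop in the thread, but with matching types.
std::array<std::int32_t, kSteps> build_absolute_table(std::int32_t latched_position) {
    std::array<std::int32_t, kSteps> table{};
    for (std::size_t i = 0; i < kSteps; ++i) {
        table[i] = latched_position + 1 + static_cast<std::int32_t>(i) * 200;
    }
    return table;
}

// Relative variant: the result follows the drive, because the caller passes
// the position that was read from the input PDO in the current cycle.
std::int32_t relative_target(std::int32_t actual_position_now, std::int32_t movement) {
    return actual_position_now + movement;
}
```

If the drive has moved between filling the table and using it, `table[i]` and `relative_target(...)` give different targets for the same step, which is the effect described above.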

  3. Yes, it does matter where you put config_dc(). I refer to an earlier post about this subject where this is explained in great detail. You can make it work in both cases, but details matter.

@WillyTuring
Author

Hi Arthur,
It always feels like a gift to us to have you here in the SOEM community. I'm really grateful for your equally dense, yet information-packed reply.
1. The 20+ms data update cycle is indeed shown in Wireshark as you pointed out. But I’m curious: how did you know in the first place that the non-real-time loop takes 20ms per iteration? Did you time it when you constructed it?
2. In your first remark, about it taking 300us for a packet to return to the master, I noticed that this figure is taken from the time gap between the LWR command from the “Private source” and the next LRD command from the “master source”.
I always assumed that the two commands “LRD+FRMW” and “LWR” coming from the “Private source” are the packets returning from the slaves to the master, but nobody ever verified this for me. Is my interpretation correct? Is that why the time gap between the “LWR” returning from the “Private source” and the next “LRD” from the master indicates the time needed to return a packet from slave to master?
I also noticed that between two PDO cycles sent out by the master, the largest chunk of time is actually spent between the “LWR” from the master and the next “LRD” returning from the slave. I used to think this gap is the time needed for all of the slaves to finish reading the packet. If that interpretation is right (which I don’t think it is), then the other “LWR” returning from the “Private source” has to be the time the slaves need to “write”.
The reason I don’t think it’s correct is: ① if all of the slaves read and add data to the packet “on the fly”, how can they separate “read time” and “write time”, given that two packets (“read” and “write” respectively) return from the slaves? ② The length of the “LRD” returning from the “Private source” is still 168 bytes, which corresponds to the “LRD” length the master needs to read; if this “LRD” from the “Private source” really indicated “slave is reading”, the length should be 96 bytes. So I’m not sure how to interpret them correctly. Could you elaborate on this structure a little?
3. In your second remark you mentioned that it is wise to update the PDO data in the real-time task. This makes sense, but I want to be clearer on it. Personally I would like to put the update in the same thread you created to send out the packets, because if I created another thread for it without assigning an individual core to that thread, the threading might add some time overhead. Am I right about this?
Another concern: even if I update the process data in the same thread that sends out the packets, I want to use a for loop to write data to the buffer, and the loop always runs much faster than the PDO cycle time. Should I keep them in sync? If so, do I just add a wait in the loop to match the PDO cycle time, or is there a cleverer way of doing this?
4. I can totally feel what you are saying about “automatic data type conversion might byte you sometimes”.
I learned it the hard way when I was coding the control program, and will pay attention to it.
5. About the difference between using “output_Target = input_Position + movement[increment]” and “output_Target = Movement[absolute position]”: I get what you are saying about the input_Position value update cycle.
Quote: “When you copy the move_ buffer to the TargetPosition time has passed and PositionActualValue has changed. Move_ has not changed.”
What confuses me is this: when I copy the element values from the move_[ ] buffer to the output, before the first movement has happened, the cycle does get updated, but because we haven't moved yet, the value updated from the input buffer should remain pretty much the same, right? So why would it make a difference?
One thing I noticed after you pointed this out is that the value returned by the “LRD” command changes in every PDO cycle, but only in the least significant byte of the (4-byte) position value. I missed that before, and I’m not sure whether that “abnormality” is caused by my code. I think fluctuations between two packets for the same position are normal, but this is clearly not the case at the start of the program run.
I wonder why that is?
6. Yes, I think one should put config_dc() after config_map(), but I just cannot get it to work when I arrange the order like that. I need to look into this more deeply.
I know this is another notoriously hated long question, but I hope I explained myself clearly enough that it at least has some reference value for people later on.
Looking forward to your reply and thanks in advance.
Willy

@ArthurKetels
Contributor

  1. From your code:
                /* acyclic loop 5000 x 20ms = 10s */
                for(i = 1; i <= 5000; i++)
                {
                    if(i < 200)
                        out_ptr[5]->TargetPosition = move_[i];
                       ......
                    osal_usleep(20000);
                }

I think the osal_usleep(20000); is the giveaway.

  2. EtherCAT slaves indeed transfer data "on the fly". But there is some latency; a simple rule of thumb is 300ns per slave. SOEM knows it exactly because it is measured as part of the DC set-up code. You can find the exact delay per slave in the ec_slave[].pdelay variable after ec_configdc().

The total transfer time from the master through all slaves and back again to the master is the total slave delay plus the length of the packet plus the cable delay. At 100Mb/s one byte takes 80ns. Each metre of cable adds 5ns of delay. So in your case a rough estimate is (96 * 80ns + 6 * 300ns + 6 * 1m * 5ns) = 9510ns. This is wire time. Of course what Wireshark sees is this plus the time spent in the network driver. In your case 300us - 9.5us = 290us is an awful lot of time to spend in a network driver. Therefore (and from years of experience) I assume there is some interrupt coalescing going on. This is done in many drivers to limit the number of interrupts in your PC. For TCP/IP throughput this does not matter, but for real-time protocols like EtherCAT this is a killer.
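
The estimate above can be written as a small helper. This is only a sketch of the rule of thumb quoted in this reply; `wire_time_ns` and the constant names are hypothetical, and the exact per-slave delay should come from the DC measurement rather than the 300ns guess.

```cpp
#include <cstdint>

// Rule-of-thumb figures quoted in the reply above.
constexpr std::int64_t kNsPerByte = 80;    // one byte at 100 Mb/s
constexpr std::int64_t kNsPerSlave = 300;  // forwarding latency per slave (estimate)
constexpr std::int64_t kNsPerMetre = 5;    // cable propagation delay

// Rough wire time for one frame travelling through all slaves and back.
std::int64_t wire_time_ns(std::int64_t frame_bytes,
                          std::int64_t slave_count,
                          std::int64_t cable_metres) {
    return frame_bytes * kNsPerByte
         + slave_count * kNsPerSlave
         + cable_metres * kNsPerMetre;
}
```

With the numbers from this thread, `wire_time_ns(96, 6, 6)` gives 9510, so anything far above roughly 10us between send and receive is time spent outside the wire.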

This also explains why you would see two packets being sent in short succession, then a wait, and then the reception of both packets. When the driver receives the first packet back it simply waits. Then the second packet comes in and the driver waits. Only when a timeout value has passed does the driver transfer the packets to the network stack. Only then are they seen by Wireshark (and get timestamped) and by SOEM.

  3. This is a basic multitasking question, and out of scope for SOEM. You as a programmer decide how to split your program into tasks and how to sync those tasks. Of course using tasks will give the added headache of inter-task communication. So simple programs use a single task (everything is done in OSAL_THREAD_FUNC_RT ecatthread(void *ptr) ), while complex programs use multiple tasks and have to deal with data protection and inter-task communication. I suggest you have a read on ZeroMQ. Also remember that it is a bad idea to use printf and other long-duration functions in a real-time task. Blocking functions are totally out of the question; you cannot use them in real-time tasks.

The clever way of handling timing in a non-blocking way is to use state machines in your code. Use the "tick" of your real-time loop to increment the state. Wiki is your friend here.
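
A minimal sketch of such a tick-driven state machine, assuming a CiA 402 (DS402) drive and the usual statusword state bits and controlword commands (0x06 Shutdown, 0x07 Switch on, 0x0F Enable operation). `Step` and `EnableSequencer` are hypothetical names, fault handling is left out, and the timeout is counted in cycles so nothing ever sleeps or blocks.

```cpp
#include <cstdint>

// Hypothetical steps of a CiA 402 enable sequence (fault handling left out).
enum class Step { WaitReadyToSwitchOn, WaitSwitchedOn, WaitOperationEnabled, Enabled, TimedOut };

class EnableSequencer {
public:
    explicit EnableSequencer(std::uint32_t timeout_ticks) : timeout_ticks_(timeout_ticks) {}

    // Call exactly once per real-time cycle with the newest statusword.
    // Returns the controlword to place in the RPDO for this cycle.
    std::uint16_t tick(std::uint16_t statusword) {
        ++ticks_in_step_;
        const std::uint16_t state_bits = static_cast<std::uint16_t>(statusword & 0x006F);
        switch (step_) {
        case Step::WaitReadyToSwitchOn:
            if (state_bits == 0x0021) enter(Step::WaitSwitchedOn);
            break;
        case Step::WaitSwitchedOn:
            if (state_bits == 0x0023) enter(Step::WaitOperationEnabled);
            break;
        case Step::WaitOperationEnabled:
            if (state_bits == 0x0027) enter(Step::Enabled);
            break;
        case Step::Enabled:
        case Step::TimedOut:
            break;
        }
        const bool waiting = step_ != Step::Enabled && step_ != Step::TimedOut;
        if (waiting && ticks_in_step_ > timeout_ticks_) enter(Step::TimedOut);
        return controlword();
    }

    Step step() const { return step_; }

private:
    void enter(Step next) { step_ = next; ticks_in_step_ = 0; }

    std::uint16_t controlword() const {
        switch (step_) {
        case Step::WaitReadyToSwitchOn:  return 0x0006;  // Shutdown
        case Step::WaitSwitchedOn:       return 0x0007;  // Switch on
        case Step::WaitOperationEnabled: return 0x000F;  // Enable operation
        case Step::Enabled:              return 0x000F;  // keep operation enabled
        case Step::TimedOut:             return 0x0000;  // give up: disable voltage
        }
        return 0x0000;
    }

    Step step_ = Step::WaitReadyToSwitchOn;
    std::uint32_t ticks_in_step_ = 0;
    std::uint32_t timeout_ticks_;
};
```

Because the sequencer reacts to what the drive reports instead of waiting a fixed time, it does not matter whether the drive needs 9 cycles or 200 cycles to acknowledge a command, as long as the timeout is chosen generously.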

  5. I do not have enough data to give you a sensible answer. A PDO is always updated as a whole. So when you only see change in one of the four bytes then you have coded something wrong.

@WillyTuring
Author

Hi Arthur,
This is amazing information.
1. From the master's point of view, “LRD+FRMW” is always followed by “LWR”. So what you are saying is: assuming no time is spent in the network driver, and following your calculation, the time gap between the “LWR” packet sent out by the master and the next “LRD+FRMW” packet returned would be 9.5us?
2. Yes, you are right. I noticed that the time gap between the two packets sent out by the master is around 0.16-0.17us, but the time gap between the two received packets is around 16-17us. That is a factor of 100, which confirms what you said: “When the driver receives the first packet back it simply waits. Then the second packet comes in and the driver waits.”
3. So are the two packets that we see in Wireshark coming from the source “03:01:01” (not starting with the keyword ‘Private’) just the reception, or return, of the packets we sent out, without any extra meaning of their own?
4. If the slave has its own “buffer space” to store all of the data in the packets and executes the movement commands in serial order (which I’m not sure is the case), is it better for me to just send packets as fast as possible? In that case I would need to reduce the cycle time and offset time to make the PDO cycle as short as possible while still stable, and meanwhile adjust the cycle time for writing data to the buffer. Of course I need to consider the time spent in the network driver. What do you think of the general idea?

@ArthurKetels
Contributor

  1. Correct. Wire time = packet time + slave time + cable time. And the transmit and receive channels on 100BASE-TX Ethernet are asynchronous. There is no need to wait with sending the next packet until the current packet has been received back. In big EtherCAT projects with lots of slaves and large packets it is not uncommon to have several packets underway.
  2. Yes and no. These are indeed the received packets, the first slave will set the 2nd bit of the MSB of the MAC address. The fake MAC address SOEM uses is 01:01:01:01:01:01, and when it returns it becomes 03:01:01:01:01:01. This is used in SOEM to distinguish both types. This becomes very important when using the redundant configuration where we can use two network ports at the same time. The routing then becomes NIC1tx->slave1->slaven->NIC2rx, and NIC2tx->slaven->slave1->NIC1rx.
  3. This depends on the servo modes the slave supports. Some can, others can not. Have a look at the DS402 specification. Also your servo drive manual will explain all available modes.
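
The MAC address marking described in item 2 can be illustrated with a few lines. This is only a sketch of the bit manipulation as described in this reply; `MacAddress`, `has_passed_first_slave` and `mark_as_processed` are hypothetical names and not SOEM functions.

```cpp
#include <array>
#include <cstdint>

using MacAddress = std::array<std::uint8_t, 6>;

// True when bit 1 (value 0x02) of the first octet is set, which per the reply
// above marks a frame that has already passed through the first slave.
bool has_passed_first_slave(const MacAddress& source) {
    return (source[0] & 0x02u) != 0;
}

// What the first slave does to the source address, for illustration only.
MacAddress mark_as_processed(MacAddress source) {
    source[0] = static_cast<std::uint8_t>(source[0] | 0x02u);
    return source;
}
```

Applied to the address 01:01:01:01:01:01 this yields 03:01:01:01:01:01, which matches the source address of the returning frames in the Wireshark capture.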

@WillyTuring
Author

Hi Arthur,
Happy weekend, and thanks for your kind effort on the weekend.
I did a lot of code testing and packet checking today.
I checked the network driver and found that the parameter “rx-usecs” was set to 3 us. I changed it to 0, which cut the frame delay in half to around 150us. I think I still need to see what other parameters might affect that.
PS: The halved delay might not show in the captured data, because I did it at a different time.
Apart from this, there are two problems that I just couldn’t figure out.
1. I have a problem with placing IOmap() before config_dc(), and the problem is actually not about the DC system clock; it’s about the sync0 config.
Whenever I put IOmap() before config_dc(), the slaves report “losing SM2 event” (the error is reported by the manufacturer’s testing software) and stop taking any data from the output buffer. I guess they mean “RPDO data” in that context.
What I don’t understand is that this “losing SM2 event” is still there after I set dorun=1, because I thought that once the output data is sent cyclically, the sync0 signal would be able to catch the data and things would return to normal. I also checked the system time difference; the drift is within 15ns, so the DC time matches well among the slaves.
But clearly I missed something here. I attached two captures: one is “IOmap() before config_dc()”, the other is the reverse. By the way, if config_dc() comes before IOmap(), I clearly cannot use send_processdata() to do the dynamic drift compensation, so I tried “FPWR”, writing the system time to the local register 0x910. It clearly didn’t work that well, and I don’t know why either.

2. The second problem is the most confusing one. I wrote some quick test code to send movement commands to the output buffer and checked the packets. The packets do carry the data I wrote, updated every cycle, but the weird thing is that the data just isn't executed by the slave.
This is actually the main reason I’m asking whether the slave has its own buffer to save the data from the packets. I read the datasheet really carefully and didn’t find any instruction on this matter.
I’m wondering what might cause this?
2differentDataCapture.zip
main.zip

@ArthurKetels
Contributor

For the proper way of getting DC and sync working see:
#487 (comment)
#520

To distribute clocks you have to use FRMW (read-multiple-write) and not FPWR (write). You read the clock from the first reference slave and write it to all others.

For your drive control problem I can not help you without knowing something about your slaves. What is the brand and type? Where can I find the datasheet? How did you configure the slaves? I am no wizard with a crystal ball.

@WillyTuring
Author

Hi Arthur,
LOL, you might not be a wizard with a crystal ball, but you sure are a really skilled angel here.
The motor driver is from China. It’s a servo driver, and I will attach the datasheet and the ESI file below. Just a reminder: the English version of the datasheet is unclear in many places because it was translated by Chinese speakers, so if anything in the datasheet is unclear, please let me know and I will check the Chinese version for you.
As for the driver configuration and the motor parameter initialization on the driver, I followed the manufacturer's instructions. Is there any specific aspect of the configuration you want to know about? I will try to see if I can export a copy of the configuration data from our driver.
I’m now reading about event-driven state machines to see what the proper way of implementing the code is.
motorDriverDatasheet.zip
ESI-file.zip

@ArthurKetels
Contributor

As far as I could read from the datasheet, the only supported modes under EtherCAT control are CSP, CSV and CST. The other modes are next to useless. As these modes are cyclic synchronous, this means there is no buffer in the slave. Your application needs to calculate the correct setpoints for your trajectory and output them to the slave at the exact moment in time they need to be executed.
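
As a rough illustration of what "calculate the setpoints on the master" means for CSP, here is a sketch of a per-cycle setpoint generator. `next_setpoint` is a hypothetical helper; it assumes a positive `max_step`, produces a constant-velocity profile only, and leaves out acceleration and jerk limits that a real trajectory would need.

```cpp
#include <algorithm>
#include <cstdint>

// Moves the commanded position towards `goal` by at most `max_step` counts.
// Call once per PDO cycle and write the result to TargetPosition, so the
// commanded velocity is max_step counts per cycle at most.
std::int32_t next_setpoint(std::int32_t commanded, std::int32_t goal, std::int32_t max_step) {
    const std::int64_t remaining = static_cast<std::int64_t>(goal) - commanded;
    const std::int64_t limit = max_step;
    const std::int64_t step = std::max<std::int64_t>(-limit, std::min<std::int64_t>(remaining, limit));
    return static_cast<std::int32_t>(commanded + step);
}
```

Starting from 0 with a goal of 5000 and `max_step` of 2000, successive cycles command 2000, 4000, 5000 and then stay at 5000; the drive receives exactly one new setpoint per cycle and nothing has to be buffered.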

This is not a disadvantage; you gain the best possible control in exchange for a bit more complicated programming on the master side. My advice is to use CST mode, and run the velocity loop, and optionally the position control loop, on the master side. Only very few applications need the extra bandwidth that is possible by running these on the slave. Rule of thumb: calculate your mechanical time constant, multiply by 5 to get the position loop speed, multiply by 5 again to get the velocity loop speed. So for example if your mechanical time constant corresponds to 10Hz, the position loop runs at 50Hz and the velocity loop at 250Hz. Anything more and you are just amplifying noise.
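
The rule of thumb above as a tiny worked example. `LoopRates`, `loop_rates_from_mechanics` and `cycle_rate_hz` are hypothetical names; the factor of 5 per stage is taken from this reply and is a starting point, not a tuning result.

```cpp
// Loop rates suggested by the rule of thumb: x5 for position, x5 again for velocity.
struct LoopRates {
    double position_hz;
    double velocity_hz;
};

LoopRates loop_rates_from_mechanics(double mechanical_hz) {
    return LoopRates{mechanical_hz * 5.0, mechanical_hz * 25.0};
}

// Update rate that a given PDO cycle time (in microseconds) provides.
double cycle_rate_hz(double cycle_time_us) {
    return 1.0e6 / cycle_time_us;
}
```

For a 10Hz mechanical system this gives 50Hz and 250Hz, and a 4000us PDO cycle provides exactly 250Hz, so under this rule the cycle time used in this thread would already be fast enough for such a system.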

@WillyTuring
Author

WillyTuring commented Jan 16, 2022

Hi Arthur,
Good day, and I cannot thank you enough.
I managed to solve the configuration order issue; IOmap before config_dc should always be the way to go.
I have a couple of questions again, though.
1. Yes, you are right that it only supports cyclic control modes, but how did you draw your conclusion about the buffer from this?
2. Why do you suggest CST over CSP? The latter is the most common one in control systems, and we are more inclined to have precise position control in our case.
And can you elaborate a little more on this: "Only very few applications need the extra bandwidth that is possible by running these on the slave"?
3. By the mechanical time constant, do you mean the driver execution time per instruction command? (We would need to add a little more time on top of this base driver execution time to account for the mechanical movement.)
4. I forgot another point: in the current situation, when I enable dorun and am in operational mode, for the first few PDO data exchanges I don't get the real TPDO data from the slave, while the TPDO data has been packed into the packet. Any thoughts on this one? I captured it in Wireshark as well and put the attachment here.
dataCaptured.zip

@ArthurKetels
Contributor

  1. This is the definition of "cyclic synchronous" modes in DS402.
  2. Now we are going away from EtherCAT and SOEM. This is about control theory. Clearly out of scope here. A nice short article that explains the various options : https://www.celeramotion.com/ingenia/support/technical-papers/ethercat-operating-modes/
  3. Time constant : https://en.wikipedia.org/wiki/Time_constant All applications with servo drives can be simplified to a mass-spring system. Nothing has infinitely low mass and nothing has infinitely high stiffness. When you have actual estimates for your application you can derive the mechanical time constant. This time constant will put limits on the feedback gain of your control system. Go above this and your system starts oscillating.
  4. I am not sure I understand your question. When master data is not updated in the slave then it means the slave is not yet in operational mode. There can be quite a few cycles between request for OP and actual OP.

@WillyTuring
Author

Hi Arthur,
I'm looking at the materials you pointed out now. For my last question, could you maybe take a look at the Wireshark data I captured? What I mean is that after the master starts exchanging output PDO data in the packets, there is a delay of several cycles until we can see the input PDO data in the packets being updated. This means that after ec_statecheck(0, EC_STATE_OPERATIONAL, 5 * EC_TIMEOUTSTATE); I need to wait a little before I can exchange real PDO data.

@ArthurKetels
Contributor

Ahh, now I get what you were asking! Yes, indeed the slave does not respond the way it is supposed to. After the slave acknowledges OP state (packet #63384) it takes almost exactly 100ms before the LRD packets return data with a positive workcounter (packet #63497). This is actually wrong for 2 reasons.

  1. When a slave is in pre-OP it shall send TXPDO's, RXPDO's shall be disabled.
  2. In OP mode both TXPDO's and RXPDO's shall be active.
    This behavior is regarded as a failure in the EtherCAT standard, and it would not pass the conformance test. Unfortunately some vendors still sell non-conformant slaves. I would ask the vendor for a firmware update that resolves this bug.
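
Until such a firmware fix is available, the master can hold back motion commands until the process data is actually valid. The sketch below is one possible workaround, not a SOEM feature: `ProcessDataGate` is a hypothetical helper that opens only after a number of consecutive cycles with the expected working counter. The expected value is an input here; SOEM's simple_test example derives it from ec_group[0].outputsWKC and ec_group[0].inputsWKC.

```cpp
// Opens only after `required_good_cycles` consecutive cycles whose working
// counter reached the expected value; any bad cycle closes it again.
class ProcessDataGate {
public:
    ProcessDataGate(int expected_wkc, int required_good_cycles)
        : expected_wkc_(expected_wkc), required_(required_good_cycles) {}

    // Call once per cycle with the working counter of the received process data.
    // Returns true when it is reasonable to start commanding the drive.
    bool update(int wkc) {
        if (wkc >= expected_wkc_) {
            if (good_cycles_ < required_) ++good_cycles_;
        } else {
            good_cycles_ = 0;
        }
        return is_open();
    }

    bool is_open() const { return good_cycles_ >= required_; }

private:
    int expected_wkc_;
    int required_;
    int good_cycles_ = 0;
};
```

Compared with a fixed delay after reaching OP, this reacts to what the slave really does, so it keeps working if the 100ms gap turns out to vary between drives or firmware versions.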

@WillyTuring
Author

WillyTuring commented Jan 17, 2022

Hi Arthur,
Oh yes, you are right, I actually missed that. Indeed, when I exchange PDO data 15000 times in Pre-op mode, the RPDO is disabled, which is correct, but the TPDO should be enabled. Do you think they will cooperate on this firmware update issue?
I'm now drawing some plots and will post the jitter data here.
By the way, I don't know why, but when I use FRMW to stabilize the DC clock in Pre-op, it has some effect but just doesn't behave as well as send_processdata(). FYI, I wrote DCtime to 0x910 for every slave.

@WillyTuring
Author

Hi Arthur,

Please take a look at these two files. One file records the slaves' clock stabilization in Pre-op mode and the packet cycle jitter data in Op mode; the other is the corresponding source code, to see if the way I captured the data has any flaws in the first place.

Thanks in advance.
Willy
ClockStablizationData.zip
main.zip

@WillyTuring
Author

@ArthurKetels
Hi Arthur, any thoughts on the things I mentioned above?

Apart from that, I found another very interesting thing about how my EtherCAT driver takes/ignores the data from the packets, which I really want to share with you. It behaves so wrongly yet so amazingly consistently that I must have missed something here.

I will post my findings and the evidence here when you have time to read them. Please let me know when you have time to continue the discussion.

Have a good one!

@ArthurKetels
Contributor

On the one hand this is expected behavior; on the other hand you have to be precise in your measurements. Plus or minus 100us of jitter is to be expected without taking special precautions. You can read about these effects from other users here; improvements of 10x are no exception with a bit of tweaking.

W.r.t. your code. You add an extra packet in the rt-loop:
ec_FPRD(ec_slave[1].configadr, ECT_REG_DCSYSTIME, sizeof(temp2_), &temp2_, EC_TIMEOUTRXM);
This is redundant and actually less accurate than you could be. It is much simpler to take, in each loop, the difference between the current ec_DCtime and the previous ec_DCtime. ec_DCtime also takes a read from ECT_REG_DCSYSTIME but does this in the same packet as the PDO. The control loop tries to keep the RT-loop synchronous with ec_DCtime, so that is what you want to track.

The other thing is that you have to watch out when using C++ constructs like vector<>. They can have unbounded latency because they invoke memory allocation. If you want to use them, at least preallocate with .reserve().
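
A small sketch of the preallocation advice, assuming the samples are 64-bit integers. `SampleLog` is a hypothetical helper: all memory is reserved before the real-time loop starts, and once the log is full a sample is rejected instead of letting the vector grow and allocate.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-capacity log that never allocates after construction.
class SampleLog {
public:
    // Do this before entering the real-time loop: the only allocation happens here.
    explicit SampleLog(std::size_t max_samples) : limit_(max_samples) {
        samples_.reserve(max_samples);
    }

    // Safe to call from the real-time loop. Returns false when the log is full.
    bool record(std::int64_t value) {
        if (samples_.size() >= limit_) return false;
        samples_.push_back(value);  // stays within the reserved capacity
        return true;
    }

    const std::vector<std::int64_t>& samples() const { return samples_; }

private:
    std::size_t limit_;
    std::vector<std::int64_t> samples_;
};
```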

As a last remark, for cycle rates of 4ms your jitter of +-100us is probably good enough. Perfect is often the enemy of the good.

About my available time here, it depends. I am interested in good discussions, and I respect efforts made by you and others like you. But I also have a very busy job (running two companies) and can make no guarantees.

@WillyTuring
Author

Hi Arthur,
Welcome back. Of course I appreciate your help and have never taken your feedback for granted; whenever I see your reply notification in my email, I feel surprisingly happy and excited.
Like you, I like informative and insightful discussions that look at things under the hood.
1. You are right about manually reading ECT_REG_DCSYSTIME.
“ec_DCtime also takes a read from ECT_REG_DCSYSTIME but does this in the same packet as the PDO”: this is definitely a much neater approach to track.
2. Regarding the non-conformant slave problem we saw (the slave does not update the TPDO in Pre-op and delays the update by 100ms after reaching Op mode), I can more or less deal with that for now by adding some delay after reaching Op mode until the slave has "stabilized". Meanwhile, I will try to contact the vendor to see how they respond. I asked them once before, but their technical support does not seem to know much about EtherCAT configuration.
3. But the real problem, and this is the new finding I mentioned before, is that there always seems to be a delay before the slave executes the command sent via RPDO. Let me demonstrate my point with two examples and the captured data.
Do you remember I mentioned that "the driver just didn't execute the command the way we send data"? Here's what I did:
(1) In the test (CSP mode) the data we write is the absolute position we need the motor to move to, with a fixed increment of 2000 between two consecutive commands.
(2) ① I monitored the status word in the RT task for each packet. ② I then went to Wireshark and wrote down the time stamp at which the output data first shows up in a packet (packet #63546). ③ Then I went all the way down to the first real change in PositionActualValue (packet #64243). ④ This shows that 174.25 packets were ignored by the driver. ⑤ Then I checked the status word and noticed that the first 56 values were 0x633, values 57 to 175 were exactly 0x737, and after that the value was 0x5337. What 0x633 and 0x737 have in common is that bit 12 is 0, while 0x5337 has bit 12 set to 1, which indicates that the RPDO data in the first 175-ish packets was indeed ignored by the driver.
(3) I then changed the increment and the number of packets run in the RT task. Most surprising is that it does not matter how big or small the movement increment is, or how many frame cycles I run in the thread (I tried 15 different setups, changing both the increment and the number of PDO exchange cycles).
The result is that the first 175 packets are always ignored by the driver, whether I do 5000 PDO exchanges or 500 (770 to be exact).
PS: by increment I mean output_TargetPosition = input_ActualPosition +/- increment.
datacaptured.zip
main.zip
And this is only the first part of the story.
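
The bit 12 check from step ⑤ can be done in code instead of by hand. This sketch assumes the CiA 402 meaning of statusword bit 12 in cyclic synchronous position mode (the drive follows the command value when it is set); the drive manual is the authority here. `follows_command` and `first_following_cycle` are hypothetical helper names.

```cpp
#include <cstdint>

// Bit 12 of the statusword: set when the drive uses the commanded target.
constexpr bool follows_command(std::uint16_t statusword) {
    return (statusword & (1u << 12)) != 0;
}

// Index of the first logged statusword with bit 12 set, or -1 if there is none.
int first_following_cycle(const std::uint16_t* statuswords, int count) {
    for (int i = 0; i < count; ++i) {
        if (follows_command(statuswords[i])) return i;
    }
    return -1;
}
```

With the values from the capture, 0x0633 and 0x0737 are reported as not following and 0x5337 as following, so the real-time task could wait for this bit before it starts ramping the target position.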

@WillyTuring
Author

Here it comes the second part.
If the "stabilization time needed is fixed", i can still take it, but the biggest problem now here is even after this "stabilization period" which i used delay for simple test here, seems like the slave always takes a few frame cycles to finish the command execution. Here's what i did in this test:
①So we wrote CSP setup command at packet 63522, and it took effect at packet 63527, at packet 63527, the status word is 0x0638
②meanwhile we wrote error clear command at packet 63526, and it took effect at packet 63539, the status word is 0x0660.
③then we wrote control word 0x6 at packet 63542(63539 + 4), and it took effect at packet 63551(- 63542 = 9), the status word is 0x0631.
④then we wrote control word 0x7 at packet 63554(63551 + 4), and it took effect at packet 63563(- 63554 = 9), the status word is 0x0633.
⑤then we wrote control word 0xf at packet 63566(63563 + 4), and it took effect at packet 63767(- 63566 = 201), the status word is 0x0737.
PS: the status word value at packet 63763 suddenly jumped to 0x0333.
I know I made a mistake in the guard condition in the RT task: the first condition, which checks for Operation enabled, is wrong, so I didn't set the current position as the target.
One interesting note: because of my mistake the default target is 0x00, so the gap is too large for the driver to interpolate. Instead of going into an error state, however, the slave moves back and forth. I also recorded the data in Excel so that you can take a look.
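
The status words in the sequence above map onto the CiA 402 power state machine, and the guard condition can be written once per cycle instead of once per state. This is a minimal sketch, assuming the drive uses the standard CiA 402 status word masks and control words. `decode_state`, `NEXT_CONTROL_WORD` and `cyclic_step` are hypothetical names, not part of SOEM or of the posted code.

```python
# (mask, value, name) triples for the CiA 402 status word
STATES = [
    (0x4F, 0x00, "not ready to switch on"),
    (0x4F, 0x40, "switch on disabled"),
    (0x6F, 0x21, "ready to switch on"),
    (0x6F, 0x23, "switched on"),
    (0x6F, 0x27, "operation enabled"),
    (0x6F, 0x07, "quick stop active"),
    (0x4F, 0x0F, "fault reaction active"),
    (0x4F, 0x08, "fault"),
]

# Control word to send while the drive reports a given state
NEXT_CONTROL_WORD = {
    "fault": 0x80,               # fault reset (acts on the rising edge of bit 7)
    "switch on disabled": 0x06,  # shutdown
    "ready to switch on": 0x07,  # switch on
    "switched on": 0x0F,         # enable operation
    "operation enabled": 0x0F,   # stay enabled
}

def decode_state(status_word):
    """Name of the state machine state encoded in the status word."""
    for mask, value, name in STATES:
        if status_word & mask == value:
            return name
    return "unknown"

def cyclic_step(status_word, actual_position, wanted_position):
    """Decide control word and target position for one PDO cycle.

    Until the drive is enabled AND reports bit 12 (assumed here to mean
    "drive follows the command value" in CSP), the target is held at the
    actual position so no jump is commanded.
    """
    state = decode_state(status_word)
    control_word = NEXT_CONTROL_WORD.get(state, 0x00)
    following = state == "operation enabled" and bool(status_word & (1 << 12))
    target = wanted_position if following else actual_position
    return control_word, target

for sw in (0x0638, 0x0660, 0x0631, 0x0633, 0x0737, 0x5337):
    print(hex(sw), decode_state(sw), cyclic_step(sw, 1000, 3000))
```

Because the decision depends only on the status word read in the current cycle, it does not matter how many cycles the drive needs for each transition; the same control word is simply repeated until the state changes.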
Here are some of my concerns and confusions
(1) I intended to code it in a truly event-driven way and treat the current sleep in the RT task as a time-out event as well. But if the slave always takes an unpredictable amount of time to execute an event, I will have to use guard conditions inside each state, which could easily result in spaghetti code.
(2) I remember you said that if the driver "only supports cyclic mode", it might not have an internal buffer to keep the data from packets. But I noticed in the datasheet that the shortest communication period it supports is 250 us. Assuming the master has that real-time capability, I really don't think the driver can execute motion commands at a 4 kHz rate, so without a buffer the data would get lost.
(3) I suspect, without any hard evidence to back me up at the moment, that the cycle we set is just a sampling rate (on the TPDO side). That would mean you can never expect the actual position of the slave in the next LRD cycle to be exactly the value you wrote in the current LWR cycle; it should always be less than or equal to the value you sent, because it depends on the sampling rate. If that is the case, there should be an internal buffer on the slave side.
Correct me if I'm wrong, since I indeed don't have hard evidence. Here are the files that support my previous observations, though.
movementData.zip
dataCaptured.zip
main.zip

@ArthurKetels
Contributor

It is very difficult to follow your tests. It is easier for me to understand when you make graphs of your measured data. And I do not really enjoy working through thousands of lines of Wireshark captures. This is not about the low-level EtherCAT protocol, where Wireshark is certainly handy.

What I propose is that you do an impulse test in CST mode. Position control is always a bit difficult to interpret, torque is much easier.
The procedure:

  1. Configure the slave for CST (cyclic synchronous torque).
  2. Fix the motor output shaft so it cannot rotate.
  3. Make a square wave setpoint for torque alternating between +10% and -10% of rated torque. Interval 100 ms.
  4. Collect data in a buffer: ec_DCtime, torque setpoint, actual measured torque, actual encoder position.
  5. Collect samples at PDO interval for at least 4 cycles.
  6. Stop the test and save the buffer in a file (preferably CSV format).
  7. Post the file here, and perhaps have a look yourself with the KST2 plotting tool (recommended).
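
The data collection part of this procedure could be sketched as follows. This is only an illustration under several assumptions: "Interval 100ms" is read as each torque level being held for 100 ms, the torque setpoint is expressed in per mille of rated torque (so 10% becomes 100), the PDO cycle is 4 ms, and `exchange` is a hypothetical callback standing in for the real PDO exchange and the read of ec_DCtime. `FakeDrive` only exists so the sketch runs without hardware.

```python
import csv
import io

def torque_setpoint(t_ms, level_permille=100, hold_ms=100):
    """Square wave: +level for hold_ms, then -level for hold_ms, repeating."""
    return level_permille if (t_ms // hold_ms) % 2 == 0 else -level_permille

def run_impulse_test(exchange, cycle_ms=4, hold_ms=100, periods=4):
    """Collect one sample per PDO cycle in memory (steps 3 to 5)."""
    samples = []
    n_cycles = periods * 2 * hold_ms // cycle_ms
    for i in range(n_cycles):
        setpoint = torque_setpoint(i * cycle_ms, hold_ms=hold_ms)
        dc_time, torque_actual, position_actual = exchange(setpoint)
        samples.append((dc_time, setpoint, torque_actual, position_actual))
    return samples

def save_csv(samples, out):
    """Write the buffer after the test has stopped (step 6)."""
    writer = csv.writer(out)
    writer.writerow(["dc_time_ns", "torque_setpoint", "torque_actual", "position_actual"])
    writer.writerows(samples)

class FakeDrive:
    """Stand-in for the real slave: echoes the setpoint, shaft locked at 0."""
    def __init__(self, cycle_ns=4_000_000):
        self.cycle_ns = cycle_ns
        self.dc_time = 0

    def exchange(self, setpoint):
        self.dc_time += self.cycle_ns
        return self.dc_time, setpoint, 0

samples = run_impulse_test(FakeDrive().exchange)
buffer = io.StringIO()  # on a real run: open("impulse.csv", "w", newline="")
save_csv(samples, buffer)
print(len(samples), "samples")
```

Keeping the samples in memory during the test and writing the file afterwards avoids file I/O inside the cyclic loop.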

@WillyTuring
Author

Hi Arthur,
Thanks for your patience. Posting my issue like that was definitely negligent of me; I'm sorry, I shouldn't have done it as plain, boring text.

Yes, you are right about the difficulty of interpreting CSP mode data. After digging deeper into it, I gradually realized why you suggested going with CST rather than CSP: that way you have direct control over acceleration, and with that acceleration you can easily work out whether the position command you send in this cycle will be reached by the motor in the next cycle.

Dumb as I am, I totally neglected the physical limits of the motor. So what I'm going to do now is run several more rounds of tests in CSP mode to see whether the position-not-reached issue is actually due to the physical limits; if it is, I will move on to the CST test.

The procedure you suggested is very clear and well reasoned. But since all of the motors are mounted in the mechanical housing, and they weigh over 260 kg, I think choosing one slave and running it back and forth will have a similar effect, right?

Meanwhile I will try to write the torque controller for my position control, and will get back to you as soon as it is done.

Again, thanks very much.

@WillyTuring
Author

@ArthurKetels
Hi Arthur, I have a quick question: why is the reading I get from 0x60f4 (following error actual value) different from what I calculate by taking the target position (0x607A) from the previous cycle and subtracting the actual position (0x6064) from this cycle? Am I missing something here? The problem is that one specific slave always takes about 4.5 cycle times to execute my target position command, no matter how small the position increment is. When I check this, I always calculate from the current motor speed whether it is possible to reach the target position within one 4 ms cycle. Put simply, for a 2000 pulse increment at the current motor speed, I'm 100% sure the move would finish in about 1 ms, yet the slave takes 4.5 cycles * 4 ms to finish the movement command. Why is that? Here's the relevant data.
respondData.zip

@ArthurKetels
Contributor

It is impossible for me to say something useful regarding your question. What you observe is what you observe. Why this is so is difficult to answer without internal knowledge of the firmware. Vendors rarely delay actions in a control loop intentionally; mostly such delays are a side effect of design decisions.

Some general information about servo control. The only thing physically controlled is the torque of the motor. The resulting velocity and position are a side effect of that torque acting on resistance and inertia of the load. Velocity is controlled by placing a secondary control loop on top of the torque controller. Position in turn is controlled by a control loop on top of the velocity controller. This is all well known.
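
The cascade described above can be illustrated with a toy simulation. This is a sketch only: both loops are plain proportional controllers, the load is a single inertia with viscous friction, and every gain and parameter value is an arbitrary assumption chosen so the example converges, not a value from any real drive.

```python
def simulate_cascade(target_pos, steps=5000, dt=0.001,
                     kp_pos=20.0, kp_vel=2.0,
                     inertia=0.01, friction=0.01, torque_limit=1.0):
    """Position loop on top of a velocity loop on top of torque."""
    pos = 0.0
    vel = 0.0
    for _ in range(steps):
        # Outer loop: position error becomes a velocity setpoint
        vel_ref = kp_pos * (target_pos - pos)
        # Inner loop: velocity error becomes a torque setpoint
        torque = kp_vel * (vel_ref - vel)
        torque = max(-torque_limit, min(torque_limit, torque))
        # Physics: torque is the only quantity that acts on the load
        accel = (torque - friction * vel) / inertia
        vel += accel * dt
        pos += vel * dt
    return pos, vel

final_pos, final_vel = simulate_cascade(1.0)
print(final_pos, final_vel)
```

Position and velocity never appear as inputs to the physics step; they only result from integrating the acceleration that the torque produces.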

There is a problem however when you are changing velocity in discrete time steps (as in CSV mode). How should the controller know how much torque it should apply at the end of the time step? There are an infinite number of solutions. Still the controller has to pick one. What if the controller assumes the end velocity should be steady state (no change in velocity after reaching the target velocity) but the user increases the new target setpoint (acceleration)? Then the applied torque would be too low and jumps in torque will happen. The opposite happens when requesting deceleration.

The CSP mode (position control) will make it even worse. Should the controller aim for zero velocity after reaching the target setpoint? And should the torque be zero too? It is clear that the controller will never make the correct decisions when it does not know all parameters of the trajectory. And CSV and CSP modes simply do not provide enough information. That is why I do not recommend them. They seem nice on the surface, but get you into trouble very quickly.

Now, how could vendors of servo drives try to get a reasonable solution to the above problems? The simplest solution is to try to figure out what the user wants in the future. Or better, wait a couple of cycles before calculating a trajectory. If you wait one cycle you can do linear interpolation, if you wait two you can do quadratic, etc. So, for example, your servo drive may delay by 4 cycles to figure out the trajectory you want to run. Your motor will still move synchronously with your commanded setpoints, but with latency.
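
A toy model of such a buffering drive, together with a way to measure the resulting latency from logged data, might look like this. Everything here is hypothetical: `DelayedSetpointBuffer` is not how any particular firmware works, the 4 cycle delay and the 2000 pulse increment are just the example numbers from this thread, and `estimate_lag` is a simple least squares shift search.

```python
from collections import deque

class DelayedSetpointBuffer:
    """Holds `delay_cycles` setpoints before acting on them."""
    def __init__(self, delay_cycles, initial):
        self.buffer = deque([initial] * delay_cycles)

    def update(self, setpoint):
        self.buffer.append(setpoint)
        return self.buffer.popleft()

def linear_substeps(previous, following, n):
    """With both endpoints known, the drive can move linearly between them."""
    return [previous + (following - previous) * (i / n) for i in range(1, n + 1)]

def estimate_lag(commanded, actual, max_lag=10):
    """Shift (in cycles) that best aligns `actual` with `commanded`."""
    best_lag, best_err = 0, float("inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(commanded[:len(commanded) - lag], actual[lag:]))
        err = sum((c - a) ** 2 for c, a in pairs) / len(pairs)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag

commanded = [2000 * k for k in range(50)]   # one 2000 pulse step per cycle
drive = DelayedSetpointBuffer(4, 0)
actual = [drive.update(c) for c in commanded]
lag_cycles = estimate_lag(commanded, actual)
print(lag_cycles)
```

Applied to logged target and actual positions, the same shift search gives the latency in cycles without needing to know the firmware internals.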

Is there a way to avoid this lack of information problem? Well yes, with CST. The servo drive has no ambiguity anymore and can follow your setpoints without delay. Your application calculates the trajectory and knows the instantaneous velocity and position at any moment in time. From there you can control the optimal torque and send it to the drive.
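
As an illustration of the application computing the trajectory and deriving torque from it, here is a minimal sketch. The rest-to-rest quintic profile, the rigid single-inertia load model, the gains, and the clamp at ±1000 per mille are all assumptions made for the example; `quintic_profile` and `cst_torque_permille` are hypothetical names.

```python
def quintic_profile(t, duration, distance):
    """Position, velocity and acceleration of a rest-to-rest move at time t."""
    s = min(max(t / duration, 0.0), 1.0)
    pos = distance * (10 * s**3 - 15 * s**4 + 6 * s**5)
    vel = distance / duration * (30 * s**2 - 60 * s**3 + 30 * s**4)
    acc = distance / duration**2 * (60 * s - 180 * s**2 + 120 * s**3)
    return pos, vel, acc

def cst_torque_permille(acc_ref, vel_ref, pos_ref, pos_act, vel_act,
                        inertia=0.01, friction=0.0, kp=0.0, kd=0.0,
                        rated_torque=1.0):
    """Torque setpoint in per mille of rated torque for one cycle."""
    # Feedforward from the known trajectory
    torque = inertia * acc_ref + friction * vel_ref
    # Feedback on the remaining position and velocity error
    torque += kp * (pos_ref - pos_act) + kd * (vel_ref - vel_act)
    permille = round(1000.0 * torque / rated_torque)
    return max(-1000, min(1000, permille))

pos_ref, vel_ref, acc_ref = quintic_profile(0.25, 1.0, 1.0)
setpoint = cst_torque_permille(acc_ref, vel_ref, pos_ref,
                               pos_act=pos_ref, vel_act=vel_ref)
print(setpoint)
```

Since the application knows acceleration at every instant, most of the torque comes from the feedforward term and the feedback terms only correct for modelling errors.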

The above comes with one major caveat: is torque the lowest level of control? Actually it is not. For any smooth movement we also need to control jerk. And this means even CST is not the optimum control mode. We would like to control the derivative of torque too (jerk). Luckily the inertia of many practical servo implementations will filter out the step-like changes in torque. This is true as long as the cyclic update rate of torque is much higher than the mechanical bandwidth that results from that inertia. A nice benefit of jerk control is that it will also reduce mechanical vibrations in your application by a large amount.
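
One simple way to bound jerk from the application side is to limit how much the torque setpoint may change per cycle, since for a fixed inertia the rate of change of torque is proportional to jerk. The sketch below is only one possible approach; `limit_torque_rate` is a hypothetical helper and the step size of 25 per mille per cycle is an arbitrary assumption.

```python
def limit_torque_rate(previous, requested, max_step):
    """Move from `previous` toward `requested` by at most `max_step` per cycle."""
    delta = max(-max_step, min(max_step, requested - previous))
    return previous + delta

# Square wave request in per mille of rated torque: +100, then -100
requested = [100] * 6 + [-100] * 10
shaped = []
torque = 0
for value in requested:
    torque = limit_torque_rate(torque, value, max_step=25)
    shaped.append(torque)
print(shaped)
```

The step-like request becomes a ramp, trading a few cycles of delay for bounded jerk.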

High end servo controls that need high dynamic range not only employ jerk but also snap control.
References:
https://en.wikipedia.org/wiki/Jerk_(physics)
https://en.wikipedia.org/wiki/Fourth,_fifth,_and_sixth_derivatives_of_position

@WillyTuring
Author

Hi Arthur,
Thanks for such an informative response. What you said about the cycle delay and the downsides of using CSP and CSV as control modes absolutely hit the nail on the head; I agree with you 100%.

I'm sorry this has deviated a little from the gist of SOEM, but since you are probably the best person I know online who knows what he's talking about regarding control in general, I will certainly keep posting updates here to see where it ends.

Be well and stay safe, catch up with you next time.

@ArthurKetels
Contributor

Happy to be of service. But I will now close this thread. You can open another one when you have new information to share.
