Multi mcu homing #3956

Merged
merged 1 commit into from
Aug 28, 2021

KevinOConnor
Collaborator

This PR adds initial (currently incomplete) support for homing and probing when steppers and their endstops are on different micro-controllers. As an example, this may be useful with "toolhead boards" that have a Z probe attached, but don't directly control the Z stepper motors.

The main challenge with multi-mcu homing is having a reliable way to stop the steppers, even if there is a communication fault during homing. (Otherwise, if the mcu with the endstop is unable to communicate with the mcu controlling the steppers, the steppers may continue to move the carriage past the endstop, potentially causing significant damage.) To account for this issue, the code in this PR uses a system where each micro-controller announces that it is still active every 10ms and requires that it receive a message indicating all other mcus are still active at least every 25ms. Should an mcu not receive that response, it will stop the homing sequence (stepper movement will cease). Thus, even in the event of a communication failure, the steppers should not overshoot by more than 25ms of movement (e.g., 0.500mm if homing at 20mm/s).
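The timeout scheme above can be sketched as a small simulation. This is a hypothetical model, not Klipper's implementation. It assumes the following: heartbeats arrive exactly every 10ms until the link fails, normal message latency is ignored, and the function names are illustrative. It shows why the worst-case overshoot is the 25ms timeout multiplied by the homing speed.

```python
# Hypothetical model of the heartbeat/timeout scheme described above.
# Not Klipper's implementation; names and timing model are illustrative.
REPORT_INTERVAL_MS = 10  # each mcu announces "still active" every 10ms
TIMEOUT_MS = 25          # homing stops if no confirmation arrives within 25ms


def homing_stop_time_ms(link_fails_at_ms):
    """Time at which the stepper mcu halts if the link fails at the given time."""
    # Last heartbeat sent (and received) at or before the link failure.
    last_ok = (link_fails_at_ms // REPORT_INTERVAL_MS) * REPORT_INTERVAL_MS
    # No further confirmations arrive, so the watchdog expires TIMEOUT_MS later.
    return last_ok + TIMEOUT_MS


def overshoot_mm(trigger_at_ms, link_fails_at_ms, speed_mm_s):
    """Carriage travel past the endstop trigger point."""
    if trigger_at_ms < link_fails_at_ms:
        # The trigger message got through; assume the steppers stop at once.
        return 0.0
    stop_ms = homing_stop_time_ms(link_fails_at_ms)
    # The steppers keep moving from the trigger until the watchdog expires.
    return max(0, stop_ms - trigger_at_ms) * speed_mm_s / 1000.0


if __name__ == "__main__":
    # Link fails at 103ms, endstop triggers at 104ms, homing at 20mm/s.
    print(overshoot_mm(104, 103, 20))  # 0.42
    # Worst case: trigger right at the last heartbeat -> 25ms of travel.
    print(overshoot_mm(100, 100, 20))  # 0.5
```

Because the last successful heartbeat can never be later than the trigger in this model, the travel after the trigger is capped at 25ms of movement.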

This PR is a work-in-progress. The current code does "work" in that it will home, but it is currently too inaccurate to be used for printing. This is an early release that other developers can look at should they be interested.

Some known issues:

  1. To improve accuracy the code will also need to change how it calculates the final homing position. The current code uses the final stepper position, but that will need to change to the position that the stepper was at when the endstop signal occurred. (With the code on this PR, there could be some small movement after the endstop signal is received, but prior to the propagation of that signal.)
  2. If a multi-stepper axis (e.g., stepper_z1) is used and those steppers are on different mcus, then this code does not currently handle that accurately. (The propagation time of the endstop signal may be different for each mcu, and thus the overshoot on each mcu would need to be accounted for.)
  3. The code on this branch may not behave correctly if one were to use multi-mcu homing on the endstops of a delta printer. (Should a communication timeout occur, it may only stop one of the towers instead of stopping all three towers.)
  4. The overshoot during multi-mcu homing is subtle. It may make sense to add a "max overshoot" config parameter to ensure users are aware of the implications of enabling multi-mcu homing.
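Known issue 2 comes down to simple arithmetic. Suppose two steppers of one axis are driven by different mcus, and the stop signal reaches them after different delays. They then travel different distances after the trigger. The following is a hypothetical sketch; the delay values are made up for illustration.

```python
# Hypothetical illustration of known issue 2: two steppers on the same axis,
# driven by different mcus, stop after different signal propagation delays.
def travel_after_trigger_mm(speed_mm_s, delay_s):
    """Distance a stepper keeps moving before the stop signal reaches its mcu."""
    return speed_mm_s * delay_s


def sync_error_mm(speed_mm_s, delay_a_s, delay_b_s):
    """How far apart the two steppers end up after a single home/probe."""
    return abs(travel_after_trigger_mm(speed_mm_s, delay_a_s)
               - travel_after_trigger_mm(speed_mm_s, delay_b_s))


if __name__ == "__main__":
    # e.g. stepper_z on one mcu (2ms delay), stepper_z1 on another (7ms delay)
    print("%.3f mm" % sync_error_mm(10.0, 0.002, 0.007))  # 0.050 mm
```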

Despite the above limitations, this is a fairly significant feature addition to Klipper. I think it may enable some very interesting hardware capabilities in the future.

@mattshepcar - fyi.

-Kevin

@KevinOConnor KevinOConnor force-pushed the work-homing-20210217 branch 3 times, most recently from 7023066 to 5b1a2bd, February 18, 2021 23:49
@ArkadiuszRaj
Contributor

ArkadiuszRaj commented Feb 19, 2021

Hi @KevinOConnor

I have modified Pontus's great tiny toolboard design, changing it to use an STM32G04 MCU and an easy high-voltage inductive probe connection. I chose the G04 mainly for FD-CAN (expect another PR on stm32g4 support ;) ).

During the rebuild of one of my Vorons, I plan to use a set of 7 Huvud boards:

  • 4 of them installed on the Z motors (TMC2209, soft endstop & motor temperature)
  • 2 for A-B motors, same config as Z ones
  • 1 as toolboard (extruder motor, filament stop, z inductive probe, part cooling fan, hotend heater, temp sensor and fan)

Everything will be connected using something similar to a ring network with CAN / FD-CAN, talking to an RPi Zero via a USB-CAN dongle as Pontus suggested (CAN2), at least for the beginning.

Will that be a good test bed for this PR?

@KevinOConnor
Collaborator Author

During the rebuild of one of my Vorons, I plan to use a set of 7 Huvud boards:

FYI, that's similar to my current setup. I have a Voron 2.4 with 3 Huvud boards - one on the toolhead and two controlling the CoreXY motors. I did purchase additional Huvud boards with the intent of converting the Z motors to them, but ultimately decided to use a low-cost SKR board for the Z instead (I'm using the skr-mini-mz).

I'm not sure which USB2CAN board you are planning on using, but I would not recommend purchasing the Innomaker board. It is reordering packets, and it appears the STM32 chip on it has been locked down so that no firmware fixes are possible.

-Kevin

@ArkadiuszRaj
Contributor

ArkadiuszRaj commented Feb 19, 2021

Well, I have discussed that with Pontus, and he gave me some insight on that topic. Quoting him:
https://github.com/candle-usb/candleLight_fw is a brilliant firmware for a USB-CAN dongle. Linux has native support for that CAN USB adapter standard. Boards running it can be a bit tricky to find, though; there are some on AliExpress, and CANtact is one of them. I modified an old Huvud board, put an STM32F042 on it, and use it as a CAN-USB adapter; it works really well :)

Ultimately, I can design and order from JLCPCB such a simple dongle having only an F042 and an FDCAN transceiver.

@brandonheller

Looks like really useful code with some interesting problems to solve, and I'd love to try it out.

One Q: Do you have any sense of the amount of inaccuracy to expect, related to known issue 1 above? Presumably it is small as long as the time between the endstop triggering and the change being detected is small... but how large could that amount be?

Thanks!

@pecirep

pecirep commented Feb 23, 2021

amount of inaccuracy to potentially expect

It's timing based, so it depends on your homing speed. Assuming up to 25ms, this means 0.25mm at 10mm/s, 0.5mm at 20mm/s...

@WardBenjamin

Well, I have discussed that with Pontus, and he gave me some insight on that topic. Quoting him:
https://github.com/candle-usb/candleLight_fw is a brilliant firmware for a USB-CAN dongle. Linux has native support for that CAN USB adapter standard. Boards running it can be a bit tricky to find, though; there are some on AliExpress, and CANtact is one of them. I modified an old Huvud board, put an STM32F042 on it, and use it as a CAN-USB adapter; it works really well :)

Ultimately, I can design and order from JLCPCB such a simple dongle having only an F042 and an FDCAN transceiver.

Unfortunately, the STM32F042 is out of stock virtually everywhere right now. You will need to use a G0/G4 with CAN and write some custom firmware. Fortunately, a decent amount of candleLight is portable.

@brandonheller

I've got a Huvud doing all it needs to print, and this would remove 2 wires used for the X endstop. I've added those notes here:
https://www.notion.so/Instructions-Getting-a-Huvud-Toolhead-Working-with-Klipper-611ebb87efba4e22ae3b0641e5267ab3

... but I'm hesitant to try out this branch.
(1) The original description says this is an incomplete PR; I'm not sure if that is from a performance/safety perspective or from a basic-functionality one.
(2) It currently has a merge conflict.

@KevinOConnor - would you be up to rebase this on master now that the CAN improvements are merged? Thanks for those improvements, BTW. Worked great for me on the first try today. This would be icing on that cake.

@KevinOConnor
Collaborator Author

FYI, I rebased this branch. Note though, that accuracy problems are still present - I would not recommend using multi-mcu homing/probing on Z.

-Kevin

@ETE-Design

ETE-Design commented Mar 15, 2021

@KevinOConnor Will this be a general problem when using multi-mcu for Z probing, or is it just for now, and will it be possible to fix in the near future? Really looking forward to this feature :-)

@KevinOConnor
Collaborator Author

This branch is still very much a "work in progress" - its intended audience is other developers. I hope to fix all of its deficiencies. I'll only merge it into the master branch when it is ready.

-Kevin

@brandonheller

tl;dr: Works For Me.

I rebased atop today's master with no issues, recompiled for a Huvud and an SKR to get past an endstop format error, restarted to avoid a "No buffer space available" error for the CAN interface, and updated my Klipper config:

[stepper_x]
...
# With no toolhead:
#endstop_pin: PC0
# With toolhead:
endstop_pin: head0:PA1 

... and homing seems to work fine. 2 more wires down on the toolhead. Thanks!
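The config change above works because Klipper pin names may carry a micro-controller prefix. Here that prefix is `head0:`, presumably naming an additional `[mcu head0]` section. The sketch below uses hypothetical helpers, not Klipper's own pin parser. It shows how such a prefix identifies the case where the endstop and its stepper live on different mcus, which is when multi-mcu homing comes into play.

```python
# Hypothetical helpers (not Klipper's pins.py): split a pin such as
# "head0:PA1" into (mcu name, pin) and detect multi-mcu homing.
def split_pin(pin_desc, default_mcu="mcu"):
    """Return (mcu_name, pin); unprefixed pins belong to the main mcu."""
    # Drop leading pull-up/invert modifiers such as "^" or "!".
    desc = pin_desc.strip().lstrip("^~!").strip()
    if ":" in desc:
        mcu_name, pin = desc.split(":", 1)
        return mcu_name.strip(), pin.strip()
    return default_mcu, desc


def is_multi_mcu_homing(endstop_pin, stepper_pins):
    """True if any stepper pin is on a different mcu than the endstop."""
    endstop_mcu = split_pin(endstop_pin)[0]
    return any(split_pin(p)[0] != endstop_mcu for p in stepper_pins)


if __name__ == "__main__":
    # Stepper pins here are made-up examples on the main mcu.
    print(is_multi_mcu_homing("PC0", ["PB13", "PB12"]))        # False
    print(is_multi_mcu_homing("head0:PA1", ["PB13", "PB12"]))  # True
```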

@KevinOConnor
Collaborator Author

KevinOConnor commented Apr 5, 2021

I have rebased this branch and made several improvements:

  1. Multi-mcu homing and probing should now be accurate.
  2. Multi-mcu homing behaviour should be okay even for homing on a delta printer. (In the event an endstop has a comms timeout, the homing sequence for the other towers should also now abort within ~250ms.)
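The delta behaviour in item 2 can be modelled with a hypothetical sketch. A normal endstop trigger stops only its own tower, while a communication timeout on any one endstop aborts homing for all towers. The class and method names are illustrative and do not correspond to Klipper's code, and the ~250ms abort latency is not modelled.

```python
# Hypothetical model of multi-mcu homing on a delta; names are illustrative.
class DeltaHomingSession:
    def __init__(self, towers):
        self.moving = set(towers)  # towers still moving toward their endstops
        self.aborted = False
        self.abort_source = None

    def on_trigger(self, tower):
        # A normal endstop hit stops only the tower that triggered.
        self.moving.discard(tower)

    def on_comms_timeout(self, tower):
        # A comms timeout on any endstop aborts homing for every tower,
        # so no tower keeps moving without a working endstop.
        self.aborted = True
        self.abort_source = tower
        self.moving.clear()


if __name__ == "__main__":
    session = DeltaHomingSession(["stepper_a", "stepper_b", "stepper_c"])
    session.on_trigger("stepper_a")
    print(sorted(session.moving))  # ['stepper_b', 'stepper_c']
    session.on_comms_timeout("stepper_b")
    print(session.aborted, session.moving)  # True set()
```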

This new code should, hopefully, be testable on a number of different printers. However, there are still some caveats with this code:

  1. If one does multi-mcu homing on a multi-stepper axis, and the steppers on that axis are also on separate mcus, then this code will not handle it correctly. (After each home/probe attempt, the steppers may become out of sync.)
  2. The endstop_phases code does not work correctly when using multi-mcu homing.
  3. The mcu code will not compile on the Beaglebone PRU due to code size restrictions on that mcu. Future infrastructure work will be needed on the PRU to free up space on that platform.

This code continues to use a 25ms max timeout when implementing multi-mcu homing. That is, when using multi-mcu homing/probing, it is possible for the carriage to continue moving even after the endstop/probe has signaled. For example, if homing at 10mm/s, the carriage may overshoot by as much as 0.250mm. This timeout is not configurable, and I'm leaning towards not making it configurable (it's unclear that a user can reasonably choose an alternate setting). The new code will determine the overshoot and (assuming it does not cause a mechanical failure) account for it, so the overshoot should not negatively impact accuracy.

If you run tests with this new code, please let me know your results (success or failure).

-Kevin

@brandonheller

@KevinOConnor I tried out the updated branch, and the code no longer works for me.

With the printer in the middle, X and Y homing work fine. But then at the extent of motion (back right), with endstops already triggered, when Y or Z homing is triggered, I get a 'move out of range' error and motors shut off and make a horrible skipping noise:

Move out of range: 118.037 112.500 0.000 [0.000]

This is a CoreXY with a slightly-less-than-120mm square bed (Voron 0). Probably relevant:

```
[stepper_x]
...
position_endstop: 118
position_max: 118

[stepper_y]
...
position_endstop: 117.5
position_max: 117.5
```

It does look like the code is compensating for the variability, as I can see the X value varying between 118.03-118.05 after X-only homing. Increasing position_max a bit doesn't seem to help.

To revert, it looks like I'll have to reload the firmware on the SKR and the Huvud... argh:

Command format mismatch: endstop_state ..

At least I'm now saving the SKR firmwares, so the build/menuconfig cycle doesn't add delay.

@revilo196
Contributor

revilo196 commented Apr 12, 2021

@KevinOConnor I tried out the multi-mcu homing, using a probe (BL-Touch) and the X-axis endstop on a different MCU, with a YXZ homing sequence.

While homing Y, everything works as normal.
But as soon as X homing starts, Klipper goes into shutdown.

I'm getting:

Unable to obtain 'trsync_state' response
MCU 'tool' shutdown: Timer too close

I'm using a 500kbit CAN bus for both MCUs.
I also tested at 1Mbit.

klippy.log

@KevinOConnor
Collaborator Author

I get a 'move out of range' error

Okay - that will need to be fixed. In the interim, you should be able to update the homing sequence so that each axis is individually homed and then backed away from the endstop prior to homing any other axis.

motors shut off and make a horrible skipping noise

Not sure why you'd get skipped steps in that case though.

-Kevin

@KevinOConnor
Collaborator Author

Im getting:
Unable to obtain 'trsync_state' response
MCU 'tool' shutdown: Timer too close

Okay, thanks. The code definitely did something wrong. I'll try to track it down.

-Kevin

@brandonheller

@KevinOConnor For whatever reason, the code drives the motors when already homed, in either +Y or +X, which would both trigger skips. I can test this without the motors actually connected, to see if this is CoreXY-specific without the horrible noises. Let me know if that would be useful; it should be easy enough after reflashing the code.

Has this code been tested on CoreXY yet? Thanks.

@KevinOConnor
Collaborator Author

Has this code been tested on CoreXY yet?

Yes - I tested the code on my Voron 2.4 printer, with multi-mcu homing and each XY stepper on its own Huvud board.

-Kevin

@KevinOConnor
Collaborator Author

I've rebased this branch and fixed some bugs. I think the "timer too close" issue should now be fixed.

There is still the issue of an overshoot during homing causing "invalid move" errors. To workaround this issue for now, either 1) always back off from the axis after homing it, or 2) increase the position_min/position_max of the axis so that the valid range is a little past the position_endstop.
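A simplified range check illustrates how an overshoot can lead to "invalid move" errors, and why both workarounds help. This is a hypothetical stand-in for Klipper's limit checking, not its actual code. It uses the numbers from the earlier Voron 0 report (position_max 118 on X, 117.5 on Y); the minimum limits and the Z limits are assumed for illustration.

```python
# Simplified, hypothetical range check (not Klipper's toolhead code).
def move_in_range(end_pos, limits):
    """True if every axis of end_pos lies within its (min, max) limits."""
    return all(lo <= pos <= hi for pos, (lo, hi) in zip(end_pos, limits))


# X/Y maxima from the report above; minima and Z limits assumed.
LIMITS = [(0.0, 118.0), (0.0, 117.5), (0.0, 120.0)]
# Nominal position after homing X: slightly past position_endstop (118).
HOMED = [118.037, 112.5, 0.0]

if __name__ == "__main__":
    # A later move that leaves X where it is gets rejected.
    print(move_in_range(HOMED, LIMITS))  # False
    # Workaround 1: back off from the endstop before homing other axes.
    backed_off = [HOMED[0] - 5.0, HOMED[1], HOMED[2]]
    print(move_in_range(backed_off, LIMITS))  # True
    # Workaround 2: allow a little travel past position_endstop.
    widened = [(0.0, 118.5)] + LIMITS[1:]
    print(move_in_range(HOMED, widened))  # True
```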

-Kevin

@revilo196
Contributor

I can confirm the "timer too close" issue is fixed; it now works fine for me.
Tested with the same setup as above (BL-Touch and X-axis endstop).
I will keep testing.
Good work!

@GerogeFu
Contributor

@KevinOConnor Hi, Kevin. I ran into a homing issue too. First I moved the head to the middle of the machine, then ran QUAD_GANTRY_LEVEL. When it finished, I ran G28, and it reported "!! Move out of range: 350.038 345.000 10.000 [0.000]". After that I tried more commands like "G28 X", "G28 Y", "FIRMWARE_RESTART", and "RESTART", and finally it could home. I've attached my log; I hope it helps you debug.
klippy.log

@eddietheengineer

@Tircown @KevinOConnor I'm running this branch on a printer with hybrid-corexy kinematics (IDEX) and two CAN Huvud toolhead boards. The X endstops are mounted to the toolheads and routed through the Huvud.

When I try to home X, the first toolhead moves toward the left (X0) as expected, triggers the endstop, retracts, and triggers it again. However, when it triggers the second time, I get the following error:

Internal error on command:"G28"
Once the underlying issue is corrected, use the
"FIRMWARE_RESTART" command to reset the firmware, reload the
config, and restart the host software.
Printer is shutdown

If I switch the first toolhead to a sensorless X home and leave the second toolhead with the toolhead MCU endstop, the first toolhead homes properly, but the second toolhead errors out in the same way as the first case (triggers the endstop once, retracts, and faults out the second time it homes). I tried slowing down the triggering and slowing down the retract speed (to add a bit more delay before it homes the second time), but it didn't change the outcome.

I assume something is interacting between the IDEX X axis homing routine and the multi-mcu homing. For now I'll just run extra wires for the X endstops, but I wanted to document it for reference.

klippy (40).log

@eddietheengineer

@Tircown's pull request fixed the issue with idex_modes.py!

@KevinOConnor
Collaborator Author

@GerogeFu - sorry I missed your message earlier. I'm not sure why that would be. It is possible on this branch for the toolhead to nominally end outside the build area with this change. But, I'm not sure what would lead to getting the "move out of range" error. If you can recreate, can you issue an M112 command immediately after the problem occurs, and then attach that log here?

Separately, it should be possible to avoid the issue by changing your start scripts to move to a valid position after a G28. However, I'd still be interested in finding out what sequence of commands triggers that error.

-Kevin

@KevinOConnor
Collaborator Author

FYI, I plan to merge the next set of changes from this branch. Specifically, I plan to merge the "homing based on timing" part of this change in the next few days. The current Klipper code calculates the home position based on where the steppers are located after homing; with this next merge, Klipper will calculate the home position based on the time that the endstop switch triggers.
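The timing-based approach can be sketched with a single constant-velocity homing move. This is a simplified, hypothetical model that ignores acceleration and is not Klipper's code. The home position is tied to where the carriage was at the trigger time, and any extra travel until the steppers halt is added on top. In the old approach, by contrast, that extra travel would silently become homing error. This may also explain how a nominal position slightly past position_endstop can appear, such as the 118.037 reported earlier.

```python
from dataclasses import dataclass


@dataclass
class HomingMove:
    """Hypothetical constant-velocity homing move (acceleration ignored)."""
    start_time: float  # seconds
    start_pos: float   # mm, in stepper coordinates
    velocity: float    # mm/s, positive when homing toward the maximum

    def position_at(self, t):
        return self.start_pos + self.velocity * (t - self.start_time)


def position_after_homing(move, trigger_time, stop_time, position_endstop):
    """Toolhead position to report once the steppers have halted.

    The endstop position is tied to where the carriage was at trigger_time;
    any travel between the trigger and the actual stop is added on top.
    """
    overshoot = move.position_at(stop_time) - move.position_at(trigger_time)
    return position_endstop + overshoot


if __name__ == "__main__":
    move = HomingMove(start_time=0.0, start_pos=0.0, velocity=10.0)
    # Endstop triggers at t=5.0s; the steppers halt 3.7ms later.
    pos = position_after_homing(move, 5.0, 5.0037, 118.0)
    print(round(pos, 3))  # 118.037
```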

The above merge will not include multi-mcu support, but the remaining changes after that are small.

-Kevin

@arkeet
Contributor

arkeet commented Aug 26, 2021

Since 3814a13 I've seen a couple of reports from Voron users of "move out of range" errors after homing. Instead of requiring changes to the post-homing procedure as mentioned in Config_Changes, I think this could be solved by having the homing routine itself move the toolhead back to the endstop position. Would this be a difficult thing to do?

@KevinOConnor
Collaborator Author

If anyone is getting strange "move out of range" errors, please make sure you are on the latest code. If the problem recurs, run M112 immediately after the error occurs and attach the full log here.

There are workarounds possible, but I'd prefer to have a good understanding of why these errors are occurring before deploying a countermeasure.

-Kevin

@mtw3d
Contributor

mtw3d commented Aug 26, 2021

Am I correct that this has now been fully pushed to the main branch? If so, how do I switch back to that branch?

@lijgame

lijgame commented Aug 27, 2021

If anyone is getting strange "move out of range errors," please make sure you are on the latest code. If the problem reoccurs, run M112 immediately after the error occurs and attach the full log here.

There are workarounds possible, but I'd prefer to have a good understanding of why these errors are occurring before deploying a countermeasure.

-Kevin

My Voron 2.4 runs on a single Spider board. I ran into this "move out of range" error when I was doing the thermal expansion test.
I was running a fairly recent Klipper (up to date as of 19 Aug, commit efbb704) plus the frame-expansion changes from @alchemyEngine.
Here is the Klipper log file I saved that time:
klippy (5).log

Let me know if you need more testing from me.

-Li Jiang

@zellneralex
Contributor

zellneralex commented Aug 27, 2021

Am I correct that this has now been fully pushed to the main branch? If so, how do I switch back to that branch?

As Kevin mentioned “The above merge will not include multi-mcu support, but the remaining changes after that are small.”

So it is not fully merged yet. If you want to go back to master, ssh into your Pi and run:
cd klipper
sudo service klipper stop
git checkout master
sudo service klipper start

@mtw3d
Contributor

mtw3d commented Aug 27, 2021

As Kevin mentioned “The above merge will not include multi-mcu support, but the remaining changes after that are small.”

so not fully merged yet. If you want go Bach to master ssh in your pi:

He said that in reference to "the above merge", but the next day he merged

"mcu: Support multi-mcu homing" (3a73497)

Support endstops and probes attached to a different micro-controller than their associated steppers.

I don't want to go back to master unless it supports multi-mcu homing, but that's not clear to me.

@KevinOConnor KevinOConnor force-pushed the work-homing-20210217 branch 3 times, most recently from 247bb64 to 2893e83, August 27, 2021 21:33
Support endstops and probes attached to a different micro-controller
than their associated steppers.

Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
@KevinOConnor KevinOConnor merged commit 9504778 into master Aug 28, 2021
@KevinOConnor KevinOConnor deleted the work-homing-20210217 branch August 28, 2021 20:20
@KevinOConnor
Collaborator Author

FYI, I just merged the last of the changes on this branch in to the master branch (commit 9504778).

Of the original issues identified, "If a multi-stepper axis (e.g., stepper_z1) is used and those steppers are on different mcus, then this code does not currently handle that accurately" was not addressed in the code in this merge. Instead, the code will raise an error if one attempts to configure multi-mcu homing mixed with a multi-mcu stepper axis. Support for combining multi-mcu stepper axes and multi-mcu homing can be added at a later date if there is interest in that.
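The check described above can be sketched as a hypothetical function that rejects a configuration mixing multi-mcu homing with a stepper axis spread across several mcus. The function, exception, and message are illustrative only; Klipper's actual check and error text may differ.

```python
# Hypothetical configuration check; names and message are illustrative.
class ConfigError(Exception):
    pass


def check_axis_homing(endstop_mcu, stepper_mcus):
    """Reject multi-mcu homing on an axis whose steppers span several mcus."""
    mcus = set(stepper_mcus)
    multi_mcu_axis = len(mcus) > 1
    # Homing is multi-mcu if any stepper is on a different mcu than the endstop.
    multi_mcu_homing = mcus != {endstop_mcu}
    if multi_mcu_homing and multi_mcu_axis:
        raise ConfigError("multi-mcu homing with a multi-mcu stepper axis"
                          " is not supported")


if __name__ == "__main__":
    check_axis_homing("mcu", ["mcu", "mcu"])  # single mcu: accepted
    check_axis_homing("head0", ["mcu"])       # probe on toolhead board: accepted
    try:
        check_axis_homing("mcu", ["mcu", "zboard"])
    except ConfigError as e:
        print("rejected:", e)
```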

-Kevin

@brandonheller

Thanks Kevin! This is really exciting, as it means I can put both a Z probe (Klicky) and the X endstop together on a toolhead board (Huvud), and not be managing off-master code. 4 toolhead wires go away, bringing the total from 9 down to 5 for me (CANH, CANL, 24V, 5V, GND).

Nice.

And a Q re: the multi-stepper axis limitation... I've never seen a >1-stepper printer board that doesn't have at least 4 drivers, and even with QGL or TBL you're covered.

Is the current limitation you describe only going to affect those with individual-motor boards on each Z axis then? Unless we live in a future with single-motor-integrated toolhead boards, I can't think of a use case.

@robthide37

robthide37 commented Aug 30, 2021 via email

@KevinOConnor
Collaborator Author

Mixing multi-mcu-homing and multi-mcu-axis would be a rare case. I don't know of anyone doing that today.

-Kevin

@robthide37

robthide37 commented Aug 31, 2021 via email

@TP-SpeeDj

klippy.log
Here are my log files; I hope they help.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 14, 2022