Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Host Keepalive serial output with multi-serial #21283

Conversation

rhapsodyv
Copy link
Member

@rhapsodyv rhapsodyv commented Mar 8, 2021

Description

This is a continuation of the stress tests I'm doing on multi serial, to fix corner cases.

I found an issue that happens in a very specific combination of events, but that can happen very often on a real printing operation: multi serial + idle stacking (slow commands that call idle until finished) + keep alive code.

When marlin is executing some slow commands, it may stay in a loop waiting for it to complete. Inside that loop, marlin may call idle periodically. So, the idle will check for new serial commands (in the other serial) to enqueue. When it receive the command, it will reply "busy" (on keep alive function), but it send to the wrong serial port.

Call stack:

idle
  process_next_command
    Gxxx
       while(something)
          idle
            keepalive (sent to the current serial)
            other serial tries to send data, thinks it succeeds, but marlin simply doesn't reply (until the actual commands)
            data loss and wrong behavior on the first serial (it will cause resend on serial 1)

I could simulate resend commands on octoprint this way. And I fixed making the keep alive reply "busy" for all serial ports...

That in fact it seems correct, because marlin will only handle one command at time, so the keep alive should warns all serial ports that marlin is busy, to avoid timeout or false-resend commands everywhere.

Benefits

More robust multi serial.
May fix #21010 and #21244

Related Issues

#21249

@thinkyhead
Copy link
Member

It looks like multiSerial.portMask is being left uninitialized (zero) so the fallback for SERIAL_REDIRECT is no ports. The fallback should probably be 0xFF, or SERIAL_ALL, and this might actually be the correct fix.

@rhapsodyv
Copy link
Member Author

It looks like multiSerial.portMask is being left uninitialized (zero) so the fallback for SERIAL_REDIRECT is no ports. The fallback should probably be 0xFF, or SERIAL_ALL, and this might actually be the correct fix.

yes, can be. But remember that the second idle is called inside another running command.... so, it may set the serial port to the current command... and the other command just hang (it's what happens).

@rhapsodyv
Copy link
Member Author

rhapsodyv commented Mar 8, 2021

Test:

Octo "printing" in one serial 3 (only g1 x2 x2)
Screen Shot 2021-03-08 at 01 18 34

Then, in another serial (serial 1), I send a long move, to generate a wait on planner, to it call idle and get a keep alive:
Screen Shot 2021-03-08 at 01 18 48

This is what happens on octo:

Send: N4848 G1 X2.372 Y2.8189999*30
Recv: ok
Send: N4849 G1 X2.372 Y2.8189999*31
Recv:  T:-15.00 /0.00 @:0
Recv: echo:busy: processing. <<<<-receive the busy
Printer seems to support the busy protocol, will adjust timeouts and set busy interval accordingly
Recv:  T:-15.00 /0.00 @:0
Recv: ok
Send: N4850 M113 S2*88
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-14.92 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0
Recv:  T:-15.00 /0.00 @:0

Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.

Send: N4851 M105*31
Recv: Error:checksum mismatch, Last Line: 4849
Recv: Resend: 4850
Send: N4850 M113 S2*88
Recv: ok
Send: N4851 M105*31
Recv: ok T:-15.00 /0.00 @:0

Send: N4852 G1 X2.372 Y2.8189999*21

After adding the like that send busy for all serial ports, octo print will get hang or get timeout

@rhapsodyv
Copy link
Member Author

rhapsodyv commented Mar 8, 2021

This behaviour happened always in my tests: every time I generate a wait having commands for another serial, one serial loses a command and get timeout. With busy broadcast, I got no more resend erros on octo, even if I send commands in loop to the two serials, generating lots of busy, etc.

@thinkyhead
Copy link
Member

thinkyhead commented Mar 8, 2021

But, PORT_REDIRECT carries its own "stack" with it to ensure that everything is using the port it expects. It creates an object that restores the previous serial mask when it goes out of scope. If it isn't working now, maybe it's not restoring the previous value properly.

@rhapsodyv
Copy link
Member Author

rhapsodyv commented Mar 8, 2021

But, SERIAL_REDIRECT carries its own "stack" with it to ensure that everything is using the port it expects. It creates an object that restores the previous serial mask when it goes out of scope. If it isn't working now, maybe it's not restoring the previous value properly.

Yes, but its value will be the last value set, right? That is the value of the current command being executed.

You can easy test it:
in the serial 1, send a M0. This serial will be receiving paused by the user every "keep alive seconds"

then, In the serial 2, send a M114... Marlin will not respond anything to this serial, never..... even if, in fact, marlin is currently busy... So, this serial will timeout, if the command is queued... so, after the serial 1 release, marlin will execute the M114, and the host may send it again!!

That is the reason I sent the PR, because makes sense, for me, that keep alive should broadcast (as all host commands).

But yes, maybe we have more hidden issues hehehe...

@thinkyhead
Copy link
Member

The current command does redirect its output, and presumably it wants to also redirect the output of anything within idle that is intended as a message about the current state. Things called by idle() will be responsible for redirecting to SERIAL_ALL if they need to. When the running command is complete it should be restoring the serial port mask back to the default (i.e., SERIAL_ALL).

@thinkyhead
Copy link
Member

…so you are correct to redirect the output of keepalive.

@thinkyhead
Copy link
Member

And now, to fix this exp32 compile before I go ahead and merge….

@rhapsodyv
Copy link
Member Author

I found ... the lib arduinoWebSockets was updated to use the lasts code from espressif32... but we are using espressif32@2.1.0.... Or we try to use old version of arduinoWebSockets or new version of espressif32...

we can point arduinoWebSockets to one version old and it will work...
On features, point arduinoWebSockets to 2.3.4 (current is 2.3.5, committed yesterday)

arduinoWebSockets=https://github.com/Links2004/arduinoWebSockets/archive/2.3.4.zip

@thinkyhead thinkyhead changed the title Fix Keep Alive with Multi Serial Fix Host Keepalive serial output with multi-serial Mar 8, 2021
@thinkyhead thinkyhead merged commit 1b9ff68 into MarlinFirmware:bugfix-2.0.x Mar 8, 2021
@adolson
Copy link

adolson commented Mar 8, 2021

bugfix-2.0.x fails to compile in Arduino IDE after this merge. Was working yesterday. How do I fix this?

In file included from sketch/src/HAL/ESP32/FlushableHardwareSerial.cpp:23:0:
sketch/src/HAL/ESP32/FlushableHardwareSerial.h: In constructor ‘FlushableHardwareSerial::FlushableHardwareSerial(int)’:
sketch/src/HAL/ESP32/FlushableHardwareSerial.h:31:64: error: no matching function for call to ‘HardwareSerial::HardwareSerial(int&)’
   FlushableHardwareSerial(int uart_nr) : HardwareSerial(uart_nr) {}
                                                                ^
In file included from sketch/src/HAL/ESP32/FlushableHardwareSerial.h:24:0,
                 from sketch/src/HAL/ESP32/FlushableHardwareSerial.cpp:23:
/usr/share/arduino/hardware/arduino/avr/cores/arduino/HardwareSerial.h:117:12: note: candidate: HardwareSerial::HardwareSerial(volatile uint8_t*, volatile uint8_t*, volatile uint8_t*, volatile uint8_t*, volatile uint8_t*, volatile uint8_t*)
     inline HardwareSerial(
            ^
/usr/share/arduino/hardware/arduino/avr/cores/arduino/HardwareSerial.h:117:12: note:   candidate expects 6 arguments, 1 provided
/usr/share/arduino/hardware/arduino/avr/cores/arduino/HardwareSerial.h:93:7: note: candidate: constexpr HardwareSerial::HardwareSerial(const HardwareSerial&)
 class HardwareSerial : public Stream
       ^
/usr/share/arduino/hardware/arduino/avr/cores/arduino/HardwareSerial.h:93:7: note:   no known conversion for argument 1 from ‘int’ to ‘const HardwareSerial&’
/usr/share/arduino/hardware/arduino/avr/cores/arduino/HardwareSerial.h:93:7: note: candidate: constexpr HardwareSerial::HardwareSerial(HardwareSerial&&)
/usr/share/arduino/hardware/arduino/avr/cores/arduino/HardwareSerial.h:93:7: note:   no known conversion for argument 1 from ‘int’ to ‘HardwareSerial&&’
exit status 1
Error compiling for board Arduino Mega or Mega 2560.

@rhapsodyv
Copy link
Member Author

@adolson take a look if it solves: https://github.com/rhapsodyv/Marlin/tree/21283-followup

@adolson
Copy link

adolson commented Mar 8, 2021

Yes, that compiles successfully.

thinkyhead added a commit that referenced this pull request Mar 9, 2021
Followup to #21283

Co-authored-by: Scott Lahteine <github@thinkyhead.com>
thinkyhead added a commit that referenced this pull request Mar 9, 2021
Followup to #21283

Co-authored-by: Scott Lahteine <github@thinkyhead.com>
vyacheslav-shubin pushed a commit to vyacheslav-shubin/Marlin that referenced this pull request Mar 10, 2021
Co-authored-by: Scott Lahteine <github@thinkyhead.com>
vyacheslav-shubin pushed a commit to vyacheslav-shubin/Marlin that referenced this pull request Mar 10, 2021
Followup to MarlinFirmware#21283

Co-authored-by: Scott Lahteine <github@thinkyhead.com>
vyacheslav-shubin pushed a commit to vyacheslav-shubin/Marlin that referenced this pull request Mar 10, 2021
Co-authored-by: Scott Lahteine <github@thinkyhead.com>
vyacheslav-shubin pushed a commit to vyacheslav-shubin/Marlin that referenced this pull request Mar 10, 2021
Followup to MarlinFirmware#21283

Co-authored-by: Scott Lahteine <github@thinkyhead.com>
TyMi pushed a commit to TyMi/Marlin that referenced this pull request Mar 11, 2021
* bugfix-2.0.x: (248 commits)
  [cron] Bump distribution date (2021-03-11)
  Fix password menu stickiness before first auth (MarlinFirmware#21295)
  Lerdge-K TMC 2208/9 UART pins (MarlinFirmware#21299)
  Fix LERDGE 'extends' env references (MarlinFirmware#21305)
  Fix TouchMI stow in G34 (MarlinFirmware#21291)
  Fix MeatPack with per-serial-port instances (MarlinFirmware#21306)
  Tricked-out declaration
  Update MEATPACK test
  Number serial from 1 to match settings
  Clean up spaces and words
  Fix serial index types
  Add binary file transfer test
  fix meat pack internal buffer for multi serial
  [cron] Bump distribution date (2021-03-10)
  Fix LPC + TMC boot loop (MarlinFirmware#21298)
  Distinguish serial index from mask (MarlinFirmware#21287)
  Host Keepalive followup (MarlinFirmware#21290)
  [cron] Bump distribution date (2021-03-09)
  CUSTOM_USER_BUTTONS followup (MarlinFirmware#21284)
  Fix Host Keepalive serial target (MarlinFirmware#21283)
  ...

# Conflicts:
#	Marlin/Configuration.h
#	Marlin/Configuration_adv.h
chrisjenda pushed a commit to chrisjenda/Marlin that referenced this pull request Apr 3, 2021
Followup to MarlinFirmware#21283

Co-authored-by: Scott Lahteine <github@thinkyhead.com>
W4tel-BiDi pushed a commit to W4tel-BiDi/Marlin that referenced this pull request Apr 5, 2021
Co-authored-by: Scott Lahteine <github@thinkyhead.com>
W4tel-BiDi pushed a commit to W4tel-BiDi/Marlin that referenced this pull request Apr 5, 2021
Followup to MarlinFirmware#21283

Co-authored-by: Scott Lahteine <github@thinkyhead.com>
thinkyhead added a commit that referenced this pull request Apr 30, 2021
Co-authored-by: Scott Lahteine <github@thinkyhead.com>
thinkyhead added a commit that referenced this pull request Apr 30, 2021
Followup to #21283

Co-authored-by: Scott Lahteine <github@thinkyhead.com>
@rhapsodyv rhapsodyv deleted the multi-serial-and-keep-alive-hang branch May 30, 2021 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants