Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugins: Improve error handling on plugin version mismatch #1838

Merged
merged 13 commits into from
Oct 23, 2022

Conversation

har7an
Copy link
Contributor

@har7an har7an commented Oct 23, 2022

This PR picks up a long-standing issue with respect to the plugin system. Since we do not make guarantees about plugin version compatibility, we always require the plugin versions to match the zellij version.

Current situation

This is usually not an issue for people that only install zellij from official releases, because in this case zellij takes care of updating the plugins proper itself. Things start to get hairy when developing zellij, because once a user has built the plugins themselves from main, they will not be updated on the next release automatically.

Previously we had no mechanism to check for this: The plugins were just loaded and executed. Errors always cropped up in the plugins update functions, where an Event is deserialized from stdin on the plugin side (i.e. in the WASM sandbox/VM). At this point, the serde error is unwrapped, and the plugin crashes along with the application.

Symptomatic for this sort of failure are multiple things:

  1. The actual unwrap inside the plugin is masked by WASM, which is evident from the logs where messages like this would occur (Note the log level "DEBUG"):
DEBUG  |tab-bar                  | 2022-08-27 19:56:48.928 [id: 0     ] thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: sequence, expected a string", line: 1, column: 53)', default-plugins/tab-bar/src/main.rs:29:1 
  1. The error shown on stdout contains lots of garbage:
called `Result::unwrap()` on an `Err` value: RuntimeError { source: Trap(UnreachableCodeReached), wasm_trace: [FrameInfo { module_name: "<module>", func_index: 196, function_name: None,
      func_start: SourceLoc(118399), instr: SourceLoc(121927) }, FrameInfo { module_name: "<module>", func_index: 97, function_name: None, func_start: SourceLoc(59952), instr: SourceLoc(60199) },
      FrameInfo { module_name: "<module>", func_index: 102, function_name: None, func_start: SourceLoc(61748), instr: SourceLoc(61869) }, FrameInfo { module_name: "<module>", func_index: 509,
      function_name: None, func_start: SourceLoc(409244), instr: SourceLoc(427678) }], native_trace:    0: <unknown>
         1: <unknown>
         2: <unknown>
         3: <unknown>
         4: <unknown>
         5: <unknown>
         6: <unknown>
       }
  1. We have no way of handling this, because in the code WASM gives us some obscure type of RuntimeError without any further information.

2 above is particularly annoying because it's the only hint the user has to go on and it starts with Trap(UnreachableCodeReached). Not helpful.

Improving the situation

This PR approaches the problem in a number of ways.

Don't mask the underlying error cause

#1827 laid the groundwork for this PR, in that it first dealt with the issue of version mismatches and reporting them. This was achieved by appending a descriptive error message before unwrapping on a failed call to update a plugin. Unfortunately, in the stdout that was shown to the user, this error was frequently covered by a related crash happening in the screen thread when trying to render the offending plugins pane. What happens there is that we dispatch a request to the plugin to render its output, and expect to receive a message here. However, the plugin may already have crashed at this point, closing the mpsc::Channel we're trying to receive from.

What we now do instead is:

  • Attach a context to the RecvError,
  • Print the error to the zellij logs, and
  • show a default text "No output from plugin received. See logs" in the plugin pane

This keeps the screen thread from crashing and allows us to display the real error cause on stdout.

Check zellij/plugin version compatibility

Furthermore, in order to prevent such situations from happening in the first place, we're now embedding a plugin version into the plugins at build time. As of currently, this is the zellij version (taken from zellij_utils::consts::VERSION), taken from the version in the Cargo.toml.

Since we're not making guarantees about backwards compatibility yet, the requirement is that Plugin version == Zellij version. We check the versions when starting the plugin (after loading it, but before updating it for the first time). If this check fails, we output an error such as this:

$ ./target/debug/zellij

Error occurred in server:

  × Thread 'wasm' panicked.
  ├─▶ Originating Thread(s)
  │   	1. ipc_server: NewClient
  │   	2. pty_thread: NewTab
  │   	3. screen_thread: NewTab
  │   	4. plugin_thread: Load
  │   
  ├─▶ At zellij-server/src/wasm_vm.rs:433:19
  ╰─▶ If you're seeing this error your plugin versions don't match your current
      zellij version. Detected versions:
      
      - Plugin version: Unavailable
      - Zellij version: 0.32.0
      - Offending plugin: tab-bar
      
      If you're a user:
          Please contact the distributor of your zellij version and report this error
          to them.
      
      If you're a developer:
          Please run zellij with the updated plugins. The easiest way to achieve this
          is to build zellij with `cargo make install`. Also refer to the docs:
          https://github.com/zellij-org/zellij/blob/main/CONTRIBUTING.md#building
      
      A possible fix for this error is to remove all contents of the 'PLUGIN DIR'
      folder from the output of the `zellij setup --check` command.
      
  help: If you are seeing this message, it means that something went wrong.
        Please report this error to the github issue.
        (https://github.com/zellij-org/zellij/issues)
        
        Also, if you want to see the backtrace, you can set the `RUST_BACKTRACE` environment variable to `1`.

In this case, I tried to run zellij compiled from main against plugins compiled on v0.31.4. Since the old plugins don't have any version information, we show the plugin version as "Unavailable".

Catching mismatches during development

The previous method of catching errors works wonderfully for users that just update their application from release to release. However, developers compiling zellij off of main will not update the zellij application version after every compilation. This is particularly tricky, as one may spend multiple hours developing on the latest main before performing commits, so a comparison such as "zellij git-rev" == "plugin git-rev" won't do the trick either. However, every modification increases the chance of a change in the Event serialization structure, making the plugins incompatible.

As mentioned above, the typical symptom tor this error is that deserialization of the Event on the plugin fails. At this point it would have been amazing if wasmer could propagate errors from an exported WASM function called inside the plugin onto the host. The WASM docs do not talk about this at all, and it isn't considered an error when one does change the called function inside the WASM module (plugin) to return a Result. As far as I could gather, however, this Result is silently dropped.

It is stated somewhere on the wasmer GitHub that a panic inside a WASM module will be converted to a RuntimeError, and there is even a RuntimeError::downcast, but that also doesn't give any hints about the underlying error cause.

Instead, we're overriding the panic hook to send a message (as string) to the host. Overriding the panic hook is okay in this case, because the default wasmer panic hook in our case doesn't contribute any meaningful information (likely due to wasm-opt used during building). Also, sending the panic message as string is the only sensible way to go: Since a panic is likely to occur upon deserialization, we obviously cannot expect to match the hosts data types any more. Hence, serializing to some other type and deserializing on the host successfully likely isn't possible at all.

Here's the resulting error for this error case:

$ ./target/debug/zellij -l devel

Error occurred in server:

  × Thread 'wasm' panicked.
  ├─▶ Originating Thread(s)
  │   	1. ipc_server: NewClient
  │   	2. pty_thread: NewTab
  │   	3. screen_thread: NewTab
  │   	4. plugin_thread: Update
  │   
  ├─▶ At zellij-server/src/wasm_vm.rs:629:5
  ╰─▶ 
      An error occured while receiving an Event from zellij. This is most likely
      caused by your plugins not matching your current zellij version.
      
      The most likely explanation for this is that you're running either a
      self-compiled zellij or plugin version. Please make sure that, while developing,
      you also rebuild the plugins in order to pick up changes to the plugin code.
      
      Please refer to the documentation for further information:
          https://github.com/zellij-org/zellij/blob/main/CONTRIBUTING.md#building
      
      
      Caused by:
          0: failed to deserialize object from stdin
          1: unknown variant `Foo`, expected one of `PageDown`, `PageUp`, `Left`, `Down`, `Up`, `Right`, `Home`, `End`, `Backspace`, `Delete`, `Insert`, `F`, `Char`, `Alt`, `Ctrl`, `BackTab`,
      `Null`, `Esc` at line 1 column 3476

The error in this case was provoked by introducing a new key "Foo", that the plugins didn't know about. As an added bonus compared to the previous situation, we now also get to see why exactly the deserialization failed in the output of stdout (no more hiding as "DEBUG" message in the logs).

and do not crash the application on failure to receive a render update
from plugins any longer. Instead, will print a simple string with a hint
to check the application logs, where a more thorough error indication
can be found.
to create ad-hoc errors with custom error types, without having to wrap
them into a `context()` before to turn the into anyhow errors.
and terminate execution with a descriptive error message in case the
plugin version is incompatible with the version of zellij being run.
so the user knows which plugin to look at in case they're using custom
plugins.
Previously we would accept cases where the plugin version was newer than
the zellij version, which doesn't make a lot of sense.
in call to `wasmer::Function::call` in case a plugin version mismatch
can occur.
that will print the panic message to a plugins stdout and then call a
panic handler on the host that turns it into a real application-level
panic.
and turn them into proper panics. These errors are symptomatic of an
uncaught plugin version mismatch, for example when developing from main
and compiling zellij/the plugins from source. Normal users should never
get to see this error.
for anyhow errors. The default anyhow error formatting of `{:?}` is
already very good, and we just made it worse by trying to invent our own
formatting.
@har7an har7an temporarily deployed to cachix October 23, 2022 09:34 Inactive
@har7an
Copy link
Contributor Author

har7an commented Oct 23, 2022

TODO

  • rustfmt
  • check whether this has any performance impact
  • check whether this somehow increases binsize

@har7an har7an temporarily deployed to cachix October 23, 2022 09:41 Inactive
Improve error handling on plugin version mismatch.
@har7an har7an temporarily deployed to cachix October 23, 2022 10:08 Inactive
@har7an
Copy link
Contributor Author

har7an commented Oct 23, 2022

@imsnif Are you fine with the new error formats posted above? You needn't read all the text, but you can skim the text in the code fences.

@imsnif
Copy link
Member

imsnif commented Oct 23, 2022

@har7an - skimming over the code blocks, this looks great. Minor nitpick - I think it looks better if the error voice is passive "the plugins do not match the current Zellij version".

Otherwise - pretty awesome. Will this happen for any sort of deserialization error?

@har7an
Copy link
Contributor Author

har7an commented Oct 23, 2022

Otherwise - pretty awesome. Will this happen for any sort of deserialization error?

It will happen for all unhandled panics in the plugins. The "version mismatch" message in particular will be shown for all deserialization failures in the call to update. I know that some of the other plugin API calls also deserialize things from stdin, they're simply unwrapping (without the pretty text block).

@har7an har7an temporarily deployed to cachix October 23, 2022 12:35 Inactive
@har7an har7an merged commit 75801bd into zellij-org:main Oct 23, 2022
@har7an har7an deleted the errors/dont-unwrap-in-plugin-pane branch October 23, 2022 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants