This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Potential bug when using SAML and workers might result in "Unsolicited response" errors #7530

Open
clokep opened this issue May 19, 2020 · 9 comments
Labels
A-SSO Single Sign-On (maybe OIDC) A-Workers Problems related to running Synapse in Worker Mode (or replication) S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@clokep
Member

clokep commented May 19, 2020

I'm unsure if this will be a problem in reality or is just a potential for issues, but figured I should document it. This is somewhat similar to #6705, but is:

  • Specific to SAML.
  • Not specific to UI authentication (e.g. it will apply to login/registration as well).

The SAML handler stores state about ongoing SAML requests in memory (see uses of _outstanding_requests_dict in the synapse.handlers.saml_handler.SamlHandler class).

In worker mode, it is possible for a SAML request to be created on one worker and for the callback to arrive on a different worker, which causes an error about an unrequested SAML response.
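A minimal sketch of that failure mode, assuming each process keeps its own in-memory dict of outstanding request IDs. The `Worker` class, its method names, and the example URL are hypothetical stand-ins for illustration, not Synapse's actual `SamlHandler` API:

```python
import uuid
from typing import Dict


class Worker:
    """Hypothetical stand-in for one process holding in-memory SAML state."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Per-process state: other workers cannot see this dict.
        self._outstanding_requests: Dict[str, str] = {}

    def start_login(self, client_redirect_url: str) -> str:
        """Record a new SAML request and return its ID."""
        request_id = uuid.uuid4().hex
        self._outstanding_requests[request_id] = client_redirect_url
        return request_id

    def handle_response(self, in_response_to: str) -> str:
        """Accept the response only if this process created the request."""
        if in_response_to not in self._outstanding_requests:
            raise LookupError(f"Unsolicited response on {self.name}")
        return self._outstanding_requests.pop(in_response_to)


worker_a = Worker("worker_a")
worker_b = Worker("worker_b")

request_id = worker_a.start_login("https://client.example/")

# Callback routed to a different worker: the request ID is unknown there.
try:
    worker_b.handle_response(request_id)
    cross_worker_result = "ok"
except LookupError as exc:
    cross_worker_result = str(exc)

# Callback routed back to the originating worker succeeds.
same_worker_result = worker_a.handle_response(request_id)
```

In this sketch the only thing that differs between the failing and the succeeding call is which process receives the callback, which is why pinning the endpoints to one process avoids the error.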

I believe the workaround is to ensure that the following endpoints all go to the same worker:

  • /_matrix/client/r0/login/sso/redirect
  • /_matrix/saml2/authn_response
  • /_matrix/client/r0/auth/(org.matrix.login.sso|m.login.sso)/fallback/web
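The routing rule behind this workaround could be sketched as follows. This is only an illustration of the idea, not reverse-proxy configuration: the `pick_backend` function, the backend names, and the CRC-based spreading of other traffic are all assumptions made for the example.

```python
import re
import zlib
from typing import List

# Patterns for the three endpoints listed above (r0 paths only, as in the list).
SAML_PATTERNS = [
    re.compile(r"^/_matrix/client/r0/login/sso/redirect$"),
    re.compile(r"^/_matrix/saml2/authn_response$"),
    re.compile(
        r"^/_matrix/client/r0/auth/(org\.matrix\.login\.sso|m\.login\.sso)/fallback/web$"
    ),
]


def pick_backend(path: str, workers: List[str], pinned: str = "main") -> str:
    """Send every SAML-related path to one pinned process.

    All other paths are spread over the workers with a deterministic hash;
    that part is a placeholder for whatever balancing is really in use.
    """
    if any(pattern.match(path) for pattern in SAML_PATTERNS):
        return pinned
    return workers[zlib.crc32(path.encode("utf-8")) % len(workers)]


workers = ["worker1", "worker2", "worker3"]
redirect_backend = pick_backend("/_matrix/client/r0/login/sso/redirect", workers)
response_backend = pick_backend("/_matrix/saml2/authn_response", workers)
fallback_backend = pick_backend(
    "/_matrix/client/r0/auth/m.login.sso/fallback/web", workers
)
sync_backend = pick_backend("/_matrix/client/r0/sync", workers)
```

The point of the sketch is that the three SAML paths resolve to the same backend regardless of how the remaining traffic is balanced.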
@anoadragon453 anoadragon453 added the A-Workers Problems related to running Synapse in Worker Mode (or replication) label May 19, 2020
@clokep
Member Author

clokep commented May 22, 2020

I believe this is a duplicate of #7056.

@clokep clokep closed this as completed May 22, 2020
@clokep
Member Author

clokep commented Jul 7, 2020

Based on the conversation in #7056, it seems like this might be a different cause that ends up in the same error.

@clokep clokep reopened this Jul 7, 2020
@babolivier
Contributor

@clokep If you're reopening it, could you please rename this issue with something more explicit? "Potential bug in SAML + worker mode" sounds quite hazy.

@clokep clokep changed the title Potential bug in SAML + worker mode Potential bug when using SAML and workers might result in "Unsolicited response" errors Jul 7, 2020
@localguru
Contributor

localguru commented Oct 29, 2020

Hi,

@clokep I think I have the same problem here. I run 3 generic.worker instances behind nginx ip_hash balancing, with SAML as the only login method. When saving keys, e.g. at logout, I get this error:

{"errcode":"M_UNRECOGNIZED","error":"Unrecognized request"}

and the client ends up on URL https://.../_matrix/client/r0/auth/m.login.sso/fallback/web?session=xxxx .

error from generic.worker log:

2020-10-29 16:41:58,485 - synapse.http.server - 76 - INFO - GET-1699 - <XForwardedForRequest at 0x7fb2efc15da0 method='GET' uri='/_matrix/client/r0/auth/m.login.sso/fallback/web?session=xxxx' clientproto='HTTP/1.0' site=18103> SynapseError: 400 - Unrecognized request
2020-10-29 16:41:58,485 - synapse.access.http.18103 - 311 - INFO - GET-1699 - 129.70.xxx.xxx - 18103 - {None} Processed request: 0.001sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 59B 400 "GET /_matrix/client/r0/auth/m.login.sso/fallback/web?session=xxxx HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" [0 dbevts]

After removing the regex ~ ^/_matrix/client/(r0|unstable)/auth/.*/fallback/web$ from the generic.worker locations, so that the request ends up at the main homeserver process on port 8008, it works fine.
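As a quick check that this regex is the one catching the failing request, the following sketch matches it against the URI from the log above. It assumes the location regex is applied to the path without the query string (which is how nginx matches locations); the variable names are made up for the example.

```python
import re
from urllib.parse import urlsplit

# The location regex that was removed from the generic.worker locations.
FALLBACK_LOCATION = re.compile(
    r"^/_matrix/client/(r0|unstable)/auth/.*/fallback/web$"
)

# Request URI as it appears in the generic.worker log.
logged_uri = "/_matrix/client/r0/auth/m.login.sso/fallback/web?session=xxxx"

# Strip the query string before matching, mirroring location matching.
path = urlsplit(logged_uri).path
matches_worker_location = FALLBACK_LOCATION.match(path) is not None
```

With the regex in place the request matches a worker location; without it the request falls through to whatever handles unmatched paths, here the main process.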

Also when adding

# SAML requests.
^/_matrix/client/(api/v1|r0|unstable)/login/sso/redirect$
^/_matrix/saml2/authn_response$

as recommended in workers.md, not even the SAML login works anymore.

Ciao
Marcus

@clokep
Member Author

clokep commented Oct 29, 2020

The workaround right now is to ensure that all SAML logins go to a single process (usually the main process). It sounds like that's what you've done to get it working?

@localguru

This comment has been minimized.

@clokep
Member Author

clokep commented Oct 30, 2020

> The question is whether one should run a dedicated generic.worker for these three SAML paths, or whether that is overkill and they should simply go to the main process (8008).
>
> The question is whether docs/workers.md should be updated, given that several generic.workers cannot handle SAML requests. The problem has cost me some grey hair.

It probably is not necessary for those endpoints to go to a worker. They are only used during initial login.

Multiple workers not handling the SAML endpoints properly is a bug.

@erikjohnston erikjohnston added z-bug (Deprecated Label) z-p2 (Deprecated Label) labels Nov 2, 2020
erikjohnston pushed a commit that referenced this issue Nov 6, 2020
If SSO login is used (e.g. SAML) in a multi worker setup, it should be mentioned that currently all SAML logins must run on the same worker, see #7530

Also, if you are using different ports (for example 443 and 8448) in a reverse proxy for client and federation, the path `/_matrix/media` on the client and federation port must point to the listener of the `media_repository` worker, otherwise you'll get a 404 on the federation port for the path `/_matrix/media`, if a remote server is trying to get the media object on federation port, see #8695
@richvdh
Member

richvdh commented Dec 17, 2020

#8942 introduces another point which will fail if an attempt is made to route SSO traffic to multiple workers, though #8966 is more of a blocker anyway

@richvdh richvdh added A-SSO Single Sign-On (maybe OIDC) S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. and removed z-bug (Deprecated Label) z-p2 (Deprecated Label) labels Dec 23, 2021
@dklimpel
Contributor

This is related to: #9427
