Installing https support via Let's Encrypt appears broken (instructions problematic) #115

Closed
JuanCab opened this issue Aug 7, 2018 · 21 comments
Labels
support Support questions (should be on discourse.jupyter.org instead)

Comments

@JuanCab
Contributor

JuanCab commented Aug 7, 2018

On a freshly installed jupyterhub that is visible to the outside world, I followed the Let's Encrypt instructions on the Enabling HTTPS document page. I confirmed sudo -E tljh-config show returns the expected content compared to what is in the documentation.

Problem 1) When I do sudo -E tljh-config reload proxy, nothing happens. In fact, I realized that the connection hangs if you are doing this through the terminal on the jupyterhub. This is not surprising since it is shutting down http and turning on https. However, there is no warning in the documentation that this will happen.

Problem 2) When I try to go to the https connection, it is active, but the certificate is NOT being recognized as "verified by a third party." (in Chrome, this is NET::ERR_CERT_AUTHORITY_INVALID) It does appear to be created since its name is "TRAEFIK DEFAULT CERT".

The documentation should be updated to fix Problem 1, and I would appreciate any hints as to how to 'redo' the proxy connection properly. I did try re-running sudo -E tljh-config reload proxy from ssh, and it returned Proxy reload with new configuration complete but didn't fix the issue.

We did revert to a snapshot of the VM from before activation of HTTPS and try the instructions from an SSH terminal. The result was the same except that sudo -E tljh-config reload proxy from ssh, returned Proxy reload with new configuration complete (since the http session terminal was not used), but the certificate is still not recognized as a third party verified certificate. Is there something more we need to do?

@JuanCab
Contributor Author

JuanCab commented Aug 7, 2018

Failure is due to lack of a proper DNS entry for our server (no "A" entry specifically). Working on it, but I am closing this problem for now.

@JuanCab JuanCab closed this as completed Aug 7, 2018
@JuanCab JuanCab reopened this Aug 7, 2018
@JuanCab
Contributor Author

JuanCab commented Aug 7, 2018

Actually, this fixed Problem 2, but Problem 1 (the confusing behavior when running the commands from a terminal connected over http) still exists.

@ajhenley

I tried the same from ssh all the way with the same result.

New Ubuntu 18.04 install
sudo apt update
sudo apt upgrade
followed the "your own server" instructions to the letter (https://the-littlest-jupyterhub.readthedocs.io/en/latest/install/custom-server.html)
then followed the https instructions and got this

$ sudo tljh-config reload proxy
Proxy reload with new configuration complete

but

https still doesn't work

@yuvipanda yuvipanda added the support Support questions (should be on discourse.jupyter.org instead) label May 20, 2019
@parthjoshi2007

I am facing the same issue. The hub is served with an invalid HTTPS certificate, and there is no negotiation with Let's Encrypt whatsoever. For now, I'm setting up Let's Encrypt with certbot (https://certbot.eff.org/lets-encrypt/ubuntubionic-other), getting the certificate and key separately, and using the manual HTTPS setup for TLJH.

@lucas-mior

lucas-mior commented May 27, 2019

Same issue here, I'll try fixing it as @parthjoshi2007 did.
Did you install and set up Certbot after the TLJH installation?

@yuvipanda
Collaborator

Heya! I just merged #328, seen in http://tljh.jupyter.org/en/latest/howto/admin/https.html. There's a short 'troubleshooting' section too. Would love to see the logs from traefik here, so we can help figure out what's going on.

@tomliptrot

tomliptrot commented Jun 5, 2019

Hi,

I am getting the same issue. I follow the instructions but then get an invalid hub certificate. @yuvipanda Here are my traefik logs:
logs.txt

@tomliptrot

This might be part of the problem:
'Unable to obtain ACME certificate for domains "jupyter.ortom.co.uk" : unable to generate a certificate for the domains [jupyter.ortom.co.uk]: acme: Error -> One or more domains had a problem:\n[jupyter.ortom.co.uk] acme: Error 400 - urn:ietf:params:acme:error:connection - Fetching http://jupyter.ortom.co.uk/.well-known/acme-challenge/ntPU29uuqFL-B7fvSWildcV8sk5FlONSHD4FPpoSQYg: Timeout during connect (likely firewall problem)\n'

@tomliptrot

But this bit is odd too
Jun 05 14:31:03 ip-172-31-38-191 traefik[11827]: time="2019-06-05T14:31:03Z" level=info msg="Starting provider *acme.Provider{\"Email\":\"tom@ortom.co.uk\",\"ACMELogging\":false,\"CAServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"Storage\":\"acme.json\",\"EntryPoint\":\"https\",\"KeyType\":\"\",\"OnHostRule\":false,\"OnDemand\":false,\"DNSChallenge\":null,\"HTTPChallenge\":{\"EntryPoint\":\"http\"},\"TLSChallenge\":null,\"Domains\":[{\"Main\":\"j\",\"SANs\":null},{\"Main\":\"u\",\"SANs\":null},{\"Main\":\"p\",\"SANs\":null},{\"Main\":\"y\",\"SANs\":null},{\"Main\":\"t\",\"SANs\":null},{\"Main\":\"e\",\"SANs\":null},{\"Main\":\"r\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\"r\",\"SANs\":null},{\"Main\":\"t\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\"m\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"c\",\"SANs\":null},{\"Main\":\"o\",\"SANs\":null},{\"Main\":\".\",\"SANs\":null},{\"Main\":\"u\",\"SANs\":null},{\"Main\":\"k\",\"SANs\":null}],\"Store\":{}}"
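One plausible reading of the log above (an assumption; nothing in this thread confirms it) is that the domain reached the proxy configuration as a plain string where a list of domains was expected. Iterating over a string yields one entry per character, which would give exactly this kind of "Domains" list. A minimal Python sketch of the effect, with `as_domain_list` as a hypothetical helper:

```python
# Assumption: the configured value arrived as a plain string where a list of
# domain names was expected. Nothing in the thread confirms this.
configured = "jupyter.ortom.co.uk"

# Iterating a string yields its characters, so each character becomes a "domain".
broken = [{"Main": d, "SANs": None} for d in configured]


def as_domain_list(value):
    """Wrap a lone string in a list so it counts as one domain (hypothetical helper)."""
    return [value] if isinstance(value, str) else list(value)


fixed = [{"Main": d, "SANs": None} for d in as_domain_list(configured)]

print([entry["Main"] for entry in broken][:7])
print([entry["Main"] for entry in fixed])
```

If that is what happened here, adding the domain as a list item (`tljh-config add-item https.letsencrypt.domains ...`) rather than setting it as a single value should avoid it.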

@efedorov-dart

Facing the same issue.
Jun 13 15:30:53 paytonstudio traefik[20277]: time="2019-06-13T15:30:53Z" level=error msg="Unable to obtain ACME certificate for domains "studyworthy.xyz" : unable to generate a certificate for the domains [studyworthy.xyz]: acme: Error -> One or more domains had a problem:\n[studyworthy.xyz] acme: Error 400 - urn:ietf:params:acme:error:connection - Fetching http://studyworthy.xyz/.well-known/acme-challenge/xxxxxxx: Timeout during connect (likely firewall problem)\n"
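The traefik messages quoted in this thread all carry an identifier of the form `urn:ietf:params:acme:error:<kind>`, and the kind narrows down where to look. Below is a small Python sketch that pulls it out of a log line. The hint texts are rough guesses drawn only from the cases reported in this thread, not an official mapping, and `hint_for` is a hypothetical helper.

```python
import re

# Matches the error identifier seen in the traefik messages quoted in this thread.
ACME_ERROR = re.compile(r"urn:ietf:params:acme:error:(\w+)")

# Rough hints based only on cases reported in this thread, not an official mapping.
HINTS = {
    "connection": "validation server could not connect: check DNS records, firewall, port 80",
    "unauthorized": "something else answered the challenge URL: check where the domain points",
}


def acme_error_kind(log_line):
    """Return the ACME error kind found in a log line, or None if there is none."""
    match = ACME_ERROR.search(log_line)
    return match.group(1) if match else None


def hint_for(log_line):
    """Return a troubleshooting hint for a log line (hypothetical helper)."""
    kind = acme_error_kind(log_line)
    return HINTS.get(kind, "no known ACME error found in this line")
```

Feeding it the output of `sudo journalctl -u traefik` line by line is one way to use it.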

@gantheaume

gantheaume commented Jun 27, 2019

It looks like I have the same error: even if it's a 503, it seems Let's encrypt needs the "domain.bar/.well-known/acme-challenge/" folder to be reachable, and it can't reach it.

This article seems to hint at this: https://nixcp.com/lets-encrypt-the-client-lacks-sufficient-authorization-invalid-response/ (see towards the end)
No idea how this would be feasible with tljh.

Here's my "anonymised" error (can provide more if needed):
Jun 27 19:07:30 foo traefik[17773]: time="2019-06-27T19:07:30+02:00" level=error msg="Unable to obtain ACME certificate for domains \"foo.bar\" : unable to generate a certificate for the domains [foo.bar]: acme: Error -> One or more domains had a problem:\n[foo.bar] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - Invalid response from http://foo.bar/.well-known/acme-challenge/EhJX[35moreCaracters]3oSI [ip.v4.XX.XX]: \"<!DOCTYPE html>\\n<html>\\n <head>\\n <title>503 Backend fetch failed</title>\\n </head>\\n <body>\\n <h1>Error 503 Backend fetch f\"\n"

So I'll do like @parthjoshi2007 and set it up with certbot for now.

@gantheaume

gantheaume commented Jun 29, 2019

Ok, so to me the error is "clear" :

From: https://certbot.eff.org/docs/using.html#webroot

The webroot plugin works by creating a temporary file for each of your requested domains in ${webroot-path}/.well-known/acme-challenge. Then the Let’s Encrypt validation server makes HTTP requests to validate that the DNS for each requested domain resolves to the server running certbot. An example request made to your web server would look like:

66.133.109.36 - - [05/Jan/2016:20:11:24 -0500] "GET /.well-known/acme-challenge/HGr8U1IeTW4kY_Z6UIyaakzOkyQgPr_7ArlLgtZE8SX HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

Note that to use the webroot plugin, your server must be configured to serve files from hidden directories. If /.well-known is treated specially by your webserver configuration, you might need to modify the configuration to ensure that files inside /.well-known/acme-challenge are served by the webserver.

And tljh doesn't make these files reachable, so the challenge visibly fails.

Thinking about it, I hadn't set up DNS properly: I had set up permanent web forwarding instead of an A record (foo.bar to the server's IP) and a CNAME record (www.foo.bar to foo.bar).
Explanations here, and setup instructions if you're on Gandi: https://docs.gandi.net/en/domain_names/common_operations/link_domain_to_website.html
Other good explanations: https://support.dnsimple.com/articles/a-record/

Had I done this properly from the start, it might have worked with tljh's default Let's Encrypt support. When I find time, I'll test ;)
(I guess that works much better for certificate renewal.)

EDIT: This was indeed the problem, see my next post
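Since a missing or wrong A record turned out to be the cause more than once in this thread, here is a minimal Python sketch for checking that a name resolves to the server's public IPv4 address before asking for a certificate. The function names are made up for illustration, and the `resolver` parameter exists only so the check can be exercised without real DNS.

```python
import socket


def ipv4_addresses(domain, resolver=socket.gethostbyname_ex):
    """Return the IPv4 addresses a name resolves to (empty list if it does not resolve)."""
    try:
        _name, _aliases, addresses = resolver(domain)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    return addresses


def points_at(domain, server_ip, resolver=socket.gethostbyname_ex):
    """True if one of the addresses the domain resolves to is the server's public IP."""
    return server_ip in ipv4_addresses(domain, resolver)
```

With web forwarding in place of an A record, the name would typically resolve to the registrar's forwarding service rather than to your own server, so `points_at("foo.bar", "<your server IP>")` would come back False.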

Meanwhile, I finally got certbot to work ( https://certbot.eff.org/lets-encrypt/ubuntubionic-other ) after quite a bit of trial and error, so I'm posting what I would have been happy to stumble upon myself. It's just what I did on my server; there may be a shorter and simpler way, but confirming that would take testing I don't have time for.

I was logged in as root; otherwise all of this needs extra sudo's.

First, I undid everything I had set up during my previous attempts at https (you never know):

tljh-config unset https.enabled
tljh-config unset OR remove-item AnyOtherStuffTested
tljh-config reload

Then I tried the standalone certbot:
sudo certbot certonly --standalone --preferred-challenges http -d foo.bar -d www.foo.bar
But I had this error:
Problem binding to port 80: Could not bind to IPv4 or IPv6.

So I started with:
ufw allow 80
But that still didn't work.

Actually, the reason it wasn't working is that tljh still had its frontend running on my address.

So to see if I could stop it, I tried (note: my jupyter instances/servers were all already shut down, no idea if that matters):
systemctl | grep running
systemctl stop jupyterhub.service
Still not enough; by running:
ss -tlnp | grep -E ":(80|443)"
I saw that traefik was still squatting on the ports; so:
systemctl stop traefik.service

And yay! Finally
sudo certbot certonly --standalone --preferred-challenges http -d foo.bar -d www.foo.bar
worked :)

So

systemctl start traefik.service
systemctl start jupyterhub.service

I could finally load my key and certificate following the instructions in the second part of the tutorial: http://tljh.jupyter.org/en/latest/howto/admin/https.html

The problem now, I guess, is that for certificate renewal I'll have to shut down the server again, so I'll definitely try the proper way again later.

@gantheaume

Ok, so still fulfilling my noob role in this story, I ended up totally messing up my install.
So I restarted from zero, and this time tested the proper tljh way of setting up a certificate.
And guess what, it worked!
So the issue was me not setting up the DNS records properly, confirmed.

By the way, having a look at sudo systemctl status traefik.service can help identify things a bit, if there is some network problem (I found it useful).

@ajhenley

ajhenley commented Jun 30, 2019 via email

@gantheaume

gantheaume commented Jul 4, 2019

In reply to: "I have literally done the install dozens of times and it never worked. Which instructions did you follow?"

Sorry for my late answer, I'm quite busy at the moment; Here is precisely all I did, from a clean Ubuntu server 18.04 install:

If your user doesn't have sudo rights:

su
usermod -a -G sudo yourusername
exit

From now on, everything is run from the normal user "yourusername":

sudo apt-get update
sudo apt-get upgrade  ## Enter on all dialogs if there are some
sudo dpkg-reconfigure locales ## to have locales set up properly and stop getting LC errors; I chose EN-US utf8
sudo apt-get install linux-headers-generic ethtool libc-dev linux-libc-dev python3-dev
sudo reboot

Now all is ready, we can do:

sudo ls ## just to have the sudo password entered
curl https://raw.githubusercontent.com/jupyterhub/the-littlest-jupyterhub/master/bootstrap/bootstrap.py | sudo -E python3 - --admin myfirstadminuser ## that's precisely the command of the install instructions in the manual: http://tljh.jupyter.org/en/latest/install/custom-server.html

Then, get things going; I don't know if it's all needed:

export PATH=/opt/tljh/user/bin:${PATH}
nano ~/.bashrc && source ~/.bashrc  ## Added the export path from above; source: http://tljh.jupyter.org/en/latest/howto/env/user-environment.html
sudo env PATH=${PATH} conda update -n base conda ## do not forget the "env"; it's actually missing from the tutorial page above, I'll think about editing it.

At last, the normal SSL procedure from this page: http://tljh.jupyter.org/en/latest/howto/admin/https.html

sudo tljh-config set https.enabled true
sudo tljh-config set https.letsencrypt.email email@example.com ## more precisely, my email is hosted on mydomain.me, but I don't think it's important
sudo tljh-config add-item https.letsencrypt.domains mydomain.me
sudo tljh-config add-item https.letsencrypt.domains www.mydomain.me
sudo tljh-config show

When all is good:
sudo tljh-config reload proxy

Now if you configured the DNS records properly (see my previous long post), all should go fine, and going to "mydomain.me" should bring you directly to the login page, secured with https ;)

Good luck testing ;)

Note that I already had a working https setup on the same domain using the universal letsencrypt procedure (my long post above), but I then wiped everything and started with a new Ubuntu install, so it should not affect anything.
Second, all this was part of quite a bit of trial and error, so you're welcome to suggest improvements!

(By the way, it seems that the only reliable way of installing extra python modules is to run sudo -E pip install module in the online jupyter notebook terminal, after first running sudo -E pip install --upgrade pip. I didn't manage to get working modules installed any other way, for example through ssh. When I have time I'll dig into this, as it's another issue. Linked help page that details the steps: http://tljh.jupyter.org/en/latest/howto/env/user-environment.html)

@ajhenley

ajhenley commented Jul 5, 2019

Thanks so much...

@asvinp

asvinp commented Jan 5, 2020

Not sure if it'll help anyone else but basically, had to port forward the HTTPS port 443 on my router. Had only done it for 80. ( ¬_¬)

@hoenie-ams

hoenie-ams commented Mar 16, 2020

@gantheaume's tip to use sudo systemctl status traefik.service helped me to figure out my issue. SSL was working fine but then the certificate expired. The problem was the firewall I set up after the initial installation. Looks like port 80 is needed for the renewal of the certificates...
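A silent renewal failure like this only shows up once the certificate has expired, so it can help to check the remaining lifetime now and then. Below is a hedged Python sketch using only the standard library; the function names are made up for illustration. `fetch_not_after` needs network access and uses the default verifying context, so it raises on a certificate that has already expired instead of returning.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days between now and a certificate's notAfter string (negative once expired)."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host, port=443, timeout=5.0):
    """Read notAfter from the certificate a server presents.

    Uses the default verifying context, so an expired or untrusted
    certificate raises ssl.SSLCertVerificationError instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("mydomain.me"))` gives the days left for the hub's certificate, with mydomain.me standing in for your own domain.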

@dschofield

error msg="Unable to obtain ACME certificate for domains "a_domain.com" : unable to generate a certificate for the domains [a_domain.com]: acme: Error -> One or more domains had a problem. [a_domain.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://a_domain.com/.well-known/acme-challenge/ra01JKbw3Wv194BDVhjSeK_nkbFA-UVYqnhv08LUoM [2606:4700:3037::681b:a340]

Port 80 must be open for HTTP traffic over IPv4. I had mine restricted to IPv6 (by mistake) and allowing IPv4 traffic on 80 resolved it.

@buggythepirate

buggythepirate commented Oct 29, 2020

Piggybacking a bit on @gantheaume's solution...

I ended up here after installing TLJH on an Azure virtual machine. For me, Let's Encrypt did not work at first either. sudo journalctl -u traefik showed either timeouts or "server misbehaving" in the ACME error message. My problem was caused by setting up the DNS records AFTER running the install process. Configuring Let's Encrypt and reloading the proxy with sudo tljh-config reload proxy did not fix the problem.

My fix: Make sure your configuration is correct and then restart your virtual machine. Afterwards everything worked smoothly

So here's the proper way to do it for future reference:

  1. First setup the DNS records
  2. then run
    sudo tljh-config set https.enabled true
    sudo tljh-config set https.letsencrypt.email email@example.com ## more precisely, my email is hosted on mydomain.me, but I don't think it's important
    sudo tljh-config add-item https.letsencrypt.domains mydomain.me
    sudo tljh-config add-item https.letsencrypt.domains www.mydomain.me
    sudo tljh-config show
    sudo tljh-config reload proxy

@consideRatio
Member

This issue covered a lot of debugging related to failure to set up HTTPS.

I think what was missing from the documentation was perhaps notes on:

  • verifying that your device is reachable from the internet on port 80 and port 443
  • and that one needs to restart traefik to make it re-attempt the domain certificate acquisition
  • and that let's encrypt can block you if you try and fail too many times
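The first point can be checked with a few lines of Python. This is only a sketch with made-up function names. It should be run from a machine outside the server's own network, because a check made on the server itself says nothing about routers or cloud firewalls in between.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_web_ports(host):
    """Check the two ports named above; returns {port: reachable}."""
    return {port: port_is_open(host, port) for port in (80, 443)}
```

Given the third point, it seems wise to confirm that both ports report True before restarting traefik to trigger another certificate attempt.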

Since this issue is long and hard to follow at this point, and I consider it to be resolved by better documentation, I'm closing this and opening a new one that takes these documentation improvements as its action point and points back to this issue as its origin.
