OFFICIAL: Branch Designed to Deploy to Production, Toward a v0.0.3-prod tag #202

Merged: pwdel merged 52 commits into main from deploy_prod on Jul 22, 2024

Conversation

pwdel
Member

@pwdel pwdel commented Jul 11, 2024

New architecture designed to deploy to Digital Ocean.

  • Automated Container for Certs Renewal
  • Clean Deployment Script

Passing Criteria:

  • Able to deploy via script on HTTPS
  • Need to handle reverse proxy so we can visit domain.com/api/v0/home and get a response from the web.
  • We can write NGINX.conf manually if necessary and do another pass later to get a fully automated deployment, but we need to discuss that in the comments below, agree on merging, and start a new ticket.

@pwdel pwdel changed the base branch from main to deployattempt July 12, 2024 20:01
@pwdel pwdel changed the base branch from deployattempt to main July 13, 2024 12:26
@pwdel pwdel changed the title from "Deploy prod" to "Branch Designed to Deploy to Production, Toward a v0.0.3-prod tag, will conflict with v0.0.3-dev" Jul 13, 2024
@pwdel pwdel changed the title from "Branch Designed to Deploy to Production, Toward a v0.0.3-prod tag, will conflict with v0.0.3-dev" to "Branch Designed to Deploy to Production, Toward a v0.0.3-prod tag" Jul 13, 2024
@pwdel pwdel changed the title from "Branch Designed to Deploy to Production, Toward a v0.0.3-prod tag" to "OFFICIAL: Branch Designed to Deploy to Production, Toward a v0.0.3-prod tag" Jul 19, 2024
@pwdel
Member Author

pwdel commented Jul 19, 2024

If you run the command on your terminal without running the script what does it do?

docker build --no-cache -t socialpredict-backend:latest -f /Users/patrick.delaney/PP/socialpredict/backend/Dockerfile /Users/patrick.delaney/PP/socialpredict/backend/.

OK, so I found out what the problem was here...basically my docker daemon was not running properly, I had to do a hard reset. So basically, the entirety of #214 is invalid.

OK so this brings up the idea of troubleshooting notes. Basically, we could have the setup script on either prod or dev link to the GitHub repo README.md, which could branch off to sections on troubleshooting for prod or dev, and we could include notes from this discussion on what to do in common situations that come up, such as the docker daemon not working properly.

@ntoufoudis
Collaborator

> OK so this brings up the idea of troubleshooting notes. Basically, we could have the setup script on either prod or dev link to the GitHub repo README.md, which could branch off to sections on troubleshooting for prod or dev, and we could include notes from this discussion on what to do in common situations that come up, such as the docker daemon not working properly.

I would suggest that you create a file inside the README folder where you can put all the "issues/errors" you have discovered to keep it as a reference and figure out how to proceed with them. If we can modify the script to fix them we do so, otherwise we provide relevant documentation on how to troubleshoot them.

@ntoufoudis
Collaborator

> So my thought is that Nginx is probably crash looping because it can't find the certificate, so we indeed do have a problem where we deleted the certificate prior to it being re-issued to us...so yeah, have to wait about 20 hours to run ./SocialPredict install again.

Yes, if nginx doesn't find an SSL certificate in production, it will fail. If you have reached the rate limit, you can set "STAGING=1" in the .env file and Let's Encrypt will issue a staging certificate instead (staging certificates are not trusted by browsers).
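
A minimal sketch of how a deploy script could map STAGING=1 onto certbot's `--staging` flag. How ./SocialPredict actually consumes STAGING is not shown in this thread, so the helper name `certbot_staging_flag` and the commented invocation are assumptions for illustration only.

```shell
# Hypothetical helper: print certbot's --staging flag when STAGING=1.
# Assumes STAGING has already been loaded from .env into the environment.
certbot_staging_flag() {
    if [ "${STAGING:-0}" = "1" ]; then
        printf '%s\n' "--staging"
    fi
}

# Example use (commented out; DOMAIN is an assumed variable name):
# certbot certonly $(certbot_staging_flag) -d "$DOMAIN"
```

With STAGING unset or set to anything other than 1, the helper prints nothing, so the production endpoint is used by default.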

@pwdel
Member Author

pwdel commented Jul 20, 2024

OK here's what I tried this morning:

  1. Pulled deploy_prod
  2. ./SocialPredict install ... used same .env file, did not rebuild images, re-issued SSL certificate.
  3. Ran ./SocialPredict up, and https://brierfoxforecast.com launched properly. However, I still get a 502 error on https://brierfoxforecast.com/api/v0/home, where I'm expecting a response.
  4. Ran ./SocialPredict down and then ./SocialPredict install again, with STAGING=1 in the .env file, to rebuild images. At the prompt `Existing data found for brierfoxforecast.com. Continue and replace existing certificate? (y/N)` I selected y and got the following:
### Starting Webserver ...
[+] Running 1/1
 ✔ Container socialpredict-nginx-container  Started                                                                                                                                                         0.7s

### Deleting dummy certificate for brierfoxforecast.com ...

### Requesting Let's Encrypt Certificate for brierfoxforecast.com ...
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Account registered.
Requesting a certificate for brierfoxforecast.com

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/brierfoxforecast.com/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/brierfoxforecast.com/privkey.pem
This certificate expires on 2024-10-18.
These files will be updated when the certificate renews.

NEXT STEPS:
- The certificate will need to be renewed before it expires. Certbot can automatically renew the certificate in the background, but you may need to take steps to enable that functionality. See https://certbot.org/renewal-setup for instructions.

### Shutting down Webserver ...
[+] Running 2/2
 ✔ Container socialpredict-nginx-container  Removed                                                                                                                                                         0.3s
 ✔ Network scripts_default                  Removed
  5. Checked and saw the following after deploying:
root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# docker ps -a
CONTAINER ID   IMAGE                    COMMAND                  CREATED              STATUS              PORTS                                                                      NAMES
1acfaed3ffa4   nginx:latest             "/docker-entrypoint.…"   About a minute ago   Up About a minute   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   socialpredict-nginx-container
3e995c650825   socialpredict-frontend   "docker-entrypoint.s…"   About a minute ago   Up About a minute   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                  socialpredict-frontend-container
7be91058061e   socialpredict-backend    "./supervisor.sh ref…"   About a minute ago   Up About a minute   0.0.0.0:8086->8080/tcp, :::8086->8080/tcp                                  socialpredict-backend-container
28d428531f10   postgres                 "docker-entrypoint.s…"   About a minute ago   Up About a minute   0.0.0.0:5433->5432/tcp, :::5433->5432/tcp                                  socialpredict-postgres-container
ee0f9604a738   certbot/certbot:latest   "/bin/sh -c 'trap ex…"   About a minute ago   Up About a minute   80/tcp, 443/tcp                                                            socialpredict-certbot-container
  6. Attempted to visit https://brierfoxforecast.com/ but it goes to a white screen with an invalid SSL certificate, which is expected. Visiting over http rather than https redirects and also shows a white screen; the console shows our standard websocket warning:
[vite] connecting...
client.ts:77 WebSocket connection to 'wss://brierfoxforecast.com/' failed: 
setupWebSocket @ client.ts:77
(anonymous) @ client.ts:67
client.ts:77 Uncaught (in promise) DOMException: Failed to construct 'WebSocket': The URL 'wss://localhost:undefined/' is invalid.
    at setupWebSocket (https://brierfoxforecast.com/@vite/client:505:20)
    at fallback (https://brierfoxforecast.com/@vite/client:484:22)
    at WebSocket.<anonymous> (https://brierfoxforecast.com/@vite/client:520:13)
  7. Ran ./SocialPredict install again and got the following error again.
### Requesting Let's Encrypt Certificate for brierfoxforecast.com ...
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Account registered.
Requesting a certificate for brierfoxforecast.com
An unexpected error occurred:
Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: brierfoxforecast.com, retry after 2024-07-21T21:43:26Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/
Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.

So basically, I have to wait until end of day Sunday to re-deploy. This is very frustrating. We really need to change the script so that we do not HAVE to delete our SSL certificate if one has already been issued. As the script is currently structured, selecting "No" at the replace prompt quits the entire script instead of trying to continue with the certificate already on hand.
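
A rough sketch of the kind of check that could let the script reuse an existing certificate rather than quit. The function names and the idea of passing the certbot "live" directory as an argument are assumptions; the real layout used by ./SocialPredict is not shown in this thread.

```shell
# Hypothetical helper: succeed if a certificate and key already exist for a domain.
# $1 = certbot "live" directory (e.g. /etc/letsencrypt/live), $2 = domain.
cert_exists() {
    [ -s "$1/$2/fullchain.pem" ] && [ -s "$1/$2/privkey.pem" ]
}

# Decide what to do when the user declines replacement, instead of aborting.
# Prints "reuse" or "request"; the caller acts on the result.
cert_action() {
    if cert_exists "$1" "$2"; then
        echo "reuse"
    else
        echo "request"
    fi
}
```

One caveat: a file-existence check cannot tell the dummy certificate the script creates from a real Let's Encrypt certificate, so a real implementation would need an additional check before choosing "reuse".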

Secondly, because of the results I had earlier, I believe we still are not able to access https://brierfoxforecast.com/api/v0/home upon deployment ... I understand that we may be able to access this on development, that's understood and I can see that on my side as well. However this does not mean it works in production across the web. This is why I go back to the idea of a reverse proxy...which is the way I was able to deploy http://brierfoxforecast.com prior to obtaining the https.

I see on dev that there is another application error dealing with signing in that I am trying to fix, but I believe this is likely separate from the reverse proxy problem. On dev I can definitely access the endpoint localhost/api/v0/home, but then there is a sign-in error within the login...a different problem that seems to have arisen.

…nginx should function with 8089 as the external port. This might not actually work with our reverse proxy, just moving it back to where it was earlier.
@pwdel
Member Author

pwdel commented Jul 20, 2024

OK, update on dev:

String Login Error Issue

All right, so I have had this really odd issue where, on local, I was unable to log in and was getting a bunch of odd string errors when trying. I had thought something had broken in the login modal, or that it had somehow become incompatible with the new deployment scripts.

I had previously deployed successfully using ./SocialPredict install && ./SocialPredict up, which set ADMIN_PASSWORD to adminpass, and I later changed ADMIN_PASSWORD to password. However, I then recalled that ./SocialPredict down is simply a wrapper for docker compose down, which preserves the database state on disk, which means the admin password stored in the database had not changed.

So on dev, this should not really come up as a problem anymore, unless someone repeats the same workflow as me. We might document this within local_setup.md type notes.

However overall, I think dev is completely good to go now, after having figured out what was happening. I did update the frontend login console errors and errors that show up so users attempting to log in can explicitly see that they have the wrong password as opposed to, "string error," so that should help.

Ramifications for Prod

On prod though, if someone deploys with ADMIN_PASSWORD = 12345 and then re-deploys with ADMIN_PASSWORD = ABCDEF, they might get a login error because the original ADMIN_PASSWORD is still saved on disk. Re-deploying is currently our main way of changing ADMIN_PASSWORD.

So on the one hand, we can document this for now in order to pass the MR. On the other hand, this probably needs to be added on to a wish list of items to update and change, e.g. the TO_DO_LIST_SCRIPTS.md document recently created. Perhaps there could be a function to either over-write or not over-write the old ADMIN password. Currently the way the script is written, it misleads the executor into thinking they had changed the ADMIN_PASSWORD when they had not.

@ntoufoudis
Collaborator

The 502 error you are receiving occurs because the backend server takes too long to start. Once the backend server has started, you can access /api/v0/home normally.

https://social.ntoufoudis.com/api/v0/home

The issue is that the backend server takes too long to start, not the nginx configuration.
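
Since the 502 goes away once the backend is up, a deploy script could poll the endpoint before reporting success. The following is only a sketch; `retry_until` is a hypothetical helper, not something that exists in ./SocialPredict, and the attempt count and delay are assumed values.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
retry_until() {
    attempts_left=$1
    delay=$2
    shift 2
    while [ "$attempts_left" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts_left=$((attempts_left - 1))
        # Only sleep if another attempt will follow.
        if [ "$attempts_left" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (commented out): 18 tries, 5 seconds apart, against the home endpoint.
# retry_until 18 5 curl -fsS -o /dev/null https://brierfoxforecast.com/api/v0/home
```

The helper returns 0 as soon as the command succeeds and 1 if every attempt fails, so the caller can print a clear "backend did not start" message instead of leaving the user with a bare 502.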

@pwdel
Member Author

pwdel commented Jul 20, 2024

OK great...yep, so then...it looks like you are using the latest commit based upon this updated error message.

How long did it take approximately for the backend to finish booting?

image

Were you able to log in and everything as admin, create a new user?

So one thing I would like to fix, doesn't have to be on this MR, is basically... resolving a market was not working on prod (and possibly not on dev either).

So if you 1. log in as admin and then 2. create a new user 3. log in as that user, change password 4. log back in, create a market. 5. try to resolve that market, either YES / NO, does it resolve?

Again, not critical for this MR. I believe we can probably finish this MR and merge it now based upon what you are showing.

@ntoufoudis
Collaborator

In development mode on my local machine, I can log in as admin, create a new user, log in as that user. Then there is no option to change the password. And also I cannot create a market since I get the error "Market creation failed: Invalid token Create.jsx:88:16".

In production mode on my server at https://social.ntoufoudis.com, I can log in as admin, but the UI I get is like a normal user and not admin. I cannot create a new user. Also I cannot create a new market, I get the same error.

In production, the backend needs about 30 to 70 seconds to start.

@ntoufoudis
Collaborator

Also when I try to visit the profile page, in both production and development, I get the following error in the backend server:
/backend/handlers/users/publicuser.go:45 record not found
[0.533ms] [rows:0] SELECT * FROM "users" WHERE username = 'undefined' AND "users"."deleted_at" IS NULL ORDER BY "users"."id" LIMIT 1

@ntoufoudis
Collaborator

I believe these errors have to do with the code of the images and not the script itself.

@pwdel
Member Author

pwdel commented Jul 21, 2024 via email

@donghj2000

Didn't you see my messages on Discord?
image

@pwdel
Member Author

pwdel commented Jul 22, 2024

All right, now that I am able to re-issue an SSL certificate, I have been able to restart deployment.

Checking the containers and logs I get:

root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# ./SocialPredict up
[+] Running 8/8
 ✔ Network scripts_default                     Created                                                                                                                                                                                           0.1s
 ✔ Network scripts_database_network            Created                                                                                                                                                                                           0.2s
 ✔ Network scripts_frontend_network            Created                                                                                                                                                                                           0.2s
 ✔ Container socialpredict-postgres-container  Started                                                                                                                                                                                           8.0s
 ✔ Container socialpredict-certbot-container   Started                                                                                                                                                                                           7.9s
 ✔ Container socialpredict-backend-container   Started                                                                                                                                                                                           9.4s
 ✔ Container socialpredict-frontend-container  Started                                                                                                                                                                                          12.1s
 ✔ Container socialpredict-nginx-container     Started                                                                                                                                                                                           7.9s
root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# docker ps -a
CONTAINER ID   IMAGE                    COMMAND                  CREATED          STATUS          PORTS                                                                      NAMES
463f9815cb03   nginx:latest             "/docker-entrypoint.…"   15 seconds ago   Up 7 seconds    0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   socialpredict-nginx-container
a7093bda68f4   socialpredict-frontend   "docker-entrypoint.s…"   21 seconds ago   Up 10 seconds   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                  socialpredict-frontend-container
84454a42a019   socialpredict-backend    "./supervisor.sh ref…"   21 seconds ago   Up 12 seconds   0.0.0.0:8086->8080/tcp, :::8086->8080/tcp                                  socialpredict-backend-container
7537ddb15430   postgres                 "docker-entrypoint.s…"   21 seconds ago   Up 13 seconds   0.0.0.0:5433->5432/tcp, :::5433->5432/tcp                                  socialpredict-postgres-container
a3ef5b4f3b5d   certbot/certbot:latest   "/bin/sh -c 'trap ex…"   21 seconds ago   Up 14 seconds   80/tcp, 443/tcp                                                            socialpredict-certbot-container
root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# docker compose logs socialpredict-backend-container
no configuration file provided: not found
root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# docker compose -f socialpredict-backend-container
^C
root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# docker logs socialpredict-backend-container
Server started with PID 8
Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.

I'm not sure if the, "no configuration file provided" matters...?

OK then after around 70 seconds, I attempted to check logs again and got:

2024/07/22 12:48:32 /backend/util/postgres.go:26
[error] failed to initialize database, got error failed to connect to `host=db user={PG_USERNAME} database=socialpredict_db`: failed SASL auth (FATAL: password authentication failed for user "{PG_USERNAME}" (SQLSTATE 28P01))
2024/07/22 12:48:32 Error opening database: failed to connect to `host=db user={PG_USERNAME} database=socialpredict_db`: failed SASL auth (FATAL: password authentication failed for user "{PG_USERNAME}" (SQLSTATE 28P01))

OK, so I didn't re-write my .env file.

I wonder if the message:

root@breirfoxforecast-alpha:/home/testdeploy/socialpredict/scripts# docker compose logs
no configuration file provided: not found

Means that the .env file is not picked up if you don't re-write it...?

@ntoufoudis
Collaborator

You get this error because docker-compose.yaml is not specified. It is located inside the scripts folder and docker can't find it. You need to specify it like this:

docker compose --env-file .env -f ./scripts/docker-compose-prod.yaml logs

@pwdel
Member Author

pwdel commented Jul 22, 2024

Trying to diagnose this a bit further to see what happened.

So currently we have staging up and running at : https://brierfoxforecast.com/api/v0/home

However, the backend is not able to connect to the database, even after both have been up and running for quite a while.

We're seeing:

2024/07/22 12:48:32 /backend/util/postgres.go:26
[error] failed to initialize database, got error failed to connect to `host=db user={PG_USERNAME} database=socialpredict_db`: failed SASL auth (FATAL: password authentication failed for user "{PG_USERNAME}" (SQLSTATE 28P01))
2024/07/22 12:48:32 Error opening database: failed to connect to `host=db user={PG_USERNAME} database=socialpredict_db`: failed SASL auth (FATAL: password authentication failed for user "{PG_USERNAME}" (SQLSTATE 28P01))

So I'm just looking into the environments of the two containers to see what we have. Passwords changed below, but used a key to represent password consistency.

In the database:

docker exec -it socialpredict-postgres-container /bin/bash
root@7537ddb15430:/# env
POSTGRES_PASSWORD=12345
POSTGRES_USER=ABCDEF
POSTGRES_DB=socialpredict_db

Then in the backend:

root@84454a42a019:/backend# env
POSTGRES_PASSWORD=12345
CONTAINER_NAME=socialpredict-backend-container
POSTGRES_DATABASE=socialpredict_db
DB_USER=ABCDEF
POSTGRES_USER=ABCDEF
POSTGRES_CONTAINER_NAME=socialpredict-postgres-container
DB_PASS=12345

OK, so basically we have:

  • POSTGRES_USER (db) == POSTGRES_USER == DB_USER (backend)
  • POSTGRES_PASSWORD (db) == POSTGRES_PASSWORD == DB_PASS (backend)
  • POSTGRES_DB (db) == POSTGRES_DATABASE (backend)

Which in https://github.com/openpredictionmarkets/socialpredict/blob/deploy_prod/scripts/docker-compose-prod.yaml#L3

Corresponds to:

  db:
   ...
      #      ENVIRONMENT: ${APP_ENV}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DATABASE}

  backend:
    environment:
      DB_HOST: db
      DB_USER: ${POSTGRES_USER}
      DB_PASS: ${POSTGRES_PASSWORD}

However looking at the error:

2024/07/22 12:48:32 /backend/util/postgres.go:26
[error] failed to initialize database, got error failed to connect to `host=db user={PG_USERNAME} database=socialpredict_db`: failed SASL auth (FATAL: password authentication failed for user "{PG_USERNAME}" (SQLSTATE 28P01))
2024/07/22 12:48:32 Error opening database: failed to connect to `host=db user={PG_USERNAME} database=socialpredict_db`: failed SASL auth (FATAL: password authentication failed for user "{PG_USERNAME}" (SQLSTATE 28P01))

The backend seems to be reporting that it is attempting to log in with PG_USERNAME rather than DB_USER, which might be because this is the standard way of setting a username with the standard postgres image.

Looking for this variable on backend and postgres containers:

:/backend# env | grep PG_USERNAME
(null)
:/# env | grep PG
PG_MAJOR=16
PG_VERSION=16.3-1.pgdg120+1
PGDATA=/var/lib/postgresql/data

So this indicates that possibly the variable name for the username needs to be formatted differently. Looking at the postgres Docker documentation: https://hub.docker.com/_/postgres

It appears that the only required variable is POSTGRES_PASSWORD while optionals are: POSTGRES_USER and POSTGRES_DB which is what we are using.

Looking at our postgres initialization within our backend, I see:

https://github.com/openpredictionmarkets/socialpredict/blob/deploy_prod/backend/util/postgres.go#L16

	dbHost := os.Getenv("DB_HOST")
	dbUser := os.Getenv("POSTGRES_USER")
	dbPassword := os.Getenv("POSTGRES_PASSWORD")
	dbName := os.Getenv("POSTGRES_DATABASE")
	dbPort := os.Getenv("POSTGRES_PORT")

So the only one that looks off there is possibly, POSTGRES_DATABASE however looking on the backend we see:

POSTGRES_DATABASE=socialpredict_db

...and in the postgres container we see:

POSTGRES_DB=socialpredict_db

So the names line up: the backend's Go/GORM code reads POSTGRES_DATABASE, and the compose file passes that same ${POSTGRES_DATABASE} value to the postgres container as POSTGRES_DB.
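
As a quick way to rule out missing variables, a preflight check along these lines could be run in the backend container's shell. It is a sketch only: the variable names come from the os.Getenv calls in backend/util/postgres.go quoted above, while the function name `missing_db_env` is hypothetical.

```shell
# Hypothetical preflight: print the name of every variable read in
# backend/util/postgres.go that is unset or empty in the current environment.
missing_db_env() {
    for name in DB_HOST POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DATABASE POSTGRES_PORT; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}
```

An empty result means all five variables are set. It says nothing about whether the values match what the database was originally initialized with.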

That being said, attempting to hot fix the issue, I used docker exec -it to get into the backend container and ran export PG_USERNAME=ABCDEF so that it has the right username, and then I got:

root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# docker compose --env-file .env -f ./scripts/docker-compose-prod.yaml logs | grep backend
socialpredict-backend-container   | Server started with PID 8
socialpredict-backend-container   | Setting up watches.  Beware: since -r was given, this may take a while!
socialpredict-backend-container   | Watches established.
socialpredict-backend-container   |
socialpredict-backend-container   | 2024/07/22 12:48:32 /backend/util/postgres.go:26
socialpredict-backend-container   | [error] failed to initialize database, got error failed to connect to `host=db user=ABCDEF database=socialpredict_db`: failed SASL auth (FATAL: password authentication failed for user "ABCDEF" (SQLSTATE 28P01))
socialpredict-backend-container   | 2024/07/22 12:48:32 Error opening database: failed to connect to `host=db user=ABCDEF database=socialpredict_db`: failed SASL auth (FATAL: password authentication failed for user "ABCDEF" (SQLSTATE 28P01))
  • So now we seem to be using the correct name, but perhaps the password is off.
  • Doing: export PG_PASSWORD=12345 using the proper password, we still have the same problem.

@pwdel
Member Author

pwdel commented Jul 22, 2024

OK, I think I know what happened. When I check the postgres logs (below), they show:

PostgreSQL Database directory appears to contain a database; Skipping initialization

Because of the volume mount at https://github.com/openpredictionmarkets/socialpredict/blob/deploy_prod/scripts/docker-compose-prod.yaml#L12:

    volumes:
      - ../data/postgres:/var/lib/postgresql/data

whatever is already stored in ../data/postgres is used first. So postgres not only keeps the existing data, it also completely skips creating the database (and its user and password). I had originally created this database with a completely different username and password, and those are still stored in ../data/postgres.

root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# docker compose --env-file .env -f ./scripts/docker-compose-prod.yaml logs | grep postgres
socialpredict-postgres-container  |
socialpredict-postgres-container  | PostgreSQL Database directory appears to contain a database; Skipping initialization
socialpredict-postgres-container  |
socialpredict-postgres-container  | 2024-07-22 12:46:12.489 UTC [1] LOG:  starting PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
socialpredict-postgres-container  | 2024-07-22 12:46:12.491 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
socialpredict-postgres-container  | 2024-07-22 12:46:12.491 UTC [1] LOG:  listening on IPv6 address "::", port 5432
socialpredict-postgres-container  | 2024-07-22 12:46:12.508 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
socialpredict-postgres-container  | 2024-07-22 12:46:12.550 UTC [27] LOG:  database system was shut down at 2024-07-20 13:29:32 UTC
socialpredict-postgres-container  | 2024-07-22 12:46:12.610 UTC [1] LOG:  database system is ready to accept connections
socialpredict-postgres-container  | 2024-07-22 12:48:32.062 UTC [37] FATAL:  password authentication failed for user "ABCDEF"
socialpredict-postgres-container  | 2024-07-22 12:48:32.062 UTC [37] DETAIL:  Connection matched file "/var/lib/postgresql/data/pg_hba.conf" line 128: "host all all all scram-sha-256"
socialpredict-postgres-container  | 2024-07-22 12:51:12.623 UTC [25] LOG:  checkpoint starting: time
socialpredict-postgres-container  | 2024-07-22 12:51:12.681 UTC [25] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.012 s, sync=0.008 s, total=0.059 s; sync files=2, longest=0.005 s, average=0.004 s; distance=0 kB, estimate=0 kB; lsn=0/19873D0, redo lsn=0/1987398
socialpredict-backend-container   | 2024/07/22 12:48:32 /backend/util/postgres.go:26
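
A sketch of a warning the deploy script could print in this situation. It assumes, as the official postgres image's entrypoint does, that a non-empty PG_VERSION file marks an initialized data directory; the function names are hypothetical, and reading the directory on the host may require root.

```shell
# Hypothetical check: succeed if the Postgres data directory already holds a cluster.
# In that case the postgres image skips initialization, so POSTGRES_USER and
# POSTGRES_PASSWORD from .env are not applied.
# $1 = host path mounted at /var/lib/postgresql/data
postgres_data_initialized() {
    [ -s "$1/PG_VERSION" ]
}

# Print a warning to stderr and return 0 when existing data is found.
warn_if_credentials_ignored() {
    if postgres_data_initialized "$1"; then
        echo "WARNING: existing database found in $1; credentials in .env will not be applied." >&2
        return 0
    fi
    return 1
}
```

The same warning would also cover the ADMIN_PASSWORD case discussed earlier, where a re-deploy appears to change a credential that is actually still stored on disk.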

@pwdel
Member Author

pwdel commented Jul 22, 2024

OK, so I attempted to delete the data at:

    volumes:
      - ../data/postgres:/var/lib/postgresql/data

I did it the following "graceful" way:

  1. ./SocialPredict down
  2. Removed the mounted data directory recursively.
  3. ./SocialPredict up

Then, after waiting a while, it came back up successfully.
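
The three steps above could be wrapped roughly as follows. This is a sketch under assumptions: `reset_postgres_data` and the `SP_CMD` override are hypothetical names, and the function destroys all database contents, so it only makes sense for a deployment whose data can be thrown away.

```shell
# Hypothetical wrapper around the three manual steps: down, remove data, up.
# SP_CMD is an assumed override so the wrapper can be exercised without Docker.
# $1 = Postgres data directory on the host (../data/postgres relative to the compose file).
reset_postgres_data() {
    # Guard against an empty argument or the filesystem root.
    case "$1" in
        ""|"/") echo "refusing to remove '$1'" >&2; return 1 ;;
    esac
    "${SP_CMD:-./SocialPredict}" down || return 1
    rm -rf -- "$1" || return 1
    "${SP_CMD:-./SocialPredict}" up
}
```

Stopping the containers before removing the directory matters here: deleting the files underneath a running postgres would not be a clean reset.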

Looking at the postgres logs after having done this, it does indeed re-create the database from scratch:

root@breirfoxforecast-alpha:/home/testdeploy/socialpredict# docker compose --env-file .env -f ./scripts/docker-compose-prod.yaml logs | grep postgres
socialpredict-postgres-container  | The files belonging to this database system will be owned by user "postgres".
socialpredict-postgres-container  | This user must also own the server process.
socialpredict-postgres-container  |
socialpredict-postgres-container  | The database cluster will be initialized with locale "en_US.utf8".
socialpredict-postgres-container  | The default database encoding has accordingly been set to "UTF8".
socialpredict-postgres-container  | The default text search configuration will be set to "english".
socialpredict-postgres-container  |
socialpredict-postgres-container  | Data page checksums are disabled.
socialpredict-postgres-container  |
socialpredict-postgres-container  | fixing permissions on existing directory /var/lib/postgresql/data ... ok
socialpredict-postgres-container  | creating subdirectories ... ok
socialpredict-postgres-container  | selecting dynamic shared memory implementation ... posix
socialpredict-postgres-container  | selecting default max_connections ... 100
socialpredict-postgres-container  | selecting default shared_buffers ... 128MB
socialpredict-postgres-container  | selecting default time zone ... Etc/UTC
socialpredict-postgres-container  | creating configuration files ... ok
socialpredict-postgres-container  | running bootstrap script ... ok
socialpredict-postgres-container  | performing post-bootstrap initialization ... ok
socialpredict-postgres-container  | syncing data to disk ... ok
socialpredict-postgres-container  |
socialpredict-postgres-container  | initdb: warning: enabling "trust" authentication for local connections
socialpredict-postgres-container  | initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
socialpredict-postgres-container  |
socialpredict-postgres-container  | Success. You can now start the database server using:
socialpredict-postgres-container  |
socialpredict-postgres-container  |     pg_ctl -D /var/lib/postgresql/data -l logfile start
socialpredict-postgres-container  |
socialpredict-postgres-container  | waiting for server to start....2024-07-22 17:01:44.552 UTC [46] LOG:  starting PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
socialpredict-postgres-container  | 2024-07-22 17:01:44.565 UTC [46] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
socialpredict-postgres-container  | 2024-07-22 17:01:44.591 UTC [49] LOG:  database system was shut down at 2024-07-22 17:01:43 UTC
socialpredict-postgres-container  | 2024-07-22 17:01:44.615 UTC [46] LOG:  database system is ready to accept connections
socialpredict-postgres-container  |  done
socialpredict-postgres-container  | server started
socialpredict-postgres-container  | CREATE DATABASE
socialpredict-postgres-container  |
socialpredict-postgres-container  |
socialpredict-postgres-container  | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
socialpredict-postgres-container  |
socialpredict-postgres-container  | 2024-07-22 17:01:45.635 UTC [46] LOG:  received fast shutdown request
socialpredict-postgres-container  | waiting for server to shut down....2024-07-22 17:01:45.649 UTC [46] LOG:  aborting any active transactions
socialpredict-postgres-container  | 2024-07-22 17:01:45.684 UTC [46] LOG:  background worker "logical replication launcher" (PID 52) exited with exit code 1
socialpredict-postgres-container  | 2024-07-22 17:01:45.685 UTC [47] LOG:  shutting down
socialpredict-postgres-container  | 2024-07-22 17:01:45.706 UTC [47] LOG:  checkpoint starting: shutdown immediate
socialpredict-postgres-container  | 2024-07-22 17:01:46.098 UTC [47] LOG:  checkpoint complete: wrote 922 buffers (5.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.124 s, sync=0.095 s, total=0.413 s; sync files=301, longest=0.010 s, average=0.001 s; distance=4255 kB, estimate=4255 kB; lsn=0/1912058, redo lsn=0/1912058
socialpredict-postgres-container  | 2024-07-22 17:01:46.111 UTC [46] LOG:  database system is shut down
socialpredict-postgres-container  |  done
socialpredict-postgres-container  | server stopped
socialpredict-postgres-container  |
socialpredict-postgres-container  | PostgreSQL init process complete; ready for start up.
socialpredict-postgres-container  |
socialpredict-postgres-container  | 2024-07-22 17:01:46.284 UTC [1] LOG:  starting PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
socialpredict-postgres-container  | 2024-07-22 17:01:46.289 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
socialpredict-postgres-container  | 2024-07-22 17:01:46.289 UTC [1] LOG:  listening on IPv6 address "::", port 5432
socialpredict-postgres-container  | 2024-07-22 17:01:46.307 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
socialpredict-postgres-container  | 2024-07-22 17:01:46.328 UTC [62] LOG:  database system was shut down at 2024-07-22 17:01:46 UTC
socialpredict-postgres-container  | 2024-07-22 17:01:46.361 UTC [1] LOG:  database system is ready to accept connections
socialpredict-postgres-container  | 2024-07-22 17:06:46.385 UTC [60] LOG:  checkpoint starting: time
socialpredict-postgres-container  | 2024-07-22 17:06:56.718 UTC [60] LOG:  checkpoint complete: wrote 105 buffers (0.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=10.280 s, sync=0.015 s, total=10.334 s; sync files=73, longest=0.005 s, average=0.001 s; distance=330 kB, estimate=330 kB; lsn=0/1964A30, redo lsn=0/19649F8

Then, as mentioned by @ntoufoudis, I was able to log in as Admin, but the profile page is that of a regular user, not an admin user.

image

On the back end, I'm seeing:

socialpredict-backend-container   | 2024/07/22 17:29:25 /backend/handlers/users/publicuser.go:45 record not found
socialpredict-backend-container   | [1.410ms] [rows:0] SELECT * FROM "users" WHERE username = 'undefined' AND "users"."deleted_at" IS NULL ORDER BY "users"."id" LIMIT 1

I'm not sure why yet, but this appears to be more of an application error.

So that being said, perhaps we can finally finish this MR, then move into application-error related problems and other prioritizations later. What do you think @ntoufoudis ? Ready to merge finally now that we have this full record of what was going on?

@pwdel pwdel assigned donghj2000 and unassigned donghj2000 Jul 22, 2024
@ntoufoudis
Collaborator

Yes I think we should merge this. And then proceed to tackle each app error.

@pwdel pwdel merged commit 6158387 into main Jul 22, 2024
3 checks passed
@pwdel pwdel deleted the deploy_prod branch August 12, 2024 12:41
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

  • Clean and Reproduceable Deployment Script for Prod
  • Automated Container For Certs Renewal
3 participants