OFFICIAL: Branch Designed to Deploy to Production, Toward a v0.0.3-prod tag #202
OK, so I found out what the problem was here: my Docker daemon was not running properly, and I had to do a hard reset. So the entirety of #214 is invalid. This brings up the idea of troubleshooting notes. We could have the setup script on either prod or dev link to the GitHub repo README.md, which could branch off to troubleshooting sections for prod and dev. We could include notes from this entire discussion on what to do in common situations that come up, e.g. the Docker daemon not working properly.
I would suggest creating a file inside the README folder where you put all the "issues/errors" you have discovered, to keep them as a reference and figure out how to proceed with them. If we can modify the script to fix them, we do so; otherwise, we provide relevant documentation on how to troubleshoot them.
Yes, if nginx doesn't find an SSL certificate in production, it will fail. If you have reached the limit, you can set "STAGING=1" in the .env and Let's Encrypt will issue a staging certificate.
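A minimal Go sketch of what the STAGING switch amounts to, assuming the deploy tooling reads STAGING from a plain KEY=VALUE .env file. The `parseEnv` and `acmeDirectory` helpers are hypothetical, not the project's script; the two URLs are Let's Encrypt's public ACME v2 production and staging directories.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Let's Encrypt ACME v2 directory endpoints.
const (
	letsEncryptProd    = "https://acme-v02.api.letsencrypt.org/directory"
	letsEncryptStaging = "https://acme-staging-v02.api.letsencrypt.org/directory"
)

// parseEnv is a hypothetical helper that reads simple KEY=VALUE lines,
// skipping blank lines and # comments. It does not handle `export` or escapes.
func parseEnv(contents string) map[string]string {
	env := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		// Strip surrounding whitespace and any single or double quotes.
		env[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(value), `"'`)
	}
	return env
}

// acmeDirectory picks the staging endpoint when STAGING=1, else production.
func acmeDirectory(env map[string]string) string {
	if env["STAGING"] == "1" {
		return letsEncryptStaging
	}
	return letsEncryptProd
}

func main() {
	env := parseEnv("DOMAIN=example.com\nSTAGING=1\n")
	fmt.Println(acmeDirectory(env))
}
```

Staging certificates are not trusted by browsers, so this only lets you exercise the deploy pipeline while the production limit resets.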
OK, here's what I tried this morning:
So basically, I have to wait until the end of the day Sunday to redeploy. This is very frustrating. We really need to change the script so that we do not HAVE to delete our SSL certificate if it has already been issued. Currently, with the script as it stands, if you select "No," it quits entirely without even trying. Secondly, based on the results I had earlier, I believe we are still not able to access … I see on dev that there is another application error dealing with signing in that I am trying to fix, but I believe it is likely separate from the reverse proxy problem. On dev I can definitely access the endpoint
…nginx should function with 8089 as the external port. This might not actually work with our reverse proxy; just moving it back to where it was earlier.
OK, update on String Login Error Issue
All right, so I have had this really odd issue where, on local, I was unable to log in and was getting a bunch of odd string errors upon logging in. I had thought that something had somehow broken in the login modal, or that it was somehow not compatible with the new deployment scripts. I had previously successfully deployed using … So on dev, this should not really come up as a problem anymore, unless someone repeats the same workflow as me. We might document this in local_setup.md-type notes. Overall, though, I think dev is completely good to go now that I've figured out what was happening. I did update the frontend login console errors and the errors that show up, so users attempting to log in can explicitly see that they have the wrong password as opposed to "string error," which should help.
Ramifications for Prod
On prod, though, if someone deploys with … So on the one hand, we can document this for now in order to pass the MR. On the other hand, this probably needs to be added to a wish list of items to update and change, e.g. the …
The 502 error you are receiving is because the backend server takes too long to start. Once the backend server has started, you can access /api/v0/home normally: https://social.ntoufoudis.com/api/v0/home. The issue is the backend server's slow startup, not the nginx configuration.
In development mode on my local machine, I can log in as admin, create a new user, and log in as that user, but then there is no option to change the password. I also cannot create a market, since I get the error "Market creation failed: Invalid token Create.jsx:88:16". In production mode on my server at https://social.ntoufoudis.com, I can log in as admin, but the UI I get is that of a normal user, not an admin. I cannot create a new user, and I cannot create a new market either; I get the same error. In production, the backend needs about 30 to 70 seconds to start.
Also, when I try to visit the profile page, in both production and development, I get the following error in the backend server:
I believe these errors have to do with the code of the images and not the script itself.
OK, I will look into this.
On Sun, Jul 21, 2024 at 4:21 AM, Vasileios Ntoufoudis wrote: "I believe these errors have to do with the code of the images and not the script itself."
All right, now that I am able to re-issue an SSL certificate, I have been able to restart the deployment. Checking the containers and logs, I get:
I'm not sure whether the "no configuration file provided" message matters. Then, after around 70 seconds, I checked the logs again and got:
OK, so I didn't rewrite my .env file. I wonder whether the message:
means that the .env file is not picked up if you don't rewrite it?
You get this error because docker-compose.yaml is not specified. It is located inside the scripts folder, so Docker can't find it. You need to specify it like this:
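If the tooling ever shells out to Compose from Go, the same fix applies: pass the compose file explicitly with `-f` so Docker does not look for one in the current directory. A hedged sketch; `composeCommand` is a hypothetical helper, and the `scripts/docker-compose.yaml` path is assumed from the comment above.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// composeCommand builds a `docker compose` invocation that names the compose
// file explicitly with -f. The command is only constructed here, not run.
func composeCommand(composeFile string, args ...string) *exec.Cmd {
	full := append([]string{"compose", "-f", composeFile}, args...)
	return exec.Command("docker", full...)
}

func main() {
	// Assumed location: the compose file lives in the repo's scripts folder.
	file := filepath.Join("scripts", "docker-compose.yaml")
	cmd := composeCommand(file, "ps")
	fmt.Println(cmd.Args)
}
```

Calling `cmd.Run()` or `cmd.CombinedOutput()` would actually execute it, which of course requires Docker on the host.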
Trying to diagnose this a bit further to see what happened. Currently we have staging up and running at https://brierfoxforecast.com/api/v0/home. However, the backend is not able to connect to the database, even though the database has been up and running for quite a while and the backend is up as well. We're seeing:
So I'm looking into the environments of the two containers to see what we have. I've replaced the passwords below with keys, using the same key wherever the same password appears. In the database:
Then in the backend:
OK, so basically we have:
Corresponds to:
However, looking at the error:
The backend seems to be reporting that it is attempting to use … Looking for this variable on the backend and postgres containers:
So this indicates that the variable name for the username possibly needs to be formatted differently. Looking at the postgres Docker documentation (https://hub.docker.com/_/postgres), it appears that the only required variable is … Looking at the postgres initialization within our backend (https://github.com/openpredictionmarkets/socialpredict/blob/deploy_prod/backend/util/postgres.go#L16), I see:
So the only one that possibly looks off there is:
...and in the postgres container we see:
So it seems that our backend application, through Go and GORM, maps POSTGRES_DATABASE over to POSTGRES_DB. That being said, to hot-fix the issue, I did the following:
OK, I think I know what happened. When I run the following, it shows that the …
So not only does postgres not re-create the data in the database upon initialization, because the convention of the volume mount at https://github.com/openpredictionmarkets/socialpredict/blob/deploy_prod/scripts/docker-compose-prod.yaml#L12 means that whatever is stored at:
will be used first; it also completely skips creating the database in the first place. Basically, I had originally created this database with a completely different username and password, which are still stored at
OK, so I attempted to delete the data at:
I did it the following "graceful" way:
Then, after waiting a while, it came back up successfully. Looking at the postgres logs afterwards, it does indeed re-create the database from scratch:
Then, as @ntoufoudis mentioned, I was able to log in as admin, but the profile page is that of a regular user, not an admin. On the backend, I'm seeing:
I'm not sure why yet; this appears to be more of an application error. That being said, perhaps we can finally finish this MR, then move on to application errors and other priorities later. What do you think, @ntoufoudis? Ready to finally merge, now that we have this full record of what was going on?
Yes, I think we should merge this and then tackle each app error.
New architecture designed to deploy to DigitalOcean.
Passing Criteria: