System unavailable: trss.adoptopenjdk.net #1679
Comments
Server now responding to the ssh port. Unfortunately the backend Node.js process appears to be repeatedly crashing and restarting so the service is not yet responsive. |
From the backend logs -
|
Machine became unresponsive at 02:29:28 (based on the kernel messages) with an out-of-memory situation. There is no information on restarting the mongodb service on https://github.com/AdoptOpenJDK/openjdk-test-tools/tree/master/TestResultSummaryService. I have tried starting mongo - it first had a problem with Current status: mongodb, TRSSBackend and TRSSFrontend services are showing as |
@llxia Need your input on the |
Looks like all the database stuff had been stored on a ramdrive and is therefore lost and will need to be rebuilt. |
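To avoid a repeat of the ramdrive data loss, a check along these lines could be run against the MongoDB data directory. This is only a minimal sketch: it assumes a Linux host that exposes `/proc/mounts`, and the function names and the paths in the usage note are hypothetical, not taken from the actual TRSS server setup. It does not resolve symlinks or handle mount points with escaped spaces.

```python
import os

# Filesystem types whose contents do not survive a reboot.
VOLATILE_FS_TYPES = {"tmpfs", "ramfs"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def filesystem_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        mount_point = os.path.normpath(mount_point)
        prefix = mount_point if mount_point.endswith("/") else mount_point + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_volatile(path, mounts_text):
    """True if path sits on a RAM-backed filesystem according to mounts_text."""
    return filesystem_type_for(path, mounts_text) in VOLATILE_FS_TYPES


def is_volatile_on_this_host(path):
    """Same check against the live mount table (Linux only)."""
    with open("/proc/mounts") as handle:
        return is_volatile(path, handle.read())
```

For example, `is_volatile_on_this_host("/data/db")` (a hypothetical data directory) would return `True` if that directory were on a `tmpfs` or `ramfs` mount, which is the situation that lost the database here.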
Machine now has 16GB of swap (equals the amount of RAM) and a 160GB |
AWS moved the IP address on the host. After it rebooted there was still a log entry with the old IP address, but it subsequently switched. PR in for inventory change. The https://trss.adoptopenjdk.net address is now pointing to the new IP address. MongoDB is now running on a persistent filesystem (The new |
@llxia Can you update the documentation to cover the setup of MongoDB and how to restart it etc. |
We use standard cmd to install and to restart I will update the readme. |
Can this be closed now (nice job on the rescue BTW)? |
I was holding off until we have the documentation updated with details of what goes into the |
The format about If we need a backup copy of |
Absolutely agree passwords shouldn't be in there (although we can store that elsewhere) but things like the data directory that we've set for mongo should be along with the other specifics of the production server setup such as the location of the config file (The docs just say that you should provide a |
This is because we have the forever services created for TRSS. During the service creation, we can specify the
|
Is the information on forever mentioned anywhere else? We should definitely add those two commands into the And yes I would agree that since we have all of the code to start |
The information was mentioned in the previous issue and I did a demo/recording a while back with more up to date information. Just to be clear, the steps should be in the following order:
Correct me if I'm wrong, but I do not think Step 2 can be in the playbook as it contains credentials. If we want to put Step 4 in the playbook (without Step 2), then we should start with an empty trssConf.json file. An admin can then create the user/password in MongoDB and update trssConf.json manually later. |
Ansible has various mechanisms to inject credentials into playbooks. For example, there's Before sinking hours into updating the playbooks, please consider the best approach for #1689. |
@aahlenst To be clear my primary goal here is to ensure that what we have in production at the moment is documented along with the other setup instructions before putting time into moving it. |
This needs revisiting to see what state we are currently in so we can recreate the TRSS server easily if required. Keeping this in the July milestone so we can define next steps/plan. |
Bumping to next month so we can try and progress this in a timely manner - potentially after discussions at the AQAvit calls |
I still feel that we need the documentation on the setup in one place instead of having to find it from multiple sources, but since there seems to be no interest in doing this, I'm going to close this with a link to the comment that has the attachment with the extra instructions: adoptium/aqa-test-tools#9 (comment). Related: #1327, since that has been stalled in places due to lack of clarity on some parts of the setup. |
System is unresponsive. There was an issue with the TRSS server yesterday, but it could be fixed on the machine. Today the system was completely unresponsive and no contact could be made with it (initially reported at ~3am GMT).
I have managed to restart the host on the provider (AWS) but after 20 minutes it is not responding on the ssh port (although it is pingable)