[Enterprise] Operations manual #1567

Merged
merged 16 commits into from
Dec 1, 2017

Conversation

schultyy
Contributor

This PR adds a new document which we call the operations manual. The purpose of this document is to provide guidance for our customers in case something goes wrong.

This PR ships with a very small version of the manual. We plan to expand it over time with more topics.
The document has multiple entry points: for each problem (like "My builds don't get worked off") we provide a section with strategies to resolve it.

If these strategies don't resolve the problem, the user is pointed to the Contact support section, which explains how best to get in touch with us (what we need from them, etc.).


---

The Operations Manual is a guideline that helps you resolve problems with your Travis CI Enterprise instance. We plan to extend this document over time with frequently occurring support topics. If you would like to see a specific problem covered here, please get in touch with us via [enterprise@travis-ci.com](mailto:enterprise@travis-ci.com).


This document has multiple entry points. Each entry point is a common problem that we've seen occurring on a regular basis. The corresponding section guides you through resolving it. If the problem can't be resolved, you'll find instructions on how to proceed at the bottom of the document.


Throughout this document we'll be using the following terms to refer to the two components of your Travis CI Enterprise installation:

2. Run `travis bash`. This will open a bash session with `root` privileges in the Travis container.
3. Then run `cat /usr/local/travis/etc/travis/config/travis.yml | grep -A1 encryption:`. Back up the value returned by that command by either writing it down on a piece of paper or storing it on a different computer.

> Without this key the information in the database is not recoverable.
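
As a rough sketch of step 3, the lookup can be wrapped in a small helper that writes the result to a file only the current user can read. The function name `save_encryption_key` and the default output path are hypothetical; the config path is the one quoted in step 3. It assumes the key sits on the line directly after `encryption:`, as the `grep -A1` in step 3 implies.

```shell
# Hypothetical helper: copy the encryption block out of travis.yml
# into a backup file readable only by the current user.
# Usage: save_encryption_key [CONFIG_FILE] [BACKUP_FILE]
save_encryption_key() {
  config="${1:-/usr/local/travis/etc/travis/config/travis.yml}"
  backup="${2:-./travis-encryption-key.txt}"

  # Create (or empty) the backup file and restrict it to the owner
  # before anything sensitive is written to it.
  : > "$backup" || return 1
  chmod 600 "$backup" || return 1

  # Same lookup as step 3: the `encryption:` line plus the line after it.
  grep -A1 'encryption:' "$config" > "$backup" || return 1

  # Fail if nothing was written, so an empty file is never mistaken for a key.
  [ -s "$backup" ]
}
```

Run inside the `travis bash` session, `save_encryption_key` with no arguments would use the defaults above. The resulting file still lives in the container, so it has to be copied to a different computer afterwards.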

The files are located at `/var/travis` on the platform machine. Please run `sudo tar -czvf travis-enterprise-data-backup.tar.gz /var/travis` to create a compressed archive of this folder. After this has finished, copy the archive off the machine to a secure location.
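
Before copying the archive elsewhere, it can be worth confirming that it is readable. The sketch below wraps the `tar` command above in a hypothetical helper, `backup_travis_data`, that also lists the archive back and records a checksum. It assumes GNU `tar` and `sha256sum` are available on the platform machine.

```shell
# Hypothetical helper: archive the data directory and confirm the archive is readable.
# Usage: backup_travis_data [DATA_DIR] [ARCHIVE]   (run with sudo for /var/travis)
backup_travis_data() {
  data_dir="${1:-/var/travis}"
  archive="${2:-travis-enterprise-data-backup.tar.gz}"

  [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }

  # Same archive as the command above, without the verbose file listing.
  tar -czf "$archive" "$data_dir" || return 1

  # Listing the contents forces tar to read the whole archive back.
  tar -tzf "$archive" > /dev/null || return 1

  # Record a checksum to compare against after copying the file elsewhere.
  sha256sum "$archive" > "$archive.sha256"
}
```

After copying both files to the secure location, running `sha256sum -c` on the `.sha256` file from the same relative path should confirm the copy is intact.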

## Builds don't get worked off


### Symptoms

In the Travis CI Web UI, none of the builds get worked off. They either have no state or are in the `queued` state. Cancelling and restarting them doesn't make any difference.


### Strategies

Below you will find different strategies to resolve the problem. They're meant to be followed in order. After you've completed the steps for a strategy, please restart a build in the Travis CI Web UI to see if it gets picked up. If it doesn't, please advance to the next strategy.


#### Connection to RabbitMQ got lost

We're using RabbitMQ to schedule builds for the worker machine(s). Sometimes the worker machine(s) lose the connection to RabbitMQ and therefore stop running new builds. This is a known problem on our side and we're working on resolving it. To get everything back to normal, restarting the machines usually suffices. To do that, connect via `ssh` and run the following command:


#### Ports are not open (security groups / firewall)

One possible cause of the problem is that the worker machine is not able to talk to the platform machine.
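
One way to test this from the worker machine is to try opening a TCP connection to the platform machine on each port it needs. The following is a minimal sketch, not an official tool: the helper names `port_open` and `check_platform_ports` are hypothetical, and the host and port in the usage note are placeholders, since the ports your installation requires are not listed here. It assumes `bash` and the coreutils `timeout` command are installed.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened.
# Usage: port_open HOST PORT [TIMEOUT_SECONDS]
port_open() {
  # bash's /dev/tcp pseudo-device opens a TCP connection; timeout bounds the
  # wait when a firewall silently drops packets instead of refusing them.
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical helper: report the state of each listed port on one host.
# Usage: check_platform_ports HOST PORT...
check_platform_ports() {
  host="$1"; shift
  failed=0
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "open:        $host:$port"
    else
      echo "unreachable: $host:$port"
      failed=1
    fi
  done
  # Non-zero exit status if at least one port could not be reached.
  return "$failed"
}
```

For example, `check_platform_ports travis.example.com 443` (placeholder host and port) prints one line per port. An `unreachable` result points at a security group or firewall rule rather than at Travis CI Enterprise itself.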


## Contact support

To get in touch with us, please write a message to [enterprise@travis-ci.com](mailto:enterprise@travis-ci.com). In your message, please answer the questions below.

- A support bundle (You can get it from https://yourdomain:8800/support)
- Worker log files (They can be found at `/var/log/upstart/travis-worker.log`) - If you're using multiple worker machines, we need the log files from all of them.
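
When several worker machines are involved, it may help to bundle each machine's log into an archive named after the host. The sketch below is one way to do that; the helper name `collect_worker_logs` and the archive naming scheme are hypothetical, and the default log path is the one given in the list above.

```shell
# Hypothetical helper: bundle a worker's log file into a host-named archive.
# Usage: collect_worker_logs [LOG_FILE] [OUTPUT_DIR]   (may need sudo to read the log)
collect_worker_logs() {
  log_file="${1:-/var/log/upstart/travis-worker.log}"
  out_dir="${2:-.}"
  # uname -n prints the machine's host name, so archives from
  # different workers do not overwrite each other.
  archive="$out_dir/travis-worker-logs-$(uname -n).tar.gz"

  [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 1; }

  # -C keeps only the file name in the archive instead of the full path.
  tar -czf "$archive" -C "$(dirname "$log_file")" "$(basename "$log_file")" || return 1

  # Print the archive path so it can be attached to the support request.
  echo "$archive"
}
```

Running `collect_worker_logs` on each worker machine yields one archive per host that can be attached to the support message.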

Is there anything special about your setup? There is certain information we can already see, such as the hostname and which IaaS provider you're using, but there are lots of other things we can't see that could lead to something not working. Therefore we'd like to ask you to also answer the questions below in your support request (if applicable):

Contributor

@acnagy acnagy left a comment

This is looking awesome! I had a bunch of tone suggestions, but the direction and content are great.

layout: en_enterprise

---
Welcome to the Travis CI Enterprise Operations Manual! This is a living document which provides guidelines and suggestions for troubleshooting your Travis CI Enterprise instance. If you have questions about a specific situation, please get in touch with us via [enterprise@travis-ci.com](mailto:enterprise@travis-ci.com).

This document is made up of multiple topics. Each topic is a common problem that we've seen occurring on a regular basis. The corresponding section guides you through resolving it. If the problem can't be resolved, you'll find instructions on how to proceed at the bottom of the document.

- `Platform machine`: The virtual machine that runs most of the Travis web components. This is the machine your domain is pointing to.
- `Worker machine`: The worker machine(s) run your builds.

> Please note that this guide is geared towards non-HA setups right now.

## Builds are not starting

### Symptoms

In the Travis CI Web UI, none of the builds are starting. They either have no state or are in the `queued` state. Cancelling and restarting them doesn't make any difference.


### Strategies

Below you will find different strategies to resolve the problem. They're meant to be followed in order. After you've completed the steps for a strategy, please restart a build in the Travis CI Web UI to see if it gets picked up. If it doesn't, please try the next strategy.


#### Connection to RabbitMQ got lost

We're using RabbitMQ to schedule builds for the worker machine(s). Sometimes the worker machine(s) lose the connection to RabbitMQ and therefore stop running new builds. This is a known problem on our side and we're working on resolving it. To get everything back to normal, restarting the machines usually suffices. To do that, connect via `ssh` and run the following command:

## Contact support

To get in touch with us, please write a message to [enterprise@travis-ci.com](mailto:enterprise@travis-ci.com). It would be very helpful for Support if you could include the following:

@schultyy schultyy merged commit 84a59d6 into master Dec 1, 2017
@schultyy schultyy deleted the js-operations-manual branch December 1, 2017 15:27
3 participants