Admin Guide: FAQ

Gabriel Vîjială edited this page Mar 5, 2024 · 12 revisions

Operating Liquid Investigations Servers

Something broke, how do I fix it?

  • Identify and save the logs for the failed job in the Nomad UI. Service health check responses are available in the Consul UI.
  • Check issues labelled bug on the public board. If the bug is not tracked, please add it.
  • Ask on Slack!

Wiki.js doesn't auto-login users

Problem: Users have to click the profile icon in the top right to be authenticated, instead of it happening automatically on entering the wiki.

Fix: Remove all "Permissions" and all "Page Rules" from the "Guest" group. The "Guest" group should never have access to anything; with all guest access removed, Wiki.js auto-logins the user.

Invalid HTTPS certificates

Problem: Traefik doesn't update the stored certificates if the configuration changes.

Fix: The administrator has to wipe the stored certificates after such a configuration change.

To wipe the certificates, delete the Consul KV entries /traefik and /liquid/traefik/acme from the Consul UI.
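If the Consul UI is not handy, the same keys can be removed with the consul CLI — a sketch, assuming the consul binary can reach the cluster's Consul agent (e.g. from inside the cluster container); note that CLI key paths drop the leading slash:

```shell
# Delete Traefik's stored configuration and its ACME certificate cache.
# -recurse also removes any sub-keys under these paths.
consul kv delete -recurse traefik
consul kv delete -recurse liquid/traefik/acme
```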

To follow Traefik's progress in getting the HTTPS certificates from LetsEncrypt, use the Nomad UI to follow its console output.
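The same console output can also be followed from a shell with the nomad CLI — a sketch, assuming the nomad binary is on PATH and pointed at the cluster; the job name traefik is illustrative:

```shell
# Find a running allocation of the traefik job, then stream its logs.
nomad job status traefik      # note an allocation ID from the output
nomad alloc logs -f ALLOC_ID  # follow stdout; add -stderr for stderr
```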

Nomad won't schedule jobs

Problem: Running ./liquid deploy fails, and many of the expected Docker containers never start.

Fix:

  1. View the Nomad errors in the Nomad UI by navigating to the failing job.
  2. Check the Nomad logs with docker exec cluster ./cluster.py tail nomad, and report any errors found there.
  3. Try running a clean reset.
  4. Configure the Docker daemon's GOMAXPROCS if you are running on many cores; for machines with >32 cores, we recommend a value of 8-12.
  5. If running on a recent RHEL or similar Linux distribution, try disabling SELinux.
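The GOMAXPROCS change from step 4 can be applied with a systemd drop-in for the Docker service — a sketch; the drop-in file name and the value 8 are illustrative:

```shell
# Cap the Docker daemon's Go scheduler at 8 OS threads
# (pick a value in the recommended 8-12 range).
sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/gomaxprocs.conf
[Service]
Environment="GOMAXPROCS=8"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```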

Snoop gets stuck when processing

Problem: When processing many collections in parallel, RabbitMQ can run out of memory. To check whether this has happened:

  • Forward port 9990 from the server (10.66.60.1:9990) to localhost:9990 on your machine using an SSH LocalForward configuration.
  • Visit http://localhost:9990/_snoop_rabbit/, login with username guest and password guest.
  • Look at the "Memory" cell on the "Overview" screen. If it's red, then you're out of memory.
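The port forward from the first step can be set up with an entry like this in ~/.ssh/config — a sketch; the host alias, hostname and user are placeholders for your own server:

```
# ~/.ssh/config -- forward the server-side RabbitMQ management proxy
# to localhost:9990; "liquid-server", the HostName and User are placeholders.
Host liquid-server
    HostName liquid.example.org
    User admin
    LocalForward 9990 10.66.60.1:9990
```

With this in place, `ssh liquid-server` opens the tunnel for as long as the session stays up.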

Fix: If you're out of RabbitMQ memory, do one of these:

  • increase rabbitmq_memory_limit to 4+ GB, or
  • process fewer collections at the same time (set process = off on some of them, and turn it on again later)
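Both knobs are liquid.ini settings. A sketch of what the change might look like — the section layout, the key placement and the value format are assumptions here, and big-collection is a placeholder, so check your own liquid.ini for the exact spelling:

```ini
[liquid]
; raise the RabbitMQ memory cap to 4 GB (value format is an assumption)
rabbitmq_memory_limit = 4g

[collection:big-collection]
; pause processing for this collection; turn it back on later
process = off
```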

Elasticsearch won't index documents

Problem: When disk usage on the Elasticsearch volume exceeds the ~90% watermark, Elasticsearch locks its indices read-only.

Fix: Be sure to free up some disk space first, then run:

export ES_ADDR=10.66.60.1:9990/_es
curl -XPUT "$ES_ADDR/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent": {"cluster.blocks.read_only": false}}'
curl -XPUT "$ES_ADDR/_all/_settings" -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'

... where 10.66.60.1 is the network address configured in cluster.ini and liquid.ini.
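To confirm the unlock worked (and to keep an eye on disk usage), the same ES_ADDR proxy can be queried — a sketch using Elasticsearch's standard _cat and settings APIs:

```shell
export ES_ADDR=10.66.60.1:9990/_es
# Per-node disk usage -- watch the disk.percent column.
curl -s "$ES_ADDR/_cat/allocation?v"
# After the fix, the read-only block should no longer appear in the settings.
curl -s "$ES_ADDR/_all/_settings?pretty" | grep read_only_allow_delete
```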

Docker Hub - Download Rate Limit

Problem: Docker Hub has been lowering its free anonymous download limit; when deploying, you might hit this limit for your host's IP.

Fix: See the Docker article on download-rate-limit.

Temporary work-around:

  • create a free Docker Hub account for each instance
  • run docker login with the new credentials on the machine that runs ./liquid deploy
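To check how many anonymous pulls remain for the host's IP, Docker's documented trick of reading the ratelimit headers from the registry can be used — a sketch; it requires curl and jq, and ratelimitpreview/test is the probe image Docker's article uses:

```shell
# Fetch an anonymous pull token, then read the rate-limit headers
# from a HEAD request on the probe image's manifest.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" \
        | jq -r .token)
curl -s --head -H "Authorization: Bearer $TOKEN" \
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
    | grep -i ratelimit
```

The ratelimit-limit header reads like `100;w=21600`, i.e. 100 pulls per 21600-second (6-hour) window.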